Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't reallocate pcre2_match_data for every exec, you just need it once in comp #18

Closed
wants to merge 2 commits into from

Conversation

avar
Copy link
Contributor

@avar avar commented Apr 11, 2017

See the linked mailing list thread on the second patch. This presumably makes things faster, maybe my use of malloc/free here should be amended to use some xmalloc/perl malloc function.

This enables us to carry around more info around with our pattern, to
come in the next commit.
As pointed out in a mailing list thread on pcre-dev[1] the match data
should be allocated once per pattern and reused for each match, not
allocated N times for each exec. I haven't benchmarked things but this
doubtless speeds things up.

1. https://lists.exim.org/lurker/message/20170410.141338.5219e80f.en.html
@rurban
Copy link
Owner

rurban commented Apr 11, 2017

Yes, this is how we should do it.
I already started with something like that in Hyperscan.

We also need a 2nd compiled pcre2_code in pprivate for ascii/utf8 matches. The compiler doesn't know if the pattern will be used in ascii or utf8 context, so it will make a guess, and recompile if the code is missing. See #15

@coveralls
Copy link

Coverage Status

Coverage remained the same at 100.0% when pulling fb668c1 on avar:avar/match-data-comp-alloc into 30207d5 on rurban:master.

@rurban
Copy link
Owner

rurban commented Apr 12, 2017

See the match-data-comp-alloc branch, and #19
There is one regression with t/perl/regexp.t 1840

@avar
Copy link
Contributor Author

avar commented Apr 12, 2017

@rurban Yeah no idea what that regression is about. FWIW I tested this on 5.14.0, ancient, I know. I don't have time to debug this, just hacked this up since it seemed like an easy win. So just closing this request since it has a regression, but maybe the idea can be salvaged in some other form.

@avar avar closed this Apr 12, 2017
@avar
Copy link
Contributor Author

avar commented Apr 12, 2017

On a closer look this seems to be a general issue with re::engine::PCRE2's bugs related to mixed UTF8 & non-UTF8 patterns. Maybe it's worth merging this after all, since there's still a bunch of regressions related to that, and this one extra failure will be fixed when the general bug is fixed, i.e. we'll have both UTF-8 & non-UTF-8 pattern & match data in the pprivate struct & switch between them in some cases.

@rurban
Copy link
Owner

rurban commented Apr 12, 2017

I keep it as proper branch, and I'm investigating. Strange side effect.

@rurban
Copy link
Owner

rurban commented Apr 12, 2017

Yes, most likely. I'm looking into how study is implemented today

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants