-
Notifications
You must be signed in to change notification settings - Fork 371
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Regexp.exec is expensive after the es6 branch #201
Comments
Could you provide some test cases? The heavy usage of runes() seems a bit odd as it would only be used on the first match, all subsequent matches would be done with a regexp2Wrapper which does not use this method. I've implemented caching of the runes and posMap only to find out that it makes very little difference in the benchmarks. In order for it to be noticeable it has to be a really long string and a really long match string. |
Is the test case provided, not good enough, I can come up with something bigger if needed ;) . For me at least profilling that benchmarks shows
Looking at the code, every time exec is called it will call |
For some reason I completely missed the test case... Anyway, I've pushed a few commits that should bring it on par with es5 (and in some cases improve it). My benchmarks show this (a695b0c vs master):
|
I found one more ... performance problem with the new regexp that is being hit by k6 users ...
Apperantly the way
matchAll
is being polyfilled is to callexec
repeatedly which is fine with small inputs or small calls to it, but with bigger inputs (especially combined with multiple matches) it means that goja will callrunes()
multitude of times generating the[]rune
representation of the string a lot :(.This is even bigger problem for k6 as the way it works is to run multiple (hundreds, sometimes thousands) goja VMs each running the same script, which likely will hit the regex at the same time. One particular case(that provoked this investigation of mine) is that this goes from 200-300 VMs can do this in parallel over half a second to the same can't actually finish doing it over 1-2 minutes and instead eat all the ram :(
You can see result from benchmarks here compared to before es6.
I don't really have any great ideas :D, but here they are in .. some order:
[]rune
thing is needed in order to run utf16 regexp expressions ... but I would expect most wouldn't actually need that ... or maybe I am just completely misunderstanding the reason. If I am not though, checking whether it is required is likely going to be a lot less expensive ?runes()
method a lot, maybe it will be a good idea to cache it's output? either in the string representation or somewhere else?matchAll
will remove at least one of the cases where this is a possible problem, but at least for k6, this was actually provoked by a client code that was just using exec and implementing something like matchAll, so that won't help in that case, but arguably is the most straight forward fixp.s. I have generally been in the camp of "people are overusing regexes" and this past few issues have not changed that ... but apperantly people just want to use regexes :(
The text was updated successfully, but these errors were encountered: