-
-
Notifications
You must be signed in to change notification settings - Fork 508
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug in props parsing #3193
Comments
Sorry I accidentaly hit post before text was finished. Please see modified text above. |
Hi @ed2050, thanks for reporting this issue! Actually, we originally used shlex for parsing props, but replaced it with regex for performance reasons: #341. Since the Just for the record: A trivial workaround would be to use quotes around the value, like |
I just experimented with this simple implementation (see "props" branch): @staticmethod
def _parse_props(text: Optional[str]) -> Dict[str, Any]:
dictionary: dict[str, Any] = {}
for token in shlex.split(text or ''):
words = token.split('=', 1)
dictionary[words[0]] = True if len(words) == 1 else words[1]
return dictionary This is around 50% slower for long strings like
|
I think I fixed the problem by adding "=" to the list of allowed characters in an unquoted props string. |
This post contains a modest improvement if you want even more speedup to your regex solution (2x on simple strings). _parse_props code at end. TimingsI did timing tests for shlex vs nicegui regex. Looks like a 25x speed difference on my machine: > python3 -m timeit -s 'import shlex ; text = """src=foo=bar.jpg href="foo bar.jpg" rab=oof"""' -c 'shlex.split (text)'
5000 loops, best of 5: 49 usec per loop
> python3 -m timeit -s 'import nicegui.element as e; regex = e.PROPS_PATTERN ; text = """src=foo=bar.jpg href="foo bar.jpg" rab=oof""" ' -c 'regex.search (text).groups ()'
200000 loops, best of 5: 2 usec per loop Interesting that your tests with current regex version have closed that gap. Only about 2:1 difference for long strings and 5:1 for short. But you prob measured complete function not just splitting the string. ImprovementSpeeding up shlex doesn't appear to be possible. At least without lib hacking. If you really want a speedup, you can handle simple cases separately. Checking a string for a quote char is very fast: 25 ns for a 100 char string on my machine. It's much faster to check for quote and use str.split () if none present. Otherwise fall back to regex as desired. if '"' not in text :
words = text.split ()
else :
words = [ m.groups () for m in PROPS_PATTERN.finditer (text) ] This speeds up splitting simple cases by 10x: # has quotes, parse with regex
> python3 -m timeit -s 'import nicegui.element as e; text = """src=foo=bar.jpg href="foo bar.jpg" rab=oof""" ; t2 = "foo=bar rab=oof" ; regex = e.PROPS_PATTERN ; quote = """_"_""" [1]' -c 'quote in text and regex.search (text).groups () or text.split ()'
200000 loops, best of 5: 1.98 usec per loop
# no quotes, parse with str.split
> python3 -m timeit -s 'import nicegui.element as e; text = "foo=bar rab=oof" ; regex = e.PROPS_PATTERN ; quote = """_"_""" [1]' -c 'quote in text and regex.search (text).groups () or text.split ()'
2000000 loops, best of 5: 186 nsec per loop Is it worth it? Well... maybe. By the time you add all the other code in _parse_props, the difference is only 2x on simple strings (1.7 us per call vs 3.3 us). Significant, but _parse_props is probably not a bottleneck in the overall app. But you were concerned about a 2x difference between shlex and regex, so maybe. ImplementationHere's an implemention of _parse_props with this speedup. It's 2x faster on simple strings (no quotes) by skipping the regex. On strings with quotes, speed is unchanged. def _parse_props (text) :
props = {}
# first split text into list of pairs
if '"' in text :
# get match groups from regex and keep values that aren't None
pairs = [ [ x for x in w.groups () if x ] for w in e.PROPS_PATTERN.finditer (text) ]
else :
pairs = [ t.split ('=', 1) for t in text.split () ]
# now transform pairs into props dict
for key, *value in pairs :
value = value and value [0] or '' # handle empty values
if value and value.startswith ('"') and value.endswith ('"') :
value = json.loads(value)
props [key] = value or True
return props |
That's unfortunate. I hate when the clear, simple, obvious solution isn't fast enough. |
Thanks for your detailed investigation, @ed2050! |
Description
There's a bug in props parsing. Looks like a flaw in PROPS_PATTERN regex of element.py.
Equal signs are parsed incorrectly. Equals in the value portion of "key=value" are split and key is thrown away. This can be triggered by passing a filename with equal sign in the value.
I suggest using shlex instead of a regex, as explained below:
Example
The src attribute is thrown away and the filename is incorrectly split into a key=value pair, which is then added to the
<img>
tag.This behavior is incorrect because the docs specify that props takes "whitespace-delimited list of either boolean values or key=value pair to add". But values are not just whitespace delimited in the code. They are also split on equal sign, even when = is part of the value.
Suggestions
Regex is probably the wrong approach for this task. You're better off using string split so you can do
key, value = text.split ('=', 1)
after splitting the arg on whitespace. But that gets complicated by first detecting matching quotes for values that contain whitespace.A better approach is to use shlex to split the first stage. Shlex respects and removes quotes. For instance:
In each case shlex removes the quotes, making second step trivial.
The text was updated successfully, but these errors were encountered: