Bug in props parsing #3193

ed2050 · 2024-06-09T14:36:25Z

Description

There's a bug in props parsing. Looks like a flaw in PROPS_PATTERN regex of element.py.

Equal signs are parsed incorrectly. Equals in the value portion of "key=value" are split and key is thrown away. This can be triggered by passing a filename with equal sign in the value.

I suggest using shlex instead of a regex, as explained below:

def props (add) :
    for token in shlex.split (add) :
        key, value = token.split ('=', 1)
        ... now do whatever props does with key and value

Example

imgfile = '007_eXfm=YuqZ.jpg'
ui.element ('img').props (f'src={ imgfile }')
# produces img with no src attribute : 
# <img 007_eXfm="YuqZ.jpg">

# regex in props() drops the "src" part :
import nicegui.element as e
e.PROPS_PATTERN.search ('src=foo=bar.jpg').groups ()
> ('foo', None, None, 'bar.jpg')

The src attribute is thrown away and the filename is incorrectly split into a key=value pair, which is then added to the <img> tag.

This behavior is incorrect because the docs specify that props takes "whitespace-delimited list of either boolean values or key=value pair to add". But values are not just whitespace delimited in the code. They are also split on equal sign, even when = is part of the value.

Suggestions

Fix the code to split on whitespace as described in docs
Don't throw away key when value contains =
Update the docs to say that props takes a "whitespace-delimited string of ...". Saying it takes a list suggests a python list, which is misleading.

Regex is probably the wrong approach for this task. You're better off using string split so you can do key, value = text.split ('=', 1) after splitting the arg on whitespace. But that gets complicated by first detecting matching quotes for values that contain whitespace.

A better approach is to use shlex to split the first stage. Shlex respects and removes quotes. For instance:

>>> shlex.split ('src=foo=bar.jpg')
['src=foo=bar.jpg']

>>> shlex.split ('src="foo bar.jpg"')
['src=foo bar.jpg']

>>> shlex.split ("src='foo bar.jpg'")
['src=foo bar.jpg']

In each case shlex removes the quotes, making second step trivial.

def props (add) :
    for token in shlex.split (add) :
        key, value = token.split ('=', 1)
        ... now do whatever props does with key and value

The text was updated successfully, but these errors were encountered:

ed2050 · 2024-06-09T14:44:27Z

Sorry I accidentaly hit post before text was finished. Please see modified text above.

falkoschindler · 2024-06-10T11:12:52Z

Hi @ed2050, thanks for reporting this issue!

Actually, we originally used shlex for parsing props, but replaced it with regex for performance reasons: #341. Since the .props() method is potentially called thousands of times, performance does matter quite a lot. So even though I'd like to fix the bug with equal signs in prop values, it might be better to improve the regex rather than switching to shlex. But maybe there's also a way to reach similar performance with shlex. I'm totally open for suggestions.

Just for the record: A trivial workaround would be to use quotes around the value, like .props(f'src="{ imgfile }"').

falkoschindler · 2024-06-10T11:43:32Z

I just experimented with this simple implementation (see "props" branch):

@staticmethod
def _parse_props(text: Optional[str]) -> Dict[str, Any]:
    dictionary: dict[str, Any] = {}
    for token in shlex.split(text or ''):
        words = token.split('=', 1)
        dictionary[words[0]] = True if len(words) == 1 else words[1]
    return dictionary

This is around 50% slower for long strings like 'dark color=red label="First Name" hint="Your \"given\" name" input-style="{ color: #ff0000 }"', and even 80% slower for short strings like 'dark':

       short  long
shlex  0.030  0.260
regex  0.006  0.130

falkoschindler · 2024-06-10T11:52:40Z

I think I fixed the problem by adding "=" to the list of allowed characters in an unquoted props string.

ed2050 · 2024-06-10T13:14:05Z

This post contains a modest improvement if you want even more speedup to your regex solution (2x on simple strings). _parse_props code at end.

Timings

I did timing tests for shlex vs nicegui regex. Looks like a 25x speed difference on my machine:

> python3 -m timeit -s 'import shlex ; text = """src=foo=bar.jpg href="foo bar.jpg" rab=oof"""' -c 'shlex.split (text)'
5000 loops, best of 5: 49 usec per loop

> python3 -m timeit -s 'import nicegui.element as e; regex = e.PROPS_PATTERN ; text = """src=foo=bar.jpg href="foo bar.jpg" rab=oof""" ' -c 'regex.search (text).groups ()'
200000 loops, best of 5: 2 usec per loop

Interesting that your tests with current regex version have closed that gap. Only about 2:1 difference for long strings and 5:1 for short. But you prob measured complete function not just splitting the string.

Improvement

Speeding up shlex doesn't appear to be possible. At least without lib hacking.

If you really want a speedup, you can handle simple cases separately. Checking a string for a quote char is very fast: 25 ns for a 100 char string on my machine. It's much faster to check for quote and use str.split () if none present. Otherwise fall back to regex as desired.

if '"' not in text :
    words = text.split ()
else :
    words = [ m.groups () for m in PROPS_PATTERN.finditer (text) ]

This speeds up splitting simple cases by 10x:

# has quotes, parse with regex
> python3 -m timeit -s 'import nicegui.element as e; text = """src=foo=bar.jpg href="foo bar.jpg" rab=oof""" ; t2 = "foo=bar rab=oof" ; regex = e.PROPS_PATTERN ; quote = """_"_""" [1]' -c 'quote in text and regex.search (text).groups () or text.split ()'
200000 loops, best of 5: 1.98 usec per loop

# no quotes, parse with str.split
> python3 -m timeit -s 'import nicegui.element as e; text = "foo=bar rab=oof" ; regex = e.PROPS_PATTERN ; quote = """_"_""" [1]' -c 'quote in text and regex.search (text).groups () or text.split ()'
2000000 loops, best of 5: 186 nsec per loop

Is it worth it? Well... maybe. By the time you add all the other code in _parse_props, the difference is only 2x on simple strings (1.7 us per call vs 3.3 us). Significant, but _parse_props is probably not a bottleneck in the overall app. But you were concerned about a 2x difference between shlex and regex, so maybe.

Implementation

Here's an implemention of _parse_props with this speedup. It's 2x faster on simple strings (no quotes) by skipping the regex. On strings with quotes, speed is unchanged.

def _parse_props (text) :

    props = {}
    # first split text into list of pairs
    if '"' in text :
        # get match groups from regex and keep values that aren't None
        pairs = [ [ x for x in w.groups () if x ] for w in e.PROPS_PATTERN.finditer (text) ]
    else :
        pairs = [ t.split ('=', 1) for t in text.split () ]

    # now transform pairs into props dict
    for key, *value in pairs :
        
        value = value and value [0] or ''    # handle empty values
        if value and value.startswith ('"') and value.endswith ('"') :
            value = json.loads(value)
        props [key] = value or True
        
    return props

ed2050 · 2024-06-10T13:14:59Z

@falkoschindler

Actually, we originally used shlex for parsing props, but replaced it with regex for performance reasons: #341.

That's unfortunate. I hate when the clear, simple, obvious solution isn't fast enough.
Regexes are tricky. I've used and abused them a lot. Very easy for bugs to sneak in.

falkoschindler · 2024-06-13T12:27:54Z

Thanks for your detailed investigation, @ed2050!
We'll leave the implementation as it is for now, since it is working fine and the performance is good enough. Using a simple .split() for strings without quotes is an interesting idea though. 👍🏻

falkoschindler added help wanted Extra attention is needed bug Something isn't working labels Jun 10, 2024

falkoschindler closed this as completed in be3803b Jun 10, 2024

falkoschindler removed the help wanted Extra attention is needed label Jun 10, 2024

falkoschindler added this to the 1.4.27 milestone Jun 10, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bug in props parsing #3193

Bug in props parsing #3193

ed2050 commented Jun 9, 2024 •

edited

Loading

ed2050 commented Jun 9, 2024

falkoschindler commented Jun 10, 2024

falkoschindler commented Jun 10, 2024 •

edited

Loading

falkoschindler commented Jun 10, 2024

ed2050 commented Jun 10, 2024 •

edited

Loading

ed2050 commented Jun 10, 2024

falkoschindler commented Jun 13, 2024

Bug in props parsing #3193

Bug in props parsing #3193

Comments

ed2050 commented Jun 9, 2024 • edited Loading

Description

Example

Suggestions

ed2050 commented Jun 9, 2024

falkoschindler commented Jun 10, 2024

falkoschindler commented Jun 10, 2024 • edited Loading

falkoschindler commented Jun 10, 2024

ed2050 commented Jun 10, 2024 • edited Loading

Timings

Improvement

Implementation

ed2050 commented Jun 10, 2024

falkoschindler commented Jun 13, 2024

ed2050 commented Jun 9, 2024 •

edited

Loading

falkoschindler commented Jun 10, 2024 •

edited

Loading

ed2050 commented Jun 10, 2024 •

edited

Loading