NPE on specific regex #105

Open
ludovicianul opened this issue Mar 8, 2024 · 3 comments

@ludovicianul

Describe the bug
I get an NPE with the regex ^[^\p{C}\p{Z}\s]*[^\p{C}]+[^\p{C}\p{Z}]*$ on version 2.0. This did not happen on version 1.4.

To Reproduce
Steps to reproduce the behavior:

  1. regex: ^[^\p{C}\p{Z}\s]*[^\p{C}]+[^\p{C}\p{Z}]*$
  2. RgxGen.parse("^[^\\p{C}\\p{Z}\\s]*[^\\p{C}]+[^\\p{C}\\p{Z}]*$").generate()
  3. See error below
	at com.github.curiousoddman.rgxgen.nodes.SymbolSet.ofUnicodeCharacterClass(SymbolSet.java:72)
	at com.github.curiousoddman.rgxgen.parsing.dflt.DefaultTreeBuilder.createUnicodeSymbolSetNode(DefaultTreeBuilder.java:463)
	at com.github.curiousoddman.rgxgen.parsing.dflt.DefaultTreeBuilder.handleEscapedCharacter(DefaultTreeBuilder.java:410)
	at com.github.curiousoddman.rgxgen.parsing.dflt.DefaultTreeBuilder.handleBackslashInsideSquareBrackets(DefaultTreeBuilder.java:671)
	at com.github.curiousoddman.rgxgen.parsing.dflt.DefaultTreeBuilder.handleSquareBrackets(DefaultTreeBuilder.java:631)
	at com.github.curiousoddman.rgxgen.parsing.dflt.DefaultTreeBuilder.parseGroup(DefaultTreeBuilder.java:228)
	at com.github.curiousoddman.rgxgen.parsing.dflt.DefaultTreeBuilder.build(DefaultTreeBuilder.java:732)
	at com.github.curiousoddman.rgxgen.parsing.dflt.DefaultTreeBuilder.get(DefaultTreeBuilder.java:741)
	at com.github.curiousoddman.rgxgen.RgxGen.<init>(RgxGen.java:65)
	at com.github.curiousoddman.rgxgen.RgxGen.parse(RgxGen.java:59)
	at com.github.curiousoddman.rgxgen.RgxGen.parse(RgxGen.java:48)

Expected behavior
Properly generate a value.

Environment (please complete the following information):

  • MacOS
  • JDK 21
  • RgxGen Version 2.0

Additional context
It works with RgxGen 1.4.

@ludovicianul ludovicianul added the bug Something isn't working label Mar 8, 2024
@curious-odd-man
Owner

I will look into that. I am curious, though, which results you got in 1.4, because character classes were not supported in 1.4 at all, so the results could not have been correct anyway.

@curious-odd-man
Owner

OK. I see that this category (\p{C}) is not yet supported:

// OTHER(keys("C", "Other"), "invisible control characters and unused code points.", asList(C1_CONTROLS, UNUSED_CODEPOINT_1, UNUSED_CODEPOINTS_2, UNUSED_CODEPOINTS_3, UNUSED_CODEPOINTS_4, UNUSED_CODEPOINTS_5, UNUSED_CODEPOINTS_6, range('׵', '؅'), range('؜', '؝'), range('܎', '܏'), UNUSED_CODEPOINTS_7, UNUSED_CODEPOINTS_8, UNUSED_CODEPOINTS_9, UNUSED_CODEPOINTS_10, UNUSED_CODEPOINTS_11, UNUSED_CODEPOINTS_12, UNUSED_CODEPOINTS_13, UNUSED_CODEPOINTS_14, UNUSED_CODEPOINTS_15, UNUSED_CODEPOINTS_16, UNUSED_CODEPOINTS_17, UNUSED_CODEPOINTS_18, UNUSED_CODEPOINTS_19, UNUSED_CODEPOINTS_20, UNUSED_CODEPOINTS_21, UNUSED_CODEPOINTS_22, RANGE_470, RANGE_16, RANGE_147, RANGE_296, RANGE_490, RANGE_610, RANGE_724, RANGE_80, RANGE_361, RANGE_122, RANGE_117, RANGE_542, RANGE_607, RANGE_280, RANGE_585, RANGE_428, RANGE_515, RANGE_659, RANGE_161, RANGE_256, RANGE_364, RANGE_579, RANGE_104, RANGE_531, RANGE_204, RANGE_439, RANGE_645, RANGE_25, RANGE_156, RANGE_479, RANGE_668, RANGE_174, RANGE_241, RANGE_390, RANGE_408, RANGE_297, RANGE_726, RANGE_276, RANGE_462, RANGE_725, RANGE_118, RANGE_546, RANGE_0, RANGE_281, RANGE_604, RANGE_400, RANGE_106, RANGE_700, RANGE_460, RANGE_567, RANGE_20, RANGE_119, RANGE_565, RANGE_169, RANGE_285, RANGE_322, RANGE_309, RANGE_442, RANGE_573, RANGE_632, RANGE_459, RANGE_202, RANGE_544, RANGE_70, RANGE_209, RANGE_292, RANGE_465, RANGE_414, RANGE_543, RANGE_728, RANGE_344, RANGE_633, RANGE_28, RANGE_369, RANGE_615, RANGE_634, RANGE_630, RANGE_523, RANGE_303, RANGE_472, RANGE_224, RANGE_732, RANGE_600, RANGE_595, RANGE_489, RANGE_448, RANGE_519, RANGE_45, RANGE_382, range('᠎', '᠏'), RANGE_349, RANGE_162, RANGE_525, RANGE_669, RANGE_363, RANGE_706, RANGE_79, RANGE_316, RANGE_463, RANGE_183, RANGE_94, RANGE_454, RANGE_380, RANGE_271, RANGE_550, RANGE_157, RANGE_590, RANGE_242, RANGE_320, RANGE_628, RANGE_267, RANGE_260, RANGE_636, RANGE_548, RANGE_416, RANGE_15, RANGE_614, RANGE_66, RANGE_192, RANGE_362, RANGE_652, RANGE_308, RANGE_172, RANGE_19, RANGE_686, 
range('⁠', ''), RANGE_33, RANGE_223, RANGE_249, RANGE_568, RANGE_228, RANGE_625, RANGE_667, RANGE_438, RANGE_443, RANGE_480, RANGE_293, RANGE_176, RANGE_617, RANGE_279, RANGE_405, RANGE_187, RANGE_377, RANGE_464, RANGE_655, RANGE_627, RANGE_355, RANGE_431, RANGE_96, RANGE_148, RANGE_422, RANGE_510, RANGE_649, RANGE_397, RANGE_87, RANGE_597, RANGE_401, RANGE_721, RANGE_730, RANGE_447, RANGE_717, RANGE_284, RANGE_181, RANGE_381, RANGE_58, RANGE_103, RANGE_505, RANGE_613, RANGE_421, RANGE_245, RANGE_714, RANGE_240, RANGE_307, RANGE_709, RANGE_287, RANGE_478, RANGE_656, RANGE_152, RANGE_123, RANGE_373, RANGE_12, RANGE_44, range('퟼', ''), RANGE_676, RANGE_93, RANGE_321, RANGE_641, RANGE_646, RANGE_60, RANGE_315, RANGE_62, RANGE_467, RANGE_334, RANGE_622, range('﻽', '＀'), RANGE_589, RANGE_27, RANGE_227, RANGE_368, RANGE_497, range('￯', '')), new char[]{'­', '΋', '΍', '΢', '԰', 'ՠ', 'ֈ', '֐', '۝', '࠿', '࡟', 'ࢵ', '࣢', '঄', '঩', '঱', '৞', '਄', '਩', '਱', '਴', '਷', '਽', '੝', '઄', '઎', '઒', '઩', '઱', '઴', '૆', '૊', '଀', '଄', '଩', '଱', '଴', '୞', '஄', '஑', '஛', '஝', '௉', 'ఄ', '఍', '఑', '఩', '౅', '౉', '౗', '಄', '಍', '಑', '಩', '಴', '೅', '೉', '೟', '೰', 'ഄ', '഍', '഑', '൅', '൉', '඄', '඲', '඼', '෕', '෗', '຃', 'ຉ', 'ຘ', 'ຠ', '຤', '຦', 'ຬ', '຺', '໅', '໇', '཈', '྘', '྽', '࿍', '჆', '቉', '቗', '቙', '኉', '኱', '኿', '዁', '዗', '጑', 'ᜍ', '᝭', '᝱', '᤟', '᩟', '᷺', '὘', '὚', '὜', '὞', '᾵', '῅', '῜', '῵', '῿', '₏', '⯉', 'Ⱟ', 'ⱟ', '⴦', '⶧', '⶯', '⶷', '⶿', '⷇', '⷏', '⷗', '⷟', '⺚', '぀', '㆏', '㈟', 'ꞯ', '꧎', '꧿', '꬧', '꬯', '﬷', '﬽', '﬿', '﭂', '﭅', '﹓', '﹧', '﹵', '￧', '�'}),

I will try to add it.
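
For readers wondering what \p{C} has to cover, here is a minimal sketch that asks java.util.regex which BMP code points the JDK places in that category and collapses them into ranges. The class name OtherCategoryRanges and its methods are hypothetical and are not part of RgxGen; the exact ranges depend on the Unicode version shipped with the JDK in use, so they may differ from the list above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of RgxGen: lists the BMP ranges that the
// running JDK assigns to the Unicode "Other" category (\p{C}).
class OtherCategoryRanges {
    private static final Pattern OTHER = Pattern.compile("\\p{C}");

    // True if the JDK's regex engine places this code point in \p{C}.
    static boolean isOther(int codePoint) {
        return OTHER.matcher(new String(Character.toChars(codePoint))).matches();
    }

    // Collapses consecutive matching BMP code points into inclusive [start, end] pairs.
    static List<int[]> bmpRanges() {
        List<int[]> ranges = new ArrayList<>();
        int start = -1;
        for (int cp = 0; cp <= 0xFFFF; cp++) {
            if (isOther(cp)) {
                if (start < 0) {
                    start = cp;
                }
            } else if (start >= 0) {
                ranges.add(new int[]{start, cp - 1});
                start = -1;
            }
        }
        if (start >= 0) {
            ranges.add(new int[]{start, 0xFFFF});
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Print only the first few ranges; the full list is long.
        List<int[]> ranges = bmpRanges();
        for (int i = 0; i < Math.min(5, ranges.size()); i++) {
            System.out.printf("U+%04X..U+%04X%n", ranges.get(i)[0], ranges.get(i)[1]);
        }
        System.out.println("total ranges: " + ranges.size());
    }
}
```

A generator that wants to honor [^\p{C}] could, under this approach, pick characters only from the gaps between these ranges.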

@ludovicianul
Author

ludovicianul commented Mar 11, 2024

With 1.4 I get something like the following, which passes the matches test:

System.out.println("%u@O2L1K\"jn:;3e&(e<QN5q4HeN'A:(\\c^F.03-vyvKW5zW/u~h,(aSyGHPKIA);XNY#v@[hEh(Og=NEVq;0+VN#sv\\wOVE<t[RNo(Tg'nFk)\\GZ]o%|~d3KNi?J\"l2bGO#7E*BQo 8qS25\\A<8yLO<JmmP)ABDr^#kZ[$8B^R=hFMBec,;IdXF!Q[vVe|O_5]1#-$d(N#+2Vh'HbB[#M8]T6,P6VJ3sT|NGOe=5k'OO*Asrds%*@XAy".matches("^[^\\p{C}\\p{Z}\\s]*[^\\p{C}]+[^\\p{C}\\p{Z}]*$"));
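
One possible reading of why this check passes, sketched with java.util.regex only (PatternCheck and accepts are hypothetical names, not RgxGen API): the pattern mostly excludes control-like characters and requires at least one character, so almost any printable ASCII string satisfies it. A passing matches() call therefore shows the value is valid, but not necessarily that the generator interpreted \p{C}.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of RgxGen: checks candidates against the
// pattern from this issue using the JDK regex engine.
class PatternCheck {
    static final Pattern PATTERN =
            Pattern.compile("^[^\\p{C}\\p{Z}\\s]*[^\\p{C}]+[^\\p{C}\\p{Z}]*$");

    // True if the whole candidate string matches the pattern.
    static boolean accepts(String candidate) {
        return PATTERN.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        System.out.println(accepts("abc 123"));   // true: printable ASCII only
        System.out.println(accepts("abc\u0007")); // false: BEL is a control character
        System.out.println(accepts(""));          // false: at least one character is required
    }
}
```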
