
Debug option #116

Closed
paulochf opened this issue Feb 14, 2022 · 3 comments

@paulochf

Is your feature request related to a problem? Please describe.
I tried to use the library to parse HQL DDLs, but for some of them I got the error below:

---------------------------------------------------------------------------
DDLParserError                            Traceback (most recent call last)
/var/folders/gv/rh_2w83x16s3t1bkll6gd9ym0000gq/T/ipykernel_7778/1646707155.py in <module>
      2     print("="*40, tbl_name)
      3     if tbl_file["parse"] != "PARSE" or tbl_name not in {"silver.hvc_general", "silver.hvc_relog", "silver.hvc_telemetry"}: continue
----> 4     contents = parse_from_file(tbl_file["path"], output_mode="hql")
      5 
      6 

~/.pyenv/versions/3.7.10/envs/lab/lib/python3.7/site-packages/simple_ddl_parser/ddl_parser.py in parse_from_file(file_path, **kwargs)
    203     """get useful data from ddl"""
    204     with open(file_path, "r") as df:
--> 205         return DDLParser(df.read()).run(file_path=file_path, **kwargs)

~/.pyenv/versions/3.7.10/envs/lab/lib/python3.7/site-packages/simple_ddl_parser/parser.py in run(self, dump, dump_path, file_path, output_mode, group_by_type, json_dump)
    270             Dict == one entity from ddl - one table or sequence or type.
    271         """
--> 272         self.tables = self.parse_data()
    273         self.tables = result_format(self.tables, output_mode, group_by_type)
    274         if dump:

~/.pyenv/versions/3.7.10/envs/lab/lib/python3.7/site-packages/simple_ddl_parser/parser.py in parse_data(self)
    184 
    185         for num, self.line in enumerate(lines):
--> 186             self.process_line(num != len(lines) - 1)
    187         if self.comments:
    188             self.tables.append({"comments": self.comments})

~/.pyenv/versions/3.7.10/envs/lab/lib/python3.7/site-packages/simple_ddl_parser/parser.py in process_line(self, last_line)
    216         self.set_default_flags_in_lexer()
    217 
--> 218         self.process_statement()
    219 
    220     def process_statement(self):

~/.pyenv/versions/3.7.10/envs/lab/lib/python3.7/site-packages/simple_ddl_parser/parser.py in process_statement(self)
    220     def process_statement(self):
    221         if not self.set_line and self.statement:
--> 222             self.parse_statement()
    223         if self.new_statement:
    224             self.statement = self.line

~/.pyenv/versions/3.7.10/envs/lab/lib/python3.7/site-packages/simple_ddl_parser/parser.py in parse_statement(self)
    227 
    228     def parse_statement(self) -> None:
--> 229         _parse_result = yacc.parse(self.statement)
    230         if _parse_result:
    231             self.tables.append(_parse_result)

~/.pyenv/versions/3.7.10/envs/lab/lib/python3.7/site-packages/ply/yacc.py in parse(self, input, lexer, debug, tracking, tokenfunc)
    331             return self.parseopt(input, lexer, debug, tracking, tokenfunc)
    332         else:
--> 333             return self.parseopt_notrack(input, lexer, debug, tracking, tokenfunc)
    334 
    335 

~/.pyenv/versions/3.7.10/envs/lab/lib/python3.7/site-packages/ply/yacc.py in parseopt_notrack(self, input, lexer, debug, tracking, tokenfunc)
   1061                 if not lookahead:
   1062                     if not lookaheadstack:
-> 1063                         lookahead = get_token()     # Get the next token
   1064                     else:
   1065                         lookahead = lookaheadstack.pop()

~/.pyenv/versions/3.7.10/envs/lab/lib/python3.7/site-packages/ply/lex.py in token(self)
    384                     tok.lexpos = lexpos
    385                     self.lexpos = lexpos
--> 386                     newtok = self.lexerrorf(tok)
    387                     if lexpos == self.lexpos:
    388                         # Error method didn't change text position at all. This is an error.

~/.pyenv/versions/3.7.10/envs/lab/lib/python3.7/site-packages/simple_ddl_parser/ddl_parser.py in t_error(self, t)
    193 
    194     def t_error(self, t):
--> 195         raise DDLParserError("Unknown symbol %r" % (t.value[0],))
    196 
    197     def p_error(self, p):

DDLParserError: Unknown symbol "'"

It was hard to find the problem in a big DDL. After finding a shorter example, I was able to isolate the cause and found that the following comment was the issue:

column_name STRING COMMENT 'yada yada yada don’t bla bla bla',   -- the problem was the typographic (curly) single quote (U+2019, &rsquo; in HTML) in "don’t".
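One way to narrow this down in a large DDL is to scan the file for non-ASCII characters before parsing. The helper below is a hypothetical sketch (`find_non_ascii` is not part of simple_ddl_parser) and assumes the offending symbol is non-ASCII, as it was here; it will not catch unsupported plain-ASCII syntax.

```python
import unicodedata
from typing import List, Tuple


def find_non_ascii(text: str) -> List[Tuple[int, int, str, str]]:
    """Return (line, column, char, Unicode name) for every non-ASCII character.

    Lines and columns are 1-based, as most editors display them.
    """
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:  # anything outside 7-bit ASCII
                hits.append((lineno, col, ch, unicodedata.name(ch, "UNKNOWN")))
    return hits


ddl = "CREATE TABLE t (\n  column_name STRING COMMENT 'yada don\u2019t bla'\n);"
for lineno, col, ch, name in find_non_ascii(ddl):
    print(f"line {lineno}, col {col}: {ch!r} ({name})")
```

For the example string this reports the curly quote on line 2 as RIGHT SINGLE QUOTATION MARK; with a real file you would pass the result of `open(path, encoding="utf-8").read()`.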

Hence, a debugging parameter that shows which error the lexer/parser (yacc) ran into might be helpful. For example, I saw that ply lets you do that.

Describe the solution you'd like
parse_from_file(tbl_file["path"], output_mode="hql", debug=True)
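Until such a flag exists, a rough workaround for "which statement broke?" is to parse each statement on its own and collect the failures. This is a sketch under stated assumptions: `find_failing_statements` and `fake_parse` are hypothetical names, `fake_parse` is a stand-in so the example runs without the library, and splitting on `;` is naive (it assumes no semicolons inside string literals or comments). With simple_ddl_parser you could instead pass something like `lambda s: DDLParser(s).run(output_mode="hql")`.

```python
from typing import Callable, List, Tuple


def find_failing_statements(
    ddl: str, parse_fn: Callable[[str], object]
) -> List[Tuple[int, str, str]]:
    """Parse each ';'-separated statement on its own and collect the failures.

    Returns (statement number, statement text, error message) tuples.
    Assumption: no ';' appears inside string literals or comments.
    """
    failures = []
    statements = [s.strip() for s in ddl.split(";") if s.strip()]
    for number, statement in enumerate(statements, start=1):
        try:
            parse_fn(statement + ";")  # re-append the terminator removed by split()
        except Exception as exc:  # keep going so every bad statement is reported
            failures.append((number, statement, str(exc)))
    return failures


def fake_parse(statement: str) -> dict:
    # Hypothetical stand-in for a real parser call:
    # it rejects any non-ASCII character, like the error in this issue.
    for ch in statement:
        if ord(ch) > 127:
            raise ValueError(f"Unknown symbol {ch!r}")
    return {"statement": statement}


ddl = "CREATE TABLE a (id INT);\nCREATE TABLE b (c STRING COMMENT 'don\u2019t');"
for number, statement, error in find_failing_statements(ddl, fake_parse):
    print(f"statement {number} failed: {error}\n  {statement}")
```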

Describe alternatives you've considered
Show the character that caused the problem.
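As an illustration of that alternative: in a ply lexer's `t_error(self, t)`, the token carries `t.lexpos` and the full input is available as `t.lexer.lexdata`, so a helper could report the line, column and offending character with a caret. `describe_lex_error` below is a hypothetical sketch, not how simple_ddl_parser actually formats its errors; the caret alignment assumes the line contains no tabs.

```python
def describe_lex_error(text: str, lexpos: int) -> str:
    """Describe the character at text[lexpos] with its line, column and a caret.

    In a ply t_error handler this could be called as
    describe_lex_error(t.lexer.lexdata, t.lexpos).
    """
    line_start = text.rfind("\n", 0, lexpos) + 1  # 0 if on the first line
    line_end = text.find("\n", lexpos)
    if line_end == -1:  # error is on the last line
        line_end = len(text)
    lineno = text.count("\n", 0, lexpos) + 1
    column = lexpos - line_start + 1
    caret = " " * (column - 1) + "^"
    return (
        f"Unknown symbol {text[lexpos]!r} at line {lineno}, column {column}\n"
        f"{text[line_start:line_end]}\n{caret}"
    )


ddl = "CREATE TABLE t (\n  c STRING COMMENT 'don\u2019t'\n);"
print(describe_lex_error(ddl, ddl.index("\u2019")))
```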


@xnuinside
Owner

@paulochf hi, thanks for reporting the issue, I will take a look later today.

@paulochf
Author

No rush, @xnuinside. As I showed, I figured out the problem; I only wanted to leave the suggestion.

@xnuinside
Owner

@paulochf released in version 0.26.0, where the debug statement is updated. Thank you for your suggestion!
