Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect JSON wrapper break JQ #120

Closed
andy65535 opened this issue Mar 11, 2022 · 3 comments
Closed

Incorrect JSON wrapper break JQ #120

andy65535 opened this issue Mar 11, 2022 · 3 comments

Comments

@andy65535
Copy link

andy65535 commented Mar 11, 2022

Describe the bug

Commonly used Linux command line JQ tool doesn't work with JSON which prepared by sdp, because jq expect that wrapper is double quote.
Example of real GCP BigQuery DDL:

DDL:

$ cat sample_ddl.sql

CREATE  TABLE `dataset.table`
(
    Updated_Date           DATE   NOT NULL OPTIONS (description =Description1),
    Lead_Store_Id          INT64  NOT NULL OPTIONS (description =Description2),
)
PARTITION BY LeadCreatedDate
CLUSTER BY Lead_Store_Id

OUTPUT:

$ sdp -o bigquery --no-dump sample_ddl.sql
Start parsing file sample_ddl.sql

File with result was saved to >> schemas folder
[{'alter': {},
  'checks': [],
  'cluster_by': ['Lead_Store_Id'],
  'columns': [{'check': None,
               'default': None,
               'name': 'Updated_Date',
               'nullable': False,
               'options': [{'description': '=Description1'}],
               'references': None,
               'size': None,
               'type': 'DATE',
               'unique': False},
              {'check': None,
               'default': None,
               'name': 'Lead_Store_Id',
               'nullable': False,
               'options': [{'description': '=Description2'}],
               'references': None,
               'size': None,
               'type': 'INT64',
               'unique': False}],
  'constraints': {'checks': None, 'references': None, 'uniques': None},
  'index': [],
  'partition_by': {'columns': ['LeadCreatedDate'], 'type': None},
  'partitioned_by': [],
  'primary_key': [],
  'schema': '`dataset',
  'table_name': 'table`',
  'tablespace': None}]

JQ run:

$ sdp -o bigquery --no-dump sample_ddl.sql |  egrep -v 'Start parsing file |File with result was saved to ' | jq .
parse error: Invalid numeric literal at line 2, column 10

Followed JSON is OK:

[{"alter": {},
  "checks": [],
  "cluster_by": ["Lead_Store_Id"],
  "columns": [{"check": "None",
               "default": "None",
               "name": "Updated_Date",
               "nullable": "False",
               "options": [{"description": "=Description1"}],
               "references": "None",
               "size": "None",
               "type": "DATE",
               "unique": "False"},
              {"check": "None",
               "default": "None",
               "name": "Lead_Store_Id",
               "nullable": "False",
               "options": [{"description": "=Description2"}],
               "references": "None",
               "size": "None",
               "type": "INT64",
               "unique": "False"}],
  "dataset": "`dataset",
  "index": [],
  "partition_by": {"columns": ["LeadCreatedDate"], "type": "None"},
  "partitioned_by": [],
  "primary_key": [],
  "table_name": "table`",
  "tablespace": "None"}]

Here is few issues:

  1. JQ expect that all values wrapped (None, False, True - as well)

  2. JQ expect that wrapper will be double quote, not single quote.
    JSON validator https://jsonformatter.curiousconcept.com/ said that points 1 and 2 are RFC violation.

  3. Incorrect parsing if table name wrapped by back quote: "CREATE TABLE `dataset.table`". As you can see from output, parser keeped back quote at the beginning (but sometimes at the ending) of the table name:

 'schema': '`dataset',
@xnuinside
Copy link
Owner

@andy65535 Hi! This is because in stdout output is not JSON, but python dict (and because of that you see None, in JSON it must be null). If you want JSON - remove flag --no-dump and it will dump json to the file. I didn't thought about use case that someone need json in stdout :) I will add in next release flag --json to get output in stdout as json, not Python object.

@andy65535
Copy link
Author

andy65535 commented Mar 26, 2022

@xnuinside thank you for replay!
I expected that stdout will be the same as output without --no-dump or with -t parameter. I use stdout as far I need specify another filename instead of default behavior (or -t what create folder for outputs).
Anyway, I removed --no-dump and JSON now formatted as expected.
But point 3 about back quotes from my initial report still in place:

CREATE TABLE `dataset.table`

parser leave back quotes and final JSON become:

...
    "table_name": "table`",
...                     ^
    "dataset": "`dataset"
...             ^

@xnuinside
Copy link
Owner

@andy65535 hi, I'm ready to kill myself for the my previous answer to you! first of all ))) you can get json output with python code )) (I really need spend time on documentation improvement) - you can do it like this (sample added in README): https://github.com/xnuinside/simple-ddl-parser#get-output-in-json

second - to get output without artifacts like ` at the end you can use flag "normalize_names=True" like here:

result = DDLParser(ddl, silent=False, normalize_names=True).run(group_by_type=True)

so if you will run parser like:

result = DDLParser(ddl, normalize_names=True).run(json_dump=True, group_by_type=True)

You will get required output.

by default (without flag normalize_names=True) parser leave as is all symbols to get possible reconstruct DDL from parser output

Again, one more time SORRY for my wrong answer )

With CLI is not possible to use normalize_names flag - I will work in next release to update the CLI to keep all arguments & settings for parser

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants