How to prevent echo '{ "a": "0123" }' | mlr --json cat --> {"a": 0123 } ? #178

Closed
pozix604 opened this issue Jul 18, 2018 · 14 comments
Labels
go-port Things which will be addressed in the Go port AKA Miller 6

Comments

@pozix604
pozix604 commented Jul 18, 2018

Miller does not preserve strings that consist only of digits (in my case they are arbitrary IDs, not numbers). For example,

echo '{ "a": "0123" }' | mlr --json cat

produces

{ "a": 0123 }

Then, if you feed that output back into Miller, it results in an error.

$ echo '{ "a": "0123" }' | mlr --json cat
{ "a": 0123 }

$ echo '{ "a": "0123" }' | mlr --json cat | mlr --json cat
mlr: Unable to parse JSON data: Line 1 column 0: Unexpected `0` before `1`

How do I suppress this conversion of a string into a number?
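
The parse error in the second command follows from the JSON number grammar, which does not allow a leading zero on a multi-digit number, so 0123 is only valid when it is quoted. A minimal sketch using Python's standard json module (not Miller's own parser; the helper name is_valid_json is made up for illustration):

```python
import json

def is_valid_json(text: str) -> bool:
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

# Quoted: "0123" is an ordinary JSON string.
print(is_valid_json('{ "a": "0123" }'))
# Unquoted: 0123 has a leading zero, which the JSON number grammar forbids.
print(is_valid_json('{ "a": 0123 }'))
```

Any strict JSON reader should reject the unquoted form in the same way, which is why the output cannot be piped back in.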

@johnkerl
Owner

Looks like #151 but it's not:

$ cat j
{"a":"0123"}
$ cat d
a=0123
$ mlr --json cat j
{ "a": 0123 }
$ mlr --dkvp cat d
a=0123

Looks like a simple oversight; I'll check it out.

Thanks for letting me know!!!

@johnkerl
Owner

$ cat j
{"a":"0123"}
{"a":"1123"}
{"a":"x123"}

$ cat d
a=0123
a=1123
a=x123

$ mlr --dkvp put '$t=typeof($a)' d
a=0123,t=int
a=1123,t=int
a=x123,t=string

$ mlr --json put '$t=typeof($a)' j
{ "a": 0123, "t": "int" }
{ "a": 1123, "t": "int" }
{ "a": "x123", "t": "string" }

$ mlr --d2j put '$t=typeof($a)' d
{ "a": 0123, "t": "int" }
{ "a": 1123, "t": "int" }
{ "a": "x123", "t": "string" }

$ mlr --j2d put '$t=typeof($a)' j
a=0123,t=int
a=1123,t=int
a=x123,t=string

$ mlr --dkvp put -S '$t=typeof($a)' d
a=0123,t=string
a=1123,t=string
a=x123,t=string

$ mlr --json put -S '$t=typeof($a)' j
{ "a": 0123, "t": "string" }
{ "a": 1123, "t": "string" }
{ "a": "x123", "t": "string" }

$ mlr --d2j put -S '$t=typeof($a)' d
{ "a": 0123, "t": "string" }
{ "a": 1123, "t": "string" }
{ "a": "x123", "t": "string" }

$ mlr --j2d put -S '$t=typeof($a)' j
a=0123,t=string
a=1123,t=string
a=x123,t=string

@johnkerl
Owner

OK, sorry for the delay.

This is not a pretty story.

  • All key-value pairs within record structures are string-to-string. This was done particularly for performance: when looping over a 30-column table, often we want to only do ops on a few columns. So the record-reader simply identifies starts and ends of strings and doesn't type-convert (or even touch) the values unless necessary.
  • Within the put/filter DSL there is typing, but results are written back to the string-valued record structures, i.e. any typing is discarded.
  • Output-quoting is an issue for CSV/TSV and JSON. For CSV there are arguments like --quote-all. For JSON I wrote code which quotes a value unless it looks numeric.
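
The quote-unless-it-looks-numeric rule in the last bullet can be sketched roughly as follows. This is an illustration only: the regular expression and the names looks_numeric and format_json_record are assumptions, not Miller's actual code, and Miller 5's real check may differ.

```python
import json
import re

# Hypothetical pattern: optional sign, decimal digits (leading zeros allowed),
# optional fraction, optional exponent. Miller 5's real check may differ.
NUMERIC_RE = re.compile(r'[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?')

def looks_numeric(value: str) -> bool:
    """Return True if the whole string matches the numeric pattern."""
    return NUMERIC_RE.fullmatch(value) is not None

def format_json_record(record: dict) -> str:
    """Render a string-to-string record as single-line JSON-like text."""
    parts = []
    for key, value in record.items():
        # Numeric-looking values are emitted verbatim, everything else quoted.
        rendered = value if looks_numeric(value) else json.dumps(value)
        parts.append(f'{json.dumps(key)}: {rendered}')
    return '{ ' + ', '.join(parts) + ' }'

for raw in ('0123', '1123', 'x123'):
    print(format_json_record({'a': raw}))
```

Because the record holds only strings, the writer cannot tell whether 0123 was quoted on input, so a rule of this kind emits it bare.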

Options:

  • Tracking the double-quotedness of a particular field from input-parsing, through data processing, through to output-formatting would be a rather significant effort.
  • Type-detection for all field values is something I am reluctant to do, for the sake of performance.
  • Perhaps a --quote-all or --quote-named-fields {X,Y,Z} argument would be a workaround ... :^/
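
To make the last option concrete, here is a sketch of how a set of always-quoted field names could override a looks-numeric rule at output time. The quote_fields parameter, the function name, and the numeric pattern are all hypothetical; the thread does not show such an option existing in Miller.

```python
import json
import re

# Hypothetical numeric pattern; leading zeros are accepted on purpose.
NUMERIC_RE = re.compile(r'[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?')

def format_json_record(record, quote_fields=frozenset()):
    """Render a string-to-string record as single-line JSON-like text.

    Values of fields named in quote_fields are always quoted; other values
    are quoted unless they look numeric. Both rules are assumptions.
    """
    parts = []
    for key, value in record.items():
        force_quote = key in quote_fields
        if force_quote or NUMERIC_RE.fullmatch(value) is None:
            rendered = json.dumps(value)
        else:
            rendered = value
        parts.append(f'{json.dumps(key)}: {rendered}')
    return '{ ' + ', '.join(parts) + ' }'

record = {'id': '0123', 'count': '42'}
print(format_json_record(record))
print(format_json_record(record, quote_fields={'id'}))
```

The appeal of this design is that it needs no per-value type tracking: the decision is made once per field name at output time.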

@johnkerl
Owner

Appears to be related to #211

@dbabits
dbabits commented Mar 16, 2020

I'm facing a related issue where some of my integers must be quoted in the output JSON, and others must not be (a downstream system requirement).
--quote-named-fields {X,Y,Z} sounds like the ideal solution.
Any plans on implementing it?

@johnkerl
Owner

Hi @dbabits -- the best I can see in the near term is perhaps a --quote-all or --quote-named-fields {X,Y,Z} argument as a workaround ... :^/

@dbabits
dbabits commented Mar 16, 2020

@johnkerl thanks.
Looks like --quote-named-fields is not implemented yet?

> c/mlr --quote-named-fields {X,Y,Z}
mlr: option "--quote-named-fields" not recognized.

c/mlr --version
Miller 5.6.2

@johnkerl
Owner

Correct. I can do that though.

@dbabits
dbabits commented Mar 17, 2020

That would be much appreciated, thanks.
Just to clarify, my particular case is JSON output, and field names look like this: tags:X, tags:Y, resulting in a nested JSON structure.

@Anth0nyME
Anth0nyME commented Jul 2, 2020

I have a similar problem.

Input (CSV):

id,000000E020206941
name,v3700v2
location,local
total_free_space,1045480939008

Output (JSON):

[
{
"id": 000000E020206941,
"name": "v3700v2",
"location": "local",
"total_free_space": 1045480939008
}
]

The id string is wrongly treated as a number.
Does anyone have a working solution?
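
One plausible reading (an assumption, not confirmed anywhere in this thread) is that 000000E020206941 passes a numeric check because it has the shape of a float in scientific notation: digits, an E, then more digits. The sketch below uses Python's float() only to show that general-purpose float parsers accept this string; it says nothing about Miller's internals.

```python
value = '000000E020206941'

# Split at the exponent marker to show the two parts a float parser sees.
mantissa, exponent = value.upper().split('E')
print(mantissa, exponent)

# Python's float() accepts the string; a zero mantissa gives 0.0
# regardless of the exponent.
print(float(value))
```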

@johnkerl
Owner

This will be addressed by the Go port.

johnkerl added the go-port label ("Things which will be addressed in the Go port AKA Miller 6") on Sep 15, 2020

@johnkerl
Owner

This is working in the Go port:

$ echo '{ "a": "0123" }' | mlr --json cat
{
  "a": "0123"
}
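
The behavior shown above, where a quoted input value stays quoted on output, is what a type-preserving JSON round trip looks like. A minimal sketch with Python's standard json module (an illustration of the expected behavior, not of how the Go port is implemented; round_trip is a made-up helper name):

```python
import json

def round_trip(text: str) -> str:
    """Parse JSON text and re-serialize it with two-space indentation."""
    return json.dumps(json.loads(text), indent=2)

output = round_trip('{ "a": "0123" }')
print(output)

# Feeding the output back in works, because the value is still a quoted string.
assert round_trip(output) == output
```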

@aborruso
Contributor

This is working in the Go port:

I have tested it and it's really great.

@johnkerl
Owner

Fixed in Miller 6.

5 participants