Problem with commas: from to DKVP to CSV #266

Open
aborruso opened this issue Sep 6, 2019 · 8 comments
Comments

@aborruso
Contributor

aborruso commented Sep 6, 2019

Hi,
this is a very stupid and basic question, but I'm only just starting to use DKVP.

If I run echo "name=Andrea,WhatIlike=mlr,john" | mlr --ocsv cat I get

name,WhatIlike,3
Andrea,mlr,john

How can I get the following instead?

name,WhatIlike
Andrea,"mlr,john"

Moreover, if I start from CSV, convert it to DKVP, and then convert it back to CSV, the final CSV is not equal to the source one. If I start from

name,WhatIlike
Andrea,"mlr,john"

and run mlr --icsv cat input.txt | mlr --ocsv cat

I get

name,WhatIlike,3
Andrea,mlr,john

Thank you

@aborruso
Contributor Author

aborruso commented Sep 7, 2019

> Moreover if I start from CSV, then convert it to DKVP, then to CSV again, the last CSV is not equal to the source one.

One way to solve this is

mlr --icsv --ofs "\t" cat input.txt  | mlr --ocsv --ifs "\t" cat

It gives me

name,WhatIlike
Andrea,"mlr,john"
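
To illustrate why the tab workaround restores the round trip, here is a minimal Python sketch. It is not Miller's code: `csv_to_dkvp` and `dkvp_to_csv` are hypothetical helpers that imitate a quote-unaware DKVP writer and reader, and the rule that a piece without `=` takes its position as its key is an assumption based on the `3` header shown in this thread. The sketch only handles records that all share the same keys.

```python
import csv
import io


def csv_to_dkvp(csv_text, ofs=","):
    """Render CSV rows as key=value lines joined by ofs, with no quoting."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return "\n".join(
        ofs.join(f"{key}={value}" for key, value in row.items()) for row in reader
    )


def dkvp_to_csv(dkvp_text, ifs=","):
    """Split each line on ifs, then on the first '=', and write the result as CSV."""
    out = io.StringIO()
    writer = None
    for line in dkvp_text.splitlines():
        record = {}
        for position, piece in enumerate(line.split(ifs), start=1):
            key, sep, value = piece.partition("=")
            if not sep:  # no '=' in this piece: use its position as the key (assumption)
                key, value = str(position), piece
            record[key] = value
        if writer is None:
            writer = csv.DictWriter(out, fieldnames=list(record), lineterminator="\n")
            writer.writeheader()
        writer.writerow(record)
    return out.getvalue()


source = 'name,WhatIlike\nAndrea,"mlr,john"\n'

# Comma as the intermediate separator: the value is split into two fields.
print(dkvp_to_csv(csv_to_dkvp(source)))

# Tab as the intermediate separator: the output equals the source.
print(dkvp_to_csv(csv_to_dkvp(source, ofs="\t"), ifs="\t"))
```

The workaround holds only as long as no field value contains a tab; any separator that can appear in the data has the same problem.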

@johnkerl
Owner

johnkerl commented Sep 7, 2019

This is not a stupid question at all!!

DKVP doesn't have a way to handle delimiters within data.

One option is using a different delimiter, e.g.

$ echo "name=Andrea;WhatIlike=mlr,john" | mlr --ifs semicolon --ocsv cat 
name,WhatIlike
Andrea,"mlr,john"
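
As a rough illustration of where the `3` column comes from, the following Python sketch mimics a quote-unaware DKVP reader. It is not Miller's implementation: `parse_dkvp_naive` is a hypothetical name, and keying a piece without `=` by its position is an assumption inferred from the outputs shown in this thread.

```python
def parse_dkvp_naive(line, ifs=",", ips="="):
    """Split a DKVP line on ifs, then split each piece on the first ips.

    Assumption: a piece with no ips gets its 1-based position as its key.
    """
    record = {}
    for position, piece in enumerate(line.split(ifs), start=1):
        if ips in piece:
            key, value = piece.split(ips, 1)
        else:
            key, value = str(position), piece
        record[key] = value
    return record


# The comma inside the value is indistinguishable from a field separator.
print(parse_dkvp_naive("name=Andrea,WhatIlike=mlr,john"))
# {'name': 'Andrea', 'WhatIlike': 'mlr', '3': 'john'}

# With a semicolon as the field separator, the comma is ordinary data.
print(parse_dkvp_naive("name=Andrea;WhatIlike=mlr,john", ifs=";"))
# {'name': 'Andrea', 'WhatIlike': 'mlr,john'}
```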

The other thing is that DKVP should be extended to handle double-quoting, like RFC-compliant CSV already does.
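
One possible shape for such an extension is sketched below in Python. Everything here is an assumption made for illustration: the quoting syntax (`key="value, with comma"`, with `""` as an escaped quote), the function names, and the single-character separator. None of it describes what Miller does or will do.

```python
def split_quote_aware(line, ifs=","):
    """Split on ifs, except where ifs appears between double quotes."""
    pieces, current, in_quotes = [], [], False
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == '"':
            if in_quotes and line[i + 1 : i + 2] == '"':
                current.append('"')  # doubled quote inside quotes: literal quote
                i += 1
            else:
                in_quotes = not in_quotes  # opening or closing quote, not kept
        elif ch == ifs and not in_quotes:
            pieces.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    pieces.append("".join(current))
    return pieces


def parse_dkvp_quoted(line, ifs=",", ips="="):
    """Parse a DKVP line whose values may be double-quoted (hypothetical syntax)."""
    record = {}
    for position, piece in enumerate(split_quote_aware(line, ifs), start=1):
        key, sep, value = piece.partition(ips)
        if not sep:  # no ips: fall back to a positional key (assumption)
            key, value = str(position), piece
        record[key] = value
    return record


print(parse_dkvp_quoted('name=Andrea,WhatIlike="mlr,john"'))
# {'name': 'Andrea', 'WhatIlike': 'mlr,john'}
```

A matching writer would also have to add quotes around any value that contains the field separator, otherwise the round trip would still be lossy.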

@aborruso
Contributor Author

aborruso commented Sep 9, 2019

Hi @johnkerl, in a way it's as if, using LibreOffice Calc, you opened a CSV file, then saved it in Calc's native format (ods), and then saved it again as CSV. Without your making any changes, and without any alert, the final file is different from the original one.

I think that extending DKVP could be important. You don't know how much I regret not knowing how to help you with the coding.

Meanwhile, mlr could produce an alert every time there is a conversion from DKVP to CSV and vice versa.

Thank you
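
A narrower variant of this idea would be to warn only when a record cannot be written as DKVP unambiguously. The Python sketch below shows what such a check might look like. It is purely hypothetical: the function names and the warning text are invented for illustration, and Miller has no such option as far as this thread shows.

```python
import sys


def fields_unsafe_for_dkvp(record, ofs=",", ips="="):
    """Return the keys of fields that a quote-unaware DKVP reader would misparse."""
    unsafe = []
    for key, value in record.items():
        # A separator inside a key or value cannot be told apart from a real one.
        if ofs in key or ips in key or ofs in str(value):
            unsafe.append(key)
    return unsafe


def write_dkvp_with_warning(record, ofs=",", ips="=", err=sys.stderr):
    """Format a record as DKVP, warning on err when the result is ambiguous."""
    unsafe = fields_unsafe_for_dkvp(record, ofs, ips)
    if unsafe:
        print(f"warning: separator found inside field(s): {', '.join(unsafe)}", file=err)
    return ofs.join(f"{key}{ips}{value}" for key, value in record.items())


print(write_dkvp_with_warning({"name": "Andrea", "WhatIlike": "mlr,john"}))
```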

@johnkerl
Owner

This will be addressed by the Go port.

johnkerl added the go-port label (Things which will be addressed in the Go port AKA Miller 6) Sep 15, 2020

@dbro

dbro commented Oct 18, 2020

In case it's helpful, handling delimiters inside fields is exactly what csvquote does. Feel free to reuse that code in future versions of Miller.
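
The general idea, as a hedged Python sketch: temporarily replace delimiters that occur inside quoted fields with placeholder characters, run the quote-unaware tool, then restore them. This is not csvquote's code, and the specific placeholder characters chosen here (the ASCII unit and record separators) are assumptions for illustration.

```python
FIELD_PLACEHOLDER = "\x1f"   # assumption: ASCII unit separator stands in for a comma
RECORD_PLACEHOLDER = "\x1e"  # assumption: ASCII record separator stands in for a newline


def sanitize(text, delimiter=",", quote='"'):
    """Replace delimiters and newlines that occur inside quoted fields."""
    out, in_quotes = [], False
    for ch in text:
        if ch == quote:
            in_quotes = not in_quotes  # a doubled quote toggles twice, so state is kept
            out.append(ch)
        elif in_quotes and ch == delimiter:
            out.append(FIELD_PLACEHOLDER)
        elif in_quotes and ch == "\n":
            out.append(RECORD_PLACEHOLDER)
        else:
            out.append(ch)
    return "".join(out)


def restore(text, delimiter=","):
    """Put the original delimiters and newlines back."""
    return text.replace(FIELD_PLACEHOLDER, delimiter).replace(RECORD_PLACEHOLDER, "\n")


line = 'Andrea,"mlr,john"'
safe = sanitize(line)
print(len(safe.split(",")))   # 2: the inner comma is hidden from a naive split
print(restore(safe) == line)  # True
```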

@johnkerl
Owner

Thanks @dbro !! :D

@aborruso
Contributor Author

aborruso commented Dec 7, 2020

> This will be addressed by the Go port.

I tested this today. First I ran

echo '"ciao, come stai"' | mlrgo --icsv -N cat

and I got 1=ciao, come stai.

But when I run the inverse

echo '1=ciao, come stai' | mlrgo --ocsv cat

I get the wrong output

1,2
ciao," come stai"

I used the latest Go binary, but this is probably not implemented yet.

Thank you

@johnkerl
Owner

johnkerl commented Dec 7, 2020

@aborruso thanks for testing!

You're right, the DKVP reader in the Go port is still quote-unaware, just as the C one is: it doesn't have smart quote handling implemented yet.

johnkerl removed the go-port label (Things which will be addressed in the Go port AKA Miller 6) Nov 1, 2021