CRLF support for file formats other than CSV #19

Closed
Komosa opened this issue Aug 20, 2015 · 9 comments

Comments

@Komosa

Komosa commented Aug 20, 2015

Currently, parsing files with CRLF line endings is not supported. (I tested with the simplest possible DKVP file, cat'ed to mlr put.)
I know that you are now working on CSV-RFC4180 support (which explicitly says that CRLF is valid, and in fact required :( ).

For now, I can suggest the following:
0. at least log a warning if CRLF is encountered
1. silently drop CRLFs

Later of course proper support for line endings should be added :)
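
The two interim options above could look roughly like this. It is a minimal sketch in Python, not Miller's actual code; `read_records` and its `warn` parameter are hypothetical names made up for illustration.

```python
import io
import sys


def read_records(stream, warn=True):
    """Yield lines from a text stream with LF or CRLF endings removed.

    Hypothetical helper, not Miller code: it drops one trailing CR from
    each line and, if warn is true, reports the first CRLF on stderr.
    """
    warned = False
    for line in stream:
        line = line.rstrip("\n")
        if line.endswith("\r"):
            if warn and not warned:
                print("warning: CRLF line ending encountered", file=sys.stderr)
                warned = True
            line = line[:-1]  # drop only the CR that preceded the LF
        yield line


# io.StringIO does not translate newlines by default, so the CR is visible.
sample = io.StringIO("a=1,b=2\r\nc=3,d=4\n")
records = list(read_records(sample))
```

When reading a real file this way, it would need to be opened with `newline=""`, since Python's default text mode translates CRLF to LF before the function ever sees it.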

BTW. really great tool!

@johnkerl
Owner

thanks! yes, rfc-csv currently #1. minimally, adding an option to do CRLF rather than LF is feasible. as an aside, depending on how the technical details work out i may replace my current pointer-walking file parsers with a lemon parser (currently i only use a lemon parser for DSLs, not file data) in which case we could get double-quoting in other formats besides CSV.

@ve3ied

ve3ied commented Aug 21, 2015

Or, stick tr out front for now.

tr -d '\015' | mlr ...
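
For anyone unsure what that filter does to the data, here is a small Python illustration (an assumption-free restatement of `tr -d '\015'`, not anything from Miller): octal 015 is the CR byte, and every occurrence is deleted, including a CR embedded inside a field value.

```python
data = b"a=1,b=2\r\nc=3,d=4\r\n"

# Same effect as tr -d '\015': delete every CR byte (octal 015 == 0x0D).
stripped = data.translate(None, b"\r")

# A CR inside a field value is removed too, not just the one before LF.
embedded = b"x=a\rb\r\n".translate(None, b"\r")
```

So the workaround is fine for ordinary CRLF files, but it is not a lossless conversion if the data itself contains carriage returns.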

@johnkerl
Owner

Marking as wishlist for now, just relative to other things more pressing -- namely, RFC-CSV and packaging. In the medium term, though, this will be a should-do rather than a wishlist item.

@johnkerl
Owner

Dup'ing to #50

@johnkerl
Owner

Change of plans ... multi-character-separator options for CSV are being implemented separately in #50. Meanwhile doing that for other formats -- and doing it performantly (RFC-CSV I/O is already slow, and I don't want to make the other formats also slow) -- is a separate, still-open coding task.

@johnkerl johnkerl reopened this Sep 13, 2015
@johnkerl
Owner

So, I'm re-opening this issue.

@johnkerl changed the title from "CRLF support" to "CRLF support for file formats other than CSV" Sep 13, 2015
@johnkerl added the "active" label and removed the "on deck" label Sep 16, 2015
@johnkerl
Owner

Support now exists for DKVP, e.g. mlr --rs crlf cut -f a,x,y myfile.dkvp, or mlr --ifs ': ' --ofs =, etc.
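
To make the idea of multi-character separators concrete, here is a rough Python sketch of DKVP parsing with configurable separators. It is an illustration only, not how Miller implements it: the `rs` and `ifs` parameters mirror the options named above, while `parse_dkvp`, the `ips` (key-value separator) parameter, and the positional-key fallback are assumptions of this sketch.

```python
def parse_dkvp(text, rs="\n", ifs=",", ips="="):
    """Split DKVP text into a list of dicts.

    Hypothetical illustration: rs (record separator), ifs (field
    separator) and ips (key-value separator) may each be more than one
    character long.
    """
    records = []
    for line in text.split(rs):
        if line == "":
            continue  # empty piece, e.g. after the final record separator
        record = {}
        for position, field in enumerate(line.split(ifs), start=1):
            key, sep, value = field.partition(ips)
            if sep == "":
                # No key-value separator found: this sketch keys the
                # field by its 1-based position instead.
                record[str(position)] = key
            else:
                record[key] = value
        records.append(record)
    return records


crlf_records = parse_dkvp("a=1,x=2,y=3\r\na=4,x=5,y=6\r\n", rs="\r\n")
spaced_records = parse_dkvp("a: 1; b: 2\n", ifs="; ", ips=": ")
```

Because the separators are plain strings, CRLF is just the two-character case of `rs`, and nothing in the splitting logic needs to treat it specially.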

Please let me know if you find a bug and I'll add it as a unit-test case (in addition to fixing the bug of course).

I'll leave this issue open until other formats also have multi-character support.

@johnkerl
Owner

Multi-char-separator CSV-lite done

@johnkerl johnkerl removed the active label Sep 22, 2015
@johnkerl
Owner
