Reading CSV file returns incorrect line break content #9797
Here is a minimal test case to reproduce.

Test input CSV:
Current JSON output from Pandoc (3.1.13):

```json
{"pandoc-api-version":[1,23,1],"meta":{},"blocks":[{"t":"Table","c":[["",[],[]],[null,[]],[[{"t":"AlignDefault"},{"t":"ColWidthDefault"}]],[["",[],[]],[[["",[],[]],[[["",[],[]],{"t":"AlignDefault"},1,1,[{"t":"Plain","c":[{"t":"Str","c":"one_line_break:"},{"t":"SoftBreak"},{"t":"Str","c":"four_line_breaks:"},{"t":"SoftBreak"},{"t":"Str","c":"last_line"}]}]]]]]],[[["",[],[]],0,[],[]]],[["",[],[]],[]]]}]}
```

Expected JSON output (hand-edited to match the input):

```json
{"pandoc-api-version":[1,23,1],"meta":{},"blocks":[{"t":"Table","c":[["",[],[]],[null,[]],[[{"t":"AlignDefault"},{"t":"ColWidthDefault"}]],[["",[],[]],[[["",[],[]],[[["",[],[]],{"t":"AlignDefault"},1,1,[{"t":"Plain","c":[{"t":"Str","c":"one_line_break:"},{"t":"LineBreak"},{"t":"Str","c":"four_line_breaks:"},{"t":"LineBreak"},{"t":"LineBreak"},{"t":"LineBreak"},{"t":"LineBreak"},{"t":"Str","c":"last_line"}]}]]]]]],[[["",[],[]],0,[],[]]],[["",[],[]],[]]]}]}
```

This is the same as the previous data, but replaces each `SoftBreak` with one `LineBreak` per line break in the input.

If you save the above two JSON data structures as `current.json` and `expected.json` and convert them with

```
pandoc current.json -o current.odt
pandoc expected.json -o expected.odt
```

you'll see how the table cell content differs between the two files.
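To make the difference between the two ASTs concrete, here is a small Python sketch that uses only the standard `json` module. It walks each document and lists the cell's inline elements in order. `inline_sequence` is a hypothetical helper written for this comparison, not part of Pandoc or its tooling.

```python
import json

# The two ASTs from this report, copied verbatim.
CURRENT = '{"pandoc-api-version":[1,23,1],"meta":{},"blocks":[{"t":"Table","c":[["",[],[]],[null,[]],[[{"t":"AlignDefault"},{"t":"ColWidthDefault"}]],[["",[],[]],[[["",[],[]],[[["",[],[]],{"t":"AlignDefault"},1,1,[{"t":"Plain","c":[{"t":"Str","c":"one_line_break:"},{"t":"SoftBreak"},{"t":"Str","c":"four_line_breaks:"},{"t":"SoftBreak"},{"t":"Str","c":"last_line"}]}]]]]]],[[["",[],[]],0,[],[]]],[["",[],[]],[]]]}]}'
EXPECTED = '{"pandoc-api-version":[1,23,1],"meta":{},"blocks":[{"t":"Table","c":[["",[],[]],[null,[]],[[{"t":"AlignDefault"},{"t":"ColWidthDefault"}]],[["",[],[]],[[["",[],[]],[[["",[],[]],{"t":"AlignDefault"},1,1,[{"t":"Plain","c":[{"t":"Str","c":"one_line_break:"},{"t":"LineBreak"},{"t":"Str","c":"four_line_breaks:"},{"t":"LineBreak"},{"t":"LineBreak"},{"t":"LineBreak"},{"t":"LineBreak"},{"t":"Str","c":"last_line"}]}]]]]]],[[["",[],[]],0,[],[]]],[["",[],[]],[]]]}]}'

# Only these inline types matter for this comparison.
INLINE_TAGS = {"Str", "SoftBreak", "LineBreak"}


def inline_sequence(node):
    """Yield Str text and break tags, in document order, from a Pandoc JSON AST."""
    if isinstance(node, dict):
        tag = node.get("t")
        if tag in INLINE_TAGS:
            # Str carries its text in "c"; SoftBreak/LineBreak carry no content.
            yield node["c"] if tag == "Str" else tag
            return
        for value in node.values():
            yield from inline_sequence(value)
    elif isinstance(node, list):
        for item in node:
            yield from inline_sequence(item)


current = list(inline_sequence(json.loads(CURRENT)))
expected = list(inline_sequence(json.loads(EXPECTED)))
print(current)   # ['one_line_break:', 'SoftBreak', 'four_line_breaks:', 'SoftBreak', 'last_line']
print(expected)  # ['one_line_break:', 'LineBreak', 'four_line_breaks:', 'LineBreak', 'LineBreak', 'LineBreak', 'LineBreak', 'last_line']
```

The current AST has two breaks in total, while the expected one has five, one per line break in the cell.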
When multiple sequential newline characters appear inside a quoted CSV field, Pandoc coalesces them into a single `SoftBreak` in the resulting AST. According to RFC 4180, this seems to be incorrect behavior: the RFC's grammar treats CR and LF like any other character inside a quoted field. Shouldn't the CSV reader return individual `LineBreak`s for `\r\n\r\n\r\n`, rather than a single `SoftBreak`?

At minimum, I would think there should be no information loss during the read, which means encoding the original number of line breaks in some way. Currently, it's not possible to reconstruct the input data accurately from the AST.
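The following is a minimal Python sketch of the lossless mapping proposed here, not Pandoc's implementation. It assumes the input is a single quoted field with CRLF line endings; `CSV_TEXT` is a hypothetical input reconstructed from the expected AST. Python's standard `csv` module, like the RFC 4180 grammar, keeps CR and LF inside a quoted field. `lossless_inlines` emits one `LineBreak` per line break. `coalesced_inlines` only mimics the observed 3.1.13 output for comparison. All helper names are hypothetical.

```python
import csv
import io
import re

# Hypothetical input: one quoted field whose contents match the expected AST,
# assuming CRLF line endings.
CSV_TEXT = '"one_line_break:\r\nfour_line_breaks:\r\n\r\n\r\n\r\nlast_line"\r\n'

LINE_END = re.compile(r"\r\n|\r|\n")


def lossless_inlines(field):
    """Proposed mapping: one LineBreak per line break in the field.

    Words become Str separated by Space; runs of spaces within a line still
    collapse, which this sketch does not try to preserve.
    """
    inlines = []
    for i, line in enumerate(LINE_END.split(field)):
        if i > 0:
            inlines.append({"t": "LineBreak"})
        for j, word in enumerate(line.split()):
            if j > 0:
                inlines.append({"t": "Space"})
            inlines.append({"t": "Str", "c": word})
    return inlines


def coalesced_inlines(field):
    """Mimics the observed output (not Pandoc's code): every whitespace run
    that contains a line break collapses to a single SoftBreak."""
    inlines = []
    for token in re.split(r"(\s+)", field):
        if not token:
            continue
        if token.isspace():
            has_break = "\n" in token or "\r" in token
            inlines.append({"t": "SoftBreak" if has_break else "Space"})
        else:
            inlines.append({"t": "Str", "c": token})
    return inlines


def to_text(inlines, eol="\r\n"):
    """Rebuild the cell text; each break becomes `eol`, each Space one space."""
    pieces = {"Space": " ", "SoftBreak": eol, "LineBreak": eol}
    return "".join(n["c"] if n["t"] == "Str" else pieces[n["t"]] for n in inlines)


# newline="" lets the csv module keep CR/LF inside quoted fields untranslated.
field = next(csv.reader(io.StringIO(CSV_TEXT, newline="")))[0]

lossless = lossless_inlines(field)
coalesced = coalesced_inlines(field)
print([n["t"] for n in lossless])   # Str, LineBreak, Str, 4x LineBreak, Str
print([n["t"] for n in coalesced])  # Str, SoftBreak, Str, SoftBreak, Str
print(to_text(lossless) == field, to_text(coalesced) == field)  # True False
```

Both mappings read the same field. Only the one-`LineBreak`-per-break version lets `to_text` rebuild it exactly, given a known line-ending style. The coalesced form maps one CRLF and four CRLFs to the same `SoftBreak`, so the original count cannot be recovered.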
Tested with Pandoc 3.1.13