BulkExport.to_str() writes when delimiter is present in table #45

theianrobertson · 2017-11-08T20:14:43Z

If I have a table with a pipe (|) in a character field and try to export and write to a string, there's no checking for whether the delimiter exists in any of the fields:

import giraffez

sql = """SELECT * FROM 
(
SELECT
CAST('first|field' AS VARCHAR(20)) AS first_char,
CAST('second field' AS VARCHAR(20)) AS second_char
) AS tbl
"""

with giraffez.BulkExport(sql) as export:
    r = list(export.to_str(delimiter='|'))

print(r)

['first|field|second field']

I could bring it out with a different delimiter, or roll my own to_str that takes in a dict from to_dict/to_list, but wondering if there should be functionality pushed up into to_str to error out, warn, or remove the field if the delimiter exists in a column?

The text was updated successfully, but these errors were encountered:

ChrisRx · 2018-02-05T16:34:04Z

Checking the delimiter for every byte of data would introduce a very significant performance change to cover this case. While I agree that it would be nice to find that the source data has a character that is being used for a delimiter, but I think for the most part this is why the to_list and to_dict exists so that in the case that this behavior is desired they can be used to produce those results with a higher level of safety (i.e. use to_list and pass to Python's csv module). Since to_str is building a buffer in C and returning that as text it would be a fairly large change to try and introduce the kind of logic that may address some of these concerns like a quote character. This was not originally in scope for this because to_str is suppose to be fast and straight-forward, and I also wish to discourage the use of character delimited formats altogether because they are not portable or safe to use in far too many cases, vs. something like JSON which fully supports unicode, is self-describing of all fields, and is portable regardless of OS/CPU Architecture.

I think it is possible to better abstract this so that it is either more apparent how to solve issues like in your example, or to just provide a better interface that would solve for this automatically. For example, to_str is still available but under the hood it just calls to_list and uses something like csv rather than providing the other option that builds the string in C. Thoughts?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BulkExport.to_str() writes when delimiter is present in table #45

BulkExport.to_str() writes when delimiter is present in table #45

theianrobertson commented Nov 8, 2017

ChrisRx commented Feb 5, 2018

BulkExport.to_str() writes when delimiter is present in table #45

BulkExport.to_str() writes when delimiter is present in table #45

Comments

theianrobertson commented Nov 8, 2017

ChrisRx commented Feb 5, 2018