Leading spaces in header fields #78
If your struct is like this:

    #[derive(Deserialize)]
    struct Record {
        medallion: String,
        hack_license: String,
        ...
    }

Then you could do:

    #[derive(Deserialize)]
    struct Record {
        #[serde(rename = " medallion")]
        medallion: String,
        #[serde(rename = " hack_license")]
        hack_license: String,
        ...
    }

I'm not sure whether this is common enough to elevate this to a more convenient feature. Certainly, I've seen other CSV parsers have options like "trim all whitespace around field values," which could feasibly be added at some minor additional cost.
I'm doing a CSV to JSON conversion, hence I ended up with:

    struct Fare {
        medallion: String,
        #[serde(rename(deserialize = " hack_license", serialize = "hack_license"))]
        hack_license: String,
        ...
    }

Which isn't that nice.
Here is what I'd suggest we do. In the

    pub enum Trim {
        None,
        Headers,
        /// Hints that destructuring should not be exhaustive.
        ///
        /// This enum may grow additional variants, so this makes sure clients
        /// don't count on exhaustive matching. (Otherwise, adding a new variant
        /// could break existing code.)
        #[doc(hidden)]
        __Nonexhaustive,
    }

I think probably the easiest place to implement this is in the
I think it is OK to create a new record that corresponds to the previous record, but with its fields trimmed. This introduces an extra allocation, but since it's only for the header record, I think that's OK. The hardest part of this will probably be trimming a
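
The suggestion above (build a new header record whose fields are trimmed, accepting the extra allocation) can be sketched with plain standard-library types. This is only an illustration under stated assumptions: `trim_headers` is a hypothetical helper, a `Vec<String>` stands in for a real record type, and the enum omits the hidden non-exhaustive variant for brevity. It is not the crate's implementation.

```rust
/// Hypothetical stand-in for the proposed configuration enum.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Trim {
    None,
    Headers,
}

/// Returns a new header record, with each field trimmed when `trim` asks for it.
/// `Vec<String>` is an assumed stand-in for a CSV record type.
pub fn trim_headers(headers: &[String], trim: Trim) -> Vec<String> {
    match trim {
        Trim::None => headers.to_vec(),
        // Builds a fresh record; acceptable because it only happens once,
        // for the header record.
        Trim::Headers => headers.iter().map(|h| h.trim().to_string()).collect(),
    }
}

fn main() {
    let raw = vec!["medallion".to_string(), " hack_license".to_string()];
    let trimmed = trim_headers(&raw, Trim::Headers);
    println!("{:?}", trimmed);
}
```

With trimmed header names, a struct field such as `hack_license` would no longer need a `rename` attribute carrying the leading space.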
I also encountered this, although it was with field values, not headers. A solution in my case would have been to allow a multi-character delimiter, like "; ", but support for trimming headers and fields would also work.
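
To compare the two options mentioned here, the following sketch splits one line both ways using only the standard library. It is deliberately naive (no quoting or escaping), and `split_and_trim` is a hypothetical helper, not part of any CSV library. The sample line is made up for illustration.

```rust
/// Splits one unquoted line on `delim` and trims whitespace around each field.
/// Naive on purpose: quoted fields and escapes are not handled.
fn split_and_trim(line: &str, delim: char) -> Vec<&str> {
    line.split(delim).map(str::trim).collect()
}

fn main() {
    // Made-up input: one space after the first semicolon, two after the second.
    let line = "a; b;  c";

    // A multi-character delimiter only absorbs exactly one space.
    let by_multi_char: Vec<&str> = line.split("; ").collect();

    // Trimming copes with any amount of surrounding whitespace.
    let by_trim = split_and_trim(line, ';');

    println!("{:?}", by_multi_char);
    println!("{:?}", by_trim);
}
```

The multi-character delimiter leaves a stray space on the last field when the padding is inconsistent, which is one reason trimming may be the more robust knob.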
BTW I also would like this for rows as well as headers. In general it would be nice to have a trim trait or something (or for me preferably a dont_trim trait and the default is to trim). I'm willing to write it but a rough roadmap of what files to look at and maybe a good write-up on how to implement traits would be really helpful for me.
@medwards This is where I'd start: #78 (comment) --- Note that I'm not sure why you're talking about traits here. I don't think implementing this feature should require any new traits.
Sorry, I misspoke. I think I meant attributes (ie
I don't think this needs attributes either. We can't add new Serde attributes anyway. I'm thinking that this is a CSV reader configuration knob that is applied to every field (or just the header, or whatever).
Ok sounds good, I'll try to put some time in this weekend.
@medwards Awesome, thanks! I'm
first stab at BurntSushi#78
I took a stab at it (not fully tested) but I want to rethink things. From what I can tell the only opportune place to create a new record for whitespace trimming is in
I can mess with
This commit adds support for trimming CSV records. There are two levels of support:
1. Both `ByteRecord` and `StringRecord` have grown `trim` methods. A `ByteRecord` trims ASCII whitespace while a `StringRecord` trims Unicode whitespace.
2. The CSV reader can now be configured to automatically trim all records that it reads. This is useful when using Serde to match header names with spaces (for example) to struct member names.
Fixes #78
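
The ASCII versus Unicode distinction in the commit message can be illustrated with the standard library alone. In this sketch, `trim_ascii_bytes` is a hypothetical helper written for illustration; it is not claimed to be how `ByteRecord::trim` is implemented. `str::trim` is the standard method that removes Unicode whitespace.

```rust
/// Trims ASCII whitespace from both ends of a byte slice.
/// Hypothetical helper, for illustration only.
fn trim_ascii_bytes(mut bytes: &[u8]) -> &[u8] {
    // Drop leading ASCII whitespace.
    while let [first, rest @ ..] = bytes {
        if first.is_ascii_whitespace() {
            bytes = rest;
        } else {
            break;
        }
    }
    // Drop trailing ASCII whitespace.
    while let [rest @ .., last] = bytes {
        if last.is_ascii_whitespace() {
            bytes = rest;
        } else {
            break;
        }
    }
    bytes
}

fn main() {
    // U+2003 EM SPACE is Unicode whitespace but not ASCII whitespace.
    let field = " \u{2003}medallion\u{2003} ";

    // Byte-level trimming removes only the outer ASCII spaces.
    let ascii_trimmed = trim_ascii_bytes(field.as_bytes());

    // String-level trimming removes the EM SPACE characters as well.
    let unicode_trimmed = field.trim();

    println!("{:?}", std::str::from_utf8(ascii_trimmed));
    println!("{:?}", unicode_trimmed);
}
```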
I came across a CSV file which had leading spaces in the header fields:
When I deserialized it with Serde, each header field had a space in front. Would it make sense to have an option to trim the header fields, or to transform them in a more general way?
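
As a rough sketch of what "transform them in a more general way" could look like, the helper below applies a caller-supplied function to every header name. `transform_headers` is hypothetical and uses only the standard library; a slice of `&str` stands in for a header record, which is an assumption for illustration.

```rust
/// Applies `f` to every header name and returns the transformed header row.
/// Hypothetical helper; a slice of `&str` stands in for a header record.
fn transform_headers<F>(headers: &[&str], f: F) -> Vec<String>
where
    F: Fn(&str) -> String,
{
    headers.iter().map(|&h| f(h)).collect()
}

fn main() {
    let raw = ["medallion", " hack_license"];

    // Trimming is one possible transformation.
    let trimmed = transform_headers(&raw, |h| h.trim().to_string());

    // Any other per-name transformation fits the same shape.
    let upper = transform_headers(&raw, |h| h.trim().to_uppercase());

    println!("{:?}", trimmed);
    println!("{:?}", upper);
}
```

A closure-based hook is more flexible than a fixed trim option, at the cost of a larger API surface.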