Inconsistent behaviour of trailing semicolon #207
Comments
@dariober, see #209 for some explanation. After looking at this more closely, it's actually expected behavior. Some explanation:

First, the dialect inference is designed to handle inconsistencies by weighting more highly those lines with more attribute keys. The assumption is that more keys means more information for inferring dialect. See this line in

Second, if there's a tie (which happens in the last two examples above each with 4 lines, two with trailing semicolons and two without), then the dialect falls back to the first one observed as a tiebreaker (this line).

In fact, as demonstrated over in #209, if you add another line to act as a tiebreaker then you can force the dialect one way or the other, so it's actually consistent. Or I guess "internally consistent" would be more accurate.

I think in this case, everything is behaving as expected and I'm not sure I would want to change anything in the code. Rather, for this example, you might want to force the dialect to have no trailing semicolon (e.g., set |
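To make the weighting and the tiebreak described in this comment concrete, here is a toy model in plain Python. It is only a sketch of the idea, not the gffutils implementation: the function names `count_keys` and `infer_trailing_semicolon` are made up for illustration, and it assumes each line's vote is weighted by its number of non-empty attribute fields.

```python
def count_keys(attr_field):
    """Number of non-empty ';'-separated fields (assumed weight of a line)."""
    return len([f for f in attr_field.split(";") if f.strip()])


def infer_trailing_semicolon(attr_fields):
    """Toy weighted vote on whether attribute strings end with ';'.

    Hypothetical helper, not part of gffutils. Each line votes True or
    False, weighted by its key count. On a tie, the value that was
    observed first wins. Returns None if no lines are given.
    """
    votes = {}  # plain dicts keep insertion order, i.e. order first observed
    for field in attr_fields:
        field = field.strip()
        has_trailing = field.endswith(";")
        votes[has_trailing] = votes.get(has_trailing, 0) + count_keys(field)

    best, best_weight = None, -1
    for value, weight in votes.items():
        # Strict comparison: a later value must beat, not equal, the leader.
        if weight > best_weight:
            best, best_weight = value, weight
    return best


# Tie (2 keys each way): the first style seen decides.
print(infer_trailing_semicolon(["ID=a;Parent=b;", "ID=c;Parent=d"]))
print(infer_trailing_semicolon(["ID=c;Parent=d", "ID=a;Parent=b;"]))
# One extra line without a trailing semicolon breaks the tie.
print(infer_trailing_semicolon(["ID=a;Parent=b;", "ID=c;Parent=d", "ID=e;Parent=f"]))
```

In this model the outcome for a tied input depends only on line order, which matches the "internally consistent" description: the same lines in a different order, or with one extra line, can flip the inferred setting.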
Sorry - me again. gffutils (v0.11.1) has inconsistent behavior regarding gff lines with or without a trailing semicolon. In some cases the trailing semicolon results in an attribute with empty string as key and None value, in some other cases the empty key is dropped. Here are some examples:
;
. So far so good:;
the second two don't. The first two lines have an empty string:;
and have the empty string attribute:

It's not a big deal but it would be nice to have a consistent handling of trailing semicolons. Personally, I don't see the point of an empty-string key with None value so I would be happy to always drop them.

I came across this behavior when in some cases I inserted a key in the attribute list and it printed with two consecutive semicolons, like ID=foo;Parent=bar;;gene_id=spam, which I guess is harmless but looks confusing.
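As a possible user-side workaround for the double-semicolon output, the attribute string can be cleaned before parsing or after printing. The helper below is a minimal sketch in plain Python; `drop_empty_fields` is a hypothetical name, not a gffutils function, and it assumes that no attribute value contains a literal semicolon (quoted values with embedded semicolons would be split incorrectly).

```python
def drop_empty_fields(attr_string, keep_trailing=False):
    """Remove empty ';'-separated fields from a GFF attribute string.

    Hypothetical helper for illustration only. Assumes values never
    contain ';'. Set keep_trailing=True to end the result with ';'.
    """
    fields = [f.strip() for f in attr_string.split(";")]
    fields = [f for f in fields if f]  # drop the empty-string entries
    joined = ";".join(fields)
    if keep_trailing and joined:
        return joined + ";"
    return joined


print(drop_empty_fields("ID=foo;Parent=bar;;gene_id=spam"))
print(drop_empty_fields("ID=foo;Parent=bar;"))
```

Applying this to every line before loading would also give all lines the same trailing-semicolon style, so the inferred setting no longer depends on which style happens to come first.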