Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parsed documentation and error handling #1418

Closed
wants to merge 16 commits into from

Conversation

pitdicker
Copy link
Collaborator

I intend to split this PR in multiple parts, but opened it for discussion on how best to do so.

The main goal is to move all range validation to the Parsed::set_* methods.
This is preliminary work to convert this type and the parsing methods to our new Error type (cc @Zomtir).

It has four pieces:

1. For the 0.4 branch

We have two small bugs in our implementation of Parsed:

  • If there is a timestamp and an offset field, the offset is added to the timestamp. But the definition of a Unix timestamp is that the value is in UTC, so this is not correct.
  • We were returning OUT_OF_RANGE if the year value didn't match with values of year_div_100 or year_mod_100. It should return IMPOSSIBLE instead.

2. Documentation

I have added documentation to all methods describing their error causes. And the Parsed type got some documentation describing why it exists (the resolution algorithm), and an example of how to use it. Partly taken from the blog post before chrono 0.2: https://lifthrasiir.github.io/rustlog/worklog-2015-02-19.html

3. Goal: move all range validation to the set_* methods

The goal of this PR was to move all range validation to the set_* methods.

Currently the set_* methods do a partial range check, or just a checked cast, and the to_* methods do the complete range checks. Doing them all in the set_* methods will allow us to give more a precise error when we convert the parsing code to the new error type. (It can then return the position in the format string where an invalid value appeared.)

4. Nice to have

It would be nice if the to_ methods can rely on the range checks performed by set_*.
But currently they can't because the fields of Parsed are public.

Making the fields private seems like a good thing to do for this type in general. It involves:

  • Adding functions to get each individual field.
  • Changing the macro-heavy tests in format::parse to use the set_* methods.

As a clean solution to replacing the macro's I changed the return type of the set_* methods to be usable as a builder pattern.

How to split

  1. I can open a PR to the 0.4 branch with the bug fixes.
  2. I can also backport the documentation changes. They will be a bit less nice without the builder pattern and with more error cases.
  3. We could bring the range validation in the set_* methods to the 0.4 branch. We already document they do range checks, only that they may not have been precise and delay the error until the to_* methods.
  4. Converting the set_* methods to a builder pattern, making the fields of Parsed private and simplifying the to_* methods require 0.5. None of those are strictly necessary, but they do improve our error story and are nice to have.

@pitdicker pitdicker force-pushed the parsed_changes branch 2 times, most recently from ddfb12a to dddd56f Compare February 8, 2024 20:17
@djc
Copy link
Member

djc commented Feb 10, 2024

The Parsed documentation mentions the set_() methods don't do all the checking because of efficiency. Why do we want to move all checks to the set_() methods? If we move all checks to set_() methods, can we get rid of the checks in to_() methods entirely, or will we still need to check some things?

@Zomtir
Copy link
Contributor

Zomtir commented Feb 10, 2024

The Parsed documentation mentions the set_() methods don't do all the checking because of efficiency. Why do we want to move all checks to the set_() methods? If we move all checks to set_() methods, can we get rid of the checks in to_() methods entirely, or will we still need to check some things?

I think those are separate points. One benefit of using set_() would be that at least some portion of the code is cross-validated. Not only do we check that the parser produces the correct outcome, it is also checked that the reference values are meaningful.

Another point is that iterative macros (like parsed!) are not trivial to wrap your head around (compared to the straight forward let p = Parsed::new;). On the other hand named parameters do look a bit nicer in the code.

Eventually if all set_() methods cover all checks, then yes, to_() would return T rather than Result<T>. But I don't think this is the scope. I guess set_() is meant to check for string related issues while to_() is meant to check the logic of the whole Parsed struct.

Copy link

codecov bot commented Feb 11, 2024

Codecov Report

Attention: Patch coverage is 82.95964% with 76 lines in your changes are missing coverage. Please review.

Project coverage is 93.32%. Comparing base (130bf14) to head (8f1bb55).

❗ Current head 8f1bb55 differs from pull request most recent head 8fe856a. Consider uploading reports for the commit 8fe856a to get more accurate results

Files Patch % Lines
src/format/parsed.rs 84.67% 59 Missing ⚠️
src/format/parse.rs 72.22% 15 Missing ⚠️
src/error.rs 0.00% 1 Missing ⚠️
src/format/mod.rs 83.33% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##            0.5.x    #1418      +/-   ##
==========================================
- Coverage   94.28%   93.32%   -0.97%     
==========================================
  Files          37       37              
  Lines       16969    15959    -1010     
==========================================
- Hits        15999    14893    -1106     
- Misses        970     1066      +96     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@pitdicker pitdicker marked this pull request as draft February 14, 2024 06:22
@pitdicker pitdicker force-pushed the parsed_changes branch 3 times, most recently from ac92b04 to 3540ede Compare February 14, 2024 08:06
@pitdicker pitdicker force-pushed the parsed_changes branch 2 times, most recently from dcfe74e to 7745c97 Compare February 19, 2024 13:25
@pitdicker
Copy link
Collaborator Author

The Parsed documentation mentions the set_() methods don't do all the checking because of efficiency.

The git commits are interesting. Parsing was introduced with v0.2.0. The same commits that introduced Parsed and this documentation say:

  • I absolutely don't know about the parser's performance!

We are already doing checks, and doing magnitudes more work than these checks when parsing. I'm not concerned this changes anything in terms of efficiency.

Why do we want to move all checks to the set_() methods?

It will allow the parsing code, that uses Parsed::set_*, to give precise errors that include the position of the error. That would allow neat error messages such as (just fantasizing, not planned for chrono):

Error parsing input: "2024-01-32 13:12:44"
                               ^ Invalid value

It takes little effort for us to supply this extra information; we only need to move these range checks to happen earlier in Parsed.

If we move all checks to set_() methods, can we get rid of the checks in to_() methods entirely, or will we still need to check some things?

We still need to check most things. The check that all fields are consistent with each other can only happen at the end. Doing the check after any parsed field is wasteful. And trying to resolve fields to a date/time/offset is also wasteful until we know we have all the fields that we are going to get.

@pitdicker pitdicker force-pushed the parsed_changes branch 2 times, most recently from ebf35b6 to 8f1bb55 Compare February 25, 2024 18:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants