`Parsed` documentation and error handling #1418

pitdicker · 2024-02-08T18:18:36Z

I intend to split this PR in multiple parts, but opened it for discussion on how best to do so.

The main goal is to move all range validation to the Parsed::set_* methods.
This is preliminary work to convert this type and the parsing methods to our new Error type (cc @Zomtir).

It has four pieces:

1. For the 0.4 branch

We have two small bugs in our implementation of Parsed:

If there is a timestamp and an offset field, the offset is added to the timestamp. But the definition of a Unix timestamp is that the value is in UTC, so this is not correct.
We were returning OUT_OF_RANGE if the year value didn't match with values of year_div_100 or year_mod_100. It should return IMPOSSIBLE instead.

2. Documentation

I have added documentation to all methods describing their error causes. And the Parsed type got some documentation describing why it exists (the resolution algorithm), and an example of how to use it. Partly taken from the blog post before chrono 0.2: https://lifthrasiir.github.io/rustlog/worklog-2015-02-19.html

3. Goal: move all range validation to the `set_*` methods

The goal of this PR was to move all range validation to the set_* methods.

Currently the set_* methods do a partial range check, or just a checked cast, and the to_* methods do the complete range checks. Doing them all in the set_* methods will allow us to give more a precise error when we convert the parsing code to the new error type. (It can then return the position in the format string where an invalid value appeared.)

4. Nice to have

It would be nice if the to_ methods can rely on the range checks performed by set_*.
But currently they can't because the fields of Parsed are public.

Making the fields private seems like a good thing to do for this type in general. It involves:

Adding functions to get each individual field.
Changing the macro-heavy tests in format::parse to use the set_* methods.

As a clean solution to replacing the macro's I changed the return type of the set_* methods to be usable as a builder pattern.

How to split

I can open a PR to the 0.4 branch with the bug fixes.
I can also backport the documentation changes. They will be a bit less nice without the builder pattern and with more error cases.
We could bring the range validation in the set_* methods to the 0.4 branch. We already document they do range checks, only that they may not have been precise and delay the error until the to_* methods.
Converting the set_* methods to a builder pattern, making the fields of Parsed private and simplifying the to_* methods require 0.5. None of those are strictly necessary, but they do improve our error story and are nice to have.

djc · 2024-02-10T09:49:58Z

The Parsed documentation mentions the set_() methods don't do all the checking because of efficiency. Why do we want to move all checks to the set_() methods? If we move all checks to set_() methods, can we get rid of the checks in to_() methods entirely, or will we still need to check some things?

Zomtir · 2024-02-10T14:20:07Z

The Parsed documentation mentions the set_() methods don't do all the checking because of efficiency. Why do we want to move all checks to the set_() methods? If we move all checks to set_() methods, can we get rid of the checks in to_() methods entirely, or will we still need to check some things?

I think those are separate points. One benefit of using set_() would be that at least some portion of the code is cross-validated. Not only do we check that the parser produces the correct outcome, it is also checked that the reference values are meaningful.

Another point is that iterative macros (like parsed!) are not trivial to wrap your head around (compared to the straight forward let p = Parsed::new;). On the other hand named parameters do look a bit nicer in the code.

Eventually if all set_() methods cover all checks, then yes, to_() would return T rather than Result<T>. But I don't think this is the scope. I guess set_() is meant to check for string related issues while to_() is meant to check the logic of the whole Parsed struct.

codecov · 2024-02-11T07:21:53Z

Codecov Report

Attention: Patch coverage is 82.95964% with 76 lines in your changes are missing coverage. Please review.

Project coverage is 93.32%. Comparing base (130bf14) to head (8f1bb55).

❗ Current head 8f1bb55 differs from pull request most recent head 8fe856a. Consider uploading reports for the commit 8fe856a to get more accurate results

Files	Patch %	Lines
src/format/parsed.rs	84.67%	59 Missing ⚠️
src/format/parse.rs	72.22%	15 Missing ⚠️
src/error.rs	0.00%	1 Missing ⚠️
src/format/mod.rs	83.33%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##            0.5.x    #1418      +/-   ##
==========================================
- Coverage   94.28%   93.32%   -0.97%     
==========================================
  Files          37       37              
  Lines       16969    15959    -1010     
==========================================
- Hits        15999    14893    -1106     
- Misses        970     1066      +96

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

pitdicker · 2024-02-20T12:22:24Z

The Parsed documentation mentions the set_() methods don't do all the checking because of efficiency.

The git commits are interesting. Parsing was introduced with v0.2.0. The same commits that introduced Parsed and this documentation say:

I absolutely don't know about the parser's performance!

We are already doing checks, and doing magnitudes more work than these checks when parsing. I'm not concerned this changes anything in terms of efficiency.

Why do we want to move all checks to the set_() methods?

It will allow the parsing code, that uses Parsed::set_*, to give precise errors that include the position of the error. That would allow neat error messages such as (just fantasizing, not planned for chrono):

Error parsing input: "2024-01-32 13:12:44"
                               ^ Invalid value

It takes little effort for us to supply this extra information; we only need to move these range checks to happen earlier in Parsed.

If we move all checks to set_() methods, can we get rid of the checks in to_() methods entirely, or will we still need to check some things?

We still need to check most things. The check that all fields are consistent with each other can only happen at the end. Doing the check after any parsed field is wasteful. And trying to resolve fields to a date/time/offset is also wasteful until we know we have all the fields that we are going to get.

pitdicker force-pushed the parsed_changes branch 2 times, most recently from ddfb12a to dddd56f Compare February 8, 2024 20:17

pitdicker force-pushed the parsed_changes branch from dddd56f to 62fe7df Compare February 14, 2024 06:18

pitdicker marked this pull request as draft February 14, 2024 06:22

pitdicker force-pushed the parsed_changes branch 3 times, most recently from ac92b04 to 3540ede Compare February 14, 2024 08:06

pitdicker mentioned this pull request Feb 14, 2024

Parsed fixes and documentation #1439

Merged

pitdicker force-pushed the parsed_changes branch 2 times, most recently from dcfe74e to 7745c97 Compare February 19, 2024 13:25

pitdicker added 5 commits February 25, 2024 17:33

Return IMPOSSIBLE if year fields don't match

70dad15

Expand Parsed documentation

f1c72f2

Improve documentation of Parsed::set_*

e1d7b81

Improve documentation of Parset::to_*

5e11a10

Add range checks to the Parsed::set_* methods

99121dd

pitdicker force-pushed the parsed_changes branch from 7745c97 to d074166 Compare February 25, 2024 16:35

pitdicker added 6 commits February 25, 2024 17:36

Merge branch 'parsed_range_checks' into parsed_changes

de875d4

Add getters to Parsed

b0fea41

Support using Parsed methods in a builder pattern

5636d2f

Convert macro to methods

2bcb464

Make fields on Parsed private

4dae700

Don't set private fields to invalid values in tests

1f05c7c

pitdicker force-pushed the parsed_changes branch 2 times, most recently from ebf35b6 to 8f1bb55 Compare February 25, 2024 18:34

pitdicker added 2 commits February 25, 2024 21:00

Remove unnecessary range checks from Parsed::to_*

c7f1bb8

Simplify error value of parse_internal

80f7ddc

pitdicker added 3 commits February 25, 2024 21:00

Convert Parsed::set_* to new error type

f3c17ab

Attach remainder to error

bd349a2

WIP

8fe856a

pitdicker force-pushed the parsed_changes branch from 8f1bb55 to 8fe856a Compare February 25, 2024 20:08

pitdicker closed this Feb 25, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`Parsed` documentation and error handling #1418

`Parsed` documentation and error handling #1418

pitdicker commented Feb 8, 2024

djc commented Feb 10, 2024

Zomtir commented Feb 10, 2024

codecov bot commented Feb 11, 2024 •

edited

Loading

pitdicker commented Feb 20, 2024

Parsed documentation and error handling #1418

Parsed documentation and error handling #1418

Conversation

pitdicker commented Feb 8, 2024

1. For the 0.4 branch

2. Documentation

3. Goal: move all range validation to the set_* methods

4. Nice to have

How to split

djc commented Feb 10, 2024

Zomtir commented Feb 10, 2024

codecov bot commented Feb 11, 2024 • edited Loading

Codecov Report

pitdicker commented Feb 20, 2024

`Parsed` documentation and error handling #1418

`Parsed` documentation and error handling #1418

3. Goal: move all range validation to the `set_*` methods

codecov bot commented Feb 11, 2024 •

edited

Loading