More case references (BVerwG, Sozialgerichte), resolved false positives, small improvements #6
Hi Malte,
I contributed some of the things mentioned in #5 and solved the false positives mentioned in #4. This includes:
Tell me if there is anything you would like me to change; I'd be happy to do so.
Commit overview:
Some more details:
Change case codes & their extraction 843b844
Currently the case codes are not being used. However, during my work I noticed that several codes from gerichtsaktenzeichen.de are actually spelled differently in practice, at least when I checked on dejure. In case that list is used in the future, I included the corrected spellings.
Change cases regex to include BVerwG cases and some heuristics 9f1f790
I first tried to use an exhaustive list of case codes (I think you did too, judging from the code), but I also found it very hard, especially since the list from gerichtsaktenzeichen.de seems to be unreliable. I agree with you that machine learning would be the right approach here. For now, however, I included some heuristics (for example, the case code always starts with a capital letter, and the chamber is not an arbitrarily large number), which should filter out some false positives.
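To illustrate the kind of heuristics meant here, a minimal sketch follows. It is not the regex from the commit: the pattern, the group names, the length limits and the helper `find_file_numbers` are all assumptions made for this example.

```python
import re

# Hypothetical pattern for illustration only, not the one used in the commit.
# Heuristics encoded:
#   - the chamber is a short number (1-2 digits) or a Roman numeral,
#     never an arbitrarily large number
#   - the case code starts with a capital letter
#   - number and year are separated by "/" or, as in BVerwG cases, by "."
FILE_NUMBER = re.compile(
    r"\b(?P<chamber>\d{1,2}|[IVX]+)\s"
    r"(?P<code>[A-Z][A-Za-z]{0,5})\s"
    r"(?P<number>\d{1,5})[/.](?P<year>\d{2})\b"
)


def find_file_numbers(text):
    """Return all substrings of text that look like a file number."""
    return [m.group(0) for m in FILE_NUMBER.finditer(text)]


print(find_file_numbers("BVerwG, 4 C 12.10 and BVerfG, 1 BvR 123/10"))
print(find_file_numbers("page 123456 abc 12/10"))
```

The second call is meant to show the filtering: a six-digit "chamber" and a lowercase "code" are both rejected, so nothing is returned.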
Add support for Sozialgerichtsbarkeit cases 8da12d8
I tried to keep the regex as small as possible, since you mentioned timeouts, so I included the format in the general file number regex (even though the two are semantically a bit different, which I documented). I also did a bit of work on the court search, which is not perfect yet but is an incremental improvement.
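As a rough sketch of the idea (again not the committed regex), the social court format can be folded into a general pattern with one optional leading group instead of a second pattern. The sketch assumes file numbers such as `S 12 KR 123/10`, where the first letter indicates the instance; the pattern, the group names and the helper `parse_file_number` are hypothetical.

```python
import re

# Hypothetical pattern for illustration only.
# The optional "instance" group covers social court file numbers, which are
# assumed here to start with S, L or B before the chamber. For all other
# courts the group simply stays empty.
GENERAL_FILE_NUMBER = re.compile(
    r"\b(?:(?P<instance>[SLB])\s)?"
    r"(?P<chamber>\d{1,2})\s"
    r"(?P<code>[A-Z]{1,3})\s"
    r"(?P<number>\d{1,5})/(?P<year>\d{2})\b"
)


def parse_file_number(text):
    """Return the named parts of the first file number in text, or None."""
    match = GENERAL_FILE_NUMBER.search(text)
    return match.groupdict() if match else None


print(parse_file_number("S 12 KR 123/10"))
print(parse_file_number("4 C 12/10"))
```

Keeping a single pattern with an optional prefix avoids scanning the text twice, which fits the goal of keeping the regex small; the trade-off is that the leading letter has a different meaning from the other groups and needs to be documented.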