More case references (BVerwG, Sozialgerichte), resolved false positives, small improvements #6

Open
wants to merge 5 commits into
base: master


@d-e-h-i-o commented Jul 19, 2021

Hi Malte,

I contributed some of the things mentioned in #5 and solved the false positives mentioned in #4. This includes:

  • Support for BVerwG and Sozialgericht cases (and tests)
  • Resolving the false positives from False-Positive Extractions #4 (and tests)
  • Some small stuff on formatting, type hints, and test cases

Tell me if there is anything you would like me to change; I'd be happy to do so.

Commit overview:

  • Formatting (with black) 1af38f2
  • Replace type comments with type hints + add some more a9f7793
  • Change case codes & their extraction 843b844
  • Change cases regex to include BVerwG cases and some heuristics 9f1f790
  • Add support for Sozialgerichtsbarkeit cases 8da12d8

Some more details:

  1. Change case codes & their extraction 843b844
    Currently the case codes are not being used. However, while working on this I noticed that several codes from gerichtsaktenzeichen.de are actually spelled differently in practice, at least when I checked on dejure. In case the list is used in the future, I included the corrected spellings.

  2. Change cases regex to include BVerwG cases and some heuristics 9f1f790
    I first tried to use an exhaustive list of case codes (I think you did so too, judging from the code), but I also found it very hard, especially since the list from gerichtsaktenzeichen.de seems to be unreliable. I agree with you that machine learning would be the right approach here. For now, however, I included some heuristics (e.g. the case code always starts with a capital letter, and the chamber is not an arbitrarily large number), which should filter out some false positives.

  3. Add support for Sozialgerichtsbarkeit cases 8da12d8
    I tried to keep the regex as small as possible, since you mentioned timeouts, so I included the format in the general file number regex (even though the two are semantically a bit different, which I documented). I also did a bit of work on the court search, which is not perfect yet but is an incremental improvement.
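
As a rough illustration of the heuristics from points 2 and 3 (this is not the regex from 9f1f790 or 8da12d8), a single pattern could look something like the sketch below. The name `FILE_NUMBER`, the helper `find_file_numbers`, the group names, and the exact length limits are all assumptions made for this example. It also assumes a two-digit year, a dot or slash between number and year, and an optional one-letter prefix (`B`, `L`, `S`) for social-court file numbers.

```python
import re
from typing import List

# Hypothetical pattern for illustration only; limits and group names are assumptions.
FILE_NUMBER = re.compile(
    r"""
    (?<![\w/.])                          # do not start in the middle of a longer token
    (?:(?P<instance>[BLS])\s)?           # optional social-court prefix (assumed: B, L or S)
    (?P<chamber>\d{1,2}|[IVX]{1,5})\s    # chamber: short arabic or roman numeral, never a huge number
    (?P<code>[A-Z][A-Za-z]{0,5}(?:\s\([a-z]{1,4}\))?)\s   # code starts with a capital letter
    (?P<number>\d{1,5})[/.](?P<year>\d{2})   # running number, slash or dot, two-digit year
    (?!\d)                               # the year must not run on into more digits
    """,
    re.VERBOSE,
)


def find_file_numbers(text: str) -> List[str]:
    """Return every substring of ``text`` that looks like a file number."""
    return [match.group(0) for match in FILE_NUMBER.finditer(text)]
```

Keeping everything in one pattern with bounded quantifiers is one way to limit backtracking. A sketch this loose will still let some false positives through.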

The list of codes was originally adapted from gerichtsaktenzeichen.de; however, some codes are actually written differently (e.g. "W (pat)" is space-separated), and some missing codes have been added.
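
If the code list is used for matching later on, space-separated codes such as "W (pat)" need some care. The following is a minimal sketch, assuming a hypothetical helper `build_code_pattern` that is not part of this PR. It escapes each code, lets any run of whitespace stand in for the space, and tries longer codes first. The codes passed in the usage line are only a small illustrative sample.

```python
import re
from typing import Iterable, Pattern


def build_code_pattern(codes: Iterable[str]) -> Pattern[str]:
    """Compile an alternation that matches any of the given case codes."""
    # Longest codes first, so that "W (pat)" is tried before a shorter code like "W".
    ordered = sorted(set(codes), key=lambda code: (-len(code), code))
    # Escape each whitespace-separated piece and rejoin with a flexible whitespace matcher.
    parts = [r"\s+".join(re.escape(piece) for piece in code.split()) for code in ordered]
    return re.compile("(?:" + "|".join(parts) + ")")


# Illustrative sample only, not the full list from the PR.
pattern = build_code_pattern(["W", "W (pat)", "BvR"])
```

Sorting by length matters because regex alternation takes the first branch that matches, not the longest one.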