[Germany] invalid e-mail addresses in the data #95

Closed
augusto-herrmann opened this issue Aug 22, 2018 · 2 comments

@augusto-herrmann
Collaborator

Goodtables detects some problems in the data for Germany in data/de.csv:

$ goodtables --schema public-body-schema.json data/de.csv
DATASET
=======
{'error-count': 5,
 'preset': 'nested',
 'table-count': 1,
 'time': 0.371,
 'valid': False}

TABLE [1]
=========
{'encoding': 'utf-8',
 'error-count': 5,
 'format': 'csv',
 'headers': ['id',
             'name',
             'abbreviation',
             'other_names',
             'description',
             'classification',
             'parent_id',
             'founding_date',
             'dissolution_date',
             'image',
             'url',
             'jurisdiction_code',
             'email',
             'address',
             'contact',
             'tags',
             'source_url'],
 'row-count': 1005,
 'schema': 'table-schema',
 'scheme': 'file',
 'source': 'data/de.csv',
 'time': 0.369,
 'valid': False}
---------
[352,13] [type-or-format-error] The value "info@dw-world" in row 352 and column 13 is not type "string" and format "email"
[367,13] [type-or-format-error] The value "info@landkreistag" in row 367 and column 13 is not type "string" and format "email"
[603,13] [type-or-format-error] The value "trabold@ids-mannheim" in row 603 and column 13 is not type "string" and format "email"
[776,11] [type-or-format-error] The value "url" in row 776 and column 11 is not type "string" and format "uri"
[776,13] [type-or-format-error] The value "email" in row 776 and column 13 is not type "string" and format "email"

For lines 352, 367 and 603, it looks like the e-mail addresses are missing a ".de" suffix, especially considering that, in all those cases, the url field contains a domain that would match the domain of the e-mail if we added a ".de" TLD.
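The domain comparison described above could be automated along these lines. This is only a minimal sketch: the function name `suggest_email_fix` is hypothetical, and the URL in the usage example is an illustrative assumption, not a value copied from data/de.csv.

```python
from urllib.parse import urlparse


def suggest_email_fix(email, url):
    """Suggest a corrected e-mail when its domain lacks a TLD.

    Returns the corrected address if the URL host is exactly
    "<e-mail domain>.<tld>", otherwise None.
    """
    local, sep, domain = email.partition("@")
    # Only handle addresses whose domain has no dot at all.
    if not sep or not local or not domain or "." in domain:
        return None
    # hostname is None when the URL has no scheme; treat that as no match.
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    prefix = domain.lower() + "."
    tld = host[len(prefix):] if host.startswith(prefix) else ""
    # Accept a single-label TLD only, such as "de".
    if tld and "." not in tld:
        return f"{local}@{domain}.{tld}"
    return None


# Illustrative URL, assumed for the example.
print(suggest_email_fix("info@dw-world", "http://www.dw-world.de/"))
```

A suggestion produced this way should still be reviewed by hand before it is written back to the CSV, since a matching domain does not prove the mailbox exists.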

Line 776 seems to be just a submission error or something similar. I suggest we simply clear the data and leave these fields blank.

@rufuspollock, you contributed this file. Are you ok with the proposed fixes?

augusto-herrmann added the "Data" (Data sources and ingestion automation) label Aug 22, 2018
augusto-herrmann self-assigned this Aug 22, 2018
@augusto-herrmann
Collaborator Author

Actually, line 776 looks like this:

de/name,name,,,description,classification,,,,,url,DE,email,contact,address,keywords,

It seems it was just a header that was incorrectly placed in the middle of the file, perhaps as a result of concatenating two different CSVs. I propose we just delete the line.
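A stray header like this one could also be caught automatically before validation. The following is a rough sketch using only the standard library; the function name `find_stray_header_rows` and the `min_matches` threshold are hypothetical choices, and the sample data is made up for illustration.

```python
import csv
import io


def find_stray_header_rows(lines, min_matches=2):
    """Return line numbers of data rows that look like a repeated header.

    A row is flagged when at least `min_matches` of its cells are equal
    to the name of the column they sit in.
    """
    reader = csv.reader(lines)
    headers = next(reader)
    stray = []
    for row in reader:
        matches = sum(
            1 for name, cell in zip(headers, row) if cell.strip() == name
        )
        if matches >= min_matches:
            # line_num is the last physical line read for this row.
            stray.append(reader.line_num)
    return stray


# Made-up sample: the third line repeats part of the header.
sample = (
    "id,name,url,email\n"
    "de/x,X,http://x.de,a@x.de\n"
    "de/name,name,url,email\n"
)
print(find_stray_header_rows(io.StringIO(sample)))
```

The threshold is deliberately above one so that a single cell that happens to equal its column name does not flag a legitimate row.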

augusto-herrmann added a commit to augusto-herrmann/publicbodies that referenced this issue Aug 22, 2018
rufuspollock added a commit that referenced this issue Aug 27, 2018
fix errors in data for Germany (#95)
@augusto-herrmann
Collaborator Author

Issue fixed!
