Ensure proper encoding from scraped sources #49

Open
marfox opened this issue Jun 17, 2016 · 1 comment

marfox commented Jun 17, 2016

Encountered typical mojibake, UTF-8 bytes interpreted as an 8-bit encoding (Latin-1):
elected at the AcadÃ©mie FranÃ§aise in 1816

iconv -f utf8 -t latin1 fixes that:
elected at the Académie Française in 1816
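
The same round trip can be sketched in Python for use inside a scraping pipeline. This is a minimal sketch, not the project's actual fix: the function name `fix_mojibake` is hypothetical, and the garbled sample string is an assumption reconstructed from the `iconv` direction above (UTF-8 bytes that were mis-decoded as Latin-1).

```python
def fix_mojibake(text: str, wrong_encoding: str = "latin-1") -> str:
    """Undo UTF-8 bytes that were mis-decoded as an 8-bit encoding.

    Equivalent in spirit to `iconv -f utf8 -t latin1`: map each character
    back to the single byte it came from, then decode those bytes as UTF-8.
    """
    try:
        return text.encode(wrong_encoding).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text was not garbled in this particular way; leave it untouched.
        return text


# Assumed garbled input: "é" and "ç" each show up as two Latin-1 characters.
garbled = "elected at the Acad\u00c3\u00a9mie Fran\u00c3\u00a7aise in 1816"
fixed = fix_mojibake(garbled)
print(fixed)
```

The fallback matters because blindly re-encoding already clean text would fail or corrupt it, so correctly encoded strings pass through unchanged.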

@marfox marfox added the major label Jun 17, 2016
@marfox marfox added this to the Production Corpus milestone Jun 17, 2016
@marfox marfox added minor and removed major labels Jun 17, 2016

marfox commented Jun 17, 2016

Should be a minor issue for English, but not for other languages.
