Support Spanish language #1
I have one guess. Make sure you have es.tar.gz BOTH in optional_languages and autocorrect/data. Speller first looks for it in autocorrect/data, and if it's not there, it tries to download from optional_languages on master. It's not merged to master yet so it will fail. |
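A minimal sketch of the check suggested above, assuming a local checkout in which `autocorrect/data` and `optional_languages` both sit under one repository root. The helper `locate_archive` is hypothetical and not part of the library; it only reports which of the two directories contain the archive.

```python
from pathlib import Path

# Directory names are taken from the comment above; the checkout layout is an assumption.
SEARCH_DIRS = ("autocorrect/data", "optional_languages")


def locate_archive(repo_root, lang="es"):
    """Return the directories (relative to repo_root) that contain <lang>.tar.gz."""
    root = Path(repo_root)
    name = f"{lang}.tar.gz"
    return [d for d in SEARCH_DIRS if (root / d / name).is_file()]


if __name__ == "__main__":
    found = locate_archive(".")
    missing = [d for d in SEARCH_DIRS if d not in found]
    print("found in:", found)
    print("missing from:", missing)
```

If `missing from:` lists either directory, the archive is not in both places.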
Before this PR, my first (local) attempt was to put es.tar.gz into the data folder in order to test
but the output was hloa instead of hola (the Spanish word for hello). The README.md file suggests using
for getting the Spanish Wikipedia words. I think that's incorrect because I'm adding the Spanish language. I changed it to
That's the only change I made in the process of adding the new language. |
Ah, OK. The issue is probably that the word 'hloa' exists in Wikipedia, so the Speller doesn't try to correct it. The way I fixed it for other languages was to cut out rarely used words. You can do it by calling, for example:
to use only words that appeared at least 4 times in Wikipedia. You'll have to find the right threshold value empirically. After that, you can manually delete all those rare words from the file in es.tar.gz (it's already sorted, so it should be easy). |
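The thresholding idea can be sketched in plain Python. This is an illustration only: it assumes the word counts are available as a word-to-count mapping, which the thread does not confirm, and `drop_rare_words` is a hypothetical helper rather than the library's own function. The counts are made up.

```python
def drop_rare_words(counts, threshold):
    """Keep only the words that were seen at least `threshold` times."""
    return {word: n for word, n in counts.items() if n >= threshold}


# Made-up counts for illustration only, not real Wikipedia data.
counts = {"hola": 5200, "hloa": 2, "adiós": 3100, "adios": 4}

kept = drop_rare_words(counts, threshold=4)
# 'hloa' falls below the threshold and is dropped, so a speller built on
# `kept` would no longer see it as a known word.
print(sorted(kept))
```

Raising the threshold removes more misspellings but also more legitimate rare words, which is why the value has to be tuned per language.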
With the new threshold... Original number of words: 12196114 I'm not really sure if it's a lot, but I tested some words and the Speller does not work properly with lower threshold values. |
For other languages I set it lower, like 4, but I think that Spanish has fewer variants of the same words, and the Spanish wiki is probably larger too. So as long as it passes the unit tests, it's fine. |
I noticed es.tar.gz isn't stored in LFS, and I'd like to avoid bloating repo size. It probably happened because you forked before I set it up. You should be able to migrate it to LFS by running:
And then force push. |
It turned out LFS has a 1GB limit; after that it's paid, and I've used up almost all of it. Also, there is no way to delete old, unnecessary files! :c I'll have to find some other way to store those tar.gz's. Storing them as regular files, without LFS, is even worse, because there is a 500MB limit. I'll probably just put them in Google Drive. If you know of a better way, let me know :) |
🤔 Google Drive or any other server you have (HTTP or FTP). Good luck with that 🤞 |
Hi, I can't download es.tar.gz anymore, so could you email it to me? I will add it to my Google Drive. |
I think you can download it from here:
https://github.com/pr3ssh/autocorrect/tree/master/optional_languages
|
I can't. When I follow the link, it only gives the LFS reference:
version https://git-lfs.github.com/spec/v1
oid sha256:cad1ce706de6f7f84e420ece653af8d0ade59774c9bab12cdb0350e8f3b1a32a
size 1757679
|
ACK. I'll send it to you ASAP. I had a problem with my laptop and yesterday I lost all my local data.
|
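The LFS reference quoted in this thread is a small text pointer, not the archive itself. A minimal sketch of how a download could be checked before use, assuming the file is already on disk; `classify_download` is a hypothetical helper, and the check relies only on the pointer's first line and the standard gzip magic bytes.

```python
# First line of a Git LFS pointer file, as quoted in the thread.
LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"
# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def classify_download(path):
    """Return 'lfs-pointer', 'gzip', or 'unknown' based on the first bytes of the file."""
    with open(path, "rb") as f:
        head = f.read(len(LFS_HEADER))
    if head.startswith(LFS_HEADER):
        return "lfs-pointer"
    if head.startswith(GZIP_MAGIC):
        return "gzip"
    return "unknown"
```

Calling `classify_download("es.tar.gz")` on a file saved from the repository page would return `"lfs-pointer"` if only the reference was downloaded.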
OK, no hurry. Sorry about the lost data. Did you lose this tar.gz too? |
Yes, the entire /home partition 😭
|
:< I know this pain, happened to me last month too |
I merged your changes in ec15a64 instead of merging this pull request, to avoid adding es.tar.gz to the repo. I added the es.tar.gz you sent me to Google Drive. Thank you for contributing :) |
@fsondej it was a pleasure ;) |
I added es.tar.gz as an optional language and also added unit test strings, but for some reason the Speller does not work properly.
To create es.tar.gz, I followed the steps in the README file.
Any idea what could be wrong?