
ggml : implement a spellcheck model (xfspell, t5-spellchecker, etc) #233

Open
walking-octopus opened this issue Jun 6, 2023 · 11 comments
Labels: good first issue · help wanted · model

Comments

walking-octopus commented Jun 6, 2023

Apple recently announced a new transformer-based keyboard auto-correct and prediction feature.

xfspell seems to be an existing model that attempts this, so it's worth investigating whether it can be ported to GGML. If anyone knows of other models for predictive keyboards or auto-correct, please drop your suggestions here.

Perhaps this may even be a good test case for on-device QLoRA fine-tuning.

High-quality predictive keyboards and auto-correct in pure C++ would be useful for open-source mobile operating systems like Ubuntu Touch and for privacy-focused Android ROMs. Traditionally, such proposals were rejected because ML inference pulled in excessive dependencies.
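
For context, here is the baseline such a model would replace: classic autocorrect generates candidates within a small edit distance and ranks them by word frequency. A minimal Norvig-style sketch (the word list and frequencies below are invented for demonstration, not from any real keyboard):

```python
# Minimal edit-distance autocorrect sketch (Norvig-style).
# WORD_FREQ is a toy frequency table invented for this example.
from collections import Counter

WORD_FREQ = Counter({"the": 100, "that": 50, "cat": 30, "car": 25, "hat": 20})
LETTERS = "abcdefghijklmnopqrstuvwxyz"

def edits1(word):
    """All strings one edit (delete, transpose, replace, insert) away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in LETTERS]
    inserts = [l + c + r for l, r in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)

def correct(word):
    """Return the most frequent known word within one edit, else the input."""
    if word in WORD_FREQ:
        return word
    candidates = [w for w in edits1(word) if w in WORD_FREQ]
    return max(candidates, key=WORD_FREQ.get) if candidates else word
```

A transformer model improves on this by using sentence context when ranking corrections, which pure per-word edit-distance ranking cannot do.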

@ggerganov (Owner)

Great idea - we should do that!

@ggerganov added the help wanted and good first issue labels Jun 8, 2023
walking-octopus (Author) commented Jun 25, 2023

It seems there are other, less niche models for spelling correction, like t5-spellchecker and other BERT-based models. Since there's already been some work on T5, and there's bert.cpp (which does not yet support decoding), unless xfspell outperforms them in quality, ease of implementation, or resource usage, efforts could be directed at those two instead.
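
For context on what a T5-style port would need to implement: these models frame correction as text-to-text generation, where the misspelled sentence is the encoder input and the corrected sentence is decoded autoregressively as a whole. A minimal sketch of that input/output contract, using a hypothetical `toy_correct` lookup in place of real encoder-decoder inference (the fix table is invented for illustration):

```python
# Sketch of the text-to-text contract a T5-style spellchecker exposes:
# full sentence in, full corrected sentence out. `toy_correct` is a
# hypothetical stand-in; a real port would run the T5 encoder plus an
# autoregressive decoder (the part bert.cpp currently lacks).

FIXES = {  # invented demo vocabulary, not from any real model
    "celbrated": "celebrated",
    "decembr": "december",
    "evry": "every",
    "ear": "year",
}

def toy_correct(sentence: str) -> str:
    """Rewrite the whole sentence word by word, mimicking how a seq2seq
    model emits the complete corrected output rather than flagged edits."""
    return " ".join(FIXES.get(word, word) for word in sentence.split())
```

The key point is that the model regenerates every token of the output, which is why decoding support matters for a BERT-based approach.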

@ggerganov added the model label Jun 25, 2023
@ggerganov (Owner)

Ok, will add this to the roadmap to get some extra attention

@ggerganov changed the title from "xfspell transformer - Model port idea" to "ggml : add implement a spellcheck model (xfspell, t5-spellchecker, etc)" Jun 25, 2023
gessha commented Jul 7, 2023

I would like to give this a try.

@SolsticeProjekt

While trying to figure out how to convert a small PyTorch-based model to ggml, I found this thread.

I wanted to emphasize that small models (sub-1 GB) exist which provide great results for their specific tasks, without requiring multiple gigabytes of storage space and memory.

Thank you.

@ggerganov changed the title from "ggml : add implement a spellcheck model (xfspell, t5-spellchecker, etc)" to "ggml : implement a spellcheck model (xfspell, t5-spellchecker, etc)" Aug 21, 2023
@Ferruolo

I would like to finish this implementation. Do any of the people who have already attempted it have any recommendations?

lin72h commented Mar 27, 2024

@Ferruolo please go ahead!

Ferruolo commented Apr 2, 2024

Should changes go to llama.cpp or ggml?

@ggerganov (Owner)

Depends on the interface that will be exposed, but I suppose the ggml repo would be more suitable

fairydreaming commented Jun 27, 2024

I checked t5-base-spellchecker and it works with #8141:

./llama-cli -m /mnt/md0/models/t5-base-spellchecker.gguf -p 'christmas is celbrated on decembr 25 evry ear'

...
llama_output_reserve: reallocating output buffer from size 0.13 MiB to 2.13 MiB
ggml_gallocr_needs_realloc: graph has different number of nodes
ggml_gallocr_alloc_graph: reallocating buffers automatically
ggml_gallocr_needs_realloc: graph has different number of nodes
ggml_gallocr_alloc_graph: reallocating buffers automatically
 christmas is celebrated on december 25 every year [end of text]

llama_print_timings:        load time =      44.46 ms
llama_print_timings:      sample time =       1.22 ms /    11 runs   (    0.11 ms per token,  9001.64 tokens per second)
llama_print_timings: prompt eval time =      59.88 ms /    18 tokens (    3.33 ms per token,   300.58 tokens per second)
llama_print_timings:        eval time =     140.48 ms /    10 runs   (   14.05 ms per token,    71.18 tokens per second)
llama_print_timings:       total time =     255.65 ms /    28 tokens
Log end

@Green-Sky (Contributor)

Just posted it over at ggerganov/llama.cpp#8204, but there is now an example of deployed ggml spellchecking AND on-device finetuning!

Projects: Todo
Development: no branches or pull requests
8 participants