Slow index with TF-IDF #476

pkiraly · 2021-03-11T16:11:28Z

At Göttingen we have a collection of roughly 10K scientific papers. Most of them have keywords. Based on these keywords we created a flat .ttl file (no hierarchy among terms) and built a TF-IDF model from the vocabulary and the content of the documents. Then we ran `annif index ...` on the same collection. For each file it took 5 minutes to create a .tsv file.
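
To illustrate what a TF-IDF model built from keyword-labelled documents does at indexing time, here is a minimal standard-library sketch. It is not Annif's implementation: the function names (`build_tfidf`, `suggest`), the tokenizer and the IDF smoothing are assumptions chosen for illustration. Each subject gets one TF-IDF vector built from the concatenated text of its training documents, and a new document is scored by cosine similarity against every subject vector.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase, keep alphanumeric runs only
    return re.findall(r"[a-z0-9]+", text.lower())


def normalize(vec):
    # Scale a sparse vector (dict term -> weight) to unit length
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else vec


def build_tfidf(subject_texts):
    """Build one unit-length TF-IDF vector per subject from its training text."""
    counts = {s: Counter(tokenize(t)) for s, t in subject_texts.items()}
    n = len(counts)
    df = Counter()
    for c in counts.values():
        df.update(c.keys())  # document frequency: number of subjects using a term
    # Smoothed IDF (an assumption) so no term gets a zero weight
    idf = {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}
    vectors = {s: normalize({t: k * idf[t] for t, k in c.items()})
               for s, c in counts.items()}
    return idf, vectors


def suggest(text, idf, vectors, limit=3):
    """Return up to `limit` (subject, score) pairs with a positive score."""
    counts = Counter(tokenize(text))
    # Terms never seen during training are ignored
    query = normalize({t: k * idf[t] for t, k in counts.items() if t in idf})
    scores = []
    for subject, vec in vectors.items():  # visits every subject for every document
        # For unit vectors, cosine similarity is just the dot product
        score = sum(w * vec.get(t, 0.0) for t, w in query.items())
        scores.append((score, subject))
    scores.sort(reverse=True)
    return [(s, round(sc, 3)) for sc, s in scores[:limit] if sc > 0]


if __name__ == "__main__":
    subjects = {
        "machine learning": "neural network training model learning data",
        "libraries": "library catalogue metadata books",
        "linguistics": "language grammar syntax words",
    }
    idf, vectors = build_tfidf(subjects)
    print(suggest("metadata quality in library catalogue records", idf, vectors))
    # prints [('libraries', 0.866)]
```

In this naive form every document is compared against every subject vector, so the per-document cost grows with the size of the keyword vocabulary; whether that is related to the slowdown reported here has not been established.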

I contacted @osma, who suggested trying Omikuji. It helped: the full index took 35 minutes instead of the predicted 35 days.
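
The predicted 35 days follows from the per-file time, assuming each of the roughly 10K files takes about 5 minutes with TF-IDF:

$$
10\,000 \times 5\ \text{min} = 50\,000\ \text{min} \approx 833\ \text{h} \approx 34.7\ \text{days},
\qquad
\frac{50\,000\ \text{min}}{35\ \text{min}} \approx 1\,400
$$

so the Omikuji run was roughly 1,400 times faster on the same collection.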

There must be some issue with the TF-IDF engine.

osma added the bug label Mar 11, 2021