(Ongoing module in development) Fetches parsed content from Wikipedia articles. Created for getting text corpus data quickly and easily, but can be freely used for other purposes too
Updated Jan 3, 2023 · Python
Code and data for the paper 'Unsupervised Word Polysemy Quantification with Multiresolution Grids of Contextual Embeddings'
A desktop application that searches through a set of Wikipedia articles using Apache Lucene.
Some Faroese language statistics taken from a fo.wikipedia.org content dump
Builds Wikipedia corpora in I5 (a TEI-based format)
A search engine built on a 75 GB Wikipedia dump. Creates an index file and returns search results in real time
Clustering of Spanish Wikipedia articles.
A search engine built from a corpus of Wikipedia articles to provide efficient query results.
Create a wiki corpus using a wiki dump file for Natural Language Processing
An RNN model trained on a Wikipedia corpus
📚 A Kotlin project which extracts ngram counts from Wikipedia data dumps.
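The core task that project performs — counting n-grams in tokenized text — can be sketched in a few lines. This is a minimal illustration, not the Kotlin project's actual implementation; the `ngram_counts` helper name and the whitespace tokenization are assumptions for the example:

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count contiguous n-grams (as tuples) in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

# Naive whitespace tokenization, just for illustration
tokens = "the quick brown fox jumps over the lazy dog".split()
bigrams = ngram_counts(tokens, 2)
# 9 tokens yield 8 bigrams, e.g. ('the', 'quick') appears once
```

Real Wikipedia-scale counting would stream articles and merge partial counts rather than hold everything in one `Counter`.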
Wiki dump parser (Jupyter)
Convert Wikipedia XML dump files to JSON or Text files
Python script to split the text generated by 'wikipedia parallel title extractor' into separate text files (one file per language)
Interactive chatbot using Python :)
Command line tool to extract plain text from Wikipedia database dumps
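Several of the tools above parse Wikipedia XML dumps. A minimal sketch of the streaming approach such tools typically use — `xml.etree.ElementTree.iterparse`, clearing each `<page>` element after use so the full dump never has to fit in memory — might look like this. The `SAMPLE` data is synthetic and omits the XML namespace that real MediaWiki exports carry on every tag, which is why tag matching uses `endswith` and why child lookups would need adjusting for a real dump:

```python
import io
import xml.etree.ElementTree as ET

# Tiny synthetic sample mimicking the MediaWiki export schema (not a real dump;
# real exports declare a namespace such as http://www.mediawiki.org/xml/export-0.10/).
SAMPLE = b"""<mediawiki>
  <page>
    <title>Example</title>
    <revision><text>'''Example''' is a page.</text></revision>
  </page>
</mediawiki>"""

def iter_pages(fileobj):
    """Stream (title, wikitext) pairs from a MediaWiki-style XML export."""
    for _, elem in ET.iterparse(fileobj, events=("end",)):
        if elem.tag.endswith("page"):
            title = elem.findtext("title")
            text = elem.findtext("revision/text") or ""
            yield title, text
            elem.clear()  # free the parsed subtree before the next page

pages = list(iter_pages(io.BytesIO(SAMPLE)))
```

The yielded text is still raw wikitext; stripping markup down to plain text is a separate step (the job of tools like the one described above).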
Repository providing preprocessed Wikipedia and Simple Wikipedia datasets, along with Python scripts for preprocessing and dataset generation.
IR search engine for a Wikipedia app