Wiki-dumps-word-counter

Counts word occurrences in Hebrew Wikipedia (hewiki) dumps downloaded from https://dumps.wikimedia.org/hewiki/.
WikiExtractor extracts the text from the XML dump, and each article is then parsed with regular expressions to strip out any non-Hebrew characters.
Finally, the results are written to a CSV file.
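
The strip-count-write steps described above can be sketched roughly as follows. This is a minimal illustration, not the code in this repository: the names `NON_HEBREW`, `count_words` and `write_csv` are hypothetical, the CSV column names are made up, and it assumes "Hebrew characters" means the letters alef to tav (U+05D0 to U+05EA), with the article text already extracted by WikiExtractor.

```python
import csv
import re
from collections import Counter

# Assumption: only the letters alef..tav (U+05D0-U+05EA) count as Hebrew.
# Digits, Latin letters, punctuation and niqqud are treated as separators.
NON_HEBREW = re.compile(r"[^\u05D0-\u05EA]+")


def count_words(articles):
    """Count Hebrew words across an iterable of plain-text articles."""
    counts = Counter()
    for text in articles:
        # Replace each run of non-Hebrew characters with one space,
        # then split on whitespace to get the remaining words.
        counts.update(NON_HEBREW.sub(" ", text).split())
    return counts


def write_csv(counts, path):
    """Write (word, count) rows to a CSV file, most frequent first."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["word", "count"])  # hypothetical header names
        writer.writerows(counts.most_common())
```

For example, `write_csv(count_words(["שלום עולם, hello שלום!"]), "counts.csv")` would produce a file with one row per Hebrew word and no row for `hello`.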

Update

Added a Python version in the "python" branch for comparison/benchmarking purposes.

evyatarmeged/wiki-dumps-word-counter

Folders and files

Latest commit

History

Repository files navigation

Wiki-dumps-word-counter

Update

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages