(Module under active development) Fetches the parsed content of Wikipedia articles. Created to make collecting text corpus data fast and easy, but it can be freely used for other purposes too.

Affenmilchmann/lingwiki

lingwiki

This module lets you download the parsed content of Wikipedia articles.

Article return format

dict
{
    'title': 'Alan Turing',
    'paragraphs': ['Alan Mathison Turing OBE FRS (/ˈtjʊərɪŋ/; 23 June 1912 – 7 June 1954) was an English mathematician...', ... ],
    'images': ['image link1', 'image link2'...],
    'links': ['link1', 'link2'...],
}
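As an illustration of how a result in this format could be turned into corpus text, here is a minimal sketch. The sample dict is shortened by hand, and `article_to_text()` and `word_count()` are hypothetical helper names, not part of lingwiki.

```python
# Sample article in the format shown above (contents shortened for illustration).
article = {
    'title': 'Alan Turing',
    'paragraphs': [
        'Alan Mathison Turing was an English mathematician.',
        'He was highly influential in the development of theoretical computer science.',
    ],
    'images': ['image link1', 'image link2'],
    'links': ['link1', 'link2'],
}


def article_to_text(article):
    """Join the title and paragraphs into one plain-text document (hypothetical helper)."""
    return '\n\n'.join([article['title'], *article['paragraphs']])


def word_count(article):
    """Count whitespace-separated tokens across all paragraphs (hypothetical helper)."""
    return sum(len(paragraph.split()) for paragraph in article['paragraphs'])


print(article_to_text(article))
print(word_count(article))
```

Splitting on whitespace is only a rough token count; a real corpus pipeline would likely use a proper tokenizer.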

Done:

  • get_article() - get an article's content by its URL.
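To give a feel for what producing the return format involves, here is a minimal sketch that builds the same dict shape from an HTML string using only the standard library. It is not lingwiki's implementation: `ArticleParser` and `parse_article()` are hypothetical names, and the assumption that the title sits in `<h1>` and the body text in `<p>` tags is a simplification.

```python
from html.parser import HTMLParser


class ArticleParser(HTMLParser):
    """Collect title, paragraph text, image sources and link targets (hypothetical)."""

    def __init__(self):
        super().__init__()
        self.title = ''
        self.paragraphs = []
        self.images = []
        self.links = []
        self._in_title = False
        self._in_paragraph = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'h1':
            self._in_title = True
        elif tag == 'p':
            self._in_paragraph = True
            self._buffer = []
        elif tag == 'img' and attrs.get('src'):
            self.images.append(attrs['src'])
        elif tag == 'a' and attrs.get('href'):
            self.links.append(attrs['href'])

    def handle_endtag(self, tag):
        if tag == 'h1':
            self._in_title = False
        elif tag == 'p' and self._in_paragraph:
            self._in_paragraph = False
            text = ''.join(self._buffer).strip()
            if text:  # skip empty paragraphs
                self.paragraphs.append(text)

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._in_paragraph:
            self._buffer.append(data)


def parse_article(html):
    """Return a dict in the article format described in this README."""
    parser = ArticleParser()
    parser.feed(html)
    parser.close()
    return {
        'title': parser.title.strip(),
        'paragraphs': parser.paragraphs,
        'images': parser.images,
        'links': parser.links,
    }
```

Real Wikipedia pages contain navigation, infoboxes and reference markers that a production parser would need to filter out; this sketch ignores all of that.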

In progress

  • article_flow() - get the content of multiple articles by their URLs. Downloads run in threads. Accepts an iterable object or a text file as input.
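Since article_flow() is still in progress, the following is only a sketch of one way threaded downloading could work, using `concurrent.futures` from the standard library. `read_urls()` and `fetch_all()` are hypothetical names, the `fetch` argument stands for any single-URL fetcher such as get_article(), and treating a string input as a path to a file with one URL per line is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def read_urls(source):
    """Return a list of URLs from an iterable or from a text file path.

    Assumption: a str is a path to a file with one URL per line.
    """
    if isinstance(source, str):
        with open(source, encoding='utf-8') as url_file:
            return [line.strip() for line in url_file if line.strip()]
    return list(source)


def fetch_all(source, fetch, max_workers=8):
    """Call fetch(url) for every URL using a pool of worker threads."""
    urls = read_urls(source)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input URLs.
        return list(pool.map(fetch, urls))
```

Passing the fetcher in as an argument keeps the threading logic separate from the download code and makes it easy to try out with a stand-in function. Note that an exception raised by any single fetch propagates when the results are collected.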

ToDo

  • get_rand_article() - same as get_article(), but returns a random article. You can choose the article's language.
  • rand_article_flow() - You get the point.
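These functions are not written yet, so the following is only a sketch of one possible building block. Each Wikipedia language edition has a Special:Random page that redirects to a random article, so a random-article URL can be built from a language code. `random_article_url()` is a hypothetical name, and the language-code check is a deliberately loose assumption.

```python
def random_article_url(lang='en'):
    """Build the Special:Random URL for a Wikipedia language edition (hypothetical helper)."""
    # Loose sanity check: letters, digits and hyphens only (e.g. 'en', 'de', 'zh-yue').
    if not lang.replace('-', '').isalnum():
        raise ValueError(f'unexpected language code: {lang!r}')
    return f'https://{lang}.wikipedia.org/wiki/Special:Random'


print(random_article_url('de'))
```

Whatever fetches this URL has to follow the redirect to reach the actual article.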
