
Parses raw Twitter JSON from stdin using Python. I'm only extracting a few fields for quick processing in Pig. Still a lot of work to do. Currently, it extracts the id, timestamp, client program, author, and tweet text. I'll add more fields, such as geo, if requested. The filenames for the output and bad tweets are currently hardcoded for my testing.…

beatgeek/tweetParser


twitter parser
2010 18Data
license license license....
Accepts Twitter JSON from stdin and extracts the tweet id, username, timestamp, client used, and tweet text.
The output is kept lightweight for performance reasons and so it can be processed quickly in map/reduce environments
such as Apache Pig.
Big to-do: allow overriding the default filenames for the output file and the bad file.
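
The extraction described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: it assumes one JSON tweet per input line, the classic Twitter API field names (`id`, `user.screen_name`, `created_at`, `source`, `text`), and tab-separated output. The function names `extract_fields` and `process` are hypothetical. Output and bad-tweet destinations are passed in as file-like objects rather than hardcoded, which is one possible way to approach the to-do.

```python
import json


def extract_fields(raw_line):
    """Return (id, username, timestamp, client, text), or None if the line is unusable.

    Assumes the classic Twitter API tweet layout; field names are an assumption.
    """
    try:
        tweet = json.loads(raw_line)
        fields = (
            tweet["id"],
            tweet["user"]["screen_name"],
            tweet["created_at"],
            tweet["source"],
            tweet["text"],
        )
    except (ValueError, KeyError, TypeError):
        # Malformed JSON, a missing field, or a non-object payload.
        return None
    # Collapse tabs and newlines so each tweet stays on one tab-separated row.
    return tuple(" ".join(str(field).split()) for field in fields)


def process(lines, good_out, bad_out):
    """Write one tab-separated row per parsable tweet; copy bad lines to bad_out."""
    for raw_line in lines:
        if not raw_line.strip():
            continue  # skip blank lines (assumed to carry no tweet)
        fields = extract_fields(raw_line)
        if fields is None:
            bad_out.write(raw_line if raw_line.endswith("\n") else raw_line + "\n")
        else:
            good_out.write("\t".join(fields) + "\n")
```

Calling `process(sys.stdin, open(out_path, "w"), open(bad_path, "w"))` with paths taken from the command line would keep the stdin-driven workflow while avoiding hardcoded filenames; `out_path` and `bad_path` are placeholder names here.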

