openideo-data-processing-pipeline

This folder holds the code I currently use to pre-process and post-process data, along with some code for data collection (e.g., downloading HTML files and extracting comments and genealogies). I'm still working out the structure of the pipeline, but the code here handles:

extracting genealogies from pairwise citation data
extracting comments from HTML files
tokenization of text
feature selection and input file preparation for semantic models
searching the feature space (using gensim) for LSA and LDA
similarity queries for semantic models
assorted R code for post-processing and exploring the data
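
As a rough illustration of the first item, here is a minimal sketch of how a genealogy might be pulled out of pairwise citation data. It is not the code in this repository. It assumes each pair is `(citing_doc, cited_doc)` and treats a document's genealogy as every source it builds on, directly or indirectly, with the number of citation steps to each one. The function name `build_genealogy` and the document IDs are hypothetical.

```python
from collections import defaultdict, deque


def build_genealogy(citation_pairs, root):
    """Return {source: depth} for every document `root` builds on.

    Assumption: each pair is (citing_doc, cited_doc). Depth 1 means a
    direct source, depth 2 a source of a source, and so on.
    """
    # Map each citing document to the documents it cites directly.
    sources = defaultdict(set)
    for citing, cited in citation_pairs:
        sources[citing].add(cited)

    # Breadth-first walk so each source is recorded at its shortest depth.
    depths = {}
    queue = deque([(root, 0)])
    while queue:
        doc, depth = queue.popleft()
        for parent in sorted(sources.get(doc, ())):
            # Skip sources already seen; this also guards against cycles.
            if parent != root and parent not in depths:
                depths[parent] = depth + 1
                queue.append((parent, depth + 1))
    return depths


# Hypothetical document IDs, for illustration only.
pairs = [
    ("concept-3", "concept-1"),
    ("concept-3", "inspiration-2"),
    ("concept-1", "inspiration-1"),
    ("inspiration-2", "inspiration-1"),
]
genealogy = build_genealogy(pairs, "concept-3")
print(genealogy)
```

A breadth-first walk is used here so that a source reachable by several paths is reported at its shortest distance from the root; a different definition of genealogy (for example, keeping every path) would need a different traversal.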

More to come...

Folders and files (90 commits):

analyses
data-collection
ipython-notebooks
model-post-processing
semantic-models
sim-queries-and-experiments
text-preprocessing
validation
README.md
masterdoclist.txt

joelchan/openideo-data-processing-pipeline
