nathanmwhite/hmong-medical-corpus

hmong-medical-corpus

A corpus of Hmong medical texts

This repository contains three main kinds of files:

  1. Code files for the corpus (to appear)
  2. Jupyter Notebook files that document how the corpus is implemented, primarily for use on the Hmong Medical Corpus Blog at https://hmcorpus.home.blog/
  3. Supporting files for the corpus.

Given this layout, potential users of the code are currently advised to use Subversion (SVN) to download individual folders rather than the whole repository. In the longer term, the folders will be reorganized into separate code and raw data repositories.
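
As a rough illustration of the single-folder download, the sketch below builds an `svn export` command for one top-level folder. It is not part of the repository: the helper name `svn_export_command`, the GitHub URL (inferred from the repository name), and the `trunk` path are all assumptions, and the command only works if the hosting service exposes an SVN-compatible endpoint with that layout.

```python
import shlex

# Assumption: the repository is hosted on GitHub under this owner/name.
REPO_URL = "https://github.com/nathanmwhite/hmong-medical-corpus"


def svn_export_command(folder, repo_url=REPO_URL, branch_path="trunk"):
    """Build (but do not run) an `svn export` command for one folder.

    `branch_path="trunk"` assumes the host maps the default branch to
    `trunk`; adjust it if the SVN endpoint uses a different layout.
    """
    folder = folder.strip("/")
    if not folder:
        raise ValueError("folder must name a top-level directory")
    # The final argument is the local directory the files are exported into.
    return ["svn", "export", f"{repo_url}/{branch_path}/{folder}", folder]


if __name__ == "__main__":
    # Print the command so it can be inspected before being run in a shell.
    print(shlex.join(svn_export_command("/medical_corpus_finalized")))
```

The function only returns the argument list, so the command can be checked first and then passed to `subprocess.run` once the endpoint is known to be available.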

Currently, the repository contains the following folders:

  • /blog -- supporting files for the Hmong Medical Corpus Blog, such as images
  • /corpus-docs -- processed docx files containing the POS-tagged text of the documents used in the Hmong Medical Corpus
  • /corpus_site -- to contain the code files for the corpus site
  • /data_processing -- to contain the scripts for processing the raw documents
  • /medical_corpus_finalized -- txt files containing the POS-tagged text for the Hmong Medical Corpus
  • /pos_tagger_interface -- code and supporting files for the Hmong Medical POS Tagger
  • /preprocessed_data -- zip file containing the raw text portion of the Hmong Medical Corpus
  • /presentations -- Jupyter Notebook files providing information on implementation
  • /question_categorization -- code and ancillary files related to categorizing questions in Hmong

The Hmong Medical Corpus project. Copyright © 2019-2021 Nathan M. White. All rights reserved.
