GitHub - lukas235/CS5012-p1-HMM: Part of Speech Tagger for NLP using Hidden Markov Models


results
.gitignore
CS5012_p1_POS.pdf
readme.txt
starter.py
tagger.py
tagset.py

Readme
----------------
To run the code from the command line, follow these steps:

1. Open a console window and navigate to the directory that contains the project's .py files.

2a. Type "python starter.py" in the console window (without quotation marks). This runs the script with the default settings, i.e. a training set of the first 50,000 sentences of the corpus, the full tagset, and the subsequent 500 sentences as the test set.

2b. To start the script with arguments, type "python starter.py a b c d e" into the console (without quotation marks), where a to e are the five parameters described below.

The 5 parameters:
a: selects the corpus: 0 for brown, 1 for alpino, 2 for cess_esp
b: size of the training set in sentences (e.g. 50000 for the first 50,000 sentences)
c: selects the tagset: 0 is the default (full) tagset; the other tagset settings, which were used for the tagset experiments on the brown corpus, can be found in the "tagset.py" file (e.g. 1 merges all the compound tags)
d: first sentence of the test set (e.g. 50000)
e: last sentence of the test set (e.g. 50500)
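
The five parameters above could be read and applied roughly as in the following sketch. It is not taken from starter.py: the names parse_args, split_sentences, CORPORA and DEFAULTS are hypothetical, the default corpus (brown) is an assumption, and the test range is assumed to be half-open (d included, e excluded) so that 50000 to 50500 yields 500 sentences.

```python
# Hypothetical sketch of the command-line interface described above.
# None of these names are taken from starter.py.
CORPORA = {0: "brown", 1: "alpino", 2: "cess_esp"}

# Assumed defaults for a, b, c, d, e (the default corpus is a guess).
DEFAULTS = (0, 50000, 0, 50000, 50500)


def parse_args(argv):
    """Turn the positional arguments a b c d e into a settings dict."""
    if not argv:
        values = DEFAULTS
    elif len(argv) == 5:
        values = tuple(int(x) for x in argv)
    else:
        raise ValueError("expected no arguments or exactly five: a b c d e")

    corpus_id, train_size, tagset_id, test_start, test_end = values
    if corpus_id not in CORPORA:
        raise ValueError("a must be 0 (brown), 1 (alpino) or 2 (cess_esp)")
    if train_size <= 0:
        raise ValueError("b must be a positive number of sentences")
    if not 0 <= test_start < test_end:
        raise ValueError("d must be non-negative and smaller than e")

    return {
        "corpus": CORPORA[corpus_id],
        "train_size": train_size,
        "tagset_id": tagset_id,
        "test_start": test_start,
        "test_end": test_end,
    }


def split_sentences(sentences, settings):
    """Slice a list of sentences into training and test portions."""
    train = sentences[:settings["train_size"]]
    # Assumption: half-open range, so d is included and e is excluded.
    test = sentences[settings["test_start"]:settings["test_end"]]
    return train, test


# Equivalent of "python starter.py 0 50000 1 50000 50500".
settings = parse_args(["0", "50000", "1", "50000", "50500"])
print(settings["corpus"], settings["tagset_id"])
```

With d equal to b, as in the examples above, the test sentences start directly after the training sentences, so the two sets do not overlap.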
