updated instructions for customizing configuration file
ashpjin committed Feb 2, 2013
1 parent 460daa8 commit 5c10e4d
Showing 2 changed files with 37 additions and 21 deletions.
34 changes: 20 additions & 14 deletions README
@@ -4,7 +4,7 @@ README for Termite, a topic model visualization tool.
INFORMATION
---------------
Termite is a visualization tool for inspecting the output of statistical
topic models, and based on the techniques described in the following publication:
topic models based on the techniques described in the following publication:

Termite: Visualization Techniques for Assessing Textual Topic Models
Jason Chuang, Christopher D. Manning, Jeffrey Heer
@@ -86,6 +86,7 @@ downloaded onto a new machine.

Libraries fetched include:
: mallet
: stmt
: closure-compiler.js
: d3.js
: html5slider.js
@@ -101,14 +102,16 @@ using a web browser, described in the next section.

Customize the configuration file with the following information. A sample configuration
file can be found in 'example.cfg'.
: path to text corpus
: output directory for holding files generated by Termite
: number of topics to train
: number of terms to seriate
: topic model (either mallet or stmt)
[Corpus] : path to text corpus
[TopicModel]: directory for holding topic model outputs
[TopicModel]: number of topics to train
[TopicModel]: topic model (either mallet or stmt)
[Termite] : number of terms to seriate
[Termite] : path to save Termite-internal working files

Process the text corpus, and build a topic model by running the execution script.
>> ./execute.py file.cfg
Execution time will vary depending on the size of the corpus.
>> ./execute.py <your_config_file.cfg>

The execution script calls in order:
1. tokenize.py Tokenize the text corpus
@@ -125,17 +128,20 @@ The execution script calls in order:
----------------------------
You are now ready to visualize the topic model outputs! Termite's output can be viewed in
a web browser. To view the files locally (on your own computer), you need to set up a local
web server. Alternatively, you may copy the output file to a web server to publish the results.
web server. Alternatively, you may copy the output folder to a web server to publish the results.

Termite outputs are stored in the 'public_html' subfolder within the output directory.

To set up a local webserver:
1. Change into output directory (specified in the configuration file)
>> cd <output_folder>/public_html
2. Start a local server using python
>> ./web.sh
4. Open http://localhost:8888 in a modern web browser (Chrome, Safari, Firefox, or Opera)
to view a visualization of the model outputs.
1. Change into output directory (specified in the configuration file)
>> cd <output_folder>/public_html
2. Start a local server using python
>> ./web.sh
3. Open http://localhost:8888 in a modern web browser (Chrome, Safari, Firefox, or Opera)
to view a visualization of the model outputs.
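
The contents of web.sh are not shown in this commit. As a rough illustration of what "start a local server using python" can look like, here is a minimal sketch using only the Python 3 standard library. The function name make_server is hypothetical, and the default port 8888 is taken from the URL in the steps above; this is not necessarily what web.sh does.

```python
import functools
import http.server


def make_server(directory, port=8888):
    """Return an HTTP server that serves static files from `directory`.

    Hypothetical helper for illustration only; Termite's web.sh may work
    differently. Requires Python 3.7+ for the `directory` argument.
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    # Bind to localhost only, so the files are not exposed to other machines.
    return http.server.HTTPServer(("localhost", port), handler)
```

Under these assumptions, calling make_server("<output_folder>/public_html").serve_forever() would serve the visualization at http://localhost:8888 until interrupted with Ctrl-C.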

To publish the results on a webserver:
1. Copy public_html directory to your remote server.

-----------------------------
TOPIC MODEL VISUALIZATION
24 changes: 17 additions & 7 deletions example.cfg
@@ -1,36 +1,46 @@
# One document per file, at the specified path

[Corpus]

# Currently only support one format: file
# In the future: file, folder, lucene

format = file
path = ../data/corpus/alice.txt
path = corpus/example-documents.txt

# -----------------------------------------------------------------------------

# Two topic models
[TopicModel]

# Two topic models
# Supported libraries: mallet, stmt
library = mallet
; library = stmt

# Path to save topic model outputs
path = ../data/alice-20/topic-model
path = output/example-project/topic-model

# Number of topics to train
num_topics = 20

# -----------------------------------------------------------------------------

[Termite]

# Currently only support one format: file
# In the future: file, database
format = file

# Path to save Termite-internal working files
path = ../data/alice-20
path = output/example-project

# Number of terms to seriate
number_of_seriated_terms = 400

# Miscellaneous program configurations
# -----------------------------------------------------------------------------

[Misc]

# Miscellaneous program configurations

;logging = 10 # Display all debug messages
;logging = 20 # Display info messages
;logging = 30 # Display only warnings
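
The section and key names above follow the INI conventions that Python's standard configparser module understands. The following is a minimal sketch of reading them, assuming the new-side values from this diff; the function load_settings and the dictionary keys it returns are hypothetical and are not Termite's actual loader.

```python
import configparser

# New-side values from the example.cfg diff above. The parsing code is an
# illustrative assumption, not the code that execute.py actually runs.
EXAMPLE_CFG = """
[Corpus]
format = file
path = corpus/example-documents.txt

[TopicModel]
library = mallet
; library = stmt
path = output/example-project/topic-model
num_topics = 20

[Termite]
format = file
path = output/example-project
number_of_seriated_terms = 400
"""


def load_settings(text):
    """Parse configuration text and return the main values as a dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    library = parser.get("TopicModel", "library")
    # The README lists exactly two supported topic model libraries.
    if library not in ("mallet", "stmt"):
        raise ValueError("library must be 'mallet' or 'stmt', got %r" % library)
    return {
        "corpus_path": parser.get("Corpus", "path"),
        "library": library,
        "topic_model_path": parser.get("TopicModel", "path"),
        "num_topics": parser.getint("TopicModel", "num_topics"),
        "termite_path": parser.get("Termite", "path"),
        "num_seriated_terms": parser.getint("Termite", "number_of_seriated_terms"),
    }


settings = load_settings(EXAMPLE_CFG)
print(settings["library"], settings["num_topics"])
```

Lines beginning with '#' or ';' are treated as comments by configparser's defaults, which is why the commented-out "; library = stmt" line is ignored here.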