Commit f5fb5ac

Extensive model and notebook updates

cgpotts committed Jul 3, 2020
1 parent b00a3d1 commit f5fb5ac
Showing 67 changed files with 10,667 additions and 5,041 deletions.
89 changes: 88 additions & 1 deletion README.md
@@ -1,8 +1,95 @@
# CS224u: Natural Language Understanding

Code for [the Stanford course](https://web.stanford.edu/class/cs224u/). The code is written to run under Python 3.7; [setup.ipynb](setup.ipynb) provides additional details.
Code for [the Stanford course](https://web.stanford.edu/class/cs224u/).

Fall 2020

# Instructors

* [Bill MacCartney](https://nlp.stanford.edu/~wcmac/)
* [Christopher Potts](https://web.stanford.edu/~cgpotts/)


# Core components


## `setup.ipynb`

Details on how to get set up to work with this code.


## `tutorial_*` notebooks

Introductions to Jupyter notebooks, scientific computing with NumPy and friends, and PyTorch.


## `torch_*.py` modules

A generic optimization class (`torch_model_base.py`) and subclasses for GloVe, Autoencoders, shallow neural classifiers, RNN classifiers, tree-structured networks, and grounded natural language generation.

`tutorial_pytorch_models.ipynb` shows how to use these modules as a general framework for creating original systems.
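
A hypothetical sketch of that pattern, on the assumption that subclasses supply `build_graph` and `build_dataset` (the notebook is the authoritative reference):

```python
import torch
import torch.nn as nn
import torch.utils.data

from torch_model_base import TorchModelBase


class SoftmaxClassifier(TorchModelBase):
    """Toy subclass: a single linear layer trained by the base class."""

    def __init__(self, input_dim, n_classes, **kwargs):
        self.input_dim = input_dim
        self.n_classes = n_classes
        super().__init__(**kwargs)

    def build_graph(self):
        # The nn.Module that the inherited optimization loop will train.
        return nn.Linear(self.input_dim, self.n_classes)

    def build_dataset(self, X, y=None):
        # Package raw arrays as a Dataset for the training loop.
        X = torch.FloatTensor(X)
        if y is None:
            return torch.utils.data.TensorDataset(X)
        return torch.utils.data.TensorDataset(X, torch.LongTensor(y))
```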


## `np_*.py` modules

Reference implementations for the `torch_*.py` models, designed to reveal more about how the optimization process works.
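
For instance, a single hand-written stochastic gradient step for softmax regression looks like this sketch (names and hyperparameters are illustrative, not the repo's API):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def sgd_step(W, x, y_onehot, lr=0.1):
    """One stochastic gradient step for softmax regression."""
    probs = softmax(x @ W)                 # forward pass
    grad = np.outer(x, probs - y_onehot)   # d(cross-entropy)/dW
    return W - lr * grad                   # parameter update
```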


## `vsm_*` and `hw_wordsim.ipynb`

A unit on vector space models of meaning, covering traditional methods like PMI and LSA as well as newer methods like Autoencoders and GloVe. `vsm.py` provides a lot of the core functionality, and `torch_glove.py` and `torch_autoencoder.py` are the learned models that we cover. `vsm_03_retrofitting.ipynb` is an extension that uses `retrofitting.py`.
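
For a flavor of the traditional methods, here is a sketch of PPMI reweighting for a word-by-context count matrix (`vsm.py` has the course's own implementation):

```python
import numpy as np

def ppmi(X):
    """Positive PMI for a word-by-context count matrix X."""
    total = X.sum()
    row = X.sum(axis=1, keepdims=True) / total   # P(word)
    col = X.sum(axis=0, keepdims=True) / total   # P(context)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log((X / total) / (row * col))
    pmi[~np.isfinite(pmi)] = 0.0                 # zero-count cells
    return np.maximum(pmi, 0.0)                  # keep positive values only
```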


## `sst_*` and `hw_sst.ipynb`

A unit on sentiment analysis with the [English Stanford Sentiment Treebank](https://nlp.stanford.edu/sentiment/treebank.html). The core code is `sst.py`, which includes a flexible experimental framework. All the PyTorch classifiers are put to use as well: `torch_shallow_neural_network.py`, `torch_rnn_classifier.py`, and `torch_tree_nn.py`.
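
The framework's basic shape is a feature function paired with a model-fitting function. A hypothetical pair of that shape (the exact `sst.experiment` interface is defined in `sst.py`):

```python
from collections import Counter
from sklearn.linear_model import LogisticRegression

def unigrams_phi(text):
    """Feature function: map an example to a bag-of-words count dict."""
    return Counter(text.lower().split())

def fit_maxent(X, y):
    """Model-fitting function: vectorized features in, fitted model out."""
    mod = LogisticRegression(max_iter=1000)
    mod.fit(X, y)
    return mod
```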


## `rel_ext*` and `hw_rel_ext.ipynb`

A unit on relation extraction with distant supervision.
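
Distant supervision means treating any sentence that mentions both entities of a knowledge-base triple as a (noisy) positive example for that triple's relation. A toy illustration, independent of the `rel_ext` code:

```python
# Hypothetical one-triple knowledge base: (subject, relation) -> object.
KB = {("Stanford", "located_in"): "California"}

def label_sentence(sentence, subj, obj):
    """Return candidate relation labels for an entity pair in a sentence."""
    if subj in sentence and obj in sentence:
        return [rel for (s, rel), o in KB.items() if s == subj and o == obj]
    return []

label_sentence("Stanford is in California.", "Stanford", "California")
# -> ['located_in']
```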


## `nli_*` and `hw_wordentail.ipynb`

A unit on Natural Language Inference. `nli.py` provides core interfaces to a variety of NLI datasets, and an experimental framework. All the PyTorch classifiers are again in heavy use: `torch_shallow_neural_network.py`, `torch_rnn_classifier.py`, and `torch_tree_nn.py`.
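
For the word-entailment homework, a standard baseline featurizes a word pair by combining the two words' vectors. A sketch, with concatenation as the (assumed) combination function:

```python
import numpy as np

def word_entail_featurize(pairs, lookup, combine=np.concatenate):
    """Map (word1, word2) pairs to vectors via a vector `lookup` dict."""
    return np.array([combine((lookup[w1], lookup[w2])) for w1, w2 in pairs])
```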


## `colors*`, `torch_color_describer.py`, and `hw_colors.ipynb`

A unit on grounded natural language generation, focused on generating context-dependent color descriptions using the [English Stanford Colors in Context dataset](https://cocolab.stanford.edu/datasets/colors.html).
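
The corpus-reading interface is defined in `colors.py` (its diff appears below). A usage sketch; the filename is a placeholder:

```python
from colors import ColorsCorpusReader

corpus = ColorsCorpusReader(
    "colors_corpus.csv",       # placeholder path to the corpus file
    word_count=None,           # keep utterances of all lengths
    normalize_colors=True)     # rescale the HLS values

for ex in corpus.read():       # one ColorsCorpusExample per game round
    print(ex.parse_turns())    # utterance split on the turn boundary
    break
```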


## `contextualreps.ipynb`

Using pretrained parameters from [Hugging Face](https://huggingface.co) and [AllenNLP](https://allennlp.org) for featurization and fine-tuning.
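
A minimal featurization sketch with the `transformers` library (the notebook is the authoritative reference; the model name is one common choice, not necessarily the notebook's):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("natural language understanding", return_tensors="pt")
with torch.no_grad():
    last_hidden = model(**inputs)[0]   # [batch, tokens, hidden]
features = last_hidden.mean(dim=1)     # mean-pool tokens into one vector
```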


## `evaluation_*.ipynb` and `projects.md`

Notebooks covering key experimental methods and practical considerations, and tips on writing up and presenting work in the field.


## `utils.py`

Miscellaneous core functions used throughout the code.


## `test/`

To run these tests, use

```py.test -vv test/*```

or, for just the tests in `test_shallow_neural_classifiers.py`,

```py.test -vv test/test_shallow_neural_classifiers.py```

If the above commands don't work, try

```python3 -m pytest -vv test/test_shallow_neural_classifiers.py```


## License

The materials in this repo are licensed under the [Apache 2.0 license](LICENSE) and a [Creative Commons Attribution-ShareAlike 4.0 International license](https://creativecommons.org/licenses/by-sa/4.0/).
27 changes: 18 additions & 9 deletions colors.py
@@ -5,25 +5,28 @@
import matplotlib.patches as mpatch

__author__ = "Christopher Potts"
__version__ = "CS224u, Stanford, Spring 2020"
__version__ = "CS224u, Stanford, Fall 2020"


TURN_BOUNDARY = " ### "


class ColorsCorpusReader:
"""Basic interface for the Stanford Colors in Context corpus:
"""
Basic interface for the Stanford Colors in Context corpus:
https://cocolab.stanford.edu/datasets/colors.html
Parameters
----------
src_filename : str
Full path to the corpus file.
word_count : int or None
If int, then only examples with `word_count` words in their
'contents' field are included (as estimated by the number of
whitespace tokens). If None, then all examples are returned.
normalize_colors : bool
The colors in the corpus are in HLS format with values
[0, 360], [0, 100], [0, 100]. If `normalize_colors=True`,
@@ -43,7 +46,8 @@ def __init__(self, src_filename, word_count=None, normalize_colors=True):
self.normalize_colors = normalize_colors

def read(self):
"""The main interface to the corpus.
"""
The main interface to the corpus.
As in the paper, turns taken in the same game and round are
grouped together into a single `ColorsCorpusExample` instance
@@ -72,7 +76,8 @@ def _word_count_filter(self, row):


class ColorsCorpusExample:
"""Interface to individual examples in the Stanford Colors in
"""
Interface to individual examples in the Stanford Colors in
Context corpus.
Parameters
@@ -81,6 +86,7 @@ class ColorsCorpusExample:
This contains all of the turns associated with a given game
and round. The assumption is that all of the key-value pairs
in these dicts are the same except for the 'contents' key.
normalize_colors : bool
The colors in the corpus are in HLS format with values
[0, 360], [0, 100], [0, 100]. If `normalize_colors=True`,
@@ -124,7 +130,8 @@ def __init__(self, rows, normalize_colors=True):
self.speaker_context = self._get_reps_in_order('speaker')

def parse_turns(self):
""""Turns the `contents` string into a list by splitting on
""""
Turns the `contents` string into a list by splitting on
`TURN_BOUNDARY`.
Returns
@@ -135,7 +142,8 @@ def parse_turns(self):
return self.contents.split(TURN_BOUNDARY)

def display(self, typ='model'):
"""Prints examples to the screen in an intuitive format: the
"""
Prints examples to the screen in an intuitive format: the
utterance text appears first, followed by the three color
patches, with the target identified by a black border in the
'speaker' and 'model' variants.
@@ -213,9 +221,10 @@ def _get_target_index(self, field):

@staticmethod
def _check_row_alignment(rows):
"""We expect all the dicts in `rows` to have the same
keys and values except for the keys associated with the
messages. This function tests this assumption holds.
"""
We expect all the dicts in `rows` to have the same keys and
values except for the keys associated with the messages. This
function tests that this assumption holds.
"""
keys = set(rows[0].keys())