This repository has been archived by the owner on May 29, 2020. It is now read-only.

Adds support for loading MASC annotations into a Slab (issue #19) #22

Merged (6 commits) on Jul 28, 2013

@bethard bethard commented Jul 26, 2013

Here's a first draft of the code for loading MASC annotations into Slabs. I called the object MascSlab and put it into MascUtil, but I don't feel strongly at all about either the name or the location. The various functions for loading MASC annotations are named to match the files they load from (e.g. "s", "seg", "penn", "ne"). My first intuition was to name them by the annotations they produce ("sentence", "partOfSpeech", etc.), but (1) there is more than one way of getting each type of annotation and (2) some files should probably produce more than one annotation at the same time (e.g. the "penn" file contains both part-of-speech tags and lemmas).
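To make the naming trade-off concrete, here is a rough sketch of a loader named after its source file that returns two annotation types from one pass. The types `Span`, `PartOfSpeech`, `Lemma`, and `PennRow`, and the `penn` signature, are all hypothetical stand-ins for illustration; they are not the definitions in MascSlab or MascUtil.

```scala
// Hypothetical, simplified annotation types (not the real Slab annotations).
case class Span(begin: Int, end: Int)
case class PartOfSpeech(span: Span, tag: String)
case class Lemma(span: Span, value: String)

// Hypothetical row parsed from a "penn" file: token offsets, POS tag, base form.
case class PennRow(begin: Int, end: Int, tag: String, base: String)

object FileNamedLoaders {
  // Named after the "penn" file rather than after a single annotation type,
  // because one pass over the file yields both POS tags and lemmas.
  def penn(rows: Seq[PennRow]): (Seq[PartOfSpeech], Seq[Lemma]) = {
    val pos = rows.map(r => PartOfSpeech(Span(r.begin, r.end), r.tag))
    val lemmas = rows.map(r => Lemma(Span(r.begin, r.end), r.base))
    (pos, lemmas)
  }
}
```

A name like "partOfSpeech" would hide the fact that the same call also produces lemmas, which is the motivation for naming by file here.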

I was able to reuse some of the code in MascUtil, thanks!

(I did have to write something a bit different for named entities though, and you may want to review your named entity loading code. It looks like it's only getting one token for each named entity. Take a look at the use of groupBy in MascSlab.ne to see what I mean.)
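As a minimal sketch of the groupBy idea, assuming each token reference carries the id of the entity it belongs to: grouping by that id collects every token of a multi-token entity, so the entity span can cover all of them instead of just one. `TokenRef`, `EntitySpan`, and `entitySpans` are hypothetical names for illustration, not the actual code in MascSlab.ne.

```scala
// Hypothetical, simplified types; not the actual MascUtil/MascSlab definitions.
case class TokenRef(entityId: String, begin: Int, end: Int)
case class EntitySpan(entityId: String, begin: Int, end: Int)

object NeGrouping {
  // Collapse all tokens that share an entity id into one span covering them,
  // then order the resulting spans by their start offset.
  def entitySpans(tokens: Seq[TokenRef]): Seq[EntitySpan] =
    tokens
      .groupBy(_.entityId)
      .map { case (id, toks) =>
        EntitySpan(id, toks.map(_.begin).min, toks.map(_.end).max)
      }
      .toSeq
      .sortBy(_.begin)
}
```

For example, two tokens tagged with the same id (say offsets 0-3 and 4-9) would come back as a single span from 0 to 9, rather than as an entity containing only the first token.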

I added a single MASC-annotated file to src/test/resources for testing. I assume there aren't any issues with that, but thought I should mention it just in case.

jasonbaldridge added a commit that referenced this pull request Jul 28, 2013
Adds support for loading MASC annotations into a Slab (issue #19)
@jasonbaldridge jasonbaldridge merged commit f073dd1 into scalanlp:master Jul 28, 2013
@jasonbaldridge
Member

Great, thanks! I'm glad some of the previous code was useful, and the approach to naming is a fine start.

I'm happy to ditch the old NE loading code. I only ever got that to the point of knowing I could train and eval a model -- basically, I had just gotten things working with MASC and then other duties took priority!

It's fine to add the MASC file to the tests. I'll just add a note in the NOTICE file mentioning MASC and the use of the file to be extra good.
