This repository describes the RIGA team submission to MultiCoNER II.
- Create a new environment:
  ```
  python -m venv venv
  ```
- Install dependencies:
  ```
  pip install -r requirements.txt
  ```
- Now your environment is ready. The next step is to get the data from the MultiCoNER download page and put it in the `data` directory.
- Convert the data using the `parse_conll.py` script:
  ```
  python parse_conll.py --source_path {specify a path to dataset in CoNLL format}
  ```
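As a rough illustration of what the conversion involves (the actual `parse_conll.py` and its output format may differ), a minimal CoNLL-style reader could look like this:

```python
# Illustrative sketch only: reads CoNLL-style NER data, where each line
# holds a token and its tag, and blank lines separate sentences.
def read_conll(lines):
    """Yield (tokens, tags) pairs, one per sentence."""
    tokens, tags = [], []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):  # sentence boundary or comment
            if tokens:
                yield tokens, tags
                tokens, tags = [], []
            continue
        parts = line.split()
        tokens.append(parts[0])   # surface token in the first column
        tags.append(parts[-1])    # NER tag in the last column
    if tokens:                    # flush the final sentence
        yield tokens, tags

sample = [
    "Riga B-LOC",
    "is O",
    "nice O",
    "",
]
print(list(read_conll(sample)))
```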
- Start gathering context using the `get_context.py` script. You'll need to specify your own API key and the dataset split to use; the `TODO` comments in the file will help you find the right places. On this step each context is collected separately, for easier navigation and to avoid querying the same sentences multiple times in case of an error.
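The per-sentence caching idea behind that step can be sketched as follows. Here `fetch_context` is a hypothetical stand-in for whatever API call `get_context.py` actually makes (the real keys, endpoints, and file layout are what the `TODO` comments point at):

```python
import json
import os

def save_context(sentence_id, sentence, fetch_context, out_dir="contexts"):
    """Query context for one sentence and cache it in its own file.

    `fetch_context` is a placeholder for the external API call (which
    needs your API key). Because each sentence gets its own file, a
    crashed run can resume without re-querying sentences that already
    succeeded.
    """
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, f"{sentence_id}.json")
    if os.path.exists(path):  # already collected -> skip the API call
        return path
    context = fetch_context(sentence)
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"sentence": sentence, "context": context}, f)
    return path
```

On a re-run, any sentence whose file already exists is skipped, so only failed or missing queries are retried.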
- Merge all of the collected contexts into a single file using the `merge_context.py` script. You'll also need to change the dataset split in order to merge contexts for all of the train/dev/test datasets.
- The last step is NER model fine-tuning. You can run
  ```
  python train.py --help
  ```
  to get the full list of arguments. During the competition we mainly used either the `distilbert-base-uncased` (66M parameters) or the `xlm-roberta-large` (558M parameters) model.
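For the merge step above, combining the per-sentence context files amounts to a few lines; here is a sketch assuming JSON files named by sentence id (the real `merge_context.py` may use a different layout):

```python
import glob
import json
import os

def merge_contexts(context_dir, out_path):
    """Combine per-sentence context files into one JSON file keyed by
    sentence id (illustrative; the actual script's layout may differ)."""
    merged = {}
    for path in glob.glob(os.path.join(context_dir, "*.json")):
        sent_id = os.path.splitext(os.path.basename(path))[0]
        with open(path, encoding="utf-8") as f:
            merged[sent_id] = json.load(f)
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(merged, f, ensure_ascii=False)
    return merged
```

Running this once per dataset split would mirror the README's instruction to change the split when merging train/dev/test contexts.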