
Easy Start

English | 简体中文

Requirements

python3

pip install -r requirements.txt

Download Code

git clone https://github.com/zjunlp/DeepKE.git
cd DeepKE/example/ner/standard

Install with Pip

- Create and activate a Python virtual environment.
- Install the dependencies: pip install -r requirements.txt.

Train and Predict

Dataset
- Download the dataset to this directory.
```
wget 120.27.214.45/Data/ner/standard/data.tar.gz
tar -xzvf data.tar.gz
```
- Three data formats are supported: json, docx, and txt. The dataset is stored in the data folder:
  - train.txt: Training set
  - valid.txt: Validation set
  - test.txt: Test set
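The txt splits are commonly laid out in a CoNLL-style format: one character (or token) and its BIO tag per line, separated by whitespace, with a blank line between sentences. The sketch below assumes that layout (check it against the downloaded files) and reads a split into parallel token and tag lists. `read_conll` is a hypothetical helper, not part of DeepKE.

```python
from pathlib import Path
from typing import List, Tuple


def read_conll(path: str) -> List[Tuple[List[str], List[str]]]:
    """Read a CoNLL-style file into (tokens, tags) pairs, one per sentence.

    Assumes each non-empty line is "<token> <tag>" and that sentences are
    separated by blank lines (an assumption about the data layout).
    """
    sentences = []
    tokens, tags = [], []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        parts = line.split()
        if not parts:
            # A blank line closes the current sentence.
            if tokens:
                sentences.append((tokens, tags))
                tokens, tags = [], []
            continue
        # First field is the token, last field is its tag.
        tokens.append(parts[0])
        tags.append(parts[-1])
    if tokens:  # the file may not end with a blank line
        sentences.append((tokens, tags))
    return sentences
```

For example, `read_conll("data/train.txt")[0]` would return the tokens and tags of the first training sentence, if the file follows the assumed layout.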
Training
- Training parameters are in the conf folder and can be modified before training. Multi-GPU training is supported: set the parameter 'use_multi_gpu' in train.yaml to true, and set os.environ['CUDA_VISIBLE_DEVICES'] in 'run.py' to the selected GPUs. The first card is the main card for computation and requires slightly more memory.
- When using BERT, a batch size of at least 64 is recommended.
- Training logs are saved in the log folder, and the trained model is saved in the checkpoints folder.
```
# BERT
python run_bert.py
# or BiLSTM + CRF
python run_lstmcrf.py
```
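The multi-GPU setting relies on the standard CUDA_VISIBLE_DEVICES environment variable. The sketch below shows the kind of assignment that goes near the top of the training script; `select_gpus` is a hypothetical helper. The assignment must happen before PyTorch initializes CUDA, and the first id in the list becomes device 0 inside the process, which is the main card.

```python
import os
from typing import Iterable


def select_gpus(gpu_ids: Iterable[int]) -> str:
    """Restrict the process to the given GPUs (hypothetical helper).

    Call this before torch touches CUDA. Inside the process the listed
    GPUs are renumbered from 0, so the first id becomes cuda:0, the main
    card that needs a little more memory.
    """
    value = ",".join(str(i) for i in gpu_ids)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Example: train on physical GPUs 0 and 1, with GPU 0 as the main card.
select_gpus([0, 1])
```

The environment variable only controls which cards are visible; 'use_multi_gpu' in train.yaml still has to be set to true.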
- W2NER, a state-of-the-art NER model that handles the three major types of NER: flat, overlapped (a.k.a. nested), and discontinuous.
```
cd w2ner  
python run.py
```
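To make the three NER types concrete: a flat entity is a contiguous span that overlaps no other entity, overlapped (nested) entities share tokens, and a discontinuous entity covers tokens that are not adjacent. The hypothetical helper below represents each entity as a list of token indices and reports which of these properties a sentence's entities have. It only illustrates the definitions and is not part of W2NER.

```python
from itertools import combinations
from typing import Dict, List


def ner_types(entities: List[List[int]]) -> Dict[str, bool]:
    """Report whether a sentence's entities are overlapped or discontinuous.

    Each entity is a non-empty list of token indices (an illustrative
    representation). If neither flag is set, all entities are flat.
    """
    # Discontinuous: the sorted indices are not one unbroken range.
    discontinuous = any(
        sorted(e) != list(range(min(e), max(e) + 1)) for e in entities
    )
    # Overlapped: some pair of entities shares at least one token.
    overlapped = any(set(a) & set(b) for a, b in combinations(entities, 2))
    return {"overlapped": overlapped, "discontinuous": discontinuous}
```

For tokens "aching in legs and shoulders", the entities "aching in legs" ([0, 1, 2]) and "aching in shoulders" ([0, 1, 4]) are both overlapped and discontinuous.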
Prediction

Chinese datasets are supported by default. To use an English dataset, install 'nltk' and download its Punkt tokenizer models by running nltk.download('punkt'). Before prediction, also set 'lan' in config.yaml to en.
```
python predict.py
```
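The language switch essentially selects how raw input is split into tokens. A minimal sketch, assuming one token per non-space character for Chinese (matching a character-per-line dataset) and NLTK's `word_tokenize` for English. The function name `tokenize` and its `lan` argument mirror the config key but are illustrative, not DeepKE's actual predict.py code.

```python
from typing import List


def tokenize(text: str, lan: str) -> List[str]:
    """Split raw input into the units the NER model labels (illustrative).

    lan == "en": word-level tokens via NLTK (needs nltk and the punkt data).
    Any other value: one token per non-space character (assumed Chinese).
    """
    if lan == "en":
        # Imported lazily so Chinese-only setups do not need nltk installed.
        from nltk.tokenize import word_tokenize

        return word_tokenize(text)
    return [ch for ch in text if not ch.isspace()]
```

Dropping whitespace in the Chinese branch is part of the assumption; keep it only if the training data contains no space tokens.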

Model

BiLSTM + CRF

BERT

W2NER

Prepare weakly supervised data

If you have only text data and corresponding dictionaries but no canonical training data, you can obtain weakly supervised training data in the required format through automated labeling.

Please make sure that you have:

- high-quality dictionaries
- enough text data

prepare-data
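One simple automated labeling method is dictionary matching. The sketch below uses forward maximum matching: at each position it takes the longest dictionary entry that matches and tags its characters in BIO format. It illustrates that technique under stated assumptions and is not necessarily what the prepare-data scripts do; the dictionary format ({entity string: type}) and the name `dict_label` are hypothetical.

```python
from typing import Dict, List


def dict_label(text: str, dictionary: Dict[str, str]) -> List[str]:
    """Tag each character of text with BIO labels from dictionary matches.

    Forward maximum matching: at each position, the longest dictionary
    entry starting there wins; unmatched characters get "O".
    """
    max_len = max((len(word) for word in dictionary), default=0)
    tags = ["O"] * len(text)
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a match is found.
        for length in range(min(max_len, len(text) - i), 0, -1):
            entity_type = dictionary.get(text[i:i + length])
            if entity_type is not None:
                tags[i] = f"B-{entity_type}"
                for j in range(i + 1, i + length):
                    tags[j] = f"I-{entity_type}"
                i += length
                break
        else:
            i += 1  # no dictionary entry starts at this position
    return tags
```

Assuming the character-per-line training layout, writing each character and its tag on one line (with a blank line between sentences) turns the output into a training file. Dictionary quality directly bounds label quality here, which is why high-quality dictionaries matter.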
