
TextGrapher

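Text Content Grapher based on key information extraction using NLP methods. Given an input document, it extracts the key information, structures it, and organizes it into a graph, producing a graph-based view of the article's semantic information.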

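Project Introduction

Representing the content of an input text in a graph-based, structured, and concise way that best captures its semantics is a hard problem. This project is an attempt at it. The approach: take a document as input, extract its key information, structure that information, and organize it into a graph that displays the semantic content of the article.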

Usage

Getting Source Code

git clone https://github.com/QubitPi/TextGrapher.git
cd TextGrapher

Creating Virtual Environment

conda update conda
conda config --append channels conda-forge
conda create --name textgrapher --file requirements.txt
conda activate textgrapher

Installing pyltp

Entity extraction in TextGrapher is backed by pyltp, the Python wrapper for LTP. To install pyltp:

git clone https://github.com/HIT-SCIR/pyltp
cd pyltp
git submodule init
git submodule update
python setup.py install
cd ..

Downloading Model

pyltp uses a pre-trained model to turn text into the raw data for the graph. The model can be downloaded from https://ltp.ai/download.html . At the time of writing, 3.4.0/ltp_data_v3.4.0.zip works. Download it, put it under the TextGrapher directory, decompress it, and rename the decompressed directory to ltp_data, which is the default location TextGrapher reads the model from:

unzip ltp_data_v3.4.0.zip
mv ltp_data_v3.4.0 ltp_data
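
Before generating a graph, it can be worth checking that the model directory ended up where it is expected. The following is a minimal sketch and not part of TextGrapher: the helper `missing_model_files` is hypothetical, and the list of file names assumes the usual layout of the LTP 3.4.0 data package (`cws.model`, `pos.model`, `ner.model`, `parser.model`). Adjust the list if your download differs.

```python
from pathlib import Path

# Assumption: these are the model files in the LTP 3.4.0 data package that a
# segmentation / POS tagging / NER / dependency parsing pipeline would load.
EXPECTED_MODEL_FILES = ("cws.model", "pos.model", "ner.model", "parser.model")


def missing_model_files(model_dir="ltp_data", expected=EXPECTED_MODEL_FILES):
    """Return the expected model files that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files()
    if missing:
        print("Missing model files in ltp_data:", ", ".join(missing))
    else:
        print("All expected model files found.")
```

Run it from the TextGrapher directory. An empty result suggests the unzip and rename steps worked as intended.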

Generating Graph from Texts

python text_grapher.py
open graph_show.html
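
The `open` command is specific to macOS. A portable alternative can be sketched with Python's standard `webbrowser` module. This assumes `graph_show.html` is written to the current directory, as the commands above suggest; `graph_file_uri` and `show_graph` are hypothetical helper names, not part of TextGrapher.

```python
import webbrowser
from pathlib import Path


def graph_file_uri(path="graph_show.html"):
    """Turn a local path into an absolute file:// URI a browser can load."""
    return Path(path).resolve().as_uri()


def show_graph(path="graph_show.html"):
    """Open the generated graph page in the default browser."""
    return webbrowser.open(graph_file_uri(path))
```

Calling `show_graph()` after `python text_grapher.py` finishes should behave like `open graph_show.html` on systems without the `open` command.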

When Done

conda deactivate

Example Events

ZTE Incident

image

Wei Zexi Incident

image

Lei Yang Incident

image

Classmate Murder Incident

image

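Summary

1) Representing the content of an input text in a graph-based, structured, and concise way that best captures its semantics is a hard problem.
2) This project uses extraction methods including high-frequency words, keywords, named entity recognition, and subject-verb-object phrase recognition, and attempts to organize the three kinds of extracted information into a graph. This representation is an experiment.
3) Named entity recognition and key information extraction are limited by the performance of the underlying NLP tools, and the algorithms and approach still have many shortcomings.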
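
To make the idea in point 2 concrete, the following sketch shows one way extracted items could be turned into graph nodes and edges. It is an illustration under stated assumptions, not TextGrapher's actual code: the input is already tokenized, the subject-verb-object triples are made-up toy data supplied by hand (in the project they would come from pyltp), and the names `top_words` and `build_graph` are hypothetical.

```python
from collections import Counter


def top_words(tokens, k=3, min_len=2):
    """Return the k most frequent tokens with at least min_len characters."""
    counts = Counter(t for t in tokens if len(t) >= min_len)
    return [word for word, _ in counts.most_common(k)]


def build_graph(triples, frequent_words):
    """Build nodes and edges from SVO triples and high-frequency words.

    Each (subject, verb, object) triple becomes a labelled edge. Each frequent
    word becomes a node tagged "high_frequency", even if it has no edges.
    """
    nodes = {}
    edges = []
    for subj, verb, obj in triples:
        nodes.setdefault(subj, "entity")
        nodes.setdefault(obj, "entity")
        edges.append({"from": subj, "to": obj, "label": verb})
    for word in frequent_words:
        nodes[word] = "high_frequency"
    return nodes, edges


if __name__ == "__main__":
    # Toy input for illustration only.
    tokens = ["Acme", "robots", "Acme", "builds", "robots", "Acme", "a"]
    triples = [("Alice", "founded", "Acme"), ("Acme", "builds", "robots")]
    nodes, edges = build_graph(triples, top_words(tokens, k=2))
    print(nodes)
    print(edges)
```

Keeping nodes and edges as plain dictionaries and lists makes them easy to serialize to JSON for an HTML graph page, though how TextGrapher itself passes data to `graph_show.html` is not described here.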
