Nicolay Rusnachenko edited this page Feb 22, 2023 · 5 revisions

AREkit (Attitude and Relation Extraction Toolkit) is a Python toolkit devoted to document-level Attitude and Relation Extraction between text objects in mass-media news.

This toolkit aims to solve data preparation problems in Relation Extraction related tasks, considering such factors as:

  • 🔗 EL (entity-linking) API support for objects,
  • ➰ avoidance of cyclic connections,
  • 📏 distance consideration between relation participants (in terms of sentences),
  • 📑 relations annotations and filtering rules,
  • *️⃣ entity formatting or masking, and more.
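
To make some of the factors above concrete, here is a minimal Python sketch of how candidate relation pairs might be filtered by sentence distance and then masked. It is not AREkit's API: `Mention`, `candidate_pairs`, and `mask_pair` are hypothetical names, and we assume that "avoidance of cyclic connections" at least means never pairing two mentions that entity linking resolves to the same object.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import Iterable, Iterator, Tuple


@dataclass(frozen=True)
class Mention:
    """Hypothetical entity mention: surface text, linked entity ID, sentence index."""
    text: str
    entity_id: str
    sent_idx: int


def candidate_pairs(mentions: Iterable[Mention],
                    max_sent_dist: int = 1) -> Iterator[Tuple[Mention, Mention]]:
    """Yield ordered (source, target) pairs that pass two illustrative filters."""
    for src, tgt in permutations(list(mentions), 2):
        # Assumed interpretation of "cyclic connections": never relate an
        # object to itself, even via a synonym resolved by entity linking.
        if src.entity_id == tgt.entity_id:
            continue
        # Distance filter: participants may be at most `max_sent_dist` sentences apart.
        if abs(src.sent_idx - tgt.sent_idx) > max_sent_dist:
            continue
        yield src, tgt


def mask_pair(sentence: str, src: Mention, tgt: Mention) -> str:
    """Replace the two participants with placeholder tokens (hypothetical format)."""
    return sentence.replace(src.text, "[SUBJ]").replace(tgt.text, "[OBJ]")


mentions = [
    Mention("Russia", "RU", 0),
    Mention("USA", "US", 0),
    Mention("Moscow", "RU", 1),   # linked to the same object as "Russia"
    Mention("France", "FR", 3),   # too far from the others
]
pairs = [(s.text, t.text) for s, t in candidate_pairs(mentions)]
print(pairs)
# [('Russia', 'USA'), ('USA', 'Russia'), ('USA', 'Moscow'), ('Moscow', 'USA')]
print(mask_pair("Russia criticized the USA.", mentions[0], mentions[1]))
# [SUBJ] criticized the [OBJ].
```

Pairs are kept in both directions here because an attitude of A towards B need not match that of B towards A. The `str.replace` masking is deliberately naive: it replaces every occurrence and ignores character offsets, so a real pipeline would rather mask by annotated spans.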

With AREkit, you can focus on preparing and experimenting with your ML models by shifting the entire data-preparation stage onto this toolset, which prepares data for neural networks, language models, and ChatGPT.

In order to do so, we provide:

  • 📁 an API for binding external collections (with native support for annotations exported from BRAT);
  • pipelines and iterators for serializing large-scale collections without running out of memory;
  • evaluators that allow you to assess your trained models.
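
The idea behind serializing large collections without out-of-memory issues can be illustrated with plain Python generators. This is an illustration under our own assumptions rather than AREkit's pipeline: `lazy_documents`, `iter_rows`, and `serialize` are hypothetical helpers, and each document is assumed to hold one sentence per line. Each stage pulls the next item on demand, and the CSV writer writes every row to the output stream as it arrives, so only one row is held in memory at a time.

```python
import csv
import io
from typing import Iterable, Iterator, TextIO, Tuple


def lazy_documents(n: int) -> Iterator[str]:
    """Hypothetical source: yields documents one by one instead of loading them all."""
    for i in range(n):
        yield f"Doc {i} sentence A.\nDoc {i} sentence B."


def iter_rows(docs: Iterable[str]) -> Iterator[Tuple[int, int, str]]:
    """Turn each document into (doc_id, sent_id, text) rows, one sentence per line."""
    for doc_id, doc in enumerate(docs):
        for sent_id, sent in enumerate(doc.splitlines()):
            yield doc_id, sent_id, sent


def serialize(rows: Iterable[Tuple[int, int, str]], out: TextIO) -> int:
    """Write rows to CSV as they arrive; returns the number of data rows written."""
    writer = csv.writer(out)
    writer.writerow(["doc_id", "sent_id", "text"])
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count


buf = io.StringIO()  # stands in for an open file on disk
written = serialize(iter_rows(lazy_documents(3)), buf)
print(written)  # 6
print(buf.getvalue().splitlines()[:2])
# ['doc_id,sent_id,text', '0,0,Doc 0 sentence A.']
```

When writing to a real file instead of `io.StringIO`, open it with `newline=""`, as the `csv` module documentation recommends.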

AREkit is very close to SeqIO, the open-source framework proposed by Google for data preprocessing and evaluation of sequence models. While SeqIO is dedicated to converting and preprocessing datasets of any type, this project focuses on building pipelines from raw or pre-annotated (BRAT-based) texts, including solutions to the problems mentioned above.

The core functionality includes: (1) an API for document representation with EL (Entity Linking, i.e. object synonymy) support, used to prepare sentence-level relations (dubbed contexts); (2) an API for context extraction; (3) transfer of relations from the sentence level onto the document level; and more.
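
As an illustration of transferring relations from the sentence level onto the document level, one simple strategy is a majority vote over all contexts that involve the same pair of linked entities. This is a hypothetical sketch (the function `to_document_level` and the labels are made up), not necessarily how AREkit performs the transfer. Keys are entity IDs produced by entity linking, so different mentions of the same object (for example "Russia" and "Moscow") vote for one document-level pair.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Iterable, Tuple

Pair = Tuple[Hashable, Hashable]


def to_document_level(sentence_labels: Iterable[Tuple[Hashable, Hashable, str]]) -> Dict[Pair, str]:
    """Aggregate (source_id, target_id, label) votes into one label per entity pair."""
    votes: Dict[Pair, Counter] = defaultdict(Counter)
    for src_id, tgt_id, label in sentence_labels:
        votes[(src_id, tgt_id)][label] += 1
    # most_common(1) picks the top label; on ties, the label seen first wins.
    return {pair: counter.most_common(1)[0][0] for pair, counter in votes.items()}


contexts = [
    ("RU", "US", "negative"),  # e.g. from "Russia criticized the USA."
    ("RU", "US", "negative"),  # e.g. from a sentence mentioning "Moscow", linked to RU
    ("RU", "US", "positive"),
    ("US", "RU", "positive"),
]
print(to_document_level(contexts))
# {('RU', 'US'): 'negative', ('US', 'RU'): 'positive'}
```

Majority voting is only one option; weighting votes by model confidence or by sentence distance would be natural alternatives.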
