Stars
EntiPy is a Python library that implements an incremental clustering approach to entity resolution.
This project implements a Named Entity Recognition (NER) system to identify and classify entities in text, such as PERSON, ORGANIZATION, and LOCATION. Utilizing machine learning and NLP techniques,…
Python library to perform entity linking over tabular data
Python package for deduplication/entity resolution using active learning
Repository hosting the common code for the entity-fishing clients
Retrieve, Read and LinK: Fast and Accurate Entity Linking and Relation Extraction on an Academic Budget (ACL 2024)
String-to-String Algorithms for Natural Language Processing
Framework and command-line tools for integrating FollowTheMoney data streams from multiple sources
Make it easier to compare and cross-reference the names of companies and people by applying strong normalisation.
Fast dictionary-based approach for semantic annotation / entity linking
Convert Unicode strings to nearest US ASCII equivalent by dropping accents, like manual entries into an old ASCII name database would.
Text Normalization & Inverse Text Normalization
Parse SEC EDGAR HTML documents into a tree of elements that correspond to the visual (semantic) structure of the document.
The only open-source toolkit that can download EDGAR financial reports and extract textual data from specific item sections into nice and clean JSON files.
Download the SEC filings index from EDGAR since 1993
📈 Download filings from the SEC EDGAR database using Python
Download all companies' periodic reports, filings, and forms from the EDGAR database.
Merging data from UK Companies House RDF databases and Wikidata using OWL2 and Python
Company name matching algorithms, including edit distance and token matching.
Get the company name and current year of the BRSR report from its XBRL file
The RecordLinker is a service that links records from two datasets based on a set of common attributes. The service is designed to be used in a variety of public health contexts, such as linking pa…