The SQL/Ibis powered sklearn of record linkage

Python 14 3 Updated Oct 7, 2024

Minimalist Entity Linking

Jupyter Notebook 3 1 Updated Jun 19, 2024

EntiPy is a Python library that implements an incremental clustering approach to entity resolution.

Python 4 Updated May 21, 2024

This project implements a Named Entity Recognition (NER) system to identify and classify entities in text, such as PERSON, ORGANIZATION, and LOCATION. Utilizing machine learning and NLP techniques,…

Python 1 Updated Sep 28, 2024

Python library to perform entity linking over tabular data

Jupyter Notebook 1 Updated Oct 10, 2024

Python package for deduplication/entity resolution using active learning

Python 78 9 Updated Aug 24, 2024

Repository hosting the common code for the entity-fishing clients

Python 9 4 Updated May 21, 2024

Retrieve, Read and LinK: Fast and Accurate Entity Linking and Relation Extraction on an Academic Budget (ACL 2024)

Python 305 15 Updated Oct 1, 2024

Hierarchical record linkage at scale

Python 12 2 Updated Oct 10, 2024

String-to-String Algorithms for Natural Language Processing

Jupyter Notebook 533 27 Updated Jul 26, 2024

Record linkage - simple, flexible, efficient.

Python 2 Updated Mar 24, 2024

Framework and command-line tools for integrating FollowTheMoney data streams from multiple sources

Python 195 38 Updated Oct 5, 2024

Make it easier to compare and cross-reference the names of companies and people by applying strong normalisation.

Python 143 19 Updated Jan 25, 2024

Highly optimized search for similar multisets

Python 3 Updated Mar 6, 2024

Fast dictionary-based approach for semantic annotation / entity linking

Python 7 1 Updated Apr 30, 2024

Convert Unicode strings to nearest US ASCII equivalent by dropping accents, like manual entries into an old ASCII name database would.

Python 3 2 Updated Oct 8, 2024

Convert Unicode to the closest ASCII equivalent.

Python 8 1 Updated Sep 20, 2024

Text Normalization & Inverse Text Normalization

Python 456 67 Updated Sep 5, 2024

SEC Breach Notification

Python 3 Updated Aug 11, 2024

Parse SEC EDGAR HTML documents into a tree of elements that correspond to the visual (semantic) structure of the document.

Python 145 48 Updated Jul 13, 2024

The only open-source toolkit that can download EDGAR financial reports and extract textual data from specific item sections into nice and clean JSON files.

Python 284 79 Updated Oct 5, 2024

Download the SEC filings index from EDGAR since 1993

Python 333 80 Updated May 11, 2024

📈 Download filings from the SEC EDGAR database using Python

Python 487 136 Updated Jul 26, 2024

Download all companies' periodic reports, filings and forms from the EDGAR database.

Python 1,024 290 Updated Jul 20, 2024

Merging data from UK Companies House RDF databases and Wikidata using OWL2 and Python

Python 1 Updated Sep 3, 2024

Company name matching algos including edit distance and token matching.

Python 1 Updated Sep 21, 2024

Get the company name and current year of the BRSR report from its XBRL file

Python 1 Updated Sep 12, 2024

Link financial datasets using noisy company names.

Python 1 Updated Sep 7, 2024

Company Name Processor written in Python

Python 325 95 Updated May 15, 2024

The RecordLinker is a service that links records from two datasets based on a set of common attributes. The service is designed to be used in a variety of public health contexts, such as linking pa…

Python 2 Updated Oct 10, 2024