Stars
kylebgorman / citylex
Forked from CUNY-CL/citylex
An English lexical database from the Big 🍎, let's go Mets baby love da Mets
A library for data streaming and augmentation
This repository contains an extension of fairseq for pixel / visual representations for machine translation.
A toolkit to create, launch and monitor SLURM jobs over existing python scripts.
remote pbcopy over ssh
A tool for holistic analysis of language generation systems
Open information and community for machine translation
A PyTorch repo for data loading and utilities to be shared by the PyTorch domain libraries.
Code and data for the IWSLT 2022 shared task on Formality Control for SLT
Hackable and optimized Transformers building blocks, supporting a composable construction.
Learned string similarity for entity names using optimal transport.
A python package to run contextualized topic modeling. CTMs combine contextualized embeddings (e.g., BERT) with topic models to get coherent topics. Published at EACL and ACL 2021 (Bianchi et al.).
State-of-the-Art Text Embeddings
docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.
This repository contains the code and implementation details of the CascadeTabNet paper "CascadeTabNet: An approach for end to end table detection and structure recognition from image-based documents"
🔍 AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your da…
Models, data loaders and abstractions for language processing, powered by PyTorch
A data augmentations library for audio, image, text, and video.
OPUS-CAT is a collection of software which makes it possible to use OPUS-MT neural machine translation models in professional translation. OPUS-CAT includes a local offline MT engine and a collection of…
skweak: A software toolkit for weak supervision applied to NLP tasks
qurator-spk / ocrodeg
Forked from NVlabs/ocrodeg
document image degradation
A Unified Toolkit for Deep Learning Based Document Image Analysis
Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.