vphill

Organizations

@end-of-term

Fast, permanent and flexible patterns for sharing and computing on texts with metadata using Apache Arrow.

Python 14 Updated Mar 1, 2022

Simple command line oai-pmh harvester written in Python.

Python 41 18 Updated Sep 25, 2022
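An OAI-PMH harvest is a loop of `ListRecords` requests that follows `resumptionToken` values until the repository returns an empty or missing token. The helpers below are a minimal sketch of the two protocol-specific pieces, not code from the harvester: they build the request URL and pull the next token out of a response, using only the standard library. The base URL is a placeholder, the HTTP fetch loop is left out, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlencode

# Namespace used by OAI-PMH 2.0 response documents.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url: str, token: Optional[str] = None,
                     prefix: str = "oai_dc") -> str:
    """Build a ListRecords request URL (hypothetical helper)."""
    params = {"verb": "ListRecords"}
    if token:
        # resumptionToken is an exclusive argument: it replaces metadataPrefix.
        params["resumptionToken"] = token
    else:
        params["metadataPrefix"] = prefix
    return base_url + "?" + urlencode(params)


def next_token(response_xml: str) -> Optional[str]:
    """Return the resumptionToken from a response, or None when the list is complete."""
    root = ET.fromstring(response_xml)
    elem = root.find(".//oai:resumptionToken", OAI_NS)
    # An absent or empty token signals the last page of results.
    if elem is None or not (elem.text or "").strip():
        return None
    return elem.text.strip()


print(list_records_url("https://example.org/oai"))
# https://example.org/oai?verb=ListRecords&metadataPrefix=oai_dc
```

A real harvester would call `list_records_url`, fetch and store each page, and repeat with `next_token(...)` until it returns `None`.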

ScanTailor Advanced is the version that merges the features of the ScanTailor Featured and ScanTailor Enhanced versions, adds new features, and includes fixes.

C++ 1,171 128 Updated Sep 13, 2023

HAdoop-based Web Archive Record Processing

Arc 7 5 Updated Aug 29, 2016

A classifier for detecting soft 404 pages

Jupyter Notebook 15 3 Updated Sep 10, 2022

Soft 404 (dead page) detector in Python

Python 13 6 Updated Oct 1, 2018

Python library for reading and writing WARC files, and processing their contents through Apache Tika

Python 3 Updated Mar 9, 2016

Stand-alone coarse geocoder

JavaScript 314 35 Updated Aug 12, 2024

The ultimate Python library for building OAuth and OpenID Connect clients and servers. JWS, JWE, JWK, JWA, and JWT included.

Python 4,531 452 Updated Sep 4, 2024

Python port of Mikolov's word2phrase.c from the word2vec toolkit

Python 112 20 Updated Apr 1, 2020
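The phrase-joining idea behind word2phrase can be sketched in a few lines of Python. This is a minimal illustration, not code from the port: it assumes pre-tokenized sentences and uses the bigram score as we understand it from word2phrase.c, (count(a b) - min_count) / count(a) / count(b) * total_words, merging a pair into `a_b` when the score exceeds `threshold`. The function name `learn_phrases` and its defaults are illustrative assumptions.

```python
from collections import Counter
from typing import List


def learn_phrases(sentences: List[List[str]], min_count: int = 5,
                  threshold: float = 100.0) -> List[List[str]]:
    """Join adjacent word pairs whose word2phrase-style score exceeds threshold."""
    unigrams: Counter = Counter()
    bigrams: Counter = Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    train_words = sum(unigrams.values())

    def score(a: str, b: str) -> float:
        pa, pb, pab = unigrams[a], unigrams[b], bigrams[(a, b)]
        # Rare words never take part in a phrase.
        if pa < min_count or pb < min_count:
            return 0.0
        return (pab - min_count) / pa / pb * train_words

    result = []
    for tokens in sentences:
        out, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and score(tokens[i], tokens[i + 1]) > threshold:
                out.append(tokens[i] + "_" + tokens[i + 1])  # merge the pair
                i += 2
            else:
                out.append(tokens[i])
                i += 1
        result.append(out)
    return result


corpus = [["new", "york"]] * 10 + [["red", "car"], ["car", "red"]] * 3
print(learn_phrases(corpus, threshold=1.0)[0])  # ['new_york']
```

Running the function again on its own output lets already-merged tokens such as `new_york` pair with a following word, so repeated passes can build longer phrases.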

Book cover image cache

JavaScript 29 12 Updated Aug 18, 2024

Fixes mojibake and other glitches in Unicode text, after the fact.

Python 3,790 120 Updated Oct 11, 2024
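One common kind of mojibake is UTF-8 text that was decoded as Windows-1252 (cp1252), turning "é" into "Ã©". The sketch below is not this library's algorithm; it only reverses that single mistake by re-encoding with cp1252 and decoding as UTF-8, and returns the text unchanged if either step fails. The function name is hypothetical.

```python
def fix_cp1252_mojibake(text: str) -> str:
    """Undo one round of UTF-8 bytes mis-decoded as Windows-1252, if possible."""
    try:
        # Recover the original bytes, then decode them the way they were meant to be read.
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not reversible this way: assume the text was fine and return it unchanged.
        return text


print(fix_cp1252_mojibake("cafÃ©"))   # café
print(fix_cp1252_mojibake("itâ€™s"))  # it’s
print(fix_cp1252_mojibake("café"))    # café (left alone)
```

The round trip is only a heuristic: it cannot tell intentional text such as "Ã©" from mojibake, and it does nothing for mixed or doubly-encoded text, which is where a dedicated fixer has to do more work.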

Accurately separates a URL’s subdomain, domain, and public suffix, using the Public Suffix List (PSL).

Python 1,835 210 Updated Aug 27, 2024
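Splitting a hostname against a suffix list comes down to finding the longest listed suffix that still leaves one label for the registrable domain. This is a minimal sketch, not the library's API: `SUFFIXES` is a tiny hypothetical stand-in for the Public Suffix List, and the real list's wildcard (`*.`) and exception (`!`) rules are ignored. `split_host` is a hypothetical name.

```python
from typing import Tuple

# Hypothetical, hand-picked subset of public suffixes for illustration only.
SUFFIXES = {"com", "org", "uk", "co.uk", "ac.uk", "jp", "co.jp"}


def split_host(host: str) -> Tuple[str, str, str]:
    """Split a hostname into (subdomain, domain, suffix) by longest suffix match."""
    labels = host.lower().rstrip(".").split(".")
    # i = 1 tries the longest suffix that still leaves one label for the domain.
    for i in range(1, len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in SUFFIXES:
            return ".".join(labels[: i - 1]), labels[i - 1], suffix
    # No listed suffix: treat the whole host as the domain (a simplification).
    return "", ".".join(labels), ""


print(split_host("forums.news.cnn.com"))  # ('forums.news', 'cnn', 'com')
print(split_host("www.bbc.co.uk"))        # ('www', 'bbc', 'co.uk')
```

Checking the longest candidate first is what makes `co.uk` win over `uk`, so `bbc` rather than `co` is reported as the domain.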

Highlighting various OCR formats directly in Solr

HTML 83 13 Updated Oct 6, 2024

A WebGL viewer for UMAP- or t-SNE-clustered images

JavaScript 594 139 Updated Apr 15, 2023

Multi-layer Recurrent Neural Networks (LSTM, RNN) for word-level language models in Python using TensorFlow.

Python 1,302 494 Updated Oct 9, 2019

A collection of small corpuses of interesting data for the creation of bots and similar stuff.

JavaScript 4,929 1,302 Updated Feb 7, 2024

Detect text blocks and OCR poorly scanned PDFs in bulk. Python module available via pip.

Python 1,272 97 Updated Dec 1, 2020

Torch implementation of DeepMask and SharpMask

Lua 3,113 508 Updated Jan 16, 2019

Tool for visually diffing YUV420 files

Rust 2 2 Updated Oct 13, 2017

Bash scripts to manage LTO cartridges with LTFS

Shell 39 8 Updated Jun 5, 2024

Detailed documentation is available here: https://ifiscripts.readthedocs.io/en/latest/index.html

Python 50 34 Updated Aug 7, 2020

Library for fast text representation and classification.

HTML 25,878 4,710 Updated Mar 22, 2024

A Python implementation of the Rapid Automatic Keyword Extraction (RAKE) algorithm

Python 974 594 Updated Sep 4, 2020

A Python implementation of the Rapid Automatic Keyword Extraction (RAKE) algorithm

Python 375 223 Updated Mar 14, 2018
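The RAKE implementations listed above follow the same basic recipe, sketched here as a minimal standalone version rather than code from either repository: split text into candidate phrases at punctuation and stopwords, score each word as degree / frequency (degree counts the words it co-occurs with inside candidate phrases, itself included), and score each phrase as the sum of its word scores. The short stopword list is an illustrative assumption; real implementations ship a much longer one.

```python
import re
from collections import defaultdict

# Deliberately tiny stopword list, for illustration only.
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to", "with"}


def rake(text, stopwords=STOPWORDS):
    """Return (phrase, score) pairs sorted by descending RAKE score."""
    # 1. Split at punctuation, then break each fragment into phrases at stopwords.
    phrases = []
    for fragment in re.split(r"[^\w\s]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in stopwords:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)

    # 2. Word score = degree / frequency; each occurrence adds the phrase length to degree.
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)
    word_score = {w: degree[w] / freq[w] for w in freq}

    # 3. Phrase score = sum of its word scores.
    scored = {" ".join(p): sum(word_score[w] for w in p) for p in phrases}
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)


print(rake("Compatibility of systems of linear constraints")[0])
# ('linear constraints', 4.0)
```

Because degree grows with phrase length, the score favors longer multi-word phrases over words that appear frequently on their own.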

Updates to Zope's keyphrase extractor (forked from 1.1.0)

Python 67 22 Updated Apr 28, 2017

An easy-to-use Django app that provides Foursquare/Stack Overflow style badges

Python 44 21 Updated Jan 9, 2018

A bot that offers sympathy to people who have suffered paper cuts.

Python 17 4 Updated Oct 6, 2012

A jQuery plugin that provides an easy way to enable j/k key-binding navigation on a page.

HTML 17 4 Updated Jun 7, 2016