Stars
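Crawlers for Xiaohongshu notes and comments, Douyin videos and comments, Kuaishou videos and comments, Bilibili videos and comments, Weibo posts and comments, Baidu Tieba posts and comment replies, and Zhihu Q&A articles and comments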
【CVPR 2024 Highlight】Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models
A curated list of practical guide resources of LLMs (LLMs Tree, Examples, Papers)
Ongoing research training transformer language models at scale, including: BERT & GPT-2
Chat concurrently with ChatGPT, Bing Chat, Bard, Alpaca, Vicuna, Claude, ChatGLM, MOSS, 讯飞星火 (iFLYTEK Spark), 文心一言 (ERNIE Bot), and more to discover the best answers
This is a CoNLL formatted version of the OntoNotes 5.0 release.
EmbedRank: Unsupervised Keyphrase Extraction using Sentence Embeddings (official implementation)
CogDL: A Comprehensive Library for Graph Deep Learning (WWW 2023)
📘 Intermediate Python, Chinese version (《Python进阶》)
A Telegram Bot port of the "狗屁不通文章生成器" (BullshitGenerator, https://github.com/menzi11/BullshitGenerator)
Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
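Ultra-lightweight Chinese OCR with support for vertical text recognition and for ncnn, mnn, and tnn inference (dbnet (1.8M) + crnn (2.5M) + anglenet (378KB)); the total model size is only 4.7M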
A full Python Implementation of the ROUGE Metric (not a wrapper)
Calculating ROUGE score between two files (line-by-line)
These files are for hunting multiple bumps in graphs
BertViz: Visualize Attention in NLP Models (BERT, GPT2, BART, etc.)
Natural Language Processing notes and implementations.
Open Source Neural Machine Translation and (Large) Language Models in PyTorch
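Chinese and English sensitive words, language detection, lookup of home region and carrier for Chinese and foreign mobile/phone numbers, gender inference from names, mobile number extraction, ID card number extraction, email extraction, Chinese and Japanese name databases, Chinese abbreviation database, character-decomposition dictionary, word sentiment values, stop words, reactionary word list, violence and terrorism word list, Traditional/Simplified Chinese conversion, English imitation of Chinese pronunciation, Wang Feng lyrics generator, job title lexicon, synonym lexicon, antonym lexicon, negation word lexicon, car brand lexicon, car parts lexicon, segmentation of continuous English text, various Chinese word vectors, company name collection, classical Chinese poetry corpus, IT lexicon, finance lexicon, idiom lexicon, place name lexicon, …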
Topic-Aware Convolutional Neural Networks for Extreme Summarization
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.