Repositories starred by yanghua

Volcengine TOS Python SDK

Python 6 4 Updated Nov 4, 2024

Marks issues and pull requests that have not had recent interaction

TypeScript 1,369 364 Updated Oct 28, 2024

Processing engine and React components for constructing configuration-based data transformation and processing pipelines.

TypeScript 186 21 Updated Sep 16, 2024

An open-source RAG-based tool for chatting with your documents.

Python 16,745 1,284 Updated Nov 8, 2024

Supercharge Your LLM Application Evaluations 🚀

Python 7,164 729 Updated Nov 8, 2024

Research and development (R&D) is crucial for the enhancement of industrial productivity, especially in the AI era, where the core aspects of R&D are mainly focused on data and models. We are commi…

Python 1,016 74 Updated Nov 8, 2024
Jupyter Notebook 328 49 Updated Jan 7, 2024

An easy-to-use framework for modular RAG

Python 289 44 Updated Nov 8, 2024

Paragraph-by-paragraph close readings of classic and new deep learning papers

27,019 2,441 Updated Aug 8, 2024

Pythonic file-system interface for TOS (Tinder Object Storage): https://tosfs.readthedocs.io/en/latest/

Python 6 Updated Nov 9, 2024

Open source project for data preparation of LLM application builders

Jupyter Notebook 265 124 Updated Nov 8, 2024

Streaming WARC/ARC library for fast web archive IO

Python 383 58 Updated Nov 4, 2024

S3 Filesystem

Python 887 273 Updated Nov 6, 2024

A specification that Python filesystems should adhere to.

Python 1,032 360 Updated Oct 31, 2024

10 Weeks, 20 Lessons, Data Science for All!

Jupyter Notebook 28,208 5,819 Updated Oct 15, 2024

The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.

Python 9,692 750 Updated Oct 23, 2024

This is my personal template collection. Here you'll find templates and configurations for various tools and technologies.

HCL 4,694 1,525 Updated Nov 9, 2024

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.

Python 22,270 2,179 Updated Nov 8, 2024

OCR, layout analysis, reading order, table recognition in 90+ languages

Python 13,891 863 Updated Nov 7, 2024

Ingest, parse, and optimize any data format ➡️ from documents to multimedia ➡️ for enhanced compatibility with GenAI frameworks

Python 5,433 454 Updated Nov 3, 2024

A modular graph-based Retrieval-Augmented Generation (RAG) system

Python 18,883 1,846 Updated Nov 9, 2024

🧑‍🚀 Summary of the world's best LLM resources.

2,140 262 Updated Nov 9, 2024

A simple, high-throughput file client for mounting an Amazon S3 bucket as a local file system.

Rust 4,605 160 Updated Nov 8, 2024

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.

HTML 9,049 748 Updated Nov 7, 2024

A next-generation crawling and spidering framework.

Go 11,221 595 Updated Nov 8, 2024

Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.

Python 4,209 136 Updated Nov 9, 2024

A privacy-first, self-hosted, fully open source personal knowledge management software, written in TypeScript and Go.

TypeScript 22,279 1,578 Updated Nov 9, 2024

Collection of resources on the applications of Large Language Models (LLMs) in Audio AI.

592 33 Updated Aug 3, 2024

Open, Multi-modal Catalog for Data & AI

Java 2,396 383 Updated Nov 9, 2024

A Data Streaming Library for Efficient Neural Network Training

Python 1,133 142 Updated Nov 8, 2024