Hey HN, I am the founder of Tensorlake. Prototyping LLM applications has become a lot easier, but building decision-making LLM applications that work on constantly updating data is still very challenging in production. The systems engineering problems we have seen people face are -
1. Reliably processing ingested content in real time when the application is sensitive to the freshness of information.
2. Being able to bring in any kind of model, and run different parts of the pipeline on GPUs and CPUs.
3. Tolerating ingestion spikes and compute infrastructure failures.
4. Scaling compute, reads and writes as data volume grows.
We built and open sourced Indexify (https://github.com/tensorlakeai/indexify) to provide a compute engine and data frameworks for LLM applications that operate in dynamic environments, where data is updated frequently or new data is constantly created.
Developers describe a declarative extraction graph whose stages extract or transform unstructured data. Data passes from one stage to the next and finally ends up in sinks like vector databases, blob stores, or structured data stores like Postgres.
Examples -
1. A video understanding graph could be: Ingestion -> Audio Extraction -> Transcription -> NER and Embedding, with a second path, Ingestion -> Key Frame Extraction -> Object and Scene Description (https://github.com/tensorlakeai/indexify/blob/main/docs/docs...)
2. Structured Extraction and Search on PDF: PDF -> Markdown -> Chunking -> Embedding, NER (https://github.com/tensorlakeai/indexify/blob/main/docs/docs...)
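To make the graph idea concrete, here is a minimal Python sketch of the video example as a declarative graph plus a tiny runner. It is not Indexify's API: the `GRAPH` dict, `run_graph`, and the string-tagging lambdas are hypothetical stand-ins for real extractors (audio extraction, transcription, NER, embedding, key frames, scene description), and each leaf stage's output is collected in an in-memory dict standing in for a sink.

```python
from typing import Callable, Dict, List, Tuple

# An extractor maps one input item to zero or more output items.
Extractor = Callable[[str], List[str]]

# Hypothetical declarative graph for the video example: stage name ->
# (extractor, parent stage). "ingestion" is the implicit root. The lambdas
# only tag strings; real stages would run models on GPUs or CPUs.
GRAPH: Dict[str, Tuple[Extractor, str]] = {
    "audio": (lambda video: [f"audio:{video}"], "ingestion"),
    "transcript": (lambda audio: [f"text:{audio}"], "audio"),
    "ner": (lambda text: [f"entities:{text}"], "transcript"),
    "embedding": (lambda text: [f"vec:{text}"], "transcript"),
    "keyframes": (lambda video: [f"frame{i}:{video}" for i in range(2)], "ingestion"),
    "description": (lambda frame: [f"desc:{frame}"], "keyframes"),
}


def run_graph(graph: Dict[str, Tuple[Extractor, str]], content: str) -> Dict[str, List[str]]:
    """Push one ingested item through the graph; leaf outputs are collected per sink."""
    children: Dict[str, List[str]] = {}
    for name, (_, parent) in graph.items():
        children.setdefault(parent, []).append(name)

    sinks: Dict[str, List[str]] = {}
    pending = [("ingestion", content)]  # (stage that produced the item, item)
    while pending:
        stage, item = pending.pop()
        for child in children.get(stage, []):
            extractor, _ = graph[child]
            for out in extractor(item):
                if child in children:  # intermediate stage: keep flowing downstream
                    pending.append((child, out))
                else:  # leaf stage: write to its sink
                    sinks.setdefault(child, []).append(out)
    return sinks
```

Calling `run_graph(GRAPH, "video-1")` returns one list per leaf stage (`ner`, `embedding`, `description`); the branching at `ingestion` and `transcript` is just several children sharing a parent.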
Application Layer - Indexify works as a retriever in the LLM application stack, so you can use it pretty easily with your existing applications. Call the retriever API over HTTP to get extracted data from Indexify, and that's pretty much all the integration you need to search or retrieve data.
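On the application side the integration is a single HTTP call. The sketch below uses only the Python standard library; the server address, the `/indexes/{index}/search` path, and the `query`/`k` payload fields are hypothetical placeholders, not Indexify's documented API, so check the docs for the real endpoint before relying on it.

```python
import json
import urllib.request

# Hypothetical values: address, path, and payload fields are placeholders,
# not Indexify's documented retriever API.
BASE_URL = "http://localhost:8900"


def build_search_request(index: str, query: str, k: int = 3) -> urllib.request.Request:
    """Build (but do not send) a JSON POST asking the retriever for the top-k hits."""
    body = json.dumps({"query": query, "k": k}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/indexes/{index}/search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(index: str, query: str, k: int = 3) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(build_search_request(index, query, k), timeout=10) as resp:
        return json.loads(resp.read())
```

Keeping request construction separate from sending makes the call easy to inspect or test without a running server.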
You can compose extractors and chain them together to build complex real-time data pipelines that work with any unstructured data.
Since this is HN, I'll take the liberty of going into some technical details :)
How is it Real Time?
We built a replicated state machine with Raft that processes tens of thousands of ingestion events every second. The storage and network layers are optimized so that the scheduler can create tasks in under 2 milliseconds. The scheduler's architecture is very similar to that of Google's Borg and HashiCorp's Nomad, and it can be extended to schedule in parallel across multiple machines with a centralized sequencer, as Nomad does.
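Raft gives consistent scheduling because the scheduler is a deterministic state machine: every replica applies the same committed log in the same order, so they all derive the same tasks. Here is a minimal sketch of that idea, assuming a hypothetical `policies` list of extractors that run on every new piece of content; the Raft consensus and log replication themselves are not shown, and `SchedulerStateMachine` and `Task` are illustrative names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Task:
    task_id: int
    content_id: str
    extractor: str


@dataclass
class SchedulerStateMachine:
    """Deterministic scheduler state: replicas that apply the same committed log
    entries in the same order end up with identical task lists."""

    policies: List[str]  # hypothetical: extractors that run on every new content item
    applied_index: int = 0
    tasks: List[Task] = field(default_factory=list)

    def apply(self, index: int, content_id: str) -> None:
        # Raft delivers committed entries once and in order; enforce that here.
        if index != self.applied_index + 1:
            raise ValueError(f"out-of-order apply: {index} after {self.applied_index}")
        for extractor in self.policies:
            # Task ids come from a counter that depends only on the applied log,
            # so every replica assigns the same ids without coordinating.
            self.tasks.append(Task(len(self.tasks) + 1, content_id, extractor))
        self.applied_index = index


# A committed log as Raft would replicate it (consensus itself is not shown).
log = ["doc-1", "doc-2"]
leader = SchedulerStateMachine(policies=["markdown", "embedding"])
follower = SchedulerStateMachine(policies=["markdown", "embedding"])
for index, content_id in enumerate(log, start=1):
    leader.apply(index, content_id)
    follower.apply(index, content_id)
```

Because nothing in `apply` reads clocks, randomness, or local state outside the log, a follower that takes over as leader already holds exactly the tasks the old leader created.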
Storage Systems: Since the focus is unstructured data, we wanted to support storing and extracting from large files, and to scale horizontally as data volume grows. Indexify stores unstructured data in blob stores. If a graph creates embeddings, they are automatically stored in vector stores, and structured data is stored in structured stores like Postgres. Between the ingestion server and the data stores we have Rust traits, so support for other vector stores is easy to add.
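The Rust traits let the ingestion path depend on an interface rather than a specific database. Python has no traits, so this sketch uses an abstract base class as the closest analogue. `VectorStore`, `InMemoryVectorStore`, and `write_embeddings` are hypothetical names, and the in-memory backend does brute-force cosine similarity only to keep the example self-contained.

```python
import math
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple


class VectorStore(ABC):
    """Python analogue of a storage trait: the sink code only talks to this
    interface, so supporting a new backend means writing one new subclass."""

    @abstractmethod
    def upsert(self, doc_id: str, vector: List[float]) -> None: ...

    @abstractmethod
    def search(self, query: List[float], k: int) -> List[Tuple[str, float]]: ...


class InMemoryVectorStore(VectorStore):
    """Toy backend: brute-force cosine similarity (assumes non-zero vectors)."""

    def __init__(self) -> None:
        self._vectors: Dict[str, List[float]] = {}

    def upsert(self, doc_id: str, vector: List[float]) -> None:
        self._vectors[doc_id] = vector

    def search(self, query: List[float], k: int) -> List[Tuple[str, float]]:
        def cosine(a: List[float], b: List[float]) -> float:
            dot = sum(x * y for x, y in zip(a, b))
            return dot / (math.hypot(*a) * math.hypot(*b))

        scored = [(doc_id, cosine(query, v)) for doc_id, v in self._vectors.items()]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


def write_embeddings(store: VectorStore, embeddings: Dict[str, List[float]]) -> None:
    """Sink step written against the interface, not a concrete backend."""
    for doc_id, vector in embeddings.items():
        store.upsert(doc_id, vector)
```

Swapping in a real vector database would mean another `VectorStore` subclass; `write_embeddings` stays unchanged.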
Sync Vector and Structured Store - If Indexify detects both a vector store and a structured store in a graph, it keeps the structured data in sync with the vector store. This lets you use pre-filtering to narrow down the search space for better results.
APIs - Indexify exposes semantic search APIs over the vector store, and read-only SQL queries over semi-structured data. We automatically infer the schema of the structured data and expose a SQL interface on top of it. Behind the scenes we parse the SQL and have a layer that scans and reads the databases to slice and dice the rows, so BI tools should work out of the box on extracted data.
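Pre-filtering means applying the structured predicate before the similarity ranking, so every one of the top-k slots goes to a row that satisfies the filter. A minimal sketch, assuming hypothetical rows that carry synced metadata next to each embedding, with brute-force cosine ranking in place of a real vector index:

```python
import math
from typing import Dict, List, Tuple

# Hypothetical row shape: (doc id, synced structured metadata, embedding).
Row = Tuple[str, Dict[str, str], List[float]]


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def prefiltered_search(rows: List[Row], query: List[float], where: Dict[str, str], k: int) -> List[str]:
    """Apply the structured filter first, then rank only the survivors by similarity."""
    candidates = [r for r in rows if all(r[1].get(key) == val for key, val in where.items())]
    candidates.sort(key=lambda r: cosine(query, r[2]), reverse=True)
    return [doc_id for doc_id, _, _ in candidates[:k]]


rows: List[Row] = [
    ("q1-report", {"year": "2023"}, [0.9, 0.1]),
    ("q2-report", {"year": "2024"}, [0.8, 0.2]),
    ("memo", {"year": "2024"}, [0.1, 0.9]),
]
```

With a post-filter instead (rank first, keep the top k, then drop non-matching rows), a `k=1` query for `year = 2024` would return nothing here, because the single best match overall is from 2023.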
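Inferring a schema from extracted records and exposing SQL over them can be sketched with the standard library. This swaps in an in-memory SQLite database for Indexify's own SQL parsing and scan layer, and `infer_schema` and `load_table` are hypothetical helpers: columns are the union of keys across records, each typed from the first value seen, and missing keys become NULL. It assumes scalar values; nested objects would need flattening first.

```python
import sqlite3
from typing import Any, Dict, List

# Map Python value types to SQLite column types; anything else falls back to TEXT.
SQL_TYPES = {bool: "INTEGER", int: "INTEGER", float: "REAL", str: "TEXT"}


def infer_schema(records: List[Dict[str, Any]]) -> Dict[str, str]:
    """Union of keys across records; each column typed from its first value."""
    schema: Dict[str, str] = {}
    for rec in records:
        for key, value in rec.items():
            if key not in schema:
                schema[key] = SQL_TYPES.get(type(value), "TEXT")
    return schema


def load_table(conn: sqlite3.Connection, table: str, records: List[Dict[str, Any]]) -> Dict[str, str]:
    """Create a table from the inferred schema and insert every record."""
    schema = infer_schema(records)
    names = list(schema)
    columns = ", ".join(f'"{name}" {sql_type}' for name, sql_type in schema.items())
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    quoted = ", ".join(f'"{name}"' for name in names)
    placeholders = ", ".join("?" for _ in names)
    conn.executemany(
        f'INSERT INTO "{table}" ({quoted}) VALUES ({placeholders})',
        [tuple(rec.get(name) for name in names) for rec in records],  # missing -> NULL
    )
    return schema


# Hypothetical structured output of an invoice extractor.
invoices = [
    {"vendor": "Acme", "total": 120.5, "pages": 2},
    {"vendor": "Globex", "total": 80.0},
    {"vendor": "Acme", "total": 30.0, "pages": 1},
]
conn = sqlite3.connect(":memory:")
load_table(conn, "invoices", invoices)
rows = conn.execute(
    "SELECT vendor, SUM(total) FROM invoices GROUP BY vendor ORDER BY vendor"
).fetchall()
```

Once the rows sit behind a SQL interface, ordinary aggregate queries like the `GROUP BY` above are what BI tools issue against it.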
We have Python and TypeScript libraries that make it easy to build new applications or integrate Indexify into existing ones.
Thoughts? Would love to hear if you think this would be useful for what you are building!