Generative search, active search, hierarchical search, selective search: all techniques for better information retrieval.
As before, our past work is in complete chaos because we lacked the knowledge and time to learn and develop.
Find tools for searching our code, and build a semantic chatbot to serve us in hacking, earning, and creating.
Planning: run the code in a sandbox environment, so the agent can make use of the code, write documents, and save development time.
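As a minimal sketch of the sandbox idea: run a snippet in a separate process with a timeout and capture its output as feedback for the agent. `run_in_sandbox` is a hypothetical helper; a real sandbox would add filesystem and network isolation (containers, seccomp, etc.).

```python
import subprocess
import sys

def run_in_sandbox(code: str, timeout: float = 5.0) -> dict:
    """Execute a Python snippet in a separate process and return its result."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, timeout=timeout,
    )
    return {
        "returncode": proc.returncode,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
    }

result = run_in_sandbox("print(1 + 1)")
```

The returned dict is the "feedback": the agent can inspect stdout/stderr and the return code to decide its next step.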
Refuse to generate detailed documentation for a huge project all at once. Only generate it for a limited amount of code within limited areas; the rest can be summarized and hyperlinked.
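The split above can be sketched as a simple planning step: fully document only the top-N files and give the rest a link stub. The `importance` field and ranking are illustrative assumptions; in practice an LLM or usage statistics would supply the scores.

```python
def plan_documentation(files, limit=3):
    """Split files into those to document fully and those to summarize/link."""
    ranked = sorted(files, key=lambda f: f["importance"], reverse=True)
    full = ranked[:limit]                                  # document in detail
    linked = [{"name": f["name"], "link": f"#{f['name']}"} # summarize + hyperlink
              for f in ranked[limit:]]
    return full, linked

files = [
    {"name": "a.py", "importance": 9},
    {"name": "b.py", "importance": 2},
    {"name": "c.py", "importance": 7},
    {"name": "d.py", "importance": 5},
]
full, linked = plan_documentation(files, limit=2)
```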
The search algorithm runs recursively and will not consume all computational power at once. It improves based on feedback and only searches in chunks.
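A toy sketch of that chunked, recursive search: score one chunk of the corpus at a time and recurse into the next chunk only if the best score so far falls below a threshold (the feedback signal). The word-overlap `score` is a stand-in for a real relevance model.

```python
def chunked_search(query, corpus, chunk_size=2, threshold=0.5, start=0):
    """Search one chunk at a time; recurse if no result scores above threshold."""
    if start >= len(corpus):
        return None  # exhausted the corpus without a confident hit

    def score(doc):  # toy relevance: fraction of query words present
        words = query.lower().split()
        return sum(w in doc.lower() for w in words) / len(words)

    chunk = corpus[start:start + chunk_size]
    best = max(chunk, key=score)
    if score(best) >= threshold:
        return best  # good enough: stop early, saving compute
    return chunked_search(query, corpus, chunk_size, threshold, start + chunk_size)

corpus = ["notes on cooking", "llm agents overview", "semantic search with embeddings"]
hit = chunked_search("semantic search", corpus)
```

Because it stops as soon as a chunk yields a confident hit, it never spends full computational power on the whole corpus up front.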
Can deploy this to a Hugging Face Space.
You need WebLLM, TensorFlow.js, or PyTorch ported to the web.
Prepare at least two versions of index.html, plus scripts to load resources from a different origin.
Web resources are limited: you can either prompt the LLM to generate search keywords to enhance the query, or use sentence transformers for semantic search instead, but not both (on conventional computers). You may design two different websites, or, for more powerful machines, use both.
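The semantic-search branch boils down to embedding both query and documents and ranking by cosine similarity. In this sketch, `embed` is a toy bag-of-words vectorizer standing in for a real sentence-transformer encoder (e.g. `model.encode` from the sentence-transformers library), which is too heavy to run inline here; the vocabulary is an illustrative assumption.

```python
import math

VOCAB = ["search", "semantic", "keyword", "llm", "web", "code"]

def embed(text):
    """Toy embedding: word counts over a fixed vocabulary."""
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def semantic_search(query, docs):
    """Return the document most similar to the query in embedding space."""
    q = embed(query)
    return max(docs, key=lambda d: cosine(q, embed(d)))

docs = ["keyword search for the web", "semantic search over code", "llm chat"]
best = semantic_search("semantic code search", docs)
```

Swapping `embed` for a sentence-transformer gives true semantic matching; the ranking logic stays identical.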
Next, you would search for: "ai code repository documentation generation".
Theoretically, this applies to any project that is messy and time-consuming to document.
This could be a great challenge for me and a helpful feature for developers.
Also index your notes and your browsing history, with some agents being trained on your personal data.
It has been a long time since we fully reviewed our browsing history. We need to find those influencers (like 老麦的工具库, kuxai) and media sources, and trace down everything they have posted in the past, so as not to miss any detail, in order to develop the best information-retrieval-enhanced LLM, which can support our research and workflow and integrate into our projects like pyjom.
For your local files, I guess we need to set up some threshold that eliminates low-priority indexes; otherwise they take too much space.
What makes things prioritized? Usefulness, which can be checked by the AI itself or by human queries.
We can set up some dynamic buffer space: if something is not found in the solid vector space, recurse until it is found.
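The threshold-plus-buffer idea can be sketched as a tiered index: entries above a usefulness threshold go into the solid index, the rest into a dynamic buffer that is consulted only on a miss. The tier names, threshold value, and promote-on-hit policy are all illustrative assumptions.

```python
class TieredIndex:
    """Solid index for high-priority entries; dynamic buffer as a fallback tier."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self.solid = {}    # high-priority, always kept
        self.buffer = {}   # dynamic buffer for low-priority items

    def add(self, key, value, usefulness):
        if usefulness >= self.threshold:
            self.solid[key] = value
        else:
            self.buffer[key] = value

    def lookup(self, key):
        if key in self.solid:
            return self.solid[key]
        if key in self.buffer:
            # a buffer hit is evidence of usefulness: promote to the solid tier
            self.solid[key] = self.buffer.pop(key)
            return self.solid[key]
        return None  # not found in either tier

idx = TieredIndex()
idx.add("pyjom", "/repos/pyjom", usefulness=0.9)
idx.add("scratch", "/tmp/scratch", usefulness=0.1)
```

Pruning the buffer (e.g. dropping least-recently-hit entries) would keep index size bounded, matching the threshold idea above.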
Pyenv, virtualenv, and conda/mamba are used for environment management. To store the model cache somewhere other than the default $HOME (*nix) or %USERPROFILE% (Windows), you may need to change the home directory of the current user. Complex solutions such as overlayfs may apply but are not preferred.
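A lighter alternative to changing the home directory: many libraries respect cache environment variables. Hugging Face libraries honor `HF_HOME`, and many *nix tools honor `XDG_CACHE_HOME`; set them before importing the library. The paths here are illustrative.

```python
import os

# Redirect caches to a bigger disk *before* importing the libraries that use them.
os.environ["HF_HOME"] = "/mnt/bigdisk/hf_cache"       # Hugging Face models/datasets cache root
os.environ["XDG_CACHE_HOME"] = "/mnt/bigdisk/cache"   # generic *nix cache directory
```

These variables can also be set in the shell profile so every environment (pyenv, conda, etc.) picks them up.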
To install "compiled" packages like hnswlib on Windows, you may need MinGW (follow the installation guide of privateGPT) or conda:
conda install -c conda-forge -n prometheous hnswlib
conda install libpython m2w64-toolchain -c msys2
You may want to use other models; there are currently a lot of models available:
WizardLM: for coding
Awesome-LLM-for-RecSys: for recommendation, may assist our "pyjom" project or help models to explore more
more ai related news/info can be found on kuxai
Serve custom models with an OpenAI-API-compatible server:
langchain: the mother of all agent gpts
autogpt: making gpt into agi
promptify: structural gpt output
When using these retrieval-based models, we need to provide more context. We need to know what the content is (more than a filename), how it was retrieved (more than a timestamp), and a brief summary, though these could be produced by some genetic prompt-generation algorithm.
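One way to supply that extra context: wrap each retrieved chunk with its metadata before handing it to the model. The field names (`source`, `method`, `brief`, `content`) and the template layout are assumptions for illustration.

```python
def build_context(chunks):
    """Format retrieved chunks with their metadata for the model prompt."""
    parts = []
    for c in chunks:
        parts.append(
            f"Source: {c['source']}\n"          # what the content is, beyond a filename
            f"Retrieved via: {c['method']}\n"   # how it was retrieved, beyond a timestamp
            f"Brief: {c['brief']}\n"            # short human/AI-written summary
            f"Content: {c['content']}"
        )
    return "\n---\n".join(parts)

ctx = build_context([{
    "source": "notes/search.md",
    "method": "vector similarity",
    "brief": "notes on chunked search",
    "content": "search in chunks, recurse on low scores",
}])
```

The generated string would be prepended to the user query as the retrieval context.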
localGPT: inspired by privateGPT
h2ogpt with alternatives
haystack: information retrieval toolkit
autodoc: generate documentation in codebase
Bookmarks, important code repositories, text files within given suffixes and sizes, notes, weblinks, even our actions and clipboard contents. The bot's own actions need to be recorded on demand.
Assist us by chatting, executing commands in sandboxes, and receiving feedback from us and the internet (we may predefine some rewards).
The bot needs to learn and modify itself.
You might want a bigger SSD for storing all this data; it is simply overwhelming. Also, for faster searching and training, you may want more RAM.