sidekick - whole project files view #46

Open
hanselke opened this issue Jul 17, 2023 · 9 comments

Comments

@hanselke
Contributor

hanselke commented Jul 17, 2023

I believe using a vector DB to store the files, then searching only for the relevant portions, would work to get around the input context limit.

https://python.langchain.com/docs/modules/chains/additional/vector_db_text_generation

We could break it down into multiple queries, probably using something like chain-of-thought prompting to break up the different parts. That seems to be what the community is doing.

falcon-40b-code seems interesting. The encoder can be used standalone to run that vector search before piping its output to the decoder.
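
Roughly, a minimal sketch of that retrieval flow, assuming LangChain with FAISS and OpenAI embeddings (file names and the query are placeholders, and the exact API varies by LangChain version):

```python
# Sketch only: assumes langchain, faiss, and an OpenAI API key are available.
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS

# Split source files into chunks small enough to embed.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
docs = splitter.create_documents([open(p).read() for p in ["app.py", "utils.py"]])

# Index the chunks, then retrieve only the portions relevant to a question.
store = FAISS.from_documents(docs, OpenAIEmbeddings())
relevant = store.similarity_search("How is authentication handled?", k=4)
context = "\n\n".join(doc.page_content for doc in relevant)
# `context` now fits in the prompt instead of the whole repo.
```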

@TechNickAI
Owner

TechNickAI commented Jul 17, 2023

Thanks for the input!

As a neat coincidence, I started playing around with vector stores and embeddings tonight, with a new learn command that will allow you to feed in external repos.

7f5ae4a

It's not working for large repos yet; I'm hitting OpenAI's API token limits. I'd like to explore a local embedding solution anyway.

Once I get the learn command working, I'll look into having sidekick use a similar approach for the local repo.
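
For the local embedding idea, something like this could work; a sketch assuming the sentence-transformers package, with the model name and chunks as placeholders:

```python
# Sketch only: assumes sentence-transformers is installed.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # runs locally, no API token limits

chunks = ["def login(user): ...", "def logout(user): ..."]
embeddings = model.encode(chunks)  # one vector per chunk, ready to index
```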

@TechNickAI
Owner

I'm also exploring using ctags output to pass along as context.
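
A rough sketch of what that could look like, assuming universal-ctags built with JSON support is on PATH (the repo path is a placeholder):

```python
# Sketch only: assumes universal-ctags with JSON output support.
import subprocess

# Emit one JSON tag entry per line for every symbol in the repo.
result = subprocess.run(
    ["ctags", "-R", "--output-format=json", "-f", "-", "."],
    capture_output=True, text=True, check=True,
)
tags = result.stdout.splitlines()
# Each line names a symbol, its file, and its kind: a compact map of the
# codebase that can be passed to the LLM as context.
```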

@hanselke
Contributor Author

hanselke commented Jul 18, 2023 via email

@TechNickAI
Owner

Still working on this.

In the meantime, I added commands to the sidekick prompt, so you can add or drop files for context without restarting:

/add file
/drop file
/files # list files

@TechNickAI
Owner

FYI - I got aicodebot learn working.

And then built a sidekick-agent command that accepts a --learned-repo $repo argument.

But the results are currently terrible: the text coming back from the VectorStore is a summary that answers the question, rather than the actual code/documents.

I think the next thing to try is a custom Vector Store tool that searches the vector database and returns the relevant code verbatim for the LLM to use.
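
Something along these lines, perhaps; a sketch using LangChain's Tool wrapper, where `store` stands in for an existing vector store and the names are hypothetical:

```python
# Sketch only: assumes langchain and a pre-built `store` vector store.
from langchain.agents import Tool

def search_codebase(query: str) -> str:
    # Return the matching chunks verbatim so the LLM sees real code,
    # not a lossy summary.
    docs = store.similarity_search(query, k=4)
    return "\n\n".join(doc.page_content for doc in docs)

code_search_tool = Tool(
    name="codebase_search",
    func=search_codebase,
    description="Search the learned repo and return the relevant source code.",
)
```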

@hanselke
Contributor Author

Mm, I think I see your problem.

I think it would be best resolved by breaking the code up into functions; one function per file seems like the easy prototype.

Then you would probably want a 'fill context' method that tries to add all the dependent functions once you can identify the functions that matter.

Once that limit is hit, multi-shot approaches seem like the way forward:

"What other functions do you need information on?"

@TechNickAI
Owner

Makes sense, and I'm headed in that direction.

In the meantime, the current aicodebot sidekick command is pretty good, because you can add files for context. Today I put a lot of energy into making that a better process, including the ability to add/drop files cleanly, better management of the token economy, etc.

My current setup is that I use openrouter.ai so that I can use gpt4-32k. With this amount of context, I'm able to solve most of my programming tasks very efficiently.

I'll sit down to solve a problem, think about which files will be needed, load those in, and then chat with those files.

@hanselke
Contributor Author

hanselke commented Jul 28, 2023 via email
