Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: collection name everywhere #310

Merged
merged 19 commits into from
Aug 9, 2023

Conversation

jonasiwnl
Copy link
Contributor

@jonasiwnl jonasiwnl commented Jul 18, 2023

Description

This change allows the collection of the vectordb to be switched with the set_collection method on the Embedchain class. It also allows the initial collection to be specified using InitConfig, e.g. InitConfig(collection_name="collection1"). No new dependencies are required as this is a built-in feature of Chroma.

Fixes # (issue)

Issue 261

Type of change

Please delete options that are not relevant.

  • New feature (non-breaking change which adds functionality)
  • Documentation update

How Has This Been Tested?

This feature has been tested through 2 new tests added to tests/vectordb/test_chroma_db.py. It tests that the default collection name is correct (if no name is specified), a custom initial collection name, and switching the collection once the app has already been initialized.

  • TestChromaDbDefaultCollection
  • TestChromaDbSetCollection

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules
  • I have checked my code and corrected any misspellings

Maintainer Checklist

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Made sure Checks passed

@jonasiwnl jonasiwnl changed the title Feat/collection name everywhere feat: collection name everywhere Jul 18, 2023
@cachho cachho added the enhancement New feature or request label Jul 18, 2023
@cachho cachho linked an issue Jul 18, 2023 that may be closed by this pull request
@cachho
Copy link
Contributor

cachho commented Jul 18, 2023

Thanks for the PR, I especially appreciate the inclusion of unit tests. In #305, the configs were reworked. Can you resolve that merge conflict by using the new AppConfig, BaseAppConfig and OpenSourceAppConfig. Thank you!

@jonasiwnl
Copy link
Contributor Author

Sure, I'll get to work on that.

@jonasiwnl
Copy link
Contributor Author

jonasiwnl commented Jul 18, 2023

Hey, I think there is a missing dependency sentence_transformers. I'm getting an error when trying to initalize an opensourceapp that isn't related to my changes. I ran poetry install --with dev and all that. To recreate:

config = OpenSourceAppConfig()
app = OpenSourceApp(config)

after adding this dep in, all tests pass and the collection feature works as intended.

@cachho
Copy link
Contributor

cachho commented Jul 18, 2023

Hey, I think there is a missing dependency sentence_transformers. I'm getting an error when trying to initalize an opensourceapp that isn't related to my changes. I ran poetry install --with dev and all that. To recreate:

config = OpenSourceAppConfig()
app = OpenSourceApp(config)

after adding this dep in, all tests pass and the collection feature works as intended.

this is expected behavior, since we support open source models (and will support many more). If we make them a requirement everyone has to download them, even if they're just using OpenAI.

@jonasiwnl
Copy link
Contributor Author

Okay, fixed

pyproject.toml Outdated Show resolved Hide resolved
@cachho
Copy link
Contributor

cachho commented Jul 18, 2023

I did the merge conflicts for you.

@cachho cachho mentioned this pull request Jul 19, 2023
13 tasks
embedchain/config/apps/BaseAppConfig.py Outdated Show resolved Hide resolved
embedchain/vectordb/chroma_db.py Outdated Show resolved Hide resolved
@cachho
Copy link
Contributor

cachho commented Aug 2, 2023

  • unified argument order
  • fixed docs
  • set default name back to embedchain_store, otherwise this would break all existing collections
  • added tons of unit tests.

@cachho
Copy link
Contributor

cachho commented Aug 2, 2023

This has been thoroughly tested and is ready for production.

Here's my test script. Compared to the unit tests, this uses dry_run instead of count , but the result is the same, this might just be easier to understand.

from embedchain import App
from embedchain.config import AppConfig

naval_chat_bot = App()
naval_chat_bot.reset()
print("Reset app to run clean test")
naval_chat_bot = App()

# Embed Online Resources
naval_chat_bot.add("web_page", "https://nav.al/feedback")
naval_chat_bot.add("web_page", "https://nav.al/feedback")

# Embed Local Resources
naval_chat_bot.add_local("qna_pair", ("Who is Naval Ravikant?", "Naval Ravikant is an Indian-American entrepreneur and investor."))

naval_chat_bot.query("What unique capacity does Naval argue humans possess when it comes to understanding explanations or concepts?")
# Answer: Naval argues that humans possess the unique capacity to understand explanations or concepts to the maximum extent possible in this physical reality.

# Test: Add them again, it should show that they are already existing.
naval_chat_bot.add("web_page", "https://nav.al/feedback")
naval_chat_bot.add_local("qna_pair", ("Who is Naval Ravikant?", "Naval Ravikant is an Indian-American entrepreneur and investor."))
# Success

# Test: change database. It should allow to add again.
naval_chat_bot.set_collection('another_collection')
naval_chat_bot.add("web_page", "https://nav.al/feedback")
naval_chat_bot.add_local("qna_pair", ("Who is Naval Ravikant?", "Naval Ravikant is an Indian-American entrepreneur and investor."))
# Success

# Test: One collection should not know anything that's in another collection
naval_chat_bot.set_collection('johns_collection')
naval_chat_bot.add('text','My name is John Doe')
naval_chat_bot.set_collection('janes_collection')
r = naval_chat_bot.query("What is my name?", dry_run=True)
print(r)
# Sucess, shows no context.

# Test: Should allow two apps at the same time. We also test setting a collection through the config.
app1 = App(config=AppConfig(collection_name='first_collection'))
app2 = App(config=AppConfig(collection_name='second_collection'))
# Add data to first collection
app1.add("text", "1")
print("App 1:", app1.count())
# Add data to second collection
app2.add("text", "1")
print("App 2:", app2.count())
# App 1 should be unchanged
print("App 1:", app1.count())

@taranjeet
Copy link
Member

@cachho : can we merge this PR?
have seen the code, looks good.
But there is one comment remaining from your side which needs to be reviewed/addressed or closed.
Let me know.

@cachho
Copy link
Contributor

cachho commented Aug 9, 2023

@cachho : can we merge this PR? have seen the code, looks good. But there is one comment remaining from your side which needs to be reviewed/addressed or closed. Let me know.

good to go

@taranjeet taranjeet merged commit eeac84e into mem0ai:main Aug 9, 2023
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature Request] Collection name everywhere
3 participants