Currently, we store a new vector for every user, like this:
User 1 -> saves google.com -> first, the old entry is deleted -> the vector ID is google.com/#supermemory-${userid} (which is then reduced to under 63 bytes using seededRandom()) -> the page content is chunked and each chunk is saved to KV under its own ID. A user: userID metadata field is added for retrieval.
User 2 -> saves google.com -> a duplicate is created, and the same steps follow.
Instead, we want to do it like this, so that there are no duplicates:
User 1 -> saves google.com -> the vector ID is google.com (reduced to under 63 bytes using seededRandom()) -> the page content is chunked and each chunk is saved to KV as vectorid-chunkid. THIS TIME, a user_${userid}: 1 metadata key should be added.
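The first-save write path could be sketched like this (a minimal sketch: `buildChunkRecords` and the exact metadata shape are hypothetical names of mine, and the real `seededRandom()` lives in helper.ts):

```typescript
interface ChunkRecord {
  id: string;
  metadata: Record<string, number>;
}

// Derive the per-chunk vector IDs (vectorid-chunkid, per the proposal) and the
// per-user flag metadata (user_${userid}: 1) for a fresh save of a URL.
function buildChunkRecords(
  vectorId: string, // already shortened via seededRandom(url)
  chunkCount: number,
  userId: string,
): ChunkRecord[] {
  const records: ChunkRecord[] = [];
  for (let i = 0; i < chunkCount; i++) {
    records.push({
      id: `${vectorId}-${i}`, // vectorid-chunkid naming
      metadata: { [`user_${userId}`]: 1 }, // per-user flag instead of a shared `user` field
    });
  }
  return records;
}
```

The key point is that the ID no longer encodes the user, so user 2 saving the same URL maps to the same chunk IDs.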
User 2 -> saves google.com -> we use the Vectorize index to fetch the existing vectors by ID:
let ids = ["11", "22", "33", "44"]; // all chunk IDs, which we can get from a KV list with prefix seededRandom(url)
const vectors = await env.VECTORIZE_INDEX.getByIds(ids);
If found, simply env.VECTORIZE_INDEX.upsert() the same vectors, BUT this time with updated metadata on each chunk: user_${userid2}: 1 added to the JSON. We don't need to embed the documents again, and I think we don't even need to update KV (KV is just a lookup from seededRandom(url) to the chunk IDs).
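A sketch of that dedup path, assuming `getByIds`/`upsert` shapes along the lines of the Vectorize bindings (the `VectorIndexLike` interface and the `tagExistingChunks` name are my own, not from the codebase):

```typescript
interface Vector {
  id: string;
  values: number[];
  metadata?: Record<string, unknown>;
}

interface VectorIndexLike {
  getByIds(ids: string[]): Promise<Vector[]>;
  upsert(vectors: Vector[]): Promise<unknown>;
}

// If the URL's chunks already exist, tag them for the new user instead of
// re-embedding: re-upsert the same vectors with user_${userId}: 1 merged in.
// Returns false when nothing is stored yet, so the caller embeds from scratch.
async function tagExistingChunks(
  index: VectorIndexLike,
  chunkIds: string[],
  userId: string,
): Promise<boolean> {
  const vectors = await index.getByIds(chunkIds);
  if (vectors.length === 0) return false;
  await index.upsert(
    vectors.map((v) => ({
      ...v,
      metadata: { ...(v.metadata ?? {}), [`user_${userId}`]: 1 },
    })),
  );
  return true;
}
```

Note the metadata merge: existing user flags must be preserved, otherwise tagging user 2 would untag user 1.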
NOTE: we also need to change the space-adding logic by adding a new metadata key for each space, in the space-userid-spaceid: 1 format.
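A tiny helper showing the proposed key format (a sketch; `addSpaceFlag` is a hypothetical name, and the real space logic may live elsewhere):

```typescript
// Merge a per-(user, space) flag into a chunk's metadata, using the
// space-userid-spaceid: 1 format proposed above.
function addSpaceFlag(
  metadata: Record<string, unknown>,
  userId: string,
  spaceId: string,
): Record<string, unknown> {
  return { ...metadata, [`space-${userId}-${spaceId}`]: 1 };
}
```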
We would also need to update the retrieval logic in /api/chat, which will stay mostly the same except that the filters for spaceid and userid are different.
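The new /api/chat filter could then be built like this (a sketch under the assumption that Vectorize supports equality filters on metadata keys; `buildQueryFilter` is a hypothetical name):

```typescript
// Build the metadata filter for a /api/chat query under the new scheme:
// always scope to the user's flag, and optionally to a space flag.
function buildQueryFilter(
  userId: string,
  spaceId?: string,
): Record<string, number> {
  const filter: Record<string, number> = { [`user_${userId}`]: 1 };
  if (spaceId !== undefined) {
    filter[`space-${userId}-${spaceId}`] = 1;
  }
  return filter;
}
```

This replaces the old `user: userID` equality filter with a check on the per-user flag key.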
This code is mainly here: supermemory/apps/cf-ai-backend/src/helper.ts, line 82 in 5af20f7.