
[Bug]: Found duplicated primary key value Summer founders program, which violates the uniqueness constraint of the primary key column. #13754

Open
andysingal opened this issue May 27, 2024 · 1 comment
Labels
bug Something isn't working kuzu triage Issue needs to be triaged/prioritized

Comments

@andysingal

Bug Description

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-11-9642f26cc04c> in <cell line: 6>()
      4 
      5 # NOTE: can take a while!
----> 6 index = KnowledgeGraphIndex.from_documents(
      7     documents,
      8     max_triplets_per_chunk=2,

8 frames
/usr/local/lib/python3.10/dist-packages/kuzu/connection.py in execute(self, query, parameters)
    132         else:
    133             prepared_statement = self.prepare(query) if isinstance(query, str) else query
--> 134             _query_result = self._connection.execute(prepared_statement._prepared_statement, parameters)
    135         if not _query_result.isSuccess():
    136             raise RuntimeError(_query_result.getErrorMessage())

RuntimeError: Runtime exception: Found duplicated primary key value Summer founders program, which violates the uniqueness constraint of the primary key column.

Version

0.10.14

Steps to Reproduce

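Follow the Kuzu knowledge graph demo: https://docs.llamaindex.ai/en/stable/examples/index_structs/knowledge_graph/KuzuGraphDemo/

The only change is that the same Paul Graham essay is downloaded three times under different file names: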

!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay1.txt'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay2.txt'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay3.txt'
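To make the failure mode concrete, here is a small simulation. It is not Kuzu or LlamaIndex code: SQLite stands in for a graph node table whose primary key is the entity name. Because the three files are identical, the same entity (such as "Summer founders program") is likely extracted more than once; a plain insert then violates the primary key, while a check-before-insert upsert keeps a single row. The table and function names (`entity`, `naive_insert`, `upsert`) are hypothetical, and this sketches the general mechanism under that assumption rather than pinpointing where the Kuzu integration issues the duplicate insert.

```python
import sqlite3


def make_node_table():
    # Hypothetical stand-in for a graph node table keyed by entity id (not Kuzu)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entity (id TEXT PRIMARY KEY)")
    return conn


def naive_insert(conn, entity_ids):
    """Insert every extracted entity; raises on the first repeated id."""
    for eid in entity_ids:
        conn.execute("INSERT INTO entity (id) VALUES (?)", (eid,))


def upsert(conn, entity_ids):
    """Check-before-insert: only insert ids that are not already present."""
    for eid in entity_ids:
        exists = conn.execute("SELECT 1 FROM entity WHERE id = ?", (eid,)).fetchone()
        if exists is None:
            conn.execute("INSERT INTO entity (id) VALUES (?)", (eid,))


# Three identical documents can yield the same entity three times
extracted = ["Summer founders program"] * 3

conn = make_node_table()
try:
    naive_insert(conn, extracted)
except sqlite3.IntegrityError as err:
    print("naive insert failed:", err)

conn = make_node_table()
upsert(conn, extracted)
print(conn.execute("SELECT COUNT(*) FROM entity").fetchone()[0])  # prints 1
```

The design choice is the same one a graph store has to make: either deduplicate before writing or check for an existing node on every write.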

Relevant Logs/Tracebacks

Same traceback as under Bug Description above.
@andysingal andysingal added bug Something isn't working triage Issue needs to be triaged/prioritized labels May 27, 2024

dosubot bot commented May 27, 2024

To resolve the RuntimeError caused by the duplicated primary key value "Summer founders program" when creating a KnowledgeGraphIndex from documents with the llama_index package, you need to handle duplicate entities before they are written to the graph store (here, Kuzu). You can use the upsert_triplet_and_node method of the KnowledgeGraphIndex class to handle duplicates when adding nodes and triplets.

Here is an example of how to use this method:

from llama_index.legacy.indices.knowledge_graph.base import KnowledgeGraphIndex
from llama_index.legacy.schema import TextNode
from llama_index.legacy.service_context import ServiceContext

# Create an empty KnowledgeGraphIndex. ServiceContext has no zero-argument
# constructor; from_defaults() builds one with the default LLM and embedding model.
# Without a storage_context, this uses the default in-memory graph store, not Kuzu.
index = KnowledgeGraphIndex([], service_context=ServiceContext.from_defaults())

# Define your triplets and one TextNode per triplet
triplets = [
    ("foo", "is", "bar"),
    ("hello", "is not", "world"),
    ("Jane", "is mother of", "Bob"),
]
nodes = [TextNode(text=str(tup)) for tup in triplets]

# Upsert each triplet together with the node it came from
for triplet, node in zip(triplets, nodes):
    index.upsert_triplet_and_node(triplet, node)

This method ensures that duplicates are handled appropriately when adding triplets and nodes to the KnowledgeGraphIndex.
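If you drive the upserts yourself, exact repeats can also be dropped before they reach the index. The helper below is a minimal order-preserving sketch in plain Python; `dedupe_triplets` is a hypothetical name, not a LlamaIndex API, and this is a client-side mitigation under that assumption rather than a confirmed fix for the Kuzu error.

```python
from typing import Iterable, List, Tuple

# A (subject, relation, object) triplet, as in the examples in this thread
Triplet = Tuple[str, str, str]


def dedupe_triplets(triplets: Iterable[Triplet]) -> List[Triplet]:
    """Hypothetical helper: drop exact repeats, keeping first-seen order."""
    # dict keys are unique and keep insertion order (Python 3.7+)
    return list(dict.fromkeys(triplets))


triplets = [
    ("foo", "is", "bar"),
    ("Jane", "is mother of", "Bob"),
    ("foo", "is", "bar"),  # exact repeat, e.g. from an identical document
]
print(dedupe_triplets(triplets))
# prints [('foo', 'is', 'bar'), ('Jane', 'is mother of', 'Bob')]
```

This only removes repeated triplets within one batch. An entity that appears in several distinct triplets (for example as the object of one and the subject of another) still reaches the graph store more than once, so the store itself still has to check for an existing node before creating it.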

LlamaIndex's own test suite uses upsert_triplet_and_node the same way, including inserting a duplicate. For example:

from unittest.mock import patch
from llama_index.legacy.indices.knowledge_graph.base import KnowledgeGraphIndex
from llama_index.legacy.schema import TextNode
from llama_index.legacy.service_context import ServiceContext

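# mock_extract_triplets (a helper) and mock_service_context (a pytest fixture)
# come from LlamaIndex's test suite and are not defined in this snippet.
@patch.object(KnowledgeGraphIndex, "_extract_triplets", side_effect=mock_extract_triplets)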
def test_build_kg_manual(_patch_extract_triplets, mock_service_context):
    index = KnowledgeGraphIndex([], service_context=mock_service_context)
    tuples = [
        ("foo", "is", "bar"),
        ("hello", "is not", "world"),
        ("Jane", "is mother of", "Bob"),
    ]
    nodes = [TextNode(text=str(tup)) for tup in tuples]
    for tup, node in zip(tuples, nodes):
        index.upsert_triplet_and_node(tup, node)

    # Verify the nodes and triplets
    docstore_nodes = index.docstore.get_nodes(list(index.index_struct.node_ids))
    table_chunks = {n.get_content() for n in docstore_nodes}
    assert len(table_chunks) == 3
    assert "('foo', 'is', 'bar')" in table_chunks
    assert "('hello', 'is not', 'world')" in table_chunks
    assert "('Jane', 'is mother of', 'Bob')" in table_chunks

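    # Test inserting the same triplet and node twice; the second call must not raise.
    # Use a triplet that matches the node text instead of the leftover loop variable.
    tup = ("foo", "is", "bar")
    node = TextNode(text=str(tup), id_="test_node")
    index.upsert_triplet_and_node(tup, node)
    index.upsert_triplet_and_node(tup, node)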

This test case demonstrates that upsert_triplet_and_node can insert the same triplet and node twice without raising. It runs against the default in-memory graph store rather than Kuzu, so it is a starting point rather than a confirmed fix for the duplicated primary key RuntimeError [1][2][3][4].


About Dosu
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

