[Bug]: QdrantVectorStore parser always expects a key called "text" in response #13831

Open
deveshasha opened this issue May 30, 2024 · 2 comments
Labels
bug Something isn't working triage Issue needs to be triaged/prioritized

Comments

deveshasha commented May 30, 2024

Bug Description

The method parse_to_query_result in QdrantVectorStore always expects a "text" key in the metadata. This raises an error if the metadata does not contain a key called "text".

Line 740 in llama-index/vector_stores/qdrant/base.py

    def parse_to_query_result(self, response: List[Any]) -> VectorStoreQueryResult:
        """
        Convert vector store response to VectorStoreQueryResult.

        Args:
            response: List[Any]: List of results returned from the vector store.
        """
        nodes = []
        similarities = []
        ids = []

        for point in response:
            payload = cast(Payload, point.payload)
            try:
                node = metadata_dict_to_node(payload)
            except Exception:
                metadata, node_info, relationships = legacy_metadata_dict_to_node(
                    payload
                )

                node = TextNode(
                    id_=str(point.id),
                    text=payload.get("text"),  # <----- this should not be hardcoded
                    metadata=metadata,
                    start_char_idx=node_info.get("start", None),
                    end_char_idx=node_info.get("end", None),
                    relationships=relationships,
                )
            nodes.append(node)
            similarities.append(point.score)
            ids.append(str(point.id))

        return VectorStoreQueryResult(nodes=nodes, similarities=similarities, ids=ids)

This should not be the case: it forces users to store a "text" key in the vector DB metadata. The user should be able to pass a parameter naming the key that holds the text content, or should at least be warned, instead of getting a parsing error.
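
The configurable key requested above could look roughly like the following. This is only a sketch, not the library's API: `extract_text`, its `text_key` parameter, and the empty-string fallback are hypothetical choices made for illustration, and a plain dict stands in for the Qdrant payload.

```python
import warnings
from typing import Any, Mapping


def extract_text(payload: Mapping[str, Any], text_key: str = "text") -> str:
    """Return the text stored under ``text_key``, or "" with a warning.

    Hypothetical helper: neither the name nor the ``text_key`` parameter
    exists in QdrantVectorStore; they only illustrate the requested behavior.
    """
    value = payload.get(text_key)
    if value is None:
        # Key is missing or explicitly null: warn instead of failing later
        # with a validation error, and fall back to an empty string.
        warnings.warn(
            f"Payload has no usable {text_key!r} key; using empty text. "
            f"Available keys: {sorted(payload)}",
            stacklevel=2,
        )
        return ""
    return str(value)
```

With this shape, a collection that stores its content under `"content"` would be read with `extract_text(payload, text_key="content")`, and a payload without the key produces a warning rather than a `ValidationError`.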

Version

0.10.39

Steps to Reproduce

  1. Create a collection in Qdrant that does not have a "text" key in its metadata.
  2. Try to retrieve any node from the collection by running any query.

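import qdrant_client
from llama_index.core import VectorStoreIndex
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.vector_stores.qdrant import QdrantVectorStore

# QDRANT_HOST, collection, and postprocessors are defined elsewhere in the reporter's code.
vec_db_client = qdrant_client.QdrantClient(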
    host=QDRANT_HOST,
    port=443,
    https=True,
)

vec_index = VectorStoreIndex.from_vector_store(
    vector_store=QdrantVectorStore(
        client=vec_db_client, collection_name=collection
    )
)

retriever = VectorIndexRetriever(index=vec_index, similarity_top_k=10)

query_engine = RetrieverQueryEngine(
    retriever=retriever,
    node_postprocessors=postprocessors,
)

query_engine.retrieve("query")  # <---- Raises error

Relevant Logs/Tracebacks

{
	"name": "ValidationError",
	"message": "1 validation error for TextNode
text
  none is not an allowed value (type=type_error.none.not_allowed)",
	"stack": "---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
File ~/miniconda3/envs/gptllama/lib/python3.10/site-packages/llama_index/vector_stores/qdrant/base.py:757, in QdrantVectorStore.parse_to_query_result(self, response)
    756 try:
--> 757     node = metadata_dict_to_node(payload)
    758 except Exception:

File ~/miniconda3/envs/gptllama/lib/python3.10/site-packages/llama_index/core/vector_stores/utils.py:70, in metadata_dict_to_node(metadata, text)
     69 if node_json is None:
---> 70     raise ValueError(\"Node content not found in metadata dict.\")
     72 node: BaseNode

ValueError: Node content not found in metadata dict.

During handling of the above exception, another exception occurred:

ValidationError                           Traceback (most recent call last)
Cell In[6], line 1
----> 1 result = search.query(query=\"query?\", collection=\"test\")

Cell In[1], line 210, in Search.query(self, query, collection)
    204 # Query engine is used only for retrieval
    205 query_engine = RetrieverQueryEngine(
    206     retriever=retriever,
    207     node_postprocessors=postprocessors,
    208 )
--> 210 retrieved_nodes = query_engine.retrieve(query)
    212 logger.info(f\"Retrieved nodes: {retrieved_nodes}\")
    214 if len(retrieved_nodes) == 0:

File ~/miniconda3/envs/gptllama/lib/python3.10/site-packages/llama_index/core/query_engine/retriever_query_engine.py:144, in RetrieverQueryEngine.retrieve(self, query_bundle)
    143 def retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
--> 144     nodes = self._retriever.retrieve(query_bundle)
    145     return self._apply_node_postprocessors(nodes, query_bundle=query_bundle)

File ~/miniconda3/envs/gptllama/lib/python3.10/site-packages/llama_index/core/instrumentation/dispatcher.py:274, in Dispatcher.span.<locals>.wrapper(func, instance, args, kwargs)
    270 self.span_enter(
    271     id_=id_, bound_args=bound_args, instance=instance, parent_id=parent_id
    272 )
    273 try:
--> 274     result = func(*args, **kwargs)
    275 except BaseException as e:
    276     self.event(SpanDropEvent(span_id=id_, err_str=str(e)))

File ~/miniconda3/envs/gptllama/lib/python3.10/site-packages/llama_index/core/base/base_retriever.py:244, in BaseRetriever.retrieve(self, str_or_query_bundle)
    239 with self.callback_manager.as_trace(\"query\"):
    240     with self.callback_manager.event(
    241         CBEventType.RETRIEVE,
    242         payload={EventPayload.QUERY_STR: query_bundle.query_str},
    243     ) as retrieve_event:
--> 244         nodes = self._retrieve(query_bundle)
    245         nodes = self._handle_recursive_retrieval(query_bundle, nodes)
    246         retrieve_event.on_end(
    247             payload={EventPayload.NODES: nodes},
    248         )

File ~/miniconda3/envs/gptllama/lib/python3.10/site-packages/llama_index/core/instrumentation/dispatcher.py:274, in Dispatcher.span.<locals>.wrapper(func, instance, args, kwargs)
    270 self.span_enter(
    271     id_=id_, bound_args=bound_args, instance=instance, parent_id=parent_id
    272 )
    273 try:
--> 274     result = func(*args, **kwargs)
    275 except BaseException as e:
    276     self.event(SpanDropEvent(span_id=id_, err_str=str(e)))

File ~/miniconda3/envs/gptllama/lib/python3.10/site-packages/llama_index/core/indices/vector_store/retrievers/retriever.py:101, in VectorIndexRetriever._retrieve(self, query_bundle)
     95     if query_bundle.embedding is None and len(query_bundle.embedding_strs) > 0:
     96         query_bundle.embedding = (
     97             self._embed_model.get_agg_embedding_from_queries(
     98                 query_bundle.embedding_strs
     99             )
    100         )
--> 101 return self._get_nodes_with_embeddings(query_bundle)

File ~/miniconda3/envs/gptllama/lib/python3.10/site-packages/llama_index/core/indices/vector_store/retrievers/retriever.py:177, in VectorIndexRetriever._get_nodes_with_embeddings(self, query_bundle_with_embeddings)
    173 def _get_nodes_with_embeddings(
    174     self, query_bundle_with_embeddings: QueryBundle
    175 ) -> List[NodeWithScore]:
    176     query = self._build_vector_store_query(query_bundle_with_embeddings)
--> 177     query_result = self._vector_store.query(query, **self._kwargs)
    178     return self._build_node_list_from_query_result(query_result)

File ~/miniconda3/envs/gptllama/lib/python3.10/site-packages/llama_index/vector_stores/qdrant/base.py:605, in QdrantVectorStore.query(self, query, **kwargs)
    598 else:
    599     response = self._client.search(
    600         collection_name=self.collection_name,
    601         query_vector=query_embedding,
    602         limit=query.similarity_top_k,
    603         query_filter=query_filter,
    604     )
--> 605     return self.parse_to_query_result(response)

File ~/miniconda3/envs/gptllama/lib/python3.10/site-packages/llama_index/vector_stores/qdrant/base.py:763, in QdrantVectorStore.parse_to_query_result(self, response)
    758 except Exception:
    759     metadata, node_info, relationships = legacy_metadata_dict_to_node(
    760         payload
    761     )
--> 763     node = TextNode(
    764         id_=str(point.id),
    765         text=payload.get(\"text\"),
    766         metadata=metadata,
    767         start_char_idx=node_info.get(\"start\", None),
    768         end_char_idx=node_info.get(\"end\", None),
    769         relationships=relationships,
    770     )
    771 nodes.append(node)
    772 similarities.append(point.score)

File ~/miniconda3/envs/gptllama/lib/python3.10/site-packages/pydantic/v1/main.py:341, in BaseModel.__init__(__pydantic_self__, **data)
    339 values, fields_set, validation_error = validate_model(__pydantic_self__.__class__, data)
    340 if validation_error:
--> 341     raise validation_error
    342 try:
    343     object_setattr(__pydantic_self__, '__dict__', values)

ValidationError: 1 validation error for TextNode
text
  none is not an allowed value (type=type_error.none.not_allowed)"
}
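
The traceback above contains two exceptions because the fallback path fails while the first error is still being handled. The sketch below imitates that shape with stand-in functions. `parse_current_format` and `build_legacy_node` are hypothetical and only mimic the behavior visible in the traceback (a `ValueError` when node content is missing, then a rejection of `text=None`); the `_node_content` key name is an assumption about what `metadata_dict_to_node` looks for.

```python
from typing import Any, Dict


def parse_current_format(payload: Dict[str, Any]) -> str:
    # Stand-in for metadata_dict_to_node: needs serialized node content.
    node_json = payload.get("_node_content")  # assumed key name
    if node_json is None:
        raise ValueError("Node content not found in metadata dict.")
    return node_json


def build_legacy_node(payload: Dict[str, Any]) -> str:
    # Stand-in for TextNode(text=payload.get("text")): None is rejected.
    text = payload.get("text")
    if text is None:
        raise TypeError("text: none is not an allowed value")
    return text


def parse_point(payload: Dict[str, Any]) -> str:
    try:
        return parse_current_format(payload)
    except Exception:
        # Raising inside this handler chains the new error onto the
        # ValueError, which is why the log shows
        # "During handling of the above exception, another exception occurred".
        return build_legacy_node(payload)
```

A payload with neither key fails in the fallback, so the error the user sees is the second one, with the original `ValueError` attached as its context.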
@deveshasha deveshasha added bug Something isn't working triage Issue needs to be triaged/prioritized labels May 30, 2024

dosubot bot commented May 30, 2024

@deveshasha Hello there! I'm here to help you with any issues you encounter. If you have any questions or need assistance with bug fixes, feel free to ask!

To address the issue where the parse_to_query_result method in the QdrantVectorStore class expects a "text" key in the metadata, you can modify the method to check if the "text" key exists before attempting to access it. If the key does not exist, you can handle it gracefully by setting a default value or skipping the operation.

Here is an example modification:

def parse_to_query_result(self, response: List[Any]) -> VectorStoreQueryResult:
    """
    Convert vector store response to VectorStoreQueryResult.

    Args:
        response: List[Any]: List of results returned from the vector store.
    """
    nodes = []
    similarities = []
    ids = []

    for point in response:
        payload = cast(Payload, point.payload)
        try:
            node = metadata_dict_to_node(payload)
        except Exception:
            metadata, node_info, relationships = legacy_metadata_dict_to_node(
                payload
            )

            text = payload.get("text", "Default text")  # Check if "text" key exists
            node = TextNode(
                id_=str(point.id),
                text=text,
                metadata=metadata,
                start_char_idx=node_info.get("start", None),
                end_char_idx=node_info.get("end", None),
                relationships=relationships,
            )
        nodes.append(node)
        ids.append(str(point.id))
        try:
            similarities.append(point.score)
        except AttributeError:
            # certain requests do not return a score
            similarities.append(1.0)

    return VectorStoreQueryResult(nodes=nodes, similarities=similarities, ids=ids)

In this modification, the payload.get("text", "Default text") line checks if the "text" key exists in the payload. If it does not, it sets text to "Default text". This approach ensures that the method handles missing "text" keys gracefully by providing a default value [1][2][3].
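
One caveat with this approach, shown here with plain dicts: `dict.get(key, default)` only uses the default when the key is absent. If a payload stores `"text": None` explicitly, the default is skipped and `None` would still reach `TextNode`. A sketch of a stricter fallback follows; `text_or_default` is a hypothetical helper, and the empty-string default is an assumption, not something this thread settles on.

```python
from typing import Any, Dict


def text_or_default(payload: Dict[str, Any], default: str = "") -> str:
    """Return payload["text"], substituting ``default`` for missing or None."""
    text = payload.get("text")
    return default if text is None else str(text)


missing = {"content": "stored under another key"}
explicit_none = {"text": None}

# dict.get falls back only when the key is absent.
assert missing.get("text", "Default text") == "Default text"
assert explicit_none.get("text", "Default text") is None

# The helper covers both cases.
assert text_or_default(missing) == ""
assert text_or_default(explicit_none) == ""
```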

@cwtsoftware

I was getting the same error:

ValidationError: 1 validation error for TextNode
text
  none is not an allowed value (type=type_error.none.not_allowed)

I was creating a VectorStoreIndex from a collection that had already been created a few weeks earlier.

After some hours of debugging and trying things out, I realized that creating a new collection in Qdrant with OpenAI embeddings resolved the error.

I did not research what exactly caused the bug, but I guess something changed in the way the Qdrant vector store is created, and that no longer matched the older way of creating the embeddings.
I'm not sure if that makes sense, but it resolved the error. Just sharing in case someone else has the same problem :)
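
If an older collection triggers this error, inspecting one stored payload may show which shape it has before deciding to rebuild. The following is a rough diagnostic sketch, not part of LlamaIndex: `classify_payload` is a hypothetical helper, and the `_node_content` key name is an assumption about the current storage format. It takes a plain dict, such as the payload of a point fetched from Qdrant.

```python
from typing import Any, Mapping


def classify_payload(payload: Mapping[str, Any]) -> str:
    """Guess how a stored payload would be handled when parsed."""
    if payload.get("_node_content") is not None:  # assumed key name
        return "current"      # serialized node content is present
    if payload.get("text") is not None:
        return "legacy"       # fallback path can still build a text node
    return "unparseable"      # neither key is usable: the error path above
```

A result of `"unparseable"` for points in an existing collection would be consistent with the `ValidationError` reported in this issue.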
