
[Question]: Getting ValueError Invalid literal for base 10 while vectorizing #14535

Open
1 task done
SiddharthMurjani opened this issue Jul 3, 2024 · 3 comments
Labels
question Further information is requested stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed

Comments

@SiddharthMurjani

SiddharthMurjani commented Jul 3, 2024

Question Validation

  • I have searched both the documentation and discord for an answer.

Question

I am vectorizing using the code below:

embed_model = llm_class(api_key=os.getenv("OPENAI_API_KEY"), deployment=model_type, model=model_type, **kwargs)
Settings.embed_model = embed_model

def save_llama(content, metadata, storage_context):
    document = Document(text=content, metadata=metadata)
    VectorStoreIndex.from_documents(
        documents=[document],
        storage_context=storage_context,
        show_progress=True
    )
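For context, the exception in the title is Python's built-in int() parse error, raised whenever int() is handed a string that is not a plain base-10 integer (a float rendered as text, for example). A quick sketch:

```python
# Reproducing the exception class named in the issue title:
# int() raises ValueError when the string is not a plain base-10 integer.
print(int("42"))          # a clean integer string parses fine
try:
    int("42.0")           # a float-like string does not
except ValueError as err:
    print(err)            # invalid literal for int() with base 10: '42.0'
```

So somewhere in the vectorizing path, a non-integer string is reaching an int() call.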
@SiddharthMurjani SiddharthMurjani added the question Further information is requested label Jul 3, 2024

dosubot bot commented Jul 3, 2024

To resolve the ValueError: Invalid literal for base 10 error while vectorizing, ensure that the vector store is properly initialized and passed to the query. The error message indicates that a vector store is required for the vector store query.

Here is the corrected code snippet:

embed_model = llm_class(api_key=os.getenv("OPENAI_API_KEY"), deployment=model_type, model=model_type, **kwargs)
Settings.embed_model = embed_model

def save_llama(content, metadata, storage_context):
    document = Document(text=content, metadata=metadata)
    
    # Ensure the vector store is properly loaded or initialized
    vector_store = SomeVectorStore.load_from_disk('vector_store.json')  # Replace with actual vector store loading method
    
    VectorStoreIndex.from_documents(
        documents=[document],
        storage_context=storage_context,
        vector_store=vector_store,  # Pass the vector store to the query
        show_progress=True
    )

Make sure to replace SomeVectorStore.load_from_disk('vector_store.json') with the actual method to load or initialize your vector store. This should resolve the ValueError by ensuring that the vector store is provided for the vector store query [1].

To continue talking to Dosu, mention @dosu.

@SiddharthMurjani
Author

After debugging, I found that no index_struct is being returned.

        with self._callback_manager.as_trace("index_construction"):
            if index_struct is None:
                nodes = nodes or []
                index_struct = self.build_index_from_nodes(
                    nodes + objects  # type: ignore
                )

And that's why the VectorStoreIndex is not being initialized:

class VectorStoreIndex(BaseIndex[IndexDict]):
    """Vector Store Index.

    Args:
        use_async (bool): Whether to use asynchronous calls. Defaults to False.
        show_progress (bool): Whether to show tqdm progress bars. Defaults to False.
        store_nodes_override (bool): set to True to always store Node objects in index
            store and document store even if vector store keeps text. Defaults to False
    """

    index_struct_cls = IndexDict

    def __init__(
        self,
        nodes: Optional[Sequence[BaseNode]] = None,
        # vector store index params
        use_async: bool = False,
        store_nodes_override: bool = False,
        embed_model: Optional[EmbedType] = None,
        insert_batch_size: int = 2048,
        # parent class params
        objects: Optional[Sequence[IndexNode]] = None,
        index_struct: Optional[IndexDict] = None,
        storage_context: Optional[StorageContext] = None,
        callback_manager: Optional[CallbackManager] = None,
        transformations: Optional[List[TransformComponent]] = None,
        show_progress: bool = False,
        # deprecated
        service_context: Optional[ServiceContext] = None,
        **kwargs: Any,
    ) -> None:
        """Initialize params."""
        self._use_async = use_async
        self._store_nodes_override = store_nodes_override
        self._embed_model = (
            resolve_embed_model(embed_model, callback_manager=callback_manager)
            if embed_model
            else embed_model_from_settings_or_context(Settings, service_context)
        )

        self._insert_batch_size = insert_batch_size
        super().__init__(
            nodes=nodes,
            index_struct=index_struct,
            service_context=service_context,
            storage_context=storage_context,
            show_progress=show_progress,
            objects=objects,
            callback_manager=callback_manager,
            transformations=transformations,
            **kwargs,
        )

@logan-markewich
Collaborator

I don't think this is related to anything with VectorStoreIndex -- I think your embedding model is not returning raw floats for embedding values.

I don't know what llm_class is doing, but that is likely the issue.

You can easily check with embeddings = embed_model.get_text_embedding("Hello world") and ensure the returned type is a list of floats. It seems like it might be returning numpy values or something else.
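The check above can be sketched with a stub standing in for the real embed_model (the stub and its output are assumptions; substitute your configured model):

```python
# Minimal sketch of the embedding-type check. StubEmbedModel is a
# placeholder assumption -- swap in your actual embed_model.
class StubEmbedModel:
    def get_text_embedding(self, text: str) -> list:
        return [0.1, 0.2, 0.3]

embed_model = StubEmbedModel()
embeddings = embed_model.get_text_embedding("Hello world")

# The index code expects a plain list of Python floats; anything else
# (e.g. a numpy array, or numpy scalars inside the list) is a red flag.
print(isinstance(embeddings, list))                   # True
print(all(isinstance(v, float) for v in embeddings))  # True
```

If the real model returns a numpy array, converting it with .tolist() yields plain Python floats before handing the values to the index.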

@dosubot dosubot bot added the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Oct 2, 2024