
fix: Performance optimizations and value error when streaming in langfuse #798

Merged: 9 commits into deepset-ai:main on Jun 13, 2024

Conversation

@Redna (Contributor) commented on Jun 10, 2024

Related Issues

Proposed Changes:

  • Flushing can now be controlled via the environment variable HAYSTACK_LANGFUSE_ENFORCE_FLUSH (a hint was added to the documentation that, when it is disabled, flushing needs to be done manually; see the sketch after this list)
  • Flushing is now done only once, before closing the trace
  • Traces are now independent for each pipeline run (the last span is removed from the context list so that the duration of a single pipeline run is traced independently)
  • usage is set to None when it is an empty dict, which resolves the Langfuse value error
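
For illustration only, a minimal sketch of the two behaviours described above: reading an enforce-flush toggle from the environment and normalizing an empty usage dict to None. The helper names, the default, and the accepted values are assumptions, not the PR's actual code.

```python
import os

# Hypothetical helper: read the HAYSTACK_LANGFUSE_ENFORCE_FLUSH toggle.
# The default and the accepted values are assumptions.
def enforce_flush_enabled() -> bool:
    return os.getenv("HAYSTACK_LANGFUSE_ENFORCE_FLUSH", "true").lower() in ("true", "1", "yes")

# Hypothetical helper: Langfuse raises a value error on an empty usage dict,
# so an empty dict is normalized to None before it is attached to the trace.
def normalize_usage(usage: dict | None) -> dict | None:
    return usage or None
```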

How did you test it?

  • Unit tests check enabling and disabling HAYSTACK_LANGFUSE_ENFORCE_FLUSH (a hedged test sketch follows below)
  • Unit tests check that the last trace is ended and removed from the context
[Screenshot: unit test results]
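
For illustration, a hedged sketch of what such an environment-variable test could look like; the helper and the accepted values are the same assumptions as in the sketch above, not the PR's actual test code:

```python
import os
import pytest

# Hypothetical helper (same assumption as above), inlined to keep the example self-contained.
def enforce_flush_enabled() -> bool:
    return os.getenv("HAYSTACK_LANGFUSE_ENFORCE_FLUSH", "true").lower() in ("true", "1", "yes")

# Verify that the toggle is picked up from the environment.
@pytest.mark.parametrize("value, expected", [("true", True), ("false", False)])
def test_enforce_flush_env_var(monkeypatch, value, expected):
    monkeypatch.setenv("HAYSTACK_LANGFUSE_ENFORCE_FLUSH", value)
    assert enforce_flush_enabled() is expected
```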

**Manual tests:**

import os

os.environ["TOKENIZERS_PARALLELISM"] = "false"
os.environ["HAYSTACK_CONTENT_TRACING_ENABLED"]="true"
os.environ["LANGFUSE_SECRET_KEY"]="<secret_key>"
os.environ["LANGFUSE_PUBLIC_KEY"]="<public_key>"
os.environ["LANGFUSE_HOST"]="https://cloud.langfuse.com"


from haystack import Document, Pipeline
from haystack.components.builders.prompt_builder import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.tracing import tracer
from haystack.utils import Secret

from haystack_integrations.components.connectors.langfuse import LangfuseConnector

def print_callback(data):
    # Streaming callback: print each chunk's content as it arrives
    print(data.content, end="")

docstore = InMemoryDocumentStore()
docstore.write_documents([
    Document(content="Rome is the capital of Italy"),
    Document(content="Paris is the capital of France"),
])

query = "What is the capital of France?"

template = """
Given the following information, answer the question.

Context: 
{% for document in documents %}
    {{ document.content }}
{% endfor %}

Question: {{ query }}?
"""
pipe = Pipeline()

langfuse_connector = LangfuseConnector(name="TEST")

pipe.add_component("tracer", langfuse_connector)

pipe.add_component("retriever", InMemoryBM25Retriever(document_store=docstore))
pipe.add_component("prompt_builder", PromptBuilder(template=template))
pipe.add_component("llm", OpenAIGenerator(api_key=Secret.from_token("<your-api-key>"),
                                          api_base_url="http:https://localhost:30091/v1",
                                          streaming_callback=print_callback))

pipe.connect("retriever", "prompt_builder.documents")
pipe.connect("prompt_builder", "llm")

try:
    res = pipe.run({
        "prompt_builder": {
            "query": query
        },
        "retriever": {
            "query": query
        }
    })

    print(res)
finally:
    # Flush manually before exiting so no trace data is lost
    tracer.actual_tracer.flush()

**Enforce flush disabled**

[Screenshot: Langfuse trace timing with enforced flush disabled]

**Enforce flush enabled**

[Screenshot: Langfuse trace timing with enforced flush enabled]

Notes for the reviewer

Notice the difference between the two settings: roughly 4 s versus 40 ms. I think it is crucial to give the user the option to flush the data lazily.
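
For clarity, a minimal usage sketch of the lazy-flush setup (the value "false" and how the toggle is interpreted are assumptions): disable enforced flushing via the environment and flush once at the end, as the manual test above does.

```python
import os

# Assumption: "false" disables enforced per-trace flushing.
os.environ["HAYSTACK_LANGFUSE_ENFORCE_FLUSH"] = "false"

from haystack.tracing import tracer

# ... build and run pipelines as in the manual test above ...

# Flush once at the end instead of after every trace.
tracer.actual_tracer.flush()
```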

Checklist

@github-actions bot added the integration:langfuse and type:documentation (Improvements or additions to documentation) labels on Jun 10, 2024
@Redna marked this pull request as ready for review on June 10, 2024 11:01
@Redna requested a review from a team as a code owner on June 10, 2024 11:01
@Redna requested review from vblagoje and removed the request for a team on June 10, 2024 11:01
@anakin87 requested a review from masci on June 10, 2024 12:41
@anakin87 (Member) commented:

@masci feel free to review it if/when you have time.

@masci removed the request for review from vblagoje on June 10, 2024 17:16
@Redna requested a review from masci on June 12, 2024 20:09
@masci (Contributor) left a comment:

Thanks for the PR, it looks good to me!

@masci merged commit bf5c641 into deepset-ai:main on Jun 13, 2024
10 checks passed