louisbrulenaudet (Louis Brulé Naudet)

Posts 11

Post

991

Understanding the json format response with HF's Serverless Inference API 🤗

As it stands, there seems to be an inconsistency with the OpenAI documentation on the question of implementing the JSON response format using the InferenceClient completion API.

After investigating the InferenceClient source code, I share the official solution using a JSON Schema. This consolidates the structure of the response and simplifies parsing as part of an automated process for extracting metadata, information:

from huggingface_hub import InferenceClient

client = InferenceClient("meta-llama/Meta-Llama-3-70B-Instruct")

messages = [
    {
        "role": "user",
        "content": "I saw a puppy a cat and a raccoon during my bike ride in the park. What did I saw and when?",
    },
]

response_format = {
    "type": "json",
    "value": {
        "properties": {
            "location": {"type": "string"},
            "activity": {"type": "string"},
            "animals_seen": {"type": "integer", "minimum": 1, "maximum": 5},
            "animals": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["location", "activity", "animals_seen", "animals"],
    },
}

response = client.chat_completion(
    messages=messages,
    response_format=response_format,
    max_tokens=500,
)

print(response.choices[0].message.content)

As a reminder, json mode is activated with the OpenAI client as follows:

response = client.chat.completions.create(
     model="gpt-3.5-turbo-0125",
     messages=[...],
     response_format={"type": "json_object"}
)

One question remains unanswered, however, and will perhaps be answered by the community: it seems that an incompatibility persists for list of dictionaries generation, and currently, the production of simple dictionaries seems to be the only functional option.

Post

2715

🚀 RAGoon is now available on PyPI, GitHub, and as a Space on Hugging Face for batched embeddings generation 🤗

RAGoon is a set of NLP utilities for multi-model embedding production, high-dimensional vector visualization, and aims to improve language model performance by providing contextually relevant information through search-based querying, web scraping and data augmentation techniques.

At this stage, 5 major classes are available via RAGoon to facilitate:
- the production of chain embeddings for several models to simplify a continuous deployment process;
- production of LLM requests for web querying and content retrieval via the Google API;
- recursive chunking via tokens;
- data visualization and the function to load embeddings from a FAISS index, reduce their dimensionality using PCA and/or t-SNE, and visualize them in an interactive 3D graph;
- the creation of binary indexes for search with scalar (int8) rescoring.

Link to GitHub: https://github.com/louisbrulenaudet/ragoon
Link to the 🤗 Space: louisbrulenaudet/ragoon

View all posts