dataset has a column question but Dataset constructor is throwing an Error that it doesn't #1961

Closed
osok opened this issue Jun 20, 2024 · 1 comment


osok commented Jun 20, 2024

Issue Type

Bug

Source

source

Giskard Library Version

2.14.0

Giskard Hub Version

not running hub

OS Platform and Distribution

Ubuntu 22.04.4 LTS

Python version

Python 3.9.19

Installed python packages

conda list
# packages in environment at /home/michael/anaconda3/envs/giskard:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
aiohttp                   3.9.5                    pypi_0    pypi
aiosignal                 1.3.1                    pypi_0    pypi
annotated-types           0.7.0                    pypi_0    pypi
anyio                     4.4.0                    pypi_0    pypi
asttokens                 2.0.5              pyhd3eb1b0_0    anaconda
async-timeout             4.0.3                    pypi_0    pypi
attrs                     23.2.0                   pypi_0    pypi
backcall                  0.2.0              pyhd3eb1b0_0    anaconda
bert-score                0.3.13                   pypi_0    pypi
bokeh                     3.4.1                    pypi_0    pypi
bzip2                     1.0.8                hd590300_5    conda-forge
ca-certificates           2023.08.22           h06a4308_0    anaconda
cachetools                5.3.3                    pypi_0    pypi
certifi                   2024.6.2                 pypi_0    pypi
chardet                   5.2.0                    pypi_0    pypi
charset-normalizer        3.3.2                    pypi_0    pypi
click                     8.1.7                    pypi_0    pypi
cloudpickle               3.0.0                    pypi_0    pypi
colorama                  0.4.6                    pypi_0    pypi
comm                      0.1.2            py39h06a4308_0    anaconda
contourpy                 1.2.1                    pypi_0    pypi
cycler                    0.12.1                   pypi_0    pypi
dataclasses-json          0.6.7                    pypi_0    pypi
datasets                  2.20.0                   pypi_0    pypi
debugpy                   1.6.7            py39h6a678d5_0    anaconda
decorator                 5.1.1              pyhd3eb1b0_0    anaconda
deprecated                1.2.14                   pypi_0    pypi
dill                      0.3.8                    pypi_0    pypi
distro                    1.9.0                    pypi_0    pypi
docopt                    0.6.2                    pypi_0    pypi
entrypoints               0.4                      pypi_0    pypi
evaluate                  0.4.2                    pypi_0    pypi
exceptiongroup            1.2.1                    pypi_0    pypi
executing                 0.8.3              pyhd3eb1b0_0    anaconda
faiss-cpu                 1.8.0                    pypi_0    pypi
filelock                  3.15.3                   pypi_0    pypi
fonttools                 4.53.0                   pypi_0    pypi
frozenlist                1.4.1                    pypi_0    pypi
fsspec                    2024.5.0                 pypi_0    pypi
giskard                   2.14.0                   pypi_0    pypi
gitdb                     4.0.11                   pypi_0    pypi
gitpython                 3.1.43                   pypi_0    pypi
greenlet                  3.0.3                    pypi_0    pypi
griffe                    0.47.0                   pypi_0    pypi
h11                       0.14.0                   pypi_0    pypi
httpcore                  1.0.5                    pypi_0    pypi
httpx                     0.27.0                   pypi_0    pypi
huggingface-hub           0.23.4                   pypi_0    pypi
idna                      3.7                      pypi_0    pypi
importlib-metadata        7.1.0                    pypi_0    pypi
importlib-resources       6.4.0                    pypi_0    pypi
importlib_metadata        6.0.0                hd3eb1b0_0    anaconda
ipykernel                 6.25.0           py39h2f386ee_0    anaconda
ipython                   8.15.0           py39h06a4308_0    anaconda
jedi                      0.18.1           py39h06a4308_1    anaconda
jinja2                    3.1.4                    pypi_0    pypi
joblib                    1.4.2                    pypi_0    pypi
jsonpatch                 1.33                     pypi_0    pypi
jsonpointer               3.0.0                    pypi_0    pypi
jupyter_client            8.6.0            py39h06a4308_0    anaconda
jupyter_core              5.5.0            py39h06a4308_0    anaconda
kiwisolver                1.4.5                    pypi_0    pypi
langchain                 0.2.5                    pypi_0    pypi
langchain-community       0.2.5                    pypi_0    pypi
langchain-core            0.2.9                    pypi_0    pypi
langchain-openai          0.1.8                    pypi_0    pypi
langchain-text-splitters  0.2.1                    pypi_0    pypi
langdetect                1.0.9                    pypi_0    pypi
langsmith                 0.1.81                   pypi_0    pypi
ld_impl_linux-64          2.40                 hf3520f5_7    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libgcc-ng                 13.2.0              h77fa898_11    conda-forge
libgomp                   13.2.0              h77fa898_11    conda-forge
libnsl                    2.0.1                hd590300_0    conda-forge
libsodium                 1.0.18               h7b6447c_0    anaconda
libsqlite                 3.46.0               hde9e2c9_0    conda-forge
libstdcxx-ng              11.2.0               h1234567_1    anaconda
libuuid                   2.38.1               h0b41bf4_0    conda-forge
libxcrypt                 4.4.36               hd590300_1    conda-forge
libzlib                   1.3.1                h4ab18f5_1    conda-forge
llvmlite                  0.43.0                   pypi_0    pypi
markdown                  3.6                      pypi_0    pypi
markupsafe                2.1.5                    pypi_0    pypi
marshmallow               3.21.3                   pypi_0    pypi
matplotlib                3.9.0                    pypi_0    pypi
matplotlib-inline         0.1.6            py39h06a4308_0    anaconda
mixpanel                  4.10.1                   pypi_0    pypi
mlflow-skinny             2.14.1                   pypi_0    pypi
mpmath                    1.3.0                    pypi_0    pypi
multidict                 6.0.5                    pypi_0    pypi
multiprocess              0.70.16                  pypi_0    pypi
mypy-extensions           1.0.0                    pypi_0    pypi
ncurses                   6.5                  h59595ed_0    conda-forge
nest-asyncio              1.5.6            py39h06a4308_0    anaconda
networkx                  3.2.1                    pypi_0    pypi
num2words                 0.5.13                   pypi_0    pypi
numba                     0.60.0                   pypi_0    pypi
numpy                     1.26.4                   pypi_0    pypi
nvidia-cublas-cu12        12.1.3.1                 pypi_0    pypi
nvidia-cuda-cupti-cu12    12.1.105                 pypi_0    pypi
nvidia-cuda-nvrtc-cu12    12.1.105                 pypi_0    pypi
nvidia-cuda-runtime-cu12  12.1.105                 pypi_0    pypi
nvidia-cudnn-cu12         8.9.2.26                 pypi_0    pypi
nvidia-cufft-cu12         11.0.2.54                pypi_0    pypi
nvidia-curand-cu12        10.3.2.106               pypi_0    pypi
nvidia-cusolver-cu12      11.4.5.107               pypi_0    pypi
nvidia-cusparse-cu12      12.1.0.106               pypi_0    pypi
nvidia-nccl-cu12          2.20.5                   pypi_0    pypi
nvidia-nvjitlink-cu12     12.5.40                  pypi_0    pypi
nvidia-nvtx-cu12          12.1.105                 pypi_0    pypi
openai                    1.35.1                   pypi_0    pypi
openssl                   3.3.1                h4ab18f5_0    conda-forge
opentelemetry-api         1.25.0                   pypi_0    pypi
opentelemetry-sdk         1.25.0                   pypi_0    pypi
opentelemetry-semantic-conventions 0.46b0                   pypi_0    pypi
orjson                    3.10.5                   pypi_0    pypi
packaging                 24.1                     pypi_0    pypi
pandas                    2.2.2                    pypi_0    pypi
parso                     0.8.3              pyhd3eb1b0_0    anaconda
pexpect                   4.8.0              pyhd3eb1b0_3    anaconda
pickleshare               0.7.5           pyhd3eb1b0_1003    anaconda
pillow                    10.3.0                   pypi_0    pypi
pip                       24.0               pyhd8ed1ab_0    conda-forge
platformdirs              3.10.0           py39h06a4308_0    anaconda
prompt-toolkit            3.0.36           py39h06a4308_0    anaconda
protobuf                  4.25.3                   pypi_0    pypi
psutil                    5.9.0            py39h5eee18b_0    anaconda
ptyprocess                0.7.0              pyhd3eb1b0_2    anaconda
pure_eval                 0.2.2              pyhd3eb1b0_0    anaconda
pyarrow                   16.1.0                   pypi_0    pypi
pyarrow-hotfix            0.6                      pypi_0    pypi
pydantic                  2.7.4                    pypi_0    pypi
pydantic-core             2.18.4                   pypi_0    pypi
pygments                  2.15.1           py39h06a4308_1    anaconda
pymupdf                   1.24.5                   pypi_0    pypi
pymupdfb                  1.24.3                   pypi_0    pypi
pynndescent               0.5.13                   pypi_0    pypi
pyparsing                 3.1.2                    pypi_0    pypi
pypdf                     3.17.0                   pypi_0    pypi
python                    3.9.19          h0755675_0_cpython    conda-forge
python-dateutil           2.9.0.post0              pypi_0    pypi
pytz                      2024.1                   pypi_0    pypi
pyyaml                    6.0.1                    pypi_0    pypi
pyzmq                     25.1.0           py39h6a678d5_0    anaconda
readline                  8.2                  h8228510_1    conda-forge
regex                     2024.5.15                pypi_0    pypi
requests                  2.32.3                   pypi_0    pypi
requests-toolbelt         1.0.0                    pypi_0    pypi
safetensors               0.4.3                    pypi_0    pypi
scikit-learn              1.5.0                    pypi_0    pypi
scipy                     1.11.4                   pypi_0    pypi
sentence-transformers     3.0.1                    pypi_0    pypi
sentry-sdk                2.6.0                    pypi_0    pypi
setuptools                70.1.0             pyhd8ed1ab_0    conda-forge
six                       1.16.0             pyhd3eb1b0_1    anaconda
smmap                     5.0.1                    pypi_0    pypi
sniffio                   1.3.1                    pypi_0    pypi
sqlalchemy                2.0.31                   pypi_0    pypi
sqlparse                  0.5.0                    pypi_0    pypi
stack_data                0.2.0              pyhd3eb1b0_0    anaconda
sympy                     1.12.1                   pypi_0    pypi
tenacity                  8.4.1                    pypi_0    pypi
threadpoolctl             3.5.0                    pypi_0    pypi
tiktoken                  0.7.0                    pypi_0    pypi
tk                        8.6.13          noxft_h4845f30_101    conda-forge
tokenizers                0.19.1                   pypi_0    pypi
torch                     2.3.1                    pypi_0    pypi
tornado                   6.4.1                    pypi_0    pypi
tqdm                      4.66.4                   pypi_0    pypi
traitlets                 5.7.1            py39h06a4308_0    anaconda
transformers              4.41.2                   pypi_0    pypi
triton                    2.3.1                    pypi_0    pypi
typing-extensions         4.12.2                   pypi_0    pypi
typing-inspect            0.9.0                    pypi_0    pypi
typing_extensions         4.7.1            py39h06a4308_0    anaconda
tzdata                    2024.1                   pypi_0    pypi
umap-learn                0.5.6                    pypi_0    pypi
urllib3                   2.2.2                    pypi_0    pypi
wcwidth                   0.2.5              pyhd3eb1b0_0    anaconda
wheel                     0.43.0             pyhd8ed1ab_1    conda-forge
wrapt                     1.16.0                   pypi_0    pypi
xxhash                    3.4.1                    pypi_0    pypi
xyzservices               2024.6.0                 pypi_0    pypi
xz                        5.2.6                h166bdaf_0    conda-forge
yarl                      1.9.4                    pypi_0    pypi
zeromq                    4.3.4                h2531618_0    anaconda
zipp                      3.19.2                   pypi_0    pypi
zstandard                 0.22.0                   pypi_0    pypi

Current Behaviour?

The full code is below. As you can see, the dataset has a question column, but the Dataset constructor reports that it doesn't.

# Step 4: Wrap the dataset with giskard.Dataset
giskard_dataset = giskard.Dataset(
    df=test_dataset,
    name="Climate Change Question Answering",
    target="question",
    column_types={"question": "text"}
)

returns

/home/michael/anaconda3/envs/giskard/lib/python3.9/site-packages/giskard/datasets/base/__init__.py:443: UserWarning: The provided keys ['question'] in 'column_types' are not part of your dataset 'columns'. Please make sure that the column names in `column_types` refers to existing columns in your dataset.
  warning(

In the step before I print out what test_dataset looks like and I get this:

2024-06-20 17:08:52,178 pid:3360791 MainThread giskard.models.automodel INFO     Your 'prediction_function' is successfully wrapped by Giskard's 'PredictionFunctionModel' wrapper class.
                                           question
0                           What is climate change?
1   How does climate change affect the environment?
2            What are the causes of climate change?
3                           What is climate change?
4   How does climate change affect the environment?
..                                              ...
94  How does climate change affect the environment?
95           What are the causes of climate change?
96                          What is climate change?
97  How does climate change affect the environment?
98           What are the causes of climate change?

[99 rows x 1 columns]

If I run

print(test_dataset.columns)

I get

Index(['question'], dtype='object')

I tried changing it to

# Step 4: Wrap the dataset with giskard.Dataset
giskard_dataset = giskard.Dataset(
    df=test_dataset,
    name="Climate Change Questions",
    target="question",
    column_types={"question": "object"}
)

returns

ValueError: Invalid column_types parameter: {}. Please specify non-empty dictionary.
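
Both messages would be consistent with the constructor matching column_types only against feature columns, meaning every column except target. That is an assumption about the validation logic, not something confirmed from the Giskard source in this report. The sketch below is plain Python with hypothetical helper names (feature_columns, unmatched_column_type_keys); it does not call Giskard and only illustrates why a frame whose single column is also the target could leave nothing for column_types to match.

```python
def feature_columns(columns, target=None):
    """Return the columns left once the target column is set aside."""
    return [c for c in columns if c != target]


def unmatched_column_type_keys(columns, column_types, target=None):
    """Return the column_types keys that do not name a feature column."""
    features = set(feature_columns(columns, target))
    return [key for key in column_types if key not in features]


columns = ["question"]
column_types = {"question": "text"}

# With target="question" no feature columns remain, so the key cannot match.
print(feature_columns(columns, target="question"))  # []
print(unmatched_column_type_keys(columns, column_types, target="question"))  # ['question']

# Without a target, the same key matches the only column.
print(unmatched_column_type_keys(columns, column_types))  # []
```

If that assumption holds, leaving target unset may be worth trying, since question is the model input here rather than a label. Whether that resolves the warning needs to be checked against the library itself.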

Standalone code OR list down the steps to reproduce the issue

I am using LM Studio to host TheBloke/Llama2-chat-13B-q-*_0-GGUF with nomic-embed-text embeddings. I successfully adapted the steps of the first example to use LM Studio.

import time  # needed for the retry backoff in get_embedding
import requests
import numpy as np
import faiss
from openai import OpenAI
from langchain import PromptTemplate
from langchain.chains import RetrievalQA
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS
from langchain.docstore.in_memory import InMemoryDocstore
from langchain.schema import Document
from langchain_openai import ChatOpenAI
import giskard
import pandas as pd

# Initialize OpenAI client for LM Studio embeddings
embedding_client = OpenAI(base_url="https://localhost:5000/v1", api_key="lm-studio")

# Function to get embeddings from LM Studio
def get_embedding(text, model="model-identifier", retries=3, timeout=120):
    text = text.replace("\n", " ")
    data = {
        "input": [text],
        "model": model
    }
    for attempt in range(retries):
        try:
            response = requests.post("https://localhost:5000/v1/embeddings", json=data, timeout=timeout)
            response.raise_for_status()  # Raise an error for bad status codes
            response_data = response.json()
            if 'data' in response_data and len(response_data['data']) > 0:
                return response_data['data'][0]['embedding']
            return None
        except requests.exceptions.RequestException as e:
            if attempt < retries - 1:
                time.sleep(2 ** attempt)  # Exponential backoff
        except Exception as e:
            if attempt < retries - 1:
                time.sleep(2 ** attempt)  # Exponential backoff
    return None

# Function to load and split PDF using PyMuPDF
def load_pdf(file_path):
    import fitz  # PyMuPDF
    doc = fitz.open(file_path)
    text = ""
    for page_num in range(len(doc)):
        page = doc.load_page(page_num)
        text += page.get_text("text")
    return text

# Load the PDF
pdf_text = load_pdf("IPCC_AR6_SYR_LongerReport.pdf")

# Split the text into chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100, add_start_index=True)
texts = text_splitter.split_text(pdf_text)

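# Get embeddings for each text chunk (one request per chunk; failed chunks are skipped,
# which would shift the index positions relative to `texts`)
embeddings = [emb for emb in (get_embedding(text) for text in texts) if emb]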

# Convert embeddings to a NumPy array
embeddings = np.array(embeddings, dtype=np.float32)

# Ensure the embeddings have the desired dimension
desired_dimension = 768
if embeddings.shape[1] != desired_dimension:
    print(f"Warning: The embedding dimension is {embeddings.shape[1]}, not {desired_dimension}")

# Create a FAISS index
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(embeddings)

# Prepare documents and docstore
documents = [Document(page_content=text) for text in texts]
docstore = InMemoryDocstore({str(i): doc for i, doc in enumerate(documents)})

# Create an index-to-docstore-id mapping
index_to_docstore_id = {i: str(i) for i in range(len(documents))}

# Use Langchain's FAISS retriever
vectorstore = FAISS(embedding_function=get_embedding, index=index, docstore=docstore, index_to_docstore_id=index_to_docstore_id)

# Prepare QA chain
PROMPT_TEMPLATE = """You are the Climate Assistant, a helpful AI assistant made by Giskard.
Your task is to answer common questions on climate change.
You will be given a question and relevant excerpts from the IPCC Climate Change Synthesis Report (2023).
Please provide short and clear answers based on the provided context. Be polite and helpful.

Context:
{context}

Question:
{question}

Your answer:
"""

llm = ChatOpenAI(base_url="https://localhost:5000/v1", temperature=0.85, api_key="not_needed")

prompt = PromptTemplate(template=PROMPT_TEMPLATE, input_variables=["question", "context"])
climate_qa_chain = RetrievalQA.from_llm(llm=llm, retriever=vectorstore.as_retriever(), prompt=prompt)

# Example question
question = "What are the main impacts of climate change?"
answer = climate_qa_chain({"query": question})
print(answer['result'])

NOTE: I'm using a local copy of the PDF.

That returns:

Hi there! As the Climate Assistant, I'd be happy to help you understand the main impacts of climate change based on the IPCC Climate Change Synthesis Report (2023). Here are some key points:

* Climate change is causing impacts across multiple sectors, including agriculture, forestry, fishery, energy, and tourism, with regional effects and substantial economic damages (high confidence).
* The extent and magnitude of climate change impacts are larger than estimated in previous assessments (high confidence).
* Biological responses to climate change, such as changes in geographic placement and shifting seasonal timing, are often not sufficient to cope with recent climate change (very high confidence).
* Hundreds of local losses of species have been driven by increases in the magnitude of heat extremes (high confidence) and mass mortality events on land and in the ocean (very high confidence).
* Impacts on some ecosystems are approaching irreversibility, such as the impacts of hydrological changes resulting from the retreat of glaciers or the changes in some mountain and Arctic ecosystems driven by permafrost thaw (high confidence).
* In urban settings, climate change has caused adverse impacts on human health, livelihoods, and key infrastructure, including hot extremes, worsened air pollution events, and compromised transportation, water, sanitation, and energy systems (high confidence).
* Climate change has caused widespread adverse impacts and related losses and damages to nature and people, with adverse effects on gender and social equity (high confidence).

Overall, the main impacts of climate change are significant and far-reaching, affecting ecosystems, human health, economies, and societies around the world. It's important for us to take action to mitigate and adapt to climate change in order to minimize these impacts and protect our planet and its inhabitants.
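
One thing to watch in the script above: the embeddings list drops chunks whose embedding request failed, while documents is built from every entry in texts. If any request fails, row i of the FAISS index would no longer correspond to documents[i]. Below is a minimal sketch of keeping the two aligned. The embed_aligned helper is hypothetical, and fake_embed is a stub standing in for get_embedding so the example runs without a server.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Vector = List[float]


def embed_aligned(
    texts: Sequence[str],
    embed: Callable[[str], Optional[Vector]],
) -> Tuple[List[str], List[Vector]]:
    """Embed each text once and keep only the (text, vector) pairs that succeeded."""
    kept_texts: List[str] = []
    vectors: List[Vector] = []
    for text in texts:
        vector = embed(text)  # exactly one call per chunk
        if vector:  # skip None or empty results
            kept_texts.append(text)
            vectors.append(vector)
    return kept_texts, vectors


def fake_embed(text: str) -> Optional[Vector]:
    """Stub embedder: fails for chunks containing 'bad', else returns a tiny vector."""
    return None if "bad" in text else [float(len(text)), 0.0]


kept, vecs = embed_aligned(["alpha", "bad chunk", "beta"], fake_embed)
print(kept)  # ['alpha', 'beta']
print(vecs)  # [[5.0, 0.0], [4.0, 0.0]]
```

With this shape, the documents list would be built from kept rather than from texts, so each index position refers to the same chunk in both places.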

I modified Step 2 a bit. I'm using a Jupyter notebook, so there may be some redundancy here.

from openai import OpenAI

# Initialize LM Studio client
client = OpenAI(base_url="https://localhost:5000/v1", api_key="lm-studio")

def test_lm_studio():
    response = client.embeddings.create(
        input="Test embedding",  # Pass a single string instead of a list
        model="model-identifier"
    )
    print(response)

test_lm_studio()

responds

CreateEmbeddingResponse(data=[Embedding(embedding=[0.0064669218845665455, 0.06351444870233536, -0.1611400842666626, -0.06326187402009964, 0.05246135964989662, -0.03064507618546486, 0.04681089520454407, -0.0018800891702994704, -0.008492004126310349, -0.050695400685071945, -0.05691378191113472, 0.017255719751119614, 0.042320359498262405, 0.022760674357414246, -0.07611146569252014, 0.02281494066119194, 0.06283671408891678, -0.07043047249317169, -0.042003415524959564, 0.012212948873639107, -0.008728792890906334, -0.0174024049192667, -0.01927606388926506, 0.029659563675522804, 0.07451793551445007, -0.0008673956035636365, 0.03871045634150505, 0.017129886895418167, -0.0477774441242218, -0.01823514886200428, 0.03305645287036896, -0.016419149935245514, 0.0023386923130601645, -0.07695670425891876, 0.01280568540096283, -0.023264162242412567, -0.02257232740521431, 0.06116673722863197, -0.010568642988801003, 0.018292315304279327, -0.018471090123057365, 0.027066316455602646, -0.04678845405578613, 0.007002583704888821, -0.012322541326284409, -0.004099644720554352, 0.04447122663259506, -0.008417337201535702, 0.06361556798219681, -0.07586868852376938, -0.008127271197736263, -0.015134919434785843, 0.027777042239904404, -0.030881009995937347, 0.10104407370090485, -0.0220839474350214, -0.004400433972477913, 0.0027780423406511545, 0.04309779405593872, -0.047020912170410156, 0.02646847814321518, 0.04641517251729965, -0.08350083231925964, 0.06673450022935867, 0.03603386506438255, -0.012580040842294693, -0.032048121094703674, 0.041211072355508804, 0.02176036313176155, -0.002407516585662961, 0.05346294492483139, 0.01760036125779152, 0.021599186584353447, -0.028477348387241364, -0.00904479343444109, -0.02677873708307743, -0.008542780764400959, 0.013255150988698006, -0.023049646988511086, 0.030451616272330284, 0.05886407569050789, 0.0402376763522625, 0.052195604890584946, -0.036597318947315216, 0.12094457447528839, -0.00866342056542635, -0.025758899748325348, -0.043120115995407104, 
-0.035248998552560806, 0.03243231773376465, 0.027565641328692436, -0.0014823925448581576, -0.0022183116525411606, 0.06148054450750351, -0.0052839056588709354, 0.00538307148963213, 0.0033648405224084854, 0.05711190775036812, -0.027715327218174934, 0.017224883660674095, -0.021165885031223297, 0.004475398920476437, -0.0386495478451252, 0.006505256984382868, -0.016500772908329964, 0.03756341338157654, 0.016824880614876747, 0.019949600100517273, -0.037670113146305084, 0.01594969816505909, -0.03279519081115723, 0.05598878860473633, -0.07088998705148697, 0.002489296020939946, -0.008262785151600838, 0.010792279615998268, 0.015421098098158836, -0.03783341497182846, -0.00959998182952404, 0.02276875264942646, 0.047679588198661804, 0.014206277206540108, -0.005345615092664957, 0.004882083274424076, -0.03128238767385483, 0.026165325194597244, -0.04385925829410553, -0.024430256336927414, -0.032794591039419174, -0.02272931858897209, 0.009260221384465694, -0.06611528992652893, -0.0516049899160862, -0.012640990316867828, -0.0005521396524272859, -0.0035098930820822716, -0.024526061490178108, -0.0037367884069681168, -0.04975273832678795, 0.04512752220034599, -0.02734244614839554, 0.015485684387385845, 0.015125617384910583, -0.004751755390316248, 0.024411054328083992, -0.07216054201126099, 0.012908409349620342, 0.010312492959201336, 0.008203060366213322, 0.018258025869727135, 0.007358795031905174, 0.012128644622862339, -0.009694535285234451, -0.009757845662534237, 0.03871001675724983, 0.01456446386873722, 0.0009730601450428367, 0.02474880963563919, 0.0569312646985054, 0.06847716122865677, -0.0012344700517132878, 0.014515952207148075, -0.009127219207584858, 0.0011213109828531742, 0.011005045846104622, -0.03458358719944954, 0.015338078141212463, 0.05648224428296089, 0.050727639347314835, -0.03235146403312683, -0.02907950058579445, -0.03269701078534126, -0.023486020043492317, -0.0047586336731910706, -0.03985333442687988, 0.005676861386746168, 0.00659682834520936, -0.08443038910627365, 
0.07168836146593094, 0.017495859414339066, 0.03293969854712486, -0.002945663407444954, 0.019641689956188202, -0.024228127673268318, -0.019984599202871323, -0.04830162599682808, 0.007991835474967957, -0.035350024700164795, -0.02952738292515278, -0.03579626604914665, -0.02995678409934044, 0.006777525879442692, -0.062358833849430084, -0.05450890213251114, -0.03311857581138611, -0.024961676448583603, 0.07185479253530502, -0.04093334078788757, 0.009538433514535427, 0.006622259970754385, -0.009208044037222862, 0.0009557810262776911, 0.0024578291922807693, 0.019045177847146988, -0.014431153424084187, -0.020905209705233574, -0.040727950632572174, -0.0434359610080719, -0.03912324458360672, 0.028614087030291557, 0.04695074260234833, 0.014604046009480953, 0.025924693793058395, 0.04032445326447487, 0.08680761605501175, 0.010817321948707104, 0.040021706372499466, -0.06351478397846222, -0.006450123153626919, -0.02251751720905304, 0.05152507871389389, 0.004767491482198238, 0.0053273411467671394, -0.08093636482954025, 0.02866566739976406, 0.017136791720986366, -0.040922246873378754, 0.030787605792284012, -0.04714637249708176, -0.02139500342309475, -0.00781954824924469, -0.04180819168686867, 0.025715487077832222, -0.008652662858366966, 0.041171666234731674, 0.02446984127163887, -0.0530635342001915, 0.03282074257731438, -0.04804784432053566, -0.014628262259066105, 0.0002607041969895363, 0.0437760129570961, 0.005246252287179232, -0.01116296835243702, 0.008658702485263348, 0.006703667342662811, 0.027969475835561752, -0.06369654834270477, 0.012788929976522923, 0.08130958676338196, -0.025345565751194954, 0.02728247456252575, 0.013214290142059326, 0.02459097094833851, 0.0350150540471077, -0.038794856518507004, -0.017310889437794685, -0.07265616953372955, -0.004030259326100349, -0.02326565608382225, 0.013323637656867504, -0.014955551363527775, -0.03313535079360008, -0.023806126788258553, -0.06867102533578873, -0.024037038907408714, 0.009499359875917435, -0.04669489711523056, 
0.017498185858130455, 0.020837239921092987, -0.026310861110687256, 0.004767427686601877, 0.03933931514620781, 0.015174010768532753, 0.029891228303313255, -0.019849998876452446, 0.03169771283864975, 0.00739964097738266, -0.045707717537879944, 0.02353556454181671, -0.08459107577800751, 0.0021938851568847895, -0.05820438638329506, 0.03605872020125389, -0.04334457963705063, -0.023945368826389313, 0.014687820337712765, -0.04180072620511055, 0.00973773468285799, 0.051976196467876434, 0.020788908004760742, -0.004348635673522949, -0.012950548902153969, 0.042148955166339874, -0.016067558899521828, 0.02339160069823265, 0.03127419203519821, -0.023078955709934235, 0.017691096290946007, -0.015490148216485977, 0.026949729770421982, 0.03287918120622635, 0.07686491310596466, 0.027904212474822998, -0.0008750182460062206, -0.007884553633630276, 0.037201639264822006, -0.05191276967525482, 0.02670985832810402, 0.03232601657509804, -0.08001592755317688, -0.016095492988824844, -0.029995247721672058, -0.017702709883451462, -0.03060649149119854, 0.026476068422198296, 0.039438895881175995, 0.023204607889056206, 0.04085801914334297, -0.04049360007047653, 0.007129500154405832, -0.062467195093631744, 0.0351988784968853, 0.011753782629966736, -0.011560337617993355, 0.042401909828186035, 0.01742706447839737, -0.016193179413676262, -0.015626953914761543, 0.01809951476752758, -0.013486414216458797, 0.05121060833334923, 0.04361537843942642, 0.02023041434586048, 0.033356923609972, 0.01827261783182621, -0.02336309477686882, 0.013343027792870998, -0.033495862036943436, -0.021350154653191566, 0.07325061410665512, -0.012360010296106339, 0.03039928711950779, -0.04877610132098198, 0.009006082080304623, -0.024131424725055695, 0.011627141386270523, 0.029163958504796028, 0.01608617790043354, 0.06448400020599365, -0.03661284223198891, 0.015647592023015022, -0.004488649778068066, -0.039592038840055466, 0.028375636786222458, 0.009019126184284687, 0.042142897844314575, 0.015107478015124798, 0.01696348935365677, 
0.001594932284206152, 0.04240649566054344, 0.01627621240913868, 0.0016775354743003845, -0.06172338128089905, 0.01651819795370102, 0.016346583142876625, 0.003767977934330702, 0.03326999023556709, 0.03263920173048973, 0.033231645822525024, 0.010786733590066433, 0.013637762516736984, -0.026318805292248726, 0.06833741813898087, 0.02849600836634636, 0.007687230594456196, -0.053022272884845734, 0.011708362959325314, -0.008576362393796444, -0.016171930357813835, -0.007763010915368795, 0.03261803463101387, 0.01678888313472271, 0.012416249141097069, 0.00897963996976614, -0.010406003333628178, 0.07460719347000122, 0.010480720549821854, 0.02685808576643467, 0.018592767417430878, -0.005825844593346119, 0.017785441130399704, -0.05594264343380928, 0.039596617221832275, -0.006403950043022633, -0.03710291162133217, 0.019536325708031654, -0.017828939482569695, 0.009722371585667133, 0.027111709117889404, 0.004760473035275936, -0.04541614651679993, -0.01273305993527174, -0.003585611004382372, -0.05254477262496948, -0.029230743646621704, -0.041338589042425156, -0.03115028887987137, 0.030279966071248055, -0.0175043772906065, -0.010006062686443329, 0.007332868408411741, 0.0034796998370438814, -0.07500150799751282, -0.017533743754029274, -0.0019460401963442564, 0.04960237443447113, 0.037081409245729446, -0.03608382120728493, 0.026143530383706093, -0.03841753676533699, 0.0680161565542221, -0.019936896860599518, 0.012672101147472858, 0.006700278725475073, -0.00541430851444602, -0.03146914765238762, 0.08149436116218567, 0.03490617871284485, -0.0882243886590004, -0.05242807790637016, 0.0060660382732748985, 0.005201791413128376, -0.007147623226046562, -0.010148906148970127, -0.017412472516298294, -0.041571054607629776, 0.004562534391880035, -0.006886942777782679, 0.030223270878195763, -0.014288891106843948, -0.06147239729762077, -0.023474542424082756, -0.00042716896859928966, -0.004371347837150097, 0.13663944602012634, 0.03271247819066048, -0.00744217773899436, -0.02015269547700882, 
0.033815257251262665, 0.0175873301923275, 0.010461780242621899, 0.021290937438607216, 0.019136672839522362, 0.04297681897878647, -0.03435078263282776, 0.023254167288541794, -0.05730059742927551, -0.017826490104198456, 0.003571450710296631, -0.012805739417672157, 0.013203319162130356, -0.018117476254701614, 0.009594768285751343, 0.012546363286674023, -0.021365936845541, -0.011964722536504269, -0.04805305227637291, 0.04003211483359337, 0.05739491432905197, -0.0483238510787487, -0.015819253399968147, 0.07700393348932266, -0.012757106684148312, 0.05151572450995445, 0.05521472170948982, -0.006753029767423868, -0.0057896715588867664, 0.010394951328635216, 0.002361587481573224, 0.000747988058719784, -0.008461506105959415, -0.07311912626028061, -0.03628796339035034, 0.05641316622495651, 0.016635600477457047, 0.02174631878733635, 0.005375778768211603, 0.060076747089624405, 0.02465786039829254, 0.055420856922864914, -0.02514880895614624, 0.008257453329861164, -0.009999371133744717, 0.007029465865343809, -0.06820139288902283, -0.0028257579542696476, -0.014921392314136028, 0.006841226015239954, -0.03352511674165726, 0.035307757556438446, 0.034895215183496475, 0.0034501629415899515, 0.013075599446892738, 0.0027024992741644382, -0.027817778289318085, 0.014184064231812954, -0.04449447989463806, -0.10315562784671783, -0.006870269775390625, -0.02596263960003853, 0.02792181819677353, -0.005587881896644831, -0.004875448998063803, 0.04713517054915428, -0.022023627534508705, -0.05460619926452637, 0.0471685454249382, -0.0924735739827156, -0.0003413876402191818, 0.0016974554164335132, 0.002821990055963397, 0.03196147829294205, -0.0036092365626245737, -0.0008493000059388578, 0.0459713488817215, 0.047094132751226425, -0.033592745661735535, 0.05015481635928154, -0.005313350353389978, 0.023843586444854736, 0.066672183573246, -0.01851898431777954, -0.02704344131052494, 0.0035750519018620253, -0.043833304196596146, -0.006541137583553791, 0.015382961370050907, 0.054175857454538345, 
0.020561786368489265, -0.01360496785491705, -0.0316595658659935, -0.014386347495019436, -0.012991293333470821, 0.022083396092057228, -0.009722176939249039, -0.010408229194581509, 0.030620716512203217, 0.012967919930815697, -0.05106015130877495, 0.035644400864839554, -0.05865604802966118, -0.038381658494472504, -0.015894511714577675, 0.02397276647388935, -0.02201148308813572, -0.04172242805361748, -0.03432030603289604, 0.021079018712043762, -0.018125882372260094, 0.02686733938753605, 0.001940704882144928, 0.006762257311493158, 0.0549536757171154, -0.019271332770586014, 0.0054336790926754475, 0.002715972252190113, 0.021183861419558525, 0.017679056152701378, -0.015626464039087296, 0.03714105486869812, -0.0514628067612648, 0.07506392896175385, 0.024151958525180817, -0.012376755475997925, -0.03313298895955086, 0.04844049736857414, 0.022122422233223915, -0.04490162804722786, -0.02700832672417164, 0.001581602031365037, -0.055697038769721985, 0.022010469809174538, 0.05736992508172989, 0.02725932188332081, 0.04387751966714859, -0.01949669048190117, -0.036421921104192734, 0.014568613842129707, -0.03256143257021904, -0.06027417257428169, 0.00467876298353076, 0.05188950523734093, 0.029664307832717896, 0.036481183022260666, 0.02236274816095829, -0.007175090257078409, -0.00922755803912878, -0.00996395107358694, -0.01925850845873356, -0.003458652412518859, -0.007193622644990683, 0.0760287195444107, -0.07645963877439499, -0.001986706629395485, 0.01924012042582035, 0.02547403611242771, 0.015276525169610977, -0.002434811322018504, -0.023279378190636635, 0.007963144220411777, -0.019890913739800453, -0.06749679893255234, -0.00269368477165699, 0.019425945356488228, -0.03872581571340561, 0.045933593064546585, -0.057660676538944244, -0.023669904097914696, -0.0002989989297930151, -0.04594632610678673, -0.01796506904065609, 0.009114691987633705, -0.0026603280566632748, -0.030732043087482452, -0.006153917871415615, -0.05214850604534149, 0.017981255427002907, 0.030268365517258644, 
0.027224373072385788, -0.026168514043092728, 0.006116398610174656, -0.09696870297193527, -0.052056532353162766, -0.05806194990873337, 0.025156082585453987, 0.03248655050992966, 0.04207954928278923, 0.027716390788555145, 0.07465996593236923, -0.030829977244138718, 0.01914070174098015, -0.038477759808301926, 0.003418478649109602, 0.01380791887640953, 0.04617342725396156, 0.0322195366024971, 0.05447426438331604, 0.026217833161354065, 0.017040016129612923, 0.05920951068401337, 0.08289343863725662, -0.0016060096677392721, -0.03412627428770065, 0.051659297198057175, 0.044387899339199066, 0.016087546944618225, -0.013531801290810108, -0.053853463381528854, -0.01549070980399847, -0.026151955127716064, 0.013041652739048004, -0.04869207739830017, -0.045520346611738205, 0.044269122183322906, -0.0005854396149516106, -0.007095281034708023, -0.028690923005342484, 0.02107657678425312, -0.06382524222135544, 0.01946890912950039, 0.016513187438249588, -0.006017669569700956, -0.02650108002126217, -0.008004686795175076, -0.005938094109296799, 0.05151036009192467, 0.08207043260335922, -0.015559064224362373, -0.030441613867878914, -0.03611428290605545, 0.036825746297836304, 0.08147342503070831, 0.06271284073591232, 0.0138129573315382, -0.08564220368862152, -0.009011520072817802, -0.060650378465652466, -0.019033733755350113, -0.015941092744469643, 0.0012677423655986786, 0.005306144244968891, -0.013587048277258873, 0.006077124737203121, 0.01135781966149807, 0.02398698963224888, 0.03029831498861313, -0.04704483970999718, -0.0017800189089030027, 0.031199628487229347, -0.03671538829803467, 0.025005007162690163, 0.05154069885611534, -0.010397246107459068, -0.014512952417135239, 0.0002844708214979619, 0.013077836483716965, -0.02597816288471222, -0.041518911719322205, -0.06034376099705696, -0.02533825673162937, 0.0057769641280174255, 0.010067281313240528, 0.040516167879104614, 0.01124226301908493, 0.05546366795897484, -0.0383976474404335, -0.04635262116789818, 0.00451926002278924, 
0.0074876295402646065, 0.05953868478536606, -0.01706620305776596, -0.022760093212127686, -0.00487673981115222, -0.001997151644900441, -0.07449643313884735, 0.04713372141122818, -0.02409207448363304, 0.02161566913127899, -0.0316925048828125, 0.05660460889339447, 0.004477438982576132, -0.009653101675212383, 0.0545215979218483, -0.010513699613511562, -0.013578704558312893, -0.036784347146749496, -0.03356889635324478, -0.026856353506445885, -0.024890422821044922, -0.058577604591846466, -0.005681315902620554, -0.005090511403977871, -0.011252435855567455, -0.011662655510008335, 0.013907620683312416, -0.04636986181139946, 0.015272825956344604, 0.0015955079579725862, -0.01105506531894207, 0.024654245004057884, -0.10843000560998917, -0.013360092416405678, 0.006700391415506601, -0.0016775388503447175, 0.025766419246792793, -0.0157511867582798, 0.0208958201110363, 0.1340246945619583, 0.03025158680975437, 0.028975633904337883, -0.01683378592133522, -0.03967336565256119, 0.008797859773039818, -0.013110505416989326, -0.035762809216976166, 0.006320158485323191, -0.007143684197217226], index=0, object='embedding')], model='nomic-ai/nomic-embed-text-v1.5-GGUF/nomic-embed-text-v1.5.Q8_0.gguf', object='list', usage=Usage(prompt_tokens=0, total_tokens=0))

import giskard
import pandas as pd

def model_predict(df: pd.DataFrame):
    return [climate_qa_chain.run({"query": question}) for question in df["question"]]

giskard_model = giskard.Model(
    model=model_predict,
    model_type="text_generation",
    name="Climate Change Question Answering",
    description="This model answers any question about climate change based on IPCC reports",
    feature_names=["question"],
    base_url="https://localhost:5000/v1"  # Ensure this points to your LM Studio
)

This responds with:

2024-06-20 16:57:27,482 pid:3360791 MainThread giskard.models.automodel INFO     Your 'prediction_function' is successfully wrapped by Giskard's 'PredictionFunctionModel' wrapper class.

Then I run:

import pandas as pd

def generate_test_dataset_with_lm_studio(model, num_samples=100):
    # Use LM Studio to generate synthetic data
    # Replace this with your actual implementation to generate data using LM Studio
    questions = ["What is climate change?", "How does climate change affect the environment?", "What are the causes of climate change?"] * (num_samples // 3)
    data = pd.DataFrame({"question": questions})
    return data

# Step 1: Test LM Studio Client
def test_lm_studio():
    response = client.embeddings.create(
        input="Test embedding",
        model="model-identifier"
    )
    # print(response)

test_lm_studio()

# Step 2: Define the Giskard model
def model_predict(df: pd.DataFrame):
    return [climate_qa_chain.run({"query": question}) for question in df["question"]]

giskard_model = giskard.Model(
    model=model_predict,
    model_type="text_generation",
    name="Climate Change Question Answering",
    description="This model answers any question about climate change based on IPCC reports",
    feature_names=["question"],
    base_url="https://localhost:5000/v1"
)

# Step 3: Generate the test dataset with LM Studio
test_dataset = generate_test_dataset_with_lm_studio(giskard_model)

# Ensure the test dataset has the expected structure
print(test_dataset)

This returns:

2024-06-20 17:08:52,178 pid:3360791 MainThread giskard.models.automodel INFO     Your 'prediction_function' is successfully wrapped by Giskard's 'PredictionFunctionModel' wrapper class.
                                           question
0                           What is climate change?
1   How does climate change affect the environment?
2            What are the causes of climate change?
3                           What is climate change?
4   How does climate change affect the environment?
..                                              ...
94  How does climate change affect the environment?
95           What are the causes of climate change?
96                          What is climate change?
97  How does climate change affect the environment?
98           What are the causes of climate change?

[99 rows x 1 columns]
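
Note that `num_samples=100` produces 99 rows here, because the three questions are repeated `100 // 3 = 33` times. This is unrelated to the error below. A minimal sketch of one way to get exactly `num_samples` questions, where `repeat_to_length` is a hypothetical helper and not part of the original code:

```python
from itertools import cycle, islice


def repeat_to_length(items, n):
    """Hypothetical helper: cycle through items until exactly n are produced."""
    return list(islice(cycle(items), n))


base_questions = [
    "What is climate change?",
    "How does climate change affect the environment?",
    "What are the causes of climate change?",
]

# Repeat the three base questions until there are exactly 100 of them.
questions = repeat_to_length(base_questions, 100)
print(len(questions))
```
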
# Step 4: Wrap the dataset with giskard.Dataset
giskard_dataset = giskard.Dataset(
    df=test_dataset,
    name="Climate Change Questions",
    target="question",
    column_types={"question": "text"}
)

This raises the following error, even though the dataset does have a `question` column:

/home/michael/anaconda3/envs/giskard/lib/python3.9/site-packages/giskard/datasets/base/__init__.py:443: UserWarning: The provided keys ['question'] in 'column_types' are not part of your dataset 'columns'. Please make sure that the column names in `column_types` refers to existing columns in your dataset.
  warning(
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[61], line 2
      1 # Step 4: Wrap the dataset with giskard.Dataset
----> 2 giskard_dataset = giskard.Dataset(
      3     df=test_dataset,
      4     name="Climate Change Questions",
      5     target="question",
      6     column_types={"question": "text"}
      7 )

File /home/michael/anaconda3/envs/giskard/lib/python3.9/site-packages/pydantic/validate_call_decorator.py:59, in validate_call.<locals>.validate.<locals>.wrapper_function(*args, **kwargs)
     57 @functools.wraps(function)
     58 def wrapper_function(*args, **kwargs):
---> 59     return validate_call_wrapper(*args, **kwargs)

File /home/michael/anaconda3/envs/giskard/lib/python3.9/site-packages/pydantic/_internal/_validate_call.py:81, in ValidateCallWrapper.__call__(self, *args, **kwargs)
     80 def __call__(self, *args: Any, **kwargs: Any) -> Any:
---> 81     res = self.__pydantic_validator__.validate_python(pydantic_core.ArgsKwargs(args, kwargs))
     82     if self.__return_pydantic_validator__:
     83         return self.__return_pydantic_validator__(res)

File /home/michael/anaconda3/envs/giskard/lib/python3.9/site-packages/giskard/datasets/base/__init__.py:211, in Dataset.__init__(self, df, name, target, cat_columns, column_types, id, validation, original_id)
    208 if validation:
    209     from giskard.core.dataset_validation import validate_column_types
--> 211     validate_column_types(self)
    213 if validation:
    214     from giskard.core.dataset_validation import (
    215         validate_column_categorization,
    216         validate_numeric_columns,
    217     )

File /home/michael/anaconda3/envs/giskard/lib/python3.9/site-packages/giskard/core/dataset_validation.py:94, in validate_column_types(ds)
     89         raise ValueError(
     90             f"Invalid column_types parameter: {ds.column_types}"
     91             + f"Please choose types among {[column_type.value for column_type in SupportedColumnTypes]}."
     92         )
     93 else:
---> 94     raise ValueError(f"Invalid column_types parameter: {ds.column_types}. Please specify non-empty dictionary.")
     96 df_columns_set = set(ds.columns)
     97 df_columns_set.discard(ds.target)

ValueError: Invalid column_types parameter: {}. Please specify non-empty dictionary.


### Relevant log output

_No response_

osok commented Jun 21, 2024

It is worth noting that the same call without `target` and `column_types`

# Step 4: Wrap the dataset with giskard.Dataset
giskard_dataset = giskard.Dataset(
    df=test_dataset,
    name="Climate Change Questions"
)

works just fine.
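
A possible explanation, judging only from the warning and traceback above: passing `target="question"` marks the only column as the label, so it appears to be excluded from the feature columns that `column_types` is checked against, which leaves `{}`. The following is a plain-Python sketch of that filtering under this assumption. It is not Giskard's actual code, and `effective_column_types` is a hypothetical helper.

```python
def effective_column_types(columns, target, column_types):
    """Hypothetical helper: keep only the column_types entries that name a
    non-target column. This mimics the behaviour suggested by the traceback;
    it is not Giskard's implementation."""
    feature_columns = [c for c in columns if c != target]
    return {c: t for c, t in column_types.items() if c in feature_columns}


columns = ["question"]
requested = {"question": "text"}

# With the only column used as the target, no feature is left to type.
with_target = effective_column_types(columns, "question", requested)

# Without a target, the column stays a feature and keeps its type.
without_target = effective_column_types(columns, None, requested)

print(with_target)
print(without_target)
```

If this is the cause, dropping `target` while keeping `column_types={"question": "text"}` may also work, since the question is a model input here rather than a label. That variant was not tested in this issue.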

osok closed this as completed Jun 21, 2024