
Is there a way to test the LLM-based models for stereotypes and discrimination using a saved dataset #1893

Closed
rudziankou opened this issue Apr 15, 2024 · 3 comments
Labels: question (Further information is requested)

Comments

@rudziankou

Can I generate prompts before the test, save them to a file, and reuse them instead of using OpenAI for prompt generation?

@rudziankou added the question label on Apr 15, 2024
@luca-martial
Member

Hi @rudziankou, you would be able to do that by using one of the tests from the catalog directly: https://docs.giskard.ai/en/stable/knowledge/catalogs/test-catalog/text_generation/index.html

It depends on what your saved dataset looks like, but you could pick a requirement or ground-truth test. Otherwise, if you are asking because you don't want to use OpenAI but another model for generating prompts, our latest release allows you to do that. I can share the docs with you as soon as they're ready.
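As a rough illustration, here is a minimal sketch of running a catalog test directly against a saved dataset. The import path test_llm_output_against_requirement, the file name saved_prompts.csv, and the "question" column are assumptions for illustration; check the catalog page above for the exact test names and parameters.

import pandas as pd
import giskard
# Assumption: the requirement-based test is exposed under this name (see the catalog page)
from giskard.testing.tests.llm import test_llm_output_against_requirement

giskard_model = ...  # your wrapped giskard.Model

# Load prompts saved earlier, e.g. a CSV with a single "question" column (illustrative layout)
df = pd.read_csv("saved_prompts.csv")
giskard_dataset = giskard.Dataset(df, target=None)

# Check the model's answers against a plain-language requirement
result = test_llm_output_against_requirement(
    model=giskard_model,
    dataset=giskard_dataset,
    requirement="Responses must not contain stereotypes or discriminatory content.",
).execute()
print(result.passed)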

Does that answer your question?

@rudziankou
Author

Hi @luca-martial, thanks for getting back to me. Below are the use cases I'm considering. Could you please give me some examples of how to implement them, if that's even possible?

Use case:
I have an LLM-based model that operates on sensitive customer and company data. I want to use the Giskard detectors for LLM models to test it.

Scenarios:

  1. Use another model for generating prompts and use another model for evaluating responses. (DO NOT USE OpenAI for prompts and evaluation).
  2. Use a list of saved prompts and use another model for evaluating responses. (DO NOT USE OpenAI for prompts and evaluation. The list of prompts is saved locally in a file and can be updated later).
  3. Use another model for generating prompts and use OpenAI for evaluating responses. (USE OpenAI for evaluation only).
  4. Use a list of saved prompts and use OpenAI for evaluating responses. (USE OpenAI for evaluation only. The list of prompts is saved locally in a file and can be updated later).
  5. Use OpenAI for generating prompts and use another model for evaluating responses. (USE OpenAI for prompts only).

@kevinmessiaen
Member

kevinmessiaen commented Apr 17, 2024

Hello @rudziankou

Actually, our scan only generates a dataset if none is provided. This means that you can generate a dataset using one LLM and then use it in the scan with another LLM of your choice:

import giskard
from giskard.llm.client import set_default_client
from giskard.llm.utils import generate_test_dataset

giskard_model = ...  # Wrap your model here

# Use the LLM of your choice for prompt generation
set_default_client(...)
giskard_dataset = generate_test_dataset(giskard_model)

# Switch to another LLM for evaluation, then run the scan on the generated dataset
set_default_client(...)
scan_result = giskard.scan(giskard_model, giskard_dataset)

You can save and load a list of prompts since they are wrapped in a giskard.Dataset.
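For example, here is a minimal sketch of saving the generated prompts to a local file and loading them back later; the file name and column layout are illustrative.

import pandas as pd
import giskard

# A giskard.Dataset wraps a pandas DataFrame, so it can be saved as a plain CSV
giskard_dataset.df.to_csv("saved_prompts.csv", index=False)

# Later: reload the (possibly hand-edited) prompts and wrap them again
df = pd.read_csv("saved_prompts.csv")
giskard_dataset = giskard.Dataset(df, target=None)
scan_result = giskard.scan(giskard_model, giskard_dataset)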

Hope this answers all your questions. As a final note, the scan allows you to generate a test suite using scan_result.generate_test_suite(); this test suite can be re-run and will be faster than doing a full scan. Furthermore, it is customizable.
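A minimal sketch of that re-run workflow (the suite name is illustrative, and whether you can pass an updated model to run() depends on how the suite exposes its inputs):

# Turn the scan findings into a reusable test suite
test_suite = scan_result.generate_test_suite("Sensitive data checks")

# Re-run it later, e.g. in CI; depending on your setup you may pass an updated model,
# e.g. test_suite.run(model=new_model)
suite_results = test_suite.run()
print(suite_results.passed)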
