
Is there a way to test the LLM-based models for stereotypes and discrimination using a saved dataset #1893

Closed
rudziankou opened this issue Apr 15, 2024 · 3 comments
Labels: question (Further information is requested)

Comments

@rudziankou

Can I generate prompts before the test, save them to a file, and reuse them instead of using OpenAI for prompt generation?

@rudziankou added the question label on Apr 15, 2024
@luca-martial
Member

Hi @rudziankou, you would be able to do that by using one of the tests from the catalog directly: https://docs.giskard.ai/en/stable/knowledge/catalogs/test-catalog/text_generation/index.html

It depends on what your saved dataset looks like, but you could pick a requirement or ground-truth test. Otherwise, if you are asking because you don't want to use OpenAI but another model for generating prompts, our latest release allows you to do that. I can share the docs with you as soon as they're ready.
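As a rough illustration, here is a minimal sketch of running a catalog test directly against a saved dataset. The import path test_llm_output_against_requirement, the file name saved_prompts.csv, and the "question" column are assumptions for illustration; check the catalog page above for the exact test names and parameters.

import pandas as pd
import giskard
# Assumption: the requirement-based test is exposed under this name (see the catalog page)
from giskard.testing.tests.llm import test_llm_output_against_requirement

giskard_model = ...  # your wrapped giskard.Model

# Load prompts saved earlier, e.g. a CSV with a single "question" column (illustrative layout)
df = pd.read_csv("saved_prompts.csv")
giskard_dataset = giskard.Dataset(df, target=None)

# Check the model's answers against a plain-language requirement
result = test_llm_output_against_requirement(
    model=giskard_model,
    dataset=giskard_dataset,
    requirement="Responses must not contain stereotypes or discriminatory content.",
).execute()
print(result.passed)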

Does that answer your question?

@rudziankou
Author

Hi @luca-martial, thanks for getting back to me. Below are the use cases I'm considering. Could you please give me some examples of how to implement them, if that's even possible?

Use case:
I have an LLM-based model that operates on sensitive customer and company data. I want to use the Giskard detectors for LLM models to test it.

Scenarios:

  1. Use another model for generating prompts and use another model for evaluating responses. (DO NOT USE OpenAI for prompts and evaluation).
  2. Use a list of saved prompts and use another model for evaluating responses. (DO NOT USE OpenAI for prompts and evaluation. The list of prompts is saved locally in a file and can be updated later).
  3. Use another model for generating prompts and use OpenAI for evaluating responses. (USE OpenAI for evaluation only).
  4. Use a list of saved prompts and use OpenAI for evaluating responses. (USE OpenAI for evaluation only. The list of prompts is saved locally in a file and can be updated later).
  5. Use OpenAI for generating prompts and use another model for evaluating responses. (USE OpenAI for prompts only).

@kevinmessiaen
Member

kevinmessiaen commented Apr 17, 2024

Hello @rudziankou

Actually, our scan only generates a dataset if none is provided. This means that you can generate a dataset using one LLM and then use it in the scan with another LLM of your choice:

import giskard
from giskard.llm.client import set_default_client
from giskard.llm.utils import generate_test_dataset

giskard_model = ...  # Wrap your model here

# Use the LLM of your choice for prompt generation
set_default_client(...)
giskard_dataset = generate_test_dataset(giskard_model)

# Switch to another LLM for evaluation, then run the scan on the generated dataset
set_default_client(...)
scan_result = giskard.scan(giskard_model, giskard_dataset)

You can save and load a list of prompts since they are wrapped in a giskard.Dataset.
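For example, here is a minimal sketch of saving the generated prompts to a local file and loading them back later; the file name and column layout are illustrative.

import pandas as pd
import giskard

# A giskard.Dataset wraps a pandas DataFrame, so it can be saved as a plain CSV
giskard_dataset.df.to_csv("saved_prompts.csv", index=False)

# Later: reload the (possibly hand-edited) prompts and wrap them again
df = pd.read_csv("saved_prompts.csv")
giskard_dataset = giskard.Dataset(df, target=None)
scan_result = giskard.scan(giskard_model, giskard_dataset)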

Hope this answers all your questions. As a final note, the scan allows you to generate a test suite using scan_result.generate_test_suite(); this test suite can be re-run and will be faster than doing a full scan. Furthermore, it is customizable.
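A minimal sketch of that re-run workflow (the suite name is illustrative, and whether you can pass an updated model to run() depends on how the suite exposes its inputs):

# Turn the scan findings into a reusable test suite
test_suite = scan_result.generate_test_suite("Sensitive data checks")

# Re-run it later, e.g. in CI; depending on your setup you may pass an updated model,
# e.g. test_suite.run(model=new_model)
suite_results = test_suite.run()
print(suite_results.passed)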
