Feature: add data quality tests in Giskard #1601
Comments
Hi, I would like to work on this issue.
Hi @Kranium2002, thanks, I assigned you to the issue. Feel free to ask us if you have any questions!
I had a question: should I work in giskard/utils and create a file of data quality tests for users? The user would pass a pandas DataFrame and the system would then check its quality. How does this sound?
To have the tests integrate smoothly with Giskard, it's better to write them as Giskard tests, like this:

    import giskard  # You'll have to use relative imports of the used objects to prevent circular import issues
    import pandas as pd

    @giskard.test(name="My example data quality test")
    def example_quality_test(dataset: giskard.Dataset, column: str, threshold: float = 0.5):
        # Sample test that checks whether the uniqueness ratio is greater than a threshold
        column_values = dataset.df[column]
        uniqueness = len(column_values.unique()) / len(column_values)
        return giskard.TestResult(passed=uniqueness > threshold)

    # Trying my test
    dataset = giskard.Dataset(pd.DataFrame({'test': [1, 2, 3, 2, 4, 1]}))
    assert example_quality_test(dataset, 'test').execute().passed
    assert not example_quality_test(dataset, 'test', 1).execute().passed
    assert example_quality_test(dataset, 'test', 0).execute().passed

We organized Giskard so that tests are under
Working on this in #1651
🚀 Feature Request
Giskard currently focuses on model quality testing, but since ML models are heavily dependent on the data they are trained on, data quality testing is of high interest. We are looking to implement various data quality tests and are open to community contributions.
Examples of tests to add:
1. Data Completeness Test
2. Data Uniqueness Test
3. Data Range and Validity Test
4. Data Correlation Test
5. Data Anomaly Detection Test
6. Data Integrity Test
7. Label Consistency Test
8. Class Imbalance Test
9. Feature Importance Test
10. Label Noise Detection Test
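
As a rough illustration of what three of the checks listed above (completeness, range and validity, class imbalance) might measure, here is a minimal sketch in plain Python. The function names, the rule that only `None` and float NaN count as missing, and the values returned for empty columns are all assumptions made for this example; none of them come from Giskard. The functions accept any iterable, so a pandas column such as `dataset.df[column]` could be passed in.

```python
import math
from collections import Counter
from typing import Hashable, Iterable


def _is_missing(value: object) -> bool:
    """Assumption: only None and float NaN are treated as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def completeness_ratio(values: Iterable[object]) -> float:
    """Fraction of entries that are not missing (1.0 means fully complete)."""
    items = list(values)
    if not items:
        return 1.0  # assumption: an empty column counts as complete
    return sum(not _is_missing(v) for v in items) / len(items)


def in_range_ratio(values: Iterable[float], low: float, high: float) -> float:
    """Fraction of non-missing entries that fall inside [low, high]."""
    present = [v for v in values if not _is_missing(v)]
    if not present:
        return 1.0  # assumption: nothing to violate the range
    return sum(low <= v <= high for v in present) / len(present)


def minority_class_share(labels: Iterable[Hashable]) -> float:
    """Share of the rarest label among the non-missing labels."""
    counts = Counter(label for label in labels if not _is_missing(label))
    total = sum(counts.values())
    if total == 0:
        return 0.0  # assumption: no labels at all is the worst case
    return min(counts.values()) / total


# Hypothetical sample data, for illustration only
ages = [23, 35, None, 41, float("nan"), 130]
labels = ["cat", "cat", "cat", "dog"]

print(completeness_ratio(ages))       # 4 of 6 entries are present
print(in_range_ratio(ages, 0, 120))   # 3 of 4 present entries are in range
print(minority_class_share(labels))   # "dog" is 1 of 4 labels
```

Each function returns a ratio rather than a pass/fail verdict, so a test wrapper can compare it against a user-supplied threshold, in the same way the uniqueness example in the comments compares its ratio to `threshold`.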
🔈 Motivation
This will enhance the completeness of Giskard's testing capabilities.