Add dataframe filter to filter tabular data #51

Merged
aiakide merged 4 commits into develop from feature/dataframe-filter on Jul 7, 2023

Conversation

aiakide
Collaborator

@aiakide aiakide commented Jul 7, 2023

📥 Pull Request Description

A DataframeFilter filters the tabular data in a DfDataset.
In addition to the abstract base class, this PR adds a concrete NanDataframeFilter, which removes rows that contain NaN values in the input or target columns named in the data description.
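
The description above could be sketched roughly as follows. This is a minimal illustration and not the code merged in this PR: the `ColumnDescription` class, its `input_columns` and `target_columns` attributes, and the `filter` method signature are assumptions made for the example. Only the names `DataframeFilter` and `NanDataframeFilter` come from the PR.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List

import pandas as pd


@dataclass
class ColumnDescription:
    """Hypothetical stand-in for the data description (not the project's class)."""

    input_columns: List[str]
    target_columns: List[str]


class DataframeFilter(ABC):
    """Sketch of an abstract filter for tabular data (signature is assumed)."""

    @abstractmethod
    def filter(self, data: pd.DataFrame, description: ColumnDescription) -> pd.DataFrame:
        """Return only the rows of `data` that should be kept."""


class NanDataframeFilter(DataframeFilter):
    """Drops rows with NaN in any input or target column."""

    def filter(self, data: pd.DataFrame, description: ColumnDescription) -> pd.DataFrame:
        columns = description.input_columns + description.target_columns
        # NaN values in columns outside `columns` do not cause a row to be dropped.
        return data.dropna(subset=columns)


# Usage: only the first row has values in both "x" and "y".
df = pd.DataFrame(
    {"x": [1.0, None, 3.0], "y": [0.0, 1.0, None], "note": [None, "a", "b"]}
)
description = ColumnDescription(input_columns=["x"], target_columns=["y"])
filtered = NanDataframeFilter().filter(df, description)
```

Restricting `dropna` to the described columns keeps rows whose only missing values are in columns the model never reads, such as `note` in this example.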

👀 Affected Areas

  • DataframeFilter (new)

📝 Checklist

Please make sure you've completed the following tasks before submitting this pull request:

  • Pre-commit hooks were executed
  • Changes have been reviewed by at least one other developer
  • Tests have been added or updated to cover the changes (only necessary if the changes affect the executable code)
  • All tests ran successfully
  • All merge conflicts are resolved
  • Documentation has been updated to reflect the changes
  • Any necessary migrations have been run

📌 Related Issues

None

🔗 Links

None

📷 Screenshots

None

@aiakide aiakide requested review from dstalzjohn and ankeko July 7, 2023 15:25
@aiakide aiakide merged commit 3561200 into develop Jul 7, 2023
6 checks passed
@aiakide aiakide deleted the feature/dataframe-filter branch July 7, 2023 20:54
@aiakide aiakide mentioned this pull request Jul 10, 2023
7 tasks
aiakide added a commit that referenced this pull request Jul 10, 2023
## 📥 Pull Request Description

The following features and fixes will be part of the next release
(`v0.6.0`)

- feat: Add lockfile name as attribute of `FileChecksumProcessor` (#46)
- fix: Remove temp directory from hydra search path. Add hydra config
mapping factory (#47)
- fix: Save result files from `tensorgraphanalyzer` in the correct
location and add validation for it (#50)
- feat: Add dagster op for dataframe normalization (#48)
- feat: Add `NanDataframeFilter` to drop rows with NaN values in feature
columns (#51)
- fix: Adjust supported Python versions in `Getting Started` docs
section

Additionally, there are several adjustments to the project organization

- Pull request template added
- Bug Report template added
- Code of Conduct added

## 👀 Affected Areas

- `FileChecksumProcessor`
- dagster ops
   - `df_normalization`
- `DataframeFilter`
   - `NanDataframeFilter`
- `tensorgraphanalyzer` 
- docs

## 📝 Checklist

Please make sure you've completed the following tasks before submitting
this pull request:

- [X] Pre-commit hooks were executed
- [ ] Changes have been reviewed by at least one other developer
- [X] Tests have been added or updated to cover the changes (only
necessary if the changes affect the executable code)
- [X] All tests ran successfully
- [X] All merge conflicts are resolved
- [X] Documentation has been updated to reflect the changes
- [X] Any necessary migrations have been run

## 📌 Related Issues

_None_

## 🔗 Links

_None_

## 📷 Screenshots

_None_

2 participants