Soup #7

plutasnyy · 2019-09-07T10:27:07Z

damianhorna · 2019-09-07T11:24:25Z

multi_imbalance/resampling/SOUP.py

+ """
+ Similarity Oversampling and Undersampling Preprocessing (SOUP) is an algorithm that equalizes number of samples
+ in each class. It also takes care of the similarity between classes, which means that it removes samples from
+ majority class, that are close to samples from the other class and duplicate samples from th minority classes,


You have a typo in this line ("th" should be "the").

damianhorna · 2019-09-07T11:26:12Z

examples/resampling/SOUP.ipynb

@@ -0,0 +1,163 @@
+{


Examples in ipynb format are a good idea! 👍

hancia · 2019-09-07T11:52:43Z

multi_imbalance/utils/plot.py

+
+def plot_multi_dimensional_data(X, y, ax=None):
+ """
+ This function reduce quantity of dimensions to 2 principal components and prepare pretty scatter plot for your data


reduces, prepares

hancia · 2019-09-07T11:52:49Z

multi_imbalance/utils/plot.py

+ y = pd.DataFrame({'y': y})
+
+ X_df = pd.DataFrame(data=X, columns=['x1', 'x2'])
+ df = pd.concat([X_df, y], axis=1)


Why not extract the data preparation to a separate method?

I 100% agree that manipulating data inside a plotting function was bad. I changed this function so that it only prepares the data, and moved the rest to the notebook.

hancia · 2019-09-07T12:23:42Z

multi_imbalance/resampling/SOUP.py

+ for sample_id in indices_in_class:
+ neighbours_indices = self.neigh_clf.kneighbors([list(X[sample_id])], return_distance=False)
+ neighbours_classes = y[neighbours_indices[0]]
+ neighbours_quantities = Counter(neighbours_classes)


The body of the loop could possibly be extracted as a method

I agree. I extracted it, but decided not to write unit tests for a function that only wraps sklearn's kNN and the built-in Counter: nothing to test ;)
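For readers following the thread, a minimal sketch of what the extracted helper might look like. The name `count_neighbour_classes` and the sample data are hypothetical, and a brute-force distance search stands in for the fitted `self.neigh_clf.kneighbors(...)` call from the diff so that the snippet runs without sklearn.

```python
from collections import Counter


def count_neighbour_classes(sample, X, y, k):
    """Return a Counter of class labels among the k samples closest to `sample`."""
    # Brute-force squared Euclidean distance; a stand-in for neigh_clf.kneighbors.
    def squared_distance(i):
        return sum((a - b) ** 2 for a, b in zip(sample, X[i]))

    nearest = sorted(range(len(X)), key=squared_distance)[:k]
    return Counter(y[i] for i in nearest)


# Made-up data for illustration only.
X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.2], [5.0, 5.0]]
y = ["a", "a", "b", "b"]
neighbours_quantities = count_neighbour_classes([0.0, 0.1], X, y, k=3)
print(neighbours_quantities)
```

With the loop body isolated like this, the loop itself reduces to one call per `sample_id`, and the helper can be exercised on tiny hand-made inputs if tests are wanted later.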

hancia · 2019-09-07T12:42:10Z

multi_imbalance/resampling/tests/test_soup.py

@@ -0,0 +1,140 @@
+from collections import Counter, defaultdict


Please test with invalid data.
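One possible shape for such tests, as a sketch only. `validate_inputs` is a hypothetical stand-in for whatever input checking SOUP performs; the PR does not show the real entry point or which exceptions it raises, so both are assumptions here.

```python
import unittest


def validate_inputs(X, y):
    """Hypothetical input check; NOT the library's actual API."""
    if len(X) == 0:
        raise ValueError("X must contain at least one sample")
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of samples")


class TestInvalidData(unittest.TestCase):
    def test_empty_input(self):
        with self.assertRaises(ValueError):
            validate_inputs([], [])

    def test_length_mismatch(self):
        with self.assertRaises(ValueError):
            validate_inputs([[0.0, 1.0], [2.0, 3.0]], [0])

    def test_valid_input_passes(self):
        # Should not raise for well-formed input.
        validate_inputs([[0.0, 1.0]], [0])


# Run the suite programmatically so the snippet also works outside a test runner.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestInvalidData)
result = unittest.TextTestRunner().run(suite)
```

The cases worth covering depend on what the resampler is expected to reject; empty input and mismatched `X`/`y` lengths are just the most common candidates.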

plutasnyy · 2019-09-08T08:57:47Z

multi_imbalance/resampling/SOUP.py

+ undersampled_X, undersampled_y = list(), list()
+ for idx, _ in safe_levels_list:
+ undersampled_X.append(X[idx])
+ undersampled_y.append(y[idx])


Use a comprehension here.
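A minimal sketch of the suggested change, assuming `safe_levels_list` holds `(index, safe_level)` pairs as the loop in the diff implies. The sample data below is made up for illustration.

```python
# Hypothetical sample data; in the PR, X and y come from the caller.
X = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0]]
y = [0, 0, 1, 1]
# Assumed shape: (sample index, safe level) pairs.
safe_levels_list = [(2, 0.9), (0, 0.4)]

# Each comprehension replaces one list() plus the append() calls in the loop.
undersampled_X = [X[idx] for idx, _ in safe_levels_list]
undersampled_y = [y[idx] for idx, _ in safe_levels_list]

print(undersampled_X)
print(undersampled_y)
```

If `X` and `y` are NumPy arrays, indexing them with a list of the selected indices would likely be an alternative, at the cost of returning arrays rather than lists.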

plutasnyy · 2019-09-08T09:00:45Z

setup.py

@@ -21,5 +21,8 @@
 install_requires=[
 "numpy>=1.17.0",
 "scikit-learn>=0.21.3",
+ "pandas",
+ "seaborn",


Add version constraints for these dependencies.

plutasnyy · 2019-09-08T09:01:26Z

examples/resampling/SOUP.ipynb

+ "import matplotlib.pyplot as plt\n",
+ "\n",
+ "%matplotlib inline\n",
+ "rc = {'text.color':'white','axes.labelcolor':'white', 'xtick.color':'white','ytick.color':'white'}\n",


hancia and others added 8 commits August 26, 2019 13:57

Started work on SOUP

a05940e

add calculating safe levels and undersampling

b858b5b

fixed wrong variables names

2855b26

added oversampling

d3de38e

fixed charts

16a08c3

started work on tests

8416828

created unit tests for soup

c6441a4

added documentation

92713c6

plutasnyy requested review from damianhorna, hancia and jacekgry September 7, 2019 10:27

plutasnyy self-assigned this Sep 7, 2019

plutasnyy added this to In progress in multi-imbalance via automation Sep 7, 2019

damianhorna approved these changes Sep 7, 2019

hancia requested changes Sep 7, 2019

jacekgry approved these changes Sep 7, 2019

plutasnyy commented Sep 8, 2019

plutasnyy added 5 commits September 10, 2019 22:25

updates after review, added versions for dependencies, changed plotting in soup example and added tests for invalid data

008d5fe

added type hints

0b49641

fixed order of columns in soup example

8b1cb33

updated PCA in SOUP example

bfe713a

removed np typing and removed sample from knn neighbours

d27ad42

plutasnyy requested review from hancia and damianhorna September 18, 2019 14:35

hancia approved these changes Sep 20, 2019

hancia merged commit a97c284 into develop Sep 24, 2019

multi-imbalance automation moved this from In progress to Done Sep 24, 2019

damianhorna deleted the soup branch October 14, 2019 20:31
