
How to train a multi-label Classifier #741

Closed
xieximeng2008 opened this issue Sep 28, 2015 · 74 comments
Comments

@xieximeng2008

I need to train a multi-label softmax classifier, but the examples all use one-hot encoded labels, so how do I change the code to do this?

@elanmart

Don't use softmax. Use sigmoid units in the output layer and then use the "binary_crossentropy" loss.
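A minimal sketch of that setup (the layer sizes and input dimension here are placeholders, not from this thread):

from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(128, activation='relu', input_shape=(100,)))  # hidden layer, illustrative size
model.add(Dense(5, activation='sigmoid'))   # one independent probability per label
model.compile(optimizer='adam', loss='binary_crossentropy')

Each output unit is then an independent per-label probability, so several of them can be close to 1 at the same time.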

@holderm

holderm commented Sep 28, 2015

That works in my case. However, model.predict_classes is not adapted for this. As an example, for a sample from the test set where the target label is 1 0 1 0 0 0 0 (I have 7 labels in total),
model.predict(tSets[1,:]) gives 9.90e-01, 2.7e-07, 6.05e-13, 9.98e-01, 2.16e-05, 7.62e-05, 1.51e-04 (so that is correct), but
model.predict_classes(tSets[1,:]) gives just array([3]); it seems to pick the index of the highest value from model.predict. A quick fix might be numpy.around, but maybe there is a more elegant solution?

@elanmart

Getting classes from .predict() is one line of numpy code really.

@lemuriandezapada

model.predict(blabla) > 0.5 ?

@arushi02

@elanmart Hi, why do you think using softmax is not a good idea?

Do you use a graph model, given we have multiple outputs?

@xieximeng2008
Author

My loss is not converging. @holderm @elanmart

model.predict(Y_train[1,:])

it shows [ 0.00000000e+000 0.00000000e+000 0.00000000e+000 0.00000000e+000
0.00000000e+000]
my complete code:

from __future__ import absolute_import
from __future__ import print_function
import scipy.io
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation, Flatten
from keras.layers.convolutional import Convolution2D, MaxPooling2D
from keras.optimizers import SGD, Adadelta, Adagrad
from keras.utils import np_utils, generic_utils
from six.moves import range

batch_size = 100
nb_classes = 5
nb_epoch = 5
data_augmentation = True

shapex, shapey = 64, 64

nb_filters = [32, 64]

nb_pool = [4, 3]

nb_conv = [5, 4]

image_dimensions = 3


mat = scipy.io.loadmat(r'E:\scene.mat')

X_train = mat['x_train']
Y_train = mat['y_train']
X_test =  mat['x_test']
Y_test =  mat['y_test']
print(X_train.shape)
print(X_test.shape)

model = Sequential()

model.add(Convolution2D(nb_filters[0], image_dimensions, nb_conv[0], nb_conv[0], border_mode='valid'))
model.add(Activation('relu'))

model.add(MaxPooling2D(poolsize=(nb_pool[0], nb_pool[0])))
model.add(Dropout(0.25))

model.add(Convolution2D(nb_filters[1], nb_filters[0], nb_conv[1], nb_conv[1], border_mode='valid'))
model.add(Activation('relu'))

model.add(MaxPooling2D(poolsize=(nb_pool[1], nb_pool[1])))
model.add(Dropout(0.25))

model.add(Flatten())

model.add(Dense(nb_filters[-1] * (((shapex - nb_conv[0]+1)/ nb_pool[0] -nb_conv[1]+1)/ nb_pool[1]) * (((shapey -nb_conv[0]+1)/ nb_pool[0] -nb_conv[1]+1)/ nb_pool[1]), 512))
model.add(Activation('relu'))
model.add(Dropout(0.5))

model.add(Dense(512, nb_classes,init='uniform'))
model.add(Activation('sigmoid'))


sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='binary_crossentropy', optimizer=sgd) 

if not data_augmentation:
    print("Not using data augmentation or normalization")

    X_train = X_train.astype("float32")
    X_test = X_test.astype("float32")
    X_train /= 255
    X_test /= 255
    model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=nb_epoch)
    score = model.evaluate(X_test, Y_test, batch_size=batch_size)
    print('Test score:', score)

else:
    print("Using real time data augmentation")

    # this will do preprocessing and realtime data augmentation
    datagen = ImageDataGenerator(
        featurewise_center=True,  # set input mean to 0 over the dataset
        samplewise_center=False,  # set each sample mean to 0
        featurewise_std_normalization=True,  # divide inputs by std of the dataset
        samplewise_std_normalization=False,  # divide each input by its std
        zca_whitening=False,  # apply ZCA whitening
        rotation_range=20,  # randomly rotate images in the range (degrees, 0 to 180)
        width_shift_range=0.2,  # randomly shift images horizontally (fraction of total width)
        height_shift_range=0.2,  # randomly shift images vertically (fraction of total height)
        horizontal_flip=True,  # randomly flip images
        vertical_flip=False)  # randomly flip images

    datagen.fit(X_train)
    model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=nb_epoch)
    score = model.evaluate(X_test, Y_test, batch_size=batch_size)
    print (model.predict(X_test[1,:]))

Could you help me find out where it is wrong? Thanks!

@elanmart

@lemuriandezapada yeah,

labels = np.zeros(preds.shape)
labels[preds>0.5] = 1

@arushi02 with softmax, when you increase the score for one label, all the others are lowered (it's a probability distribution). You don't want that when you have multiple labels.
No, you don't need the Graph model.

Here's an example of one of my multilabel nets:

# Build a classifier optimized for maximizing f1_score (uses class_weights)
from keras.models import Sequential
from keras.layers.core import Dense, Dropout
from keras.optimizers import Adam
from sklearn.metrics import f1_score

clf = Sequential()

clf.add(Dropout(0.3))
clf.add(Dense(xt.shape[1], 1600, activation='relu'))
clf.add(Dropout(0.6))
clf.add(Dense(1600, 1200, activation='relu'))
clf.add(Dropout(0.6))
clf.add(Dense(1200, 800, activation='relu'))
clf.add(Dropout(0.6))
clf.add(Dense(800, yt.shape[1], activation='sigmoid'))

clf.compile(optimizer=Adam(), loss='binary_crossentropy')

clf.fit(xt, yt, batch_size=64, nb_epoch=300, validation_data=(xs, ys), class_weight=W, verbose=0)

preds = clf.predict(xs)

preds[preds>=0.5] = 1
preds[preds<0.5] = 0

print f1_score(ys, preds, average='macro')

@xieximeng2008 What does it print during training?

@xieximeng2008
Author

@elanmart Using real time data augmentation

Epoch 0
1800/1800 [==============================] - 60s - loss: 5.4326
Epoch 1
1800/1800 [==============================] - 59s - loss: 5.1325
Epoch 2
1800/1800 [==============================] - 60s - loss: 5.1325
Epoch 3
1800/1800 [==============================] - 59s - loss: 5.1325
Epoch 4
1800/1800 [==============================] - 59s - loss: 5.1325

testing...

200/200 [==============================] - 2s
[[  0.00000000e+000   0.00000000e+000   0.00000000e+000   0.00000000e+000
    0.00000000e+000]
 [  0.00000000e+000   0.00000000e+000   0.00000000e+000   0.00000000e+000
    0.00000000e+000]
 [  1.22857558e-291   0.00000000e+000   3.11779756e-297   0.00000000e+000
    0.00000000e+000]
 ...]

Almost all outputs are zero or very small floats.

@xieximeng2008
Author

@elanmart I used your example, but I still have the problems described above. Dataset: X_train (1800,3,64,64), X_test (200,3,64,64), Y_train (1800,5), Y_test (200,5).
I just changed the code as you listed:

model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=nb_epoch, validation_data=(X_test, Y_test), verbose=0)
preds = model.predict(X_test)
preds[preds >= 0.5] = 1
preds[preds < 0.5] = 0
print(preds)

Thanks for helping me!

@elanmart

@xieximeng2008 I'd guess the problem is in your data, since the network worked well for me a few days ago.

@arushi02

@elanmart

Suppose I want to identify house number 5436 from an image, and I assume every image will have at most 4 digits, so one image will be tagged with 4 one-hot vectors like

[(0000010000), (0000100000), (0001000000), (0000001000)]. If I pass this as a 2D matrix, will it give me probabilities for each element? In this kind of tagging, I want every row to have one most probable element (i.e. each row should follow a probability distribution).

@vosybac

vosybac commented Jan 18, 2016

Does anyone know how to replace the default validation score with another scoring function printed at every epoch? The scoring function for the validation set should be similar to the one used for the test set. Many thanks.

clf.fit(xt, yt, batch_size=64, nb_epoch=300, validation_data=(xs, ys), class_weight=W, verbose=0)
preds = clf.predict(xs)
preds[preds>=0.5] = 1
preds[preds<0.5] = 0
print f1_score(ys, preds, average='macro')
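One way to print a custom validation score every epoch is a Keras Callback that runs at on_epoch_end. A minimal sketch, assuming a compiled model clf and validation arrays xs, ys as above:

from keras.callbacks import Callback
from sklearn.metrics import f1_score

class ValidationF1(Callback):
    # print macro F1 on a held-out set at the end of every epoch
    def __init__(self, x_val, y_val, threshold=0.5):
        self.x_val = x_val
        self.y_val = y_val
        self.threshold = threshold

    def on_epoch_end(self, epoch, logs=None):
        preds = (self.model.predict(self.x_val) >= self.threshold).astype(int)
        print("epoch %d - val macro F1: %.4f" % (epoch, f1_score(self.y_val, preds, average='macro')))

# clf.fit(xt, yt, batch_size=64, nb_epoch=300, callbacks=[ValidationF1(xs, ys)], verbose=0)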

@suraj-deshmukh

@elanmart
I have an image dataset where each image has multiple labels, and y for a particular image is [1,1,-1,-1,-1], where 1 == class present and -1 == class not present. My question is how to change y so that a Keras model will accept it for training.

@alyato

alyato commented Jul 11, 2016

@suraj-deshmukh Did you solve your problem of how to load the multi-label data? How did you do it? Could you share your code? Thanks.

@suraj-deshmukh

suraj-deshmukh commented Jul 12, 2016

@alyato Hi, I solved my problem but lost all my code due to an HDD failure. But as I said in the previous comment, my y/target was [1,1,-1,-1,-1] and I converted it into [1,1,0,0,0], where 1 == presence and 0 == absence, for all images, and passed that data to a ConvNet with binary crossentropy as the loss function and sigmoid as the activation for the output layer.
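For reference, the conversion described above is a one-liner in NumPy (illustrative only, assuming y holds 1/-1 labels):

import numpy as np

y = np.array([[1, 1, -1, -1, -1],
              [-1, 1, 1, -1, 1]])
y01 = (y == 1).astype(int)   # [[1 1 0 0 0], [0 1 1 0 1]]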

@alyato

alyato commented Jul 13, 2016

@suraj-deshmukh Do I understand it like this?
For single-label (3 samples in total):

x        y
[1,2,3]  [0]
[4,5,6]  [1]
[7,8,9]  [2]

So I load the train_data and train_label. The format of train_label is [0,1,2], and train_label.shape is (3,).
But for multi-label (3 samples in total):

x        y
[1,2,3]  [0,2]
[4,5,6]  [1,2]
[7,8,9]  [0,1]

Then the format of train_label is [ [1,0,1],[0,1,1],[1,1,0] ] and train_label.shape is (3,3).

Is that right?
If it is right, I also have one question.

For single-label, the format of train_label is [0,1,2], and I need to call np_utils.to_categorical to convert it to the one-hot format.

For multi-label, the format of train_label is [ [1,0,1],[0,1,1],[1,1,0] ], and I don't call np_utils.to_categorical.

@suraj-deshmukh

suraj-deshmukh commented Jul 13, 2016

@alyato
yes you are right
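A sketch of building that multi-hot train_label matrix from per-sample lists of label indices, e.g. with scikit-learn's MultiLabelBinarizer (illustrative, not from the thread):

from sklearn.preprocessing import MultiLabelBinarizer

labels = [[0, 2], [1, 2], [0, 1]]          # label indices per sample
mlb = MultiLabelBinarizer(classes=[0, 1, 2])
train_label = mlb.fit_transform(labels)    # [[1 0 1], [0 1 1], [1 1 0]], shape (3, 3)

As noted above, this matrix is used directly as the target; np_utils.to_categorical is not needed.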

@alyato

alyato commented Jul 14, 2016

@suraj-deshmukh Thanks for your answer. But I also have some questions.

preds[preds>=0.5] = 1
preds[preds<0.5] = 0

  1. How do I set the threshold, such as 0.5?
  2. Once I get my predicted test labels, how can I compare them with the real test labels?

The predicted test labels are
[[1,0,1],
[0,1,1],
[1,1,0]]
and the real test labels are
[[1,0,0],
[1,0,1],
[1,1,0]]

How do I measure whether my model is better or worse?
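Common ways to compare the two matrices above are label-based and example-based metrics; a minimal sketch with scikit-learn (illustrative only):

import numpy as np
from sklearn.metrics import f1_score, hamming_loss, accuracy_score

y_pred = np.array([[1, 0, 1], [0, 1, 1], [1, 1, 0]])
y_true = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0]])

print(hamming_loss(y_true, y_pred))               # fraction of individual labels that are wrong
print(f1_score(y_true, y_pred, average='macro'))  # per-label F1, averaged over labels
print(accuracy_score(y_true, y_pred))             # exact-match ratio (all labels of a sample correct)

As for the threshold, 0.5 is the natural default for sigmoid outputs, but it can be tuned per label on validation data to trade precision against recall.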

@XuesongYang

@elanmart
"in softmax when increasing score for one label, all others are lowered (it's a probability distribution). You don't want that when you have multiple labels."

I kind of disagree with that conclusion. Maybe I am wrong.
Softmax just calculates a normalized exponential value (a probability) for each node in the output layer. Assuming there are two target labels out of seven, for example, the neural network tries to predict the top two posterior probabilities at those specific nodes, and the two probabilities are definitely the same.

@ritchieng

Hi, I'm trying to classify an image with multiple digits, e.g. an image of "123" should output "123". There are up to 5 digits.

I'm stuck after building the convolution layers. How do we output 5 digits, each with 10 classes? Some suggested 5 independent fully connected layers after the final convolution layer, but how do we code these 5 independent FCs in Keras?
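A sketch of the "5 independent heads" idea with the Keras functional API (illustrative; the input size and layer widths are placeholders):

from keras.models import Model
from keras.layers import Input, Conv2D, Flatten, Dense

inp = Input(shape=(64, 64, 3))                      # placeholder image size
x = Conv2D(32, (3, 3), activation='relu')(inp)
x = Flatten()(x)
x = Dense(256, activation='relu')(x)

# one softmax head per digit position, each over 10 classes
digits = [Dense(10, activation='softmax', name='digit_%d' % i)(x) for i in range(5)]

model = Model(inputs=inp, outputs=digits)
model.compile(optimizer='adam', loss='categorical_crossentropy')
# y is then passed as a list of 5 one-hot arrays, one per digit position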

@janmatias

@xieximeng2008 Did you ever find out why your network only returned values close to zero? I am in a similar situation where my network only returns zeroes. I am fine-tuning an InceptionV3 model. Loss function is binary_crossentropy, I am using sigmoid as activation for the final layer, and as an optimizer I use rmsprop.

@suraj-deshmukh

@xieximeng2008 check this https://suraj-deshmukh.github.io/Multi-Label-Image-Classification/

@yuan6785

yuan6785 commented Dec 8, 2016

Like this! Changing SGD to Adam decreased the loss. Thanks @elanmart.
@xieximeng2008, I use the same CNN as you:
CNN + sigmoid + binary_crossentropy + Adam, that is all!

@michelleowen

This thread is really helpful!
I have another question. What if my response data is partially missing, i.e. say I have five classes and most of the data only has partial information on the responses, e.g. [1,0,NaN,NaN,1]?
I know I can build an individual model for each class, but what if I want to build one single model?

@janmatias

@michelleowen I am in no way an expert, but could it maybe work to set the NaN values to 0.5? This might not work in general, and the value might need to be tweaked depending on the problem.

@michelleowen

@janmatias Yes, I agree it is one workaround, but not a perfect one. I am thinking of modifying the loss function so that if the true response is NaN it is not penalized. However, I am not quite sure which part of the Keras code I should modify.
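One way to do this without modifying Keras itself is a custom loss that masks out missing labels. A sketch assuming the missing entries are encoded as -1 rather than NaN (NaN would propagate through the loss), written against the Keras backend API:

from keras import backend as K

def masked_binary_crossentropy(y_true, y_pred):
    # mask is 1 where a label is observed, 0 where it is missing (-1)
    mask = K.cast(K.not_equal(y_true, -1), K.floatx())
    y_pred = K.clip(y_pred, K.epsilon(), 1 - K.epsilon())
    bce = -(y_true * K.log(y_pred) + (1 - y_true) * K.log(1 - y_pred))
    # average only over the observed labels
    return K.sum(bce * mask, axis=-1) / K.maximum(K.sum(mask, axis=-1), 1.0)

# model.compile(optimizer='adam', loss=masked_binary_crossentropy)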

@james97

james97 commented Mar 6, 2017

Awesome! I still have a question. If the dataset is quite imbalanced, i.e. some categories have many more samples than others, how can I use class_weight to address this and still get a multi-label prediction? Can anybody answer me? @suraj-deshmukh @xieximeng2008

@tobigue

tobigue commented Nov 14, 2017

@pieroit @bryan831 you could try to give more weight to positive targets in the loss function.

If you use the tensorflow backend of keras you can use tf.nn.weighted_cross_entropy_with_logits like this: https://stackoverflow.com/a/47313183/979377

Would be interested to hear if this worked for you and how you set the POS_WEIGHT in relation to your number of classes!
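For reference, a sketch of the approach in that answer: a custom Keras loss that converts the sigmoid probabilities back to logits and applies tf.nn.weighted_cross_entropy_with_logits, so missed positives are penalized more heavily (POS_WEIGHT is an assumption here and should be tuned on validation data):

import tensorflow as tf
from keras import backend as K

POS_WEIGHT = 10  # assumption: tune in relation to label sparsity

def weighted_binary_crossentropy(y_true, y_pred):
    # y_pred comes from a sigmoid output layer; convert probabilities back to logits
    y_pred = K.clip(y_pred, K.epsilon(), 1 - K.epsilon())
    logits = K.log(y_pred / (1 - y_pred))
    loss = tf.nn.weighted_cross_entropy_with_logits(targets=y_true, logits=logits, pos_weight=POS_WEIGHT)
    return K.mean(loss, axis=-1)

# model.compile(optimizer='adam', loss=weighted_binary_crossentropy)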

@hpts23

hpts23 commented Nov 15, 2017

Hi! I am facing a slightly different problem in training a multi-label classifier.
I use sigmoid and binary cross entropy for training;
however, the network's outputs are almost identical across images, like below.
I have 200 classes, and its output is not appropriate.

    input_tensor = Input(shape=(img_rows, img_cols, n_channels))
    vgg16 = VGG16(include_top=False, weights='imagenet', input_tensor=input_tensor)
    top_model = Sequential()
    top_model.add(Flatten(input_shape=vgg16.output_shape[1:]))
    top_model.add(Dense(4096, activation='relu'))
    top_model.add(Dropout(0.5))
    top_model.add(Dense(4096, activation='relu'))
    top_model.add(Dropout(0.5))
    top_model.add(Dense(nb_classes, activation='sigmoid', init='glorot_uniform'))
    model = Model(input=vgg16.input, output=top_model(vgg16.output))
    model.compile(optimizer=optimizers.Adam(), loss='binary_crossentropy', metrics=['accuracy'])

image001: [[0.94, 0.03, 0.01, 0.91, ... , 0.91]]
image002: [[0.93, 0.02, 0.01, 0.93, ... , 0.93]]
image003: [[0.91, 0.02, 0.01, 0.92, ... , 0.92]]

Please tell me how to deal with this problem.

@hpnhxxwn

hpnhxxwn commented Dec 2, 2017

@pieroit @bryan831
I'm facing exactly the same issue as you. I'm wondering, did you use the method @tobigue suggested, and how did it work? Could you show me how you solved this problem? FYI, I tried class_weight = {0:1, 1:20} but it did not work and errored out; it looks like class_weight does not support multi-dimensional output.

@tobigue

tobigue commented Dec 7, 2017

@hpnhxxwn did you try out the code I posted in the stackoverflow answer? Should be easy for you to test with copy & paste if you use the tensorflow backend.

@bzamecnik
Contributor

bzamecnik commented Jan 24, 2018

Instead of:

preds = clf.predict(xs)
preds[preds>=0.5] = 1
preds[preds<0.5] = 0

you can just write:

preds = (clf.predict(xs) >= 0.5).astype(int)

We threshold the probabilities to obtain a boolean vector, which we in turn convert to integers. It's less imperative than two assignments. Possibly you could even keep the booleans without conversion.

@vijaycol

vijaycol commented Apr 6, 2018

I am facing a problem with the input shape of the model for a categorical classifier.

x        y
[1,2,3]  [0]
[2,3,5]  [1]
[2,1,6]  [2]
[1,2,3]  [0]
[2,3,5]  [0]
[2,1,6]  [2]

Then I changed the y labels into categorical form:

[1,0,0]
[0,1,0]
[0,0,1]
[1,0,0]
[1,0,0]
[0,0,1]

My x_train shape is (6000,3), y_train shape is (6000,3), x_test shape is (2000,3), and y_test shape is (2000,3).

I tried this model and am getting a value error:

model = Sequential()
model.add(Dense(1, input_shape=(3,), activation="softmax"))
model.compile(Adam(lr=0.5), 'categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=50, verbose=1)

ValueError: Error when checking target: expected dense_1 to have shape (None, 1) but got array with shape (6000, 3)

I don't understand this error. Help me sort it out.
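For what it's worth, the mismatch in that error comes from the output layer having a single unit while the one-hot targets have 3 columns; a layer with one unit per class matches the (6000, 3) target shape. A minimal sketch of the corrected model (illustrative only):

from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(3, input_shape=(3,), activation='softmax'))  # 3 units for 3 one-hot classes
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# model.fit(X_train, y_train, epochs=50, verbose=1)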

@Manu912

Manu912 commented Apr 9, 2018

Facing the same problem as @vijaycol, but in my case it is about image segmentation. I am passing X_train of shape (209,256,256,3), i.e. 209 images, each 256x256 with 3 channels. It is mentioned everywhere that Y_train should be one-hot encoded, but this gives an error when using the one-hot encoder. What to do? Suggest any solution asap.

I am facing the above-discussed problem in the following code.

model = Sequential()
model.add(Conv2D(32, kernel_size=(3,3), input_shape=(x,y,z), padding='same', data_format='channels_last', kernel_initializer='ones', bias_initializer='zeros'))
model.add(Activation('relu'))
model.compile(optimizer='sgd', loss='categorical_crossentropy')
model.fit(X_train, Y_train, epochs=10, batch_size=1)

@SpecKROELLchen

@hpts23 Hey, did you solve your problem? I have exactly the same issue when using vgg16 for binary classification.

@stevelizcano

stevelizcano commented Apr 27, 2018 via email

@SpecKROELLchen

That is of course what I am already doing. I use binary_crossentropy with a sigmoid activation.
But my CNN still does not do what it should.
I already posted in #10040, so I will explain my problem in more detail there.

@agupt013

agupt013 commented May 7, 2018

Hi, I am trying to do multi-label classification on an image dataset of 2.2M images. I have seen that people often use flow_from_directory or flow to train the network in batches. I cannot use flow_from_directory because it is a multi-label problem, and to use flow I would need to load all my data into an array.

Can someone please suggest a better way of doing it?

Thanks!
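A sketch of one common workaround: a Python generator that yields (images, multi-hot labels) batches from a list of file paths, so the full dataset never has to fit in memory (the path list, label array, and sizes here are illustrative):

import numpy as np
from keras.preprocessing.image import load_img, img_to_array

def multilabel_generator(paths, labels, batch_size=32, target_size=(224, 224)):
    # paths: list of image file paths; labels: (n_samples, n_classes) multi-hot array
    n = len(paths)
    while True:
        idx = np.random.permutation(n)
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            x = np.stack([img_to_array(load_img(paths[i], target_size=target_size)) / 255.0
                          for i in batch])
            yield x, labels[batch]

# model.fit_generator(multilabel_generator(train_paths, train_labels),
#                     steps_per_epoch=len(train_paths) // 32, epochs=10)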

@mohapatras

@agupt013 Why can't you use flow or flow_from_directory for multi-label? Can you give a valid reason?

@akashsingularityucr

Hi @mohapatras. For each image I have 5 labels out of 20 classes. My understanding of flow_from_directory is that images are placed in a subdirectory of their respective class. I want to compute the loss such that I pass all 5 labels and their respective predictions to the loss function.

@luoshao23

luoshao23 commented May 29, 2018

I have a similar but slightly different problem. I have multiple labels for one sample, and within each label the classes are mutually exclusive. So the target looks like concat( [0, 0, 1], [0, 1], [0, 0, 0, 1, 0] ) for one row. In such a case, should I train 3 separate models using softmax for each label, or can I also use sigmoid with the binary_crossentropy loss in one model?

@tobigue

tobigue commented May 29, 2018

@luoshao23 you can train a model with multiple outputs: https://keras.io/getting-started/functional-api-guide/#multi-input-and-multi-output-models. If 3 models or 1 model is a better choice probably depends on the task.
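A sketch of the single-model, multi-output option from that guide, with one softmax head per mutually exclusive label group (the group sizes 3, 2 and 5 match the concatenated target above; everything else is illustrative):

from keras.models import Model
from keras.layers import Input, Dense

inp = Input(shape=(100,))                       # illustrative feature size
h = Dense(64, activation='relu')(inp)

out_a = Dense(3, activation='softmax', name='group_a')(h)
out_b = Dense(2, activation='softmax', name='group_b')(h)
out_c = Dense(5, activation='softmax', name='group_c')(h)

model = Model(inputs=inp, outputs=[out_a, out_b, out_c])
model.compile(optimizer='adam', loss='categorical_crossentropy')
# y is passed as a list of three one-hot arrays instead of one concatenated vector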

@luoshao23

@tobigue Thank you for your answer! So does that mean it is not a good choice to model this problem with a single output vector that concatenates the groups together?

@sarthakahuja11

sarthakahuja11 commented Jun 18, 2018

I need to classify attributes of a face such as eye colour, hair colour, skin colour, facial hair, lighting and so on. Each has a few sub-categories. So should I directly apply sigmoid over all the labels, or separately apply softmax on each subcategory like hair/eye colour?
Which one will be better in this case?
Or should I combine both, since some subclasses are binary?

@tsterbak

I wrote an explanatory blog post about multi-label classification and there is also an example with keras. https://www.depends-on-the-definition.com/guide-to-multi-label-classification-with-neural-networks/

@pieroit

pieroit commented Sep 13, 2018

@tsterbak great tutorial! Everybody in this thread should read it

@wt-huang

wt-huang commented Nov 2, 2018

Closing as this is resolved

wt-huang closed this as completed Nov 2, 2018
@srijandas07

I think this is still an unsolved query, and a lot of people are struggling to implement multi-label models. Even after using sigmoid activation and binary cross-entropy, the predicted probability distribution is almost zero for all classes per sample. I think we really need to dig into the K.binary_crossentropy loss, which in turn calls nn.sigmoid_cross_entropy_with_logits(labels=target, logits=output).

@tobigue

tobigue commented Aug 26, 2019

@srijandas07 have you tried to give more weight to positive targets in the loss function?

If you use the tensorflow backend of keras you can use tf.nn.weighted_cross_entropy_with_logits like this: https://stackoverflow.com/a/47313183/979377

@srijandas07

great! I will try to use this!!

@srijandas07

@tobigue Even in tf.nn.weighted_cross_entropy_with_logits a sigmoid is applied internally.
In that case, I don't think we need to add a sigmoid activation on the last dense layer for the multi-label classification task.

@tobigue

tobigue commented Sep 3, 2019

@srijandas07 There are probably other activations and loss functions to explore, but using the sigmoid is to my best knowledge the standard for multi-label classification. The weighted version allows you to give a higher penalty when the classifier predicted a 0 while the target was a 1, which should improve the problem of getting predictions that are all close to zero.

@srijandas07

@tobigue I have removed the activation and used binary crossentropy with logits, and this seems to work. After going through the function, I could infer that it implicitly applies a sigmoid to compute the loss.
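A sketch of that setup: a linear (no sigmoid) output layer paired with a loss that applies the sigmoid internally, here via the TensorFlow backend (layer sizes and names are illustrative):

import tensorflow as tf
from keras import backend as K
from keras.models import Sequential
from keras.layers import Dense

def bce_from_logits(y_true, y_pred):
    # y_pred are raw scores (logits); the sigmoid is applied inside the loss
    return K.mean(tf.nn.sigmoid_cross_entropy_with_logits(labels=y_true, logits=y_pred), axis=-1)

model = Sequential()
model.add(Dense(128, activation='relu', input_shape=(300,)))
model.add(Dense(20))                              # no sigmoid on the output layer
model.compile(optimizer='adam', loss=bce_from_logits)

# at prediction time, apply the sigmoid yourself before thresholding:
# probs = 1.0 / (1.0 + np.exp(-model.predict(x)))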

@abafna

abafna commented Sep 30, 2019

> (quoting @elanmart's earlier multilabel example in this thread, which builds a sigmoid/binary_crossentropy classifier and passes class_weight=W to clf.fit)

Can we apply different weights to different labels using this binary cross-entropy approach? How is W structured here? @elanmart

fchollet pushed a commit that referenced this issue Sep 22, 2023
hubingallin pushed a commit to hubingallin/keras that referenced this issue Sep 22, 2023