
How to train a multi-label Classifier #741

Closed
xieximeng2008 opened this issue Sep 28, 2015 · 74 comments
Comments

@xieximeng2008

I need to train a multi-label softmax classifier, but the examples all use one-hot encoded labels, so how do I change the code to do this?

@elanmart

Don't use softmax. Use sigmoid units in the output layer and then use the "binary_crossentropy" loss.
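A minimal sketch of that setup (the layer sizes and input dimension here are placeholders, not from this thread):

from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(128, activation='relu', input_shape=(100,)))  # hidden layer, illustrative size
model.add(Dense(5, activation='sigmoid'))   # one independent probability per label
model.compile(optimizer='adam', loss='binary_crossentropy')

Each output unit is then an independent per-label probability, so several of them can be close to 1 at the same time.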

@holderm

holderm commented Sep 28, 2015

That works in my case. However, model.predict_classes is not adapted for this. As an example, for a sample from the test set where the target label is 1 0 1 0 0 0 0 (I have 7 labels in total),
model.predict(tSets[1,:]) gives 9.90e-01, 2.7e-07, 6.05e-13, 9.98e-01, 2.16e-05, 7.62e-05, 1.51e-04 (so that is correct), but
model.predict_classes(tSets[1,:]) gives just array([3]); it seems to pick the index of the highest value from model.predict. A quick fix might be numpy.around, but maybe there is a more elegant solution?

@elanmart

Getting classes from .predict() is one line of numpy code really.

@lemuriandezapada

model.predict(blabla) > 0.5 ?

@arushi02

@elanmart Hi, why do you think using softmax is not a good idea?

Do you use a graph model, given we have multiple outputs?

@xieximeng2008
Author

My loss is not converging. @holderm @elanmart

model.predict(Y_train[1,:])

it shows [ 0.00000000e+000 0.00000000e+000 0.00000000e+000 0.00000000e+000
0.00000000e+000]
my complete code:

from __future__ import absolute_import
from __future__ import print_function
import scipy.io
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation, Flatten
from keras.layers.convolutional import Convolution2D, MaxPooling2D
from keras.optimizers import SGD, Adadelta, Adagrad
from keras.utils import np_utils, generic_utils
from six.moves import range

batch_size = 100
nb_classes = 5
nb_epoch = 5
data_augmentation = True

shapex, shapey = 64, 64

nb_filters = [32, 64]

nb_pool = [4, 3]

nb_conv = [5, 4]

image_dimensions = 3


mat = scipy.io.loadmat(r'E:\scene.mat')

X_train = mat['x_train']
Y_train = mat['y_train']
X_test =  mat['x_test']
Y_test =  mat['y_test']
print(X_train.shape)
print(X_test.shape)

model = Sequential()

model.add(Convolution2D(nb_filters[0], image_dimensions, nb_conv[0], nb_conv[0], border_mode='valid'))
model.add(Activation('relu'))

model.add(MaxPooling2D(poolsize=(nb_pool[0], nb_pool[0])))
model.add(Dropout(0.25))

model.add(Convolution2D(nb_filters[1], nb_filters[0], nb_conv[1], nb_conv[1], border_mode='valid'))
model.add(Activation('relu'))

model.add(MaxPooling2D(poolsize=(nb_pool[1], nb_pool[1])))
model.add(Dropout(0.25))

model.add(Flatten())

model.add(Dense(nb_filters[-1] * (((shapex - nb_conv[0]+1)/ nb_pool[0] -nb_conv[1]+1)/ nb_pool[1]) * (((shapey -nb_conv[0]+1)/ nb_pool[0] -nb_conv[1]+1)/ nb_pool[1]), 512))
model.add(Activation('relu'))
model.add(Dropout(0.5))

model.add(Dense(512, nb_classes,init='uniform'))
model.add(Activation('sigmoid'))


sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='binary_crossentropy', optimizer=sgd) 

if not data_augmentation:
    print("Not using data augmentation or normalization")

    X_train = X_train.astype("float32")
    X_test = X_test.astype("float32")
    X_train /= 255
    X_test /= 255
    model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=nb_epoch)
    score = model.evaluate(X_test, Y_test, batch_size=batch_size)
    print('Test score:', score)

else:
    print("Using real time data augmentation")

    # this will do preprocessing and realtime data augmentation
    datagen = ImageDataGenerator(
        featurewise_center=True,  # set input mean to 0 over the dataset
        samplewise_center=False,  # set each sample mean to 0
        featurewise_std_normalization=True,  # divide inputs by std of the dataset
        samplewise_std_normalization=False,  # divide each input by its std
        zca_whitening=False,  # apply ZCA whitening
        rotation_range=20,  # randomly rotate images in the range (degrees, 0 to 180)
        width_shift_range=0.2,  # randomly shift images horizontally (fraction of total width)
        height_shift_range=0.2,  # randomly shift images vertically (fraction of total height)
        horizontal_flip=True,  # randomly flip images
        vertical_flip=False)  # randomly flip images

    datagen.fit(X_train)
    model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=nb_epoch)
    score = model.evaluate(X_test, Y_test, batch_size=batch_size)
    print (model.predict(X_test[1,:]))

Could you help me find out where it is wrong? Thanks!

@elanmart

@lemuriandezapada yeah,

labels = np.zeros(preds.shape)
labels[preds>0.5] = 1

@arushi02 with softmax, when you increase the score for one label, all the others are lowered (it's a probability distribution). You don't want that when you have multiple labels.
No, you don't need the Graph model.

Here's an example of one of my multilabel nets:

# Build a classifier optimized for maximizing f1_score (uses class_weights)
from keras.models import Sequential
from keras.layers.core import Dense, Dropout
from keras.optimizers import Adam
from sklearn.metrics import f1_score

clf = Sequential()

clf.add(Dropout(0.3))
clf.add(Dense(xt.shape[1], 1600, activation='relu'))
clf.add(Dropout(0.6))
clf.add(Dense(1600, 1200, activation='relu'))
clf.add(Dropout(0.6))
clf.add(Dense(1200, 800, activation='relu'))
clf.add(Dropout(0.6))
clf.add(Dense(800, yt.shape[1], activation='sigmoid'))

clf.compile(optimizer=Adam(), loss='binary_crossentropy')

clf.fit(xt, yt, batch_size=64, nb_epoch=300, validation_data=(xs, ys), class_weight=W, verbose=0)

preds = clf.predict(xs)

preds[preds>=0.5] = 1
preds[preds<0.5] = 0

print f1_score(ys, preds, average='macro')

@xieximeng2008 What does it print during training?

@xieximeng2008
Author

@elanmart Using real time data augmentation

Epoch 0
1800/1800 [==============================] - 60s - loss: 5.4326
Epoch 1
1800/1800 [==============================] - 59s - loss: 5.1325
Epoch 2
1800/1800 [==============================] - 60s - loss: 5.1325
Epoch 3
1800/1800 [==============================] - 59s - loss: 5.1325
Epoch 4
1800/1800 [==============================] - 59s - loss: 5.1325

testing...

200/200 [==============================] - 2s
[[  0.00000000e+000   0.00000000e+000   0.00000000e+000   0.00000000e+000
    0.00000000e+000]
 [  0.00000000e+000   0.00000000e+000   0.00000000e+000   0.00000000e+000
    0.00000000e+000]
 [  1.22857558e-291   0.00000000e+000   3.11779756e-297   0.00000000e+000
    0.00000000e+000]
 ...]

Almost all outputs are zero or very small floats.

@xieximeng2008
Author

@elanmart I used your example, but I still have the problems described above. Dataset: X_train (1800,3,64,64), X_test (200,3,64,64), Y_train (1800,5), Y_test (200,5).
I just changed the code as you listed:

model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=nb_epoch, validation_data=(X_test, Y_test), verbose=0)
preds = model.predict(X_test)
preds[preds >= 0.5] = 1
preds[preds < 0.5] = 0
print(preds)

Thanks for helping me!

@elanmart

@xieximeng2008 I'd guess the problem is in your data, since the network worked well for me a few days ago.

@arushi02

@elanmart

Suppose I want to identify house number 5436 from an image, and I assume every image will have at most 4 digits, so one image will be tagged with 4 one-hot vectors like

[(0000010000), (0000100000), (0001000000), (0000001000)]. If I pass this as a 2D matrix, will it give me probabilities for each element? In this kind of tagging, I want every row to have one most probable element (i.e. each row should follow a probability distribution).

@vosybac

vosybac commented Jan 18, 2016

Does anyone know how to replace the default validation score with another scoring function printed at every epoch? The scoring function for the validation set should be similar to the one used for the test set. Many thanks.

clf.fit(xt, yt, batch_size=64, nb_epoch=300, validation_data=(xs, ys), class_weight=W, verbose=0)
preds = clf.predict(xs)
preds[preds>=0.5] = 1
preds[preds<0.5] = 0
print f1_score(ys, preds, average='macro')
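One way to print a custom validation score every epoch is a Keras Callback that runs at on_epoch_end. A minimal sketch, assuming a compiled model clf and validation arrays xs, ys as above:

from keras.callbacks import Callback
from sklearn.metrics import f1_score

class ValidationF1(Callback):
    # print macro F1 on a held-out set at the end of every epoch
    def __init__(self, x_val, y_val, threshold=0.5):
        self.x_val = x_val
        self.y_val = y_val
        self.threshold = threshold

    def on_epoch_end(self, epoch, logs=None):
        preds = (self.model.predict(self.x_val) >= self.threshold).astype(int)
        print("epoch %d - val macro F1: %.4f" % (epoch, f1_score(self.y_val, preds, average='macro')))

# clf.fit(xt, yt, batch_size=64, nb_epoch=300, callbacks=[ValidationF1(xs, ys)], verbose=0)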

@suraj-deshmukh

@elanmart
I have an image dataset where each image has multiple labels, and y for a particular image is [1,1,-1,-1,-1], where 1 == class present and -1 == class not present. My question is how to change y so that a Keras model will accept it for training.

@alyato

alyato commented Jul 11, 2016

@suraj-deshmukh Did you solve your problem of how to load the multi-label data? How did you do it? Could you share your code? Thanks.

@suraj-deshmukh

suraj-deshmukh commented Jul 12, 2016

@alyato Hi, I solved my problem but lost all my code due to an HDD failure. But as I said in the previous comment, my y/target was [1,1,-1,-1,-1] and I converted it into [1,1,0,0,0], where 1 == presence and 0 == absence, for all images, and passed that data to a ConvNet with binary crossentropy as the loss function and sigmoid as the activation for the output layer.
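For reference, the conversion described above is a one-liner in NumPy (illustrative only, assuming y holds 1/-1 labels):

import numpy as np

y = np.array([[1, 1, -1, -1, -1],
              [-1, 1, 1, -1, 1]])
y01 = (y == 1).astype(int)   # [[1 1 0 0 0], [0 1 1 0 1]]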

@alyato

alyato commented Jul 13, 2016

@suraj-deshmukh Do I understand it like this?
For single-label (3 samples in total):

x        y
[1,2,3]  [0]
[4,5,6]  [1]
[7,8,9]  [2]

So I load the train_data and train_label. The format of train_label is [0,1,2], and train_label.shape is (3,).
But for multi-label (3 samples in total):

x        y
[1,2,3]  [0,2]
[4,5,6]  [1,2]
[7,8,9]  [0,1]

Then the format of train_label is [ [1,0,1],[0,1,1],[1,1,0] ] and train_label.shape is (3,3).

Is that right?
If it is right, I also have one question.

For single-label, the format of train_label is [0,1,2], and I need to call np_utils.to_categorical to convert it to the one-hot format.

For multi-label, the format of train_label is [ [1,0,1],[0,1,1],[1,1,0] ], and I don't call np_utils.to_categorical.

@suraj-deshmukh

suraj-deshmukh commented Jul 13, 2016

@alyato
yes you are right
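A sketch of building that multi-hot train_label matrix from per-sample lists of label indices, e.g. with scikit-learn's MultiLabelBinarizer (illustrative, not from the thread):

from sklearn.preprocessing import MultiLabelBinarizer

labels = [[0, 2], [1, 2], [0, 1]]          # label indices per sample
mlb = MultiLabelBinarizer(classes=[0, 1, 2])
train_label = mlb.fit_transform(labels)    # [[1 0 1], [0 1 1], [1 1 0]], shape (3, 3)

As noted above, this matrix is used directly as the target; np_utils.to_categorical is not needed.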

@alyato

alyato commented Jul 14, 2016

@suraj-deshmukh Thanks for your answer. But I also have some questions.

preds[preds>=0.5] = 1
preds[preds<0.5] = 0

  1. How do I set the threshold, such as 0.5?
  2. Once I get my predicted test labels, how can I compare them with the real test labels?

The predicted test labels are
[[1,0,1],
[0,1,1],
[1,1,0]]
and the real test labels are
[[1,0,0],
[1,0,1],
[1,1,0]]

How do I measure whether my model is better or worse?
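Common ways to compare the two matrices above are label-based and example-based metrics; a minimal sketch with scikit-learn (illustrative only):

import numpy as np
from sklearn.metrics import f1_score, hamming_loss, accuracy_score

y_pred = np.array([[1, 0, 1], [0, 1, 1], [1, 1, 0]])
y_true = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0]])

print(hamming_loss(y_true, y_pred))               # fraction of individual labels that are wrong
print(f1_score(y_true, y_pred, average='macro'))  # per-label F1, averaged over labels
print(accuracy_score(y_true, y_pred))             # exact-match ratio (all labels of a sample correct)

As for the threshold, 0.5 is the natural default for sigmoid outputs, but it can be tuned per label on validation data to trade precision against recall.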

@XuesongYang

@elanmart
"in softmax when increasing score for one label, all others are lowered (it's a probability distribution). You don't want that when you have multiple labels."

I kind of disagree with that conclusion. Maybe I am wrong.
Softmax just calculates a normalized exponential value (a probability) for each node in the output layer. Assuming there are two target labels out of seven, for example, the neural network tries to predict the top two posterior probabilities at those specific nodes, and the two probabilities are definitely the same.

@ritchieng

Hi, I'm trying to classify an image with multiple digits, e.g. an image of "123" should output "123". There are up to 5 digits.

I'm stuck after building the convolution layers. How do we output 5 digits, each with 10 classes? Some suggested 5 independent fully connected layers after the final convolution layer, but how do we code these 5 independent FCs in Keras?
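A sketch of the "5 independent heads" idea with the Keras functional API (illustrative; the input size and layer widths are placeholders):

from keras.models import Model
from keras.layers import Input, Conv2D, Flatten, Dense

inp = Input(shape=(64, 64, 3))                      # placeholder image size
x = Conv2D(32, (3, 3), activation='relu')(inp)
x = Flatten()(x)
x = Dense(256, activation='relu')(x)

# one softmax head per digit position, each over 10 classes
digits = [Dense(10, activation='softmax', name='digit_%d' % i)(x) for i in range(5)]

model = Model(inputs=inp, outputs=digits)
model.compile(optimizer='adam', loss='categorical_crossentropy')
# y is then passed as a list of 5 one-hot arrays, one per digit position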

@janmatias

@xieximeng2008 Did you ever find out why your network only returned values close to zero? I am in a similar situation where my network only returns zeroes. I am fine-tuning an InceptionV3 model. Loss function is binary_crossentropy, I am using sigmoid as activation for the final layer, and as an optimizer I use rmsprop.

@suraj-deshmukh

@xieximeng2008 check this https://suraj-deshmukh.github.io/Multi-Label-Image-Classification/

@yuan6785

yuan6785 commented Dec 8, 2016

Like this! Changing SGD to Adam decreased the loss. Thanks @elanmart.
@xieximeng2008, I use the same CNN as you:
CNN + sigmoid + binary_crossentropy + Adam, that is all!

@michelleowen

This thread is really helpful!
I have another question. What if my response data is partially missing, i.e. say I have five classes and most of the data only has partial information on the responses, e.g. [1,0,NaN,NaN,1]?
I know I can build an individual model for each class, but what if I want to build one single model?

@janmatias

@michelleowen I am in no way an expert, but could it maybe work to set the NaN values to 0.5? This might not work in general, and the value might need to be tweaked depending on the problem.

@michelleowen

@janmatias Yes, I agree it is one workaround, but not a perfect one. I am thinking of modifying the loss function so that if the true response is NaN it is not penalized. However, I am not quite sure which part of the Keras code I should modify.
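One way to do this without modifying Keras itself is a custom loss that masks out missing labels. A sketch assuming the missing entries are encoded as -1 rather than NaN (NaN would propagate through the loss), written against the Keras backend API:

from keras import backend as K

def masked_binary_crossentropy(y_true, y_pred):
    # mask is 1 where a label is observed, 0 where it is missing (-1)
    mask = K.cast(K.not_equal(y_true, -1), K.floatx())
    y_pred = K.clip(y_pred, K.epsilon(), 1 - K.epsilon())
    bce = -(y_true * K.log(y_pred) + (1 - y_true) * K.log(1 - y_pred))
    # average only over the observed labels
    return K.sum(bce * mask, axis=-1) / K.maximum(K.sum(mask, axis=-1), 1.0)

# model.compile(optimizer='adam', loss=masked_binary_crossentropy)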

@james97

james97 commented Mar 6, 2017

Awesome! I still have a question. If the dataset is quite imbalanced, i.e. some categories have many more samples than others, how can I use class_weight to address this and still get a multi-label prediction? Can anybody answer me? @suraj-deshmukh @xieximeng2008

@tobigue

tobigue commented Nov 14, 2017

@pieroit @bryan831 you could try to give more weight to positive targets in the loss function.

If you use the tensorflow backend of keras you can use tf.nn.weighted_cross_entropy_with_logits like this: https://stackoverflow.com/a/47313183/979377

Would be interested to hear if this worked for you and how you set the POS_WEIGHT in relation to your number of classes!
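For reference, a sketch of the approach in that answer: a custom Keras loss that converts the sigmoid probabilities back to logits and applies tf.nn.weighted_cross_entropy_with_logits, so missed positives are penalized more heavily (POS_WEIGHT is an assumption here and should be tuned on validation data):

import tensorflow as tf
from keras import backend as K

POS_WEIGHT = 10  # assumption: tune in relation to label sparsity

def weighted_binary_crossentropy(y_true, y_pred):
    # y_pred comes from a sigmoid output layer; convert probabilities back to logits
    y_pred = K.clip(y_pred, K.epsilon(), 1 - K.epsilon())
    logits = K.log(y_pred / (1 - y_pred))
    loss = tf.nn.weighted_cross_entropy_with_logits(targets=y_true, logits=logits, pos_weight=POS_WEIGHT)
    return K.mean(loss, axis=-1)

# model.compile(optimizer='adam', loss=weighted_binary_crossentropy)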

@hpts23

hpts23 commented Nov 15, 2017

Hi! I am facing a slightly different problem in training a multi-label classifier.
I use sigmoid and binary cross entropy for training;
however, the network's outputs are almost identical across images, like below.
I have 200 classes, and its output is not appropriate.

    input_tensor = Input(shape=(img_rows, img_cols, n_channels))
    vgg16 = VGG16(include_top=False, weights='imagenet', input_tensor=input_tensor)
    top_model = Sequential()
    top_model.add(Flatten(input_shape=vgg16.output_shape[1:]))
    top_model.add(Dense(4096, activation='relu'))
    top_model.add(Dropout(0.5))
    top_model.add(Dense(4096, activation='relu'))
    top_model.add(Dropout(0.5))
    top_model.add(Dense(nb_classes, activation='sigmoid', init='glorot_uniform'))
    model = Model(input=vgg16.input, output=top_model(vgg16.output))
    model.compile(optimizer=optimizers.Adam(), loss='binary_crossentropy', metrics=['accuracy'])

image001: [[0.94, 0.03, 0.01, 0.91, ... , 0.91]]
image002: [[0.93, 0.02, 0.01, 0.93, ... , 0.93]]
image003: [[0.91, 0.02, 0.01, 0.92, ... , 0.92]]

Please tell me how to deal with this problem.

@hpnhxxwn

hpnhxxwn commented Dec 2, 2017

@pieroit @bryan831
I'm facing exactly the same issue as you. I'm wondering, did you use the method @tobigue suggested, and how did it work? Could you show me how you solved this problem? FYI, I tried class_weight = {0:1, 1:20} but it did not work and errored out; it looks like class_weight does not support multi-dimensional output.

@tobigue

tobigue commented Dec 7, 2017

@hpnhxxwn did you try out the code I posted in the stackoverflow answer? Should be easy for you to test with copy & paste if you use the tensorflow backend.

@bzamecnik
Contributor

bzamecnik commented Jan 24, 2018

Instead of:

preds = clf.predict(xs)
preds[preds>=0.5] = 1
preds[preds<0.5] = 0

you can just write:

preds = (clf.predict(xs) >= 0.5).astype(int)

We threshold the probabilities to obtain a boolean vector, which we in turn convert to integers. It's less imperative than two assignments. Possibly you could even keep the booleans without conversion.

@vijaycol

vijaycol commented Apr 6, 2018

I am facing a problem with the input shape of the model for a categorical classifier.

x        y
[1,2,3]  [0]
[2,3,5]  [1]
[2,1,6]  [2]
[1,2,3]  [0]
[2,3,5]  [0]
[2,1,6]  [2]

Then I changed the y labels into categorical form:

[1,0,0]
[0,1,0]
[0,0,1]
[1,0,0]
[1,0,0]
[0,0,1]

My x_train shape is (6000,3), y_train shape is (6000,3), x_test shape is (2000,3), and y_test shape is (2000,3).

I tried this model and am getting a value error:

model = Sequential()
model.add(Dense(1, input_shape=(3,), activation="softmax"))
model.compile(Adam(lr=0.5), 'categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=50, verbose=1)

ValueError: Error when checking target: expected dense_1 to have shape (None, 1) but got array with shape (6000, 3)

I don't understand this error. Help me sort it out.
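For what it's worth, the mismatch in that error comes from the output layer having a single unit while the one-hot targets have 3 columns; a layer with one unit per class matches the (6000, 3) target shape. A minimal sketch of the corrected model (illustrative only):

from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(3, input_shape=(3,), activation='softmax'))  # 3 units for 3 one-hot classes
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# model.fit(X_train, y_train, epochs=50, verbose=1)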

@Manu912

Manu912 commented Apr 9, 2018

Facing the same problem as @vijaycol, but in my case it is about image segmentation. I am passing X_train of shape (209,256,256,3), i.e. 209 images, each 256x256 with 3 channels. It is mentioned everywhere that Y_train should be one-hot encoded, but this gives an error when using the one-hot encoder. What to do? Suggest any solution asap.

I am facing the above-discussed problem in the following code.

model = Sequential()
model.add(Conv2D(32, kernel_size=(3,3), input_shape=(x,y,z), padding='same', data_format='channels_last', kernel_initializer='ones', bias_initializer='zeros'))
model.add(Activation('relu'))
model.compile(optimizer='sgd', loss='categorical_crossentropy')
model.fit(X_train, Y_train, epochs=10, batch_size=1)

@SpecKROELLchen

@hpts23 Hey, did you solve your problem? I have exactly the same issue when using vgg16 for binary classification.

@stevelizcano

stevelizcano commented Apr 27, 2018 via email

@SpecKROELLchen

That is of course what I am already doing. I use binary_crossentropy with a sigmoid activation.
But my CNN still does not do what it should.
I already posted in #10040, so I will explain my problem in more detail there.

@agupt013

agupt013 commented May 7, 2018

Hi, I am trying to do multi-label classification on an image dataset of 2.2M images. I have seen that people often use flow_from_directory or flow to train the network in batches. I cannot use flow_from_directory because it is a multi-label problem, and to use flow I would need to load all my data into an array.

Can someone please suggest a better way of doing it?

Thanks!
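A sketch of one common workaround: a Python generator that yields (images, multi-hot labels) batches from a list of file paths, so the full dataset never has to fit in memory (the path list, label array, and sizes here are illustrative):

import numpy as np
from keras.preprocessing.image import load_img, img_to_array

def multilabel_generator(paths, labels, batch_size=32, target_size=(224, 224)):
    # paths: list of image file paths; labels: (n_samples, n_classes) multi-hot array
    n = len(paths)
    while True:
        idx = np.random.permutation(n)
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            x = np.stack([img_to_array(load_img(paths[i], target_size=target_size)) / 255.0
                          for i in batch])
            yield x, labels[batch]

# model.fit_generator(multilabel_generator(train_paths, train_labels),
#                     steps_per_epoch=len(train_paths) // 32, epochs=10)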

@mohapatras

@agupt013 Why can't you use flow or flow_from_directory for multi-label? Can you give a valid reason?

@akashsingularityucr

Hi @mohapatras. For each image I have 5 labels out of 20 classes. My understanding of flow_from_directory is that images are placed in a subdirectory of their respective class. I want to compute the loss such that I pass all 5 labels and their respective predictions to the loss function.

@luoshao23

luoshao23 commented May 29, 2018

I have a similar but slightly different problem. I have multiple labels for one sample, and within each label the classes are mutually exclusive. So the target looks like concat( [0, 0, 1], [0, 1], [0, 0, 0, 1, 0] ) for one row. In such a case, should I train 3 separate models using softmax for each label, or can I also use sigmoid with the binary_crossentropy loss in one model?

@tobigue

tobigue commented May 29, 2018

@luoshao23 you can train a model with multiple outputs: https://keras.io/getting-started/functional-api-guide/#multi-input-and-multi-output-models. If 3 models or 1 model is a better choice probably depends on the task.
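A sketch of the single-model, multi-output option from that guide, with one softmax head per mutually exclusive label group (the group sizes 3, 2 and 5 match the concatenated target above; everything else is illustrative):

from keras.models import Model
from keras.layers import Input, Dense

inp = Input(shape=(100,))                       # illustrative feature size
h = Dense(64, activation='relu')(inp)

out_a = Dense(3, activation='softmax', name='group_a')(h)
out_b = Dense(2, activation='softmax', name='group_b')(h)
out_c = Dense(5, activation='softmax', name='group_c')(h)

model = Model(inputs=inp, outputs=[out_a, out_b, out_c])
model.compile(optimizer='adam', loss='categorical_crossentropy')
# y is passed as a list of three one-hot arrays instead of one concatenated vector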

@luoshao23

@tobigue Thank you for your answer! So does that mean it is not a good choice to model this problem with a single output vector that concatenates the groups together?

@sarthakahuja11

sarthakahuja11 commented Jun 18, 2018

I need to classify attributes of a face such as eye colour, hair colour, skin colour, facial hair, lighting and so on. Each has a few sub-categories. So should I directly apply sigmoid over all the labels, or separately apply softmax on each subcategory like hair/eye colour?
Which one will be better in this case?
Or should I combine both, since some subclasses are binary?

@tsterbak

I wrote an explanatory blog post about multi-label classification and there is also an example with keras. https://www.depends-on-the-definition.com/guide-to-multi-label-classification-with-neural-networks/

@pieroit

pieroit commented Sep 13, 2018

@tsterbak great tutorial! Everybody in this thread should read it

@wt-huang

wt-huang commented Nov 2, 2018

Closing as this is resolved

wt-huang closed this as completed Nov 2, 2018
@srijandas07

I think this is still an unsolved query, and a lot of people are struggling to implement multi-label models. Even after using sigmoid activation and binary cross-entropy, the predicted probability distribution is almost zero for all classes per sample. I think we really need to dig into the K.binary_crossentropy loss, which in turn calls nn.sigmoid_cross_entropy_with_logits(labels=target, logits=output).

@tobigue

tobigue commented Aug 26, 2019

@srijandas07 have you tried to give more weight to positive targets in the loss function?

If you use the tensorflow backend of keras you can use tf.nn.weighted_cross_entropy_with_logits like this: https://stackoverflow.com/a/47313183/979377

@srijandas07

great! I will try to use this!!

@srijandas07

@tobigue Even in tf.nn.weighted_cross_entropy_with_logits a sigmoid is applied internally.
In that case, I don't think we need to add a sigmoid activation on the last dense layer for the multi-label classification task.

@tobigue

tobigue commented Sep 3, 2019

@srijandas07 There are probably other activations and loss functions to explore, but using the sigmoid is to my best knowledge the standard for multi-label classification. The weighted version allows you to give a higher penalty when the classifier predicted a 0 while the target was a 1, which should improve the problem of getting predictions that are all close to zero.

@srijandas07

@tobigue I have removed the activation and used binary crossentropy with logits, and this seems to work. After going through the function, I could infer that it implicitly applies a sigmoid to compute the loss.
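A sketch of that setup: a linear (no sigmoid) output layer paired with a loss that applies the sigmoid internally, here via the TensorFlow backend (layer sizes and names are illustrative):

import tensorflow as tf
from keras import backend as K
from keras.models import Sequential
from keras.layers import Dense

def bce_from_logits(y_true, y_pred):
    # y_pred are raw scores (logits); the sigmoid is applied inside the loss
    return K.mean(tf.nn.sigmoid_cross_entropy_with_logits(labels=y_true, logits=y_pred), axis=-1)

model = Sequential()
model.add(Dense(128, activation='relu', input_shape=(300,)))
model.add(Dense(20))                              # no sigmoid on the output layer
model.compile(optimizer='adam', loss=bce_from_logits)

# at prediction time, apply the sigmoid yourself before thresholding:
# probs = 1.0 / (1.0 + np.exp(-model.predict(x)))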

@abafna

abafna commented Sep 30, 2019

> (quoting @elanmart's earlier multilabel example in this thread, which builds a sigmoid/binary_crossentropy classifier and passes class_weight=W to clf.fit)

Can we apply different weights to different labels using this binary cross-entropy approach? How is W structured here? @elanmart

fchollet pushed a commit that referenced this issue Sep 22, 2023
hubingallin pushed a commit to hubingallin/keras that referenced this issue Sep 22, 2023