This repository has been archived by the owner on May 24, 2018. It is now read-only.

multilabel configuration #194

sundevil0405 opened this issue Jul 1, 2015 · 17 comments

@sundevil0405

Hi,

We are trying to learn to use cxxnet for a multi-label problem.

We made the following settings:

label_width = 5
label_vec[0,5) = class
target = class
metric = error

but get the error:

Metric: unknown target = label  

Could anyone kindly explain this or provide an example of a multi-label layer configuration?

Thanks a lot,
YS

@sundevil0405 changed the title from "copy shape mismatch error in multi-label setting" to "multilabel configuration" on Jul 1, 2015
@sxjzwq

sxjzwq commented Jul 2, 2015

modify
label_vec[0,5) = class
target = class

to
label_vec[0,5) = label
target = label

See #139.
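
For reference, a minimal sketch of the whole multi-label block after that change (this assumes the .conf syntax already used above; the # comments are just annotations):

# five labels per example
label_width = 5
# name the label fields "label" so target and metric can find them
label_vec[0,5) = label
target = label
metric = error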

@sundevil0405
Author

Thank you sxjzwq!

I followed your comment and it works now. However, I hit another error:

Segmentation fault (core dumped)

Is there any way to fix this?

Thanks a lot!

@sxjzwq

sxjzwq commented Jul 2, 2015

I guess it is caused by the input data. What is the size of your input? For example, if it is 224x224x3 and some of the images in your data are smaller than 224, you will hit this problem.

You should resize your images when running im2rec. Check the im2rec help and you will find the relevant parameters.
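
For example, something like this when generating the .rec file (a sketch: I'm assuming the usual im2rec invocation of image list, image root, and output file plus key=value options, so check the help output for the exact argument order):

im2rec img.lst ./image_root/ data.rec resize=512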

@sundevil0405
Author

Hi sxjzwq,
Our input is 512x512x3 and we actually resized the images before running the code. Could you tell me how to check whether some images have the wrong shape? Or is there any other possible reason? Thank you!

@sxjzwq

sxjzwq commented Jul 2, 2015

I am not sure. Maybe you should check the format of your image list and re-generate the .rec file using the parameter resize=512. I only met this error when I included a subset of my data; after checking the subset I found that some images were smaller than my network input shape. Once I resized them the error was gone. But there might be other reasons in your case. Please check the input carefully. Good luck!

@sundevil0405
Author

We will carefully check the input. Thanks a million!

@sxjzwq

sxjzwq commented Jul 3, 2015

You're welcome! Please let me know your multi-label classification performance if it works. I am also working on training a multi-label classification network, but it seems that my network parameters cannot converge.

@sundevil0405
Author

Sure! We are trying some simple settings to see what happens. We will let you know the performance if a setting works!! Thank you!

@sundevil0405
Author

Hi Qi,
We tried multiple parameter settings. It seems the code does not work on our data either: the training error barely changes after multiple rounds. We basically observe things like:
round 0:[ 1098] 1082 sec elapsed[1] train-error:0.305704
round 1:[ 1098] 2170 sec elapsed[2] train-error:0.305203
round 2:[ 1098] 3259 sec elapsed[3] train-error:0.305203
round 3:[ 1098] 4347 sec elapsed[4] train-error:0.305203
round 4:[ 1098] 5436 sec elapsed[5] train-error:0.305203
round 5:[ 1098] 6524 sec elapsed[6] train-error:0.305203
round 6:[ 1098] 7612 sec elapsed[7] train-error:0.305203
round 7:[ 1098] 8700 sec elapsed[8] train-error:0.305203
round 8:[ 1098] 9789 sec elapsed[9] train-error:0.305203
round 9:[ 1098] 10878 sec elapsed[10] train-error:0.305203
round 10:[ 1098] 11966 sec elapsed[11] train-error:0.305203
round 11:[ 1098] 13054 sec elapsed[12] train-error:0.305203
round 12:[ 1098] 14142 sec elapsed[13] train-error:0.305203
round 13:[ 1098] 15231 sec elapsed[14] train-error:0.305203
round 14:[ 1098] 16319 sec elapsed[15] train-error:0.305203
round 15:[ 1098] 17408 sec elapsed[16] train-error:0.305203
round 16:[ 1098] 18496 sec elapsed[17] train-error:0.305203

I think it would be good to have an example in cxxnet.

@sxjzwq

sxjzwq commented Jul 4, 2015

Hi,
Maybe you can set metric = logloss and try again. Which loss function are you using? Try multi_logistic. I have gotten some positive results on an easy dataset; my network is fine-tuned from vggnet16. But I am still trying it on my real data.
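
In the config that change would look roughly like this (a sketch: the loss layer is declared in the network definition, and the layer index here is only illustrative, so adapt it to your own net):

metric = logloss
# in the network definition, replace the softmax loss layer, e.g.
layer[31->31] = multi_logistic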

@sundevil0405
Author

Hi sxjzwq, thank you so much for your suggestion. We tried both l2 and softmax as the loss function. We will definitely try your suggestion and let you know if there is an improvement. Thanks again!

@sxjzwq

sxjzwq commented Jul 9, 2015

start from vgg16.model
layer:fc7
wmat:eta = 0.0005
bias:eta = 0.0010
layer:fc8
wmat:eta = 0.0010
bias:eta = 0.0020

round 0:[ 2466] 11686 sec elapsed[1] train-logloss:0.092616 train-rmse:6.30993
round 1:[ 2466] 23366 sec elapsed[2] train-logloss:-nan train-rmse:5.79034
round 2:[ 2466] 35045 sec elapsed[3] train-logloss:-nan train-rmse:5.65325
round 3:[ 2466] 46721 sec elapsed[4] train-logloss:-nan train-rmse:5.56152
round 4:[ 2466] 58397 sec elapsed[5] train-logloss:-nan train-rmse:5.48876
round 5:[ 2466] 70074 sec elapsed[6] train-logloss:-nan train-rmse:5.42933

start from 0006.model
layer:fc7
wmat:eta = 0.0005
bias:eta = 0.0010
layer:fc8
wmat:eta = 0.0005
bias:eta = 0.0010

round 6:[ 2466] 11681 sec elapsed[7] train-logloss:-nan train-rmse:5.33734
round 7:[ 2466] 23361 sec elapsed[8] train-logloss:-nan train-rmse:5.27811
round 8:[ 2466] 35040 sec elapsed[9] train-logloss:-nan train-rmse:5.2354
round 9:[ 2466] 46719 sec elapsed[10] train-logloss:-nan train-rmse:5.19465
round 10:[ 2466] 58396 sec elapsed[11] train-logloss:-nan train-rmse:5.15824
round 11:[ 2466] 70071 sec elapsed[12] train-logloss:-nan train-rmse:5.12289

start from 0012.model
layer:fc7
wmat:eta = 0.00025
bias:eta = 0.00050
layer:fc8
wmat:eta = 0.00025
bias:eta = 0.00050

round 12:[ 2466] 11686 sec elapsed[13] train-logloss:-nan train-rmse:4.60376
round 13:[ 2466] 23383 sec elapsed[14] train-logloss:-nan train-rmse:4.48242
round 14:[ 2466] 35060 sec elapsed[15] train-logloss:-nan train-rmse:4.4032
round 15:[ 2466] 46732 sec elapsed[16] train-logloss:-nan train-rmse:4.33162
round 16:[ 2466] 58405 sec elapsed[17] train-logloss:-nan train-rmse:4.28349
round 17:[ 2466] 70076 sec elapsed[18] train-logloss:-nan train-rmse:4.2459

start from 0018.model
layer:fc7
wmat:eta = 0.00010
bias:eta = 0.00020
layer:fc8
wmat:eta = 0.00010
bias:eta = 0.00020

round 18:[ 2466] 11674 sec elapsed[19] train-logloss:-nan train-rmse:3.93583

Using the RMSE metric will be helpful.
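
The log above reports both train-logloss and train-rmse, so both metrics were enabled; in the config that would be roughly (a sketch, assuming repeated metric lines are allowed):

metric = logloss
metric = rmse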

@sundevil0405
Author

Hi Qi,
Thank you very much for your advice. We will try this. By the way, we tried your last suggestion but we also hit the NaN problem. Hopefully it will work this time. Thanks again!!

@sxjzwq

sxjzwq commented Jul 13, 2015

Hi Yashu

Yes, I don't know how to avoid the NaN problem when using the logloss evaluation metric, but the RMSE metric seems to work fine. I finally got train-rmse 1.32312 on my data, and my multi-label classification mAP is above 0.7, much better than using fc7 features + a multi-label SVM.

I hope this information is helpful.

Best

@sundevil0405
Author

Hi Qi,

That's really good news! We actually followed your suggestion and changed to the RMSE metric. However, the speed seems extremely slow. We've been pre-training the network for ~3 days on two GTX Titan Black cards and it has only finished ~300 rounds. How many rounds did your training take? Was that pre-training or fine-tuning?

Thank you very much,
Yashu


@sxjzwq

sxjzwq commented Jul 14, 2015

Hi Yashu

I am using the pre-trained VGGNet16 (trained on ImageNet, of course) as the initial model, and then fine-tune the last FC layer (fc7) and the classification layer (changing 1000 outputs to 256, which is my label width). Also, I changed the loss layer from softmax to multi_logistic. For all the other layers, I keep the learning rate at 0, so their parameters stay fixed at the VGGNet values.
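
In config form, this per-layer setup looks roughly like the snippet below (a sketch: the layer:/eta lines match the training-log headers further down, and the # comments are just annotations):

layer:fc7
wmat:eta = 0.0005   # fine-tune fc7 weights
bias:eta = 0.0010
layer:fc8
wmat:eta = 0.0010   # classification layer, 256 outputs (the label width)
bias:eta = 0.0020
# every other layer keeps eta = 0, so the VGGNet16 parameters stay fixed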

I started training with learning rate = 0.001 and decreased it whenever the train-RMSE stopped decreasing. I trained only 36 rounds; because my learning rate had reached 0.000001, I stopped the training. The following is my training log:

start from vgg16.model
layer:fc7
wmat:eta = 0.0005
bias:eta = 0.0010
layer:fc8
wmat:eta = 0.0010
bias:eta = 0.0020

round 0:[ 2466] 11686 sec elapsed[1] train-logloss:0.092616 train-rmse:6.30993
round 1:[ 2466] 23366 sec elapsed[2] train-logloss:-nan train-rmse:5.79034
round 2:[ 2466] 35045 sec elapsed[3] train-logloss:-nan train-rmse:5.65325
round 3:[ 2466] 46721 sec elapsed[4] train-logloss:-nan train-rmse:5.56152
round 4:[ 2466] 58397 sec elapsed[5] train-logloss:-nan train-rmse:5.48876
round 5:[ 2466] 70074 sec elapsed[6] train-logloss:-nan train-rmse:5.42933

start from 0006.model
layer:fc7
wmat:eta = 0.0005
bias:eta = 0.0010
layer:fc8
wmat:eta = 0.0005
bias:eta = 0.0010

round 6:[ 2466] 11681 sec elapsed[7] train-logloss:-nan train-rmse:5.33734
round 7:[ 2466] 23361 sec elapsed[8] train-logloss:-nan train-rmse:5.27811
round 8:[ 2466] 35040 sec elapsed[9] train-logloss:-nan train-rmse:5.2354
round 9:[ 2466] 46719 sec elapsed[10] train-logloss:-nan train-rmse:5.19465
round 10:[ 2466] 58396 sec elapsed[11] train-logloss:-nan train-rmse:5.15824
round 11:[ 2466] 70071 sec elapsed[12] train-logloss:-nan train-rmse:5.12289

start from 0012.model
layer:fc7
wmat:eta = 0.00025
bias:eta = 0.00050
layer:fc8
wmat:eta = 0.00025
bias:eta = 0.00050

round 12:[ 2466] 11686 sec elapsed[13] train-logloss:-nan train-rmse:4.60376
round 13:[ 2466] 23383 sec elapsed[14] train-logloss:-nan train-rmse:4.48242
round 14:[ 2466] 35060 sec elapsed[15] train-logloss:-nan train-rmse:4.4032
round 15:[ 2466] 46732 sec elapsed[16] train-logloss:-nan train-rmse:4.33162
round 16:[ 2466] 58405 sec elapsed[17] train-logloss:-nan train-rmse:4.28349
round 17:[ 2466] 70076 sec elapsed[18] train-logloss:-nan train-rmse:4.2459

start from 0018.model
layer:fc7
wmat:eta = 0.00010
bias:eta = 0.00020
layer:fc8
wmat:eta = 0.00010
bias:eta = 0.00020

round 18:[ 2466] 11674 sec elapsed[19] train-logloss:-nan train-rmse:3.93583
round 19:[ 2466] 23353 sec elapsed[20] train-logloss:-nan train-rmse:3.68861
round 20:[ 2466] 35027 sec elapsed[21] train-logloss:-nan train-rmse:3.48819
round 21:[ 2466] 46701 sec elapsed[22] train-logloss:-nan train-rmse:3.29444
round 22:[ 2466] 58375 sec elapsed[23] train-logloss:-nan train-rmse:3.13445
round 23:[ 2466] 70048 sec elapsed[24] train-logloss:-nan train-rmse:2.98958

start from 0024.model
layer:fc7
wmat:eta = 0.00001
bias:eta = 0.00002
layer:fc8
wmat:eta = 0.00001
bias:eta = 0.00002

round 24:[ 2466] 11671 sec elapsed[25] train-logloss:-nan train-rmse:3.27728
round 25:[ 2466] 23347 sec elapsed[26] train-logloss:-nan train-rmse:2.95055
round 26:[ 2466] 35017 sec elapsed[27] train-logloss:-nan train-rmse:2.65933
round 27:[ 2466] 46689 sec elapsed[28] train-logloss:-nan train-rmse:2.35525
round 28:[ 2466] 58361 sec elapsed[29] train-logloss:-nan train-rmse:2.04922
round 29:[ 2466] 70034 sec elapsed[30] train-logloss:-nan train-rmse:1.72671

start from 0030.model
layer:fc7
wmat:eta = 0.000001
bias:eta = 0.000002
layer:fc8
wmat:eta = 0.000001
bias:eta = 0.000002

round 30:[ 2466] 11675 sec elapsed[31] train-logloss:-nan train-rmse:2.81689
round 31:[ 2466] 23350 sec elapsed[32] train-logloss:-nan train-rmse:2.46264
round 32:[ 2466] 35021 sec elapsed[33] train-logloss:-nan train-rmse:2.16123
round 33:[ 2466] 46691 sec elapsed[34] train-logloss:-nan train-rmse:1.86558
round 34:[ 2466] 58362 sec elapsed[35] train-logloss:-nan train-rmse:1.58915
round 35:[ 2466] 70034 sec elapsed[36] train-logloss:-nan train-rmse:1.32312

@sundevil0405
Author

Hi Qi,

Thank you so much for your advice. Our problem is not suitable for fine-tuning, so we decided to train the net directly. However, the toolbox does not work for us, so we have decided to give up on cxxnet and turn to Caffe. Thank you again for your help, and I hope we can discuss and collaborate someday :)

Best Regards,
Yashu

