Train from scratch #50

Open
aoluming opened this issue Aug 14, 2020 · 35 comments

Comments

@aoluming

Has anyone tried training C3D from scratch on UCF101? The accuracy stays at 1%. I also tried other models that I implemented myself, and their accuracy is 1% as well. The learning rate is 1e-5. Does anyone have any ideas?
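
A quick sanity check on these numbers, as a minimal sketch: UCF101 has 101 action classes, so 1% top-1 accuracy is what uniform random guessing would give, and a softmax cross-entropy loss near ln(101) would point the same way. The variable names below are illustrative and do not come from the repository.

```python
import math

NUM_CLASSES = 101  # UCF101 has 101 action classes

# Accuracy of a classifier that guesses uniformly at random.
chance_top1 = 1 / NUM_CLASSES

# Cross-entropy of a uniform prediction over all classes: -ln(1/N) = ln(N).
chance_loss = math.log(NUM_CLASSES)

print(f"chance top-1 accuracy: {chance_top1:.4f}")
print(f"cross-entropy of a uniform prediction: {chance_loss:.4f}")
```

If the logged loss sits near the second value while accuracy stays near the first, the network is probably not learning anything class-specific yet.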

@Farabi-shafkat

I am also trying to train from scratch, and after 13 or so epochs my train and test accuracies are roughly 43% and 28%. I am also using a custom architecture, so the bug may be in your code. It is not possible to suggest anything more without knowing the specifics of your code.

@aoluming
Author

@Farabi-shafkat
Thank you for your kind reply! Have you ever tried training C3D from scratch? My custom architecture is at https://github.com/aoluming/Cost_model. It follows the CVPR 2019 paper 'Collaborative Spatiotemporal Feature Learning for Video Action Recognition', which is not open source. I would really appreciate it if you could check the code for me.

@Farabi-shafkat

Hello. No, I have not trained C3D from scratch, and I am by no means an expert. I have looked at your code but could not find any bug in your custom network implementation. However, there is one thing that might be wrong. Check out this thread:
#30 (comment)

@aoluming
Author

aoluming commented Aug 16, 2020

Hello, you are very modest. Thank you for doing so much for me. I will try what that thread suggests in my code. @Farabi-shafkat

@libb999

libb999 commented Aug 18, 2020

> Has anyone tried training C3D from scratch on UCF101? The accuracy stays at 1%. I also tried other models that I implemented myself, and their accuracy is 1% as well. The learning rate is 1e-5. Does anyone have any ideas?

I also get acc = 1% when training from scratch with this code.

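@aoluming
Author

@libb999 Are you using C3D too? Have you tried any other models? With this repo's C3D plus the pretrained weights it provides, accuracy reaches 97% within a few iterations, but training from scratch gives 1%, which seems absurd to me. Where do you think the problem might be?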

@libb999

libb999 commented Aug 19, 2020

Yes, I am also using C3D, and my situation is exactly the same as yours.

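@aoluming
Author

@libb999 I also get 1% with other models, but I printed the outputs during training and noticed something. Without dropout in the network, the predicted class index is basically one or two fixed values, no matter how many epochs I run. I don't know whether the problem is in the network, the training code, or the data. But accuracy is high with the pretrained weights, which suggests the data is probably fine.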
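
One way to make the collapse described above measurable is to count how often each class index is predicted over an epoch. The following is a minimal sketch in plain Python; `prediction_histogram` and the sample `preds` list are hypothetical names for illustration, and in a training loop the list would be filled from the predicted indices of each batch.

```python
from collections import Counter

def prediction_histogram(preds, top_k=3):
    """Return the top_k most frequently predicted classes with their share of all predictions."""
    total = len(preds)
    if total == 0:
        return []
    counts = Counter(preds)
    return [(cls, n / total) for cls, n in counts.most_common(top_k)]

# Made-up predictions that have collapsed onto classes 17 and 42.
preds = [17] * 90 + [42] * 9 + [3]
for cls, share in prediction_histogram(preds):
    print(f"class {cls}: {share:.0%} of predictions")
```

If one or two classes account for nearly all predictions, as in this made-up list, the model is ignoring its input regardless of what the loss curve shows.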

@libb999

libb999 commented Aug 19, 2020

I think there is a problem with the code.

@aoluming
Author

Agreed. But I checked the loss and printed the gradients, and they look fine: gradients are being backpropagated, yet the loss just won't go down. @libb999

@shanchao0906

Has anyone solved this? My training accuracy stays very low.

@shanchao0906

image

@BryceWayne

I ran into the same issue. If you reduce the number of classes, the model converges. For instance, if you reduce UCF101 to 7 classes and train from scratch, the model converges to 95% validation accuracy. Training from scratch is known to take a very long time.
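
Reducing the dataset to a few classes, as suggested above, also requires remapping the kept labels to a contiguous range 0..K-1 so that they match a K-way output layer. A minimal sketch follows; `subset_classes`, the `(clip_path, class_name)` sample format, and the example file names are assumptions for illustration, not the repository's dataset API.

```python
def subset_classes(samples, keep):
    """Keep only samples whose class is in `keep` and map class names to indices 0..K-1."""
    class_to_idx = {name: i for i, name in enumerate(sorted(keep))}
    filtered = [
        (path, class_to_idx[name])
        for path, name in samples
        if name in class_to_idx
    ]
    return filtered, class_to_idx

# Hypothetical (clip_path, class_name) pairs.
samples = [
    ("v_Archery_g01_c01.avi", "Archery"),
    ("v_Biking_g01_c01.avi", "Biking"),
    ("v_Diving_g01_c01.avi", "Diving"),
]
filtered, class_to_idx = subset_classes(samples, keep={"Archery", "Diving"})
print(class_to_idx)
print(filtered)
```

The model's final layer would then need `len(class_to_idx)` outputs instead of 101.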

@HuangZuShu

Has anyone solved this problem? I ran into the same thing. The loss converges quickly, but accuracy is only 1% top-1 and 5% top-10.

@BryceWayne

The loss should be computed with the outputs. Training works well for me now.

@BryceWayne

> Has anyone solved this problem? I ran into the same thing. The loss converges quickly, but accuracy is only 1% top-1 and 5% top-10.

Make sure to check that the loss is computed with the outputs.
image

@HuangZuShu

> > Has anyone solved this problem? I ran into the same thing. The loss converges quickly, but accuracy is only 1% top-1 and 5% top-10.
>
> Make sure to check that the loss is computed with the outputs.
> image

Thank you for your reply! My loss is computed with the outputs, as you can see in the following picture, and I couldn't find any problem.
image

@skyqwe123

@jfzhang95 I hit the same problem when training from scratch on UCF101. The accuracy is very low (about 0.001). Do you have any suggestions? Thanks.

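@skyqwe123

> Agreed. But I checked the loss and printed the gradients, and they look fine: gradients are being backpropagated, yet the loss just won't go down. @libb999

@aoluming Have you solved this?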

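@alonelysnake

> > Agreed. But I checked the loss and printed the gradients, and they look fine: gradients are being backpropagated, yet the loss just won't go down. @libb999
>
> @aoluming Have you solved this?

The original code has `loss = criterion(outputs, labels)`, but shouldn't it actually use `probs`? Would the results improve if this part were changed?
```python
if phase == 'train':
    outputs = model(inputs)
else:
    with torch.no_grad():
        outputs = model(inputs)

probs = nn.Softmax(dim=1)(outputs)
preds = torch.max(probs.data, 1)[1]
labels = labels.long()
loss = criterion(probs, labels)  # proposed change: probs instead of outputs
```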

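@Krystal0606

> > > Agreed. But I checked the loss and printed the gradients, and they look fine: gradients are being backpropagated, yet the loss just won't go down. @libb999
> >
> > @aoluming Have you solved this?
>
> The original code has `loss = criterion(outputs, labels)`, but shouldn't it actually use `probs`? Would the results improve if this part were changed?
> ```python
> if phase == 'train':
>     outputs = model(inputs)
> else:
>     with torch.no_grad():
>         outputs = model(inputs)
>
> probs = nn.Softmax(dim=1)(outputs)
> preds = torch.max(probs.data, 1)[1]
> labels = labels.long()
> loss = criterion(probs, labels)  # proposed change: probs instead of outputs
> ```

Hello, did this solve it for you? Did the accuracy improve?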

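@Krystal0606

Hello, how do I load the pretrained model? Training from scratch, my accuracy stops improving at around epoch 20: training accuracy oscillates between 0.22 and 0.24 and validation accuracy between 0.25 and 0.27. I did not see the 1% situation mentioned above. What could be going on? @aoluming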

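@alonelysnake

> > > > Agreed. But I checked the loss and printed the gradients, and they look fine: gradients are being backpropagated, yet the loss just won't go down. @libb999
> > >
> > > @aoluming Have you solved this?
> >
> > The original code has `loss = criterion(outputs, labels)`, but shouldn't it actually use `probs`? Would the results improve if this part were changed?
> > ```python
> > if phase == 'train':
> >     outputs = model(inputs)
> > else:
> >     with torch.no_grad():
> >         outputs = model(inputs)
> >
> > probs = nn.Softmax(dim=1)(outputs)
> > preds = torch.max(probs.data, 1)[1]
> > labels = labels.long()
> > loss = criterion(probs, labels)  # proposed change: probs instead of outputs
> > ```
>
> Hello, did this solve it for you? Did the accuracy improve?

I tried it and it still doesn't work. You said your accuracy is around 0.22-0.24. Did you make any changes to the code, or did you just set the paths and hyperparameters and run it?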

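@Krystal0606

I made no changes. I preprocessed UCF101 following the repo's data processing and trained from scratch, and accuracy stopped improving at around epoch 20. Have you tried running with the pretrained model? @alonelysnake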

@Krystal0606

Now that I think of it, I did change the learning rate from 1e-5 to 1e-3, but it made little difference. @alonelysnake

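@alonelysnake

@Krystal0606 I have tried both with and without the pretrained model, and the result is about 1% either way. Without the change I mentioned earlier, the loss becomes NaN at a learning rate of 1e-3 and stays around 9 at 1e-4 and below. With the change, training also runs at 1e-3 and the loss drops to around 4, but the accuracy stays the same.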
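
The loss of about 4 reported after the change may have a simple explanation. `nn.CrossEntropyLoss` applies log-softmax to its input, so when that input is already a probability vector (entries in [0, 1] that sum to 1) the loss can only move within a narrow band around ln(101). The sketch below computes that band in plain Python; `cross_entropy` is an illustrative re-implementation of the per-sample formula, not code from the repository.

```python
import math

def cross_entropy(scores, target):
    """Per-sample softmax cross-entropy: log(sum(exp(scores))) - scores[target]."""
    m = max(scores)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum_exp - scores[target]

N = 101  # number of UCF101 classes

# Probability vectors passed in place of raw scores; the true class is index 0.
perfect = [1.0] + [0.0] * (N - 1)     # all mass on the correct class
uniform = [1.0 / N] * N               # no preference at all
wrong = [0.0, 1.0] + [0.0] * (N - 2)  # all mass on an incorrect class

print(f"best possible loss:  {cross_entropy(perfect, 0):.3f}")
print(f"uniform prediction:  {cross_entropy(uniform, 0):.3f}")
print(f"worst possible loss: {cross_entropy(wrong, 0):.3f}")
```

Under these assumptions the loss stays between roughly 3.63 and 4.63 whatever the network predicts, so a loss that settles near 4 with unchanged accuracy would be consistent with this bounded range rather than with real progress.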

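@alonelysnake

@Krystal0606 My loss and accuracy kept fluctuating over the first few epochs, so I only trained each configuration for five to ten epochs. Did your training start out like mine and then suddenly begin improving at some epoch, or did it improve steadily from the start?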

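@Krystal0606

My loss has been decreasing steadily and the accuracy increasing, but the accuracy starts oscillating at around 22%. With a learning rate of 1e-3 I did not get a NaN loss. My loss started at about 4 and ended at about 3. I don't know what is going on. @alonelysnake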

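@alonelysnake

@Krystal0606 I saw someone on Zhihu who used the pretrained model with unmodified code and got over 90% accuracy after 20 epochs. So results seem to vary a lot between machines. Could it be a random seed issue? I haven't looked into this.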
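
To rule the random seed in or out, runs can be made repeatable by seeding every random number generator in use. A minimal sketch, assuming the usual Python, NumPy, and PyTorch generators; `seed_everything` is a hypothetical helper name, and the NumPy and PyTorch parts are skipped when those packages are not installed.

```python
import random

def seed_everything(seed):
    """Seed Python's RNG and, when available, NumPy's and PyTorch's."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)           # CPU generator
        torch.cuda.manual_seed_all(seed)  # all GPUs; ignored when CUDA is unavailable
    except ImportError:
        pass

# Two runs with the same seed draw the same random numbers.
seed_everything(0)
first = [random.random() for _ in range(3)]
seed_everything(0)
second = [random.random() for _ in range(3)]
print(first == second)
```

Identical seeds still do not guarantee identical results across different GPUs or library versions, so comparing two seeded runs on the same machine is the more informative test.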

@Taylor-X76

Taylor-X76 commented May 25, 2021

After training for 10 epochs, my C3D accuracy is also 1%.
I also changed the loss to loss = criterion(probs, labels);
otherwise the ACC becomes NaN.

@1009qjm

1009qjm commented Oct 27, 2021

With the same hyperparameters, my training accuracy stops rising once it reaches seventy-something percent. The dataset is also UCF101.

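@Robin-WZQ

> > > Agreed. But I checked the loss and printed the gradients, and they look fine: gradients are being backpropagated, yet the loss just won't go down. @libb999
> >
> > @aoluming Have you solved this?
>
> The original code has `loss = criterion(outputs, labels)`, but shouldn't it actually use `probs`? Would the results improve if this part were changed?
> ```python
> if phase == 'train':
>     outputs = model(inputs)
> else:
>     with torch.no_grad():
>         outputs = model(inputs)
>
> probs = nn.Softmax(dim=1)(outputs)
> preds = torch.max(probs.data, 1)[1]
> labels = labels.long()
> loss = criterion(probs, labels)  # proposed change: probs instead of outputs
> ```

No change should be needed here: nn.CrossEntropyLoss already applies softmax internally. See https://blog.csdn.net/LIsaWinLee/article/details/107683641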
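
The point about the built-in softmax can be checked numerically. The sketch below re-implements in plain Python the per-sample formula that `nn.CrossEntropyLoss` computes (log-softmax followed by negative log-likelihood) and compares raw scores against inputs that have already been passed through softmax; `softmax`, `cross_entropy`, and the example scores are illustrative, not taken from the repository.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(scores, target):
    """Log-softmax followed by negative log-likelihood, for one sample."""
    m = max(scores)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum_exp - scores[target]

logits = [8.0, 0.0, 0.0, 0.0]  # confident and correct for target class 0
target = 0

loss_on_logits = cross_entropy(logits, target)          # softmax applied once
loss_on_probs = cross_entropy(softmax(logits), target)  # softmax applied twice

print(f"loss on raw scores:    {loss_on_logits:.4f}")
print(f"loss on probabilities: {loss_on_probs:.4f}")
```

With raw scores the loss approaches zero for a confident correct prediction. With probabilities as input it cannot, because softmax is applied a second time and flattens the distribution.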

@Eunchan24

@aoluming
Hello, I have the same problem as you.
The accuracy is only between 0.02 and 0.03.
Did you solve this problem?
Your help would be greatly appreciated.

@232525

232525 commented Jan 6, 2022

You could try a larger batch size, for example 16 or 20. It should work wonders +-_-+

@Wangdanchunbufuz

Has anyone had training get stuck at this point without moving? Please help.
image
