Is anyone trying to build CNTK with CUDA 11.1? #3835
Comments
My co-worker and I are working on this at https://github.com/haryngod/CNTK/tree/2.7-cuda-11.1
If you manage to successfully build it, I'll definitely be using it! I'm still stuck using GTX 1000 series cards and would love to upgrade. Unfortunately, I have zero experience in compiling CNTK, so I can't help you with this.
Interesting! Hope you succeed in building CNTK with CUDA 11, and maybe a newer Python version too.
How is it going? I have run into the same error.
@kassinvin I added /FORCE:MULTIPLE in CNTKv2LibraryDLL > Properties > Linker > Command Line. It will be OK.
I'm also trying to get this working. The thing I'm stuck on is GPUTensor.cu: it gives a heap error. If you comment out some of the template instantiations at the bottom (I tried the <float... ones) it compiles. I tried to split it into two files (with some instantiations in one file and the rest in another), but got multiply defined symbols. I also tried the repo linked by @haryngod above, but it seems to still be set up to use CUDA 10. I'm not sure if it's supposed to be working yet? Thanks!
In answer to my own question: I added the /FORCE:MULTIPLE flag (suggested by @haryngod) to the MathCUDA and Math projects too, and I seem to have a working cntk.exe! (The 01_OneHidden.cntk example in the Examples\Image\GettingStarted folder seems to run, anyway.) I achieved this by a) commenting out various cuDNN calls in the SparseMatrix and RNN classes that I suspected I wasn't using (I only use CNNs), and b) copying cublasLt64_11.dll over manually from the CUDA install. I also updated cuDNN calls to their _v7 variants where it was a simple replacement. This may be of help to some people. The change to CUDA 11.1 was enacted by modifying various lines in CNTK.Cpp.props. D.
Ok, more advice from my experiments. It turned out I was using an older version of cuDNN (cudnn-10.0-v7.3.1) which isn't really designed to work with CUDA 11.x, and I suspect that while cntk.exe ran, it wasn't learning properly. I've now replaced this with cudnn-11.1-v8.0.5.39 (I needed to change the CUDNN_PATH environment variable to point to this). This then throws some new errors, as the following functions don't exist: cudnnGetConvolutionForwardAlgorithm (and related algorithm-query calls). These are all used in CuDnnConvolutionEngine.cu. I got past this by adding the following near the top of that file (after the includes):
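A minimal sketch of such a shim (a reconstruction, not the exact snippet; the wrapper name GetConvolutionForwardAlgorithmCompat is hypothetical, and it leans on the _v7 heuristics query that cuDNN 8 still provides):

```cpp
// cuDNN 8 removed cudnnGetConvolutionForwardAlgorithm, but kept the _v7
// heuristics query. Take its top-ranked result as the selected algorithm.
static cudnnStatus_t GetConvolutionForwardAlgorithmCompat(
    cudnnHandle_t handle,
    const cudnnTensorDescriptor_t xDesc,
    const cudnnFilterDescriptor_t wDesc,
    const cudnnConvolutionDescriptor_t convDesc,
    const cudnnTensorDescriptor_t yDesc,
    cudnnConvolutionFwdAlgo_t* algo)
{
    int returned = 0;
    cudnnConvolutionFwdAlgoPerf_t perf = {};
    cudnnStatus_t err = cudnnGetConvolutionForwardAlgorithm_v7(
        handle, xDesc, wDesc, convDesc, yDesc,
        /*requestedAlgoCount=*/1, &returned, &perf);
    if (err == CUDNN_STATUS_SUCCESS && returned > 0)
        *algo = perf.algo; // results come back sorted by expected performance
    return err;
}
```

Callers in the file would then use the wrapper in place of the removed function.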
Sorry for spamming everyone, but now with cudnn-11.1-v8.0.5.39 I'm getting an exception thrown on the cudnnConvolutionForward call in CuDnnConvolutionEngine.cu. The output is:
(This is calling cntk.exe configFile=02_OneConv.cntk in Examples\Image\GettingStarted.) I checked that the algorithm being used (m_fwdAlgo.selectedAlgo) is #1, but the workspace.BufferSize() is zero. Any ideas how to fix this gratefully received! D.
No idea if I'm talking to myself, but the exception I reported above is due to the fact that the workspace size calculation in CNTK seems broken (too small) in three places in CuDnnConvolutionEngine.cu. Slightly hacky, but replacing the CNTK workspace object with an inline allocation seems to have my C++ code training!
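A minimal sketch of that inline allocation (not the exact patch; it assumes the member names used in CuDnnConvolutionEngine.cu such as m_cudnn, m_inT, m_kernelT, m_conv, m_outT and the CUDNN_CALL/CUDA_CALL macros):

```cpp
// Ask cuDNN how much workspace the selected algorithm actually needs,
// allocate it inline, and pass it to the forward call instead of relying
// on CNTK's (too small) workspace object.
size_t wsSize = 0;
CUDNN_CALL(cudnnGetConvolutionForwardWorkspaceSize(
    *m_cudnn, m_inT, *m_kernelT, *m_conv, m_outT,
    m_fwdAlgo.selectedAlgo, &wsSize));
void* wsPtr = nullptr;
CUDA_CALL(cudaMalloc(&wsPtr, wsSize));
CUDNN_CALL(cudnnConvolutionForward(
    *m_cudnn, &C::One, m_inT, ptr(in), *m_kernelT, ptr(kernel), *m_conv,
    m_fwdAlgo.selectedAlgo, wsPtr, wsSize, &C::Zero, m_outT, ptr(out)));
CUDA_CALL(cudaFree(wsPtr));
```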
Maybe that helps someone!
Thanks for sharing the info here @dmagee! Sounds like trying to set up CNTK with the latest CUDA is a non-trivial task.
No worries. You're absolutely right: the Nvidia cuDNN library it is based on has changed its API, so lots of things need updating. I've just fixed the bits needed for training CNNs with cntk.exe or the C++ interface. I've only commented out various other bits (to do with RNNs and sparse matrices), and not touched any Python (I don't use the Python API). I'm afraid I don't really have time to package all this up, but hopefully posting what I've done here can help someone who does.
Thanks, @dmagee, for sharing so much of your experience. I'm not sure whether we have met the same problem as you.
Another issue I found was in CuDnnCommon.cpp on the line: auto err = cudnnDestroy(*src); This causes a crash somewhere in the Nvidia cuDNN library. Essentially a single instance of cudnnHandle_t is allocated when doing prediction and assigned as a shared_ptr within an instance of the CNTK CuDnn class. It is destroyed on program exit as the destructors are called. I've no idea why this causes a crash (seemingly the same pointer that was allocated is destroyed, and there are no other relevant calls to cudnnDestroy), or why it only happens when doing prediction and not learning (in C++ anyway), but my solution was to comment out everything in this tidy-up code:
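Roughly, the hack amounts to emptying the deleter (a sketch, assuming the handle lives in a shared_ptr with a custom deleter in CuDnnCommon.cpp; not the exact code):

```cpp
// Custom deleter for the shared cudnnHandle_t: the cudnnDestroy call crashes
// inside cuDNN 8 at process exit, so everything is commented out. This leaks
// one handle, once, right before the process ends.
auto deleter = [](cudnnHandle_t* src)
{
    // auto err = cudnnDestroy(*src);
    // assert(err == CUDNN_STATUS_SUCCESS);
    // delete src;
    (void)src; // intentionally do nothing
};
```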
Again (like my other solution above) this is a horrible hack, as it doesn't actually fix the bug, but it does mean my programs don't crash right at the end. If there were lots of instances of CuDnn created it would obviously be a memory leak, but in my code at least it only seems to do this once and tidy up right at the end. Hopefully this helps someone!
I also get this error on RTX 3060 if I have more than 512 neurons on one layer. With RTX 2060 it works without any error with the same files and nvidia drivers.

```
Loading data...
About to throw exception 'CUBLAS failure 13: CUBLAS_STATUS_EXECUTION_FAILED ; GPU=0 ; hostname=PC1; expr=cublasgemmHelper(cuHandle, transA, transB, m, n, k, &alpha, a.Data(), (int) a.m_numRows, b.Data(), (int) b.m_numRows, &beta, c.Data(), (int) c.m_numRows)'
Unhandled Exception: System.ApplicationException: CUBLAS failure 13: CUBLAS_STATUS_EXECUTION_FAILED ; GPU=0 ; hostname=PC1; expr=cublasgemmHelper(cuHandle, transA, transB, m, n, k, &alpha, a.Data(), (int) a.m_numRows, b.Data(), (int) b.m_numRows, &beta, c.Data(), (int) c.m_numRows)
[CALL STACK]
```
@dmagee I've faced the same issue. I think this issue has occurred in PyTorch (issue link) as well, and there is a PyTorch PR for it. Even having read these, I have no idea how to fix it yet.
I'm trying to get CNTK working on the latest CUDA 11 too, on Windows. I was wondering why I can't find any Azure Pipelines yml files, so that I could use a custom pipeline agent for testing instead of local dev. Does anyone know a link to the Azure DevOps pipelines? Also very interested in whatever changes are needed for CUDA 11 to work.
Hello everyone, based on the work by @haryngod and others I have managed to build CNTK with CUDA 11.4 and cuDNN 8.2.2 and made NuGet packages for this. This is detailed in a quick blog post at: https://nietras.com/2021/08/05/reviving-cntk-with-cuda-11-4/ As mentioned there, I had hoped to release the NuGet packages on nuget.org but could not due to the size limit. Instead the packages can be downloaded and you can add them to your own feeds.
@nietras Amazing work! Thank you for your contributions!!
@nietras amazing work. Thank you very much.
@JeppeThagaardVP thanks. I have not hit this issue myself. Do you have some simple reproduction code showing this, e.g. in C#, so I don't have to guess about dimension order etc.?
@nietras, not immediately, but I will work on getting it. In the meantime, I can try to explain what the failing pipeline is trying to achieve. Input: dim = 2x2x3. The convolution operation throws a cuDNN failure 9: CUDNN_STATUS_NOT_SUPPORTED, which it did not before. Any help would be super appreciated :)
@JeppeThagaardVP basically the only change to conv is to use cudnnGetConvolutionForwardAlgorithm_v7, as can be seen in https://github.com/nietras/CNTK/pull/6/files. However, one thing I don't understand in your example code is the stride. Based on our higher-level API I made some simple example code, and with that stride it works.
@nietras that's weird, it does not work on my end. Can I get you to post the exact convolution map and parameters for the CNTK::Convolution operation? The params for stride are similar to this example (https://github.com/microsoft/CNTK/blob/b7d4945a8e604268b344e6286e8993bacdba6e5c/Tests/EndToEndTests/CNTKv2Library/Common/Image.h): auto convFunction = Convolution(convParams, input, { hStride, vStride, numInputChannels });
@JeppeThagaardVP I am not using the C++ API, hence they are not directly comparable. However, I have saved a simple model that does what I expect/assume you want. Perhaps you can try loading this model and evaluating it. It works on my PC 😅 Note I am only doing evaluation, which might be the issue...
Hello, I'm trying CNTK with CUDA 11.4 on my project: a CNTK FasterRCNN model with C++ inference. Thank you very much!!
I managed to do more tests and I found that the code related to the change from cudnnGetConvolutionForwardAlgorithm to cudnnGetConvolutionForwardAlgorithm_v7 (and the same for the backward part) is wrong. The change just selected the first algorithm, without considering that the old call had a parameter to specify the current maximum workspace size available. Taking the first algorithm from the function, it can happen that the allocated workspace is not enough, leading to a CUDNN_STATUS_NOT_SUPPORTED error. The correct way is to iterate over all the algorithms that are given back and choose one with size <= workspace size, as in the sketch below.
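A minimal sketch of the corrected selection (a reconstruction, not the exact patch; it assumes CNTK member names like m_cudnn, m_inT, m_kernelT, m_conv, m_outT, a workspaceSize variable, and the CUDNN_CALL and RuntimeError helpers):

```cpp
// Query all candidate algorithms (sorted by expected performance), then pick
// the first one whose workspace requirement fits the available size.
const int maxAlgoCount = 100; // v7 can return more entries than the enum size
int returned = 0;
std::vector<cudnnConvolutionFwdAlgoPerf_t> perf(maxAlgoCount);
CUDNN_CALL(cudnnGetConvolutionForwardAlgorithm_v7(
    *m_cudnn, m_inT, *m_kernelT, *m_conv, m_outT,
    maxAlgoCount, &returned, perf.data()));
bool found = false;
cudnnConvolutionFwdAlgo_t algo = CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM;
for (int i = 0; i < returned && !found; i++)
{
    if (perf[i].status == CUDNN_STATUS_SUCCESS && perf[i].memory <= workspaceSize)
    {
        algo = perf[i].algo;
        found = true;
    }
}
if (!found)
    RuntimeError("No convolution algorithm fits the available workspace.");
```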
For the other parts (backward) I can do a pull request...
@dmagee the problem that you saw is not a real one: CNTK increases the workspace incrementally as it starts processing the images, so at the start of the program the workspace size is 0 and then it gets incremented. This is also the reason why the first images of the batch are usually slower than the others.
First of all: @nietras & @sigfrid696 thanks a lot for the great effort. I am working with @JeppeThagaardVP on this and we are very close to having it working here, and I wonder if @sigfrid696 has a pull request which I can use to cover cudnnGetConvolutionBackwardDataAlgorithm_v7 and cudnnGetConvolutionBackwardFilterAlgorithm_v7 also.
I gave @sigfrid696's proposal a shot myself and the below changes made CNTK work in our application using CUDA 11.4. cudnnGetConvolutionForwardAlgorithm_v7:
cudnnGetConvolutionBackwardDataAlgorithm_v7:
cudnnGetConvolutionBackwardFilterAlgorithm_v7:
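The backward-data query follows the same pattern as the forward sketch earlier; a sketch under the same assumptions (the backward-filter case is analogous, with the BwdFilter types):

```cpp
// Pick the first backward-data algorithm that fits the available workspace.
int returned = 0;
std::vector<cudnnConvolutionBwdDataAlgoPerf_t> perf(maxAlgoCount);
CUDNN_CALL(cudnnGetConvolutionBackwardDataAlgorithm_v7(
    *m_cudnn, *m_kernelT, m_outT, *m_conv, m_inT,
    maxAlgoCount, &returned, perf.data()));
bool found = false;
cudnnConvolutionBwdDataAlgo_t algo = CUDNN_CONVOLUTION_BWD_DATA_ALGO_0;
for (int i = 0; i < returned && !found; i++)
{
    if (perf[i].status == CUDNN_STATUS_SUCCESS && perf[i].memory <= workspaceSize)
    {
        algo = perf[i].algo;
        found = true;
    }
}
```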
@sigfrid696 I wonder if you would mind reviewing it and making a pull request. After all, it was you who solved it :-) I also wonder about the if (!noMem) checks: can they ever happen, and if so, I guess the functions will always return CUDNN_STATUS_EXECUTION_FAILED?
Thank you @JohanDore! I'll review your code by tomorrow and make the pull request... It seems very similar to mine. The problem that I have is that launching my app not from Visual Studio (release mode, with Ctrl+F5) results in an unmanaged crash. I suspect the problem is not in the changes we're discussing here but in other parts of the porting (uninitialized memory). Regarding the noMem condition: it is triggered at the start of the application, but it is not a problem because the _v7 functions also return conv algorithms that don't need user workspace memory (size 0).
@sigfrid696 I had to apply quite a few project edits to get it to compile. The changes probably don't deserve a pull request, but maybe they can help you: CNTK_2_8_1_6a20c25a0a8dd7ec21cbb8926f7085e11afd41b3.zip. BTW the zip also includes the changes above.
Thank you @JohanDore
@JohanDore @JeppeThagaardVP @sigfrid696 thanks to your work I have released a new version 2.8.2 with your fixes. I hope this solves your issues. I still don't understand why an algo count of 100 is needed, given the algo enumerations have fewer than 10 entries 😅 If someone cares to explain I am all ears. https://github.com/nietras/CNTK/releases/tag/v2.8.2
@nietras hi nietras and everyone! The new _v7 functions are capable of giving back more than 10 algos... The old functions accepted a parameter to filter by size, but the new ones don't... So if you don't increase the returned number, the algorithms with no extra memory needed (size 0) are not given back: for example, in my tests I got the algo back in twelfth position. Hope to have clarified :)
@sigfrid696 thanks, that helps! And thanks again for doing it, and apologies for generating extra work with my changes 😅
With the aim of keeping the CNTK project alive, over the last months I have made some more changes to the original repo.
@nietras let me know if you are interested in merging these modifications too... I think I'll update my repo with these mods in the next few days...
@sigfrid696 we don't use these specific features, but yes, I would be interested in a PR for that. If you could continue on a branch from my fork's master in your fork, that would make it easier, I think, and avoid merge hell. It would be great if someone else listening was using this and would be able to test/verify that it works for others too. :)
I admit I'm not so expert with GitHub. I would continue with the fork I made from your master, I believe... the same as for the previous PR.
@sigfrid696 it can perhaps be a bit daunting for someone new to the pull-request flow: https://guides.github.com/introduction/flow/. If you are continuing from the latest master, that should work.
To be clear, what I mean is you need to update to the latest master of my fork first.
I updated to the latest master.
I know MS announced that they won't support CNTK anymore.
However, I would like to know who else is trying to build CNTK with CUDA 11.1 like me.
If someone is trying this and has some tips, I hope we can discuss them here.
Now I changed:
So far I have built from Common through ReaderLib.
When I build CNTKv2LibraryDll, I get the errors below.
Note that I had already built from Common through CNTK with CUDA 10.1.