From Zero to Lasagne on Ubuntu 14.04
This guide provides step-by-step instructions to get Lasagne up and running on Ubuntu 14.04 (and possibly others), including BLAS and CUDA support.
If you run into trouble or have any suggestions for improvements, please let us know on the mailing list: https://groups.google.com/forum/#!forum/lasagne-users
Also let us know if you successfully used or adapted the steps for other versions of Ubuntu.
This installs the bare minimum needed: a compiler, pip, numpy and scipy, Theano, and Lasagne.
Installing the prerequisites is fairly easy. Open a terminal and run:
sudo apt-get install -y gcc g++ gfortran build-essential git wget libopenblas-dev python-dev python-pip python-nose python-numpy python-scipy
Still in a terminal, run:
pip install --user --upgrade --no-deps https://github.com/Theano/Theano/archive/master.zip
pip install --user --upgrade --no-deps https://github.com/Lasagne/Lasagne/archive/master.zip
This will install the bleeding-edge versions of Theano and Lasagne for your user. Whenever you want to update to the latest version, just run the two commands again.
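To verify that the packages are importable afterwards (and to see which ones are missing), a short Python sketch can help. The helper name below is made up for illustration; it only uses the standard library:

```python
# check_install.py -- report which of the given Python packages are importable
import importlib.util

def check_install(names):
    """Return a dict mapping each package name to True if it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

if __name__ == "__main__":
    for name, found in check_install(["numpy", "scipy", "theano", "lasagne"]).items():
        print("%-10s %s" % (name, "found" if found else "MISSING"))
```

Using `find_spec` rather than a plain `import` avoids triggering Theano's (potentially slow) initialization just to check for its presence.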
To test your installation, download Lasagne's MNIST example. We assume you want to put it into your home directory, in the subdirectory code/mnist. Still in a terminal, run:
mkdir -p ~/code/mnist
cd ~/code/mnist
wget https://github.com/Lasagne/Lasagne/raw/master/examples/mnist.py
python mnist.py mlp 5
If everything was installed correctly, this will download the MNIST dataset (11 MiB) and train a simple network on it for 5 epochs. Including compilation, this should take between 1 and 5 minutes.
BLAS (Basic Linear Algebra Subprograms) is a specification for a set of linear algebra building blocks that many other libraries depend on, including numpy and Theano. Several vendors and open-source projects provide optimized implementations of these routines. The installation instructions above already install a precompiled version of OpenBLAS that should be usable by Theano.
To test whether Theano can use OpenBLAS, download Theano's BLAS check and run it. We assume you want to put it in a temporary directory. Still in a terminal, run:
cd /tmp
wget https://github.com/Theano/Theano/raw/master/theano/misc/check_blas.py
python check_blas.py
If everything works correctly, near the end of the output, it should say:
Total execution time: 31.37s on CPU (with direct Theano binding to blas).
The execution time may be very different; the important point is the phrase "direct Theano binding to blas".
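If you want to check this programmatically (e.g., across several machines), a small sketch like the following can scan the output for that phrase; the function name is invented here for illustration:

```python
# Scan check_blas.py output for the line confirming a direct BLAS binding.
def blas_bound_directly(output):
    """Return True if the timing line reports a direct Theano binding to BLAS."""
    return any("(with direct Theano binding to blas)" in line
               for line in output.splitlines()
               if line.startswith("Total execution time:"))

# Example with the output shown above:
sample = "Total execution time: 31.37s on CPU (with direct Theano binding to blas)."
print(blas_bound_directly(sample))  # -> True
```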
For improved performance, you may want to compile OpenBLAS for your specific CPU architecture. The easiest way to do so is to use the source package of Ubuntu's OpenBLAS. The following commands should do the trick:
# create a directory to work in
cd /tmp
mkdir OpenBLAS
cd OpenBLAS
# obtain the source code and install tools needed to build it
apt-get source openblas
sudo apt-get build-dep openblas
sudo apt-get install build-essential dpkg-dev cdbs devscripts patch
# change some configuration options
cd openblas-*
nano -w Makefile.rule # uncomment "NO_WARMUP = 1" and "NO_AFFINITY = 1"
# compile (will take about one minute)
fakeroot debian/rules custom
# if compilation went through, you can install the new OpenBLAS
sudo dpkg -i ../libopenblas-base*.deb
# and set it up as usual
sudo update-alternatives --config libblas.so.3gf
sudo update-alternatives --config liblapack.so.3gf
Run the test again, as described in the previous section. Most likely, performance will have improved.
For even better performance, you may want to try compiling OpenBLAS from the original source as described on http:https://www.openblas.net/. (Please feel free to extend this guide accordingly.)
If compilation fails, it's possible that your CPU architecture is newer than what Ubuntu's OpenBLAS supports. The easiest solution in this case is to specify an alternative architecture manually. To do so, you would need to edit another file:
nano -w debian/rules
Where it says LANG=C debian/rules TARGET=custom build binary, replace custom with one of the architectures listed in the TargetList.txt file (the latest one that your CPU supports). Then run the compilation again (fakeroot debian/rules custom) and continue from there.
To be able to train networks on an Nvidia GPU using CUDA, we will need to install the proprietary Nvidia driver and CUDA and adapt some configuration files.
First we need to install another prerequisite:
sudo apt-get install linux-headers-generic
Without this, the driver module cannot be compiled.
At https://developer.nvidia.com/cuda-downloads, choose the download for Linux > x86_64 > Ubuntu > 14.04 > deb (network).
Save it somewhere locally (we assume /tmp/cuda.deb) and install it from a terminal:
sudo dpkg -i /tmp/cuda.deb
It is important to understand that this has not installed anything yet; it just added Nvidia's package repository to Ubuntu's repository list. Run the following command to update Ubuntu's package database with the newly available packages:
sudo apt-get update
Again, this didn't install anything.
There are two options for how to proceed from here.
If you don't care, you can just install CUDA along with the examples and the latest driver using:
sudo apt-get install cuda
To have better control over what's installed when, you can use the repository to install only the latest driver. Run the following in a terminal to see all available Nvidia driver versions (this uses aptitude; install it with sudo apt-get install aptitude if needed):
aptitude search nvidia-3 -F'%p' | grep -vF ':i386'
For example, this may produce:
...
nvidia-346
nvidia-346-dev
nvidia-346-updates
nvidia-346-updates-dev
nvidia-346-updates-uvm
nvidia-346-uvm
nvidia-352
nvidia-352-dev
nvidia-352-updates
nvidia-352-updates-dev
nvidia-352-uvm
Usually, you will want to install the latest driver. You need both the normal and the "uvm" version. Here, it would be:
sudo apt-get install nvidia-352 nvidia-352-uvm
Now install the toolkit via the runfile from https://developer.nvidia.com/cuda-downloads, at Linux > x86_64 > Ubuntu > 14.04 > runfile (local).
When it asks where to install the toolkit, accept the default location (/usr/local/cuda-x.y) and let it create the symlink (/usr/local/cuda).
You do not need to install the samples, and you must not install the driver.
Note: When installing the CUDA toolkit via the runfile, never install the driver from the runfile. Always install it via apt-get as explained above, so the package manager knows about it; otherwise, in the worst case, you may end up with an unbootable system!
Independently of whether you chose Option A or Option B above, there are a few configuration files we need to create or adapt now.
To make the CUDA compiler available to all users, adapt /etc/environment (editing it requires root):
sudo nano -w /etc/environment
Add :/usr/local/cuda/bin to the end of the PATH list. If instead you want to make it available for the current user only, add export PATH=/usr/local/cuda/bin:"$PATH" to the end of your ~/.profile file.
To make the libraries available to all users, run the following:
sudo sh -c "echo /usr/local/cuda/lib64 > /etc/ld.so.conf.d/cuda.conf"
sudo ldconfig
If instead you want to make them available for the current user only, add export LD_LIBRARY_PATH=/usr/local/cuda/lib64:"$LD_LIBRARY_PATH" to the end of your ~/.profile file.
Finally, if you haven't done so since the driver installation, you will need to reboot your machine and cross your fingers. A reboot is required after every driver update; otherwise CUDA will stop working. Take care when you're maintaining a GPU server.
For a first sanity check, run nvidia-smi from a terminal. It should display information about all supported GPU devices.
To also try CUDA itself, download and compile a simple test program:
cd /tmp
wget https://gist.github.com/f0k/0d6431e3faa60bffc788f8b4daa029b1/raw/2e37a83a97f5df27e53326ec16879fcbd94850bf/cuda_check.c
nvcc -o cuda_check cuda_check.c -lcuda
If everything worked, you've got a program you can run now:
./cuda_check
It will produce output similar to the following:
Found 2 device(s).
Device: 0
Name: GeForce GTX 970
Compute Capability: 5.2
Multiprocessors: 13
CUDA Cores: 2496
Concurrent threads: 26624
GPU clock: 1329 MHz
Memory clock: 3505 MHz
Total Memory: 4093 MiB
Free Memory: 4014 MiB
Device: 1
Name: Tesla K40c
Compute Capability: 3.5
Multiprocessors: 15
CUDA Cores: 2880
Concurrent threads: 30720
GPU clock: 875.5 MHz
Memory clock: 3004 MHz
Total Memory: 11519 MiB
Free Memory: 11421 MiB
If it doesn't, the driver may not have been loaded properly. This can often be fixed by running any CUDA program as root, such as the one we just compiled:
sudo ./cuda_check
Afterwards it should also work as a normal user. To do this automatically on every boot, do the following:
sudo cp -a cuda_check /root
sudo sh -c "echo '@reboot root /root/cuda_check' > /etc/cron.d/cuda"
More recent GPUs support a boosting mode with increased core frequency. It can be enabled as follows:
sudo nvidia-smi -i 0 -pm 1 # set persistence mode
sudo nvidia-smi -i 0 -ac <mem>,<core> # set gpu boost
Where 0 is the device number, and <mem>,<core> is a pair of frequencies to set (e.g., 3004,875 for a Tesla K40c). The highest supported frequencies for a device can be listed with:
nvidia-smi -i 0 --query-supported-clocks=mem --format=csv,nounits | head -n2
nvidia-smi -i 0 --query-supported-clocks=gr --format=csv,nounits | head -n2
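The CSV output can also be parsed programmatically, e.g., to pick the highest supported frequency across all listed values. A sketch, assuming the output is a single-column CSV with a header row and values in MHz (the function name is made up here):

```python
# Parse `nvidia-smi --query-supported-clocks=... --format=csv,nounits` output
# and return the highest listed frequency in MHz.
def max_clock(csv_text):
    """Skip the header row and return the maximum of the remaining integer values."""
    values = [int(line.strip()) for line in csv_text.splitlines()[1:] if line.strip()]
    return max(values)

# Example with made-up output (header row followed by frequencies):
sample = "memory [MHz]\n3004\n324"
print(max_clock(sample))  # -> 3004
```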
To enable boost automatically on every boot, place a shell script executing the correct commands in /root/gpu_boost.sh and add it to /etc/cron.d/cuda as shown in the previous section.
To make Theano use CUDA automatically for all your scripts, all that's left to do is to create a configuration file in your home directory:
echo "[global]
device=gpu
floatX=float32" > ~/.theanorc
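Since .theanorc is an INI-style file, it can equivalently be generated from Python with the standard configparser module. A sketch that writes to a temporary path so it doesn't touch your real ~/.theanorc (configparser writes "device = gpu" with spaces around the equals sign, which Theano's own INI-based config reader accepts as well):

```python
# Write a minimal .theanorc-style file using the standard library.
import configparser
import os
import tempfile

def write_theanorc(path, device="gpu", floatx="float32"):
    """Write a minimal Theano configuration file to `path`."""
    config = configparser.ConfigParser()
    config.optionxform = str  # preserve the capital X in "floatX"
    config["global"] = {"device": device, "floatX": floatx}
    with open(path, "w") as f:
        config.write(f)

if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "theanorc")
    write_theanorc(path)
    print(open(path).read())
```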
Again, we will use Theano's BLAS check to test the installation:
cd /tmp
wget -N https://github.com/Theano/Theano/raw/master/theano/misc/check_blas.py
python check_blas.py
Near the end, it should say:
Total execution time: 0.61s on GPU.
Again, the execution time varies wildly depending on the GPU (it could be over 10 seconds); the critical part is "on GPU".
Finally, for improved performance especially for ConvNets, you should install Nvidia's cuDNN library. After registering at https://developer.nvidia.com/cudnn, you can download it from https://developer.nvidia.com/rdp/cudnn-download.
You will obtain a .tar.gz file. Extract it directly into your CUDA installation:
cd /usr/local
sudo tar -xzf <your-downloaded-file>.tar.gz
And update the shared library cache:
sudo ldconfig
Theano and Lasagne should now be able to use cuDNN. To check, run:
python -c "from theano.sandbox.cuda.dnn import dnn_available as da; print(da() or da.msg)"
If everything is configured correctly, it will say something like:
Using gpu device 0: GeForce GTX 970 (CNMeM is disabled, CuDNN 4007)
True
Otherwise you will receive an error message you can search for online.
For somewhat improved performance, you can adapt your .theanorc file to include some additional flags:
nano -w ~/.theanorc
Append the following lines:
[dnn.conv]
algo_fwd = time_once
algo_bwd_data = time_once
algo_bwd_filter = time_once
[lib]
cnmem = .45
For more configuration options, consult the Theano documentation.