# NVIDIA drivers

There are three levels of NVIDIA drivers:

* _NVIDIA_ GPU drivers.
* _CUDA_ libraries and tools.
* _cuDNN_ libraries.

## Summary

* Drivers:
  * If not installing CUDA you can use the `nvidia` proprietary drivers from the [Graphics Drivers](https://launchpad.net/~graphics-drivers/+archive/ubuntu/ppa) PPA. Alternatively, use the drivers from the CUDA repository anyway, as they tend to be more stable.
  * If installing CUDA, use the matching versions of the `nvidia` proprietary drivers from the NVIDIA online repository, even though they are updated less often than the PPA ones.
* Install the latest 2-4 versions of the `libcuda` libraries from the NVIDIA online repository, after installing the `.deb` that points to it.
* Install the latest `libcuDNN` libraries from the cuDNN repository; they must match the version(s) of the CUDA libraries.

The repositories for drivers+CUDA and for cuDNN are:

- `http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64`
- `http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64`

## NVIDIA GPU drivers (by themselves)

### Description

Linux graphics drivers come in several related parts:

* Kernel driver, which has a KMS layer and a device-specific layer.
* X11 driver.
* OpenGL libraries, which usually are the MesaGL libraries.

There are two driver collections for NVIDIA cards in Linux:

* The `nouveau` one uses standard KMS, a matching X11 driver, and the standard MesaGL libraries under `/usr/lib/x86_64-linux-gnu/mesa/`.
* The `nvidia` one does not quite use KMS, has its own matching X11 driver, and has its own OpenGL libraries under `/usr/lib/nvidia-`_390_`/`.

We are interested only in the `nvidia` drivers because only those support CUDA.

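Since only the `nvidia` collection supports CUDA, it can help to confirm which of the two kernel drivers is actually loaded. The following is a minimal sketch, not part of the original procedure: the function name `loaded_nvidia_driver` is hypothetical, and it assumes the kernel modules are named `nvidia` and `nouveau` as they appear in `lsmod` output.

```shell
# Hypothetical helper: reads `lsmod` output on stdin and prints which of the
# two NVIDIA driver collections has its kernel module loaded:
# "nvidia", "nouveau", "both" or "none".
loaded_nvidia_driver() {
    awk '
        $1 == "nvidia"  { nv = 1 }
        $1 == "nouveau" { nouveau = 1 }
        END {
            if (nv && nouveau) print "both"
            else if (nv)       print "nvidia"
            else if (nouveau)  print "nouveau"
            else               print "none"
        }'
}
```

It would typically be used as `lsmod | loaded_nvidia_driver`.
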
### Installation

* The drivers in the Ubuntu LTS archives are too old, and ad-hoc installation causes a lot of issues, so we use the _Graphics Drivers_ PPA (the stable one, not the beta one), or the driver packages in the online NVIDIA CUDA repository if CUDA is installed.
* The kernel driver and the user-level library versions must match **exactly**, that is in both major and minor version. This implies a reboot on every upgrade. See below how to "pin" packages to prevent upgrades.

## NVIDIA GPU drivers together with other drivers

### Description

Many systems have both a minimal GPU, usually built into the CPU or the motherboard, and one or more NVIDIA GPUs for intensive calculation. That situation often requires a custom two-head `xorg.conf`, or installing two different GPU drivers for the kernel and the X server.

Some easy ways to deal with that are:

* Use the BIOS to disable the built-in GPU.
* Remove the non-NVIDIA GPU if it is a card.
* Blacklist the non-NVIDIA driver module.
* Configure `xorg.conf` to ignore the non-NVIDIA card.

But if the X server uses the NVIDIA GPUs it takes up part of their memory. This can reduce the batch size for training machine learning systems, which can significantly (according to local informants) increase learning times.

So the optimal configuration usually is to leave both GPUs enabled and use only the non-NVIDIA one for the X server, thus leaving the NVIDIA card entirely available to CUDA.

The problem with that is that the NVIDIA driver is "invasive": its packaging tries to override other GPU drivers, and in particular the NVIDIA driver requires a custom version of the OpenGL libraries.

### Installation

Resolving the conflicts mentioned above is possible:

1. Ensure the other driver needed is installed (usually already installed as part of the `xserver-xorg-` packages).
1. Ensure that the NVIDIA proprietary driver is installed as per the previous section.
1. Ensure that these packages are not installed:
   * `bumblebee`
   * `bbswitch-dkms`
   * `nvidia-prime`
1. In `/etc/default/grub` add to `GRUB_CMDLINE_LINUX` the option `nogpumanager` and run `update-grub2`.
1. Ensure that the different parts of the NVIDIA libraries have different priorities than the Xorg/Mesa libraries. The assumption here is that no OpenGL applications will run on the NVIDIA cards:
   1. Ensure the NVIDIA driver is configured by running:

          update-alternatives --set i386-linux-gnu_gl_conf \
            /usr/lib/nvidia-NNN/alt_ld.so.conf
          update-alternatives --set x86_64-linux-gnu_gl_conf \
            /usr/lib/nvidia-NNN/ld.so.conf

   2. To prioritize MesaGL for X windows applications, add these as the first two lines of `/etc/ld.so.conf`:

          /usr/lib/x86_64-linux-gnu/mesa
          /usr/lib/x86_64-linux-gnu/mesa-egl

      and then run `ldconfig`.
1. Ensure that the X server uses the Xorg/Mesa version of the `glx` module by having in `/etc/X11/xorg.conf` in the `Files` section these lines (comments may be omitted):

       ModulePath "/usr/lib/xorg/modules"

1. Ensure that in `/etc/X11/xorg.conf`:
   * In the `ServerFlags` section the option `AutoAddGPU` is `false`.
   * The active `ServerLayout` section refers to a `Screen` section that refers to a `Device` section for the non-NVIDIA driver only, not for the `nvidia` driver.

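The step that puts the Mesa directories first in `/etc/ld.so.conf` is easy to undo by accident when the file is edited later. The check below is a minimal sketch, assuming the two Mesa paths given in that step; the function name `mesa_has_priority` is hypothetical and not part of any package.

```shell
# Hypothetical helper: succeeds only if the first two lines of the given file
# (default /etc/ld.so.conf) are the two Mesa library directories, in order.
mesa_has_priority() {
    conf=${1:-/etc/ld.so.conf}
    expected='/usr/lib/x86_64-linux-gnu/mesa
/usr/lib/x86_64-linux-gnu/mesa-egl'
    [ "$(head -n 2 "$conf")" = "$expected" ]
}
```

For example, `mesa_has_priority || echo "Mesa is not first in /etc/ld.so.conf"` could be run after every driver upgrade.
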
A suitable `xorg.conf` may look like:

    # vim:set ft=xf86conf sw=2:

    Section "ServerLayout"
      Identifier "intel-only"
      Screen 0 "intel"
    EndSection

    Section "Monitor"
      Identifier "generic"
      VendorName "noname"
    EndSection

    Section "Screen"
      Identifier "intel"
      Device "intel-GPU"
      Monitor "generic"
    EndSection

    Section "Device"
      Identifier "intel-GPU"
      Driver "intel"
      # Depends on the workstation
      Option "monitor-HDMI1" "generic"
    EndSection

    Section "ServerFlags"
      Option "AutoAddGpu" "false"
      Option "DontVTSwitch" "false"
      Option "DontZap" "false"
    EndSection

    Section "Files"
      ModulePath "/usr/lib/xorg/modules"
    EndSection

## CUDA libraries

### Description

Versioning:

* The CUDA libraries come in several versions, and usually the latest 2-3 should be kept installed at the same time, as applications tend to be compiled against a version other than the latest.
* Usually older versions of CUDA are compatible with newer drivers, but often new CUDA versions need newer driver versions:
  * [NVIDIA/CUDA compatibility table](https://docs.nvidia.com/deploy/cuda-compatibility/index.html#binary-compatibility)
  * [CUDA/cuDNN compatibility table](https://docs.nvidia.com/deeplearning/sdk/cudnn-support-matrix/index.html#cudnn-cuda-hardware-versions).

It is particularly important to note that since CUDA 10.0 the `libcuda` runtime shared object is compatible with older driver versions, so the driver does not have to be updated, nor the system rebooted, to install new CUDA versions.

Installation options:

* There are several ways to install the CUDA libraries:
  * With an installer.
  * From the Ubuntu LTS repository.
  * From a local repository downloaded from the NVIDIA website.
  * From the NVIDIA repository.
* Only installing from the NVIDIA repository both gets recent versions and is easily manageable.

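Keeping only the latest few CUDA versions means picking the newest `cuda-`_M_`-`_N_ meta-packages out of everything the repository offers. The sketch below shows one way to do that selection; the function name `latest_cuda_packages` is hypothetical, and it assumes the meta-packages follow the `cuda-M-N` naming used in this document.

```shell
# Hypothetical helper: reads package names on stdin, keeps only the versioned
# cuda-M-N meta-packages, and prints the newest COUNT of them (oldest first).
latest_cuda_packages() {
    count=${1:-3}
    grep -E '^cuda-[0-9]+-[0-9]+$' |
        sort -t- -k2,2n -k3,3n |   # numeric sort on major, then minor
        tail -n "$count"
}
```

Fed from the package cache, this could be used as `apt-cache pkgnames cuda- | latest_cuda_packages 3`. The numeric sort keys matter: a plain lexical sort would place `cuda-10-0` before `cuda-9-0`.
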
### Installation

Once only:

* From the [downloads page](https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_type=debnetwork&target_arch=x86_64&target_distro=Ubuntu&target_version=1604) get the `.deb` that contains the sources list for the repository and install it with `dpkg`.
* Ensure that the `cuda` package is not installed: it will pull in the latest `cuda-`_M_`-`_N_, which is usually not wanted and may trigger a wholesale update ahead of need.

Usually:

* Run the usual `apt-get update`.
* Install the latest 2 to 4 versions. For example in 2018-06 that would be 9.2, 9.1, 9.0 and 8.0, and in 2022 the 11.2 version. This should be done by installing the relevant `cuda-`_M_`-`_N_ packages rather than the `cuda` package.
* This also installs the `cuda-drivers` package, which pulls in the matching NVIDIA proprietary drivers from the same repository.

## cuDNN libraries

### Description

The [_Deep Neural Network_](https://developer.nvidia.com/cudnn) libraries are only available to registered developers; they cannot be freely downloaded.

**Warning**: some people think that the cuDNN libraries should be installed per-person, because different applications used by different persons are linked to different cuDNN versions, which may not be compatible. However, mostly they are compatible, and a default set of system-wide shared versions (as long as it is recent) is useful.

In particular currently (2018-12) there are these versions:

* 7.4.[12], 7.3.[01]: separate packages for CUDA 9.0, 9.2, 10.0.
* 7.2.1, 7.1.4: for CUDA 8.0, 9.0, 9.2.
* 7.1.[123], 7.0.5: for CUDA 8.0, 9.0, 9.1.

### Installation

There are two ways:

1. Download the latest `.deb`s from the NVIDIA developer website, one by one, and install them with `dpkg`.
2. Use the cuDNN Debian/Ubuntu repository provided by NVIDIA:

       deb http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64 /

## Preventing updates

Preventing updates to the kernel driver components may be useful to avoid the need to reboot, as kernel driver and driver library versions must match exactly (both major and minor version numbers).

There are two ways to prevent updates:

* [Pinning](https://www.debian.org/doc/manuals/apt-howto/ch-apt-get.en.html#s-pin) with something like this in `/etc/apt/preferences`:

      Package: nvidia-* libxnvctrl0
      Pin: version NNN.MM-*
      Pin-Priority: 1050

      Package: cuda-drivers libcuda1-*
      Pin: version NNN.MM-*
      Pin-Priority: 1050

  (two blocks are better because `nvidia-`_NNN_ and `cuda-drivers` sometimes have different yet compatible versions). This pinning may be suitable for a server where driver updates should probably only be made intentionally.

  _Note_: there are two approaches to this kind of pinning and this is the more rigid one, where specific major and minor numbers are given a higher priority than any candidate from a repository.

* Preventing [`unattended-upgrade`](https://help.ubuntu.com/community/AutomaticSecurityUpdates) from upgrading with something like:

      // List of packages to not update (regexp are supported)
      Unattended-Upgrade::Package-Blacklist {
          "nvidia-.*";
          "libcuda[0-9]-.*";
          "libcuda-[0-9].*";
      //  "vim";
      //  "libc6";
      //  "libc6-dev";
      //  "libc6-i686";
      };

  This is probably not necessary because by default `unattended-upgrade` only installs updates from the _`$DISTRO`_`-security` archive.

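The pinning stanzas have to be rewritten every time the driver is upgraded on purpose, so generating them from the version number avoids typos. This is a minimal sketch that only prints the two stanzas shown above; the function name `nvidia_pin_stanzas` is hypothetical, and where the output is written is left to the administrator.

```shell
# Hypothetical helper: prints the two apt pinning stanzas for the given driver
# version (for example 396.26). An optional second argument gives a different
# version for the cuda-drivers block; it defaults to the first argument.
nvidia_pin_stanzas() {
    driver_version=$1
    cuda_drivers_version=${2:-$1}
    cat <<EOF
Package: nvidia-* libxnvctrl0
Pin: version ${driver_version}-*
Pin-Priority: 1050

Package: cuda-drivers libcuda1-*
Pin: version ${cuda_drivers_version}-*
Pin-Priority: 1050
EOF
}
```

For example, `nvidia_pin_stanzas 396.26` prints both blocks with `Pin: version 396.26-*`, ready to be reviewed and copied into `/etc/apt/preferences`.
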
## Example list of installed packages (2018-06-13)

* These are the "meta" packages that pull in many others:

      ii  cuda      9.2.88-1   amd64  CUDA meta-package
      ii  cuda-8-0  8.0.61-1   amd64  CUDA 8.0 meta-package
      ii  cuda-9-0  9.0.176-1  amd64  CUDA 9.0 meta-package
      ii  cuda-9-1  9.1.85-1   amd64  CUDA 9.1 meta-package
      ii  cuda-9-2  9.2.88-1   amd64  CUDA 9.2 meta-package

* The NVIDIA proprietary driver, where `nvidia-`_NNN_ is the "root" package that pulls in the others:

      ii  libcuda1-396           396.26-0ubuntu1  amd64  NVIDIA CUDA runtime library
      ii  nvidia-396             396.26-0ubuntu1  amd64  NVIDIA binary driver - version 396.26
      ii  nvidia-396-dev         396.26-0ubuntu1  amd64  NVIDIA binary Xorg driver development files
      ii  nvidia-opencl-icd-396  396.26-0ubuntu1  amd64  NVIDIA OpenCL ICD

* The NVIDIA auxiliary packages with some tools:

      ii  nvidia-modprobe  396.26-0ubuntu1  amd64  Load the NVIDIA kernel driver and create device files
      ii  nvidia-prime     0.8.2            amd64  Tools to enable NVIDIA's Prime
      ii  nvidia-settings  396.26-0ubuntu1  amd64  Tool for configuring the NVIDIA graphics driver

* The cuDNN libraries:

      ii  libcudnn7      7.1.3.16-1+cuda9.1  amd64  cuDNN runtime libraries
      ii  libcudnn7-dev  7.1.3.16-1+cuda9.1  amd64  cuDNN development libraries and headers
      ii  libcudnn7-doc  7.1.3.16-1+cuda9.1  amd64  cuDNN documents and samples

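Because the driver packages must all share exactly the same version, a listing like the one above can be checked mechanically. The following is a minimal sketch under the assumption that the driver packages are named `nvidia-`_NNN_... and `libcuda1-`_NNN_ as in this example; the function names are hypothetical.

```shell
# Hypothetical helper: reads `dpkg -l` output on stdin and prints the distinct
# upstream versions of the installed driver packages (nvidia-NNN*, libcuda1-NNN).
driver_versions() {
    awk '$1 == "ii" && $2 ~ /^(nvidia-[0-9]+|libcuda1-[0-9]+)/ {
             v = $3
             sub(/-.*$/, "", v)   # drop the packaging revision, e.g. -0ubuntu1
             print v
         }' | sort -u
}

# Hypothetical helper: succeeds only if exactly one driver version is installed.
driver_versions_consistent() {
    [ "$(driver_versions | wc -l)" -eq 1 ]
}
```

A typical use would be `dpkg -l | driver_versions_consistent || echo "mixed driver versions"`. Packages such as `nvidia-prime`, which carry their own version numbers, are deliberately not matched.
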