# NVIDIA drivers

There are three levels of NVIDIA drivers:

*   _NVIDIA_ GPU drivers.
*   _CUDA_ libraries and tools.
*   _cuDNN_ libraries.

## Summary

*   Drivers:
    *   If not installing CUDA you can use the `nvidia` proprietary
        drivers from the
        [Graphics Drivers](https://launchpad.net/~graphics-drivers/+archive/ubuntu/ppa)
        PPA, or use the drivers from the CUDA repository anyway, as
        they tend to be more stable.
    *   If installing CUDA the NVIDIA online repository has matching
        versions of the `nvidia` proprietary drivers, even if they are
        updated less often than the PPA ones, and they should be used.
*   Install the latest 2-4 versions of the `libcuda` libraries from the
    NVIDIA online repository, after installing the `.deb` that points to
    it.
*   Install the latest `libcuDNN` libraries from the cuDNN repository;
    they must match the version(s) of the installed CUDA libraries.

The repositories for drivers+CUDA and for cuDNN are:

-   `http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64`
-   `http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64`
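As a minimal sketch, the two APT source lines can be generated from the release directory name. The function name `nvidia_repo_lines` is hypothetical, and the flat-repository form with a trailing ` /` is taken from the cuDNN entry shown later in this document and only assumed to apply to the CUDA repository as well.

```shell
# Print the APT source lines for the two NVIDIA repositories.
# $1 is the release directory name used in the URLs, e.g. ubuntu1804.
nvidia_repo_lines() {
  base="http://developer.download.nvidia.com/compute"
  printf 'deb %s/cuda/repos/%s/x86_64 /\n' "$base" "$1"
  printf 'deb %s/machine-learning/repos/%s/x86_64 /\n' "$base" "$1"
}
```

The output of `nvidia_repo_lines ubuntu1804` could be written to a file under `/etc/apt/sources.list.d/` (any file name ending in `.list`). The repository signing key is not handled by this sketch.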

## NVIDIA GPU drivers (by themselves)

### Description

Linux graphics drivers come in some related parts:

*   Kernel driver, which has a KMS layer and a device specific layer.
*   X11 driver.
*   OpenGL libraries, which usually are the MesaGL libraries.

There are two driver collections for NVIDIA cards in Linux:

*   The `nouveau` one uses standard KMS, a matching X11 driver,
    and the standard MesaGL libraries under
    `/usr/lib/x86_64-linux-gnu/mesa/`.
*   The `nvidia` one doesn't quite use KMS, has its own matching X11
    driver, and has its own OpenGL libraries under
    `/usr/lib/nvidia-`_390_`/`.

We are interested only in the `nvidia` drivers because only those
support CUDA.

### Installation

*   The drivers in the Ubuntu LTS (ULTS) archives are too old, and
    ad-hoc installation causes a lot of issues, so we use either the
    _Graphics Drivers_ PPA (the stable one, not the beta one)
    <https://launchpad.net/~graphics-drivers/+archive/ubuntu/ppa> or,
    if CUDA is installed, the driver packages in the online NVIDIA CUDA
    repository.
*   The kernel driver and the user level versions must match
    **exactly**, that is both major and minor version. This implies
    reboots on upgrades. See below how to "pin" packages to prevent
    upgrades.
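The exact-match requirement can be checked with a small sketch like the following. The helper names are hypothetical. The `NVRM version:` line format of `/proc/driver/nvidia/version` is an assumption based on typical output of the proprietary driver, and the package name `libcuda1-396` in the commented usage is only the one from the example package list at the end of this document.

```shell
# Extract the driver version from the text of /proc/driver/nvidia/version,
# read on stdin. Assumes a line of the form
#   NVRM version: NVIDIA UNIX x86_64 Kernel Module  396.26  <build date>
kernel_module_version() {
  sed -n -E 's/^NVRM version:.*Kernel Module +([0-9]+(\.[0-9]+)+).*/\1/p' | head -n 1
}

# Strip the Debian revision from a package version,
# e.g. 396.26-0ubuntu1 -> 396.26.
upstream_version() {
  printf '%s\n' "${1%%-*}"
}

# Succeed only if both versions are exactly equal (major and minor).
versions_match() {
  [ "$1" = "$2" ]
}

# Possible usage on a machine with the driver loaded:
#   kernel=$(kernel_module_version < /proc/driver/nvidia/version)
#   user=$(upstream_version "$(dpkg-query -W -f='${Version}' libcuda1-396)")
#   versions_match "$kernel" "$user" || echo "kernel/user mismatch: reboot needed"
```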

## NVIDIA GPU drivers together with other drivers

### Description

Many systems have both a minimal GPU, usually built into the CPU or the
motherboard, plus one or more NVIDIA GPUs for intensive calculation.

That is a situation that often requires a custom two-head `xorg.conf` or
installing two different GPU drivers for the kernel and the X server.

Some easy ways to deal with that are:

*   Use the BIOS to disable the built-in GPU
*   Remove the non-NVIDIA GPU if it is a card
*   Blacklist the non-NVIDIA driver module.
*   Configure `xorg.conf` to ignore the non-NVIDIA card.

But if the X server uses the NVIDIA GPUs it takes up part of their
memory, which can reduce the batch size for training machine learning
systems and thus significantly (according to local informants) increase
training times.

So the optimal configuration usually is to leave both GPUs enabled and
only use the non-NVIDIA one for the X server, thus leaving the NVIDIA
card entirely available to CUDA.

The problem with that is that the NVIDIA driver is "invasive" and its
packaging tries to override other GPU drivers, and in particular the
NVIDIA driver requires a custom version of the OpenGL libraries.

### Installation

Resolving the conflicts mentioned above is possible:

1.  Ensure the other driver needed is installed (usually already
    installed as part of the `xorg-server-` packages).
1.  Ensure that the NVIDIA proprietary driver is installed as
    per the previous section.
1.  Ensure that these packages are not installed:
    *   `bumblebee`
    *   `bbswitch-dkms`
    *   `nvidia-prime`
1.  In `/etc/default/grub` add to `GRUB_CMDLINE_LINUX` the option
    `nogpumanager` and run `update-grub2`.
1.  Ensure that different parts of the NVIDIA libraries have different
    priorities than the Xorg/Mesa libraries. The assumption here is that
    no OpenGL applications will run on the NVIDIA cards:
    1.  Ensure the NVIDIA driver is configured by running:

            update-alternatives --set i386-linux-gnu_gl_conf \
              /usr/lib/nvidia-NNN/alt_ld.so.conf
            update-alternatives --set x86_64-linux-gnu_gl_conf \
              /usr/lib/nvidia-NNN/ld.so.conf
    2.  To prioritize MesaGL for X windows applications add to
        `/etc/ld.so.conf` as the first two lines these:

            /usr/lib/x86_64-linux-gnu/mesa
            /usr/lib/x86_64-linux-gnu/mesa-egl
        and then run `ldconfig`.
1.  Ensure that the X server uses the Xorg/Mesa version of the `glx`
    module by having in `/etc/X11/xorg.conf` in the `Files` section
    this line (comments may be omitted):

          ModulePath            "/usr/lib/xorg/modules"
1.  Ensure that in `/etc/X11/xorg.conf`:
    * In the `ServerFlags` section the option `AutoAddGPU` is `false`.
    * The active `ServerLayout` section refers to a `Screen` section
      that refers to a `Device` section only for the non-NVIDIA driver.
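The check for the three conflicting packages listed in the steps above can be sketched as a filter over a list of installed package names. The function name `conflicting_installed` is hypothetical.

```shell
# Read installed package names on stdin, one per line, and print any of
# the three conflicting packages that are present (prints nothing if none).
conflicting_installed() {
  grep -Fx -e bumblebee -e bbswitch-dkms -e nvidia-prime || true
}
```

One way to feed it, assuming a `dpkg` recent enough to support the `${db:Status-Abbrev}` field, is `dpkg-query -W -f='${db:Status-Abbrev} ${Package}\n' | awk '$1 == "ii" { print $2 }' | conflicting_installed`; any package it prints should be removed.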

A suitable `xorg.conf` may look like:

    # vim:set ft=xf86conf sw=2:

    Section "ServerLayout"
      Identifier            "intel-only"
      Screen                 0 "intel"
    EndSection

    Section "Monitor"
      Identifier            "generic"
      VendorName            "noname"
    EndSection

    Section "Screen"
      Identifier            "intel"
      Device                "intel-GPU"
      Monitor               "generic"
    EndSection

    Section "Device"
      Identifier            "intel-GPU"
      Driver                "intel"

      # Depends on the workstation
      Option                "monitor-HDMI1" "generic"
    EndSection

    Section  "ServerFlags"
      Option                "AutoAddGpu"    "false"
      Option                "DontVTSwitch"  "false"
      Option                "DontZap"       "false"
    EndSection

    Section "Files"
      ModulePath            "/usr/lib/xorg/modules"
    EndSection
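Besides `xorg.conf`, the installation steps also add `nogpumanager` to `GRUB_CMDLINE_LINUX` in `/etc/default/grub`. A minimal sketch of doing that idempotently follows. The function name `add_grub_option` is hypothetical, and the sketch assumes GNU `sed`, a double-quoted value on a single line, and an option without regular-expression or `sed` metacharacters.

```shell
# Add a kernel option to GRUB_CMDLINE_LINUX in a grub defaults file,
# unless it is already there.
# $1: path of the file, $2: option to add (plain word, e.g. nogpumanager)
add_grub_option() {
  file="$1"
  option="$2"
  # Nothing to do if the option is already a whole word in the value.
  if grep -Eq "^GRUB_CMDLINE_LINUX=\"([^\"]* )?${option}( [^\"]*)?\"" "$file"; then
    return 0
  fi
  # Append the option, then drop the leading space left by an empty value.
  sed -i -E \
    -e "s/^(GRUB_CMDLINE_LINUX=\")(.*)\"\$/\\1\\2 ${option}\"/" \
    -e "s/^(GRUB_CMDLINE_LINUX=\") /\\1/" \
    "$file"
}
```

As root this would be used as `add_grub_option /etc/default/grub nogpumanager`, followed by `update-grub2`. The `GRUB_CMDLINE_LINUX_DEFAULT` line is left untouched because the pattern requires `="` directly after `GRUB_CMDLINE_LINUX`.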

## CUDA libraries

### Description

Versioning:

*   The CUDA libraries come in several versions, and usually the latest
    2-3 should be kept installed at the same time, as applications tend
    not to be compiled against the latest one.
*   Usually older versions of CUDA are compatible with newer drivers,
    but often new CUDA versions need newer driver versions:
    *   [NVIDIA/CUDA compatibility table](https://docs.nvidia.com/deploy/cuda-compatibility/index.html#binary-compatibility)
    *   [CUDA/cuDNN compatibility table](https://docs.nvidia.com/deeplearning/sdk/cudnn-support-matrix/index.html#cudnn-cuda-hardware-versions).

It is particularly important to note that since CUDA 10.0 the `libcuda`
runtime shared object is compatible with older driver versions, so
installing a new CUDA version does not require updating the driver and
rebooting the system.

Installation options:

*   There are several ways to install the CUDA libraries:
    *   With an installer.
    *   From the ULTS repository.
    *   From a local repository downloaded from the NVIDIA website.
    *   From the NVIDIA repository.
*   Only installing from the NVIDIA repository both provides recent
    versions and is easily manageable.

### Installation

Once only:

*   From the
    [downloads page](https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_type=debnetwork&target_arch=x86_64&target_distro=Ubuntu&target_version=1604)
    get and install with `dpkg` the `.deb` that contains the sources
    list for the repository.
*   Ensure that the `cuda` package is not installed: it will pull in
    the latest `cuda-`_M_`-`_N_ which is usually not wanted and may
    trigger a wholesale update ahead of need.

Usually:

*   Run the usual `apt-get update`.
*   Install the latest 2-4 versions. For example in 2018-06 that would
    be 9.2, 9.1, 9.0 and 8.0, and in 2022 the 11.2 version. This should
    be done by installing the relevant `cuda-`_M_`-`_N_ rather than the
    `cuda` package.
*   This also installs the `cuda-drivers` package that pulls in the
    matching NVIDIA proprietary drivers from the same repository.
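The mapping from CUDA versions to `cuda-`_M_`-`_N_ package names can be sketched as below. The function name `cuda_packages` is hypothetical; the naming pattern follows the example package list at the end of this document.

```shell
# Map CUDA versions such as 9.2 to meta-package names such as cuda-9-2,
# one per line.
cuda_packages() {
  for version in "$@"; do
    printf 'cuda-%s\n' "$version" | sed 's/\./-/g'
  done
}
```

For the 2018-06 example this would be used as `apt-get install $(cuda_packages 9.2 9.1 9.0 8.0)`, which avoids naming the `cuda` package itself.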

## cuDNN libraries

### Description

The [_Deep Neural Network_](https://developer.nvidia.com/cudnn)
libraries are only available to registered developers; they cannot be
freely downloaded.

**Warning**: some people think that the cuDNN libraries should be
installed per-person, because different applications used by different
persons are linked to different cuDNN versions, which may not be
compatible. However, mostly they are compatible, and a default
system-wide shared version (as long as it is recent) is useful.

In particular currently (2018-12) there are these versions:

*   7.4.[12], 7.3.[01]: separate packages for CUDA 9.0, 9.2, 10.0.
*   7.2.1, 7.1.4: for CUDA 8.0, 9.0, 9.2.
*   7.1.[123], 7.0.5: for CUDA 8.0, 9.0, 9.1.

### Installation

There are two ways:

1.  Download the latest `.deb`s from the NVIDIA developer website, one
    by one, and install them with `dpkg`.
2.  There is a cuDNN Debian/Ubuntu repository provided by NVIDIA:

        deb http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64 /
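Because each cuDNN package must match a CUDA version, it can help to read the CUDA version out of the package version string. This is a sketch that assumes the `+cuda`_M_`.`_N_ suffix seen in the example package list at the end of this document (for example `7.1.3.16-1+cuda9.1`); the function name `cudnn_cuda_version` is hypothetical.

```shell
# Print the CUDA version a cuDNN package was built for, taken from the
# "+cudaM.N" suffix of its version string; prints nothing if absent.
cudnn_cuda_version() {
  printf '%s\n' "$1" | sed -n -E 's/.*\+cuda([0-9]+\.[0-9]+)$/\1/p'
}
```

For an installed package this could be called as `cudnn_cuda_version "$(dpkg-query -W -f='${Version}' libcudnn7)"` and the result compared with the installed `cuda-`_M_`-`_N_ packages.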

## Preventing updates

Preventing updates to the kernel driver components may be useful to
avoid the need to reboot, as kernel driver and driver library versions
must match exactly (both major and minor version numbers).

There are two ways to prevent updates:

*   [Pinning](https://www.debian.org/doc/manuals/apt-howto/ch-apt-get.en.html#s-pin)
    with something like this in `/etc/apt/preferences`:
    
        Package: nvidia-* libxnvctrl0
        Pin: version NNN.MM-*
        Pin-Priority: 1050

        Package: cuda-drivers libcuda1-*
        Pin: version NNN.MM-*
        Pin-Priority: 1050

    (two blocks are better because `nvidia-`_NNN_ and `cuda-drivers`
    sometimes have different yet compatible versions). This pinning may
    be suitable for a server where driver updates should probably only
    be made intentionally. _Note_: there are two approaches to this kind
    of pinning and this is the more rigid one, where specific major and
    minor numbers are given a higher priority than any candidate from a
    repo.
*   Preventing
    [`unattended-upgrade`](https://help.ubuntu.com/community/AutomaticSecurityUpdates)
    from upgrading with something like this in
    `/etc/apt/apt.conf.d/50unattended-upgrades`:
        
        // List of packages to not update (regexp are supported)
        Unattended-Upgrade::Package-Blacklist {
                "nvidia-.*";
                "libcuda[0-9]-.*";
                "libcuda-[0-9].*";
        //      "vim";
        //      "libc6";
        //      "libc6-dev";
        //      "libc6-i686";
        };
    This is probably not necessary because by default
    `unattended-upgrade` only installs updates from
    the _`$DISTRO`_`-security` archive.
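The pinning stanzas shown above differ only in the package patterns and the version, so they can be generated. This is a sketch with a hypothetical function name `pin_stanza`; the priority 1050 is the one used in this document.

```shell
# Print an APT preferences stanza pinning packages to a driver version.
# $1: space-separated package patterns, $2: version such as 396.26
pin_stanza() {
  printf 'Package: %s\nPin: version %s-*\nPin-Priority: 1050\n' "$1" "$2"
}
```

For example `pin_stanza 'nvidia-* libxnvctrl0' 396.26` prints the first block for driver 396.26. After the stanzas are placed in `/etc/apt/preferences`, `apt-cache policy nvidia-396` shows whether the pin is in effect.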

## Example list of installed packages (2018-06-13)

*   These are the "meta" packages that pull in many others:

        ii  cuda                   9.2.88-1        amd64        CUDA meta-package
        ii  cuda-8-0               8.0.61-1        amd64        CUDA 8.0 meta-package
        ii  cuda-9-0               9.0.176-1       amd64        CUDA 9.0 meta-package
        ii  cuda-9-1               9.1.85-1        amd64        CUDA 9.1 meta-package
        ii  cuda-9-2               9.2.88-1        amd64        CUDA 9.2 meta-package
*   The NVIDIA proprietary driver, where `nvidia-`_NNN_ is the "root"
    package that pulls in the others:

        ii  libcuda1-396           396.26-0ubuntu1 amd64        NVIDIA CUDA runtime library
        ii  nvidia-396             396.26-0ubuntu1 amd64        NVIDIA binary driver - version 396.26
        ii  nvidia-396-dev         396.26-0ubuntu1 amd64        NVIDIA binary Xorg driver development files
        ii  nvidia-opencl-icd-396  396.26-0ubuntu1 amd64        NVIDIA OpenCL ICD
*   The NVIDIA auxiliary packages with some tools:

        ii  nvidia-modprobe        396.26-0ubuntu1 amd64        Load the NVIDIA kernel driver and create device files
        ii  nvidia-prime           0.8.2           amd64        Tools to enable NVIDIA's Prime
        ii  nvidia-settings        396.26-0ubuntu1 amd64        Tool for configuring the NVIDIA graphics driver
*   The cuDNN libraries:

        ii  libcudnn7              7.1.3.16-1+cuda9.1 amd64     cuDNN runtime libraries
        ii  libcudnn7-dev          7.1.3.16-1+cuda9.1 amd64     cuDNN development libraries and headers
        ii  libcudnn7-doc          7.1.3.16-1+cuda9.1 amd64     cuDNN documents and samples