How can I run Keras on a GPU

GPU support for TensorFlow on Windows 10

# AI workshop

A few words of encouragement in advance

I've never had a laptop that worked as smoothly and consistently as my new 15-inch Microsoft Surface Book 2. Really cool stuff. It's a shame that sentences like these always end with a "but". Here is the "but" of this story:
If you intend to implement and optimize deep neural networks (DNNs), it is essential that the calculations run on the GPU. Of course, they can also run on the CPU, which is easy to set up. The disadvantage is that training larger models then takes a long time. The secret behind sophisticated AI is that a good model configuration is only found after many training runs, which takes a lot of computing power and therefore time.
If you ask AI gurus like François Chollet (the creator of the Keras framework), they will say: "Use Linux or GPU cloud machines!" I have to agree with them: installing GPU support is really easy on Linux and super complicated on Windows.
Nonetheless, it is possible to force Windows 10 to open the gates to the GPU API, and I'll show you exactly how in this blog post. Afterwards you can train your neural networks much faster and easily win back the time you have to invest now.
An important note up front: at first I thought I could ignore the recommended driver and software versions and simply use the latest ones. That was a really bad, frustrating, and time-consuming idea. Once I used the versions that I recommend in this post, it was a lot easier.
So get yourself a coffee, put on some cool music, and let's take off.

1. Get Python into shape

First of all, it makes sense to bring the Python environment into a well-defined state.
If you don't already have Python installed, you can get it via the Anaconda distribution. When installing, make sure to use the 64-bit bundle for Python 3.x, not 2.x.

Fig 1: Download Anaconda for Python 3.7

First, pip (the Python package manager) needs to be updated to the latest version:
  1. Start cmd as an administrator
  2. Run "python.exe -m pip install --upgrade pip"
  3. Run "pip --version" to check the version. For me it is "pip 18.1 from c:\programdata\anaconda3\lib\site-packages\pip (python 3.6)"

I uninstalled all TensorFlow and Keras libraries that did not match the versions used in this post (TensorFlow 1.10.0 and Keras 2.2.4). No joke: I recommend using exactly these versions to save time. Otherwise the combinations do not match. To see what is currently installed, run "pip show tensorflow tensorflow-gpu keras".

Fig. 2: Check whether TensorFlow is installed, and in which version

Don't worry if TensorFlow isn't already installed, we'll do that later.
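The same check can also be done from Python. The following is only a minimal sketch: the helper names `installed_version` and `check_versions` are my own, and the pinned versions are the ones recommended in this post. It reports for each package whether it is missing, matches, or has a different version.

```python
# Versions recommended in this post.
EXPECTED = {
    "tensorflow": "1.10.0",
    "tensorflow-gpu": "1.10.0",
    "keras": "2.2.4",
}


def installed_version(name):
    """Return the installed version of a package, or None if it is missing."""
    try:
        from importlib.metadata import version, PackageNotFoundError
    except ImportError:
        # Python older than 3.8: fall back to setuptools' pkg_resources.
        import pkg_resources
        try:
            return pkg_resources.get_distribution(name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def check_versions(expected, lookup=installed_version):
    """Compare installed versions against the expected ones."""
    report = {}
    for name, wanted in expected.items():
        found = lookup(name)
        if found is None:
            status = "missing"
        elif found == wanted:
            status = "ok"
        else:
            status = "mismatch"
        report[name] = (found, status)
    return report


for package, (found, status) in check_versions(EXPECTED).items():
    print(package, found, status)
```

A "missing" entry is fine at this point, since the packages are installed in a later step. A "mismatch" means the package should be uninstalled first.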

2. Install Microsoft Visual Studio 2017

To be precise: Visual Studio is not actually needed for GPU support and can be uninstalled later. However, it is needed to compile the CUDA sample code and verify that the installation was successful. It's the first big stop on our way, because it lets us test whether the operating system part is working properly. So I download the "Visual Studio IDE Community Edition 2017".
Next I choose the packages to install. In the end, only one thing really matters: the "Windows 10 SDK (10.0.15063.0)" has to be installed.

Fig. 3: Select all necessary packages for the Visual Studio installation

3. Windows Update

Your Windows should be up to date. Everything worked much better for me after installing the latest NVIDIA driver update.

Fig. 4: The NVIDIA display update entry shows whether the NVIDIA graphics driver was installed successfully

To find out whether the NVIDIA driver is correctly installed and configured, you can do the following:
In the first step, open the "Performance" tab in the Task Manager:

Fig 5: GPU 1 must be listed as NVIDIA GeForce GTX 1060

Fig. 6: In the second step, check the installed drivers in the Device Manager.

The driver must be installed in "C:\Program Files\NVIDIA Corporation\".
Again, it's important to be picky about the version numbers. I found out that the tools have very little version tolerance.
In the third step, I check the NVIDIA settings.

Fig 7: The NVIDIA high-performance processor should be selected here

If you right-click on the desktop, you get the menu item "NVIDIA Control Panel". If not, try downloading and installing the driver from NVIDIA. That didn't work for me because Windows removed the driver after every restart, but maybe it works for you.

4. Install CUDA 9.0

CUDA is the API gateway to your NVIDIA graphics processor and gives you C++ access to it. With CUDA, the calculations of a deep neural network can be run massively in parallel. However, you need exactly the version "cuda_9.0.176.1_windows" and all patches. It is a little difficult to find because it is already an older version. Some screenshots can be found here:

Fig. 8: The older versions are hidden behind the "Legacy Releases" button

Fig. 9: The exact CUDA Toolkit version number is important here

Fig 10: Download the Baseinstaller and all patches

Now download the base installer and all patches.
Then install the base installer. The program warned me that there was a version conflict, but I ignored this warning and continued the installation.

Fig 11: The security warning can be ignored

After that, I was able to install all the patches in ascending order.

5. Install cuDNN

For deep neural networks I also need cuDNN, the CUDA extension for deep neural networks, specifically the version "cudnn-9.0-windows10-x64-v7.3.0.29". The download is a bit difficult to find, as you first have to create an account with NVIDIA. (I created an account directly and did not use my Google or Facebook account, because the login definitely works that way.)

Fig 12: cuDNN download page after you have logged in.

Click on "Download cuDNN".

Fig. 13: cuDNN download page after ticking “I Agree…”

Here I first tick "I Agree To the Terms of the ...". Then the link "Archived cuDNN releases" appears.

Fig 14: The correct version of cuDNN is extremely important

Fig 15: Download of cuDNN for Windows 10

cuDNN comes as a ZIP file. Simply unzip it and copy its contents into the CUDA installation directory:
  1. Copy .\cuda\bin\cudnn64_7.dll to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\bin
  2. Copy .\cuda\include\cudnn.h to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\include
  3. Copy .\cuda\lib\x64\cudnn.lib to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\lib\x64
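If you prefer not to copy the three files by hand, the steps above can be sketched in Python. This is only an illustration: the function name `copy_cudnn` is my own, the folder you pass in is wherever you unzipped the archive, and writing to "C:\Program Files" requires a cmd that was started as an administrator.

```python
import shutil
from pathlib import Path

# (file inside the unzipped archive, target folder inside the CUDA directory)
CUDNN_FILES = [
    (Path("cuda") / "bin" / "cudnn64_7.dll", Path("bin")),
    (Path("cuda") / "include" / "cudnn.h", Path("include")),
    (Path("cuda") / "lib" / "x64" / "cudnn.lib", Path("lib") / "x64"),
]


def copy_cudnn(unzipped_dir, cuda_dir):
    """Copy the three cuDNN files into an existing CUDA installation.

    The target folders are deliberately not created: if they are missing,
    the CUDA path is probably wrong and the copy should fail loudly.
    """
    copied = []
    for source_rel, target_rel in CUDNN_FILES:
        source = Path(unzipped_dir) / source_rel
        target_dir = Path(cuda_dir) / target_rel
        if not target_dir.is_dir():
            raise FileNotFoundError("Missing CUDA folder: {}".format(target_dir))
        copied.append(Path(shutil.copy2(str(source), str(target_dir))))
    return copied


# Example call (the first path is an assumption about where you unzipped the file):
# copy_cudnn(r".\cudnn-9.0-windows10-x64-v7.3.0.29",
#            r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0")
```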

6. Check whether the CUDA installation worked

I will now check whether the Windows part was successful so that I can then concentrate on the pure Python part. This small intermediate step is extremely helpful for troubleshooting.
First I check whether all system variables are set correctly:

Fig. 16: The system variables CUDA_PATH and CUDA_PATH_V9_0 are normally set automatically.

Fig. 17: The three CUDA entries should have been set automatically in the Windows PATH.
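Instead of clicking through the system dialogs, the variables can also be inspected from Python. A minimal sketch, with a helper name of my own choosing; it only lists what is set and makes no assumption about which three PATH entries the installer creates:

```python
import os


def check_cuda_environment(env):
    """Collect the CUDA-related settings from an environment mapping."""
    path_entries = [
        entry
        for entry in env.get("PATH", "").split(os.pathsep)
        if "cuda" in entry.lower()
    ]
    return {
        "CUDA_PATH": env.get("CUDA_PATH"),
        "CUDA_PATH_V9_0": env.get("CUDA_PATH_V9_0"),
        "cuda_path_entries": path_entries,
    }


# Print the settings of the current process environment.
for key, value in check_cuda_environment(os.environ).items():
    print(key, "->", value)
```

If one of the two variables shows up as None, or no PATH entry mentions CUDA, open a new cmd first, because an already running cmd does not see variables that were set afterwards.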

I compile the CUDA sample code to verify that CUDA was installed properly. To do this, I start Visual Studio and open the corresponding project. The sample code projects should be stored here: "C:\ProgramData\NVIDIA Corporation\CUDA Samples\v9.0"
  1. Open the project "Samples_vs2017.sln" via the menu item File -> Open -> Project/Solution ...

    Fig. 18: Select the samples for Visual Studio 2017 here

  2. After the solution has been opened, I select the project "deviceQuery" in the "Solution Explorer"

    Fig. 19: I have only selected and compiled "deviceQuery"

    Fig. 20: The "success message" in the output is important here

  3. Now I right-click and choose "Build" in the context menu to build the project. The output should contain the success message shown in Fig. 20.
  4. After that, I quit Visual Studio and start my cmd
  5. The program was compiled to "C:\ProgramData\NVIDIA Corporation\CUDA Samples\v9.0\bin\win64\Debug".
    So: "cd C:\ProgramData\NVIDIA Corporation\CUDA Samples\v9.0\bin\win64\Debug"
  6. I start "deviceQuery.exe". The output should look something like this. The only important thing here is that the GPU is listed under Device 0.

    Fig. 21: The deviceQuery.exe application tells you in the last line whether CUDA is running or not

  7. Lean back and enjoy the sight!
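The check from step 6 can also be automated. The sketch below assumes that deviceQuery prints a line starting with "Device 0:" and ends with the line "Result = PASS", which is what the sample normally prints on success; the function names are my own.

```python
import subprocess


def device_query_passed(output):
    """Return True if the deviceQuery output lists Device 0 and ends with PASS."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    has_device_0 = any(line.startswith("Device 0:") for line in lines)
    passed = bool(lines) and lines[-1] == "Result = PASS"
    return has_device_0 and passed


def run_device_query(exe_path):
    """Run deviceQuery.exe and evaluate what it prints."""
    completed = subprocess.run(
        [exe_path],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return device_query_passed(completed.stdout)


# Example call, using the build folder from step 5:
# run_device_query(r"C:\ProgramData\NVIDIA Corporation\CUDA Samples\v9.0"
#                  r"\bin\win64\Debug\deviceQuery.exe")
```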

7. Installation of Keras and TensorFlow

The final steps are no longer difficult. If deviceQuery.exe from the previous step ran successfully, it is now only a question of the correct versions. So:

  1. Start cmd as an administrator
  2. Install TensorFlow 1.10.0
    pip install tensorflow==1.10.0
  3. Install TensorFlow-gpu 1.10.0
    pip install tensorflow-gpu==1.10.0
  4. Install Keras 2.2.4
    pip install keras==2.2.4

I downloaded the following script. It is really just a kind of Hello World for TensorFlow on the GPU. (Caution: you should convince yourself beforehand that it is really harmless and cannot cause any damage.)
I saved the script locally and now run it:

  1. Open your cmd
  2. Change to the directory with the script
  3. Run "python" followed by the file name of the script
  4. The output should look something like the one shown in the illustration. The lines marked in yellow are important and show that the GPU has been addressed.

Fig 22: Somewhat hidden in the output, you are told whether TensorFlow is running on the GPU or the CPU.

Since this is a little small in the screenshot, I copied the crucial text here again: "Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2044 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060, pci bus id: 0000:02:00.0, compute capability: 6.1)"
It has to be GPU and not CPU.
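The downloaded script is not reproduced in this post. As a stand-in, here is a minimal sketch of my own that asks TensorFlow which devices it can see. It relies on `device_lib.list_local_devices()`, which is available in TensorFlow 1.x; the helper names are assumptions for illustration and this is not the script mentioned above.

```python
def gpu_names(devices):
    """Pick the names of all GPU devices from a list of device descriptions."""
    return [device.name for device in devices if device.device_type == "GPU"]


def list_local_gpus():
    """Ask TensorFlow for its local devices and return the GPU names."""
    # Imported here so that the file can be loaded without TensorFlow.
    from tensorflow.python.client import device_lib
    return gpu_names(device_lib.list_local_devices())


if __name__ == "__main__":
    try:
        gpus = list_local_gpus()
    except ImportError:
        print("TensorFlow is not installed in this environment.")
    else:
        if gpus:
            print("TensorFlow sees these GPUs:", gpus)
        else:
            print("No GPU found, TensorFlow would run on the CPU.")
```

While the devices are being listed, TensorFlow also writes its own log lines, including the "Created TensorFlow device" line quoted above.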

8. Run Keras code on the GPU

Now when I run Keras code, the following error may appear:
InternalError: Blas GEMM launch failed : a.shape=(128, 784), b.shape=(784, 484), m=128, n=484, k=784
[[Node: dense_1/MatMul = MatMul[T=DT_FLOAT, _class=["loc:@training/RMSprop/gradients/dense_1/MatMul_grad/MatMul_1"], transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](_arg_dense_1_input_0_0/_37, dense_1/kernel/read)]]
To avoid this, you always have to copy the following configuration into the code:
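The configuration itself is missing from this copy of the post. What follows is therefore not necessarily the original snippet, but a minimal sketch of a setting that is commonly used against this error with TensorFlow 1.x and Keras 2.2.x. It assumes the cause is that TensorFlow reserves the whole GPU memory at start-up, and tells it to allocate memory only as needed. The function name is my own.

```python
def enable_gpu_memory_growth():
    """Make Keras use a TensorFlow 1.x session that grows GPU memory on demand."""
    # Imported here so that the file can be loaded without TensorFlow or Keras.
    import tensorflow as tf
    from keras import backend as K

    config = tf.ConfigProto()
    # Assumption: allocating GPU memory step by step avoids the GEMM failure.
    config.gpu_options.allow_growth = True
    K.set_session(tf.Session(config=config))
    return config


# Call this once at the top of your script, before building any Keras model:
# enable_gpu_memory_growth()
```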
If you really want to be sure that the code is running on the GPU, you can run the following Python script. It is a Keras model that is trained for 500 epochs. Of course, this doesn't make any technical sense, but it gives you enough time while it's running to open the Task Manager and view the GPU load.
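The script is not included in this copy of the post, so here is a stand-in sketch under my own assumptions: random data instead of a real dataset, and layer sizes borrowed from the error message above (784 inputs, 484 hidden units, batch size 128, RMSprop). The function name is hypothetical.

```python
def train_dummy_model(epochs=500, samples=6000):
    """Train a small dense network on random data to keep the GPU busy."""
    # Imported here so that the file can be loaded without Keras installed.
    import numpy as np
    from keras.layers import Dense
    from keras.models import Sequential
    from keras.utils import to_categorical

    # Random inputs and labels: the result is meaningless, only the load counts.
    x = np.random.random((samples, 784)).astype("float32")
    y = to_categorical(np.random.randint(10, size=samples), num_classes=10)

    model = Sequential([
        Dense(484, activation="relu", input_shape=(784,)),
        Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="rmsprop",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(x, y, epochs=epochs, batch_size=128)
    return model


# Start the training and then switch to the Task Manager:
# train_dummy_model()
```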

Fig. 23: The GPU 1 is loaded as long as you are training the model.

If you stop the script with "Ctrl" + "C", you will see that the load drops to 0 immediately.

Final remark

It would be nicer if there was a simple Install.exe that installs everything and doesn't chase us through this dark valley of different versions. Nonetheless, it works now. And isn't it the number of mistakes you have already made or seen in your life that makes you a senior?
Have fun with your DNNs!