Fixing the "cuDNN failed to initialize" Error in TensorFlow 2.0 Programs

Motivation: When running a TensorFlow 2.0 program on the GPU, a "cuDNN failed to initialize" error appears (see the screenshot below). How can it be fixed?

The text of the error message is:
tensorflow.python.framework.errors_impl.UnknownError:  Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.

Execution environment
1. OS: Ubuntu 19.10
2. GPU: GeForce RTX 2070 (driver version: 435.21)
3. CUDA: 10.1 (reported by nvidia-smi), 10.0.130 (reported by nvcc --version)
4. Python: 3.7.5rc1
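
The two CUDA numbers above differ because nvidia-smi reports the highest CUDA version the installed driver supports, while nvcc --version reports the toolkit that is actually installed. The toolkit version is the one to compare against what TensorFlow expects. Below is a minimal sketch for reading it from a script; the function names parse_nvcc_release and installed_cuda_toolkit are hypothetical helpers made up for this illustration, and the regular expression assumes the usual "release X.Y, VX.Y.Z" line that nvcc prints.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(output):
    """Return (release, full_version) parsed from `nvcc --version` text, or None."""
    # Assumes a line such as: "Cuda compilation tools, release 10.0, V10.0.130"
    match = re.search(r"release (\d+\.\d+), V(\d+(?:\.\d+)*)", output)
    if match is None:
        return None
    return match.group(1), match.group(2)


def installed_cuda_toolkit():
    """Run `nvcc --version` if nvcc is on PATH; return the parsed version or None."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None  # no CUDA toolkit found on PATH
    result = subprocess.run(
        [nvcc, "--version"], capture_output=True, text=True, check=False
    )
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    print("CUDA toolkit (nvcc):", installed_cuda_toolkit())
```

On the environment listed above, the parsed toolkit version would be expected to be 10.0.130 rather than the 10.1 shown by nvidia-smi.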

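Steps
1. According to [1], TensorFlow by default maps nearly all of the GPU's memory when it starts, so the fix is to have it allocate GPU memory on demand at runtime (memory growth).
The code is as follows: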

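import tensorflow as tf


def solve_cudnn_error():
    """Enable on-demand GPU memory growth; call before any GPU operation runs."""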
    gpus = tf.config.experimental.list_physical_devices('GPU')
    if gpus:
        try:
            # Currently, memory growth needs to be the same across GPUs
            for gpu in gpus:
                tf.config.experimental.set_memory_growth(gpu, True)
            logical_gpus = tf.config.experimental.list_logical_devices('GPU')
            print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
        except RuntimeError as e:
            # Memory growth must be set before GPUs have been initialized
            print(e)

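2. To make this easy to call from any Python program, I saved the code above as solve_cudnn_error.py and use it from other programs as follows. Call it before any operation that touches the GPU, since memory growth must be set before the GPUs are initialized: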

from solve_cudnn_error import solve_cudnn_error

solve_cudnn_error()
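
As a possible alternative to calling a helper in every program, TensorFlow also reads the TF_FORCE_GPU_ALLOW_GROWTH environment variable, which requests the same on-demand allocation. The sketch below is only an illustration and has not been tested on the environment listed above; the function name allow_gpu_memory_growth_via_env is a hypothetical helper, not part of TensorFlow.

```python
import os


def allow_gpu_memory_growth_via_env():
    """Set TF_FORCE_GPU_ALLOW_GROWTH unless it is already set; return its value."""
    # setdefault keeps any value the user exported in the shell beforehand.
    os.environ.setdefault("TF_FORCE_GPU_ALLOW_GROWTH", "true")
    return os.environ["TF_FORCE_GPU_ALLOW_GROWTH"]


# Call this before the first `import tensorflow`, so the variable is
# already in place when TensorFlow initializes the GPU.
allow_gpu_memory_growth_via_env()
```

The same effect can be had by exporting the variable in the shell before starting Python, which avoids touching the program at all.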

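Reflections
1. I spent a long time tracking down this error. Before that, I lost a lot of time on display problems when installing Ubuntu 18.04 for the RTX 2070, and more still because TensorFlow 2.0 only supports CUDA 10.0. Now even getting a Python program to run takes time. Time flies like an arrow.
2. I wonder how the same hardware fares on Windows 10.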

References
1. tensorflow 2.0 GPU 에러 | GPU 메모리 부족할 때, https://inpages.tistory.com/155
