Fixing the "cuDNN failed to initialize" error in TensorFlow 2.0 programs

Motivation: While running a TensorFlow 2.0 program on the GPU, I hit a "cuDNN failed to initialize" error (shown below). How can it be fixed?

The error message text is:
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.

Environment:
1. OS: Ubuntu 19.10
2. GPU: GeForce RTX 2070 (driver version: 435.21)
3. CUDA: 10.1 (nvidia-smi), 10.0.130 (nvcc --version)
4. Python: 3.7.5rc1
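
The two CUDA numbers above differ because nvidia-smi reports the highest CUDA version the installed driver supports, while nvcc reports the version of the installed CUDA toolkit. A minimal sketch for printing both from Python, assuming the two tools may or may not be on PATH (the helper name tool_output is hypothetical, not part of any library):

```python
import shutil
import subprocess


def tool_output(cmd):
    """Return the combined output of cmd, or None if the tool is not on PATH."""
    if shutil.which(cmd[0]) is None:
        return None
    result = subprocess.run(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, universal_newlines=True)
    return result.stdout


# nvidia-smi: driver-side CUDA version; nvcc: installed toolkit version.
for cmd in (["nvidia-smi"], ["nvcc", "--version"]):
    out = tool_output(cmd)
    print(" ".join(cmd), "->", "not found" if out is None else out.strip())
```

The toolkit version from nvcc is the one to compare against the CUDA version TensorFlow expects.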

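Steps:
1. According to [1], TensorFlow maps nearly all GPU memory by default, so memory growth has to be enabled so that GPU memory is allocated on demand at runtime.
...The code is as follows: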

import tensorflow as tf


def solve_cudnn_error():
    """Enable GPU memory growth so TensorFlow allocates memory on demand."""
    gpus = tf.config.experimental.list_physical_devices('GPU')
    if gpus:
        try:
            # Currently, memory growth needs to be the same across GPUs
            for gpu in gpus:
                tf.config.experimental.set_memory_growth(gpu, True)
            logical_gpus = tf.config.experimental.list_logical_devices('GPU')
            print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
        except RuntimeError as e:
            # Memory growth must be set before GPUs have been initialized
            print(e)
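
The guide in [2] also describes a second option: instead of letting memory grow on demand, put a hard cap on how much memory TensorFlow may take on a GPU. The following is only a sketch of that alternative, not what I used to solve the problem. The function name limit_gpu_memory and the 4096 MB default are assumptions chosen for illustration, and the import is guarded so the snippet also loads on a machine without TensorFlow.

```python
try:
    import tensorflow as tf
except ImportError:  # lets the sketch load where TensorFlow is not installed
    tf = None


def limit_gpu_memory(memory_limit_mb=4096):
    """Cap TensorFlow at memory_limit_mb on the first GPU; return True if applied."""
    if tf is None:
        return False
    gpus = tf.config.experimental.list_physical_devices('GPU')
    if not gpus:
        return False
    try:
        # Create a single virtual device on the first GPU with a fixed memory limit.
        tf.config.experimental.set_virtual_device_configuration(
            gpus[0],
            [tf.config.experimental.VirtualDeviceConfiguration(
                memory_limit=memory_limit_mb)])
        return True
    except RuntimeError as e:
        # Virtual devices must be set before GPUs have been initialized
        print(e)
        return False
```

Like memory growth, this has to run before the GPU is first used. Use one approach or the other for a given GPU, not both.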

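2. To make it easy to call from any Python program, I saved the code above as solve_cudnn_error.py and use it from other programs as follows (call it before the program does any GPU work, because memory growth must be set before the GPUs are initialized):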

from solve_cudnn_error import solve_cudnn_error

solve_cudnn_error()
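
If editing every script is inconvenient, the guide in [2] mentions an environment variable, TF_FORCE_GPU_ALLOW_GROWTH, that turns on the same memory-growth behavior. A minimal sketch, assuming the variable is set before TensorFlow initializes the GPU (setting it before the import is the simplest way to be safe):

```python
import os

# Set this before TensorFlow initializes the GPU.
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"

# import tensorflow as tf   # import TensorFlow only after the variable is set
print(os.environ["TF_FORCE_GPU_ALLOW_GROWTH"])  # prints "true"
```

The same variable can also be exported in the shell before launching the program, which leaves the Python source untouched.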

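Reflections:
1. I spent a long time tracking down this error. Before that, I spent a lot of time getting the display to work when installing Ubuntu 18.04 for the RTX 2070, and lost more time because TensorFlow 2.0 supports only CUDA 10.0. Now even running a Python program takes time. Time flies like an arrow.
2. I wonder how the same hardware behaves on Windows 10.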

References:

1. tensorflow 2.0 GPU 에러 | GPU 메모리 부족할 때 [TensorFlow 2.0 GPU error | When GPU memory is insufficient], https://inpages.tistory.com/155

2. Use a GPU, https://www.tensorflow.org/guide/gpu

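Comments

Anonymous said…

Thank you so much!!

September 9, 2020, 11:19:00 AM [GMT+8]

Adams wrote…

Thanks for sharing~

November 2, 2020, 11:26:00 PM [GMT+8]

This solved my problem.
Here is the error message I ran into: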

tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found. (0) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above. [[{{node conv1d_1/convolution}}]] [[dense_1/Sigmoid/_401]] (1) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above. [[{{node conv1d_1/convolution}}]]

November 26, 2020, 3:59:00 AM [GMT+8]

Thanks!

January 13, 2021, 10:33:00 PM [GMT+8]