TensorFlow-GPU error: failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected

小白启程

Published 2022-02-12 20:34:22

A record of the error and its fix.

Error

failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
I hit this error today while running a cnn-svm-classifier program with TensorFlow-GPU. It is not a problem with the program itself; it comes from how the GPU is exposed to TensorFlow.

Solution

import tensorflow as tf

# Tell TensorFlow to allocate GPU memory on demand instead of
# reserving it all at start-up.
config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.compat.v1.InteractiveSession(config=config)

Just add these lines at the top of your script. There is no need to also write os.environ['CUDA_VISIBLE_DEVICES'] = '/gpu:0'. In fact that line is wrong: CUDA_VISIBLE_DEVICES expects a bare device index such as '0', and an unparseable value like '/gpu:0' hides every GPU, which is exactly what produces this CUDA_ERROR_NO_DEVICE error.
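
If you are on TensorFlow 2.x, the same effect is available without the compat session. A minimal sketch using the TF2 configuration API; the memory-growth call mirrors gpu_options.allow_growth above:

import tensorflow as tf

# TF2-style equivalent of gpu_options.allow_growth: let each detected
# GPU allocate memory on demand instead of reserving it all at start-up.
for gpu in tf.config.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)

# If you do want to pin a specific GPU via the environment, set a bare
# device index (e.g. CUDA_VISIBLE_DEVICES=0) before TensorFlow starts.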


Liyao Lyu Asks: "failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected" when using MPI
I'm trying to use mpi4py with TensorFlow. The test code (test2.py) is shown below.

Code:

import sys
import tensorflow as tf

# List the GPUs TensorFlow can see and restrict it to the first one.
gpus = tf.config.experimental.list_physical_devices(device_type='GPU')
tf.config.experimental.set_visible_devices(devices=gpus[0], device_type='GPU')

When I run python3 test2.py directly, it finds the GPU:

Code:

2022-04-18 21:41:38.760540: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2022-04-18 21:41:40.404001: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2022-04-18 21:41:40.480838: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: 
pciBusID: 0000:81:00.0 name: NVIDIA A100-SXM4-40GB computeCapability: 8.0
coreClock: 1.41GHz coreCount: 108 deviceMemorySize: 39.59GiB deviceMemoryBandwidth: 1.41TiB/s
2022-04-18 21:41:40.480883: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2022-04-18 21:41:40.492554: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2022-04-18 21:41:40.492590: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2022-04-18 21:41:40.496439: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2022-04-18 21:41:40.498045: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2022-04-18 21:41:40.499642: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.11
2022-04-18 21:41:40.502452: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2022-04-18 21:41:40.503149: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2022-04-18 21:41:40.505705: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0

But if I use MPI and run

Code:

mpirun -np 2 python3 test2.py

Then it cannot find the GPU:

Code:

2022-04-18 21:43:02.599105: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2022-04-18 21:43:04.340856: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2022-04-18 21:43:04.340964: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2022-04-18 21:43:04.366697: E tensorflow/stream_executor/cuda/cuda_driver.cc:328] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2022-04-18 21:43:04.366725: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: g013.anvil.rcac.purdue.edu
2022-04-18 21:43:04.366731: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: g013.anvil.rcac.purdue.edu
2022-04-18 21:43:04.366788: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 470.57.2
2022-04-18 21:43:04.366804: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 470.57.2
2022-04-18 21:43:04.366810: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:310] kernel version seems to match DSO: 470.57.2
Traceback (most recent call last):
  File "/home/x-lyuliyao/ML2/test2.py", line 4, in <module>
    tf.config.experimental.set_visible_devices(devices=gpus[0], device_type='GPU')
IndexError: list index out of range
2022-04-18 21:43:04.368168: E tensorflow/stream_executor/cuda/cuda_driver.cc:328] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2022-04-18 21:43:04.368191: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: g013.anvil.rcac.purdue.edu
2022-04-18 21:43:04.368198: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: g013.anvil.rcac.purdue.edu
2022-04-18 21:43:04.368283: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 470.57.2
2022-04-18 21:43:04.368302: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 470.57.2
2022-04-18 21:43:04.368308: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:310] kernel version seems to match DSO: 470.57.2
Traceback (most recent call last):
  File "/home/x-lyuliyao/ML2/test2.py", line 4, in <module>
    tf.config.experimental.set_visible_devices(devices=gpus[0], device_type='GPU')
IndexError: list index out of range

Here are the relevant package versions:

mpi                   1.0     mpich
mpi4py                3.1.1   py39h0a00275_0      https://conda.deepmodeling.org
mpich                 3.3.2   hc856adb_0
tensorflow-base       2.5.0   gpu_py39h7c1560b_0  https://conda.deepmodeling.org
tensorflow-estimator  2.5.0   pyh7b7c402_0
cudatoolkit           11.3.1  h2bc3f7f_2
cudnn                 8.2.1   cuda11.3_0
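
Not part of the original question, but a pattern worth checking in this situation: since the singleton run sees the GPU while the mpirun-launched ranks do not, a common cluster-side fix is to bind each rank to a GPU through the environment before TensorFlow initializes CUDA. A minimal sketch, assuming at least one allocated GPU per node:

Code:

import os
from mpi4py import MPI

rank = MPI.COMM_WORLD.Get_rank()
gpus_per_node = 1  # assumption for this sketch; use the node's real allocation
os.environ["CUDA_VISIBLE_DEVICES"] = str(rank % gpus_per_node)

# Import TensorFlow only after the environment is set, because the CUDA
# driver reads CUDA_VISIBLE_DEVICES once, at initialization.
import tensorflow as tf
print(f"rank {rank} sees: {tf.config.list_physical_devices('GPU')}")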




Iniyavan Asks: Azure SQL DB - Scale Up - vCore Hyperscale or Azure Managed Instance?
We are currently using Azure SQL DB for our data warehouse project. Though it is an OLAP application, it also has some OLTP functionality. The DB's current configuration is Basic, DTU-based. The requirement now is that the DB may grow to 10 TB, so we need to scale it up. Which model is the best fit? From my analysis there are two options: vCore (Hyperscale) or Azure Managed Instance. Which one is best?



Mugil Karthikeyan Asks: PyODBC takes 6 seconds to establish a connection with Azure SQL Server
PyODBC takes ~7 seconds to establish a connection with Azure SQL Server. Is there a way to minimize this?

Code:

import os
import sys
import logging, logging.handlers
import getopt
import pyodbc
from database import *  # presumably supplies server, database, username, password, get_store_list, and the constants used in main

# set up logging
logging.getLogger().setLevel(logging.INFO)

console = logging.StreamHandler()
console.setFormatter(logging.Formatter('%(asctime)s %(name)-12s %(levelname)s %(message)s'))
console.setLevel(logging.INFO)
logging.getLogger().addHandler(console)

logger = logging.getLogger("testapp")

def connect():
    return pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};SERVER='+server+';DATABASE='+database+';UID='+username+';PWD='+password)

def purgeStoreData(conn, div_id, purge_days, lookback_days, store_start, store_end):
    store_list = get_store_list(conn, div_id, store_start, store_end)
    cursor = conn.cursor()
    for store in store_list:
        logger.info("Store %s ...", store)
        cursor.execute("some query")

if __name__ == "__main__":
    try:
        conn = connect()
        purgeStoreData(conn, DIV_ID, PURGE_DAYS, LOOKBACK_DAYS, STORE_START, STORE_END)
        logger.info("*** Completed successfully")
    finally:
        conn.close()

Is there a way to display the network latency?
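
Not from the original post, but one way to see where the time goes is to time the connection handshake separately from query latency. A minimal sketch:

Code:

import time
import pyodbc

# Hypothetical helper: wrap pyodbc.connect with a monotonic clock so the
# connection time is reported separately from any query time.
def timed_connect(conn_str):
    start = time.perf_counter()
    conn = pyodbc.connect(conn_str)
    print(f"pyodbc.connect took {time.perf_counter() - start:.2f}s")
    return conn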



user8400863 Asks: Large Sensor Data best option. Tables SQL Vs Azure Table [closed]
I would like some advice on the best option, as I am seeing slow retrieval (over 30 seconds) of data in my web-based API.

I have over 100 IoT sensors, and the number is growing. They transmit sensor updates to my IoT Hub, which then get saved to a database or storage.

Previously I saved all my sensor data to a single SQL table, but as the data grew this became very slow, so I switched to Azure Table Storage. Each sensor has its own table; the partition key is the year and month (e.g. 202012), and the row key is a timestamp (e.g. 0002518033824243332546).

This proved to be much faster, since each sensor's table holds less data, but as a sensor's table grows and I need to retrieve data across a longer period (one month), it becomes very slow again. Each sensor transmits an update every minute, so each day produces 1,440 records and a 31-day month about 44,640.

Is there any better solution for my requirement?

Would an individual SQL table for each sensor be a good idea? How many tables can a SQL database hold?

Thank You
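
Not from the original post, but for reference: under the PartitionKey = YYYYMM scheme described above, a one-month read for one sensor is a single-partition range query. A minimal sketch with the azure-data-tables SDK; the connection string and table name are placeholders:

Code:

from azure.data.tables import TableClient

conn_str = "<storage account connection string>"  # placeholder

# One table per sensor, as described above; the table name is assumed.
client = TableClient.from_connection_string(conn_str, table_name="sensor042")

# A whole month lives in one partition, so this filter scans a single
# partition rather than the whole table.
for entity in client.query_entities(query_filter="PartitionKey eq '202012'"):
    print(entity["RowKey"])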



Nilesh Patel Asks: Error while doing assessment using Data migration assistant (DMA) tool
I get an error when running an assessment with the DMA tool, with the assessment type set to Integration Services.

Error message:

The assessment of database 'Package store' on server 'STP637388' failed.

The error message: 'Failed to assess SSIS packages or projects: 'Could not load file or assembly 'Microsoft.SqlServer.ManagedDTS, Version=11.0.0.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91' or one of its dependencies. The system cannot find the file specified.'.'.



Danielle Asks: Error importing database to Azure SQL Database
I've created a backup of my local database through "Export Data-tier Application" and saved the file to Azure Blob Storage.

In the Azure Portal, I choose my SQL server and import a new database. I select the backup from the blob and wait a long time for the DB creation, but it stays stuck at 1% the whole time.

After 40 minutes, I get this message every single time I try to create the database:

The ImportExport operation with Request Id 'f6743e06-592d-4531-b319-4297b345f744e' failed due to 'Could not import package. Warning SQL0: A project which specifies SQL Server 2019 or Azure SQL Database Managed Instance as the target platform may experience compatibility issues with Microsoft Azure SQL Database v12. Warning SQL72012: The object [data_0] exists in the target, but it will not be dropped even though you selected the 'Generate drop statements for objects that are in the target database but that are not in the source' check box. Warning SQL72012: The object [log] exists in the target, but '.

This is very frustrating; it's just a database with tables (and no data) that weighs only 25 MB. I'm following every tutorial step by step, and I always get this error, no matter which database name I choose.

Any help will be appreciated.

Thanks.
