How to Install ONNX Runtime GPU on NVIDIA DGX Spark

A complete, working guide for building and installing onnxruntime-gpu on NVIDIA DGX Spark with CUDA 13 and Grace–Blackwell architecture.

Background

As I recently wrote, I built a Handball computer vision system. So far it has always run on Google Colab, but downloading and installing all the packages on every run is super annoying, and since my dad got a DGX Spark a few days ago, I decided to try running my notebook on that machine. Because it is an ARM64 system, some dependencies cause trouble: specifically Roboflow's inference-gpu, which depends on onnxruntime-gpu, and that package currently has no prebuilt wheels for the DGX Spark. So here is a simple guide to installing onnxruntime-gpu properly on the DGX Spark.

The NVIDIA DGX Spark is one of the first systems shipping with the new Grace–Blackwell architecture (the GB10 Superchip) and CUDA 13. Unfortunately, as of this writing:

  • ONNX Runtime does not provide GPU wheels for ARM64
  • There are no prebuilt wheels for CUDA 13
  • The ONNX Runtime docs do not cover the GB10’s compute capability (12.1)
  • Installing ONNX Runtime GPU with pip install onnxruntime-gpu fails with
    No matching distribution found

But the GPU hardware works perfectly — we just need to build ONNX Runtime manually. This guide shows exactly how to do that.


Why you need this guide

If you try to install ONNX Runtime GPU on DGX Spark, you’ll encounter:

ERROR: No matching distribution found for onnxruntime-gpu

or:

OSError: CUDA_HOME environment variable is not set

or:

Unsupported compute capability sm_121

This is normal: ONNX Runtime simply hasn’t shipped wheels for ARM64 + CUDA 13 + Compute Capability 12.1 yet. But a working GPU build is absolutely possible — you just need the right flags.


1. Create a Clean Conda Environment

conda create -n ort-gpu python=3.11 -y
conda activate ort-gpu

Install build tools:

pip install cmake ninja packaging numpy

Install cuDNN for CUDA 13:

sudo apt install -y cudnn9-cuda-13
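
To make sure the install worked, check that the cuDNN 9 shared library is visible to the dynamic linker (a quick sanity check; the exact library path varies by system):

```shell
# Sanity check: is the cuDNN 9 shared library registered with the linker?
if ldconfig -p | grep -q libcudnn.so.9; then
    echo "cuDNN 9 found"
else
    echo "cuDNN 9 NOT found - install cudnn9-cuda-13 first"
fi
```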

2. Clone ONNX Runtime

git clone --recursive https://github.com/microsoft/onnxruntime.git
cd onnxruntime

3. Set CUDA Environment Variables (Permanent)

DGX Spark installs CUDA at:

/usr/local/cuda-13.0

We persist these paths via conda activation scripts:

mkdir -p $CONDA_PREFIX/etc/conda/activate.d
mkdir -p $CONDA_PREFIX/etc/conda/deactivate.d
cat > $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh << 'EOF'
export CUDA_HOME=/usr/local/cuda-13.0
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH
EOF
cat > $CONDA_PREFIX/etc/conda/deactivate.d/env_vars.sh << 'EOF'
unset CUDA_HOME
EOF

Reload the environment:

conda deactivate
conda activate ort-gpu

Verify:

echo $CUDA_HOME
which nvcc

Expected:

/usr/local/cuda-13.0
/usr/local/cuda-13.0/bin/nvcc

4. Build ONNX Runtime GPU for CUDA 13 (Grace–Blackwell)

⚠️ Important: The DGX Spark’s Blackwell GPU reports Compute Capability 12.1, which ONNX Runtime does not detect automatically. We must explicitly set:

CMAKE_CUDA_ARCHITECTURES=121
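
If you want to confirm this value on your own machine, nvidia-smi can report the compute capability directly, and dropping the dot gives the value for CMAKE_CUDA_ARCHITECTURES. A small sketch (the hard-coded cap stands in for the nvidia-smi output):

```shell
# On the DGX Spark itself, query the capability like this:
# cap=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader)
cap="12.1"           # placeholder for the nvidia-smi output
arch="${cap/./}"     # drop the dot: 12.1 -> 121
echo "CMAKE_CUDA_ARCHITECTURES=${arch}"
```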

Full build command

sh build.sh \
  --config Release \
  --build_dir build/cuda13 \
  --parallel 20 \
  --nvcc_threads 20 \
  --use_cuda \
  --cuda_version 13.0 \
  --cuda_home /usr/local/cuda-13.0 \
  --cudnn_home /usr/local/cuda-13.0 \
  --build_wheel \
  --skip_tests \
  --cmake_generator Ninja \
  --use_binskim_compliant_compile_flags \
  --cmake_extra_defines CMAKE_CUDA_ARCHITECTURES=121 onnxruntime_BUILD_UNIT_TESTS=OFF

This takes 20–40 minutes depending on load.

5. Install the Wheel

pip install build/cuda13/Release/dist/*.whl

6. Verify ONNX Runtime GPU

import onnxruntime as ort
print("Version:", ort.__version__)
print("Providers:", ort.get_available_providers())

Expected:

Version: 1.xx.x
Providers: ['CUDAExecutionProvider', 'CPUExecutionProvider']

If you see CUDAExecutionProvider, your GPU build works.

7. Example Usage

import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    "model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"]
)

# Replace "input" with your model's actual input name
# (see session.get_inputs()[0].name) and match its shape/dtype.
my_tensor = np.random.rand(1, 3, 224, 224).astype(np.float32)
result = session.run(None, {"input": my_tensor})

Troubleshooting

| Error | Solution |
| --- | --- |
| ❌ No matching distribution found for onnxruntime-gpu | ONNX Runtime provides no ARM64 GPU wheels. Solution: always build from source. |
| ❌ CUDA_HOME is not set | Your environment is missing CUDA paths. Fix: set persistent env vars (section 3). |
| ❌ unsupported gpu architecture sm_121 | Grace–Blackwell = Compute Capability 12.1. Fix: set CMAKE_CUDA_ARCHITECTURES=121. |
| ❌ nvcc not found | Check that /usr/local/cuda-13.0/bin/nvcc exists and is on your PATH. |
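
When something still fails, it helps to gather the relevant details in one go. A small diagnostic sketch (paths assume the defaults from this guide; each command prints a message instead of aborting when its tool is missing):

```shell
# Print the pieces an ONNX Runtime GPU build depends on
echo "CUDA_HOME=${CUDA_HOME:-<unset>}"
command -v nvcc >/dev/null && nvcc --version | tail -n 1 \
  || echo "nvcc not on PATH"
command -v nvidia-smi >/dev/null \
  && nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader \
  || echo "nvidia-smi unavailable"
python -c "import onnxruntime as ort; print(ort.get_available_providers())" 2>/dev/null \
  || echo "onnxruntime not importable"
```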