Background
As I recently wrote, I built a handball computer vision system. Until now it has always run on Google Colab, but downloading and installing all the packages on every session is super annoying, and since my dad got a DGX Spark a few days ago, I decided to try running my notebook on that machine. Because it is an ARM64 system, some dependencies cause trouble. Specifically, Roboflow's inference-gpu needs onnxruntime-gpu, and that package currently has no prebuilt wheels for the DGX Spark. So here is a simple guide to installing onnxruntime-gpu properly on the DGX Spark.
How to Install ONNX Runtime GPU on NVIDIA DGX Spark
The NVIDIA DGX Spark is one of the first systems shipping with the new Grace–Blackwell architecture (GB10) and CUDA 13. Unfortunately, at the time of writing:
- ONNX Runtime does not provide GPU wheels for ARM64
- There are no prebuilt wheels for CUDA 13
- The ONNX Runtime docs do not cover the GB10's compute capability (12.1)
- Installing ONNX Runtime GPU with `pip install onnxruntime-gpu` fails with `No matching distribution found`
But the GPU hardware works perfectly — we just need to build ONNX Runtime manually. This guide shows exactly how to do that.
Why you need this guide
If you try to install ONNX Runtime GPU on DGX Spark, you’ll encounter:
ERROR: No matching distribution found for onnxruntime-gpu
or:
OSError: CUDA_HOME environment variable is not set
or:
Unsupported compute capability sm_121
This is normal: ONNX Runtime simply hasn’t shipped wheels for ARM64 + CUDA 13 + Compute Capability 12.1 yet. But a working GPU build is absolutely possible — you just need the right flags.
1. Create a Clean Conda Environment
conda create -n ort-gpu python=3.11 -y
conda activate ort-gpu
Install build tools:
pip install cmake ninja packaging numpy
Install cuDNN for CUDA 13:
sudo apt install -y cudnn9-cuda-13
⸻
2. Clone ONNX Runtime
git clone --recursive https://github.com/microsoft/onnxruntime.git
cd onnxruntime
⸻
3. Set CUDA Environment Variables (Permanent)
DGX Spark installs CUDA at:
/usr/local/cuda-13.0
We persist these paths via conda activation scripts:
mkdir -p $CONDA_PREFIX/etc/conda/activate.d
mkdir -p $CONDA_PREFIX/etc/conda/deactivate.d
cat > $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh << 'EOF'
export CUDA_HOME=/usr/local/cuda-13.0
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH
EOF
cat > $CONDA_PREFIX/etc/conda/deactivate.d/env_vars.sh << 'EOF'
unset CUDA_HOME
EOF
Reload the environment:
conda deactivate
conda activate ort-gpu
Verify:
echo $CUDA_HOME
which nvcc
Expected:
/usr/local/cuda-13.0
/usr/local/cuda-13.0/bin/nvcc
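If you want to script these checks, here is a small stdlib-only sketch. The function name `cuda_env_problems` is my own invention, not part of any tool:

```python
import os
import shutil

def cuda_env_problems(env):
    """Return a list of problems with the CUDA environment variables."""
    problems = []
    cuda_home = env.get("CUDA_HOME", "")
    if not cuda_home:
        problems.append("CUDA_HOME is not set (see section 3)")
    elif cuda_home not in env.get("PATH", ""):
        # nvcc lives under $CUDA_HOME/bin, so PATH must include it
        problems.append("PATH does not include $CUDA_HOME/bin")
    return problems

# Check the live environment; shutil.which finds nvcc the same way `which nvcc` does
issues = cuda_env_problems(os.environ)
if shutil.which("nvcc") is None:
    issues.append("nvcc not found on PATH")
print(issues or "CUDA environment looks good")
```

Run it inside the activated `ort-gpu` environment; an empty list means the build in the next section should find the toolchain.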
⸻
4. Build ONNX Runtime GPU for CUDA 13 (Grace–Blackwell)
⚠️ Important: Grace–Blackwell GPUs use Compute Capability 12.1, which ONNX Runtime does not detect automatically. We must explicitly set:
CMAKE_CUDA_ARCHITECTURES=121
Full build command
sh build.sh \
--config Release \
--build_dir build/cuda13 \
--parallel 20 \
--nvcc_threads 20 \
--use_cuda \
--cuda_version 13.0 \
--cuda_home /usr/local/cuda-13.0 \
--cudnn_home /usr/local/cuda-13.0 \
--build_wheel \
--skip_tests \
--cmake_generator Ninja \
--use_binskim_compliant_compile_flags \
--cmake_extra_defines CMAKE_CUDA_ARCHITECTURES=121 onnxruntime_BUILD_UNIT_TESTS=OFF
This takes 20–40 minutes depending on load.
⸻
5. Install the Wheel
pip install build/cuda13/Release/dist/*.whl
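Before installing, you can sanity-check that the build really produced an ARM64 wheel: a wheel filename ends with its platform tag, and on the DGX Spark that should be `linux_aarch64`. A quick sketch (the helper name and the version number in the example are illustrative):

```python
def is_aarch64_wheel(filename: str) -> bool:
    """Wheel filenames end in {platform_tag}.whl; a DGX Spark build should be linux_aarch64."""
    return filename.endswith(".whl") and filename[:-4].split("-")[-1] == "linux_aarch64"

print(is_aarch64_wheel("onnxruntime_gpu-1.20.1-cp311-cp311-linux_aarch64.whl"))  # True
```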
⸻
6. Verify ONNX Runtime GPU
import onnxruntime as ort
print("Version:", ort.__version__)
print("Providers:", ort.get_available_providers())
Expected:
Version: 1.xx.x
Providers: ['CUDAExecutionProvider', 'CPUExecutionProvider']
If you see CUDAExecutionProvider, your GPU build works.
⸻
7. Example Usage
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    "model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"]
)

# Input name and shape depend on your model; inspect them via session.get_inputs()
input_name = session.get_inputs()[0].name
dummy = np.random.rand(1, 3, 640, 640).astype(np.float32)  # placeholder input
result = session.run(None, {input_name: dummy})
⸻
Troubleshooting
| Error | Solution |
|---|---|
| ❌ No matching distribution found for onnxruntime-gpu | ONNX Runtime provides no ARM64 GPU wheels. Solution: always build from source. |
| ❌ CUDA_HOME is not set | Your environment is missing CUDA paths. Fix: set persistent env vars (section 3). |
| ❌ unsupported gpu architecture sm_121 | Grace–Blackwell = Compute Capability 12.1. Fix: set CMAKE_CUDA_ARCHITECTURES=121. |
| ❌ nvcc not found | Check that /usr/local/cuda-13.0/bin/nvcc exists and is on PATH (section 3). |
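The table above can be turned into a tiny triage helper that maps an error message to the matching fix. This is a hypothetical convenience script of mine, not part of ONNX Runtime:

```python
# Known error patterns from the troubleshooting table, checked in order.
FIXES = {
    "No matching distribution found": "No ARM64 GPU wheels exist; build from source (sections 2-5).",
    "CUDA_HOME": "Set the persistent CUDA env vars (section 3).",
    "unsupported gpu architecture": "Pass CMAKE_CUDA_ARCHITECTURES=121 (section 4).",
    "nvcc": "Check that /usr/local/cuda-13.0/bin/nvcc exists and is on PATH.",
}

def suggest_fix(error_text: str) -> str:
    """Return the fix for the first known pattern found in the error text."""
    for pattern, fix in FIXES.items():
        if pattern in error_text:
            return fix
    return "Unknown error; check the ONNX Runtime build logs."
```

For example, `suggest_fix("OSError: CUDA_HOME environment variable is not set")` points you back at section 3.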