Set Up Fedora 43 on NVIDIA Nodes
Quick reference for Fedora 43 Server installs on lab nodes (NVIDIA GPU + Mellanox NIC).
0) Baseline OS prep
On a fresh Fedora 43 install:
- Update packages.
- Create users + add SSH keys.
- Install kernel headers/devel for DKMS.
sudo dnf -y upgrade
sudo dnf -y install kernel-devel-matched kernel-headers
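DKMS builds fail when kernel-devel doesn't match the running kernel, so it's worth checking before installing any DKMS driver (a quick sketch, nothing lab-specific):

```shell
# Verify that headers for the running kernel are present; a mismatch
# usually means the box still needs a reboot into the upgraded kernel.
running="$(uname -r)"
if [ -d "/usr/src/kernels/${running}" ]; then
  echo "kernel-devel matches running kernel (${running})"
else
  echo "no /usr/src/kernels/${running} - reboot into the new kernel first" >&2
fi
```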
1) NVIDIA drivers (Fedora 43 uses the Fedora 42 CUDA repo)
NVIDIA doesn’t publish official Fedora 43 repos yet; the Fedora 42 repo works.
Add the repo:
sudo dnf config-manager addrepo --from-repofile=https://developer.download.nvidia.com/compute/cuda/repos/fedora42/x86_64/cuda-fedora42.repo
Install open kernel modules (all our lab hardware supports this):
Compute-only nodes (servers)
sudo dnf -y install nvidia-driver-cuda kmod-nvidia-open-dkms
Workstations (GUI)
sudo dnf -y install nvidia-open
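Either way, it's worth confirming after install that DKMS actually built the module for the running kernel and that the open variant is in place (a sketch; the open kernel modules report a Dual MIT/GPL license, unlike the proprietary driver):

```shell
# Confirm the DKMS build succeeded and check which module variant is
# installed (license line distinguishes open vs proprietary).
dkms status | grep -i nvidia
modinfo nvidia | grep -E '^(version|license)'
```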
2) Mellanox NICs via DOCA (OFED/RoCE only)
Use the RHEL/Rocky 10 DOCA repo. On Fedora 43, doca-all / doca-networking depend on Python
3.12 and won’t work; doca-ofed and doca-roce work fine (and are all we need).
Create the repo file:
sudo tee /etc/yum.repos.d/doca.repo >/dev/null <<'EOF'
[doca]
name=DOCA Online Repo
baseurl=https://linux.mellanox.com/public/repo/doca/3.2.1/rhel10/x86_64/
enabled=1
gpgcheck=0
EOF
Install:
sudo dnf clean all
sudo dnf -y install doca-ofed
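Before rebooting, a quick check that the OFED stack actually landed (a sketch; `ofed_info -s` prints the installed OFED version string):

```shell
# Verify the DOCA OFED install before rebooting.
rpm -q doca-ofed
ofed_info -s        # prints the installed OFED/DOCA version
```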
Reboot and verify:
sudo reboot
After reboot:
- nvidia-smi should see the GPU
- Mellanox tooling should see the NIC
nvidia-smi
sudo mst start
sudo mst status
3) NVIDIA Container Toolkit (Podman + Docker)
Some tools require Docker, so install it:
sudo dnf -y install docker docker-compose
sudo systemctl enable --now docker
sudo usermod -aG docker $USER
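The group change only applies to new login sessions; to test in the current shell without logging out (a sketch):

```shell
# newgrp starts a subshell with the docker group active, so the socket
# is reachable without sudo even before re-logging in.
newgrp docker <<'EOF'
docker version --format '{{.Server.Version}}'
EOF
```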
Add NVIDIA Container Toolkit repo:
curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | \
sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
Install toolkit:
sudo dnf -y install nvidia-container-toolkit
Podman: CDI config + test
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
podman run --rm \
--device nvidia.com/gpu=all \
--security-opt=label=disable \
docker.io/nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
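If the Podman test fails, check that the generated CDI spec actually exposes the GPUs:

```shell
# Lists device names like nvidia.com/gpu=0 and nvidia.com/gpu=all from
# the specs under /etc/cdi.
nvidia-ctk cdi list
```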
Docker: configure runtime + test
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
sudo docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
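If `--gpus all` errors out, verify that the configure step registered the `nvidia` runtime (it edits `/etc/docker/daemon.json`):

```shell
# The nvidia runtime should appear alongside runc.
docker info --format '{{.Runtimes}}'
cat /etc/docker/daemon.json
```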
4) Experiment network (100GbE + 100Gb IB between two nodes)
4.1 Identify Mellanox device + set link types
sudo mst start
sudo mst status
Set port 1 = Ethernet and port 2 = InfiniBand (use the correct device from mst status):
sudo mlxconfig -d /dev/mst/mt4123_pciconf0 set LINK_TYPE_P1=2 LINK_TYPE_P2=1
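The values map as 1 = InfiniBand, 2 = Ethernet; before rebooting you can confirm what was staged (a sketch, same device path as above):

```shell
# The "Next Boot" column shows the staged values: P1 should read
# ETH(2) and P2 should read IB(1).
sudo mlxconfig -d /dev/mst/mt4123_pciconf0 query | grep LINK_TYPE
```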
Reboot to apply:
sudo reboot
4.2 Bring up InfiniBand (Subnet Manager)
Pick one node to run the subnet manager:
sudo systemctl enable --now opensmd
sudo systemctl status opensmd --no-pager
Verify IB link status:
ibstat
sudo iblinkinfo
Expected:
- State: Active
- Physical state: LinkUp
- On ConnectX-6 dual 100G: typically negotiates 4x 25Gbps
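A one-liner that pulls just those fields out of `ibstat` (a sketch):

```shell
# Filter ibstat down to the link-state fields; Rate should read 100
# on a healthy 4x25G link.
ibstat | grep -E 'State:|Physical state:|Rate:'
```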
4.3 (Optional) IPoIB sanity test
# Host 1
sudo ip link set ibs93f1 up
sudo ip addr add 10.10.10.1/24 dev ibs93f1
# Host 2
sudo ip link set ibs93f1 up
sudo ip addr add 10.10.10.2/24 dev ibs93f1
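The `ip addr` assignments above are lost on reboot; to persist them, one option is a NetworkManager connection (a sketch, assuming NM manages the IB interface; the profile name `ib-exp` is arbitrary):

```shell
# Host 1 shown; use 10.10.10.2/24 on host 2. Creates and activates a
# persistent IPoIB profile.
sudo nmcli connection add type infiniband ifname ibs93f1 con-name ib-exp \
  ipv4.method manual ipv4.addresses 10.10.10.1/24
sudo nmcli connection up ib-exp
```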
Ping:
ping -c 3 10.10.10.2 # from node 1
ping -c 3 10.10.10.1 # from node 2

created: Feb 5, 2026
edited: Feb 10, 2026
Topic: Useful Snippets