GPU Challenges — Test Your CUDA & Python GPU Skills

Solve GPU programming challenges in CUDA C++ or Python (PyTorch).

Challenge Details

Vector Addition

Difficulty: easy

Write the classic CUDA warm-up kernel: each thread computes one element of the output vector and guards against running past the end of the array.

Your Goal
  • Compute the global thread index with blockIdx.x, blockDim.x, and threadIdx.x.
  • Only write to c[i] when the computed index is inside the vector length.
  • Size the host launch configuration so the grid covers every element of the problem size N = 1024 (enough blocks that blocks × threads per block ≥ N).
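The three goals above can be sketched as one small program. This is a minimal sketch, not the challenge's reference solution: the kernel name `vectorAdd`, the 256-thread block size, the `h_`/`d_` pointer names, and the input values are all assumptions chosen for illustration.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Each thread adds one pair of elements. The guard stops surplus threads in
// the last block from reading or writing past the end of the arrays.
__global__ void vectorAdd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) {
        c[i] = a[i] + b[i];
    }
}

int main() {
    const int N = 1024;
    const int threadsPerBlock = 256;  // assumed block size
    // Ceiling division: enough blocks to cover all N elements.
    const int blocks = (N + threadsPerBlock - 1) / threadsPerBlock;
    const size_t bytes = N * sizeof(float);

    // Host buffers with illustrative inputs.
    float* h_a = static_cast<float*>(malloc(bytes));
    float* h_b = static_cast<float*>(malloc(bytes));
    float* h_c = static_cast<float*>(malloc(bytes));
    for (int i = 0; i < N; ++i) {
        h_a[i] = static_cast<float>(i);
        h_b[i] = 2.0f * static_cast<float>(i);
    }

    // Device buffers, then host-to-device copies of the inputs.
    float *d_a = nullptr, *d_b = nullptr, *d_c = nullptr;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    vectorAdd<<<blocks, threadsPerBlock>>>(d_a, d_b, d_c, N);
    cudaError_t err = cudaGetLastError();  // reports launch-configuration errors
    if (err != cudaSuccess) {
        fprintf(stderr, "launch failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Device-to-host copy of the result; this call waits for the kernel to finish.
    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);

    // Print the first few results next to the expected sums.
    for (int i = 0; i < 5; ++i) {
        printf("c[%d] = %.1f (expected %.1f)\n", i, h_c[i], h_a[i] + h_b[i]);
    }

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}
```

Passing `n` to the kernel instead of hard-coding 1024 keeps the guard correct if the host later changes the problem size.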
Focus Areas
  • 1D grid and block indexing
  • Boundary checks in simple kernels
  • Host-to-device and device-to-host memory flow
What Success Looks Like
  • The first few printed results should equal a[i] + b[i].
  • No out-of-bounds writes should happen for the last block.
  • The kernel should work for any n, not only exact multiples of 256.
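The "any n" requirement comes down to rounding the block count up rather than down. The host-side arithmetic below is a sketch under assumptions: `blocksFor` and `idleThreads` are hypothetical helper names, and 256 is the block size implied by the bullet above. It is plain C++ and should compile unchanged inside a .cu file.

```cpp
// Smallest number of blocks whose threads cover n elements (ceiling division).
constexpr int blocksFor(int n, int threadsPerBlock) {
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// Threads in the launched grid that fall outside [0, n) and must be
// filtered out by the kernel's i < n guard.
constexpr int idleThreads(int n, int threadsPerBlock) {
    return blocksFor(n, threadsPerBlock) * threadsPerBlock - n;
}

// Exact multiple of the block size: no idle threads.
static_assert(blocksFor(1024, 256) == 4, "1024 elements need 4 blocks");
static_assert(idleThreads(1024, 256) == 0, "no surplus threads");

// Not a multiple: still 4 blocks, but 24 threads must skip the write.
static_assert(blocksFor(1000, 256) == 4, "1000 elements need 4 blocks");
static_assert(idleThreads(1000, 256) == 24, "24 surplus threads");

// One element past a multiple forces a whole extra block.
static_assert(blocksFor(1025, 256) == 5, "1025 elements need 5 blocks");
```

Truncating division (`n / threadsPerBlock`) would launch only 3 blocks for n = 1000 and leave the last 232 elements uncomputed, which is why the round-up and the in-kernel guard are needed together.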