GPU Challenges — Test Your CUDA & Python GPU Skills

Solve GPU programming challenges in CUDA C++ or Python (PyTorch).

Challenge Details

Parallel Histogram Computation

Difficulty: medium

Build a 256-bin histogram in parallel, using atomic updates so many threads can safely increment the same bin.

Your Goal
  • Read one byte value per thread and map it to the matching histogram bin.
  • Use atomicAdd to avoid race conditions on shared output bins.
  • Keep the kernel bounds-safe: threads whose index is at or beyond DATA_SIZE must not read or write anything.
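
The goals above can be sketched as follows. This is one possible shape, not the challenge's reference solution. The names `histogramKernel`, `gpuHistogram`, and `NUM_BINS` are placeholders chosen for illustration, the parameter `n` stands in for DATA_SIZE, and the host wrapper assumes `n > 0` and a block size of 256 threads.

```cuda
#include <cuda_runtime.h>

constexpr int NUM_BINS = 256;  // assumed name; one bin per possible byte value

// Each thread reads at most one byte and increments the matching bin.
__global__ void histogramKernel(const unsigned char* data,
                                unsigned int* hist, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {                        // bounds check: extra threads do nothing
        atomicAdd(&hist[data[idx]], 1u);  // safe when many threads hit the same bin
    }
}

// Hypothetical host wrapper: copies input to the device, zeroes the bins,
// launches the kernel, and copies the 256 counts back. Assumes n > 0.
cudaError_t gpuHistogram(const unsigned char* h_data, int n,
                         unsigned int* h_hist) {
    unsigned char* d_data = nullptr;
    unsigned int* d_hist = nullptr;
    const size_t histBytes = NUM_BINS * sizeof(unsigned int);

    cudaError_t err = cudaMalloc(&d_data, n);
    if (err != cudaSuccess) return err;
    err = cudaMalloc(&d_hist, histBytes);
    if (err != cudaSuccess) { cudaFree(d_data); return err; }

    err = cudaMemcpy(d_data, h_data, n, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemset(d_hist, 0, histBytes);  // zero bins first

    if (err == cudaSuccess) {
        const int threads = 256;
        const int blocks = (n + threads - 1) / threads;  // round up to cover n
        histogramKernel<<<blocks, threads>>>(d_data, d_hist, n);
        err = cudaGetLastError();
    }
    // This blocking copy also waits for the kernel to finish.
    if (err == cudaSuccess)
        err = cudaMemcpy(h_hist, d_hist, histBytes, cudaMemcpyDeviceToHost);

    cudaFree(d_data);
    cudaFree(d_hist);
    return err;
}
```

Calling `gpuHistogram(h_data, DATA_SIZE, h_hist)` would fill `h_hist` with 256 counts. Every thread updates global memory directly, so inputs dominated by a few byte values will serialize on those bins. That trade-off is usually acceptable for a starter version.
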
Focus Areas
  • Atomics for correctness
  • High-contention write patterns
  • Comparing GPU results against a CPU reference
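
One common way to ease high-contention writes is to give each block a private histogram in shared memory and merge it into the global one at the end. The sketch below is an optional variant, not something the challenge requires. The kernel name `histogramShared` and the constant `NUM_BINS` are illustrative assumptions.

```cuda
#include <cuda_runtime.h>

constexpr int NUM_BINS = 256;  // assumed name; one bin per possible byte value

// Per-block privatization: atomics mostly land in fast shared memory,
// and each block touches a global bin at most once.
__global__ void histogramShared(const unsigned char* data,
                                unsigned int* hist, int n) {
    __shared__ unsigned int local[NUM_BINS];

    // Zero the block-private bins; the stride loop works for any block size.
    for (int b = threadIdx.x; b < NUM_BINS; b += blockDim.x) {
        local[b] = 0;
    }
    __syncthreads();

    // Still one input element per thread, guarded by the bounds check.
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {
        atomicAdd(&local[data[idx]], 1u);
    }
    __syncthreads();

    // Merge non-empty private bins into the global histogram.
    for (int b = threadIdx.x; b < NUM_BINS; b += blockDim.x) {
        if (local[b] != 0) {
            atomicAdd(&hist[b], local[b]);
        }
    }
}
```

The global histogram must still be zeroed before launch, and the result should be identical to the direct global-atomic version. Only the number of conflicting global updates changes.
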
What Success Looks Like
  • The GPU histogram should match the CPU histogram for the checked bins.
  • The histogram buffer must be zero-initialized before launching the kernel.
  • Each thread should process at most one valid input element in this starter version.
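
The success criteria above rely on a CPU reference. A minimal host-side sketch of that check follows. The helper names `cpuHistogram` and `histogramsMatch` are hypothetical, and this version compares all 256 bins, whereas the challenge's own checker may inspect only some of them.

```cuda
#include <cstdio>

constexpr int NUM_BINS = 256;  // assumed name; one bin per possible byte value

// Sequential reference: zero the bins, then count every input byte.
void cpuHistogram(const unsigned char* data, int n, unsigned int* hist) {
    for (int b = 0; b < NUM_BINS; ++b) {
        hist[b] = 0;
    }
    for (int i = 0; i < n; ++i) {
        ++hist[data[i]];
    }
}

// Returns true when every bin agrees; reports the first mismatch otherwise.
bool histogramsMatch(const unsigned int* gpu, const unsigned int* cpu) {
    for (int b = 0; b < NUM_BINS; ++b) {
        if (gpu[b] != cpu[b]) {
            std::printf("bin %d: GPU=%u CPU=%u\n", b, gpu[b], cpu[b]);
            return false;
        }
    }
    return true;
}
```

Counts are integers, so the comparison can be exact. If the device buffer is not zeroed before the kernel runs, leftover values are added to the counts and this check will typically fail.
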