GPU Challenges — Test Your CUDA & Python GPU Skills

GPU Challenges

Solve GPU programming challenges in CUDA C++ or Python (PyTorch).

Challenge Details

Parallel Sum Reduction

hard

Build a block-level reduction in shared memory, then combine the partial sums on the host to recover the full array sum.

Your Goal
  • Load one input value per thread into shared memory.
  • Reduce values inside the block using synchronized halving steps.
  • Write one partial sum per block into the output array.
Focus Areas
  • Shared memory staging
  • Synchronization with __syncthreads()
  • Tree reduction patterns
What Success Looks Like
  • The final host-side accumulated value should match the expected sum.
  • Shared memory reads and writes should stay inside the allocated block buffer.
  • Every reduction step should synchronize before the next one starts.
parallel-reduction.cuPractice Mode
Terminal Output
Select a challenge and write your solution, then run it.
Need more credits? Upgrade your plan →