GPU Challenges — Test Your CUDA & Python GPU Skills

Solve GPU programming challenges in CUDA C++ or Python (PyTorch).

Challenge Details

Matrix Multiplication

Difficulty: medium

Map a 2D CUDA launch onto matrix rows and columns so each thread computes exactly one output element of C.

Your Goal
  • Derive row and column coordinates from blockIdx, blockDim, and threadIdx.
  • Accumulate the dot product for one output cell across k = 0..N-1.
  • Write the result into the correct flattened row-major location in C.
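
The three goals above can be sketched on the host before writing the real kernel. The block below is a minimal CPU-side emulation, not the challenge's reference solution: `Idx2`, `matmul_thread`, and `emulate_launch` are hypothetical names, the matrices are assumed to be square N x N in row-major order, and the x dimension is assumed to map to columns and y to rows. In a `.cu` file the body of `matmul_thread` would sit inside a `__global__` function and read the built-in `blockIdx`, `blockDim`, and `threadIdx` instead of taking them as parameters.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side stand-in for CUDA's built-in index types (emulation only).
struct Idx2 { int x, y; };

// Work done by ONE thread. The three index arguments mirror the CUDA
// built-ins of the same names; here they are ordinary parameters.
void matmul_thread(const float* A, const float* B, float* C, int N,
                   Idx2 blockIdx, Idx2 blockDim, Idx2 threadIdx) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;  // assumed: y -> row
    int col = blockIdx.x * blockDim.x + threadIdx.x;  // assumed: x -> column
    if (row >= N || col >= N) return;  // threads outside the matrix do no work

    float sum = 0.0f;
    for (int k = 0; k < N; ++k) {
        // Element k of row `row` in A times element k of column `col` in B.
        sum += A[row * N + k] * B[k * N + col];
    }
    C[row * N + col] = sum;  // flattened row-major location of (row, col)
}

// Runs every thread of every block one after another, which is enough to
// check the index math even though it has none of the GPU's parallelism.
void emulate_launch(const std::vector<float>& A, const std::vector<float>& B,
                    std::vector<float>& C, int N, Idx2 gridDim, Idx2 blockDim) {
    const std::size_t count = static_cast<std::size_t>(N) * N;
    assert(A.size() == count && B.size() == count && C.size() == count);
    for (int by = 0; by < gridDim.y; ++by)
        for (int bx = 0; bx < gridDim.x; ++bx)
            for (int ty = 0; ty < blockDim.y; ++ty)
                for (int tx = 0; tx < blockDim.x; ++tx)
                    matmul_thread(A.data(), B.data(), C.data(), N,
                                  Idx2{bx, by}, blockDim, Idx2{tx, ty});
}
```

Calling `emulate_launch` with a grid that overshoots the matrix (for example a 2 x 2 grid of 2 x 2 blocks for N = 3) exercises the bounds check as well as the indexing.
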
Focus Areas
  • 2D block geometry
  • Row-major indexing math
  • Nested accumulation inside a kernel
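
For the block-geometry part, one common approach is to pick a fixed block shape and round the grid size up so the whole matrix is covered. The sketch below shows that ceiling division on the host; `Dim2`, `ceil_div`, `grid_for`, and `idle_threads` are hypothetical helper names, and the square N x N shape is an assumption.

```cpp
#include <cassert>

// Host-side stand-in for a 2D dim3 (emulation only).
struct Dim2 { int x, y; };

// Smallest number of blocks of size `block` that covers `n` elements.
int ceil_div(int n, int block) {
    assert(n >= 0 && block > 0);
    return (n + block - 1) / block;
}

// Grid shape needed to cover an N x N output with the given block shape.
Dim2 grid_for(int N, Dim2 block) {
    return Dim2{ceil_div(N, block.x), ceil_div(N, block.y)};
}

// Launched threads that fall outside the matrix and must exit early.
long long idle_threads(int N, Dim2 block) {
    Dim2 grid = grid_for(N, block);
    long long launched = 1LL * grid.x * block.x * grid.y * block.y;
    return launched - 1LL * N * N;
}
```

With N = 1000 and a 16 x 16 block (an illustrative choice), the grid is 63 x 63 and 16,064 threads land outside the matrix, which is why the kernel needs a bounds check. In CUDA the same shapes would be passed as `dim3` values in the `<<<grid, block>>>` launch configuration.
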
What Success Looks Like
  • For the provided input, each output element should equal N * 2.0f.
  • Threads outside the matrix bounds should do no work.
  • The output buffer should be filled consistently across different rows and columns.
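
A host-side check along these lines can confirm the criteria after copying C back from the device. It is only a sketch: `Mismatch` and `first_mismatch` are hypothetical names, the relative tolerance is an arbitrary choice, and the expected value N * 2.0f is taken from the success criteria rather than from the (unseen) provided input. Reporting the failing row and column makes it easier to spot a pattern such as only the first row or only one block being filled.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Position of the first wrong element, if any.
struct Mismatch { int row; int col; bool found; };

// Scans a row-major N x N buffer and reports the first element that differs
// from N * 2.0f by more than the relative tolerance.
Mismatch first_mismatch(const std::vector<float>& C, int N,
                        float rel_tol = 1e-5f) {
    assert(N > 0 && C.size() == static_cast<std::size_t>(N) * N);
    const float expected = N * 2.0f;
    for (int idx = 0; idx < N * N; ++idx) {
        if (std::fabs(C[idx] - expected) > rel_tol * expected) {
            // Invert the row-major flattening: idx = row * N + col.
            return Mismatch{idx / N, idx % N, true};
        }
    }
    return Mismatch{-1, -1, false};
}
```
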