debug · python · Major · pending

Python CUDA out of memory — GPU memory management in PyTorch

Submitted by: @anonymous · Mar 1, 2026

CUDA out of memory, GPU memory, gradient checkpointing, mixed precision, detach, empty_cache

linux, docker

Error Messages

CUDA out of memory

RuntimeError: CUDA error: out of memory

torch.cuda.OutOfMemoryError

Problem

PyTorch training crashes with CUDA out of memory. The GPU memory fills up even with small batch sizes. Memory usage grows during training and doesn't decrease between batches.

Solution

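(1) Reduce the batch size. This is the most direct fix.
(2) Use gradient accumulation to simulate a larger batch with several smaller ones.
(3) Fix memory leaks: don't keep references to graph-attached tensors across iterations (for example, by appending the loss tensor to a list). Use .item() when logging scalar values, or .detach() when you need to keep the tensor itself.
(4) Wrap validation and inference in torch.no_grad() so that no computation graph is recorded.
(5) Enable gradient checkpointing, which trades compute for memory by recomputing activations during the backward pass. In plain PyTorch this is torch.utils.checkpoint.checkpoint(); model.gradient_checkpointing_enable() exists only on Hugging Face Transformers models.
(6) Use mixed precision training: torch.autocast(device_type="cuda") runs eligible operations in half precision, which roughly halves the memory of their activations. Parameters and optimizer state stay in FP32, so total usage does not halve. The older torch.cuda.amp.autocast() spelling is deprecated in recent releases.
(7) Clear the cache: torch.cuda.empty_cache() returns unused cached blocks to the driver. It does not free memory held by live tensors, so it will not fix a leak.
(8) Monitor: torch.cuda.memory_summary() shows what is consuming memory; torch.cuda.memory_allocated() and torch.cuda.memory_reserved() give the totals for live tensors and for the caching allocator.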
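
A minimal sketch of how points (2) to (4) might fit together in one loop. The toy nn.Linear model, the random data, and the accumulation factor of 4 are hypothetical placeholders, not taken from the original report. The sketch falls back to CPU when no GPU is present.

```python
import torch
from torch import nn

device = "cuda" if torch.cuda.is_available() else "cpu"

# Hypothetical toy model and data, for illustration only.
model = nn.Linear(16, 1).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()
batches = [(torch.randn(8, 16), torch.randn(8, 1)) for _ in range(8)]

accum_steps = 4      # 4 micro-batches of 8 act like one batch of 32
logged_losses = []   # holds plain Python floats, never graph-attached tensors

model.train()
optimizer.zero_grad(set_to_none=True)
for step, (x, y) in enumerate(batches, start=1):
    x, y = x.to(device), y.to(device)
    loss = loss_fn(model(x), y)
    # Scale so the accumulated gradient matches the mean over the large batch.
    (loss / accum_steps).backward()
    # .item() copies the scalar out, so the list holds no reference to the graph.
    logged_losses.append(loss.item())
    if step % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad(set_to_none=True)

# Validation: no graph is recorded inside no_grad, so activations are not kept.
model.eval()
with torch.no_grad():
    x_val, y_val = batches[0]
    val_loss = loss_fn(model(x_val.to(device)), y_val.to(device))

if device == "cuda":
    # Bytes held by live tensors vs. bytes reserved by the caching allocator.
    print(torch.cuda.memory_allocated(), torch.cuda.memory_reserved())
```

Only one micro-batch of activations is alive at a time, so peak memory follows the micro-batch size rather than the effective batch size.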

Why

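PyTorch keeps the computation graph, including each layer's activations, in memory until backward() has run, because backpropagation needs them. Holding a reference to any tensor still attached to that graph (one with a grad_fn, such as the loss) beyond the iteration keeps the whole graph and its activations alive, so memory grows with every batch. Separately, PyTorch's caching allocator keeps freed blocks reserved for reuse, so tools such as nvidia-smi do not show usage dropping between batches even when nothing leaks.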
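
The graph reference described above can be inspected directly. This is a small sketch with hypothetical tensors w and x. It shows that a loss tensor carries a grad_fn, while .detach() and .item() give values that hold no reference to the graph.

```python
import torch

# Hypothetical parameter and input, for illustration only.
w = torch.randn(4, requires_grad=True)
x = torch.randn(4)
loss = (w * x).sum()

# The loss references its graph through grad_fn. Appending `loss` itself to a
# list would keep that graph (and its saved tensors) alive across iterations.
has_graph = loss.grad_fn is not None

detached = loss.detach()   # tensor with the same value, no graph reference
scalar = loss.item()       # plain Python float

print(has_graph, detached.grad_fn, type(scalar).__name__)
```

Storing either `detached` or `scalar` in a log is safe; storing `loss` is the pattern that leaks.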
