debug, python, Major, pending

Python CUDA out of memory — GPU memory management in PyTorch

Submitted by: @anonymous
CUDA out of memory, GPU memory, gradient checkpointing, mixed precision, detach, empty_cache
linux, docker

Error Messages

CUDA out of memory
RuntimeError: CUDA error: out of memory
torch.cuda.OutOfMemoryError

Problem

PyTorch training crashes with CUDA out of memory. The GPU memory fills up even with small batch sizes. Memory usage grows during training and doesn't decrease between batches.
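To confirm that memory really grows from batch to batch, it can help to log the allocator counters at a fixed point in each iteration. The helper below is a minimal sketch: the name gpu_memory_mib is hypothetical, and it assumes the default CUDA device is the one being trained on.

```python
import torch


def gpu_memory_mib():
    """Return (allocated, reserved) GPU memory in MiB.

    Falls back to (0.0, 0.0) when CUDA is unavailable so the helper
    can stay in code that also runs on CPU-only machines.
    """
    if not torch.cuda.is_available():
        return 0.0, 0.0
    mib = 1024 ** 2
    allocated = torch.cuda.memory_allocated() / mib  # held by live tensors
    reserved = torch.cuda.memory_reserved() / mib    # held by the caching allocator
    return allocated, reserved


allocated, reserved = gpu_memory_mib()
print(f"allocated={allocated:.1f} MiB  reserved={reserved:.1f} MiB")
```

If the allocated figure climbs steadily across iterations with a constant batch size, that usually points to a retained reference rather than to a batch that is simply too large.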

Solution

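(1) Reduce the batch size. This is the most direct fix.
(2) Use gradient accumulation to simulate a larger batch with several smaller ones.
(3) Avoid memory leaks: don't keep references to tensors that are still attached to the autograd graph, such as a loss tensor accumulated across iterations. Use .item() when logging scalar values and .detach() when storing tensors.
(4) Wrap validation and inference in torch.no_grad() so that no graph is built.
(5) Enable gradient checkpointing, which trades extra compute for memory. For Hugging Face Transformers models, call model.gradient_checkpointing_enable(); for plain PyTorch modules, use torch.utils.checkpoint.checkpoint().
(6) Use mixed precision training: torch.cuda.amp.autocast() (torch.autocast("cuda") in recent releases) runs eligible operations in half precision, which roughly halves the memory used by their activations. Parameters stay in float32.
(7) Clear the cache with torch.cuda.empty_cache(). This only returns unused cached blocks to the driver; it does not free memory held by live tensors.
(8) Monitor usage with torch.cuda.memory_summary() to see how much memory is allocated and reserved.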
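Several of these steps can be combined in one training loop. The following is a minimal sketch, not a definitive implementation: CheckpointedMLP, train_one_epoch, evaluate, the MSE loss, and the accum_steps default are all hypothetical choices made for illustration. It assumes batches is a list of (input, target) tensor pairs, and it only enables autocast and gradient scaling when running on a CUDA device.

```python
import torch
from torch import nn
from torch.utils.checkpoint import checkpoint


class CheckpointedMLP(nn.Module):
    """Hypothetical model whose first block is recomputed during backward."""

    def __init__(self, dim=8, hidden=32):
        super().__init__()
        self.block = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):
        # (5) Skip storing the block's activations; recompute them in backward.
        h = checkpoint(self.block, x, use_reentrant=False)
        return self.head(h)


def train_one_epoch(model, batches, optimizer, device, accum_steps=4):
    use_amp = device.type == "cuda"  # assumption: mixed precision only on GPU
    # Newer releases prefer torch.amp.GradScaler("cuda", ...).
    scaler = torch.cuda.amp.GradScaler(enabled=use_amp)
    loss_fn = nn.MSELoss()
    running_loss = 0.0  # plain Python float, so it holds no graph
    model.train()
    optimizer.zero_grad(set_to_none=True)
    for step, (x, y) in enumerate(batches, start=1):
        x, y = x.to(device), y.to(device)
        # (6) Run eligible ops in reduced precision.
        with torch.autocast(device_type=device.type, enabled=use_amp):
            # (2) Scale the loss so accumulated gradients average over the group.
            loss = loss_fn(model(x), y) / accum_steps
        scaler.scale(loss).backward()
        # (3) .item() copies the value out and drops the reference to the graph.
        running_loss += loss.item() * accum_steps
        # Step every accum_steps batches, and once more for a trailing partial group.
        if step % accum_steps == 0 or step == len(batches):
            scaler.step(optimizer)
            scaler.update()
            optimizer.zero_grad(set_to_none=True)
    return running_loss / len(batches)


@torch.no_grad()  # (4) No graph is built during evaluation.
def evaluate(model, batches, device):
    model.eval()
    loss_fn = nn.MSELoss()
    total = 0.0
    for x, y in batches:
        total += loss_fn(model(x.to(device)), y.to(device)).item()
    return total / len(batches)
```

With accum_steps=4 and a per-step batch of 4 samples, the optimizer sees gradients averaged over 16 samples while only 4 are resident on the GPU at a time. A trailing group with fewer than accum_steps batches is still divided by accum_steps here, which slightly under-weights it; that is a simplification of this sketch.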

Why

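PyTorch keeps the computation graph, including each layer's saved activations, in memory so that backward() can use it. A tensor produced from parameters with requires_grad=True stays attached to that graph, so storing such a tensor outside the training loop (for example, summing loss tensors into a running total) prevents the graph and its activations from being garbage collected, and memory grows from batch to batch.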
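The difference between a tensor that keeps the graph alive and one that does not can be checked directly. This is a small sketch with a hypothetical nn.Linear model and random input; it runs on CPU, since the autograd behavior is the same there.

```python
import torch
from torch import nn

model = nn.Linear(16, 1)  # hypothetical stand-in for a real model
x = torch.randn(32, 16)

loss = model(x).pow(2).mean()

# The loss is attached to the autograd graph through its grad_fn,
# so any list or variable that stores it keeps the graph alive.
attached = loss.requires_grad and loss.grad_fn is not None

# .detach() returns a tensor that shares the value but has no graph.
detached = loss.detach()

# .item() returns a plain Python float.
scalar = loss.item()

# Under no_grad, the forward pass records no graph at all.
with torch.no_grad():
    out = model(x)

print(attached, detached.grad_fn, type(scalar).__name__, out.requires_grad)
# prints: True None float False
```

Appending loss itself to a history list would retain one graph per iteration, whereas appending detached or scalar retains only the value.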
