Gradient Distortion Under BF16 Quantization

Cosine similarity between clean and corrupted (BF16-quantized) gradients drops from 0.99 to ~0.55 once PPO clipping is included. The relative L2 error (the noise fraction) exceeds 1 by step 10, meaning the norm of the noise is larger than the norm of the clean gradient.

Figure panels: Cosine Similarity (higher = more aligned); Relative L2 Error (lower = less noise).
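
The two metrics named above can be sketched as follows. This is a minimal illustration, not the setup behind the reported numbers: the helper names (`to_bf16`, `cosine_similarity`, `relative_l2_error`) are hypothetical, BF16 is simulated by rounding float32 values to their upper 16 bits, the gradient is synthetic random data, and PPO clipping is not modeled.

```python
import numpy as np


def to_bf16(x: np.ndarray) -> np.ndarray:
    """Simulate BF16 storage: round float32 to its upper 16 bits
    (round-to-nearest-even). Assumes finite inputs; NaN/inf are not handled."""
    bits = np.asarray(x, dtype=np.float32).view(np.uint32)
    # Add 0x7FFF plus the lowest kept bit, then drop the lower 16 bits.
    rounding = ((bits >> 16) & np.uint32(1)) + np.uint32(0x7FFF)
    rounded = (bits + rounding) & np.uint32(0xFFFF0000)
    return rounded.view(np.float32)


def cosine_similarity(clean: np.ndarray, corrupted: np.ndarray) -> float:
    """Cosine of the angle between two flattened gradients (1.0 = aligned)."""
    a = np.ravel(clean).astype(np.float64)
    b = np.ravel(corrupted).astype(np.float64)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def relative_l2_error(clean: np.ndarray, corrupted: np.ndarray) -> float:
    """||corrupted - clean|| / ||clean||; values above 1 mean the noise
    norm is larger than the clean gradient norm."""
    a = np.ravel(clean).astype(np.float64)
    b = np.ravel(corrupted).astype(np.float64)
    return float(np.linalg.norm(b - a) / np.linalg.norm(a))


# Synthetic stand-in for a clean float32 gradient (an assumption, not real data).
rng = np.random.default_rng(0)
clean_grad = rng.standard_normal(10_000).astype(np.float32)
bf16_grad = to_bf16(clean_grad)

print("cosine similarity:", cosine_similarity(clean_grad, bf16_grad))
print("relative L2 error:", relative_l2_error(clean_grad, bf16_grad))
```

A single rounding step like this one perturbs the gradient only slightly, so on its own it would not produce the drop described above; the sketch is meant only to pin down how the two plotted quantities are defined.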