Model Size
Optimizer
AdamW (12 bytes/param)
8-bit Adam (6 bytes/param)
SGD+momentum (8 bytes/param)
Precision
bf16 (2 bytes/param)
fp32 (4 bytes/param)
QLoRA 4-bit (0.5 bytes/param)
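
The options above can be combined into a rough memory estimate. The sketch below is one possible reading, not the calculator's actual formula: it assumes weight memory is the parameter count times the precision figure, and optimizer memory is the trainable-parameter count times the optimizer figure. The function name, the `n_trainable` argument, and the 7-billion-parameter example are all hypothetical. Gradients, activations, and framework overhead are left out.

```python
# Per-parameter byte costs copied from the option list above.
OPTIMIZER_BYTES_PER_PARAM = {
    "AdamW": 12.0,
    "8-bit Adam": 6.0,
    "SGD+momentum": 8.0,
}

PRECISION_BYTES_PER_PARAM = {
    "bf16": 2.0,
    "fp32": 4.0,
    "QLoRA 4-bit": 0.5,
}


def estimate_memory_bytes(n_params, optimizer, precision, n_trainable=None):
    """Rough estimate of weight plus optimizer memory, in bytes.

    Assumption (not stated in the source): the two costs simply add.
    n_trainable defaults to n_params; pass a smaller value when only
    part of the model is trained (for example, adapter weights).
    """
    if n_trainable is None:
        n_trainable = n_params
    weights = n_params * PRECISION_BYTES_PER_PARAM[precision]
    optimizer_state = n_trainable * OPTIMIZER_BYTES_PER_PARAM[optimizer]
    return {
        "weights": weights,
        "optimizer": optimizer_state,
        "total": weights + optimizer_state,
    }


if __name__ == "__main__":
    # Hypothetical example: a 7-billion-parameter model, fully trained.
    estimate = estimate_memory_bytes(7_000_000_000, "AdamW", "bf16")
    for name, value in estimate.items():
        print(f"{name}: {value / 1e9:.1f} GB")
```

Under these assumptions the example gives 14 GB for weights and 84 GB for optimizer state. With QLoRA, only the adapter parameters are normally trained, so passing the adapter count as `n_trainable` gives a more realistic optimizer figure than the default.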