Pengxiao Song e027051b38
fix: add torch.cuda.empty_cache() during autoregressive inference
Without releasing cached GPU memory, usage keeps growing during autoregressive prediction, eventually causing significant memory pressure or OOM errors. Calling torch.cuda.empty_cache() between decoding steps prevents this accumulation.
2025-09-02 10:26:27 +08:00
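The fix described in the commit message can be sketched as below. This is a minimal illustration, not the repository's actual code: the `generate` loop, the `should_clear_cache` helper, and the `clear_interval` parameter are hypothetical names introduced here to show where torch.cuda.empty_cache() would be called during autoregressive decoding.

```python
def should_clear_cache(step, interval=1):
    """Decide whether to release the CUDA cache at this decoding step.

    Clearing every step is simplest; a larger interval trades a little
    peak memory for fewer allocator round-trips.
    """
    return interval > 0 and step % interval == 0


def generate(model, input_ids, max_new_tokens, clear_interval=1):
    """Greedy autoregressive decoding with periodic cache release.

    Assumes `model(input_ids)` returns logits of shape
    (batch, seq_len, vocab); both names are illustrative.
    """
    import torch  # imported here so the helper above stays torch-free

    for step in range(max_new_tokens):
        with torch.no_grad():
            logits = model(input_ids)  # one forward pass per step
        # Append the most likely next token (greedy decoding).
        next_token = logits[:, -1:, :].argmax(dim=-1)
        input_ids = torch.cat([input_ids, next_token], dim=-1)
        # Release cached blocks that no longer back live tensors, so the
        # allocator's reserved memory does not grow across steps.
        if torch.cuda.is_available() and should_clear_cache(step, clear_interval):
            torch.cuda.empty_cache()
    return input_ids
```

Note that torch.cuda.empty_cache() only returns *unused* cached blocks to the driver; it does not free tensors that are still referenced, so the per-step intermediates must already be going out of scope for this to help.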