3 Commits

Pengxiao Song · e027051b38 · 2025-09-02 10:26:27 +08:00
fix: add torch.cuda.empty_cache() during autoregressive inference
Without releasing cached GPU memory, usage keeps growing during autoregressive prediction, leading to significant memory increase or OOM. Calling torch.cuda.empty_cache() prevents this accumulation.
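The fix above can be sketched as a periodic cache release inside an autoregressive loop. This is a minimal illustration, not the repository's actual code: `generate`, `model`, and `should_release_cache` are hypothetical names, and the single-step forecast is stubbed out.

```python
def should_release_cache(step: int, interval: int = 1) -> bool:
    """Return True when the CUDA caching allocator should be flushed at this step."""
    return interval > 0 and (step + 1) % interval == 0


def generate(model, context, horizon, cache_interval=1):
    """Hypothetical autoregressive loop: predict one step, append it, repeat."""
    preds = []
    for step in range(horizon):
        next_val = model(context)          # stand-in for the model's one-step forecast
        preds.append(next_val)
        context = context + [next_val]     # feed the prediction back in
        if should_release_cache(step, cache_interval):
            try:
                import torch
                if torch.cuda.is_available():
                    # Return cached, unused GPU blocks to the driver so reserved
                    # memory does not keep growing across decoding steps.
                    torch.cuda.empty_cache()
            except ImportError:
                pass  # torch absent; the sketch still runs with CPU-only stubs
    return preds
```

With a stub model that just returns the context length, `generate(lambda ctx: len(ctx), [0], 3)` yields `[1, 2, 3]`; the cache release is a side effect and does not change the predictions, only the amount of GPU memory PyTorch keeps reserved.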
quant · 38a643b761 · 2025-09-01 21:22:27 +08:00
update kronos model code

shiyu-coder · 9f946dec6b · 2025-07-01 10:57:41 +08:00
initial