# Kronos Fine-tuning on Your Custom CSV Dataset

Supports fine-tuning on custom CSV data, driven by configuration files.
## Quick Start

### 1. Configuration Setup
First, edit the `config.yaml` file to set the correct paths and parameters:
```yaml
# Data configuration
data:
  data_path: "/path/to/your/data.csv"
  lookback_window: 512
  predict_window: 48
  # ... other parameters

# Model path configuration
model_paths:
  pretrained_tokenizer: "/path/to/pretrained/tokenizer"
  pretrained_predictor: "/path/to/pretrained/predictor"
  base_save_path: "/path/to/save/models"
  # ... other paths
```
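A quick sanity check on the config before launching training can save a failed run. The sketch below mirrors the key layout of the example config; the actual schema expected by `train_sequential.py` may contain more fields, and `REQUIRED_KEYS` here is an assumption for illustration:

```python
# Sketch: verify required config keys are present before training.
# REQUIRED_KEYS is inferred from the example config, not the scripts' real schema.
REQUIRED_KEYS = {
    "data": ["data_path", "lookback_window", "predict_window"],
    "model_paths": ["pretrained_tokenizer", "pretrained_predictor", "base_save_path"],
}

def missing_keys(config: dict) -> list[str]:
    """Return dotted paths of required keys absent from the config dict."""
    missing = []
    for section, keys in REQUIRED_KEYS.items():
        sub = config.get(section, {})
        for key in keys:
            if key not in sub:
                missing.append(f"{section}.{key}")
    return missing

# A config dict as yaml.safe_load("config.yaml") would produce it:
config = {
    "data": {"data_path": "/path/to/your/data.csv",
             "lookback_window": 512, "predict_window": 48},
    "model_paths": {"pretrained_tokenizer": "/path/to/pretrained/tokenizer",
                    "pretrained_predictor": "/path/to/pretrained/predictor",
                    "base_save_path": "/path/to/save/models"},
}
print(missing_keys(config))  # → []
```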
### 2. Run Training

#### Using train_sequential
```shell
# Complete training
python train_sequential.py --config configs/config_ali09988_candle-5min.yaml

# Skip existing models
python train_sequential.py --config configs/config_ali09988_candle-5min.yaml --skip-existing

# Only train the tokenizer
python train_sequential.py --config configs/config_ali09988_candle-5min.yaml --skip-basemodel

# Only train the basemodel
python train_sequential.py --config configs/config_ali09988_candle-5min.yaml --skip-tokenizer
```
#### Run each stage separately
```shell
# Only train the tokenizer
python finetune_tokenizer.py --config configs/config_ali09988_candle-5min.yaml

# Only train the basemodel (requires a fine-tuned tokenizer first)
python finetune_base_model.py --config configs/config_ali09988_candle-5min.yaml
```
#### DDP Training
```shell
# Choose the communication backend yourself; nccl can be replaced with gloo
DIST_BACKEND=nccl \
torchrun --standalone --nproc_per_node=8 train_sequential.py --config configs/config_ali09988_candle-5min.yaml
```
## Configuration Description

### Main Configuration Items
- **data**: Data-related configuration
  - `data_path`: CSV data file path
  - `lookback_window`: Lookback window size
  - `predict_window`: Prediction window size
  - `train_ratio`/`val_ratio`/`test_ratio`: Dataset split ratios
- **training**: Training-related configuration
  - `epochs`: Number of training epochs
  - `batch_size`: Batch size
  - `tokenizer_learning_rate`: Tokenizer learning rate
  - `predictor_learning_rate`: Predictor learning rate
- **model_paths**: Model path configuration
  - `pretrained_tokenizer`: Pre-trained tokenizer path
  - `pretrained_predictor`: Pre-trained predictor path
  - `base_save_path`: Root directory for saved models
  - `finetuned_tokenizer`: Fine-tuned tokenizer path (used for basemodel training)
- **experiment**: Experiment control
  - `train_tokenizer`: Whether to train the tokenizer
  - `train_basemodel`: Whether to train the basemodel
  - `skip_existing`: Whether to skip existing models
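For time-series data the split ratios are normally applied in chronological order (no shuffling), so train/val/test are contiguous slices. A minimal sketch of how such ratios translate into row boundaries; the actual splitting logic lives in the training scripts and may differ:

```python
def split_bounds(n_rows: int, train_ratio: float, val_ratio: float) -> tuple[int, int]:
    """Return (train_end, val_end) row indices for a chronological split.

    The test set is whatever remains after train and validation, so the
    three ratios should sum to at most 1.0.
    """
    train_end = int(n_rows * train_ratio)
    val_end = train_end + int(n_rows * val_ratio)
    return train_end, val_end

train_end, val_end = split_bounds(10_000, train_ratio=0.8, val_ratio=0.1)
# rows [0, 8000) → train, [8000, 9000) → val, [9000, 10000) → test
```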
## Training Process
1. **Tokenizer Fine-tuning Stage**
   - Load the pre-trained tokenizer
   - Fine-tune it on the custom data
   - Save the fine-tuned tokenizer to `{base_save_path}/tokenizer/best_model/`
2. **Basemodel Fine-tuning Stage**
   - Load the fine-tuned tokenizer and the pre-trained predictor
   - Fine-tune on the custom data
   - Save the fine-tuned basemodel to `{base_save_path}/basemodel/best_model/`
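Given the save layout above, the `--skip-existing` behaviour can be pictured as a check on each stage's `best_model` output directory before the stage runs. This is an illustrative sketch, not the actual check in `train_sequential.py`, which may key on other files:

```python
import os

def stage_done(base_save_path: str, stage: str) -> bool:
    """True if a stage already produced a best_model directory."""
    return os.path.isdir(os.path.join(base_save_path, stage, "best_model"))

def stages_to_run(base_save_path: str, skip_existing: bool = True) -> list[str]:
    """Return the fine-tuning stages still to run, in pipeline order."""
    stages = ["tokenizer", "basemodel"]
    if not skip_existing:
        return stages
    return [s for s in stages if not stage_done(base_save_path, s)]
```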
**Data Format**: Ensure the CSV file contains the following columns: `timestamps`, `open`, `high`, `low`, `close`, `volume`, `amount`.
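A quick pure-stdlib way to check a file's header before training; the column names are taken from the list above, and the helper itself is illustrative rather than part of the repository:

```python
import csv

# Columns the training pipeline expects in the CSV.
REQUIRED_COLUMNS = {"timestamps", "open", "high", "low", "close", "volume", "amount"}

def missing_columns(csv_path: str) -> set[str]:
    """Return the required columns absent from the CSV header row."""
    with open(csv_path, newline="") as f:
        header = next(csv.reader(f), [])
    return REQUIRED_COLUMNS - set(header)
```

If the returned set is non-empty, fix the CSV before editing `data_path` in the config.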