# Kronos Fine-tuning on Your Custom CSV Dataset

This project supports fine-tuning Kronos on custom CSV data using configuration files.
## 1. Prepare Your Data

**Data Format**: Ensure the CSV file contains the following columns: `timestamps`, `open`, `high`, `low`, `close`, `volume`, `amount`
A well-formed CSV file looks like:

| timestamps | open | close | high | low | volume | amount |
|------------|------|-------|------|-----|--------|--------|
| 2019/11/26 9:35 | 182.45215 | 184.45215 | 184.95215 | 182.45215 | 15136000 | 0 |
| 2019/11/26 9:40 | 184.35215 | 183.85215 | 184.55215 | 183.45215 | 4433300 | 0 |
| ... | ... | ... | ... | ... | ... | ... |

You can check `data/HK_ali_09988_kline_5min_all.csv` for a reference of the proper format.
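Before training, it can save time to verify that your CSV has all the required columns. A minimal sketch, using only the standard library (the helper name `validate_kline_csv` is an assumption, not part of this repository):

```python
import csv

# Columns required by the fine-tuning pipeline, per the table above.
REQUIRED_COLUMNS = {"timestamps", "open", "high", "low", "close", "volume", "amount"}

def validate_kline_csv(path):
    """Return the sorted list of required columns missing from the CSV header.

    An empty list means the file header is valid.
    """
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        header = set(reader.fieldnames or [])
    return sorted(REQUIRED_COLUMNS - header)

# Example usage (hypothetical path):
# missing = validate_kline_csv("data/my_kline_5min.csv")
# if missing:
#     raise ValueError(f"CSV is missing columns: {missing}")
```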
## 2. Training

### Configuration Setup

First, edit the `config.yaml` file to set the correct paths and parameters:
```yaml
# Data configuration
data:
  data_path: "/path/to/your/data.csv"
  lookback_window: 512
  predict_window: 48
  # ... other parameters

# Model path configuration
model_paths:
  pretrained_tokenizer: "/path/to/pretrained/tokenizer"
  pretrained_predictor: "/path/to/pretrained/predictor"
  base_save_path: "/path/to/save/models"
  # ... other paths
```
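A misconfigured path typically only fails partway into training, so it can help to sanity-check the parsed config up front. A minimal sketch operating on a plain dict (e.g. the result of `yaml.safe_load`); the helper name and the required-key list are assumptions mirroring the YAML fragment above:

```python
# Sections and keys the fine-tuning scripts are assumed to need,
# mirroring the config.yaml fragment above.
REQUIRED_KEYS = {
    "data": ["data_path", "lookback_window", "predict_window"],
    "model_paths": ["pretrained_tokenizer", "pretrained_predictor", "base_save_path"],
}

def missing_config_keys(config):
    """Return a list of 'section.key' strings absent from the config dict."""
    missing = []
    for section, keys in REQUIRED_KEYS.items():
        present = config.get(section, {})
        for key in keys:
            if key not in present:
                missing.append(f"{section}.{key}")
    return missing
```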
### Run Training

Using `train_sequential.py`:
```bash
# Complete training
python train_sequential.py --config configs/config_ali09988_candle-5min.yaml

# Skip existing models
python train_sequential.py --config configs/config_ali09988_candle-5min.yaml --skip-existing

# Only train the tokenizer
python train_sequential.py --config configs/config_ali09988_candle-5min.yaml --skip-basemodel

# Only train the base model
python train_sequential.py --config configs/config_ali09988_candle-5min.yaml --skip-tokenizer
```
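The skip flags above correspond to a simple command-line interface; a hedged sketch of how such a parser might look (only the flag names come from the commands above, the rest is an assumption):

```python
import argparse

def build_parser():
    """Argument parser mirroring the train_sequential.py flags shown above."""
    parser = argparse.ArgumentParser(description="Sequential Kronos fine-tuning")
    parser.add_argument("--config", required=True, help="Path to the YAML config file")
    parser.add_argument("--skip-existing", action="store_true",
                        help="Skip stages whose saved models already exist")
    parser.add_argument("--skip-basemodel", action="store_true",
                        help="Only train the tokenizer")
    parser.add_argument("--skip-tokenizer", action="store_true",
                        help="Only train the base model")
    return parser
```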
Alternatively, run each stage separately:
```bash
# Only train the tokenizer
python finetune_tokenizer.py --config configs/config_ali09988_candle-5min.yaml

# Only train the base model (requires the fine-tuned tokenizer first)
python finetune_base_model.py --config configs/config_ali09988_candle-5min.yaml
```
DDP training:
```bash
# Choose the communication backend yourself; nccl can be replaced with gloo
DIST_BACKEND=nccl \
torchrun --standalone --nproc_per_node=8 train_sequential.py --config configs/config_ali09988_candle-5min.yaml
```
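Reading the backend from an environment variable like `DIST_BACKEND` is a common pattern for scripts launched via `torchrun`. A minimal, stdlib-only sketch of that pattern (the helper name is an assumption; a real training script would pass the result to `torch.distributed.init_process_group(backend=...)`):

```python
import os

def resolve_dist_backend(default="nccl"):
    """Read the distributed backend from the DIST_BACKEND env var.

    "nccl" is the usual choice for GPU training; "gloo" also works on
    CPU-only hosts.
    """
    backend = os.environ.get("DIST_BACKEND", default).lower()
    if backend not in ("nccl", "gloo", "mpi"):
        raise ValueError(f"Unsupported distributed backend: {backend}")
    return backend
```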
## 3. Training Results





