First, download the repository.

https://github.com/karpathy/llama2.c

```sh
git clone https://github.com/karpathy/llama2.c.git
cd llama2.c
```
Prepare the training data:
```sh
pip install datasets --break-system-packages
pip install sentencepiece --break-system-packages
python tinystories.py download
python tinystories.py pretokenize
```
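As a quick sanity check before training, you can inspect the pretokenized shards. This is a minimal sketch, assuming the `pretokenize` step writes uint16 token ids into `.bin` files under `data/TinyStories_all_data` (the layout the repo used at the time of writing; adjust the path if your checkout differs):

```python
# Count tokens in the pretokenized shards.
# Assumption: shards are raw uint16 token ids in data/TinyStories_all_data/*.bin
import glob
import numpy as np

shards = sorted(glob.glob("data/TinyStories_all_data/*.bin"))
total = 0
for shard in shards:
    tokens = np.memmap(shard, dtype=np.uint16, mode="r")  # read without loading into RAM
    total += len(tokens)
    print(f"{shard}: {len(tokens):,} tokens")
print(f"{len(shards)} shards, {total:,} tokens in total")
```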
Run the training:

```sh
python train.py
```
Adjust the parameters in the section of train.py marked with the `# I/O` comment. The defaults are as follows:
```python
# -----------------------------------------------------------------------------
# I/O
out_dir = "out"
eval_interval = 2000
log_interval = 1
eval_iters = 100
eval_only = False # if True, script exits right after the first eval
always_save_checkpoint = False # if True, always save a checkpoint after each eval
init_from = "scratch" # 'scratch' or 'resume'
# wandb logging
wandb_log = False # disabled by default
wandb_project = "llamac"
wandb_run_name = "run" + datetime.now().strftime("%Y_%m_%d_%H_%M_%S")
# data
batch_size = 128 # if gradient_accumulation_steps > 1, this is the micro-batch size
max_seq_len = 256
vocab_source = "llama2" # llama2|custom; use Llama 2 vocab from Meta, or custom trained
vocab_size = 32000 # the Llama 2 tokenizer has 32K tokens
# model
dim = 288
n_layers = 6
n_heads = 6
n_kv_heads = 6
multiple_of = 32
dropout = 0.0
# adamw optimizer
gradient_accumulation_steps = 4 # used to simulate larger batch sizes
learning_rate = 5e-4 # max learning rate
max_iters = 100000 # total number of training iterations
weight_decay = 1e-1
beta1 = 0.9
beta2 = 0.95
grad_clip = 1.0 # clip gradients at this value, or disable if == 0.0
# learning rate decay settings
decay_lr = True # whether to decay the learning rate
warmup_iters = 1000 # how many steps to warm up for
# system
device = "cuda" # examples: 'cpu', 'cuda', 'cuda:0', 'cuda:1' etc., or try 'mps' on macbooks
dtype = "bfloat16" # float32|bfloat16|float16
compile = True # use PyTorch 2.0 to compile the model to be faster
```
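For reference, these defaults describe a very small model. The following back-of-the-envelope sketch (assuming llama2.c's FFN sizing rule and the weight tying between the token embedding and the output head, as in the repo's model.py) puts it at roughly 15M parameters:

```python
# Rough parameter count for the default config above (a sketch, not exact).
dim, n_layers, vocab_size, multiple_of = 288, 6, 32000, 32

# FFN hidden size as computed in llama2.c: 4*dim, scaled by 2/3, rounded up
hidden = 4 * dim
hidden = int(2 * hidden / 3)
hidden = multiple_of * ((hidden + multiple_of - 1) // multiple_of)

attn = 4 * dim * dim      # wq, wk, wv, wo (n_kv_heads == n_heads here)
ffn = 3 * dim * hidden    # w1, w2, w3 (SwiGLU feed-forward)
norms = 2 * dim           # two RMSNorm weight vectors per layer
per_layer = attn + ffn + norms

embedding = vocab_size * dim  # assumed shared with the output classifier
total = n_layers * per_layer + embedding + dim  # + final RMSNorm

print(f"hidden={hidden}, per_layer={per_layer:,}, total≈{total:,}")
# -> about 15.2M parameters, i.e. the "stories15M" scale
```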
Since no GPU is available here, I changed the settings as follows:
Parameter | Default | Setting |
---|---|---|
batch_size | 128 | 8 |
compile | True | False |
device | cuda | cpu |
dropout | 0.0 | 0.1 |
eval_interval | 2000 | 20 |
learning_rate | 5e-4 | 4e-4 |
max_iters | 100000 | 1000 |
warmup_iters | 1000 | 10 |
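Instead of editing train.py in place, the same overrides can also be passed on the command line. This assumes train.py picks up `--key=value` overrides via configurator.py (as in the repo at the time of writing); if your checkout lacks that mechanism, edit the values directly:

```sh
# Assumption: train.py reads command-line overrides through configurator.py
python train.py \
  --device=cpu --compile=False \
  --batch_size=8 --dropout=0.1 \
  --eval_interval=20 --learning_rate=4e-4 \
  --max_iters=1000 --warmup_iters=10
```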