Today we release v0.4.0 of torchtune with some exciting new additions! Notable ones include full support for activation offloading, recipes for Llama3.2V 90B and QLoRA variants, new documentation, and Qwen2.5 models!
Activation offloading (#1443, #1645, #1847)
Activation offloading is a memory-saving technique that asynchronously moves checkpointed activations that are not currently in use to the CPU. Right before the GPU needs the activations for the microbatch’s backward pass, it prefetches the offloaded activations back from the CPU. Enabling it is as easy as setting the following options in your config:
enable_activation_checkpointing: True
enable_activation_offloading: True
In experiments with Llama3 8B, activation offloading used roughly 24% less memory while incurring a performance slowdown of under 1%.
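To make the offload-then-prefetch schedule concrete, here is a toy, framework-free Python sketch of the bookkeeping. It is not torchtune's implementation: the function name `peak_gpu_activation_bytes`, the one-saved-activation-per-layer model, the `prefetch` depth, and the sizes are all assumptions made for illustration, and real transfers are asynchronous copies rather than dictionary moves.

```python
def peak_gpu_activation_bytes(sizes, offload, prefetch=1):
    """Toy model: peak bytes of saved activations resident on the 'GPU'.

    sizes[i] is the (made-up) size of the activation saved by layer i.
    """
    on_gpu, on_cpu = {}, {}
    peak = 0

    # Forward pass: each layer saves one activation for the backward pass.
    for i, size in enumerate(sizes):
        on_gpu[i] = size
        peak = max(peak, sum(on_gpu.values()))
        if offload and i > 0:
            # The previous layer's activation is no longer in use,
            # so it is moved to the 'CPU' until backward needs it.
            on_cpu[i - 1] = on_gpu.pop(i - 1)

    # Backward pass: layers run in reverse order.
    for i in reversed(range(len(sizes))):
        # Bring back activation i (if offloaded) and prefetch the next
        # `prefetch` activations that backward will need.
        for j in range(i, max(i - prefetch, 0) - 1, -1):
            if j in on_cpu:
                on_gpu[j] = on_cpu.pop(j)
        peak = max(peak, sum(on_gpu.values()))
        del on_gpu[i]  # freed once layer i's backward has run

    return peak


sizes = [10] * 4
print(peak_gpu_activation_bytes(sizes, offload=False))  # 40
print(peak_gpu_activation_bytes(sizes, offload=True))   # 20
```

The toy counts only saved activations, so it overstates the saving relative to the roughly 24% figure reported for Llama3 8B, where weights, gradients, and optimizer state also occupy GPU memory.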
Llama3.2V 90B with QLoRA (#1880, #1726)
We added model builders and configs for the 90B version of Llama3.2V, which outperforms the 11B version of the model across common benchmarks. Because this model is larger, we also added the ability to run it using QLoRA and FSDP2.
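As rough arithmetic for why quantization and sharding matter at this size, the sketch below estimates weight memory alone. It assumes a nominal 90e9 parameters, 2 bytes per parameter for bf16, about 0.5 bytes per parameter for 4-bit storage, and an even split across 4 GPUs. It ignores quantization metadata, LoRA adapter weights, activations, and optimizer state, so actual usage will be higher. The helper name `weight_gb` is hypothetical.

```python
def weight_gb(num_params: float, bytes_per_param: float, num_gpus: int = 1) -> float:
    """Approximate per-GPU weight memory in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / num_gpus / 1e9


NUM_PARAMS = 90e9  # nominal parameter count (assumption)

print(weight_gb(NUM_PARAMS, 2.0))     # bf16 on one device: 180.0
print(weight_gb(NUM_PARAMS, 0.5))     # 4-bit on one device: 45.0
print(weight_gb(NUM_PARAMS, 0.5, 4))  # 4-bit, split over 4 GPUs: 11.25
```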
# Download the model first
tune download meta-llama/Llama-3.2-90B-Vision-Instruct --ignore-patterns "original/consolidated*"
# Run with e.g. 4 GPUs
tune run --nproc_per_node 4 lora_finetune_distributed --config llama3_2_vision/90B_qlora
Qwen2.5 model family has landed (#1863)
We added builders for Qwen2.5, the cutting-edge models from the Qwen family! In their own words: "Compared to Qwen2, Qwen2.5 has acquired significantly more knowledge (MMLU: 85+) and has greatly improved capabilities in coding (HumanEval 85+) and mathematics (MATH 80+)."
Get started with the models easily:
tune download Qwen/Qwen2.5-1.5B-Instruct --ignore-patterns None
tune run lora_finetune_single_device --config qwen2_5/1.5B_lora_single_device
New documentation on using custom recipes, configs, and components (#1910)
We heard your feedback and wrote up a simple page on how to customize configs, recipes, and individual components! Check it out in the torchtune documentation.
What's Changed
[…] max_seq_length in eval recipe by @SalmanMohammadi in #1773
[…] max_seq_length to vision eval config by @SalmanMohammadi in #1802
[…] TiedEmbeddingTransformerDecoder by @SalmanMohammadi in #1815
[…] vqa_dataset, update docs by @krammnic in #1820
[…] torchao check for TensorCoreTiledLayout by @joecummings in #1886
[…] GemmaTransformerDecoder by @SalmanMohammadi in #1892
Full Changelog: v0.3.1...v0.4.0