RetroSearch Browse

Home - News ( United States | United Kingdom | Italy | Germany ) - Football scores

Showing content from https://github.com/OptimalScale/LMFlow below:

OptimalScale/LMFlow: An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Models for All.

An extensible, convenient, and efficient toolbox for finetuning large machine learning models, designed to be user-friendly, speedy and reliable, and accessible to the entire community.

Important

❗ [2025-07-09] We have a major update to LMFlow with full Accelerate support and extensive streamlining. If you're looking for the previous version, please use git checkout v0.0.10, or check out the v0.0.10 branch. View all releases here.

[2024-12-02] Support Hymba, a new family of small language models featuring a hybrid-head parallel architecture. Check out Post-training Hymba for more details.
[2024-07-01] 🏆 LMFlow receives the Best Demo Paper Award at NAACL 2024! 🎉
[2024-06-30] Expanding Optimization Options! We now support custom optimizer training with a variety of optimizers. Dive into the details and try out the new features with our updated script at custom_optimizers.
[2024-04-25] 🚀 Support conversation template! We've preset the latest Llama-3 and Phi-3 conversation templates as well as some frequently used templates such as chatml (see all templates here), and we are working on adding more preset templates. Adding corresponding --conversation_template in the shell script and you are all set! 🚀

More news...

[2024-03-27] Support LISA, enabling 7B training in 24G memory without offloading!
[2023-09-11] Support speculative decoding. Check out speculative_decoding for the usage and acceleration details.
[2023-08-14] Support long context inference with position interpolation (Linear & NTK scaling ) for LLaMA models. Check out postion_interpolation for more details.
[2023-08-07] Support Flash Attention-2. Check out flash_attention for more details.
[2023-08-02] Support Llama2, ChatGLM2, and Baichuan models.
[2023-07-23] LMFlow multimodal chatbot is now available! Support multimodal inputs of images and texts. Online Demo is also provided (We hold the service on a single GPU, hence one may experience "queuing" or "application busy" sometimes when multiple users are accessing at the same time, please wait and attempt again later when such event happens)
[2023-06-22] LMFlow paper is out! Check out our implementation details at https://arxiv.org/abs/2306.12420
[2023-06-16] Our finetuned Robin-33B-V2 scored an impressive 64.1 on the Huggingface LLM leaderboard in our offline evaluation, outperforming major open-source LLMs! All checkpoints (7B, 13B, 33B, and 65B) are released! Checkout the performance here.
[2023-06-07] LMFlow is now officially available on PyPI! Install it with pip install lmflow-finetune!
[2023-05-30] Release Robin-13B-v2 and Robin-33B-v2!
[2023-05-15] Release LMFlow-data, the training dataset of Robin-7B-v2. A new test data is also released.
[2023-05-09] Release Robin-7B-v2, achieving competitive performance on chitchat, commonsense reasoning and instruction-following tasks. Refer to our comprehensive study.
[2023-05-08] Release LMFlow Benchmark, an automatic evaluation framework for open-source chat-style LLMs. Benchmark results on 31 popular models are reported. Participate in LMFlow Benchmark.
[2023-04-21] Release Robin-7B (based on LLaMA-7B), and two models for commercial use: Parakeets-2.7B (based on GPT-NEO-2.7B) and Cokatoo-7B (based on StableLM-7B) Download here
[2023-04-15] Inference: Support streaming output and ChatGLM.
[2023-04-10] We propose a new alignment algorithm: Reward rAnked FineTuning (RAFT), which is more efficient than conventional (PPO-based) RLHF. [Paper]
[2023-04-02] Web service is online!
[2023-04-01] Release three instruction-tuned checkpoints and three medical checkpoints in model zoo: LLaMA-7B-tuned, LLaMA-13B-tuned, LLaMA-33B-tuned, LLaMA-7B-medical, LLaMA-13B-medical, and LLaMA-33B-medical.
[2023-03-27] Support full tuning and lora tuning for all decoder models.
[2023-03-27] Tasked tuned model beats ChatGPT on medical domain.
[2023-03-27] Release code and checkpoints - version 0.0.1! Our tasked-tuned model beats ChatGPT on medical domain.

LMFlow

Our package has been tested on Linux OS (Ubuntu 20.04). Other OS platforms (MacOS, Windows) are not fully tested, where you may encounter unexpected errors. If you are using LMFlow for the first time, we recommend you to try on a Linux machine or Google Colab.

git clone -b v1.0.0 https://github.com/OptimalScale/LMFlow.git
cd LMFlow
conda create -n lmflow python=3.9 -y
conda activate lmflow
conda install mpi4py
pip install -e .

Looking for a previous version?

git clone -b v0.0.10 https://github.com/OptimalScale/LMFlow.git
cd LMFlow
conda create -n lmflow python=3.9 -y
conda activate lmflow
conda install mpi4py
pip install -e .

For CUDA versions 10.3-11.7

git clone -b v0.0.5 https://github.com/OptimalScale/LMFlow.git
cd LMFlow
conda create -n lmflow python=3.9 -y
conda activate lmflow
conda install mpi4py
pip install -e .

Tip

We use WandB to track and visualize the training process by default. Before running the training scripts, users may need to log in to WandB using the command:

For detailed instructions, refer to the WandB Quickstart Guide. Step 1 (registration) and Step 2 (login using your WandB API key) should be sufficient to set up your environment.

Disabling wandb

One can disable wandb by either:

Adding environment variable before running the training command.

export WANDB_MODE=disabled

OR, specifying the integrations to report the results and logs to. In the training script, add:

Please refer to our doc.

Estimated Hardware Requirement Method 0.5B 3B 7B 14B 30B 70B xB Full bf16/fp16 9GB 55GB 120GB 240GB 600GB 1200GB 18xGB LoRA 1GB 6GB 16GB 32GB 64GB 160GB 2xGB QLoRA quant_bit=8 0.7GB 3GB 10GB 20GB 40GB 80GB xGB QLoRA quant_bit=4 0.4GB 1.5GB 6GB 12GB 24GB 48GB x/2GB

Full training updates all the parameters to finetune a language model. Here is an example to finetune a GPT-2 base model.

cd data && ./download.sh alpaca && cd -

bash ./scripts/run_finetune.sh \
  --model_name_or_path gpt2 \
  --dataset_path data/alpaca/train_conversation \
  --output_model_path output_models/finetuned_gpt2

Tip

For conversation dataset, specify a conversation template for better performance by adding --conversation_template to the command.

Llama-3-8B conversation dataset example

cd data && ./download.sh alpaca && cd -

bash ./scripts/run_finetune.sh \
 --model_name_or_path meta-llama/Meta-Llama-3-8B \
 --dataset_path data/alpaca/train_conversation \
 --conversation_template llama3 \
 --output_model_path output_models/finetuned_llama3_8b

LISA is a memory-efficient finetuning algorithm that allows tradeoff between memory and the number of randomly unfreezed layers. This script currently is only tested in single gpus. Please stay tuned for our latest updates 😄

cd data && ./download.sh alpaca && cd -

bash ./scripts/run_finetune_with_lisa.sh \
  --model_name_or_path meta-llama/Llama-2-7b-hf \
  --dataset_path data/alpaca/train_conversation \
  --output_model_path output_models/finetuned_llama2_7b \
  --lisa_activated_layers 1 \
  --lisa_interval_steps 20

Tip

Llama-2-7B conversation dataset example

cd data && ./download.sh alpaca && cd -

bash ./scripts/run_finetune_with_lisa.sh \
 --model_name_or_path meta-llama/Llama-2-7b-hf \
 --dataset_path data/alpaca/train_conversation \
 --conversation_template llama2 \
 --output_model_path output_models/finetuned_llama2_7b_lisa \
 --lisa_activated_layers 1 \
 --lisa_interval_steps 20

LoRA is a parameter-efficient finetuning algorithm and is more efficient than full finetuning.

cd data && ./download.sh alpaca && cd -

bash ./scripts/run_finetune_with_lora.sh \
  --model_name_or_path facebook/galactica-1.3b \
  --dataset_path data/alpaca/train_conversation \
  --output_lora_path output_models/finetuned_galactica_lora

Tip

Llama-2-7B conversation dataset example

cd data && ./download.sh alpaca && cd -

bash ./scripts/run_finetune_with_lora.sh \
 --model_name_or_path meta-llama/Llama-2-7b-hf \
 --dataset_path data/alpaca/train_conversation \
 --conversation_template llama2 \
 --output_model_path output_models/finetuned_llama2_7b_lora \

Merge LoRA Weight

Merge LoRA weight and the base model into one using:

bash ./scripts/run_merge_lora.sh \
 --model_name_or_path Qwen/Qwen1.5-1.8B \
 --lora_model_path output_models/lora \
 --output_model_path output_models/lora_merged \

After finetuning, you can run the following command to chat with the model.

bash ./scripts/run_chatbot.sh output_models/finetuned_gpt2

Tip

We recommend using vLLM for faster inference.

Faster inference using vLLM

bash ./scripts/run_vllm_inference.sh \
  --model_name_or_path Qwen/Qwen2-0.5B \
  --dataset_path data/alpaca/test_conversation \
  --output_dir data/inference_results \

If you want to deploy your own model locally, we provide a gradio-based UI for building chatbots. Running the following command will launch the demo for robin-7b:

pip install gradio
python ./examples/chatbot_gradio.py --deepspeed configs/ds_config_chatbot.json --model_name_or_path YOUR-LLAMA  --lora_model_path ./robin-7b --prompt_structure "A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions.###Human: {input_text}###Assistant:"       --end_string "#" --max_new_tokens 200

LMFlow Benchmark is an automatic evaluation framework for open-source large language models. We use negative log likelihood (NLL) as the metric to evaluate different aspects of a language model: chitchat, commonsense reasoning, and instruction following abilities.

You can directly run the LMFlow benchmark evaluation to obtain the results to participate in the LLM comparision. For example, to run GPT2 XL, one may execute

bash ./scripts/run_benchmark.sh --model_name_or_path gpt2-xl

--model_name_or_path is required, you may fill in huggingface model name or local model path here.

To check the evaluation results, you may check benchmark.log in ./output_dir/gpt2-xl_lmflow_chat_nll_eval, ./output_dir/gpt2-xl_all_nll_eval and ./output_dir/gpt2-xl_commonsense_qa_eval.

Finetune Acceleration & Memory Optimization

LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning

LISA is a novel and memory-efficient training strategy for large language models that outperforms existing methods like LoRA by selectively freezing layers during optimization. Check out LISA for more details.
In LMFLow, activate LISA using --use_lisa 1 in your training command. Control the number of activation layers with --lisa_activated_layers 2, and adjust the freezing layers interval using --lisa_step_interval 20.
LoRA

LoRA is a parameter-efficient finetuning algorithm and is more efficient than full finetuning. Check out finetuning-lora for more details.
FlashAttention

LMFlow supports both FlashAttention-1 and the latest FlashAttention-2. Check out flash_attention for more details.
Gradient Checkpointing

Gradient checkpointing is a memory optimization technique that trades compute for memory. It is useful when the model is too large to fit into GPU memory. Use it by just adding --gradient_checkpointing to your training command.
Deepspeed Zero3

LMFlow supports Deepspeed Zero-3 Offload. We provide an example deepspeed config, and you can directly use it.

Inference Acceleration

LLaMA Inference on CPU

Thanks to the great efforts of llama.cpp. It is possible for everyone to run their LLaMA models on CPU by 4-bit quantization. We provide a script to convert LLaMA LoRA weights to .pt files. You only need to use convert-pth-to-ggml.py in llama.cpp to perform quantization.
FlashAttention

LMFlow supports both FlashAttention-1 and the latest FlashAttention-2. Check out flash_attention for more details.
vLLM

Try vLLM for fast and easy-to-use LLM inference and serving. Thanks for the great work!

Long Context

Position Interpolation for LLaMA Models

Now LMFlow supports the latest Linear & NTK (Neural Kernel theory) scaling techniques for LLaMA models. Check out postion_interpolation for more details.

Model Customization

Vocabulary Extension

Now you can train your own sentencepiece tokenizer and merge it with model's origin hf tokenizer. Check out vocab_extension for more details.

Multimodal

Multimodal Chatbot

LMFlow supports multimodal inputs of images and texts. Check out our LMFlow multimodal chatbot.

Custom Optimization

Custom Optimization

LMFlow now supports custom optimizer training with a variety of optimizers. Elevate your model's performance with tailored optimization strategies. Dive into the details and try out the new features with our updated script at custom_optimizers.

The following table evaluates the performance of custom optimizers in the fine-tuning process of GPT-2 on the Alpaca dataset, emphasizing their individual impacts on the training loss. The specific hyperparameter settings utilize default configurations, which can be customized and adjusted at custom_optimizers. It is important to note that the evaluations were conducted over a duration of 0.1 epochs to provide a preliminary insight into the optimizers' effectiveness.
Optimizer Name Train Loss RMSprop 2.4016 LION-32bit 2.4041 Adam 2.4292 AdamP 2.4295 AdamW 2.4469 AdaFactor 2.4543 AdaBound 2.4547 AdamWScheduleFree 2.4677 Adan 2.5063 NAdam 2.5569 AdaBelief 2.5857 AdaMax 2.5924 RAdam 2.6104 AdaDelta 2.6298 AdaGrad 2.8657 Yogi 2.9314 NovoGrad 3.1071 Sophia 3.1517 LAMB 3.2350 LARS 3.3329 SGDScheduleFree 3.3541 SGDP 3.3567 SGD 3.3734

If you need any help, please submit a Github issue.

The code included in this project is licensed under the Apache 2.0 license. If you wish to use the codes and models included in this project for commercial purposes, please sign this document to obtain authorization.

If you find this repository useful, please consider giving ⭐ and citing our paper:

@article{diao2023lmflow,
  title={Lmflow: An extensible toolkit for finetuning and inference of large foundation models},
  author={Diao, Shizhe and Pan, Rui and Dong, Hanze and Shum, Ka Shun and Zhang, Jipeng and Xiong, Wei and Zhang, Tong},
  journal={arXiv preprint arXiv:2306.12420},
  year={2023}
}

@article{dong2023raft,
  title={Raft: Reward ranked finetuning for generative foundation model alignment},
  author={Dong, Hanze and Xiong, Wei and Goyal, Deepanshu and Pan, Rui and Diao, Shizhe and Zhang, Jipeng and Shum, Kashun and Zhang, Tong},
  journal={arXiv preprint arXiv:2304.06767},
  year={2023}
}

@article{pan2024lisa,
  title={LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning}, 
  author={Pan, Rui and Liu, Xiang and Diao, Shizhe and Pi, Renjie and Zhang, Jipeng and Han, Chi and Zhang, Tong},
  journal={arXiv preprint arXiv:2403.17919},
  year={2024}
}

RetroSearch is an open source project built by @garambo | Open a GitHub Issue

Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo

HTML: 3.2 | Encoding: UTF-8 | Version: 0.7.4