Open R1
A fully open reproduction of DeepSeek-R1. This repo is a work in progress, let’s build it together!
Overview
The goal of this repo is to build the missing pieces of the R1 pipeline such that everybody can reproduce and build on top of it. The project is simple by design and mostly consists of:
- `src/open_r1`: contains the scripts to train and evaluate models as well as generate synthetic data:
  - `grpo.py`: trains a model with GRPO on a given dataset.
  - `sft.py`: performs a simple SFT of a model on a dataset.
  - `evaluate.py`: evaluates a model on the R1 benchmarks.
  - `generate.py`: generates synthetic data from a model using Distilabel.
- `Makefile`: contains easy-to-run commands for each step in the R1 pipeline, leveraging the scripts above.

Plan of attack
We will use the DeepSeek-R1 tech report as a guide, which can roughly be broken down into three main steps:
Installation
Note: Libraries rely on CUDA 12.1. Double-check your system if you get segmentation faults.

To run the code in this project, first create a Python virtual environment, e.g. with `uv`. To install `uv`, follow the UV Installation Guide.

Next, install vLLM:
This will also install PyTorch v2.5.1, and it is very important to use this version since the vLLM binaries are compiled for it. You can then install the remaining dependencies for your specific use case via `pip install -e .[LIST OF MODES]`. For most contributors, we recommend:

Next, log into your Hugging Face and Weights and Biases accounts as follows:
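Both services ship a standard login command that prompts for an access token:

```shell
# Log into the Hugging Face Hub (prompts for a token from hf.co/settings/tokens)
huggingface-cli login

# Log into Weights and Biases (prompts for an API key from wandb.ai/authorize)
wandb login
```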
Finally, check whether your system has Git LFS installed so that you can load and push models/datasets to the Hugging Face Hub:
If it isn’t installed, run:
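For example (the `apt-get` line assumes Debian/Ubuntu; use your platform's package manager otherwise):

```shell
# Check whether Git LFS is available
git-lfs --version

# If it isn't, install it (Debian/Ubuntu shown)
sudo apt-get install git-lfs
```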
Training models
We support training models with either DDP or DeepSpeed (ZeRO-2 and ZeRO-3). To switch between methods, simply change the path to the `accelerate` YAML config in `configs`.

SFT
To run SFT on a dataset distilled from DeepSeek-R1 with reasoning traces such as Bespoke-Stratos-17k, run:
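A sketch of such a run; the script follows TRL-style argument names, and the hyperparameters, output directory, and config path below are illustrative rather than the repo's exact recipe:

```shell
# Illustrative SFT launch; flag names follow TRL conventions and may differ from the repo's recipe
accelerate launch --config_file configs/zero3.yaml src/open_r1/sft.py \
    --model_name_or_path Qwen/Qwen2.5-1.5B-Instruct \
    --dataset_name HuggingFaceH4/Bespoke-Stratos-17k \
    --learning_rate 2.0e-5 \
    --num_train_epochs 1 \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 8 \
    --output_dir data/qwen-1.5b-sft
```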
To launch a Slurm job, run:
Here `{model}` and `{dataset}` refer to the model and dataset IDs on the Hugging Face Hub, while `{accelerator}` refers to the choice of an 🤗 Accelerate config file in `configs`.

GRPO
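A hypothetical invocation with those placeholders; the Slurm script name and its flags are assumptions, so adapt them to the scripts shipped in this repo's `slurm/` folder:

```shell
# Hypothetical script name and flags; check the repo's slurm/ folder for the real ones
sbatch --job-name=open_r1_sft slurm/sft.slurm \
    --model {model} --dataset {dataset} --accelerator {accelerator}
```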
To train via the GRPO trainer, we use one GPU to run vLLM for faster generation and the remaining GPUs for training. For example, on a node with 8 GPUs, use the `recipes/accelerate_configs/zero3.yaml` config and then override `num_processes` to run on 7 devices:

To launch a Slurm job, run:
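This looks roughly as follows; the recipe path passed to `--config` is illustrative, so pick an actual config from the recipes folder:

```shell
# Train on 7 GPUs, leaving one for vLLM generation; the --config path is illustrative
accelerate launch --config_file recipes/accelerate_configs/zero3.yaml \
    --num_processes 7 src/open_r1/grpo.py \
    --config recipes/Qwen2.5-1.5B-Instruct/grpo/config_full.yaml
```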
You can find more model configurations in the recipes.
Evaluating models
We use `lighteval` to evaluate models, with custom tasks defined in `src/open_r1/evaluate.py`. For models which fit on a single GPU, run:

To increase throughput across multiple GPUs, use data parallel as follows:
For large models which require sharding across GPUs, use tensor parallel and run:
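The three invocations can be sketched as follows. The model ID and the `custom|math_500|0|0` task string are assumptions (check `src/open_r1/evaluate.py` for the tasks actually defined), and `data_parallel_size`/`tensor_parallel_size` are vLLM model arguments passed through lighteval's model config string:

```shell
MODEL=deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
TASK="custom|math_500|0|0"  # hypothetical task string; see src/open_r1/evaluate.py

# Single GPU
lighteval vllm "pretrained=$MODEL,dtype=bfloat16" "$TASK" \
    --custom-tasks src/open_r1/evaluate.py --use-chat-template

# Data parallel across 8 GPUs
lighteval vllm "pretrained=$MODEL,dtype=bfloat16,data_parallel_size=8" "$TASK" \
    --custom-tasks src/open_r1/evaluate.py --use-chat-template

# Tensor parallel across 8 GPUs (for models that don't fit on one GPU)
lighteval vllm "pretrained=$MODEL,dtype=bfloat16,tensor_parallel_size=8" "$TASK" \
    --custom-tasks src/open_r1/evaluate.py --use-chat-template
```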
You can also launch an evaluation with `make evaluate`, specifying the model, task, and optionally the parallelism technique and number of GPUs.

To evaluate on a single GPU:
To use Data Parallelism:
To use Tensor Parallelism:
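For example (the variable names `MODEL`, `TASK`, `PARALLEL`, and `NUM_GPUS` are assumptions; check the Makefile for the exact target signature):

```shell
# Single GPU
make evaluate MODEL=deepseek-ai/DeepSeek-R1-Distill-Qwen-32B TASK=math_500

# Data parallelism across 8 GPUs
make evaluate MODEL=deepseek-ai/DeepSeek-R1-Distill-Qwen-32B TASK=math_500 PARALLEL=data NUM_GPUS=8

# Tensor parallelism across 8 GPUs
make evaluate MODEL=deepseek-ai/DeepSeek-R1-Distill-Qwen-32B TASK=math_500 PARALLEL=tensor NUM_GPUS=8
```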
Reproducing DeepSeek's evaluation results on MATH-500
We are able to reproduce DeepSeek's reported results on the MATH-500 benchmark:

| Model | MATH-500 (HF lighteval) | MATH-500 (DeepSeek Reported) |
| :--- | :---: | :---: |
| DeepSeek-R1-Distill-Qwen-1.5B | 81.6 | 83.9 |
| DeepSeek-R1-Distill-Qwen-7B | 91.8 | 92.8 |
| DeepSeek-R1-Distill-Qwen-14B | 94.2 | 93.9 |
| DeepSeek-R1-Distill-Qwen-32B | 95.0 | 94.3 |
| DeepSeek-R1-Distill-Llama-8B | 85.8 | 89.1 |
| DeepSeek-R1-Distill-Llama-70B | 93.4 | 94.5 |
To reproduce these results use the following command:
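A sketch of such a run; the task string, `max_model_length`, and other model arguments are assumptions, and generation parameters likely matter for matching the table above:

```shell
# Illustrative MATH-500 evaluation; model args and task string are assumptions
lighteval vllm "pretrained=deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B,dtype=bfloat16,max_model_length=32768" \
    "custom|math_500|0|0" \
    --custom-tasks src/open_r1/evaluate.py --use-chat-template
```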
Data generation
Generate data from a smol distilled R1 model
The following example can be run on 1×H100. First, install the following dependencies:
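Something along these lines; the extras and lack of version pins are assumptions, so prefer the repo's canonical requirements if they differ:

```shell
# Install distilabel with vLLM support (extras/pins are assumptions)
uv pip install "distilabel[vllm]"
```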
Now save the following snippet into a file named `pipeline.py` and run it with `python pipeline.py`. It will generate 4 outputs for each of the 10 examples (change the username for the repository to your org/user name):

Take a look at the sample dataset at HuggingFaceH4/numina-deepseek-r1-qwen-7b.
Generate data from DeepSeek-R1
To run the bigger DeepSeek-R1, we used 2 nodes, each with 8×H100 GPUs, using the Slurm file in this repo at `slurm/generate.slurm`. First, install the dependencies (for now we need to install the vLLM dev wheel that fixes the R1 CUDA graph capture):
And then run the following command:
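At its simplest this is a plain `sbatch` of the script; any arguments for the dataset, model, and output repo are defined inside `slurm/generate.slurm` itself, so check that file for the flags it accepts:

```shell
# Submit the generation job; pass flags as defined in slurm/generate.slurm
sbatch slurm/generate.slurm
```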
Contributing
Contributions are welcome. Please refer to https://github.com/huggingface/open-r1/issues/23.