feat(autodiff): implement prod and prod_dim (#5106)
The Autodiff backend never overrode float_prod/float_prod_dim, so it used the default from burn-backend: exp(sum(log(x))). That path returns NaN for any negative input (log is undefined there) and expands a single reduction into three graph nodes (log -> sum -> exp). The base backends compute the real signed product, so Autodiff and B disagreed on the same input.
Add dedicated float_prod/float_prod_dim ops. The forward computes the real product; the backward uses the analytic gradient grad * prod(x) / x. Follows the existing Sum and CumProd ops.
The backward divides by the input, so the gradient is NaN where an element is zero, same as cumprod (#3864). Left for a follow-up and covered by ignored tests.
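A minimal standalone sketch of the behavior described above, using plain `f64` slices rather than Burn tensors. The function names `prod_via_log`, `prod` and `prod_backward` are hypothetical and exist only for illustration; they are not the ops added by this change.

```rust
/// Old default path: exp(sum(log(x))). `ln` of a negative number is NaN,
/// so a single negative element poisons the whole result.
fn prod_via_log(xs: &[f64]) -> f64 {
    xs.iter().map(|x| x.ln()).sum::<f64>().exp()
}

/// Direct signed product, as the base backends compute it.
fn prod(xs: &[f64]) -> f64 {
    xs.iter().product()
}

/// Analytic backward: grad * prod(x) / x_i for every element.
/// Divides by the input, so a zero element yields 0 / 0 = NaN.
fn prod_backward(xs: &[f64], grad: f64) -> Vec<f64> {
    let p = prod(xs);
    xs.iter().map(|x| grad * p / x).collect()
}

fn main() {
    let xs = [2.0, -3.0, 4.0];
    println!("log path:    {}", prod_via_log(&xs));
    println!("direct:      {}", prod(&xs));
    println!("gradient:    {:?}", prod_backward(&xs, 1.0));

    // Known limitation: the true gradient at the zero element is 2 * 4 = 8,
    // but the division-based formula produces NaN there.
    let with_zero = [2.0, 0.0, 4.0];
    println!("with a zero: {:?}", prod_backward(&with_zero, 1.0));
}
```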
Closes #1458
Co-authored-by: leonard leonard@studio-vaai.com
Burn is both a tensor library and a deep learning framework, optimized for
numerical computing, training and inference.
Training and inference usually live in separate worlds. Models are typically trained in Python then exported to an open format like ONNX or optimized for production engines like vLLM, ONNX Runtime, or TensorRT. This export step is often brittle and lossy, ruling out complex architectures and advanced deployment use cases.
Burn unifies the two. By executing multi-platform tensor operations via a single, unified API, the exact code used for training is the exact code that runs in production. This makes workloads like on-device personalization and federated learning straightforward, while enabling teams to go from prototype to deployment in a single codebase.
Burn preserves the intuitive ergonomics of PyTorch, with dynamic shapes and graphs, but JIT-compiles streams of tensor operations, performing automatic kernel fusion. You get the flexibility of dynamic graphs without the performance drop.
Rust for Research?
Rust used to be a tough sell for research: long compilation times disrupted the fast edit-compile-run loop that draws researchers to Python. Burn changes this paradigm. Designed around incremental compilation, modifying model code recompiles in under 5 seconds, even in release mode. This delivers a Python-like feedback loop with the speed and safety of Rust.
Ecosystem
Burn is the core of a growing, fully open-source Rust AI ecosystem. You are not adopting a single library, you are joining a stack that spans GPU compute, model interop and domain toolkits, with plenty of room to help shape what comes next.
burn-store, burn-vision, burn-rl, burn-dataset
Burn’s CubeCL backends (CUDA, ROCm, Metal, Vulkan, WebGPU, CPU) compose with autodiff, fusion and remote-execution decorators, while external and simpler backends (LibTorch and pure-Rust CPU/no_std) compose with autodiff only. See Supported Backends below for the full matrix.
Every project here is open-source and actively developed. Want to help build the Rust AI ecosystem? The good first issues are a great place to start, and the Contributing guide will get you set up.
Community crates 🌱
These crates are not maintained by Tracel, but they are part of the same Rust AI story. Anything that helps you load data, build environments, or ship models belongs here. Built something that fits? Open a PR to add it!
Backend
Burn strives to be as fast as possible on as many hardware platforms as possible, with robust implementations. We believe this flexibility is crucial for modern needs, where you may train your models in the cloud and then deploy on customer hardware, which varies from user to user.
Supported Backends
Most backends support all operating systems, so we don’t mention them in the tables below.
GPU Backends:
CPU Backends:
Compared to other frameworks, Burn takes a very different approach to supporting many backends. By design, most code is generic over the Backend trait, which allows us to build Burn with swappable backends. This makes it possible to compose backends, augmenting them with additional functionality such as autodifferentiation and automatic kernel fusion.
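To make the idea of code that is generic over a backend more concrete, here is a small standalone sketch. It does not use Burn: the `Backend` trait below is a hypothetical, heavily simplified stand-in, and `IterBackend`, `IndexBackend` and `affine` are names invented for this illustration.

```rust
/// Hypothetical, simplified stand-in for a backend trait (not Burn's definition).
trait Backend {
    fn name(&self) -> &'static str;
    fn mul(&self, lhs: &[f32], rhs: &[f32]) -> Vec<f32>;
    fn add(&self, lhs: &[f32], rhs: &[f32]) -> Vec<f32>;
}

/// Toy backend that walks the slices with iterators.
struct IterBackend;

/// Toy backend that uses explicit indexing, standing in for a different device.
struct IndexBackend;

impl Backend for IterBackend {
    fn name(&self) -> &'static str {
        "iter"
    }
    fn mul(&self, lhs: &[f32], rhs: &[f32]) -> Vec<f32> {
        lhs.iter().zip(rhs).map(|(a, b)| a * b).collect()
    }
    fn add(&self, lhs: &[f32], rhs: &[f32]) -> Vec<f32> {
        lhs.iter().zip(rhs).map(|(a, b)| a + b).collect()
    }
}

impl Backend for IndexBackend {
    fn name(&self) -> &'static str {
        "index"
    }
    fn mul(&self, lhs: &[f32], rhs: &[f32]) -> Vec<f32> {
        (0..lhs.len().min(rhs.len())).map(|i| lhs[i] * rhs[i]).collect()
    }
    fn add(&self, lhs: &[f32], rhs: &[f32]) -> Vec<f32> {
        (0..lhs.len().min(rhs.len())).map(|i| lhs[i] + rhs[i]).collect()
    }
}

/// "Model" code written once against the trait; it never names a concrete backend.
fn affine<B: Backend>(backend: &B, x: &[f32], weight: &[f32], bias: &[f32]) -> Vec<f32> {
    backend.add(&backend.mul(x, weight), bias)
}

fn main() {
    let x: [f32; 3] = [1.0, 2.0, 3.0];
    let w: [f32; 3] = [0.5, 0.5, 0.5];
    let b: [f32; 3] = [1.0, 1.0, 1.0];
    // Swapping the backend changes only the first argument.
    println!("{}: {:?}", IterBackend.name(), affine(&IterBackend, &x, &w, &b));
    println!("{}: {:?}", IndexBackend.name(), affine(&IndexBackend, &x, &w, &b));
}
```

Because `affine` only depends on the trait bound, the same function body runs unchanged on either toy backend.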
Autodiff: Backend decorator that brings backpropagation to any backend 🔄
Contrary to the aforementioned backends, Autodiff is actually a backend decorator. This means that it cannot exist by itself; it must encapsulate another backend.
The simple act of wrapping a base backend with Autodiff transparently equips it with autodifferentiation support, making it possible to call backward on your model.
Of note, it is impossible to make the mistake of calling backward on a model that runs on a backend that does not support autodiff (for inference), as this method is only offered by an Autodiff backend.
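The decorator idea and the compile-time guarantee can be sketched without Burn. Everything below is a hypothetical simplification: the traits `Backend` and `AutodiffBackend`, the structs `Plain` and `Autodiff`, and the single `square` operation are illustrative only and do not reproduce Burn's actual definitions.

```rust
/// Hypothetical base trait: forward computation only.
trait Backend {
    fn square(&self, x: f64) -> f64;
}

/// Hypothetical extension trait: only backends implementing it can differentiate.
trait AutodiffBackend: Backend {
    fn square_backward(&self, x: f64, grad_out: f64) -> f64;
}

/// A plain backend, usable for inference.
struct Plain;

impl Backend for Plain {
    fn square(&self, x: f64) -> f64 {
        x * x
    }
}

/// Decorator: cannot exist on its own, it must wrap an inner backend.
struct Autodiff<B: Backend>(B);

impl<B: Backend> Backend for Autodiff<B> {
    fn square(&self, x: f64) -> f64 {
        // The forward pass is delegated to the wrapped backend.
        self.0.square(x)
    }
}

impl<B: Backend> AutodiffBackend for Autodiff<B> {
    fn square_backward(&self, x: f64, grad_out: f64) -> f64 {
        // d(x^2)/dx = 2x, scaled by the incoming gradient.
        2.0 * x * grad_out
    }
}

fn forward<B: Backend>(backend: &B, x: f64) -> f64 {
    backend.square(x)
}

fn backward<B: AutodiffBackend>(backend: &B, x: f64) -> f64 {
    backend.square_backward(x, 1.0)
}

fn main() {
    println!("plain forward:    {}", forward(&Plain, 3.0));
    println!("autodiff forward: {}", forward(&Autodiff(Plain), 3.0));
    println!("autodiff grad:    {}", backward(&Autodiff(Plain), 3.0));
    // backward(&Plain, 3.0); // rejected at compile time: `Plain` is not an `AutodiffBackend`
}
```

In this sketch the mistake described above is a type error rather than a runtime failure, since `backward` is bounded on the extension trait.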
See the Autodiff Backend README for more details.
Fusion: Backend decorator that brings kernel fusion to all first-party backends
This backend decorator enhances a backend with kernel fusion, provided that the inner backend supports it. Note that you can compose this backend with other backend decorators such as Autodiff. All first-party accelerated backends (like WGPU and CUDA) use Fusion by default (burn/fusion feature flag), so you typically don’t need to apply it manually.
Of note, we plan to implement automatic gradient checkpointing based on compute bound and memory bound operations, which will work gracefully with the fusion backend to make your code run even faster during training, see this issue.
See the Fusion Backend README for more details.
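As a rough intuition for what fusing a stream of element-wise operations buys you, here is a toy CPU sketch. It is not how the Fusion backend is implemented: the `Op` enum and the `run_unfused` and `run_fused` functions are hypothetical, and the returned pass count is only a stand-in for the number of kernel launches.

```rust
/// Hypothetical element-wise operations recorded in a stream.
#[derive(Clone, Copy)]
enum Op {
    AddScalar(f32),
    MulScalar(f32),
    Relu,
}

fn apply(op: Op, x: f32) -> f32 {
    match op {
        Op::AddScalar(s) => x + s,
        Op::MulScalar(s) => x * s,
        Op::Relu => x.max(0.0),
    }
}

/// Unfused: one full pass over the data and one intermediate buffer per op.
fn run_unfused(input: &[f32], ops: &[Op]) -> (Vec<f32>, usize) {
    let mut current = input.to_vec();
    let mut passes = 0;
    for &op in ops {
        current = current.iter().map(|&x| apply(op, x)).collect();
        passes += 1;
    }
    (current, passes)
}

/// Fused: a single pass that applies the whole op chain to each element.
fn run_fused(input: &[f32], ops: &[Op]) -> (Vec<f32>, usize) {
    let out = input
        .iter()
        .map(|&x| ops.iter().fold(x, |acc, &op| apply(op, acc)))
        .collect();
    (out, 1)
}

fn main() {
    let input = [-1.0, 0.5, 2.0];
    let ops = [Op::MulScalar(2.0), Op::AddScalar(-1.0), Op::Relu];
    let (unfused, unfused_passes) = run_unfused(&input, &ops);
    let (fused, fused_passes) = run_fused(&input, &ops);
    println!("unfused: {:?} in {} passes", unfused, unfused_passes);
    println!("fused:   {:?} in {} pass", fused, fused_passes);
}
```

Both paths produce the same values; the fused one reads and writes the data once instead of once per operation.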
Remote (Beta): Backend decorator for remote backend execution, useful for distributed computations
That backend has two parts, one client and one server. The client sends tensor operations over the network to a remote compute backend. You can use any first-party backend as server in a single line of code:
Training & Inference
The whole deep learning workflow is made easy with Burn, as you can monitor your training progress with an ergonomic dashboard, and run inference everywhere from embedded devices to large GPU clusters.
Burn was built from the ground up with training and inference in mind. It’s also worth noting how Burn, in comparison to frameworks like PyTorch, simplifies the transition from training to deployment, eliminating the need for code changes.
Click on the following sections to expand 👇
Training Dashboard 📈
As you can see in the previous video (click on the picture!), a new terminal UI dashboard based on the Ratatui crate allows users to follow their training with ease without having to connect to any external application.
You can visualize your training and validation metrics updating in real-time and analyze the lifelong progression or recent history of any registered metrics using only the arrow keys. Break from the training loop without crashing, allowing potential checkpoints to be fully written or important pieces of code to complete without interruption 🛡
ONNX Support 🐫
Burn supports importing ONNX (Open Neural Network Exchange) models through the burn-onnx crate, allowing you to easily port models from TensorFlow or PyTorch to Burn. The ONNX model is converted into Rust code that uses Burn’s native APIs, enabling the imported model to run on any Burn backend (CPU, GPU, WebAssembly) and benefit from all of Burn’s optimizations like automatic kernel fusion.
Our ONNX support is further described in this section of the Burn Book 🔥.
Importing PyTorch or Safetensors Models 🚚
You can load weights from PyTorch or Safetensors formats directly into your Burn-defined models. This makes it easy to reuse existing models while benefiting from Burn’s performance and deployment features.
Learn more in the Saving & Loading Models section of the Burn Book.
Inference in the Browser 🌐
Several of our backends can run in WebAssembly environments: Flex for CPU execution, and WGPU for GPU acceleration via WebGPU. This means that you can run inference directly within a browser. We provide several examples of this:
Embedded: no_std support ⚙️
Burn’s core components support no_std. This means they can run in bare-metal environments such as embedded devices without an operating system.
Benchmarks
To evaluate performance across different backends and track improvements over time, we provide a dedicated benchmarking suite.
Run and compare benchmarks using burn-bench.
Getting Started
Just heard of Burn? You are at the right place! Just continue reading this section and we hope you can get on board really quickly.
The Burn Book 🔥
To begin working effectively with Burn, it is crucial to understand its key components and philosophy. This is why we highly recommend that new users read the first sections of The Burn Book 🔥. It provides detailed examples and explanations covering every facet of the framework, from building blocks like tensors, modules, and optimizers all the way to advanced usage, like coding your own GPU kernels.
Examples 🙏
Let’s start with a code snippet that shows how intuitive the framework is to use! In the following, we declare a neural network module with some parameters along with its forward pass.
The repository contains a fairly large number of examples that show how to use the framework in different scenarios.
Following the book:
Module to train on the MNIST dataset and use for inference.
Learner.
Additional examples:
Learner progress.
Module (MLP) with the Learner configured to log metrics and keep training checkpoints.
For more practical insights, you can clone the repository and run any of them directly on your computer!
Pre-trained Models 🤖
We keep an updated and curated list of models and examples built with Burn, see the tracel-ai/models repository for more details.
Don’t see the model you want? Don’t hesitate to open an issue, and we may prioritize it. Built a model using Burn and want to share it? You can also open a Pull Request and add your model under the community section!
Why use Rust for AI? 🦀
Deep Learning is a special form of software where you need very high level abstractions as well as extremely fast execution time. Rust is the perfect candidate for that use case since it provides zero-cost abstractions to easily create neural network modules, and fine-grained control over memory to optimize every detail. To this day, the mainstream solution has been to offer APIs in Python but rely on bindings to low-level languages such as C/C++. This reduces portability, increases complexity and creates friction between researchers and engineers. Rust’s approach to abstractions is versatile enough to tackle this two-language dichotomy, and Cargo makes it easy to build, test and deploy from any environment, which is usually a pain in Python.
Rust’s AI ecosystem is young, but it is real and growing quickly. Foundational pieces are already here: Burn and CubeCL for training and compute, candle for inference, Hugging Face’s tokenizers and safetensors, and polars and ndarray for data. Betting on Rust today means betting on a stack that is growing, and one where contributors still shape the direction. The pieces that don’t exist yet are opportunities rather than dead-ends (see Contributing).
Rust is also what makes one-stack-everywhere possible: a single self-contained binary with no Python runtime to ship, running from servers down to no_std embedded targets.
Loading Model Records From Previous Versions ⚠️
In the event that you are trying to load a model record saved in a version older than 0.14.0, make sure to use a compatible version (0.14, 0.15 or 0.16) with the record-backward-compat feature flag.
Otherwise, the record won’t be deserialized correctly and you will get an error message. This error will also point you to the backward compatible feature flag.
The backward compatibility was maintained for deserialization when loading records. Therefore, as soon as you have saved the record again, it will be saved according to the new structure and you can upgrade back to the current version.
Please note that binary formats are not backward compatible. Thus, you will need to load your record in a previous version and save it in any of the other self-describing record formats (e.g., using the NamedMpkFileRecorder) before using a compatible version (as described) with the record-backward-compat feature flag.
Community
If you are excited about the project, don’t hesitate to join our Discord! We try to be as welcoming as possible to everybody from any background. You can ask your questions and share what you built with the community!
Contributing
Before contributing, please read the Contributing Guidelines and our Code of Conduct. The Contributor Book covers architecture, environment setup, and guides for common tasks.
Status
Burn is currently in active development, and there will be breaking changes. While any resulting issues are likely to be easy to fix, there are no guarantees at this stage.
License
Burn is distributed under the terms of both the MIT license and the Apache License (Version 2.0). See LICENSE-APACHE and LICENSE-MIT for details. Opening a pull request is assumed to signal agreement with these licensing terms.