Integrate AutoRound into Diffusers#13552

Merged
sayakpaul merged 24 commits into huggingface:main from xin3he:auto_round on Jun 10, 2026

xin3he commented Apr 23, 2026

Contributor

What does this PR do?

This pull request introduces support for the AutoRound quantization algorithm in the Diffusers library. AutoRound is a weight-only quantization method that enables efficient inference by optimizing weight rounding and min-max ranges, primarily targeting the W4A16 configuration (4-bit weights, 16-bit activations). The changes add a new quantization config, quantizer class, backend integration, and comprehensive documentation, while ensuring proper handling of optional dependencies.
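
To make the W4A16 idea concrete, here is a minimal sketch of group-wise 4-bit weight quantization. It uses plain round-to-nearest with a per-group min-max range, which is only the baseline that AutoRound improves on by tuning the rounding and the ranges; it is not the AutoRound algorithm and not code from this PR. The function names and the tiny group size are chosen for illustration.

```python
from typing import List, Tuple

Group = Tuple[List[int], float, float]  # (integer codes, scale, group minimum)


def quantize_group(weights: List[float], bits: int = 4) -> Group:
    """Round-to-nearest, asymmetric quantization of one weight group."""
    levels = (1 << bits) - 1  # 15 steps between min and max for 4 bits
    w_min, w_max = min(weights), max(weights)
    # A constant group has zero range; fall back to 1.0 to avoid dividing by zero.
    scale = (w_max - w_min) / levels or 1.0
    codes = [min(levels, max(0, round((w - w_min) / scale))) for w in weights]
    return codes, scale, w_min


def dequantize_group(group: Group) -> List[float]:
    """Map integer codes back to approximate floating-point weights."""
    codes, scale, w_min = group
    return [c * scale + w_min for c in codes]


def quantize(weights: List[float], group_size: int = 4, bits: int = 4) -> List[Group]:
    """Split the weights into consecutive groups and quantize each one."""
    return [
        quantize_group(weights[i:i + group_size], bits)
        for i in range(0, len(weights), group_size)
    ]


weights = [0.10, -0.40, 0.25, 0.90, 1.50, 1.70, 1.60, 1.55]
packed = quantize(weights, group_size=4, bits=4)
restored = [w for group in packed for w in dequantize_group(group)]
for original, approx in zip(weights, restored):
    print(f"{original:+.3f} -> {approx:+.3f}")
```

Each group keeps its own scale and minimum, so the reconstruction error of any weight is bounded by half of its group's step size. Activations stay in 16-bit, which is what the "A16" part refers to.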

Key changes:

AutoRound quantization support

  • Added a new AutoRoundConfig class to quantization_config.py for configuring AutoRound quantization parameters, including bits, group size, symmetry, backend, and modules to exclude from quantization.
  • Introduced the AutoRoundQuantizer class in quantizers/autoround/autoround_quantizer.py, implementing the logic for loading pre-quantized AutoRound models and integrating with the auto-round library.
  • Registered AutoRoundConfig and AutoRoundQuantizer in the quantization auto-mapping logic, enabling selection via the "auto-round" key.
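
The three bullets above describe a config class, a quantizer class, and a string key that ties them together. The sketch below shows that pattern in isolation. Every class, field, and mapping name in it is a hypothetical stand-in (the field names are guesses based on the parameters listed above), so it should not be read as the actual Diffusers API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SketchAutoRoundConfig:
    """Hypothetical stand-in for the real config; field names are assumptions."""
    quant_method: str = "auto-round"
    bits: int = 4
    group_size: int = 128
    sym: bool = True
    backend: str = "auto"
    modules_to_not_convert: Optional[List[str]] = None


class SketchAutoRoundQuantizer:
    """Hypothetical stand-in that only decides which modules get quantized."""

    def __init__(self, config: SketchAutoRoundConfig) -> None:
        self.config = config

    def should_quantize(self, module_name: str) -> bool:
        skipped = self.config.modules_to_not_convert or []
        return not any(name in module_name for name in skipped)


# The auto-mapping: one string key selects both the config and the quantizer.
CONFIG_MAPPING: Dict[str, type] = {"auto-round": SketchAutoRoundConfig}
QUANTIZER_MAPPING: Dict[str, type] = {"auto-round": SketchAutoRoundQuantizer}


def build_quantizer(config_dict: dict) -> SketchAutoRoundQuantizer:
    """Look up the classes by quant_method and instantiate them."""
    method = config_dict["quant_method"]
    if method not in CONFIG_MAPPING:
        raise ValueError(f"Unknown quantization method: {method!r}")
    config = CONFIG_MAPPING[method](**config_dict)
    return QUANTIZER_MAPPING[method](config)


quantizer = build_quantizer(
    {"quant_method": "auto-round", "bits": 4, "modules_to_not_convert": ["proj_out"]}
)
print(quantizer.should_quantize("transformer_blocks.0.attn.to_q"))
print(quantizer.should_quantize("proj_out"))
```

Keeping the lookup keyed on a string means a serialized quantization config stored with a checkpoint can pick the right quantizer at load time without the caller naming a class.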

Dependency management and import handling

  • Added is_auto_round_available utility and integrated it into the main import structure and conditional imports, ensuring that AutoRound features are only available if the dependency is installed. Dummy objects are provided otherwise.
  • Implemented a test utility require_auto_round_version_greater_or_equal for version-gated testing of AutoRound features.
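
For readers unfamiliar with optional-dependency gating, the following is a rough, standard-library-only sketch of the two utilities described above. The helper names here are illustrative and the version parsing is deliberately simplified; the real helpers in Diffusers may be implemented differently.

```python
import importlib.metadata
import importlib.util
import unittest
from typing import Tuple


def is_package_available(import_name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(import_name) is not None


def parse_version(version: str) -> Tuple[int, ...]:
    """Keep the leading numeric components: '0.5.1.dev0' -> (0, 5, 1)."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break  # simplification: stop at the first non-numeric component
        parts.append(int(piece))
    return tuple(parts)


def is_version_at_least(dist_name: str, minimum: str) -> bool:
    """Compare the installed distribution version against a minimum."""
    try:
        installed = importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        return False
    return parse_version(installed) >= parse_version(minimum)


def require_version_greater_or_equal(dist_name: str, minimum: str):
    """Decorator that skips a test unless the dependency is new enough."""
    return unittest.skipUnless(
        is_version_at_least(dist_name, minimum),
        f"test requires {dist_name}>={minimum}",
    )


print(is_package_available("json"))
print(is_version_at_least("surely-not-installed-pkg-xyz", "0.1"))
```

A test decorated with `require_version_greater_or_equal(...)` is reported as skipped rather than failed when the dependency is missing or too old, which keeps CI green on machines without the optional package.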

Documentation

  • Added a comprehensive user guide at docs/source/en/quantization/autoround.md, including usage examples, backend options, configuration details, and resource links.

These changes collectively enable seamless integration of AutoRound quantization into Diffusers, with robust configuration, backend selection, and user guidance.

Existing model

cc @wenhuach21 @thuang6 @hshen14

Before submitting

Who can review?

@yiyixuxu @asomoza @stevhliu @sayakpaul

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

xin3he added 3 commits April 10, 2026 15:19
Signed-off-by: Xin He <xin3.he@intel.com>
Signed-off-by: Xin He <xin3.he@intel.com>
Signed-off-by: Xin He <xin3.he@intel.com>
@github-actions github-actions Bot added documentation Improvements or additions to documentation quantization tests utils size/L PR with diff > 200 LOC labels Apr 23, 2026
@github-actions github-actions Bot added size/L PR with diff > 200 LOC and removed size/L PR with diff > 200 LOC labels Apr 23, 2026
@xin3he xin3he changed the title Auto round Integrate AutoRound into Diffusers Apr 23, 2026
@xin3he

xin3he commented Apr 23, 2026

Contributor Author
  • [2025/05] AutoRound has been integrated into vLLM: Usage, Medium blog, 小红书.

  • [2025/05] AutoRound has been integrated into Transformers: Blog.

I would like to integrate AutoRound into Diffusers to support diffusion models. Although the performance improvement is not significant, memory usage is notably reduced.
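
As a back-of-the-envelope illustration of the memory claim, the sketch below estimates weight storage for 16-bit versus 4-bit group-quantized weights. The parameter count, the group size of 128, and the per-group metadata sizes (a 16-bit scale and a 4-bit zero point) are assumptions for the arithmetic, not measurements from this PR.

```python
def bits_per_weight(
    bits: int = 4,
    group_size: int = 128,
    scale_bits: int = 16,
    zero_point_bits: int = 4,
) -> float:
    """Average storage per weight, including per-group scale and zero point."""
    return bits + (scale_bits + zero_point_bits) / group_size


def weight_gigabytes(num_params: int, bits_per_param: float) -> float:
    """Convert a parameter count and bit width into gigabytes (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


num_params = 12_000_000_000  # hypothetical model size, chosen for illustration
fp16_gb = weight_gigabytes(num_params, 16)
w4_gb = weight_gigabytes(num_params, bits_per_weight())

print(f"16-bit weights: {fp16_gb:.2f} GB")
print(f"4-bit weights:  {w4_gb:.2f} GB")
print(f"reduction:      {fp16_gb / w4_gb:.2f}x")
```

This only covers the quantized weights. Modules excluded from quantization, activations, and other pipeline components keep their original precision, so the end-to-end saving is smaller than the per-weight ratio suggests.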

@sayakpaul

Member

Thanks for this PR! Could you provide some example code using AutoRound and also some example outputs? Feel free to also report latency and memory consumption so that there's some signal into its effectiveness.

Cc: @SunMarc

@stevhliu stevhliu left a comment

Member

thanks for the integration!

Comment thread docs/source/en/quantization/autoround.md Outdated
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
@github-actions github-actions Bot added size/L PR with diff > 200 LOC and removed size/L PR with diff > 200 LOC labels Apr 24, 2026
Comment thread docs/source/en/quantization/autoround.md Outdated
@github-actions github-actions Bot added size/L PR with diff > 200 LOC and removed size/L PR with diff > 200 LOC labels Apr 24, 2026

@sayakpaul sayakpaul left a comment

Member

Just opened a small PR xin3he#1. I think we're ready to merge. Sorry for the delay!

@xin3he

xin3he commented Jun 8, 2026

Contributor Author

Thanks @sayakpaul
🎉

@sayakpaul

Member

Let's fix the dependency tests https://github.com/huggingface/diffusers/actions/runs/27119203832/job/80034333297

@sayakpaul

Member

We have more failures. Could you run make fix-copies?

@xin3he

xin3he commented Jun 9, 2026

Contributor Author

> We have more failures. Could you run make fix-copies?

Sure~

@xin3he

xin3he commented Jun 9, 2026

Contributor Author

@sayakpaul make fix-copies didn't modify any files; that's strange.

sayakpaul and others added 2 commits June 9, 2026 11:13
@sayakpaul

Member

Can you cherry-pick this commit?
0c2838a

@xin3he

xin3he commented Jun 9, 2026

Contributor Author

@sayakpaul I reproduced the error and fixed it. Thanks.

@xin3he

xin3he commented Jun 9, 2026

Contributor Author

I saw the failure happen again, WIP.

Signed-off-by: Xin He <xin3.he@intel.com>
@xin3he

xin3he commented Jun 9, 2026

Contributor Author

@sayakpaul Both make deps_table_check_updated and pytest tests/others/test_dependencies.py::TestDependencies::test_backend_registration are verified now.

@sayakpaul sayakpaul merged commit c31bb1c into huggingface:main Jun 10, 2026
18 of 19 checks passed
DN6 pushed a commit that referenced this pull request Jun 10, 2026
* support auto_round

Signed-off-by: Xin He <xin3.he@intel.com>

* add document and unit tests

Signed-off-by: Xin He <xin3.he@intel.com>

* fix CI

Signed-off-by: Xin He <xin3.he@intel.com>

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* update document and overwrite the default quantization_config with specified backend.

Signed-off-by: Xin He <xin3.he@intel.com>

* add UT and fix bug

Signed-off-by: Xin He <xin3.he@intel.com>

* update per comments

Signed-off-by: Xin He <xin3.he@intel.com>

* update per comments

Signed-off-by: Xin He <xin3.he@intel.com>

* fix compile error in doc

Signed-off-by: Xin He <xin3.he@intel.com>

* Apply style fixes

* small nits

* Add auto_round dependency to the versions table

Signed-off-by: Xin He <xin3.he@intel.com>

* fix make deps_table_check_updated

Signed-off-by: Xin He <xin3.he@intel.com>

* fix CI

Signed-off-by: Xin He <xin3.he@intel.com>

---------

Signed-off-by: Xin He <xin3.he@intel.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
sayakpaul added a commit that referenced this pull request Jun 16, 2026
* update

* update

* update

* update

* [CI] Refactor SD3 Transformer Test (#13340)

* update

* update

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* refactor unet tests (3d_condition, motion, controlnetxs) (#13897)

* refactor unet_3d_condition tests

* refactor unet_motion tests

* refactor unet_controlnetxs tests

* refactor unet_1d tests (#13898)

* refactor unet_1d tests

* use per-sample output_shape for unet_1d tests

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* refactor unet_2d tests (#13901)

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* [chore] log quant config to the user_agent (#13850)

log quant config to the user_agent

* Integrate AutoRound into Diffusers (#13552)

* [tests] refactor UNet model tests to align with the new pattern (#13153)

* refactor unet2d condition model tests.

* fix tests

* up

* fix

* Revert "fix"

This reverts commit 46d44b7.

* up

* recompile limit

* [tests] refactor test_models_unet_1d.py to use modular testing mixins

Refactor UNet1D model tests to follow the modern testing pattern using
BaseModelTesterConfig and focused mixin classes (ModelTesterMixin,
MemoryTesterMixin, TrainingTesterMixin, LoraTesterMixin).

Both UNet1D standard and RL variants now have separate config classes
and dedicated test classes organized by concern (core, memory, training,
LoRA, hub loading).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [tests] refactor test_models_unet_2d.py to use modular testing mixins

Refactor UNet2D model tests (standard, LDM, NCSN++) to follow the
modern testing pattern. Each variant gets its own config class and
dedicated test classes organized by concern (core, memory, training,
LoRA, hub loading).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [tests] refactor test_models_unet_3d_condition.py to use modular testing mixins

Refactor UNet3DConditionModel tests to follow the modern testing pattern
with separate classes for core, attention, memory, training, and LoRA.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [tests] refactor test_models_unet_controlnetxs.py to use modular testing mixins

Refactor UNetControlNetXSModel tests to follow the modern testing
pattern with separate classes for core, memory, training, and LoRA.
Specialized tests (from_unet, freeze_unet, forward_no_control,
time_embedding_mixing) remain in the core test class.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [tests] refactor test_models_unet_spatiotemporal.py to use modular testing mixins

Refactored the spatiotemporal UNet test file to follow the modern modular testing
pattern with BaseModelTesterConfig and focused test classes:

- UNetSpatioTemporalTesterConfig: Base configuration with model setup
- TestUNetSpatioTemporal: Core model tests (ModelTesterMixin, UNetTesterMixin)
- TestUNetSpatioTemporalAttention: Attention-related tests (AttentionTesterMixin)
- TestUNetSpatioTemporalMemory: Memory/offloading tests (MemoryTesterMixin)
- TestUNetSpatioTemporalTraining: Training tests (TrainingTesterMixin)
- TestUNetSpatioTemporalLoRA: LoRA adapter tests (LoraTesterMixin)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* remove test suites that are passed.

* fix consistencydecodervae tests

* Revert "fix consistencydecodervae tests"

This reverts commit 41b036b.

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* [tests] fix vidtok tests (#13894)

* fix vidtok tests

* style

* Update tests/models/autoencoders/test_models_autoencoder_vidtok.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Apply style fixes

---------

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* clean up

---------

Signed-off-by: Xin He <xin3.he@intel.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Akshan Krithick <97239696+akshan-main@users.noreply.github.com>
Co-authored-by: Xin He <xin3.he@intel.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
Labels

documentation Improvements or additions to documentation quantization size/L PR with diff > 200 LOC tests utils

8 participants