Include Qwen shared experts in MoE LoRA by vivekkalyan · Pull Request #720 · OpenPipe/ART

vivekkalyan · 2026-06-05T06:53:48Z

Summary

Include Qwen3.5/Qwen3.6 MoE shared-expert gate/up/down targets in the default LoRA target set
Wrap and export shared-expert LoRA weights alongside the routed grouped experts
Preserve shared-expert target modules when converting fused MoE adapters to vLLM format
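
To illustrate the first point, here is a minimal sketch of how shared-expert projections might be added to a default target set. Every name in it (the suffix constants, `default_lora_targets`, `select_target_modules`, and the example module paths) is hypothetical and chosen for illustration; none is taken from the ART code or confirmed against the actual Qwen module layout.

```python
# Hypothetical module-name suffixes; the real ART/Megatron names may differ.
ATTENTION_TARGETS = ("q_proj", "k_proj", "v_proj", "o_proj")
ROUTED_EXPERT_TARGETS = ("experts.gate_up_proj", "experts.down_proj")
SHARED_EXPERT_TARGETS = (
    "shared_expert.gate_proj",
    "shared_expert.up_proj",
    "shared_expert.down_proj",
)


def default_lora_targets(include_shared_experts: bool = True) -> tuple[str, ...]:
    """Return the suffixes of modules that should receive LoRA adapters."""
    targets = ATTENTION_TARGETS + ROUTED_EXPERT_TARGETS
    if include_shared_experts:
        targets += SHARED_EXPERT_TARGETS
    return targets


def select_target_modules(module_names: list[str], targets: tuple[str, ...]) -> list[str]:
    """Keep only module names whose dotted path ends with one of the targets."""
    return [
        name
        for name in module_names
        if any(name.endswith("." + target) for target in targets)
    ]


# Assumed example layout for one MoE layer (illustrative only).
EXAMPLE_MODULES = [
    "layers.0.self_attn.q_proj",
    "layers.0.mlp.gate",                     # router, not a LoRA target here
    "layers.0.mlp.experts.gate_up_proj",     # fused routed experts
    "layers.0.mlp.experts.down_proj",
    "layers.0.mlp.shared_expert.gate_proj",
    "layers.0.mlp.shared_expert.up_proj",
    "layers.0.mlp.shared_expert.down_proj",
    "layers.0.mlp.shared_expert_gate",       # scalar gate, left untouched
]

without_shared = select_target_modules(EXAMPLE_MODULES, default_lora_targets(False))
with_shared = select_target_modules(EXAMPLE_MODULES, default_lora_targets(True))
```

In this sketch the three shared-expert projections are selected only when `include_shared_experts` is true. The router and the shared-expert scalar gate are excluded either way.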

Sky H200 validation on art-qwen35-shared-expert-lora-test: created a rank-8 Qwen3.5 MoE ART adapter, published it to vLLM format, round-tripped it back into Megatron, compared its structure against the run57 Tinker adapter, and loaded it with the stock vLLM LoRA loader. Stock vLLM loaded the shared-expert gate/up/down weights and the fused routed experts successfully.
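
The PR does not show how the structural comparison was done. One way such a check could look, assuming each adapter is summarized as a mapping from tensor name to shape, is sketched below. `diff_adapter_structure` and the example tensor names are hypothetical.

```python
def diff_adapter_structure(
    a: dict[str, tuple[int, ...]],
    b: dict[str, tuple[int, ...]],
) -> dict[str, list[str]]:
    """Report tensor names missing on either side and names whose shapes differ."""
    shared_keys = a.keys() & b.keys()
    return {
        "only_in_a": sorted(a.keys() - b.keys()),
        "only_in_b": sorted(b.keys() - a.keys()),
        "shape_mismatch": sorted(k for k in shared_keys if a[k] != b[k]),
    }


# Illustrative shapes only: a rank-8 adapter that has a shared-expert tensor
# versus a reference adapter that lacks it.
candidate = {
    "layers.0.mlp.experts.down_proj.lora_A": (8, 1024),
    "layers.0.mlp.shared_expert.down_proj.lora_A": (8, 1024),
}
reference = {
    "layers.0.mlp.experts.down_proj.lora_A": (8, 1024),
}

report = diff_adapter_structure(candidate, reference)
```

An empty report on all three keys would mean the two adapters expose the same tensors with the same shapes. It says nothing about whether the weight values match.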

vivekkalyan added 3 commits June 5, 2026 09:51

fix: Include Qwen shared experts in MoE LoRA

c69df92

fix: Respect configured Megatron LoRA rank

1350bde

fix: Propagate Megatron LoRA config to trainer

ab68dd3

vivekkalyan force-pushed the codex/lora-target-presets branch from 1481185 to ab68dd3 June 5, 2026 16:52

vivekkalyan changed the base branch from main to fix/routing-replay-split-sizes June 5, 2026 16:52