Include Qwen shared experts in MoE LoRA #720

Open
vivekkalyan wants to merge 3 commits into fix/routing-replay-split-sizes from codex/lora-target-presets

Conversation

@vivekkalyan (Collaborator) commented Jun 5, 2026

Summary

  • include Qwen3.5/Qwen3.6 MoE shared-expert gate/up/down targets in the default LoRA target set
  • wrap and export shared-expert LoRA weights alongside routed grouped experts
  • preserve shared-expert target modules when converting fused MoE adapters to vLLM format
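
As a rough illustration of the first bullet, the sketch below shows one way a default target set could be widened to cover shared-expert projections. Everything in it is an assumption made for illustration: the module paths follow the Hugging Face-style `mlp.shared_expert.{gate,up,down}_proj` naming, the attention baseline is a placeholder, and `select_lora_targets` is a hypothetical helper, not a function from this PR.

```python
import re

# Hypothetical name patterns; the real module names in ART/Megatron may differ.
ATTENTION_TARGETS = re.compile(r"\.self_attn\.(q_proj|k_proj|v_proj|o_proj)$")
SHARED_EXPERT_TARGETS = re.compile(
    r"\.mlp\.shared_expert\.(gate_proj|up_proj|down_proj)$"
)


def select_lora_targets(module_names, include_shared_experts=True):
    """Return the module names that would receive LoRA adapters."""
    patterns = [ATTENTION_TARGETS]
    if include_shared_experts:
        patterns.append(SHARED_EXPERT_TARGETS)
    return [
        name
        for name in module_names
        if any(pattern.search(name) for pattern in patterns)
    ]


# Illustrative module names for a single decoder layer (assumed, not from the PR).
names = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.mlp.shared_expert.gate_proj",
    "model.layers.0.mlp.shared_expert.up_proj",
    "model.layers.0.mlp.shared_expert.down_proj",
    "model.layers.0.mlp.shared_expert_gate",  # scalar gate, not a LoRA target here
    "model.layers.0.mlp.gate",  # router, not a LoRA target here
]

with_shared = select_lora_targets(names)
without_shared = select_lora_targets(names, include_shared_experts=False)
```

The pattern requires a dot after `shared_expert`, so a sibling module such as `shared_expert_gate` is left out of the target set in this sketch.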

Validation

  • Sky H200 validation on art-qwen35-shared-expert-lora-test: created a rank-8 Qwen3.5 MoE ART adapter, published it to vLLM format, round-tripped it back into Megatron, compared its structure against the run57 Tinker adapter, and loaded it with the stock vLLM LoRA loader. Stock vLLM loaded the shared-expert gate/up/down weights and the fused routed experts successfully.

@vivekkalyan force-pushed the codex/lora-target-presets branch from 1481185 to ab68dd3 on June 5, 2026 16:52
@vivekkalyan changed the base branch from main to fix/routing-replay-split-sizes on June 5, 2026 16:52