Issues · lablup/mlxcel · GitHub


fix(vision): captured slice_update in DeepStack/section-assembly injection loops (qwen3_vl, qwen3_vl_moe, qwen2_vl, paddleocr_vl)

#650

· inureyes opened on Jul 3, 2026

CUDA: long-prompt prefill aborts (exit 134, invalid launch config) when a kernel grid dim exceeds 65535

#648

· inureyes opened on Jul 3, 2026
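For context on #648: CUDA caps `gridDim.y` and `gridDim.z` at 65535 blocks, while `gridDim.x` allows up to 2^31 - 1. A launch that maps a long-prompt dimension (for example, one block per token row) onto y or z therefore fails with an invalid-configuration error once that dimension passes 65535. One common workaround is to fold the excess into a second grid axis and rebuild a linear block index inside the kernel. The host-side sketch below assumes a 1-D logical block count. `split_grid` is a hypothetical helper, not an mlxcel function, and this is not necessarily how the issue was resolved.

```python
MAX_GRID_X = 2**31 - 1  # CUDA limit for gridDim.x
MAX_GRID_YZ = 65535     # CUDA limit for gridDim.y and gridDim.z


def split_grid(num_blocks: int) -> tuple[int, int]:
    """Hypothetical helper: return (grid_x, grid_y) with grid_x * grid_y >= num_blocks
    and grid_y <= 65535, so a y-mapped launch never exceeds the hardware limit."""
    if num_blocks <= 0:
        raise ValueError("num_blocks must be positive")
    if num_blocks <= MAX_GRID_YZ:
        return (1, num_blocks)  # fits directly on the y axis
    # Smallest grid_x such that ceil(num_blocks / grid_x) <= 65535.
    grid_x = -(-num_blocks // MAX_GRID_YZ)
    if grid_x > MAX_GRID_X:
        raise ValueError("block count exceeds what a 2-D grid can address")
    grid_y = -(-num_blocks // grid_x)  # ceil division
    return (grid_x, grid_y)


if __name__ == "__main__":
    for n in (1000, 65535, 65536, 100_000):
        print(n, split_grid(n))
```

Because `grid_x * grid_y` can overshoot the real block count, the kernel side would recover the index as `block = blockIdx.y * gridDim.x + blockIdx.x` and return early when `block >= num_blocks`.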

perf(speculative): CUDA drafter pairing matrix and continuous-batching integration

priority:medium

type:performance

#638

· inureyes opened on Jul 2, 2026

perf(cuda/quant): sm_120/sm_121-tuned quantized GEMM for Blackwell prefill

priority:medium

type:performance

#637

· inureyes opened on Jul 2, 2026

perf(cuda): single-dtype decode graph: eliminate per-token AsType conversions

priority:medium

type:performance

#636

· inureyes opened on Jul 2, 2026

perf(cuda/kv): validate and enable quantized KV cache modes on CUDA

priority:medium

type:performance

#635

· inureyes opened on Jul 2, 2026

perf(cuda/attn): port the native paged-attention decode kernel to CUDA

priority:medium

type:performance

#634

· inureyes opened on Jul 2, 2026

perf(server): incremental detokenization and per-token streaming overhead reduction

priority:medium

type:performance

#633

· inureyes opened on Jul 2, 2026
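For context on #633: incremental detokenization avoids re-decoding the whole output on every generated token, which costs O(n^2) work over a long response. The sketch below follows the prefix/read-offset window scheme used by some open-source inference servers (vLLM uses a similar approach). Each step decodes only a short trailing window, emits the newly completed suffix, and holds output back while the window ends in U+FFFD, which signals a multi-byte UTF-8 character split across tokens. The class name and the toy byte-level vocabulary are illustrative assumptions, not mlxcel's tokenizer API.

```python
from typing import Callable, Sequence


class IncrementalDetokenizer:
    """Hypothetical streaming detokenizer using a prefix/read-offset window."""

    def __init__(self, decode: Callable[[Sequence[int]], str]):
        self.decode = decode
        self.tokens: list[int] = []
        self.prefix_offset = 0  # start of the window that gets re-decoded
        self.read_offset = 0    # tokens whose text has already been emitted

    def push(self, token_id: int) -> str:
        """Append one token and return only the newly completed text (may be "")."""
        self.tokens.append(token_id)
        prefix_text = self.decode(self.tokens[self.prefix_offset:self.read_offset])
        new_text = self.decode(self.tokens[self.prefix_offset:])
        # A trailing U+FFFD means the last token ends mid-character; wait for more.
        if len(new_text) > len(prefix_text) and not new_text.endswith("\ufffd"):
            delta = new_text[len(prefix_text):]
            self.prefix_offset = self.read_offset
            self.read_offset = len(self.tokens)
            return delta
        return ""


# Toy byte-level vocabulary: "é" (0xC3 0xA9) is split across tokens 1 and 2.
VOCAB = {0: b"h", 1: b"\xc3", 2: b"\xa9", 3: b"llo"}


def toy_decode(ids: Sequence[int]) -> str:
    return b"".join(VOCAB[i] for i in ids).decode("utf-8", errors="replace")


if __name__ == "__main__":
    detok = IncrementalDetokenizer(toy_decode)
    print([detok.push(t) for t in (0, 1, 2, 3)])
```

Streaming the deltas in order reproduces the full text, while the per-token decode cost stays bounded by the window size instead of growing with the output length.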

perf(server): port lookahead async_eval pipelining into BatchScheduler decode

priority:medium

type:performance

#632

· inureyes opened on Jul 2, 2026

perf(cuda/ssm): close the hybrid-SSM decode gap on CUDA

priority:medium

type:performance

#631

· inureyes opened on Jul 2, 2026

fix(cuda/quant): repair nvfp4 execution path on CUDA

priority:medium

#630

· inureyes opened on Jul 2, 2026

perf(cuda/moe): fix MoE prefill collapse on CUDA (gather_qmm / grouped GEMM path)

priority:medium

type:performance

#629

· inureyes opened on Jul 2, 2026