Real Matrix-style logit conditioning for DiffusionGemma's diffusion canvas.
MatrixGemma patches the llama.cpp DiffusionGemma entropy-bound sampler so early denoising steps are biased toward kana / Matrix-looking glyph tokens, then smoothly released back to normal model logits. The live --diffusion-visual canvas is the model's actual denoising state, not a post-processing animation.
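To make "kana / Matrix-looking glyph tokens" concrete, here is a minimal Python sketch of how such tokens could be picked out of a vocabulary. The patch's real selection rule lives in the C++ sampler and is not reproduced here; the Unicode ranges, the helper names (`is_matrix_token`, `matrix_token_ids`), and the toy vocabulary are all assumptions made for illustration.

```python
# Illustrative only: the actual patch may select glyph tokens differently.
KANA_RANGES = (
    (0x3040, 0x309F),  # Hiragana block
    (0x30A0, 0x30FF),  # Katakana block
    (0xFF66, 0xFF9F),  # Halfwidth Katakana letters
)


def is_matrix_glyph(ch: str) -> bool:
    """True if a single character falls inside one of the kana ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in KANA_RANGES)


def is_matrix_token(piece: str) -> bool:
    """True if the token text, ignoring surrounding whitespace, is all kana."""
    text = piece.strip()
    return bool(text) and all(is_matrix_glyph(c) for c in text)


def matrix_token_ids(vocab: dict) -> list:
    """Return the sorted ids of tokens whose text is entirely kana.

    `vocab` maps token id -> token text (a hypothetical, simplified view
    of a tokenizer vocabulary).
    """
    return sorted(i for i, piece in vocab.items() if is_matrix_token(piece))


if __name__ == "__main__":
    toy_vocab = {0: "the", 1: "ﾈ", 2: "か", 3: "a カ", 4: " ネオ"}
    print(matrix_token_ids(toy_vocab))
```

A set of ids like this is what a sampler would need in order to bias only the glyph tokens and leave every other logit untouched.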
Changed:
- llama.cpp sampler code, via patches/matrix-diffusiongemma.patch.
- Initial canvas sampling for DiffusionGemma entropy-bound decoding.
- Renoising tokens for non-accepted canvas positions.
- Per-step logits, with a decaying Matrix-token bias.
- CLI flags: --diffusion-matrix, --diffusion-matrix-fraction, --diffusion-matrix-bias.
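The "decaying Matrix-token bias" can be sketched in a few lines of Python. This is not the patch's code: the linear decay shape, the rounding of the decay window, and the function names are assumptions, chosen only to show how an additive bias that fades over the early steps shifts probability toward glyph tokens and then hands control back to the model's own logits.

```python
import math


def matrix_bias_at(step, total_steps, max_bias, fraction):
    """Bias for a given denoising step.

    Assumed schedule: linear decay from max_bias to 0 over the first
    `fraction` of the steps, then exactly 0 for the remaining steps.
    """
    decay_steps = max(1, round(total_steps * fraction))
    if step >= decay_steps:
        return 0.0
    return max_bias * (1.0 - step / decay_steps)


def apply_matrix_bias(logits, glyph_ids, bias):
    """Return a copy of `logits` with `bias` added at the glyph token ids."""
    biased = list(logits)
    for i in glyph_ids:
        biased[i] += bias
    return biased


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def glyph_mass(logits, glyph_ids, bias):
    """Total probability assigned to glyph tokens after biasing."""
    probs = softmax(apply_matrix_bias(logits, glyph_ids, bias))
    return sum(probs[i] for i in glyph_ids)


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.0, -1.0]  # toy vocabulary of four tokens
    glyph_ids = [2, 3]              # pretend the last two are kana tokens
    for step in (0, 5, 10, 47):
        bias = matrix_bias_at(step, total_steps=48, max_bias=6.0, fraction=0.20)
        print(step, round(bias, 2), round(glyph_mass(logits, glyph_ids, bias), 3))
```

Because the bias is additive in logit space, it reaches zero at the end of the decay window and the later steps sample from unmodified model logits.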
Not changed:
- No DiffusionGemma model weights are modified.
- No model files are committed to this repo.
- The local 32 GB macOS path does not use the raw Transformers checkpoint.
Requirements:
- macOS with Apple Silicon recommended.
- Around 32 GB unified memory for the Q4_K_M GGUF.
- git, cmake, and Python 3.
- Hugging Face access for the Unsloth DiffusionGemma GGUF download.
Build the patched DiffusionGemma llama.cpp CLI:

    ./scripts/bootstrap_macos_llamacpp.sh

Download the 4-bit GGUF:

    ./scripts/download_diffusiongemma_q4.sh

Verify the binary has the Matrix flags:

    ./scripts/verify_matrix_patch.sh

Ask a question:

    ./scripts/ask_matrix_gemma.sh "What is the weirdest fact about black holes?"

The wrapper runs:
- llama-diffusion-cli
- the local Q4_K_M GGUF
- entropy-bound DiffusionGemma decoding
- live diffusion visual mode
- Matrix logit conditioning
Cleaner answer:

    MATRIX_BIAS=6 MATRIX_FRACTION=0.20 ./scripts/ask_matrix_gemma.sh "Explain gravity simply."

More dramatic Matrix phase:

    MATRIX_BIAS=14 MATRIX_FRACTION=0.40 ./scripts/ask_matrix_gemma.sh "Say hello from the simulation."

Useful knobs:
- MATRIX_BIAS: early additive logit bias for Matrix glyph tokens. Try 6 to 14.
- MATRIX_FRACTION: portion of denoising steps where the bias decays. Try 0.20 to 0.35.
- MATRIX_STEPS: max entropy-bound denoising steps. 48 is the quality default.
- MATRIX_BLOCKS: number of 256-token canvases. 1 is best for short answers.
- N_GPU_LAYERS: llama.cpp Metal offload layers. Default is 99.
- N_PREDICT: requested token budget. Default is 256.
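As a rough picture of how the environment knobs could become CLI arguments, the Python sketch below assembles a command line. It is not the contents of ask_matrix_gemma.sh. The assumptions are: `--diffusion-matrix` is a plain on/off switch, `-m`, `-p`, `-n`, and `-ngl` are the usual llama.cpp options for model path, prompt, token budget, and GPU layers, and the model path is a hypothetical placeholder. MATRIX_STEPS and MATRIX_BLOCKS are left out because this README does not name their flags.

```python
import os
import shlex


def build_command(prompt, model_path, env=None):
    """Assemble an argument list for the patched CLI (illustrative sketch)."""
    env = os.environ if env is None else env
    cmd = [
        "llama-diffusion-cli",
        "-m", model_path,                        # placeholder path, see below
        "-p", prompt,
        "-n", env.get("N_PREDICT", "256"),       # default stated in the README
        "-ngl", env.get("N_GPU_LAYERS", "99"),   # default stated in the README
        "--diffusion-visual",
        "--diffusion-matrix",                    # assumed to be a boolean switch
    ]
    # Only pass the Matrix knobs when the caller set them, so the binary's
    # own defaults apply otherwise.
    if "MATRIX_FRACTION" in env:
        cmd += ["--diffusion-matrix-fraction", env["MATRIX_FRACTION"]]
    if "MATRIX_BIAS" in env:
        cmd += ["--diffusion-matrix-bias", env["MATRIX_BIAS"]]
    return cmd


if __name__ == "__main__":
    demo_env = {"MATRIX_BIAS": "6", "MATRIX_FRACTION": "0.20"}
    # "path/to/model.gguf" is a hypothetical location, not the repo's real one.
    argv = build_command("Explain gravity simply.", "path/to/model.gguf", demo_env)
    print(shlex.join(argv))
```

Building the command as a list, rather than as one formatted string, keeps prompts that contain spaces or quotes intact when the list is handed to something like `subprocess.run`.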
The Hugging Face Transformers path is easier to hack in Python, but the unquantized google/diffusiongemma-26B-A4B-it model is not the friendly 32 GB Mac route. The practical path here is GGUF + llama.cpp + a small sampler patch.
See docs/TECHNICAL.md for implementation details and docs/USAGE.md for more commands.
matrix_rain.py is a separate terminal animation that does not affect model sampling:

    python3 matrix_rain.py --text "Wake up, Neo. The Matrix has you."

It is included as a lightweight visual demo, but the real project is the patched diffusion sampler.
MIT. The downloaded model and cloned llama.cpp project retain their own upstream licenses.