MatrixGemma

Real Matrix-style logit conditioning for DiffusionGemma's diffusion canvas.

MatrixGemma patches the entropy-bound sampler that llama.cpp uses for DiffusionGemma, so early denoising steps are biased toward kana and other Matrix-looking glyph tokens. The bias then decays smoothly until the sampler is back on the model's normal logits. The live --diffusion-visual canvas shows the model's actual denoising state, not a post-processing animation.
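The patch decides which vocabulary entries count as "Matrix-looking". The sketch below illustrates one plausible rule and is not taken from patches/matrix-diffusiongemma.patch. It assumes a token qualifies when every character of its text is hiragana, katakana (full-width or half-width), a digit, or one of a few symbols, after dropping the SentencePiece word-boundary marker "▁". The names `is_matrix_glyph` and `matrix_token_ids` and the symbol set are hypothetical.

```python
# Hypothetical rule for "Matrix-looking" tokens; the real patch may use a different set.
HIRAGANA = range(0x3040, 0x30A0)
KATAKANA = range(0x30A0, 0x3100)
HALFWIDTH_KATAKANA = range(0xFF66, 0xFFA0)
EXTRA_GLYPHS = set("0123456789:.\"=*+-<>|")


def is_matrix_glyph(piece: str) -> bool:
    """True if every character of the token text is kana, a digit, or an extra glyph."""
    text = piece.replace("\u2581", "")  # assumption: SentencePiece-style "▁" marker
    if not text:
        return False
    for ch in text:
        cp = ord(ch)
        if cp in HIRAGANA or cp in KATAKANA or cp in HALFWIDTH_KATAKANA:
            continue
        if ch in EXTRA_GLYPHS:
            continue
        return False
    return True


def matrix_token_ids(vocab: list[str]) -> list[int]:
    """Indices of vocabulary entries that would receive the Matrix bias."""
    return [i for i, piece in enumerate(vocab) if is_matrix_glyph(piece)]


if __name__ == "__main__":
    toy_vocab = ["▁the", "ア", "ｱｲ", "42", "hello", "▁カタ", "の"]
    print(matrix_token_ids(toy_vocab))
```

In this sketch the id list is computed once per vocabulary and reused at every step. That is a natural design, since the set of glyph tokens does not change during decoding.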

What This Changes

Changed:

llama.cpp sampler code, via patches/matrix-diffusiongemma.patch.
Initial canvas sampling for DiffusionGemma entropy-bound decoding.
Renoising tokens for non-accepted canvas positions.
Per-step logits, with a decaying Matrix-token bias.
CLI flags: --diffusion-matrix, --diffusion-matrix-fraction, --diffusion-matrix-bias.
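The per-step logit change and the renoising change listed above both come down to adding a scalar to the logits of a fixed set of token ids before sampling. Here is a minimal sketch of that idea for a single canvas position. It assumes logits arrive as a list of floats and that a renoised position is drawn from the softmax of the biased logits. `bias_logits` and `sample_renoise_token` are hypothetical names, and the real patch works on llama.cpp's C++ sampler data rather than Python lists.

```python
import math
import random


def bias_logits(logits, matrix_ids, bias):
    """Return a copy of `logits` with `bias` added at every Matrix-token id."""
    out = list(logits)
    for i in matrix_ids:
        out[i] += bias
    return out


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sample_renoise_token(logits, matrix_ids, bias, rng):
    """Draw a replacement token for a non-accepted canvas position (assumed scheme)."""
    probs = softmax(bias_logits(logits, matrix_ids, bias))
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.5, 0.0]  # toy 4-token vocabulary
    matrix_ids = [2, 3]            # pretend ids 2 and 3 are kana glyphs
    rng = random.Random(0)
    for bias in (10.0, 0.0):
        probs = softmax(bias_logits(logits, matrix_ids, bias))
        mass = probs[2] + probs[3]
        token = sample_renoise_token(logits, matrix_ids, bias, rng)
        print(f"bias={bias:4.1f}  P(matrix glyph)={mass:.3f}  sample={token}")
```

With a large bias nearly all probability mass moves onto the glyph tokens. With zero bias the model's own preferences are untouched, which is why decaying the bias hands control back to the model.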

Not changed:

No DiffusionGemma model weights are modified.
No model files are committed to this repo.
The local 32 GB macOS path does not use the raw Transformers checkpoint.

Requirements

macOS with Apple Silicon recommended.
Around 32 GB of unified memory for the Q4_K_M GGUF.
git, cmake, and Python 3.
Hugging Face access for the Unsloth DiffusionGemma GGUF download.

Setup

Build the patched DiffusionGemma llama.cpp CLI:

./scripts/bootstrap_macos_llamacpp.sh

Download the 4-bit GGUF:

./scripts/download_diffusiongemma_q4.sh

Verify the binary has the Matrix flags:

./scripts/verify_matrix_patch.sh
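verify_matrix_patch.sh is the supported check. The Python sketch below shows what such a check can involve and is not the contents of that script. It assumes the patched binary lists its options under --help, as upstream llama.cpp tools do. The script name check_matrix_flags.py and its function names are hypothetical, and you pass the binary path yourself. The match is whole-flag, so --diffusion-matrix-fraction alone does not count as --diffusion-matrix.

```python
import re
import subprocess
import sys

MATRIX_FLAGS = (
    "--diffusion-matrix",
    "--diffusion-matrix-fraction",
    "--diffusion-matrix-bias",
)


def missing_matrix_flags(help_text: str) -> list[str]:
    """Return the Matrix flags that do not appear as whole flags in the help text."""
    missing = []
    for flag in MATRIX_FLAGS:
        # Reject matches that are only a prefix of a longer flag name.
        pattern = r"(?<![\w-])" + re.escape(flag) + r"(?![\w-])"
        if not re.search(pattern, help_text):
            missing.append(flag)
    return missing


def check_binary(path: str) -> int:
    """Run `<path> --help` and report missing flags; returns a shell-style exit code."""
    result = subprocess.run([path, "--help"], capture_output=True, text=True)
    missing = missing_matrix_flags(result.stdout + result.stderr)
    if missing:
        print("missing Matrix flags:", ", ".join(missing))
        return 1
    print("all Matrix flags present")
    return 0


if __name__ == "__main__":
    if len(sys.argv) == 2:
        sys.exit(check_binary(sys.argv[1]))
    print("usage: python3 check_matrix_flags.py PATH/TO/llama-diffusion-cli")
```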

Ask A Question

./scripts/ask_matrix_gemma.sh "What is the weirdest fact about black holes?"

The wrapper runs llama-diffusion-cli with:

the local Q4_K_M GGUF model
entropy-bound DiffusionGemma decoding
the live diffusion visual mode
Matrix logit conditioning

Tune The Effect

Cleaner answer:

MATRIX_BIAS=6 MATRIX_FRACTION=0.20 ./scripts/ask_matrix_gemma.sh "Explain gravity simply."

More dramatic Matrix phase:

MATRIX_BIAS=14 MATRIX_FRACTION=0.40 ./scripts/ask_matrix_gemma.sh "Say hello from the simulation."

Useful knobs:

MATRIX_BIAS: early additive logit bias for Matrix glyph tokens. Try 6 to 14.
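MATRIX_FRACTION: fraction of denoising steps over which the bias decays to zero. Try 0.20 to 0.35; larger values such as 0.40 give a longer, more dramatic Matrix phase.
MATRIX_STEPS: maximum number of entropy-bound denoising steps. The default of 48 is tuned for quality.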
MATRIX_BLOCKS: number of 256-token canvases. 1 is best for short answers.
N_GPU_LAYERS: llama.cpp Metal offload layers. Default is 99.
N_PREDICT: requested token budget. Default is 256.
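MATRIX_BIAS and MATRIX_FRACTION together describe a bias that starts strong and is released over part of the run. The exact curve lives in the patch. This sketch assumes a straight-line decay from MATRIX_BIAS at the first step to zero after round(MATRIX_FRACTION × MATRIX_STEPS) steps. `matrix_bias_at` is a hypothetical helper, and the example values come from the "Cleaner answer" command above together with the MATRIX_STEPS default.

```python
def matrix_bias_at(step: int, total_steps: int, bias: float, fraction: float) -> float:
    """Extra logit added to Matrix-glyph tokens at 0-based denoising `step`.

    Hypothetical schedule: a straight line from `bias` at step 0 down to 0 at
    step round(fraction * total_steps); no bias from that step onward.
    """
    decay_steps = round(fraction * total_steps)
    if decay_steps <= 0 or step >= decay_steps:
        return 0.0
    return bias * (1.0 - step / decay_steps)


if __name__ == "__main__":
    # MATRIX_BIAS=6 MATRIX_FRACTION=0.20 with the default MATRIX_STEPS=48.
    bias, fraction, steps = 6.0, 0.20, 48
    for s in range(steps):
        b = matrix_bias_at(s, steps, bias, fraction)
        if b == 0.0:
            print(f"step {s:2d}: bias released; plain model logits from here on")
            break
        print(f"step {s:2d}: +{b:.2f}")
```

Under this assumed linear schedule, the cleaner-answer settings bias only the first 10 of 48 steps. A cosine ramp would be an equally reasonable reading of "smoothly released".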

Why Not Transformers?

The Hugging Face Transformers path is easier to hack in Python, but the unquantized google/diffusiongemma-26B-A4B-it checkpoint is too large for a 32 GB Mac. The practical path here is GGUF + llama.cpp + a small sampler patch.
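A quick back-of-the-envelope check shows why precision matters here. The sketch assumes the "26B" in the model name is the total parameter count and counts weights only. KV cache, activations, and the operating system all need memory on top of this. Q4_K_M mixes quantization types, so its real size sits somewhat above the nominal 4-bit figure.

```python
def weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 26e9  # assumption: "26B" in the model name is the total parameter count

for label, bits in [("bf16 / fp16", 16), ("8-bit", 8), ("nominal 4-bit", 4)]:
    print(f"{label:>14}: ~{weight_gb(N_PARAMS, bits):.0f} GB of weights")
```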

See docs/TECHNICAL.md for implementation details and docs/USAGE.md for more commands.

Optional Visual-Only Demo

matrix_rain.py is a separate terminal animation that does not affect model sampling:

python3 matrix_rain.py --text "Wake up, Neo. The Matrix has you."

It is included as a lightweight visual demo, but the real project is the patched diffusion sampler.

License

MIT. The downloaded model and cloned llama.cpp project retain their own upstream licenses.
