antonvice/matrixgemma

MatrixGemma

Real Matrix-style logit conditioning for DiffusionGemma's diffusion canvas.

MatrixGemma patches the llama.cpp DiffusionGemma entropy-bound sampler so that early denoising steps are biased toward kana / Matrix-looking glyph tokens. The bias then decays smoothly until sampling runs on the normal model logits again. The live --diffusion-visual canvas is the model's actual denoising state, not a post-processing animation.
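
The idea can be sketched in a few lines of Python. This is only an illustration, not the code in the patch: the cosine decay shape and the function names `matrix_bias_at` and `condition_logits` are assumptions, since this README only says the bias is additive and decays over a fraction of the denoising steps.

```python
import math


def matrix_bias_at(step, total_steps, fraction, bias):
    """Additive bias for Matrix tokens at a given denoising step.

    Assumption: cosine decay from `bias` at step 0 down to 0 once
    `fraction * total_steps` steps have elapsed. The real patch may
    use a different schedule.
    """
    decay_steps = fraction * total_steps
    if decay_steps <= 0 or step >= decay_steps:
        return 0.0
    return bias * 0.5 * (1.0 + math.cos(math.pi * step / decay_steps))


def condition_logits(logits, matrix_ids, step, total_steps, fraction, bias):
    """Return a copy of `logits` with the decayed bias added at `matrix_ids`."""
    current = matrix_bias_at(step, total_steps, fraction, bias)
    out = list(logits)
    for token_id in matrix_ids:
        out[token_id] += current
    return out
```

With 48 steps and a fraction of 0.25, this sketch applies the full bias at step 0 and none from step 12 onward, so the later steps see unmodified logits.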

What This Changes

Changed:

  • llama.cpp sampler code, via patches/matrix-diffusiongemma.patch.
  • Initial canvas sampling for DiffusionGemma entropy-bound decoding.
  • Renoising tokens for non-accepted canvas positions.
  • Per-step logits, with a decaying Matrix-token bias.
  • CLI flags: --diffusion-matrix, --diffusion-matrix-fraction, --diffusion-matrix-bias.

Not changed:

  • No DiffusionGemma model weights are modified.
  • No model files are committed to this repo.
  • The local 32 GB macOS path does not use the raw Transformers checkpoint.
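
Biasing toward "Matrix glyph tokens" needs a set of token ids to bias. One plausible way to build that set is sketched below, assuming it means tokens whose decoded text consists only of kana. The Unicode ranges are the standard Hiragana, Katakana, and halfwidth Katakana ranges; the `vocab` mapping and both function names are hypothetical, and the patch may select its tokens differently.

```python
def is_matrix_glyph(ch):
    """True if `ch` is a kana character (standard Unicode ranges)."""
    cp = ord(ch)
    return (
        0x3040 <= cp <= 0x309F      # Hiragana block
        or 0x30A0 <= cp <= 0x30FF   # Katakana block
        or 0xFF65 <= cp <= 0xFF9F   # Halfwidth Katakana variants
    )


def matrix_token_ids(vocab):
    """Return sorted ids of tokens made up entirely of kana.

    `vocab` is a hypothetical dict mapping token id -> decoded text.
    Surrounding whitespace is ignored; empty tokens are skipped.
    """
    ids = []
    for token_id, text in vocab.items():
        stripped = text.strip()
        if stripped and all(is_matrix_glyph(c) for c in stripped):
            ids.append(token_id)
    return sorted(ids)
```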

Requirements

  • macOS with Apple Silicon recommended.
  • Around 32 GB unified memory for the Q4_K_M GGUF.
  • git, cmake, and Python 3.
  • Hugging Face access for the Unsloth DiffusionGemma GGUF download.

Setup

Build the patched DiffusionGemma llama.cpp CLI:

./scripts/bootstrap_macos_llamacpp.sh

Download the 4-bit GGUF:

./scripts/download_diffusiongemma_q4.sh

Verify the binary has the Matrix flags:

./scripts/verify_matrix_patch.sh
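
If you want to do the same check by hand, the sketch below looks for the three Matrix flags in the binary's help output. It is not the contents of verify_matrix_patch.sh, which may work differently. The binary path passed to `binary_help` is up to you, and the sketch assumes the patched CLI lists its flags under `--help`.

```python
import re
import subprocess

MATRIX_FLAGS = (
    "--diffusion-matrix",
    "--diffusion-matrix-fraction",
    "--diffusion-matrix-bias",
)


def missing_matrix_flags(help_text):
    """Return the Matrix flags that do not appear in `help_text`.

    Flags are matched as whole option names, so "--diffusion-matrix-bias"
    does not count as a match for "--diffusion-matrix".
    """
    found = set(re.findall(r"--[A-Za-z0-9-]+", help_text))
    return [flag for flag in MATRIX_FLAGS if flag not in found]


def binary_help(binary):
    """Run `<binary> --help` and return its combined stdout and stderr."""
    result = subprocess.run([binary, "--help"], capture_output=True, text=True)
    return result.stdout + result.stderr
```

An empty list from `missing_matrix_flags(binary_help(path))` means all three flags are present.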

Ask A Question

./scripts/ask_matrix_gemma.sh "What is the weirdest fact about black holes?"

The wrapper runs:

  • llama-diffusion-cli
  • the local Q4_K_M GGUF
  • entropy-bound DiffusionGemma decoding
  • live diffusion visual mode
  • Matrix logit conditioning

Tune The Effect

Cleaner answer:

MATRIX_BIAS=6 MATRIX_FRACTION=0.20 ./scripts/ask_matrix_gemma.sh "Explain gravity simply."

More dramatic Matrix phase:

MATRIX_BIAS=14 MATRIX_FRACTION=0.40 ./scripts/ask_matrix_gemma.sh "Say hello from the simulation."

Useful knobs:

  • MATRIX_BIAS: early additive logit bias for Matrix glyph tokens. Try 6 to 14.
  • MATRIX_FRACTION: portion of denoising steps over which the bias decays. Try 0.20 to 0.35; higher values such as 0.40 extend the Matrix phase.
  • MATRIX_STEPS: max entropy-bound denoising steps. 48 is the quality default.
  • MATRIX_BLOCKS: number of 256-token canvases. 1 is best for short answers.
  • N_GPU_LAYERS: llama.cpp Metal offload layers. Default is 99.
  • N_PREDICT: requested token budget. Default is 256.
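
For a rough picture of how the two Matrix knobs could reach the CLI, here is a sketch that turns environment variables into the flags listed under "What This Changes". It is not the wrapper script itself. It assumes --diffusion-matrix is a plain switch and that the other two flags take a numeric value. The fallback values `ASSUMED_BIAS` and `ASSUMED_FRACTION` are placeholders, because the wrapper's defaults for these knobs are not stated here.

```python
import os

# Hypothetical fallbacks: the wrapper's real defaults for these two
# knobs are not documented in this README.
ASSUMED_BIAS = "10"
ASSUMED_FRACTION = "0.30"


def matrix_flags(env=None):
    """Build the Matrix-related CLI arguments from environment variables."""
    env = os.environ if env is None else env
    bias = float(env.get("MATRIX_BIAS", ASSUMED_BIAS))
    fraction = float(env.get("MATRIX_FRACTION", ASSUMED_FRACTION))
    # The fraction is a portion of the denoising steps, so keep it in [0, 1].
    if not 0.0 <= fraction <= 1.0:
        raise ValueError(f"MATRIX_FRACTION must be in [0, 1], got {fraction}")
    return [
        "--diffusion-matrix",
        "--diffusion-matrix-fraction", f"{fraction:g}",
        "--diffusion-matrix-bias", f"{bias:g}",
    ]
```

Passing a dict instead of reading `os.environ` makes it easy to try settings without exporting anything, for example `matrix_flags({"MATRIX_BIAS": "6", "MATRIX_FRACTION": "0.20"})`.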

Why Not Transformers?

The Hugging Face Transformers path is easier to hack in Python, but the unquantized google/diffusiongemma-26B-A4B-it model is not a practical fit for a 32 GB Mac. The practical path here is GGUF + llama.cpp + a small sampler patch.

See docs/TECHNICAL.md for implementation details and docs/USAGE.md for more commands.

Optional Visual-Only Demo

matrix_rain.py is a separate terminal animation that does not affect model sampling:

python3 matrix_rain.py --text "Wake up, Neo. The Matrix has you."

It is included as a lightweight visual demo, but the real project is the patched diffusion sampler.

License

MIT. The downloaded model and cloned llama.cpp project retain their own upstream licenses.
