Hi @yang-521!

If I understand correctly (please correct me if I'm wrong), you are trying to run Qwen3.5-0.8B-Q8_0.llamafile on Windows by calling the llamafile server binary and passing the .llamafile as the input model.

The llamafile binary can extract GGUF model weights from a bundled .llamafile, but because of how llama.cpp handles multimodal models, you still have to pass both -m (the language model) and --mmproj (the multimodal projector) to run one. As a test, if you run llamafile.exe -m Qwen3.5-0.8B-Q8_0.llamafile --cli --image image_name.png from your terminal, you should see an error.
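A minimal Python sketch of the two-file requirement, not the exact procedure from this answer. It assumes the .llamafile stores its weights in an embedded zip archive (which is how llamafile packages weights) and that the projector is bundled as its own .gguf entry whose file name contains "mmproj". The helper names and that naming heuristic are hypothetical; only the -m, --mmproj, --cli and --image flags come from the command above.

```python
import subprocess
import zipfile
from pathlib import Path


def list_gguf_entries(llamafile_path):
    """Return the names of the .gguf entries stored inside a .llamafile.

    Assumes the weights live in an embedded zip archive; zipfile tolerates
    the executable bytes that sit in front of the archive.
    """
    with zipfile.ZipFile(llamafile_path) as zf:
        return [name for name in zf.namelist() if name.lower().endswith(".gguf")]


def split_model_and_mmproj(entries):
    """Hypothetical heuristic: the entry whose file name contains 'mmproj'
    is the projector, and the single remaining .gguf is the LLM."""
    mmproj = [e for e in entries if "mmproj" in Path(e).name.lower()]
    models = [e for e in entries if e not in mmproj]
    if len(models) != 1 or len(mmproj) != 1:
        raise ValueError(f"expected one LLM and one mmproj .gguf, got {entries}")
    return models[0], mmproj[0]


def build_cli_argv(binary, model, mmproj, image):
    """Command line for a one-shot multimodal run: -m and --mmproj are both required."""
    return [binary, "-m", model, "--mmproj", mmproj, "--cli", "--image", image]


def run_multimodal(llamafile_path, binary, image, out_dir="."):
    """Extract both .gguf files, then call the llamafile binary on them."""
    model, mmproj = split_model_and_mmproj(list_gguf_entries(llamafile_path))
    with zipfile.ZipFile(llamafile_path) as zf:
        model_path = zf.extract(model, out_dir)  # returns the extracted file's path
        mmproj_path = zf.extract(mmproj, out_dir)
    return subprocess.run(build_cli_argv(binary, model_path, mmproj_path, image), check=True)
```

For example, run_multimodal("Qwen3.5-0.8B-Q8_0.llamafile", "llamafile.exe", "image_name.png") would extract both files into the current directory before launching. If the bundled files are named differently, skip the heuristic and pass the two paths to build_cli_argv directly.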

There are two ways you can run llamafile to serve a multimodal model:

  • If you have llamafile.exe, you can get both the LLM and project…

Answer selected by aittalam
Category
Q&A