ollama 0.6.6 invoked with:
# server
OLLAMA_FLASH_ATTENTION=1 OLLAMA_KV_CACHE_TYPE=q8_0 ollama serve
# client
ollama run --verbose qwen3:30b-a3b
/set parameter num_ctx 32768
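For anyone scripting this instead of typing into the REPL, here is a rough sketch of the same setup through Ollama's REST API. It assumes the server is on the default local port (11434), and the helper names `build_payload`, `tokens_per_second`, and `generate` are made up for illustration. The `num_ctx` option and the `eval_count` / `eval_duration` fields are the API's counterparts to the `/set parameter` line and the eval rate that `--verbose` prints.

```python
import json
import urllib.request

# Assumption: ollama serve is listening on its default local address.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(model, prompt, num_ctx=32768):
    """Request body for /api/generate with a per-request context size."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,  # ask for one JSON object instead of streamed chunks
        "options": {"num_ctx": num_ctx},
    }


def tokens_per_second(response):
    """Generation speed from the response timing fields.

    eval_duration is reported in nanoseconds, so convert to seconds first.
    """
    return response["eval_count"] / (response["eval_duration"] / 1e9)


def generate(model, prompt, num_ctx=32768):
    """Send one non-streaming request to a running server (hypothetical helper)."""
    body = json.dumps(build_payload(model, prompt, num_ctx)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as reply:
        return json.loads(reply.read())


# Usage, with the server from above already running:
#   result = generate("qwen3:30b-a3b", "Why is the sky blue?")
#   print(f"{tokens_per_second(result):.1f} t/s")
```

The flash attention and KV cache settings are still environment variables on the server side, so they are not part of the request body.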
TY for this.
Update: wow, it's quite fast. 70-80 t/s on LM Studio, with a few other applications also using the GPU.
~19.8 GiB with: