mseri 78 days ago | on: Everything I've learned so far about running local...

You can choose the quantization by appending the right tag to the model name, but they don't support some of the more advanced features that are useful in practice: for example, you need a special flag to enable flash attention, and you cannot use KV cache quantization for large contexts.
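
To give a rough sense of why KV cache quantization matters at large contexts, here is a back-of-the-envelope sketch in Python. The model shape (32 layers, 8 KV heads, head dimension 128) and the 32k context are hypothetical values picked for illustration, not taken from any particular model, and the 8-bit figure is idealized: real quantized cache formats usually add a small per-block overhead for scales.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem):
    """Approximate KV cache size for one sequence.

    The cache holds one key vector and one value vector (hence the factor 2)
    per layer, per KV head, per token in the context.
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem


# Hypothetical model shape and context length, for illustration only.
LAYERS, KV_HEADS, HEAD_DIM = 32, 8, 128
CONTEXT = 32_768

fp16 = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, CONTEXT, 2)  # 16-bit cache
q8 = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, CONTEXT, 1)    # idealized 8-bit cache

print(f"fp16 cache:  {fp16 / 2**30:.1f} GiB")  # 4.0 GiB
print(f"8-bit cache: {q8 / 2**30:.1f} GiB")    # 2.0 GiB
```

The cache grows linearly with context length, so under these assumptions halving the bytes per element frees about 2 GiB at 32k tokens, which can be the difference between a long context fitting in memory or not.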