> 4-bit quantized Mixtral Instruct running locally, gets it right
This has been one of my favorite things to play around with when it comes to real-life applications. Sometimes a smaller, "worse" model will vastly outperform a larger one. This seems to happen when the larger model overthinks the problem. For a simple task like "extract all the names of people in this block of text," Llama 7B will produce significantly fewer false positives than Llama 70B or GPT-4.
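If anyone wants to try the comparison themselves, here's a minimal sketch that hits an OpenAI-compatible local endpoint (the kind llama.cpp's server or Ollama exposes) with the same extraction prompt for each model. The `BASE_URL` and model names are placeholders for whatever you have running locally:

```python
import requests

# Assumes a local OpenAI-compatible server (e.g. llama.cpp server or Ollama).
# BASE_URL and the model names below are placeholders; adjust to your setup.
BASE_URL = "http://localhost:8080/v1/chat/completions"

PROMPT = (
    "Extract all the names of people in this block of text. "
    "Reply with one name per line and nothing else.\n\n{text}"
)

def extract_names(model: str, text: str) -> list[str]:
    resp = requests.post(
        BASE_URL,
        json={
            "model": model,
            "messages": [{"role": "user", "content": PROMPT.format(text=text)}],
            "temperature": 0,  # greedy-ish decoding so runs are comparable
        },
        timeout=120,
    )
    resp.raise_for_status()
    content = resp.json()["choices"][0]["message"]["content"]
    # One name per line; drop blank lines and stray whitespace.
    return [line.strip() for line in content.splitlines() if line.strip()]

sample = "After the meeting, Alice emailed Bob about the Acme contract."
for model in ("llama-2-7b-chat", "llama-2-70b-chat"):  # swap in your models
    print(model, extract_names(model, sample))
```

Running the same text through both models side by side makes the false-positive difference easy to eyeball: the larger model is the one more likely to "find" names like Acme that aren't people.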