Try one and find out. Look at the Quickstart section at https://github.com/Mozilla-Ocho/llamafile/ — you download a single cross-platform ~3.7GB file and execute it; it loads a local model, starts a local web server, and you can query it from your browser.
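If you'd rather query it from a script than the browser, here's a minimal Python sketch. I'm assuming the defaults the project README shows (port 8080, the OpenAI-compatible chat endpoint, the "no-key" auth header, and a placeholder model name), so adjust if your setup differs:

    import json
    import urllib.request

    # Sketch: query a llamafile server that's already running locally.
    # Port 8080, endpoint path, "Bearer no-key" auth, and the placeholder
    # model name are assumptions based on the project's README examples.
    req = urllib.request.Request(
        "http://localhost:8080/v1/chat/completions",
        data=json.dumps({
            "model": "LLaMA_CPP",  # placeholder; the local server serves one model
            "messages": [{"role": "user", "content": "Say hello in one sentence."}],
        }).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer no-key",
        },
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)
    print(reply["choices"][0]["message"]["content"])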
GPU bits have changed! I just noticed in the video description:
"IMPORTANT: This video is obsolete as of December 26, 2023 GPU now works out of the box on Windows. You still need to pass the -ngl 35 flag, but you're no longer required to install CUDA/MSVC."
See it demonstrated in a <7 minute video here: https://www.youtube.com/watch?v=d1Fnfvat6nM
The video explains that you can download the larger models from that GitHub page and use them with other command-line parameters, and shows how to get a Windows + nVidia setup to GPU-accelerate the model: install CUDA and MSVC / VS Community edition with the C++ tools, run it for the first time from the MSVC x64 command prompt so it can build a GPU support module using cuBLAS, then rerun it normally with the "-ngl 35" command line parameter, which offloads 35 of the model's layers to the GPU (my card doesn't have much memory, so I keep that number low).
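If you want to script the launch too, here's a sketch for Linux/macOS (on Windows the file has to be renamed with an .exe extension first). The filename is the one from the quickstart example; substitute whichever model you downloaded, and tune -ngl to fit your card:

    import subprocess

    # Sketch: launch a llamafile with GPU offload enabled.
    # "-ngl 35" offloads 35 model layers to the GPU; lower it if the
    # layers don't fit in your card's memory.
    # Filename assumed from the quickstart; substitute your own download.
    proc = subprocess.Popen(["./llava-v1.5-7b-q4.llamafile", "-ngl", "35"])
    try:
        proc.wait()  # the local web server runs until you stop it
    except KeyboardInterrupt:
        proc.terminate()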