Wow, that is a lot of money ($4400 on Amazon) to throw at this problem. I'm curious: what compelled you to spend that much on what I assume is a home network?
Large-scale document classification tasks in very ambiguous contexts. A lot of my work goes into using big models to generate training data for smaller models.
I have several million documents, so GPT is cost-prohibitive and too slow. My usual approach is a first pass with Mistral to check task performance, switching to Mixtral if Mistral falls short.
I often find that with a good prompt, Mistral works as well as Mixtral and is about 10x faster.
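For anyone who wants to see the "small model first, bigger model only if needed" idea in concrete form, here is a rough sketch. It is not my actual tooling: the names `accuracy`, `pick_model`, and `label_documents` are made up for illustration, the 0.9 threshold is an arbitrary assumption, and the two models are plain Python callables standing in for whatever wraps Mistral and Mixtral on your side.

```python
from typing import Callable, List, Sequence, Tuple

# A classifier is any callable mapping document text to a label.
# In practice this would wrap a call to a locally hosted model (assumption).
Classifier = Callable[[str], str]


def accuracy(model: Classifier, sample: Sequence[Tuple[str, str]]) -> float:
    """Fraction of (text, expected_label) pairs the model labels correctly."""
    if not sample:
        raise ValueError("need at least one labeled example")
    correct = sum(1 for text, expected in sample if model(text) == expected)
    return correct / len(sample)


def pick_model(
    small: Classifier,
    large: Classifier,
    sample: Sequence[Tuple[str, str]],
    threshold: float = 0.9,  # hypothetical cutoff, tune per task
) -> Classifier:
    """Keep the small (fast) model if it is good enough on a hand-labeled
    sample; otherwise fall back to the large (slow) one."""
    return small if accuracy(small, sample) >= threshold else large


def label_documents(model: Classifier, docs: Sequence[str]) -> List[Tuple[str, str]]:
    """Label every document; the pairs can serve as training data for a smaller model."""
    return [(doc, model(doc)) for doc in docs]


if __name__ == "__main__":
    # Keyword-based stand-ins for the two models, purely for demonstration.
    def small_stub(text: str) -> str:
        return "id" if "passport" in text else "other"

    def large_stub(text: str) -> str:
        return "id" if ("passport" in text or "licence" in text) else "other"

    labeled = [("passport scan", "id"), ("driving licence", "id"), ("lunch menu", "other")]
    chosen = pick_model(small_stub, large_stub, labeled)
    print(label_documents(chosen, ["passport copy", "meeting notes"]))
```

The point of checking on a small labeled sample first is that the expensive model only gets used when the cheap one measurably underperforms on that task.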
I’m on my “home” network, but it’s a “home office” for my startup.
Interesting, I have the same task. Can you share your tools? My goal is to detect whether documents contain GDPR-sensitive parts or are copies of official documents such as IDs and driving licenses. It would be great to reuse your work!