So I just brute forced every UUID in existence on my RTX GPU and loaded the dataset into a HA opensearch cluster on AWS. It took about 5 years of calling ‘uuid.Random()’ to effectively cover about 64% of the keyspace which is good enough.
To facilitate full-text search I created a langchain application in python, hosted on kubernetes, that takes your search query and generates synonymous UUIDs via GPT o1-preview before handing over to opensearch.
Opensearch returns another set of UUIDs, which I look up in my postgres database: “SELECT uuid FROM uuids WHERE id IN (…uuid_ids)”
I could imagine some candidates starting with their default tools like this, and start complaining about the cluster performance after a few weeks.
You need a certain way of thinking to have a gut feeling "this could be expensive" and then go back, question your assumptions and confirm your requirements. Not everyone does that - better to rule them out.
To facilitate full-text search I created a langchain application in python, hosted on kubernetes, that takes your search query and generates synonymous UUIDs via GPT o1-preview before handing over to opensearch.
Opensearch returns another set of UUIDs, which I look up in my postgres database: “SELECT uuid FROM uuids WHERE id IN (…uuid_ids)”