Then you need to go over each item with just as much care as you would any probably-irrelevant item pulled from a keyword search, because the LLM cannot evaluate it in any way other than by correlation.
Also, you don't necessarily have a real dataset to begin with: prior art doesn't need to be patented, it just needs to be published/public/invented sufficiently before the patent. Searching the existing patent database is insufficient.
> Also, you don't necessarily have a real dataset to begin with: prior art doesn't need to be patented, it just needs to be published/public/invented sufficiently before the patent. Searching the existing patent database is insufficient.
I would caution against making assumptions about dataset access and size. I agree that the effectiveness of the effort I mention would be a function not only of gen AI engineering, but also of dataset size and scope.