Don't stop at robots.txt blocking. Look through your access logs and you'll likely find a handful of IPs generating a huge share of the traffic. Look them up with whois, and if the owner looks like a bot host, block the entire IP range. Cloud providers have no reason to browse my personal site, so if they host crawlers, they get blocked.
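A minimal sketch of that workflow. The whois output and IP here are made-up placeholders (parsed from a saved sample so the snippet runs offline); in practice you'd run `whois <ip>` against an address from your logs, and the nft rule is only printed, not applied:

```shell
#!/bin/sh
# Sample whois output for a hypothetical heavy-hitter IP; in practice:
#   sample=$(whois 203.0.113.7)
sample='NetRange:       203.0.112.0 - 203.0.115.255
CIDR:           203.0.112.0/22
OrgName:        Example Cloud Hosting'

# Pull the announced range out of the whois record.
cidr=$(printf '%s\n' "$sample" | awk '/^CIDR:/ { print $2 }')

# Print (rather than execute) a drop rule for the whole range.
echo "nft add rule inet filter input ip saddr $cidr drop"
```

Whois formats vary by registry (some print `route:` or `inetnum:` instead of `CIDR:`), so adjust the awk pattern to whatever your registry returns.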
I wonder how the AI/copyright arguments will play out in court.
"If I read your book and I have a photographic memory and can recall any paragraph do I need to pay you a licensing fee?"
"If I go through your library and count all the times that 'the' is adjacent to 'end' do I need to get your permission to then tell that number to other people?"
These are cases where we see the absurdity of copyright in its current form. But either copyright applies to everyone, including OpenAI, or to no one. Or are some perhaps more equal than others before the law?
List of crawlers, for those who now want to block them: https://platform.openai.com/docs/bots
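For reference, a robots.txt fragment covering the OpenAI crawlers documented on that page (check the page for the current list, since bot names change):

```
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: OAI-SearchBot
Disallow: /
```

Keep in mind this only asks politely; the IP-range blocking above is what actually enforces it.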