Really, really great. Very ambitious. I see no viable solution to the issues people are griping about without forking over buttloads of money every month. Maybe someone on HN does, though - we can only hope you'll detect that helpful comment in this storm of pedestrian shittiness.
Are you on postgres? One thing you could do-- and I only suggest this cumbersome idea because you might just be crazy enough to try it-- would be to use the pg_trgm (trigram) extension with the following in mind:
a) Theory being, when someone greps /[a-z]{4,8}/, they're either interested in {anthem, aardvark, ambition, ...} or {nltk, xkcd, json, zzxx, xxzz, ...}, likely not both.
b) Neither (nor any third set you might come up with) is so inherently superior that it deserves default status over the other.
c) Even with limiting results, half are bound to be totally uninteresting to the user. So what does that even accomplish?
So my pg_trgm suggestion is to take that same /[a-z]{4,8}/ result set and offer the user a relative sliding-scale by which they can push their visible 1,000 closer to/further away from a predefined set of dictionary words.
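To make the sliding-scale idea concrete, here is a rough sketch in plain Python rather than SQL. It approximates pg_trgm-style similarity (lowercase the word, pad it with two leading spaces and one trailing space, then take shared trigrams over total distinct trigrams) and orders candidates by how close their best dictionary match is to a slider value. The function names, the `slider` parameter and the tiny word lists are made up for illustration; the real extension may differ in details.

```python
def trigrams(word):
    """Set of 3-character windows over the padded, lowercased word."""
    padded = "  " + word.lower() + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


def dictionary_score(candidate, dictionary):
    """Best similarity between the candidate and any dictionary word."""
    return max((similarity(candidate, w) for w in dictionary), default=0.0)


def rank(candidates, dictionary, slider, limit=1000):
    """Order candidates by distance between their score and the slider.

    slider=1.0 shows the most dictionary-like names first,
    slider=0.0 shows the least dictionary-like names first.
    """
    scored = [(dictionary_score(c, dictionary), c) for c in candidates]
    scored.sort(key=lambda pair: abs(pair[0] - slider))
    return [name for _, name in scored][:limit]


# Hypothetical sample data, only for illustration.
words = ["anthem", "aardvark", "ambition"]
names = ["xkcd", "anthem", "zzxx", "anthen"]
print(rank(names, words, slider=1.0))
print(rank(names, words, slider=0.0))
```

Scoring every candidate against every dictionary word like this is quadratic, so it only shows the ranking logic; in Postgres the trigram index would presumably be doing that work.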
Actually, we started by experimenting with SQLite (which should be faster than PgSQL I believe since it has no protocol overhead), but it was kinda slow for bulk queries. We then ended up switching to LMDB and LevelDB with a bitmask to represent the availability of all TLDs and the performance improved greatly. As an added benefit, this also made the JSON responses way lighter.
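For anyone curious what a bitmask like that could look like, here is a minimal sketch. The TLD list, its ordering, the "bit set means available" convention and the plain dict standing in for the key-value store are all assumptions for illustration, not our actual schema.

```python
# Hypothetical fixed TLD ordering; bit i corresponds to TLDS[i].
TLDS = ["com", "net", "org", "io"]
BIT = {tld: 1 << i for i, tld in enumerate(TLDS)}
MASK_BYTES = (len(TLDS) + 7) // 8  # bytes needed to hold one bit per TLD


def encode(available):
    """Pack the available TLDs for one name into a single integer."""
    mask = 0
    for tld in available:
        mask |= BIT[tld]
    return mask


def decode(mask):
    """Unpack an integer mask back into the list of available TLDs."""
    return [tld for tld in TLDS if mask & BIT[tld]]


def to_value(mask):
    """Serialize the mask to the bytes stored under the name's key."""
    return mask.to_bytes(MASK_BYTES, "big")


def from_value(raw):
    """Deserialize stored bytes back into an integer mask."""
    return int.from_bytes(raw, "big")


# A dict stands in for the key-value store (assumption).
store = {b"example": to_value(encode(["com", "io"]))}
print(decode(from_value(store[b"example"])))
```

One small integer per name instead of a list of TLD strings is also why the JSON gets lighter: the client can decode the mask against the same TLD ordering.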
The main problem I see with the pg_trgm approach is that it would only return domains that exist in the database (or in the zone files) and thus they would have to be registered, which totally defeats the purpose of the tool. We couldn't possibly store every alphanumeric combination of up to 63 characters in a database, that's like a gazillion gazillion possibilities! =P
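To put a rough number on that, a quick back-of-the-envelope count. It assumes 36 symbols (a-z and 0-9) and ignores hyphens, which real hostnames also allow in the middle of a label, so the true figure is even larger. The function name is just for illustration.

```python
def count_labels(alphabet_size, min_len, max_len):
    """Number of distinct strings with length in [min_len, max_len]."""
    return sum(alphabet_size ** n for n in range(min_len, max_len + 1))


# All alphanumeric labels of 1 to 63 characters (hyphens ignored).
all_labels = count_labels(36, 1, 63)
print(f"{all_labels:.3e}")

# For comparison: everything matched by /[a-z]{4,8}/.
four_to_eight = count_labels(26, 4, 8)
print(f"{four_to_eight:,}")
```

Even the 4-to-8 lowercase-letter space runs into the hundreds of billions, and the full space is on the order of 10^98, so enumerating unregistered names in a table is not an option.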
StackOverflow tags are a neat idea for a set. I don't know how we missed that! Thanks!
http://www.postgresql.org/docs/9.3/static/pgtrgm.html
You may also consider tech acronyms - maybe steal those from StackOverflow tags. Human names would be too big a hassle, IMO.
Again, I love the ambition of the damn thing. Kicks ass.