Domain name search with regular expressions and curated sets (namegrep.com)
68 points by ssarah on June 13, 2014 | 37 comments



Who's behind this, anyway?

I'm genuinely curious because (1) it doesn't seem to be presented in their FAQ nor on-site and (2) the WHOIS info for namegrep.com is private.

While this isn't a huge red flag, it is a little suspicious that I could be handing over my branding strategies (if one could call regexps "strategies") to some unknown third party.

(FWIW, I built a similar tool, but I don't go to any lengths to hide my involvement)

EDIT: removed link to my similar tool; I wasn't intending to hijack any clicks, but rather to get the discussion going on whether it's important to you, as a user of this tool, to know who you're dealing with.


We're just two developers working solo, based in Portugal. We didn't create any public profiles, but feel free to contact us ;) As for your question concerning privacy: it was also one of my main concerns. We don't log or track anything on our servers, and we will keep it that way.


Makes sense, thanks for the response, I really like the tool.

I think one area where Namegrep would benefit is to offer a few more examples to try out. As a "hacker" myself I immediately understand the power of the tool, but still want to see a few more interesting regexps that actually yield some worthwhile domains.

That, and inserting a subtle horizontal divider in the search results every time I press the search button (while in regexp mode). That way, at a glance, I can see where my previous query ended and my new one began.

Awesome work!


Got a few downvotes, is this not a relevant concern?

Perhaps due to listing my similar tool (removed from above comment).


It's a reasonable question you asked but I've noticed a tendency to get downvoted for asking similar questions.

There is domain name frontrunning for sure:

http://en.wikipedia.org/wiki/Domain_name_front_running

My speculation is that downvotes come from these cases:

a) Somebody that is well known on HN posts something and you are just supposed to know that they are well known.

b) General trusting and good nature of hackers and lack of cynicism (on the part of the people who downvote, that is, not all hackers).

But it is definitely a reasonable thing to ask to have clarified, and instead of downvoting, people should really just explain why they feel the site should be trusted.


I can vouch for alixaxel. This is his tool and he's pretty legit. You can find his code submissions on bountify.co to see how he's contributed to the Hacker community. Plus he's submitted a few other HN posts that got upwards of 200 points, etc.


Really, really great. Very ambitious. I see no viable solution to the issues people are griping about without forking over buttloads of money every month. Maybe someone on HN does, though - we can only hope you'll detect that helpful comment in this storm of pedestrian shittiness.

Are you on Postgres? One thing you could do -- and I only suggest this cumbersome idea because you might just be crazy enough to try it -- would be to use the pg_trgm (trigram) extension with the following in mind:

a) Theory being, when someone greps /[a-z]{4,8}/, they're either interested in {anthem, aardvark, ambition, ...} or {nltk, xkcd, json, zzxx, xxzz, ...}, likely not both.

b) Neither (nor any third set you might come up with) is so inherently superior that it deserves default status over the other.

c) Even with limiting results, half are bound to be totally uninteresting to the user. So what does that even accomplish?

So my pg_trgm suggestion is to take that same /[a-z]{4,8}/ result set and offer the user a relative sliding-scale by which they can push their visible 1,000 closer to/further away from a predefined set of dictionary words.

http://www.postgresql.org/docs/9.3/static/pgtrgm.html
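To make the sliding-scale idea concrete, here is a small Python sketch of pg_trgm-style trigram similarity (the padding and Jaccard scoring mimic what pg_trgm's similarity() does, though this is a simplification, not the extension itself; the dictionary and candidate lists are made-up examples):

```python
def trigrams(word):
    # pg_trgm-style trigram set: pad with two leading spaces and
    # one trailing space, lowercase, then take all 3-char windows.
    padded = "  " + word.lower() + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}

def similarity(a, b):
    # Jaccard similarity of the two trigram sets: shared / union.
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)

# Rank candidates by closeness to a small word list; a UI slider
# could then cut the visible results off at any similarity threshold.
dictionary = ["anthem", "ambition", "aardvark"]
candidates = ["anthems", "xkcd", "ambitions", "zzxx"]
scored = sorted(candidates,
                key=lambda c: max(similarity(c, w) for w in dictionary),
                reverse=True)
```

With this scoring, "anthems" and "ambitions" float to the top while "xkcd" and "zzxx" sink to the bottom, which is exactly the dictionary-word/gibberish split described above.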

You may also consider tech acronyms - maybe steal those from StackOverflow tags. Human names would be too big a hassle, IMO.

Again, I love the ambition of the damn thing. Kicks ass.


Thanks for the kind feedback. =)

Actually, we started by experimenting with SQLite (which should be faster than PgSQL I believe since it has no protocol overhead), but it was kinda slow for bulk queries. We then ended up switching to LMDB and LevelDB with a bitmask to represent the availability of all TLDs and the performance improved greatly. As an added benefit, this also made the JSON responses way lighter.
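The bitmask trick described above is roughly this (a Python sketch; the TLD list and bit ordering here are invented for illustration, not namegrep's actual encoding):

```python
# One bit per TLD, so a single integer per name encodes its
# availability across every tracked TLD at once.
TLDS = ["com", "net", "org", "io", "co"]
BIT = {tld: 1 << i for i, tld in enumerate(TLDS)}

def pack(available_tlds):
    # Pack a set of available TLDs into one bitmask.
    mask = 0
    for tld in available_tlds:
        mask |= BIT[tld]
    return mask

def is_available(mask, tld):
    # Test a single TLD's bit.
    return bool(mask & BIT[tld])

mask = pack({"net", "io"})
```

Storing one small integer per name instead of a row per (name, TLD) pair is also why the JSON responses get lighter.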

The main problem I see with the pg_trgm approach is that it would only return domains that exist in the database (i.e. in the zone files), which means they'd already be registered, and that totally defeats the purpose of the tool. We couldn't possibly store all the 63-character alphanumeric combinations in a database, that's like a gazillion gazillion possibilities! =P
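The combinatorics really are hopeless. Even restricting to the short alphabetic pattern from the comment above, the count is in the hundreds of billions (a quick Python check):

```python
def count_expansions(alphabet_size, min_len, max_len):
    # Number of strings of length min_len..max_len over an alphabet,
    # i.e. the size of the expansion of /[a-z]{min_len,max_len}/.
    return sum(alphabet_size ** n for n in range(min_len, max_len + 1))

# /[a-z]{4,8}/ alone expands to over 217 billion candidate names;
# full 63-character alphanumeric labels are astronomically larger.
n = count_expansions(26, 4, 8)
```

So precomputing availability for every possible name is off the table; only the registered names (from zone files) can live in a database.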

StackOverflow tags are a neat idea for a set, I don't know how we missed that! Thanks!


This is great, but the "please keep it under.." issue made it unusable.

Limited to .com, I couldn't use this pattern: [a-z]{4,8}coin

If that's not workable, I'm not sure I can come up with a pattern that works. Why not just cap the results returned?


It yields the results of each regex element separately, so it's hard to come up with a functional implementation of capped results. Still, I'll look into it. But even then, the main question is whether you really want to see the head of a list of billions of combinations, like you propose?
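The per-element expansion can be sketched with a lazy cross product (illustrative Python only; the element lists here are hypothetical, and namegrep's real pipeline is not this):

```python
from itertools import islice, product

def expand(elements, cap=10):
    # Each pattern element expands to its own candidate list; the full
    # result set is their cross product. Generating it lazily lets you
    # take just the head without materializing billions of rows.
    names = ("".join(parts) for parts in product(*elements))
    return list(islice(names, cap))

head = expand([["red", "blue"], ["fox", "bird", "fish"]], cap=4)
```

This shows why a cap is awkward: the head of the product is dominated by the first values of the first element, which, as the reply notes, is rarely the interesting part of a billions-long list.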


(Not OP)

No, not the billions, but yes to the results of other searches when filtering by ".com available", such as (:colors:)((:words/adverbs:)|(:words/verbs:)). And then, if I also limit the length of the domain from there, there won't be billions of results.

As a side to that, could :colors: include more of them? There are definitely more than 12 usable colour names!


Since the results are stacked on the browser, for that particular search you typed, you could always search (:colors:)(:words/verbs:) and (:colors:)(:words/adverbs:) separately and I believe you will get the full list you want. But I really have to generate the regex and availability results before being able to apply the filters. As for the colors: yes the sets need to be improved. (: You got some suggestions?


This is actually pretty spectacular, but it would be really nice if it searched at least the geographic TLDs.


Working on it. Hard to get all those zones (; Thanks for your comment.


Something's broken: it shows zip.com as available, but it was registered in 1997 and is valid until 2015.


I'm sorry guys, we can't WHOIS every domain, so there will be a few false positives.


zip.com doesn't resolve. I would guess they're just using DNS to check registration.
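If that guess is right, the heuristic boils down to something like this (a Python sketch of the logic only; in practice you'd feed it NS records from an actual resolver query or the zone file, which isn't done here):

```python
def availability(ns_records):
    # A name counts as "taken" only if it has NS records configured.
    # Registered-but-unconfigured domains -- like zip.com in this
    # thread -- have none, and so show up as false positives.
    return "taken" if ns_records else "probably available"

status = availability([])  # no NS records found for the name
```

It's cheap and privacy-friendly (no per-query WHOIS), at the cost of exactly the kind of false positive reported above.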


Yeah, looks like it. Oh well, it gave me brief hope of having an awesome domain.


I'll save you some time: there are no three character .com domains available, not for a long time :)


Looks very useful, but would it be possible to choose different colors for the available/not available tiles? It's very difficult for those with subnormal color vision to distinguish the red and the green. Thanks.


Hmm, the color scheme is hard to change, but maybe I can implement a parallel symbolic scheme to help those users out. Do you have some suggestions I could use?


Looks very nice, cool. As a new programmer/web app maker, I'm quite curious how this tool might work. It's so fast - and it must be performing a lot of calculations. Anyway, thanks again for sharing!


Probably a local WHOIS database that's periodically updated.


That's right. We gather our data from the TLD zone files.

We also run Go on the backend. <3
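Gathering names from a TLD zone file amounts to collecting every name that appears with an NS record. A toy Python version (the sample lines are invented, and real zone files also carry TTLs, class fields, and other record types this ignores):

```python
# Simplified zone-file lines: <name> NS <nameserver>.
SAMPLE_ZONE = """\
EXAMPLE NS NS1.EXAMPLE-REGISTRAR.COM.
EXAMPLE NS NS2.EXAMPLE-REGISTRAR.COM.
OTHERNAME NS NS1.SOMEHOST.NET.
"""

def registered_names(zone_text, tld="com"):
    # Every name with at least one NS record counts as registered.
    names = set()
    for line in zone_text.splitlines():
        parts = line.split()
        if len(parts) >= 3 and parts[1].upper() == "NS":
            names.add(parts[0].lower() + "." + tld)
    return names

names = registered_names(SAMPLE_ZONE)
```

Note what this can't see: domains that are registered but have no nameservers delegated never appear in the zone, which is the root of the false positives discussed elsewhere in the thread.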


It would have been useful if it had "English words starting with" and "English words ending with". Or a simple way of filtering sets and using them as subsets.


That's a nifty idea. Will think about it. Thanks for the feedback.


Does not work properly: many domains are listed as available when they're actually not. It would have been useful if it worked properly.


Because of performance and privacy concerns, we are not able to resolve a WHOIS request for all domains. Hence, there will be a few false positives. You can check our FAQs for a bit more info on this.


The concept is good but there are too many false positives.

otan.com/otan.net is available? Yeah right...


The data might be coming from the .com/.net/.org zonefiles, where a domain will only show up if it has an NS record configured.

I'm not sure if there's a free and authoritative source for registered domains, short of grep'ing zonefiles and checking whois.


There's no other way to do bulk lookups quickly besides zone files. Maybe if you are a registrar, and even that I am not sure of (depending on how many results you need).


Is "curated sets" something I should already know about?



Thanks. Great idea actually.


The question mark in the example regex isn't necessary, in case anyone was wondering.


It's necessary. It makes the "s" in "hotels" optional, so that the regex can match either "hotel" or "hotels" and then the rest.


Doesn't it imply a search for hotel OR hotels?





