Some links won't work, and that's unavoidable. I'm working on a way to redirect to another site if the first one is broken. In the meantime, just press the button again for a new link.
You have to click the button each time you want a new website. The loading page is just a placeholder, so reloading it won't take you anywhere. I'm not sure if that's what you were doing, but hopefully that helps.
Thanks for the feedback! I'm thinking about ways to avoid loading broken websites. I'm not sure it's possible to filter for only a certain type of website, though; I think there are far too many sites for that.
I would say 75%+ of the sites that actually loaded were parked or expired pages. I would suggest removing, or re-rolling the redirect for, any site that resolves to a known registrar parking-page IP (perhaps only where those IPs are distinct from the registrar's web-hosting cluster IPs, where actual hosting customers' websites might live). That would be a good start toward pruning a lot of the parked sites.
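Something along these lines could work as a first pass. This is only a sketch: the two "parking" addresses below are placeholder documentation IPs, not real registrar addresses, so you'd have to collect the real list yourself.

```python
import socket

# Hand-curated set of known registrar parking-page IPs.
# These are placeholder documentation addresses, not real parking IPs.
PARKING_IPS = {
    "192.0.2.10",
    "198.51.100.7",
}

def looks_parked(domain: str) -> bool:
    """Return True if the domain resolves to an address in PARKING_IPS."""
    try:
        infos = socket.getaddrinfo(domain, 80, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return False  # doesn't resolve at all -- dead rather than parked
    return any(info[4][0] in PARKING_IPS for info in infos)

if __name__ == "__main__":
    print(looks_parked("example.com"))
```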
Yes, I'm thinking about a way to implement that. There are too many domains to filter them all in advance, but it might be possible to redirect the user to a new site if the current one turns out to be dead.
It's true that this doesn't list every website that is registered, and that not all of the domains lead to a working website. However, I don't think most of the invalid websites are caused by the NS entries. As for the Zone File Access Agreement, it prohibits uses that allow access to a significant portion of the data, and you would have to spend an immense amount of time scraping to get any portion that could be considered significant.
Also, there are alternative, publicly accessible ways to get most of this public zone file data now, so I am not sure that restriction in the access agreement is anything more than an historical artifact at this point.
You could use publicly available scan data for ports 80 and 443 to pare down the list of "websites".
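For example, something like this, assuming you can get the scan dataset keyed by hostname (the file names here are made up; if the dataset only gives you IPs, you'd need to resolve each domain first):

```python
# domains.txt: the full zone-file-derived list, one domain per line.
# scan_hosts.txt: hostnames a public 80/443 scan dataset saw responding.
def load_set(path: str) -> set[str]:
    with open(path) as f:
        return {line.strip().lower().rstrip(".") for line in f if line.strip()}

if __name__ == "__main__":
    domains = load_set("domains.txt")
    responsive = load_set("scan_hosts.txt")
    kept = domains & responsive  # keep only domains the scan saw answering
    with open("likely_live_domains.txt", "w") as out:
        out.write("\n".join(sorted(kept)))
    print(f"kept {len(kept)} of {len(domains)} domains")
```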
The goal of exposing the non-popular web is worthwhile.
You could port scan the entire IPv4 address space (minus all the reserved ranges), send a GET request to every host that responds, and filter for valid HTML. It would take no more than 5 hours on a shitty PC, and a lot less if you get a small AWS instance.
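The per-host check is the easy part; for the scan itself you'd realistically reach for something like ZMap or masscan plus a lot of concurrency rather than a naive loop. Here's a rough sketch of just the "GET and check for HTML" step for a single address (the target IP and timeout are placeholders):

```python
import socket

def probe(ip: str, timeout: float = 1.0) -> bool:
    """Send a bare HTTP GET to one IP and report whether the reply looks like HTML."""
    try:
        with socket.create_connection((ip, 80), timeout=timeout) as sock:
            sock.settimeout(timeout)
            request = b"GET / HTTP/1.1\r\nHost: " + ip.encode() + b"\r\nConnection: close\r\n\r\n"
            sock.sendall(request)
            reply = sock.recv(4096).lower()
    except OSError:
        return False
    return b"<html" in reply or b"<!doctype html" in reply

if __name__ == "__main__":
    # 203.0.113.5 is a documentation address -- swap in a real target to test.
    print(probe("203.0.113.5"))
```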
I started this project after doing some research into how DNS works and learning about the CZDS, where any interested individual can request access to DNS zone files. I realized I could turn this into a website, especially since I couldn't find anything similar on the internet. I used their Python API to download all the zone files, then wrote a Python script to scrape them into a single file containing only the domain names. I stored those in a MySQL database on my web server and used AJAX + PHP to pick a domain and redirect to it. One thing I think is cool about this is that it gives you a sense of the websites that make up most of the internet, not just the most popular ones. And unless you've clicked the button over 200 million times, you're almost certainly going to get a website you've never seen before.
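For anyone curious, the scraping step is roughly this shape. It's a simplified sketch rather than my exact script, and the file layout (gzipped zone files under zones/) is just an assumption:

```python
import glob
import gzip

# Zone file records look roughly like:
#   example.com.  172800  in  ns  ns1.examplehost.net.
# so taking the first field of every NS record and de-duplicating gives
# the list of registered domains.
def extract_domains(pattern: str = "zones/*.txt.gz") -> set[str]:
    domains = set()
    for path in glob.glob(pattern):
        with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
            for line in f:
                parts = line.lower().split()
                if len(parts) >= 4 and parts[3] == "ns":
                    domains.add(parts[0].rstrip("."))
    return domains

if __name__ == "__main__":
    domains = extract_domains()
    with open("domains.txt", "w") as out:
        out.write("\n".join(sorted(domains)))
    print(f"wrote {len(domains)} unique domains")
```

The resulting domains.txt is what gets loaded into the MySQL table that the button draws from.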