
+1. Need something like archive.today/.is for Twitter so you can rip and archive the content that might not live elsewhere. Grab it, stick it in the Wayback Machine, return a Wayback URL.
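
The "grab it, return a Wayback URL" step mostly comes down to two URL shapes. A minimal sketch, assuming the public `https://web.archive.org/save/<url>` capture endpoint and the usual `/web/<timestamp>/<url>` snapshot form. The function names are made up for illustration, and nothing here deals with Twitter's login wall or rate limits.

```python
WAYBACK = "https://web.archive.org"


def save_request_url(page_url: str) -> str:
    """Build the URL that asks the Wayback Machine to capture page_url."""
    return f"{WAYBACK}/save/{page_url}"


def snapshot_url(page_url: str, timestamp: str) -> str:
    """Build the URL of a stored capture.

    timestamp uses the Wayback format YYYYMMDDhhmmss.
    """
    return f"{WAYBACK}/web/{timestamp}/{page_url}"


if __name__ == "__main__":
    # Hypothetical page; fetching the first URL would trigger a capture.
    page = "https://example.com/some/post"
    print(save_request_url(page))
    print(snapshot_url(page, "20240101000000"))
```

Whether a capture of a login-walled page contains anything useful is a separate problem; this only covers the archive side.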


+1 on Reddit as well.

Reddit doesn't have login walls yet, but it has way too much information stored within its walls to not have a backup / non-social-media way of extracting it. It's infeasible to both keep Reddit blocked (because its UI is intended to be addictive, like all social media) and still be able to extract information from it.


For the moment, old.reddit.com is still useful.

For the moment.


It’s dying a slow death through neglect. Image posts don’t work correctly, image comments don’t show up, and the direct comment links generated from www don’t work on old.


Still better than logging in, using the new web app or downloading the app.

It's not like 95% of the content is any good anyway. You have to dig deep into a niche to really get any value, and the last few years, less and less. Of course, I haven't logged in for 3-4 years so maybe I'm missing something. Doubt it.


There's also a thing lately where link targets posted on New Reddit (I assume, since mine aren't doing this) are all lowercased, while the link text is correct as you typed it. This breaks the links for sites with case-sensitive URLs. That's in addition to the issues with underscores getting extra backslash escapes.
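
To make the two breakages concrete, here is a rough sketch. The escape part assumes the stray backslash shows up either literally (`\_`) or percent-encoded (`%5C_`) in the link target; `unescape_underscores` is a name I made up. The lowercasing can't be undone after the fact, so the second half only shows why it breaks case-sensitive paths.

```python
import re


def unescape_underscores(url: str) -> str:
    """Drop a backslash (literal or %5C-encoded) sitting right before an underscore."""
    return re.sub(r"(?:\\|%5[Cc])_", "_", url)


if __name__ == "__main__":
    # Hypothetical mangled links of the kind described above.
    print(unescape_underscores("https://en.wikipedia.org/wiki/Hacker\\_News"))
    print(unescape_underscores("https://en.wikipedia.org/wiki/Hacker%5C_News"))

    # Lowercasing loses information: a case-sensitive server sees a different path.
    original = "https://example.com/Docs/ReadMe"
    print(original.lower() == original)
```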


There are 18+ age walls that just force you to log in, often in unnecessary places.

Plus mobile sometimes refuses to show some things.

old.reddit still works though.


Some subreddits will impose the 18+ wall because Reddit hasn't specifically vetted/approved the subreddit. So it will be something totally not adult, just a small sub with important information you're looking for, and you can't view it anonymously from a browser.


These can be avoided by using old.reddit.com instead of www. I think.
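
If you want to do the swap automatically, a minimal sketch using only the standard library. It assumes the only hosts worth rewriting are `reddit.com` and `www.reddit.com`; `to_old_reddit` is an illustrative name, and whether the old site actually skips a given wall is not something this checks.

```python
from urllib.parse import urlsplit, urlunsplit


def to_old_reddit(url: str) -> str:
    """Point a reddit.com / www.reddit.com URL at old.reddit.com, leaving other URLs alone."""
    parts = urlsplit(url)
    if parts.hostname in ("reddit.com", "www.reddit.com"):
        # Only the host changes; path, query and fragment are kept as-is.
        parts = parts._replace(netloc="old.reddit.com")
    return urlunsplit(parts)


if __name__ == "__main__":
    print(to_old_reddit("https://www.reddit.com/r/python/comments/abc/?sort=top"))
    print(to_old_reddit("https://example.com/x"))
```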


I think this is only true if you're using mobile. If you're on a desktop you can get through just fine. Usually.


Reddit also now blocks all Mullvad connections that I'm aware of. It's kinda ironic seeing all the scammy YouTube ads promoting VPNs for watching Netflix in another country when that's never worked and other companies tend to be hostile towards VPN users.



> Reddit doesn't have login walls yet

There isn't one on the root directory, but Reddit has plenty of login walls.


On New Reddit. I haven't seen any on Old Reddit yet.


There's not a difference. If a subreddit requires you to be logged in, then it requires you to be logged in.


The EFF just recently wrote an article with instructions on how to preserve & archive your own tweets on the Wayback Machine, but it involves exporting your own backup and uploading it to them. Since Twitter's API is completely cut off, there is no official way to back up other people's accounts.

But archive.today uses scraping and all sorts of tricky methods to bypass paywalls. I honestly don't understand why Nitter can't just stay logged out and rotate IPs. Although I'm sure that gets pricey when other people are accessing it constantly.

https://www.eff.org/deeplinks/2024/01/save-your-twitter-acco...


If the scraping model is impaired by aggressive countermeasures, the endgame is browser extensions that scrape as users view the site and ship the scraped data back to a processor, similar to RECAP (an extension that scrapes the PACER legal database and ships digital artifacts to the Internet Archive). Care will need to be taken around potentially sensitive data that could be shipped if users are logged in.

https://free.law/recap
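
On the sensitive-data point, a rough sketch of the kind of scrubbing step I mean, written in Python for brevity even though a real extension would be JavaScript. Everything here is hypothetical: the `capture` dict shape, the header list, and the rule of never shipping logged-in views are my assumptions, not how RECAP does it.

```python
# Headers that can identify or authenticate the user (assumed list, not exhaustive).
SENSITIVE_HEADERS = {"cookie", "set-cookie", "authorization"}


def scrub_capture(capture: dict) -> dict:
    """Return a copy of a captured page with sensitive headers removed."""
    headers = {
        name: value
        for name, value in capture.get("headers", {}).items()
        if name.lower() not in SENSITIVE_HEADERS
    }
    return {"url": capture["url"], "headers": headers, "body": capture.get("body", "")}


def should_ship(logged_in: bool) -> bool:
    """Conservative rule: only ship pages that were viewed while logged out."""
    return not logged_in


if __name__ == "__main__":
    capture = {
        "url": "https://example.com/thread/1",
        "headers": {"Cookie": "session=abc", "Content-Type": "text/html"},
        "body": "<p>hi</p>",
    }
    if should_ship(logged_in=False):
        print(scrub_capture(capture))
```

Header scrubbing does nothing about personal data inside the page body itself, which is the harder part.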


This model also works well for deep web content archiving.

There was a gaming message board where someone wrote a browser extension that would back up all topics someone visited in the background while they were reading them. It became important for archiving as much content from those forums as possible as the forum was in the process of shutting down.


Oh, that's a very cool project! How successful has it been? If it weren't for Sci-Hub, that would be a great idea for the scientific publishing world as well.


Very successful. Millions of court filings extracted and indexed, and a crucial component in driving down PACER costs.

https://news.ycombinator.com/item?id=24086570

> I'm the director of Free Law Project. For the case mentioned in the article we actually did a full expert testimony figuring out roughly how much per page it'd cost to run PACER using AWS GovCloud and a handful of other assumptions. It was...half a ten thousandth of a penny per page

https://www.courtlistener.com/docket/4214664/52/15/national-...

https://news.ycombinator.com/item?id=24085158

> Government’s PACER Fees Are Too High, Federal Circuit Says

https://news.bloomberglaw.com/white-collar-and-criminal-law/...



