+1. Need something like archive.today/.is for Twitter so you can rip and archive content that might not live elsewhere. Grab it, stick it in the Wayback Machine, and return a Wayback URL.
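For one-off captures, the Wayback Machine already exposes a public "Save Page Now" endpoint at https://web.archive.org/save/<url>. The sketch below (function names are my own, not an official client) shows how a small helper could request a snapshot and report where it landed:

```javascript
// Minimal sketch: ask the Wayback Machine's "Save Page Now" endpoint
// to capture a page, then report the snapshot URL. Function names
// here are illustrative, not an official API client.

const SAVE_ENDPOINT = "https://web.archive.org/save/";

// Build the capture request URL for a target page.
function waybackSaveUrl(target) {
  return SAVE_ENDPOINT + target;
}

// Request a capture; the service redirects to the archived snapshot,
// so after following redirects the response URL is the Wayback URL.
async function archivePage(target) {
  const res = await fetch(waybackSaveUrl(target), { redirect: "follow" });
  return res.url; // e.g. https://web.archive.org/web/<timestamp>/<target>
}

// Usage (requires network access):
// archivePage("https://example.com/some-tweet").then(console.log);
```

This is roughly what a "return a Wayback url" service would do under the hood, modulo rate limits and error handling.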
Reddit doesn't have login walls yet, but it has way too much information stored within its walls not to have a backup / non-social-media way of extracting it. It's infeasible to both block Reddit (its UI is intended to be addictive, like all social media) and still be able to extract information from it.
It’s dying a slow death through neglect. Image posts don’t work correctly, image comments don’t show up, and the direct comment links generated from www don’t work on old Reddit.
Still better than logging in, using the new web app or downloading the app.
It's not like 95% of the content is any good anyway. You have to dig deep into a niche to really get any value, and the last few years, less and less. Of course, I haven't logged in for 3-4 years so maybe I'm missing something. Doubt it.
There's also a thing lately where link targets posted on New Reddit (I assume, since mine aren't doing this) are all lowercased, while the link text stays correct as you typed it. This breaks the links for some sites, in addition to the issue of underscores getting extra slash-escapes.
Some subreddits will have the 18+ wall imposed on them simply because Reddit hasn't specifically vetted/approved the subreddit. So it can be something totally non-adult, just a small sub with important information you're looking for, and you can't view it anonymously from a browser.
Reddit also now blocks all Mullvad connections that I'm aware of. It's kind of ironic seeing all the scammy YouTube ads promoting VPNs for watching Netflix in another country when that's never worked, and other companies tend to be hostile towards VPN users.
The EFF just recently wrote an article with instructions on how to preserve & archive your own tweets on the Wayback Machine, but it involves exporting your own backup and uploading it to them. Since Twitter's API access is completely cut off, there is no official way to back up other people's accounts.
But archive.today uses scraping and all sorts of tricky methods to bypass paywalls. I honestly don't understand why Nitter can't just stay logged out and rotate IPs. Although I'm sure that gets pricey when other people are accessing it constantly.
If the scraping model is impaired due to aggressive countermeasures, the end game is browser extensions that scrape as users view the site and ship the scraped data back to a processor, similar to RECAP (which uses an extension to scrape the PACER legal database and ship digital artifacts to the Internet Archive). Care will need to be taken around potentially sensitive data that could be shipped when users are logged in.
This model also works well for deep web content archiving.
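As a sketch of that model (the processor endpoint and the payload shape below are hypothetical, and the privacy filter is deliberately naive), a WebExtension content script could package what the user is already viewing and ship it off for archiving:

```javascript
// Sketch of the crowd-scraping model: a content script captures the
// page a user is already viewing and ships it to an archiving
// processor. Endpoint and field names are hypothetical.

const PROCESSOR_ENDPOINT = "https://archive-processor.example/ingest";

// Naive privacy filter (illustrative only): strip script tags, which
// commonly embed session state and user-specific config blobs on
// logged-in pages. A real extension would need a much stricter pass.
function stripSensitiveMarkup(html) {
  return html.replace(/<script[\s\S]*?<\/script>/gi, "");
}

// Pure helper: build the artifact to ship. Keeping this separate from
// the browser APIs makes the filtering logic easy to test.
function buildArtifact(url, html) {
  return {
    url,
    capturedAt: new Date().toISOString(),
    html: stripSensitiveMarkup(html),
  };
}

// In a real extension this would run in a content script:
// const artifact = buildArtifact(location.href,
//                                document.documentElement.outerHTML);
// fetch(PROCESSOR_ENDPOINT, {
//   method: "POST",
//   headers: { "Content-Type": "application/json" },
//   body: JSON.stringify(artifact),
// });
```

The appeal of this design is that the scraping load rides on traffic users generate anyway, so there is nothing for rate limiters to distinguish from normal browsing.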
There was a gaming message board where someone wrote a browser extension that backed up every topic a user visited, in the background, while they were reading it. It became important for preserving as much content as possible when the forum was in the process of shutting down.
Oh, that's a very cool project! How successful has it been? If it wasn't for Sci-hub that would be a great idea for the scientific publishing world as well.
> I'm the director of Free Law Project. For the case mentioned in the article we actually did a full expert testimony figuring out roughly how much per page it'd cost to run PACER using AWS GovCloud and a handful of other assumptions. It was...half a ten thousandth of a penny per page