+1. Need something like archive.today/.is for Twitter so you can rip and archive content that might not live elsewhere. Grab it, stick it in the Wayback Machine, and return a Wayback URL.
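For one-off captures, the Wayback Machine already exposes a public "Save Page Now" endpoint at https://web.archive.org/save/<url>. The sketch below (function names are my own, not an official client) shows how a small helper could request a snapshot and report where it landed:

```javascript
// Minimal sketch: ask the Wayback Machine's "Save Page Now" endpoint
// to capture a page, then report the snapshot URL. Function names
// here are illustrative, not an official API client.

const SAVE_ENDPOINT = "https://web.archive.org/save/";

// Build the capture request URL for a target page.
function waybackSaveUrl(target) {
  return SAVE_ENDPOINT + target;
}

// Request a capture; the service redirects to the archived snapshot,
// so after following redirects the response URL is the Wayback URL.
async function archivePage(target) {
  const res = await fetch(waybackSaveUrl(target), { redirect: "follow" });
  return res.url; // e.g. https://web.archive.org/web/<timestamp>/<target>
}

// Usage (requires network access):
// archivePage("https://example.com/some-tweet").then(console.log);
```

This is roughly what a "return a Wayback url" service would do under the hood, modulo rate limits and error handling.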
Reddit doesn't have login walls yet, but it has way too much information stored within its walls not to have a backup / non-social-media way of extracting it. It's infeasible to both block Reddit (its UI is intended to be addictive, like all social media) and still be able to extract information from it.
It’s dying a slow death through neglect. Image posts don’t work correctly, image comments don’t show up, and the direct comment links generated from www don’t work on old Reddit.
Still better than logging in, using the new web app or downloading the app.
It's not like 95% of the content is any good anyway. You have to dig deep into a niche to really get any value, and the last few years, less and less. Of course, I haven't logged in for 3-4 years so maybe I'm missing something. Doubt it.
There's also a thing lately where link targets posted on New Reddit (I assume, since mine aren't doing this) are all lowercased, while the link text stays correct as you typed it. This breaks the links for some sites, in addition to the issue of underscores getting extra slash-escapes.
Some subreddits will have the 18+ wall imposed on them simply because Reddit hasn't specifically vetted/approved the subreddit. So it can be something totally non-adult, just a small sub with important information you're looking for, and you can't view it anonymously from a browser.
Reddit also now blocks all Mullvad connections that I'm aware of. It's kind of ironic seeing all the scammy YouTube ads promoting VPNs for watching Netflix in another country when that's never worked, and other companies tend to be hostile towards VPN users.
The EFF just recently wrote an article with instructions on how to preserve & archive your own tweets on the Wayback Machine, but it involves exporting your own backup and uploading it to them. Since Twitter's API access is completely cut off, there is no official way to back up other people's accounts.
But archive.today uses scraping and all sorts of tricky methods to bypass paywalls. I honestly don't understand why Nitter can't just stay logged out and rotate IPs. Although I'm sure that gets pricey when other people are accessing it constantly.
If the scraping model is impaired due to aggressive countermeasures, the end game is browser extensions that scrape as users view the site and ship the scraped data back to a processor, similar to RECAP (which uses an extension to scrape the PACER legal database and ship digital artifacts to the Internet Archive). Care will need to be taken around potentially sensitive data that could be shipped when users are logged in.
This model also works well for deep web content archiving.
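As a sketch of that model (the processor endpoint and the payload shape below are hypothetical, and the privacy filter is deliberately naive), a WebExtension content script could package what the user is already viewing and ship it off for archiving:

```javascript
// Sketch of the crowd-scraping model: a content script captures the
// page a user is already viewing and ships it to an archiving
// processor. Endpoint and field names are hypothetical.

const PROCESSOR_ENDPOINT = "https://archive-processor.example/ingest";

// Naive privacy filter (illustrative only): strip script tags, which
// commonly embed session state and user-specific config blobs on
// logged-in pages. A real extension would need a much stricter pass.
function stripSensitiveMarkup(html) {
  return html.replace(/<script[\s\S]*?<\/script>/gi, "");
}

// Pure helper: build the artifact to ship. Keeping this separate from
// the browser APIs makes the filtering logic easy to test.
function buildArtifact(url, html) {
  return {
    url,
    capturedAt: new Date().toISOString(),
    html: stripSensitiveMarkup(html),
  };
}

// In a real extension this would run in a content script:
// const artifact = buildArtifact(location.href,
//                                document.documentElement.outerHTML);
// fetch(PROCESSOR_ENDPOINT, {
//   method: "POST",
//   headers: { "Content-Type": "application/json" },
//   body: JSON.stringify(artifact),
// });
```

The appeal of this design is that the scraping load rides on traffic users generate anyway, so there is nothing for rate limiters to distinguish from normal browsing.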
There was a gaming message board where someone wrote a browser extension that backed up every topic a user visited, in the background, while they were reading it. It became important for preserving as much content as possible when the forum was in the process of shutting down.
Oh, that's a very cool project! How successful has it been? If it wasn't for Sci-hub that would be a great idea for the scientific publishing world as well.
> I'm the director of Free Law Project. For the case mentioned in the article we actually did a full expert testimony figuring out roughly how much per page it'd cost to run PACER using AWS GovCloud and a handful of other assumptions. It was...half a ten thousandth of a penny per page