It makes no sense to not let the site online in an archived form, for the posterity
It wouldn't even be a significant cost and ads would recoup it anyway
There should be a way to donate a website to the Internet Archive so that they run an online archive on it, basically keeping the site frozen forever (rather than relying on the Wayback Machine which has worse UX)
> It wouldn't even be a significant cost and ads would recoup it anyway
I 100% don't get it when websites with lots of content and good SEO just delete everything. Fatwallet (fuck you Rakuten). Yahoo Answers (hey, I didn't say good content).
Even on my own Drupal blogs that I didn't want to maintain/update anymore, I did a giant curl job, recursively regex'd out the login/comment submission fields and dumped them on s3. Voila. And I'm someone that would take 9 attempts to do FizzBuzz.
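For anyone wanting to try the "regex out the forms" step, here is a minimal sketch in Python. The helper name `strip_forms` and the sample page are made up for illustration, and it assumes the login/comment fields live inside ordinary `<form>` elements. Regex on HTML is brittle (nested or malformed forms will trip it), so treat this as a starting point rather than how the blogs above were actually processed.

```python
import re

# Hypothetical helper: remove every <form>...</form> block from a saved page.
# Assumes forms are not nested and are properly closed.
FORM_RE = re.compile(r"<form\b[^>]*>.*?</form\s*>", re.IGNORECASE | re.DOTALL)


def strip_forms(html: str) -> str:
    """Return the page with all form elements (login, comment boxes) removed."""
    return FORM_RE.sub("", html)


# Made-up sample page for illustration.
page = '<p>Post body</p><form action="/comment"><input name="c"></form><p>Footer</p>'
print(strip_forms(page))  # <p>Post body</p><p>Footer</p>
```

You would run something like this over each downloaded file before uploading the result to static hosting.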
A bookmark you had to it 20 years ago still works and I still get some Adsense cheque residuals from it for zero effort.
I did some work for a design agency that got bought by Twitter.... we basically converted all their different sites to full static and put them on ice. It would have been expensive to keep hosting how they had been, but with everything flattened to static it's super cheap to keep up (although no publishing new stuff without some by hand work or restoring the old servers).
All the good articles and design samples stay online, and maybe the original authors can get a little credit or name recognition from any search traffic that finds it.
The one "liability" to doing this would be visitors not realizing the site isn't being updated anymore. If Vice is still publishing content outside their website they'd have to make it clear that the site is archived and new content is elsewhere, or people might get the wrong idea and assume nothing new is coming.
Could the internet archive offer this as a paid service?
For a reasonable fee, we'll archive your site, and give you back a copy of the assets you can turnkey host on the original domain for cheap with a static hosting solution (s3, cloudflare, etc).
I've been wondering this too. I've even been considering authoring a web standard to allow hosts to specify how their pages can be archived in a standard way (e.g. which scripts to include, etc.) and then pitch the IA to offer a "pay $X to archive this data forever" deal to the universe.
I'm really curious what the cost per byte would be to make it worthwhile to offer a "host this byte forever, for one up-front fee" service.
> At this time we have no fees for uploading and preserving materials. We estimate that permanent storage costs us approximately $2.00US per gigabyte. While there are no fees we always appreciate donations to offset these costs.
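To put that quoted estimate in perspective, here is a quick back-of-the-envelope sketch. The only number from the quote is $2.00 per gigabyte; the function name, the 500 GB example site, and the use of decimal gigabytes (10^9 bytes) are my assumptions.

```python
# Figure taken from the quote above; everything else is an assumption.
COST_PER_GB_USD = 2.00
BYTES_PER_GB = 1_000_000_000  # assuming decimal gigabytes


def permanent_storage_cost(size_gb: float, rate: float = COST_PER_GB_USD) -> float:
    """One-time cost in USD to store size_gb gigabytes at the quoted rate."""
    return size_gb * rate


# Hypothetical 500 GB site (text, images, some video).
print(permanent_storage_cost(500))  # 1000.0

# Rough "cost per byte" under the same assumptions.
cost_per_byte = COST_PER_GB_USD / BYTES_PER_GB
print(cost_per_byte)
```

This covers storage only, as the quote does; bandwidth and staff time for serving a live site would be on top.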
There's some discussion about this idea on this thread, including comments by markjgraham, who manages the Wayback Machine, and thoughts from John Carmack:
No no, the exact opposite. If this bundle of content is so valuable, then someone can make a business out of buying it. Vice could go to WeBuyOldIntellectualAssets.com and get a flat price for it all, and that company would host it or do whatever with it.
The same thing happens with brands - someone bought the Montgomery Ward brand at a bankruptcy auction or something - and with store inventory: once the store goes bankrupt, they just sell the entire store contents, right down to the fixtures, to a liquidator who brings in the "Going out of business! Everything must go!" signs.
That's too attractive to malware peddlers. It's not particularly widespread currently, mainly because most of the content has been centralized into the same big silos... But what you're envisioning here is just going to get abused by abusers
Not just brands, but software too. The primary software I support at my day job was acquired by a company that, based on their other assets, can be described as where software goes to die.
We’re migrating away but they’ll squeeze out what they can from those that don’t/can’t/won’t.
But they still have to provide continuous support, some amount of updates to keep customers functioning, and maybe even get some new customers as a “value” option (that’s barely functional).
Happens to forums all the time (fuck you Internet Brands and Vertical Scope).
100000x easier to do all this with static web content.
However, the footer of that website says 2014 and the about page is broken, so not sure if it's still supported.
Also, Cloudflare has a partnership with Web Archive and they offer something similar, but I think it's only made for temporary outages and only archives the most popular pages on your site
Could they (or a for-profit company) bid on it? Do liquidators in the US have to consider any offer, even if it was unsolicited? Does it vary from state to state?
That works too, but there is something to be said for a turnkey solution friendly to corporations who are willing to just throw some money at a problem to make it go away. Plus the Archive will get a bit of extra money for the Wayback Machine!
Just pay some donation, redirect your DNS to the Wayback Machine and bingo.
Note that Archive Team and the Internet Archive are separate, unaffiliated entities, though they do often work together.
Archive Team is a loosely organised group of individual volunteers that share a common interest in Internet preservation, and develop tools and share notes to serve that goal. They're basically one of your old-school Mediawiki communities, with very little budget:
Internet Archive is a full-blown multimillion dollar `501(c)(3)` nonprofit, which functions as more of a general-purpose library. They maintain physical offices and datacentres in multiple countries, host many petabytes of data, do activism, run conferences, and when they develop custom tools it tends to be somewhat more advanced than the Archive Team's decentralized web scrapers, like custom book scanning hardware:
A lot of the information in the Wayback Machine, which is run by the Internet Archive, was saved and contributed by Archive Team. For example, as of writing this comment, that is true of the latest snapshot of `https://www.vice.com/en`. You can see this with the "About this capture" button on a Wayback Machine capture.
Both groups have ways to receive monetary donations.
For Archive Team though, I wonder if it would be more useful to donate compute by running their Warrior archiving VM/container, or contributing code to their GitHub:
I think the issue for the IA is that it isn't lucrative enough to make it worth their time. Someone already did it for them for free, even if it wasn't 100% as good as they could have done it.
> we basically converted all their different sites to full static and put them on ice. It would have been expensive to keep hosting how they had been, but with everything flattened to static it's super cheap to keep up
I got absolutely tarred and feathered at an agency for suggesting this strategy and I just want to say thank you for validating what I wanted to do.
I had a proof of concept for us to migrate sites to a static / low / no cost hosting option.
They were unwilling to spend time "training" the team to learn react (in 2020).
They were unwilling to let their senior FE dev spend any time with me to correct the CSS issues I was struggling with.
They used this deception and dishonesty to say "it didn't work" and wasn't worth any more time. I built a prototype in a week. It's not like I spent month(s) on it without any ROI. They just wouldn't look at it because that would mean acknowledging that I and/or my ideas had value.
The closest thing I got to an answer is that the CMS they preferred, which was chosen 10 years ago by people no longer working there... was the only way they could support client sites. Because that's what they've been using. Turnover means it's so hard for us to support anything new because we have no time...
Basically they Brawndo'd me.
It was hard to stomach getting fired by the incompetent people driving the business into the ground when I was literally pleading with them to implement money saving measures.
The horse sometimes would rather kill itself than drink the clean water you've found... that's just life. It's hard to accept.
It was not an issue of refining the content on one site, you make a good point though.
It was more of an issue that they had a bunch of clients on an old CMS system, and they did not want to make any changes to the way that the sites were built or hosted.
I can make arguments for and against either side of this idea, it all depends how you want to run your business.
The only skip-a-heartbeat moment was when AWS sent me an email saying that I have “one or more S3 buckets that allow read or write access from any user on the Internet”
But none of my containers had write access. All of them had public read, but yeah, it’s a website and they know this: their own route53 DNS points to the containers.
They just sent the same generic mass email to everyone with any public container.
If you don’t have some form of automated throttling, couldn’t that still become costly if a popular webpage started pulling resources from that bucket?
If so, their warning could have been phrased better, but isn’t incorrect.
If they would want to monetise their content in the future, they must keep it private before it's gobbled up by search engines and AI and becomes public domain.
I don't understand why publishers take down Kindle books when the paper book goes out of print. It happened to one of my favorite scifi novels.
It takes zero effort to keep the book available (I know, I self-published a silly little one), and zero effort to include it in your accounting as long as there's a data feed and a computer.
It probably has to do with publishing rights. Authors may not want to allow digital publication without an actual print run, and once the initial print run ends they lose the digital rights unless they do another print run. Or the digital publication rights may only be negotiated for a fixed period of time, or else require an ongoing fixed payment to retain, so it costs the publisher money to continue offering a digital version that they may not recoup without sufficient sales.
Halting digital sales might be necessary to declare a write-off and recoup a tax benefit that year. That was happening to some streaming shows, anyway.
I read one of his later works and it was horrid, but Golden Age was incredible. The first several chapters were hard to get into, I quit several times and the people I loaned it to never got past that. But after that it takes off like a rocket. On a reread I found that the difficulty at first was just from so much being unfamiliar.
It's in a distant future with superintelligent AI, immortality, physical abundance and pervasive virtual reality. And in that setting he finds a deeply human tale of epic heroism.
By having a minimum amount of foresight and putting into the initial contract an agreement which lets the parties maintain the online availability in a mutually agreeable way.
Why do you think it's a problem of foresight and not simple motivation? Perhaps none of the parties cares about the availability of a book that doesn't make enough money to stay in print. Perhaps having the book become unavailable for some time is perceived as a benefit to the rights holder, allowing them to do a re-launch.
It isn’t about the effort. This is the same thing people say about out of print video games, too.
It isn’t about difficulty, it is about incentives. They want you to buy new books and video games, and if you are reading/playing old free ones, that is reduced demand for what they are selling. Why would they want to help you satisfy your need for free?
It can also be that some decision makers feel it is too much hassle. Or they don’t even know it is an option.
For you and me doing this would be probably an afternoon’s work? (Maybe a bit more, maybe a bit less)
For someone less technologically inclined it could be seen as a big project. They need to find someone capable of doing it, they need to ask for a quote, they need to supervise the project otherwise the contractor doing it might just do a half assed job or none at all.
Not to mention they can only think about doing this if they have an inkling that it is possible. They might be operating in a mindspace where “maintaining the servers” is a large monthly expense. For example if years ago they were sold a CMS with all the bells and whistles for some $bigbucks recurring cost. If they are savvy business people they might have done their research to figure out if “this” can be done cheaper or not. But they might not realise that it is possible to change the requirements so as to achieve a massive reduction in cost. This is especially true if they treat the cost of servers as a kind of black box.
And very often the people who are providing them with IT services are not incentivised to tell them about this option. Will they tell the business owner that oh by the by for half the monthly recurring cost they are paid the business could find someone who puts the page on ice and for the other half runs it for the next decade? Of course not! Doing so would undercut their income stream. That would be crazy. In fact they might spread all kinds of FUD and sabotage attempts at scraping the site.
I firmly believe that if they think their only option is to shut down, they aren't fit to do their job. If they don't understand how the internet works they should let someone who does do the work.
If you are a business built on top of the web you need someone who's tech savvy in-house.
Maybe they deserve their fate. But it's sad that the next generation will miss this insightful content because they gave up.
The ChatGPTs of this world will solve that. I like to believe that I know a couple of things about a couple of things regarding technology and sometimes I ask ChatGPT or Gemini "how can I do so and so, list/name five pieces of software or technical solutions to do so-and-so".
I use it/them as a search engine on steroids. Maybe it is time more people also do so.
If you're getting residual revenue on a website, at some point someone's going to figure they could get a bit more residual revenue by adding some ad scripts, and pretty soon you've got an entire stack for serving ads that needs maintenance and ROI.
I just let Google do that. Sure it's not max possible value, but return on effort is pretty good. Advertisers can buy through google and target your site. If you get enough traffic, you can probably cut a better deal (ok, that costs negotiating time, but if the ROI is there...)
Those Yahoo Groups could be a trove of niche, otherwise uncollected information, especially with regard to vintage or specialty electronics. Removing them was a huge loss.
> did a giant curl job, recursively regex'd out the login/comment submission fields and dumped them on s3
it's because even this cost outweighs how much they care about the content, which is 0... the people who make decisions like this aren't scrimping pennies or interested in preserving effort... they're looking for the simplest way to get millions of dollars into their pockets
doing the bare minimum to maintain a library of content indefinitely isn't it
these are the kind of people who would happily set fire to a library if they could get away with insurance fraud
This is why I post on slashdot. They've passed the test of time (but not UIs, fuck beta). Looks like their first posts in '97 start here: https://slashdot.org/?page=8582 dunno what their december 31st, 1969 posts are after that (errors? intentional de-ranking?)
Newspapers have a bad history of "experimenting" with enabling online comments and then deciding the experiment failed and delete them all. You're a newspaper, you're not supposed to delete history when you don't like it!!!
And then they complain when people use social media as their newspaper.
It's very depressing that all the comment sections from the late 2000s to mid 2010s are nowhere to be found. Also a lot of live journal type sites. Comment sections seem omitted from Internet Archive snapshots, but I find them in many ways more worthy of archival than the published articles that make the cut.
> That type of data would be so interesting for things like historical sentiment analysis
Except that internet commenters are very weird, and like the least representative sample ever. Not quite as weird as Wikipedia editors, but still really weird.
They aren't representative of the general public. This can still make it very interesting though. Do trends show up earlier among commenters? If so, has the time it takes for the trends to flow to the mainstream changed over time? Has the likelihood that online commenter trends flow to the mainstream changed? Is the influence more pronounced for specific subjects?
It is very depressing, but on the other hand you'd have millions of comments written in another era (pre-culture wars, when the Internet was more, let's say, "tolerant") that can be now traced back to the authors to cancel them. With infinite memory you need protection, otherwise it's un-erasable damnation.
The last time I looked at Slashdot comments (2021, give or take) they were low-effort trolls, racist/sexist, or just gibberish. Has moderation improved there or is it still a cesspool?
I scraped a sample of their posts ten years ago and ran a regression on user activity by ID. They have zero significant growth in user base other than mobile users posting more as AC. There was a small core of older active 5-6 digit UIDs doing the bulk of the posting and that was shrinking toward zero around now. Slashdot will die in the near future even if Netcraft can't confirm.
As one of those 6-digit UID posters, /. has been dead for quite some time. Discussions barely breach 50 comments or so nowadays, for the most part. The firehose sucks. The 'editors' constantly post dupes and the left hand seems to have no clue what the right hand is doing.
Unix time is the number of seconds since 1/1/1970 (UTC). Subtract a few hours for time zone post-processing and you get 12/31/1969 as a date. It indicates a zero, null, or missing time value that got formatted as a date.
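You can reproduce the effect in a couple of lines of Python. This is only an illustration: the fixed UTC-5 offset and the function name are my assumptions, and I have no idea what Slashdot's actual stack or server time zone is.

```python
from datetime import date, datetime, timedelta, timezone


def epoch_zero_date(utc_offset_hours: int) -> date:
    """Calendar date of Unix timestamp 0 as seen at a fixed UTC offset."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(0, tz=tz).date()


print(epoch_zero_date(0))   # 1970-01-01
print(epoch_zero_date(-5))  # 1969-12-31 (assumed US Eastern-style offset)
```

Any offset west of UTC pushes timestamp 0 back into the evening of December 31st, 1969, which is why that date is such a common symptom of a missing value.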
I wish I was joking, but reducing the amount of stuff that is owned/managed for the sake of it is a common philosophy. Another way to put it: they're focusing.
We think it would have been extremely low effort to keep a static site running, they probably thought not having to think about it at all was worth the loss.
It takes continuous effort/money to keep the metaphorical company lights on. There are things you need to file periodically and legalities you need to comply with, including when they change in the future, just to exist.
So if the website shutdown coincides with the shutdown of a company or a division within the company, that might be why. And since a website will usually not shut down if it's turning a considerable profit to begin with, just deleting everything can often look like the best option.
Ever more bummed about Fatwallet as Slickdeals continues to worsen with endless "sponsored deals" and censorship to force people to use their cashback and price tracking products.
Nothing in that post indicates they are taking it offline, the HN headline is just editorializing. It says Vice is no longer making new content for their Web properties and focusing on other platforms instead like YouTube.
There's no way they'd give up the ad dollars they get from the existing stuff... That makes no sense
It's not that expensive to host old blog posts and they already host videos on YouTube... What's expensive in those operations is supporting new content and growth. Now they can wind it down and establish a fixed legacy system and eventually run it on autopilot with a small team in support roles.
And they'll keep updating contracts to sell ads for a defunct site? Seems doubtful. Past experience shows that the site is unlikely to stay up for the long haul.
You don't need to sell ads directly to Nike to make more than enough $$ to incentivize running an existing major content site, let alone pay for a small team to sufficiently keep it running tech-wise...
People just try and hack it constantly - as in, hundreds of automated hacking attempts per day, and when they succeed, they won't make obvious changes, they'll tweak things gently in a malignant way that won't be noticed for some time.
It's really not that complicated to manage Cloudflare-type web app firewalls and shut down content interfaces, both comment sections and admin panels, so there are no malleable auth areas to breach. And even if that happens a small team could easily handle run-of-the-mill script kiddies and SEO schemes.
I gave up a domain I used to own in the last year or so. It was a mildly popular site that had been around for about a decade.
Within a few months someone else had snapped it up, created a site, used the same content (I assume from the Internet archive), and was using it to link farm.
At least that's what I thought when I saw the new site with a few links that were not there before.
I only discovered it because they also left the Gmail address that I was using for the site. So I got a couple of emails from folks wanting to update info on the site.
Feels a bit weird, tbh.
No way to determine who is behind it, and what would I do anyway?
It happened to the state of Maryland, too: their domain registration for starspangled200.org, featured on their license plates, expired and was picked up by a gambling website in the Philippines. Maryland has since reclaimed the domain.
I had a domain in early smartphone days for a mobile app that served up some Canadian Broadcasting Corporation content. I eventually abandoned the app/domain and the CBC snapped it up - so that's about the best-case scenario.
> It makes no sense to not let the site online in an archived form, for the posterity
>It wouldn't even be a significant cost and ads would recoup it anyway
Just so you know... investors would probably think it's just a waste of time, which means a waste of money as they (or their financial handlers) would have to keep an eye on this website. Remember, if they think this website is going to make them under $XX an hour they will just nuke it; they are not attached emotionally or otherwise to this "product", the exception being founders who are also shareholders, e.g. Steve Jobs.
I'm in the "this decision is stupid" camp, but just playing devil's advocate here, they might not want to risk damaging their brand by letting some third party agency mishandle their website
Yeah depends if the Vice brand carries on. If it does then this needs to be managed which gets rid of the “brush soot off hands; not my problem” advantage.
Do investors know that they exist? I feel like this is something that will only happen if the archiver actively makes contact and persuades the current website owner that it's worth it to make the deal.
I’m not sure your premise is correct. This is important here.
You start off by saying: it costs nothing to keep this site up. No downside.
But there may be some downside from an SEO perspective, which the minuscule ad revenue would not offset.
Even brands with good SEO, with pages which are making money, take some of them down occasionally. The idea is that you want to refocus all your SEO on the pages that matter most - which means pulling others.
Here, you may want all the SEO “juice” to go to your videos, to push the maximum number of users there. If some of them are getting diverted to the website, and that pays less per click, you may in fact be losing money on it vs. having no website.
Site wipes also disadvantage job-hunting staffers who suddenly don't have a public published portfolio to point potential new employers / clients to. Though enterprising staffers may anticipate such moves and archive content as it's published to head off the inevitable.
(I've seen this issue raised on previous site shutdowns, quite probably Gawker.)
If you care, you should be saving your own stuff in some form. If I counted on people saving what I've written in findable form I'd have very little left. As it is, I have most things I care about.
For public-facing journalism, a public-facing record (Internet Archive, Archive Today, etc.) provides some third-party credibility that what you say you wrote and was published online actually was written when and as claimed.
Keeping your own personal clippings file preserves words but not provenance.
Original photography sometimes has tricky licensing — photos are good for only 5 years, then have to be renewed. That’s what photo editors at these magazines do. The text content itself is usually a one-and-done deal though.
I’d literally love that idea; it’s a really compelling concept but my first concern would be the potential risk of licensing/copyright issues that could arise from the original site owners transferring the site to IA. But seriously that would be so cool to see one day.
I came across this thread on DataHoarders before seeing it here and was glad to see so many jump at the opportunity to help archive it on our own ends. In case anyone may be interested to jump in, here’s the link to that thread.
https://www.reddit.com/r/DataHoarder/s/fnY46CuYOq
It wasn't in this post, but it was a separate rumor that the site would be deleted. This sounds crazy, but is exactly what other digital publications have done recently.
There can be licensing issues, stuff breaks, stuff gets hacked, most content just gets really stale...
I was involved with some sites recently that stopped publishing new content. The plan is to keep most of it around for now. But I have no illusions that if it becomes a "project" for some reason or another in a few years, it will just be turned off.
Saying “we will no longer publish content” could very well mean they are taking down the site; thus it is no longer published
I think it’s very likely that this is what’s going to happen because even a static archive site is going to cost money and dilute their new social brand
Perhaps they’ll repost their old stuff on their new channels but it will nevertheless suck for old timers like me that refuse to use social media
"There should be a way to donate a website to the Internet Archive so that they run an online archive on it, basically"
Can't you?
I thought everyone with an account could upload. Whether it'd stay is another question, and they'd definitely not archive against the expressed will of the copyright holder, but you could give it a try.
Web archiving is a solved problem: you record the website in an interactive environment [1], and everything that happens on-screen will be saved in a single file in an open [2] and standardized [3] file format authored by Internet Archive and endorsed by Library of Congress for preservation [4].
You can store the resulting WARC file wherever, be it on S3 or under your pillow.
As an archivist, I urge everybody here to not reinvent the wheel, please..
The technical side is a solved problem, the legal side not so much.
You practically cannot preserve and make available something if the copyright holders don't want you to. If the copyright situation is complicated you bear the risk.
You can say that this is how it is supposed to be, but that is not how it works in the non-digital realm. You could argue that we'd need something like digital monument protection, where artifacts can be preserved against the copyright holder's will.
For example a site that I liked called The Outline stopped publishing content in 2020 and they leave the site online at least for now https://theoutline.com/
(One of the original devs for The Outline here) so happy that it is able to live on, frozen in time, although missing a lot of the features that made it cool.
A lot of the cool things we were doing at The Outline were on the CMS side of things; it's a shame we were never able to share it
I’ve worked in publishing (engineering and product side mostly) for a long time; reasons not to keep it up:
* Potentially expensive licensing fees for the CMS, depending what it is, and migration costs to something “free” if not.
* Ongoing costs to maintain the site (security updates etc).
* Of course, hosting costs.
I agree in this case it would be smarter to keep it running in a stripped-down form since there’s still value there, then work to find buyers or someone who has an interest in keeping it running.
While the site isn't being shut down, who is held responsible for the old content? What if new regulations are introduced? The site is most likely dynamic, so it would at least require a full scrape of all content, and then placing all that content somewhere online (and that probably includes large images and video). Keeping the site going is probably not worth it for the ones on top that are set to be the only ones making profit from it.
"If it makes no sense, there's someone mean behind it."
No idea about this case specifically, but in general this is an assumption that only hurts you in most cases. People have motives and people want to be good. Whether their "good" is your good is a different question - a question that is often worth figuring out.
“and people want to be good.”
I’d even push back on this a bit. Many people do not actually care about being good. A small but significant minority are psychopaths.
But more likely, to your last point, lots of people have a distorted sense of good, in that they can’t separate it from their own self-interest. What is good for them will be justified or reframed to be ’good,’ and this is often unconscious.
In instances like this, I find it pays to reserve a little energy toward being open minded and taking a 'trust but verify' stance. Specifically tho, trusting yourself and past lessons, but verifying, as opposed to trusting others.
People are complicated, and negative interactions do tend to have an outsized influence on molding our behavior
>It makes no sense to not let the site online in an archived form, for the posterity
So do you believe this is an action taken out of spite, or that there may be circumstances that a bunch of hackers aren't taking into account?
I'm genuinely curious, because I don't believe it's spite, but can't think of the reason, outside of some long-tail potential liability that the value of maintaining the site doesn't overcome.
> It makes no sense to not let the site online in an archived form, for the posterity
I wanted to say that Dr. Dobb's did that. https://drdobbs.com/ But some time in the last few years something must have broken. You can't open the article links anymore.
>But some time in the last few years something must have broken.
Which tends to be what happens. Something breaks. No one can be bothered to fix it. And at some point they don't renew a domain and take the content offline.
> It wouldn't even be a significant cost and ads would recoup it anyway
You'd need a corp to deal with the money, people to handle things like takedown requests and expired copyrights, maybe some IT functions to keep things up when suppliers change stuff. It wouldn't be that straightforward.
University libraries can serve a function here. They'll often act as archives for local newspapers and outlets. If a publication is shutting down they should definitely work to transfer their site to some organization that will preserve it.
Business people, who make these decisions, aren’t exactly known for being intelligent or asking intelligent questions of technical people who work for them.
That’s a very broad brush you’re using for a patronising take there.
Technical people, who look at these decisions from across the Internet, aren’t exactly known for their humility, or for assuming that non-technical people may have perfectly valid reasons for their decisions.