Whoa, instead of just lamenting the shutdown and how talent acquisitions are horrid, and how VCs won't do the right thing and companies like Twitter are killing innovation (you might believe these things to be true; that doesn't change my point), these guys (Jason Scott and the Archive Team and jacquesm) did something about it. And when Jacques couldn't do it himself he organized other people.
My favorite kind of leadership, and an example of the (double-edged) sword of instant communication: people can be rallied around time-sensitive causes, like SOPA or posterous shutting down.
I know this seems a little obvious, but it's striking how rare it actually seems to be. I'm curious why. Maybe it's just my perception and it's happening all the time. Certainly people are doing great things, but I'm curious why we haven't yet seen more specific, directed actions like this. Does it depend on relatively homogeneous communities like Reddit (SOPA) and HN (this)? If there is a proliferation of such communities, say subreddits or otherwise, could we expect this to happen more frequently? Do we want it to happen more frequently, or do we run the risk of, say, DHS running a pro-search campaign like China's 50 cent army? I'm just curious about why this seemed so striking to me.
This also fits in with an article I'd been meaning to read by Fukuyama[0] on social capital written 15 years ago.
The vice of modern democracy is to promote excessive individualism, that is, a preoccupation with one's private life and family, and an unwillingness to engage in public affairs. Americans combated this tendency towards excessive individualism by their propensity for voluntary association, which led them to form groups both trivial and important for all aspects of their lives.
Perhaps we'll see more of these spontaneous actions as voluntary associations are easier to make as the infrastructure that supports them (e.g. reddit) becomes more well known and fine-tuned.
As someone who spends reasonably large amounts of money on digital media buys, I've found that most companies like Twitter (though actually I have no experience with Twitter, except as a user) certainly are a lot more friendly when you have money to spend.
Fun story about Facebook: when I first wanted to start spending decent amounts with them (decent not huge - $10,000s/month) a few months ago I literally could not get in touch with a single person. Even completing contact forms I wasn't hearing back from them. A friend who used to work for an SEO/SM company told me a name of someone to contact, I used LinkedIn's InMail (a paid feature) to message him, and 24 hours later I had 3 account managers (including a technical expert, a media strategist and an overall account manager), who answer calls to their mobiles at any time of day. Now I have a nice route to get answers on any topic, not just paid advertising, thanks to my spending. (Was actually shocked about how hard it was initially to make contact and give them money, too.)
Well I personally wouldn't have found out about this effort if not for this post of yours, so you certainly deserve some fraction of the credit in my book :)
(I have everything up and running according to your short guide and am working on the posterous project, though my contribution will probably be limited to under 100GB due to bandwidth cap considerations. Btw, how much disk space does posterous in its entirety take?)
I can assure you that you will not use up very much space nor bandwidth :-) It's mostly text and all. Feel free to stop by the project IRC Channel which is #preposterus on EFNet for specific questions (if they aren't answered in the FAQ or HN thread already)
jacquesm's a member of the ArchiveTeam now by the way ;-) And so are you, since you're helpin'!
Archiving content is one thing, but being able to use it again is another.
We are helping users move to tumblr & save their blogs. So far we've moved 500000+ posts (mostly small blogs). But we are not able to help users who have multiple images and videos in a post; currently we support only single image & audio posts. If we can find a way to host their files separately on S3 permanently, the move would be effortless & many would thank us.
There are so many users who don't understand what to do with their backup; for them, moving to wordpress is too complex a task.
I have reached out to Sachin Agarwal, but have yet to hear a positive reply. Tumblr also declined to host the files of Posterous blogs.
We are ready to collaborate with anyone who can host users' files permanently; if needed, users can pay you directly. We were also considering dropbox for hosting files. Moving to a new blog platform is a pain and we want to minimize it as much as possible.
Got all the answers I need here (though I might drop by the channel to say hi). It's actually quite funny to see the traffic graph spike as it makes a successful connection, then lulls for a while as certain pages keep giving 502 responses. I definitely see why you guys need IPs more than bandwidth!
I'd say the reason it doesn't happen more is because it takes a ton of work and people are mostly self-involved. Activism is not what it once was (in the States anyway).
I wish more actions like this would happen, too. I'm glad that they at least happen. The technology to share work is there, and here's work that might be overshadowed at times, the BOINC project: http://boinc.berkeley.edu/ . Use your unused CPU cycles for the greater good.
This is a great effort. But it makes me furious that the founding team can be so disparaging to their users.
Sachin Agarwal, you used this community to enrich yourself and further your own career. In return, at the very least, you owe an explanation for why such a convoluted effort has to be made to get this content off the servers.
It's also exceptionally discourteous to ignore emails from upstanding community members, but perhaps you missed these. But I know for sure that you will read this thread, so I'd love to hear why a database dump can't be provided, or a couple of IPs whitelisted to just rip through a scrape.
How does one opt out? I've migrated my blog to my own system so if I want to make changes to an old post I can, but I won't be able to control my content that you are archiving which is a problem for me. It's actually rather bothersome to me. I figured I'd just go make sure my blog is deleted, but it could be archived by now.
Everything that goes online is kind of online for as long as the Internet Archive and Library of Congress decide it should be. But I started worrying about where my account info might end up (I should really worry about that before signing up for every dang thing...), so I just found this info on deleting one's account:
http://posterous.uservoice.com/knowledgebase/articles/36544-...
Maybe you can get to it before the Wayback Machine does.
Thank you, this method worked and my account is permanently gone now. I still run my blog at another location so I can maintain my content which I prefer to have some level of control over - even if it isn't total.
That is the world wide web. You published it and anyone could save a copy of what you published (right-click, save as) at any time. This is no different surely?
which would replace VBoxHeadless and remove the requirement for screen. You will need another VBoxManage command to setup VNC if you desire to use that.
You're my new hero. I don't use it and I never used geocities but this is still awesome. Historians will thank you in years to come. Sociologists and such will praise what can be mined. And the lists go on...
The problem is that Posterous is hard to crawl. One: they'll continuously and automatically ban your IPs, even if you rotate over a lot of them. Two: Posterous can't take all of the requests.
We (ArchiveTeam) have unfortunately made Posterous unresponsive multiple times. So please be careful not to bring it down completely if you're doing a solo effort.
Please also bear in mind that it's not as simple as just "chuck it into the downloader".
Also, please use a sensible format if you're crawling/archiving this.
We're using WARC (Web Archive) which is an official ISO File Format standard - which the Internet Archive's Wayback Machine can use. It's also a pretty good and nice format for archiving web pages in general.
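For anyone wondering what a WARC record actually looks like on disk, here is a rough sketch in Python using only the standard library. It is not the tooling ArchiveTeam runs, just an illustration of the record layout as I understand it: a version line, named header fields, a blank line, the payload, and two CRLFs to close the record. The function name and the example URL are made up for illustration.

```python
import uuid
from datetime import datetime, timezone


def build_warc_resource_record(target_uri: str, payload: bytes,
                               content_type: str = "text/html") -> bytes:
    """Build one simplified WARC 'resource' record (illustrative only)."""
    headers = [
        ("WARC-Type", "resource"),
        # Record IDs are URIs; a random UUID URN is a common choice.
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        # Timestamp in UTC, formatted as YYYY-MM-DDThh:mm:ssZ.
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        ("Content-Type", content_type),
        # Length of the payload block in bytes, not characters.
        ("Content-Length", str(len(payload))),
    ]
    head = "WARC/1.0\r\n" + "".join(f"{k}: {v}\r\n" for k, v in headers) + "\r\n"
    # Every record ends with two CRLFs after the payload.
    return head.encode("utf-8") + payload + b"\r\n\r\n"


if __name__ == "__main__":
    # Hypothetical URL, used only to show the output shape.
    record = build_warc_resource_record("http://example.posterous.com/", b"<p>hello</p>")
    print(record.decode("utf-8"))
```

A real crawl would normally store the full HTTP request and response as separate records and let an existing tool write them, so treat this purely as a way to see the structure.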
please ask on irc://efnet/#preposterus that's where the archive team guys hang out. I don't have a list of seeds but they may be able to figure out a way in which you can put 80legs to good use.
For those of us who might want to donate cloud computing time but have weak/memory-limited laptops, is there an EC2 image that we could fire up for the cause?
There actually is an EC2 AMI available. I won't mention it here though. I'd rather you join us on #preposterus on EFnet over ol' school IRC.
I'm being cautious because we don't want to sink Posterous; it's a very fragile beast, and we're blowing away its caches.
In the long term web startups will be dead, because it will be more and more obvious that it's impossible to trust your data to for-profit companies that simply cannot keep your interests in mind, no matter what promises they make. Everything important should be on open-source, community-run, nonprofit platforms.
Thanks for the interest! You're of course very welcome to help out with that. I just wanted to let you know that the source for it is available at https://github.com/ArchiveTeam/universal-tracker
The VMware Player (and Workstation, I'm assuming) will import it automatically. The only issue is that you have to choose a different connection for the second virtual disk. Not sure why that's a problem but moving it works fine.
Why is it so important to save all this information? Seems like projects like archive are just contributing to more information clutter. We generate more information every single day; what makes a few million people's blog posts so damn important? We can't just keep saving shit forever, though the progression of technology I guess makes it easier and easier, but eventually we're going to hit a limit.
> We can't just keep saving shit forever, though the progression of technology I guess makes it easier and easier, but eventually we're going to hit a limit.
Yes we can. Through the 'progression of technology' as you put it, I believe we will never come close to any limit. Just look at how much data is stored in DNA, and you'll see that our current data storage technologies are relatively primitive.
How are we to know what's important and not? Surely, there's interesting content available at Posterous. Just to mention an example, CloudFlare's blog is hosted there.
Sure, there's plenty of spam accounts and crappy content, but that might prove valuable in the future. Maybe someone will study what kind of content we as a race were contributing to that kind of platform in this day and age; maybe someone is researching the automated spam.
This is not really taking up all that much space, in this day and age. There's around 2.2 TB downloaded, and it's mostly text and images. That's a little over half of a single 4TB drive. Not really storage capacity to fight about in my opinion.
Yeah I guess you're right about the storage piece, however, I don't think it's useful at all. We always live in the moment of "right now is the most important moment in history", when really most of the content we're saving is junk, and, as more and more of it compounds, more and more junk will just accumulate on the pile. I'd assume that 90% of what's in posterous is worthless, the other 10% is just people reiterating good points, but the key word is _re_iterating. Do we really need tens, then hundreds, then thousands of years of files of things people said on personal blogs in the past? Absolutely not.
Ethically this is far better behaviour than those who are shutting down the service and there is altruism rather than profit motivation BUT legally isn't this epic scale copyright infringement of millions of works created by thousands of people?
It's been discussed. Because it was published in a public forum, fair use is certainly a consideration. Is it legal to copy stuff from websites without permission? U.S. courts haven’t made a clear determination. Andy Sellars, a staff attorney at the Citizen Media Law Project, says he would argue that it counts as “fair use” under copyright law. However, he notes that the Archive Team’s torrents don’t offer a mechanism for copyright holders to demand that certain material be taken down, which could hurt its case in a court. http://www.technologyreview.com/featuredstory/426434/fire-in...
I'd say that Posterous, and whoever is ultimately responsible, should be giving the data to archive.org directly. This is the only sensible thing to do. Of course, they have to clear copyrights first.
I think I saw a comment somewhere about your IP being banned after an hour... anything we should do to avoid this? I'd hate to be scanning for 15 minutes only to be banned and not be able to help anymore.
The bans nominally last 24 hours. There was a point where there were so many IPs running (from AWS servers) they overflowed the ban list and the bans were shorter!
We've had a few guys using Amazon Web Services who continuously rotate IPs/set up new instances. Unfortunately, the last time we went too 'hard' on them, effectively making Posterous unavailable.
We're thinking about this right now. Feel free to hang around #preposterus on EFNet for updates.
I got banned after doing about 4.7 GB. If that is done a few thousand times because of this article, it would make a huge difference. And they eventually would have to lift the bans.
Awesome! Love the spirit of the effort, running a Warrior now for kicks. Just curious, isn't it possible for someone at Google to press a button and make this happen? :)
Are you guys archiving photos / images too? When I just wget the site those don't generally come down. I don't know if they're being loaded via JavaScript or a plugin.
To preserve my own blog I just saved it as a pdf with wkhtmltopdf, but it'd be nice to have a full HTML version.
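One quick way to tell whether the images are in the static markup at all (as opposed to being injected by a script) is to list the img tags in the HTML that was actually fetched. A minimal sketch with Python's standard html.parser follows; the class and function names are just illustrative, and the sample markup is invented rather than taken from a real Posterous page.

```python
from html.parser import HTMLParser
from typing import List


class ImageSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag in the fed HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.sources: List[str] = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img ... />.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def image_sources(html: str) -> List[str]:
    """Return image URLs referenced directly in the static HTML."""
    collector = ImageSrcCollector()
    collector.feed(html)
    collector.close()
    return collector.sources


if __name__ == "__main__":
    # Invented sample markup, only to show the behaviour.
    sample = '<p>Post</p><img src="http://files.example.com/a.jpg"><img alt="no src">'
    print(image_sources(sample))
```

If the list comes back empty for a page that clearly shows pictures in a browser, the images are probably loaded by script. If it lists URLs on a different hostname, the crawler likely just isn't following links off the blog's own host.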
This event, and many like it, only remind me that the modern Internet is about which groups you belong to, not who you are as an individual.
Websites owned and operated by individuals are now vanishingly rare, while aggregations of people -- Facebook, Twitter, et al. -- have become the norm.
Something like this would be cool as a chrome app. I'm not sure how complicated the job is, but if it's just a matter of hitting apis and queueing data for upload, it should be possible with local storage. You could even use subdomains to defeat storage limits.
"I made an offer to continue to host posterous.com and all the stuff on it but never received an answer."
Have you tried making a public appeal (say, over twitter) to the Posterous owners? They may be able to talk with Twitter and arrange to capture the content directly.
We're rate limiting how many items/users get handed out, because Posterous is very fragile and we're blowing away their front-end caches, which they rely on heavily.
Because of this, we practically hit the back end every time (as do other users of Posterous, because we blow the cache away), which makes Posterous very slow.
We ran a few hundred more threads earlier, which unfortunately made Posterous completely unavailable.
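To make the "go easy on the server" point concrete, here is a small sketch of a fetch loop that backs off when the site answers with overload-style errors such as the 502s mentioned elsewhere in this thread. This is not the Warrior's actual logic. The function name, the set of retryable status codes, and the delay values are all assumptions chosen for illustration, and the fetch function is passed in so the example runs without touching the network.

```python
import time
from typing import Callable

# Assumed set of "server is struggling" responses worth retrying.
RETRYABLE_STATUSES = {429, 502, 503}


def polite_fetch(fetch: Callable[[str], int], url: str,
                 base_delay: float = 1.0, max_delay: float = 60.0,
                 max_attempts: int = 5,
                 sleep: Callable[[float], None] = time.sleep) -> bool:
    """Try a URL, doubling the wait after each overload response.

    `fetch` is any callable that returns an HTTP status code for a URL.
    Returns True on a 200, False on a non-retryable status or when the
    attempts run out.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        status = fetch(url)
        if status == 200:
            return True
        if status not in RETRYABLE_STATUSES or attempt == max_attempts:
            return False
        sleep(delay)                       # give the server room to recover
        delay = min(delay * 2, max_delay)  # exponential backoff, capped
    return False


if __name__ == "__main__":
    # Fake server: two 502s, then success. No real requests are made.
    responses = iter([502, 502, 200])
    waits = []
    ok = polite_fetch(lambda u: next(responses), "http://example.invalid/",
                      sleep=waits.append)
    print(ok, waits)
```

The design choice here is simply that a struggling server gets progressively more breathing room instead of being hammered at a constant rate by every worker.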
That 'crummy broadband' is very much welcome, and so are you.
In this effort, bandwidth in itself is not that important. Feel free to read some of my other posts in this thread regarding bandwidth and/or come join us at IRC (#preposterus at EFNet)
Stop being a necromancer and resurrecting a dead service. There's got to be a good reason why they killed Posterous, so let it die. Let it go; stop holding onto the past.
Please read the intro to the blog post and think again. I've seen these comments by the boatload when we took on saving geocities and you are simply wrong.
Is there a place I can find this Geocities archive? I apparently missed something I wanted to keep when I archived my own stuff, and I'm wondering if I can find it again.
This isn't resurrecting a dead service nor is it about that. This is archiving it and making the previously public data stay public.
Why would someone do that?
Well, a lot of people have poured their hearts out and made content that lives on Posterous. They might miss the "sunsetting" (asshole term) of the service and lose their content.
Think of all the dead links that'll be around after the service has died. Wouldn't you want to be able to read something great that was linked from HN a year ago? From the Wayback Machine or similar?
There's plenty of reasons to archive the web and the content that goes up (and down).
I've never visited posterous.com until today, but I fail to see why Twitter would buy it and kill it.. can't be that horrific. It's like buying a house and setting it on fire.
What am I missing? Are they just killing off competition?
They probably bought it for the people. So there would be no one left to work on the Posterous site if everyone is working on Twitter instead. I know they didn't buy it for the infrastructure since it's hosted on Rackspace and Twitter is an AWS shop.
[0] http://www.imf.org/external/pubs/ft/seminar/1999/reforms/fuk...