Show HN: Effort to clone unmaintained SourceForge projects to GitHub (a-sf-mirror.github.io)
146 points by hydragit on June 27, 2015 | 82 comments



Why Github? Copying from one commercial provider to another doesn't solve the fundamental problem. Using git helps, but most of those old repos will never get cloned.

In 10 years time, Github may be the tired old service that gets acquired by a hedge fund that decides to monetize their repos. Such things are part of the corporate lifecycle.


> In 10 years time, Github may be the tired old service that gets acquired by a hedge fund that decides to monetize their repos. Such things are part of the corporate lifecycle.

So fix it in 10 years. Git makes that easy.

Point me to a good alternative to github that matches all your ideals. A free alternative to github - free as in beer, unless you're willing to fund this effort yourself, of course?

We migrated one of our projects from Sourceforge to Github, and all the stallmen came out from under their rocks to tell us how Github is evil, how Savannah is the only true alternative, pah. "Absolute freedom of software" is nice but it's not the only requirement. Savannah has the usability of a rusty wrench and will probably shut down without warning long before Github "turns evil".

Some people are just so far detached from reality when suggesting that stuff isn't perfect. Github is pretty damn amazing. If you want to use a FOSS alternative like Gitlab, more power to you, but that doesn't make it ideal in every situation.


> Point me to a good alternative to github that matches all your ideals. A free alternative to github - free as in beer, unless you're willing to fund this effort yourself, of course?

Gitlab? Savannah (of the nongnu.org variety)? Bitbucket? Gitorious? A basic VPS with SSH and git?


The source for Bitbucket is no more available than that for GitHub.

http://atlassian.bitbucket.org/


The only explicitly-stated condition was "free as in beer", which BitBucket is for even private repos. BitBucket also has a self-hosted equivalent, though said equivalent is not, alas, free-as-in-beer.

The implicit free-as-in-speech condition is adequately fulfilled by other alternatives, like (as I mentioned) Gitlab and (IIRC) GNU Savannah (along with - again - just installing git on an SSH-able server).


At GitLab you are very welcome. Please let me know if you have any questions or concerns.


Props guys, I think you do an incredible job. GitLab is in fact a reasonable alternative, I should have mentioned it in my post.

The reason we went to Github rather than Sourceforge is because of the community. Nevertheless, I think it's fairly foolish to focus on the platform (in the case here of archiving) - it's just a host. Stuff goes on domain 1 instead of domain 2. In either case it's still open source, the archives are all there, git is decentralized and perfect for the job.


Contradictory signals in your rationale here. It comes off as if you're filling in a post-hoc justification.

Is it just an archive or isn't it? If it is, what's so important about the community, then?

It's fine that you chose what you did, of course. It's just that in defending it, you can't seem to decide whether you want to have your cake or eat it.

If you wanted, this would have been a perfectly fine reason to give: "We went with GitHub because it's just what we use mostly, and other stuff not so much."


Who fucking cares? He's having it both ways and you can't stop it? Your tone is offensive. Relax.


You're very confused. I don't work with the Archive Team, I was offering my experience regarding our project's move from Sourceforge to Github and why GGGGGGGGP's (or something...) comment was way off base.


I originally wrote my comment[1] allowing for the idea that you were an uninvolved bystander, until a reread of your comment strongly suggested that you were part of the project being discussed. I suppose I got confused when you began talking about "we" and "our project".

1. The meat of which remains: if it's just an archive and the host is only being used as dumb storage, how is the "community" aspect of it a plus?


You're right, I shouldn't have used "We" in the original reply. This was my bad - my english still has rough edges.


No problem, thanks for your kind comment. And feel free to update your post :)


I would, but edits are time-limited on HN.


Ahh, good point, no problem. <3


[flagged]


Thanks? You completely missed the point of my post if you are taking this as an assault instead of criticism of the alternatives.

So I'm going to repeat myself here:

- Savannah is shit. Zero UX effort. Atrocious home page, branding, atrocious everything.

- Unfunded efforts are unreliable. Get a grip on reality, man. This isn't about foss or proprietary or whatever - it's about having a business model.

- If you think it's OK to recommend shit products as alternatives to good products because of ideology, you are a detriment to the cause you claim to promote.

BTW, I'm the LXQt project lead and a huge proponent of free software. Some advice: next time you attack someone, look them up.

Oh and here's another piece of advice: Don't attack people.


All of your criticisms of Savannah seem oriented around aesthetics rather than its actual functionality. Personally I don't need much eye-candy for a git host because 99% of my work is done from the command line, but even if I cared about that I'd still pick Savannah over SF any day of the week.


I thought this had been resolved years ago, but aesthetics aren't about eye candy. They are about productive and pleasant experiences.

As a tangible example, imagine yourself working in a dank, contaminated 6' diameter sewer pipe with spotty pirated electricity, vs. a well-designed, large interior space with plenty of natural light and reliable utilities. Regardless of how irrelevant appearances are to you, I'd wager you'd get more work done in the second environment than in the first.

To the point of Savannah vs. SF, I'm inclined to agree. I have projects from the early 2000s still on SF and I need to move them off.


That's a terrible example and it doesn't address my point at all.


It does, actually. UX, aesthetics, all of it is important, and sticking your head in the sand saying "Hey, I don't care about aesthetics, I just use the command line" is a very irresponsible attitude.


That's not what I'm saying at all. I'm saying that Savannah's website being ugly is not a reason to say "it's no better than SF", because it's still miles better.


I'm simultaneously honored to have been flamed by the lead of a project I'm interested in, and saddened to learn that the best source of attacks against us will be words our project leaders used first.

At the same time, if we're the ones generating our own bad press, we at least control the overall narrative. I guess that's a net win.


To answer your comment and dbbolton's simultaneously: My criticism of Savannah is based on aesthetics, but also on their model. Savannah has virtually zero funding.

What do you think will happen, exactly, when an unfunded host can't keep running? Maybe their gnu website will receive some donation from Facebook or Google or something, but there is no incentive to keep the nongnu part running. It's just volunteer work.

That model can work (cf wikipedia), but it's extremely rare, requires a lot of effort, a lot of bootstrapping and a lot of luck. It also requires $1 to pay for more than itself, so there are extra requirements regarding efficiency of spending, and diminishing returns of donations.

Gitlab has a good model. In fact, I'd be curious to see what their CEO makes of what I'm saying here - I always find him scouring HN comments about github, git, hosting, sourceforge, all of these. This is the sort of effort you have to put in to get your product out there into the ears and minds of people.

As I said earlier, yes there is a risk Github will turn evil. If they do, there will be warning signs (there have been warning signs about SF for 5 years). If such is the case, there will be alternatives by then (ever heard "if there aren't, I'll build one and get rich"?).

If Savannah runs out of money and has to turn off the lights, how much of a warning sign will there be? A few weeks maybe? Maybe there just won't be any. And this is a very real situation, which you absolutely need to be ready to confront. Putting your fingers in your ears and calling everything "an assault" won't help.

> if we're the ones generating our own bad press, we at least control the overall narrative

It's a very bad sign that you care so much about the "overall narrative". It's reminiscent of your approach of flagging criticism as assault. You care about what people say, not why they say it.

I'm not paid by an anti-FOSS group, and I didn't just feel like shitting on the work of random people. There's a reason behind everything I wrote here and above; it's backed by experience. And you'll be hard pressed to find people disagreeing that Savannah is pretty terrible... you can't just go around "controlling the narrative" like you own the media. You have to fix the problem at the root.


Scouring GitLab CEO here, as requested :) You asked for feedback on your negative comments on Savannah. We love free software and the ideals of the FSF, so I hope you understand I'll refrain from commenting. I do want to share something interesting with the caveat that it is a 'GNU projects for network services' server, not an official one, but we're happy to host https://git.gnu.io/explore/projects


I was more curious about the part regarding bootstrapping a hosting business. ;)


We make almost no money on the hosting. Almost all of our income comes from on-premises installations at larger organizations. So that is why I can recommend monetizing open source with an open core business model.


Ten years from now for all we know we could all have so much cheap storage and bandwidth and good, open p2p software that all coders get to archive their own full copy of github's repos. So the focus should be on getting today's job done now.


> open p2p software that all coders get to archive their own full copy of github's repos

Do you mean git?


first we need cheap 10 TB hard drives


Didn't we all say the same exact thing ten years ago?


Now we have SSD which means a small one-step-backward for storage. However, we will supposedly have 10TB SSD and beyond within a couple of years which should give some breathing room.

Even with all this development, I doubt we will be able to have everything on Github locally on our computers. I imagine the typical Github project to be tiny -- probably tens of megabytes at the most so I'd say we can have all the projects that we care about available locally. One can only care about so much.


Do you have any other suggestions? Hosting these repos on donated/personal machines is (IMO) significantly less likely to stand the test of time.

At least with a commercial entity there is a bit more "trust" involved that they won't disappear out of the blue one day. And if the time comes that Github starts to collapse, the process can be repeated.

Just because something isn't permanent doesn't mean it's pointless.


Savannah? I don't think it's a very good alternative, and I subscribe to the "we'll fix ten years' problems in ten years" view, but it doesn't have the same issues.


Why can't it stand the test of time? Wikipedia seems to do well.


> Do you have any other suggestions?

Archive.org ?

And weird that nobody suggested the Bitcoin block chain. I don't think binaries are a good fit but source code doesn't require a lot of space. With the current and future block size it will take some time to make it happen.


Archive.org (while an amazing resource created lovingly by amazing people) is not a great front-end for stuff like this. Github is very easy to get started with and excels at code hosting. As for the blockchain, that's a terrible idea. There are so many things wrong with it, including the cost to push all of that data into the blockchain and the fact that while source code can be small, it isn't always, and it can be orders of magnitude larger. Right now each block is 1MB and blocks take some 10 minutes for just 1 confirmation, so you are looking at < 1.7KBps (13.6Kbps) "upload" speeds. If you actually attempted this you would have to have some sort of header on each transaction to tie it all together, which lowers the speed even further. I'd bet money that if you started uploading, nodes would either ban you or the core devs would do something to stop the chain from being filled with shit that now has to get replicated to tens of thousands of machines across the globe.
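For anyone who wants to check the throughput figure, here is a back-of-the-envelope sketch. It assumes 1 MB means 1,000,000 bytes, one block every 10 minutes, and that every byte of every block carries payload (which it would not in practice). The 50 MB repository size is a made-up number for illustration only.

```python
# Rough upper bound on payload throughput if data were stored in Bitcoin blocks.
# Assumptions: 1 MB block (decimal), one block every ~10 minutes, zero overhead.
BLOCK_SIZE_BYTES = 1_000_000
BLOCK_INTERVAL_SECONDS = 10 * 60


def throughput_bytes_per_second(block_size: int, interval: int) -> float:
    """Bytes per second if the whole block were usable for payload."""
    return block_size / interval


def days_to_upload(total_bytes: int, bytes_per_second: float) -> float:
    """Days needed to push total_bytes at the given sustained rate."""
    return total_bytes / bytes_per_second / 86_400


rate = throughput_bytes_per_second(BLOCK_SIZE_BYTES, BLOCK_INTERVAL_SECONDS)
print(f"{rate / 1000:.2f} KB/s, {rate * 8 / 1000:.1f} kbit/s")

# Hypothetical 50 MB repository, chosen only for illustration.
print(f"{days_to_upload(50_000_000, rate):.2f} days for 50 MB")
```

Under these assumptions the rate comes out to roughly 1.67 KB/s (about 13.3 kbit/s), in line with the "< 1.7KBps" estimate; the 13.6 figure appears to come from rounding to 1.7 before multiplying by 8.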


I received a lot of downvotes, but my comment was a bit ironic, since in many forums (even on HN in the past) when someone talked about backups many people suggested the block chain.

I said: With the current and future block size it will take some time to make it happen.


The absurdity of that suggestion is not absolved by the fact that someone else has said it before, or by the fact that you said it will take 'some time for it to happen'. It's completely not an option for the migration we're discussing.


It is not absurdity if it is irony, and I clearly said that it can't be done now but may be in the future. I can't see the future, can you?


It's not practical, but it's a worthy goal for blockchain computing, or some descendant of it. So I am glad you brought it up.


For now, Github is not as ad-ridden as SourceForge is. Github is monetizing some repositories (https://github.com/pricing). I don't know if they're sustainable, but from my naive point of view, closed-source projects on github pay for the hosting of open-source projects on github.


Looking at GitHub's business model, it looks a lot more sustainable than SourceForge's.

My company uses GitHub Enterprise. Unless we have some sort of deal/discount above the built-in, we're paying over $30,000/year for it and we run it on our own servers. I'm guessing a lot of other companies do as well. Developers are quite used to using both git and GitHub and $30,000 is nothing if you have a hundred developers costing you $150k a piece (not just salary, but computers, benefits, desks/office space, payroll taxes, etc).

SourceForge counted on their open-source stance limiting who would use their service and, by extension, limiting the resources they would need to serve those people. GitHub works the opposite way. They want everyone to think of GitHub as "the place I put stuff". Have a code snippet? Stick it on GitHub! Want a basic wiki for something vaguely code related? Create a GitHub repo just for the wiki! Collaborating with friends on a class project? GitHub! And then, years later, GitHub feels like second nature to you and you love it when employers are using it and paying GitHub tens of thousands a year for it.

I'm not accusing GitHub of doing something nefarious to lock people into GitHub. Just noting that GitHub feels very familiar and that makes GitHub a very reasonable choice for companies who pay them money. Without that familiarity, the value of GitHub isn't the same. If you're a company spending millions per year, $30k is a drop in the bucket for software your developers are already familiar with and software that works well, is well supported, and can handle your problem.

Yes, GitLab exists and has both open-source and enterprise versions, but I'm not sure that a business feels that differently about $5,000/year for a 100 person team and $25,000. I'm glad GitLab exists, I'm glad Bitbucket exists. They'll make sure that GitHub has to continue being great and they'll provide services to people that want something a bit different. But GitHub's business model seems pretty sound. The more people use GitHub for free, the more likely high-rollers are to pay for GitHub.

I mean, the GitHub subscription per developer costs less than the additional money my company pays for Apple gear for developers. By targeting open source with a premium, free, non-ad driven product, GitHub opened the door to lucrative business sales. They seem like a sustainable business and it even seems like the free, open-source repositories are part of that business plan.

I'm not saying that Apple gear is so overpriced or that it isn't a better platform to develop on, but we don't need retina displays to do our work. And many people argue that you don't want to force devs to work on a platform they're less productive on. The same applies to GitHub. If your devs are more productive or, heck, even happier or more comfortable using it, $250/year isn't something a company is going to blink at if it's paying $150k+ per dev - just as the company won't mind paying an extra $100, $500, or $1,000 in equipment for that dev.


Just wanted to mention that price is not the only reason why our customers prefer GitLab. The ability to run multiple servers to support many users is one of them, others are outlined on https://about.gitlab.com/better-than-github/


In my naive mind, I think it might be sustainable.

(During the past week I paid for the first month of private github hosting on my personal CC for the company I work for. Will get it reimbursed and transition it to some company CC when I get the time.)


Github may be a commercial provider, but at least it's a commercial provider based on an open protocol. If things do start going wrong at Github, escape is a "git clone" and a "git push" away.
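A minimal sketch of that escape route, using a bare mirror so that every branch and tag comes along rather than just the default branch. The URLs and the `repo.git` directory name are hypothetical placeholders, and the helper names are made up for this example. Only what lives in the git repository itself is copied; anything a host stores outside of git is not.

```python
import subprocess


def mirror_commands(source_url: str, target_url: str, workdir: str = "repo.git"):
    """Return the git commands for a full mirror from one host to another."""
    return [
        # --mirror creates a bare clone containing every ref, not only one branch.
        ["git", "clone", "--mirror", source_url, workdir],
        # Push all refs to the new host; -C runs git inside the mirrored repo.
        ["git", "-C", workdir, "push", "--mirror", target_url],
    ]


def run_mirror(source_url: str, target_url: str, workdir: str = "repo.git") -> None:
    """Execute the mirror; raises CalledProcessError if either step fails."""
    for cmd in mirror_commands(source_url, target_url, workdir):
        subprocess.run(cmd, check=True)


# Hypothetical URLs, for illustration only. This only prints the commands;
# call run_mirror() to actually execute them.
for cmd in mirror_commands(
    "https://github.com/example/project.git",
    "git@git.example.org:example/project.git",
):
    print(" ".join(cmd))
```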


Except issues :(


SVN isn't?


A lot of the content on Sourceforge isn't in SVN. I think some of it may still be using CVS, and a lot more is just in the "project files" catch-all.

Also, SVN is a lot more annoying to copy, especially if you want the whole history. Not that it isn't doable, but it's pretty slow and obnoxious. With Git, every clone includes everything by default.



> "Git LFS ... is open sourced under the MIT license."


Who cares?

If that happens, the projects can be re-hosted somewhere else. For the time being Github is the best option.

Sometimes the hypothetical situations free software people bring up hurt their cause more than they help.


I doubt that. Github is a paid service and has several enterprise-level clients. If it's ever going to flip-flop, there will be quite a few warnings beforehand.


Somebody has to pay for git hosting. Who would be a better alternative, in your opinion?


I like people like you, always slashing ideas and not suggesting your own...


Why aren't you mirroring the binaries? These are vital for people in the future who do not have the time to set up a build environment for software from a decade ago.

I'd also echo the concerns of others about GitHub.

Proper archivists should do for SourceForge what they did for other projects. Archive Team, maybe? Looks like they have a wiki page: http://www.archiveteam.org/index.php?title=SourceForge


This was in progress; 830GB had been downloaded before a Sourceforge guy popped onto the IRC and said he's ok with the archiving, but that the robots.txt should be respected. This would put things at a practical standstill, so the downloading was paused. I'm not really sure what's happened in the week since.

Right now Xfire's videos, several URL shorteners' links, and Toshiba Support material are being archived. If you have spare cycles and bandwidth, and want to contribute, running an instance of the "ArchiveTeam Warrior" is pretty easy through docker or a VM. http://archiveteam.org/index.php?title=Warrior


Honestly I think ignoring robots.txt in this case is acceptable. Even if he programs in code to respect robots.txt - once the management at sourceforge get wind of what he is doing - what is stopping sourceforge from putting up robots.txt everywhere blocking him?


Look at their current robots.txt; they're already prohibiting robots to crawl the actual source code: http://sourceforge.net/robots.txt
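To make concrete what "respecting robots.txt" means for a crawler, here is a small sketch using Python's standard `urllib.robotparser`. The rules and URLs below are hypothetical and are not the contents of any real site's robots.txt; the `example-archiver` user agent is likewise made up.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /p/
"""

parser = RobotFileParser()
# parse() accepts the file's lines directly, so no network access is needed.
parser.parse(EXAMPLE_ROBOTS_TXT.splitlines())


def allowed(url: str, user_agent: str = "example-archiver") -> bool:
    """True if the parsed rules permit user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


# A path under the disallowed prefix versus one outside it.
print(allowed("https://example.org/p/someproject/code/"))
print(allowed("https://example.org/about"))
```

A crawler that honors the file would skip any URL for which `allowed` returns `False`, which is why a broad `Disallow` rule can bring an archiving run to a standstill.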


Sourceforge doesn't host the binaries themselves. Universities and others offer mirrors (like HEANET) for free!

So the mirrors should just cut the upload write permission for Sourceforge and transfer it over to archive.org or ArchiveTeam.


Regarding binaries, I know these could be useful and I'd like to provide them, but I'm afraid some "not (yet?) very popular mirroring project" can't show how we can trust it regarding binaries. After all, a known site like SF is untrustworthy, so why would an unknown site be any more trustworthy?


Yes, this is a more challenging and potentially risky one.

I think you're taking the right approach by capturing the code and the history. In fact, I think you're going above and beyond what most people should ask for or expect.


Seems you could just side-step SF directly in this case and contact one of their mirrors:

http://sourceforge.net/p/forge/documentation/Join%20as%20a%2...

http://sourceforge.net/p/forge/documentation/Mirrors/

I bet at least one techie working at one of those organisations would lend a sympathetic ear to the effort, if you could find them

edit: running 'rsync -r' on a local mirror shows 512,000 directories from a..ju, but only 43k files. Mirroring all the downloads should be easy


Honestly, this is a serious issue for my field. There are so many obscure academic binaries hosted on SF... I hope someone manages to mirror them. [The fact that a lot of the scientific community is so backwards in adopting modern coding standards is another conversation for another day.]


Sourceforge is on the radar here, but maybe it's time to step it up.

http://www.archiveteam.org/index.php?title=Fire_Drill

Update: seems others have linked to archiveteam.org, so maybe that's the best route. Is the OP part of the AT effort or do they know about each other? Maybe they should.


Nice! But in my opinion it would be better to help ArchiveTeam with their efforts!


Definitely. And if you want to help out, visit [1]. There are options to download binaries[2] as well as source code[3] from SourceForge.

[1]: http://archiveteam.org/index.php?title=SourceForge

[2]: https://github.com/ArchiveTeam/sourceforge-grab

[3]: https://github.com/ArchiveTeam/sourceforge-grab-rsync


I confess my ignorance regarding archive.org's various collections. There seem to be a lot of them, which one are you referring to?


They're not a part of archive.org (just a totally separate group with similar interests, led by Jason Scott). The specific project page is here: http://archiveteam.org/index.php?title=SourceForge

The best place to pop in is probably on IRC. For the Sourceforge project it's #coldstorage on EFnet, http://chat.efnet.org:9090/?nick=&channels=%23coldstorage&Lo... for the web client. Though note, the ArchiveTeam project seems to be paused right now.


To add to this - ArchiveTeam often works with archive.org who have arranged long term storage for a lot of retrieved content.


See the response just above yours - ArchiveTeam is separate from archive.org


Agreed. This article sounds more like an advertisement for GH. Also, why not use other platforms like bitbucket? Centralizing everything at GH is the worst scenario for open source.


Bitbucket is exactly the same.

Gitlab has a community edition, and then there's Gogs, Kallithea and a few others.


Nice.

I agree with what the others are saying, there's a lot of source code for solving obscure problems that is only on Sourceforge.

One example I found recently is a program called QLumEdit. I recently had to figure out how to work with EuLumdat files, and if it wasn't for the source code for this program on Sourceforge I would have been completely stumped (well not quite, but it would have taken me ages).

If SF goes down the toilet, a lot of knowledge goes with it so this is awesome to see!

If anybody is interested, I was converting this code from C++ to .net, my horrible hacky unrefactored effort is here - https://github.com/bumblebeeman/eulum.net

I am planning to make this code nicer, and develop it into a WPF app when I have time!

I am getting pretty close too, here is my .net generated version of the images this program produces: http://imgur.com/PCmpnJ2


That's great. I started doing that myself (my own git server, not github) for some projects I care about. This effort seems to include a very narrow list, though.

For CVS, though, I suspect cvs-fast-export [1] will do a better job than git-cvsimport.

1 http://www.catb.org/esr/cvs-fast-export/


Thanks, I'll have a look


What about creating a torrent containing all these unmaintained SF projects (with binary downloads included)?

This would dramatically increase the odds that the content is never lost.


The problem with torrents is the lack of incremental update support. If the base torrent gets updated, it gets a new hash identifier. How do you know it's been changed, to ensure you get the latest version? And when you do, the swarm effectively gets diluted because some are on the new version and some are on older versions.
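A simplified sketch of why any change produces a new identifier: a torrent's info-hash is the SHA-1 of its bencoded "info" dictionary, which includes the hashes of the content pieces. The bencoder and the single-file info dictionary below are pared down for illustration (real torrents are usually built with a proper library), and the file name and payloads are made up.

```python
import hashlib


def bencode(value) -> bytes:
    """Minimal bencoder for ints, bytes, str, lists and dicts."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are emitted in sorted order as byte strings.
        items = sorted((k.encode("utf-8"), v) for k, v in value.items())
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def info_hash(name: str, payload: bytes, piece_length: int = 16384) -> str:
    """SHA-1 of a bencoded single-file info dictionary for the payload."""
    # Concatenated SHA-1 digests of each fixed-size piece of the content.
    pieces = b"".join(
        hashlib.sha1(payload[i:i + piece_length]).digest()
        for i in range(0, len(payload), piece_length)
    )
    info = {
        "length": len(payload),
        "name": name,
        "piece length": piece_length,
        "pieces": pieces,
    }
    return hashlib.sha1(bencode(info)).hexdigest()


# Hypothetical archive contents: adding one project changes the identifier.
old = info_hash("archive.tar", b"project-a")
new = info_hash("archive.tar", b"project-a" + b"project-b")
print(old)
print(new)
print("same swarm" if old == new else "different swarm")
```

Because the identifier is derived from the content, peers holding the old archive and peers holding the updated one end up in separate swarms even though most of the data is identical.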


If the SF projects aren't being updated, then what's the issue? The information is, by definition, static.


> Currently, for each cloned project, we mirror its CVS repository and its website.

Please add "SVN" (Subversion)


This seems much more like a temporary fix, not really a solution. A few years from now GitHub can do the same thing that sf did. This after all seems to be the fate of commercial companies that explore open source, once they start to lose users to new competitors.


Note 1: Moving things to GitHub or elsewhere does not remove them from SourceForge. So SF can continue to host and enjoy links on unmaintained websites, search engines etc.

Note 2: If their business model is offering popular binaries and source, they can just copy these from other sites and repackage them. Open source software allows you to do this. If no one else is interested in bundling and monetizing, then they can buy traffic and still succeed.

Note 3: Remember that academy award winning movie from 1943? Not so great in today's light. While perhaps one of the goals of the Internet and cheap storage is to keep a copy of everything, and it's often better to not re-invent the wheel, if something falls by the wayside and it's needed, it will be created.

Note 4: There are plenty of websites which catalog useful abandonware, that someone had to find a physical disk drive from. If the software has value, chances are someone will eventually repost it somewhere without a massive organized effort.

----

There is clearly value in moving over some projects to GitHub or elsewhere, but if some things are not migrated or moved, life will go on.


Per the historical record, that "academy award winning movie from 1943" was 'Casablanca'.



