Hey peeps.
So I need some recommendations.
I've been building a distributed file system at work to store our hash tables in. These are 1 GB files (about 40 TB of them in total) that are write once, read many.
It needs to replicate the data across servers and make the files available via HTTP. Oh, and it needs to scale well, because from next month we're potentially adding another TB per month.
So far I haven't been able to find a DFS that does all of the above, so I've been working on my own. But I'm nervous - the files are mission critical, though I'm not too worried about losing data per se (separate backup solutions make sure we have multiple static copies safe). I'm more worried about not being able to cope with the load. My current implementation is in Python and simply uses a central MySQL server to track file locations.
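To give an idea, the read path is basically just this (a stripped-down sketch; the table, column names and hostnames here are made up for illustration):

# Simplified sketch of the lookup path: a central MySQL table maps each
# file to the servers holding a replica; the reader picks one and builds
# an HTTP URL for it. Table/column names are illustrative only.
import random
import pymysql

def locate_replicas(conn, file_id):
    """Return the list of servers that hold a copy of file_id."""
    with conn.cursor() as cur:
        cur.execute(
            "SELECT server_host FROM file_locations WHERE file_id = %s",
            (file_id,),
        )
        return [row[0] for row in cur.fetchall()]

def url_for(conn, file_id):
    """Pick one replica at random and build an HTTP URL for it."""
    servers = locate_replicas(conn, file_id)
    if not servers:
        raise LookupError(f"no replicas recorded for {file_id}")
    return f"http://{random.choice(servers)}/files/{file_id}"

conn = pymysql.connect(host="db.internal", user="dfs",
                       password="...", database="dfs_metadata")
print(url_for(conn, "hashtable-2023-07-01.bin"))

Writes are the mirror image: copy the file to N servers, then insert one row per replica.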
So: can anyone recommend a DFS I've missed that fulfils these requirements? Or, even better, can anyone offer technical ideas to help with the development of our own code?
:)
Take a look at GlusterFS: http://www.gluster.org

It doesn't serve via HTTP directly, but it's easy to point a web server at the filesystem (it has an Apache module to provide direct access without going through FUSE if you want higher performance).
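If you just want a quick way to try HTTP access against a FUSE-mounted volume, a few lines of Python are enough (the mount point and port are placeholders; for real traffic you'd put Apache or nginx in front instead):

# Throwaway HTTP server for a FUSE-mounted volume, for testing only.
# MOUNT_POINT is an assumption -- use wherever your volume is mounted.
import functools
from http.server import HTTPServer, SimpleHTTPRequestHandler

MOUNT_POINT = "/mnt/glusterfs"

handler = functools.partial(SimpleHTTPRequestHandler, directory=MOUNT_POINT)
HTTPServer(("0.0.0.0", 8080), handler).serve_forever()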