I looked a bit into this code and here's a short summary:
- The API supports GET/PUT/DELETE (no range queries).
- It uses 1 master server and N volume servers.
- The master server keeps track of all keys, and all requests go through it. The values are stored on one or more volume servers, chosen via rendezvous hashing (sketched after this list).
- The volume servers are just Nginx + DAV, which store data on disk.
- There's no support for live rebalancing. If you change the backend volumes, you need to take the master server down, run a rebalancing job, and then bring it up again.
- Objects are locked in the master server during modification. This means that while the server is processing a PUT/DELETE, all GET requests for that key will be blocked (see the second sketch after this list).
- There's no handling of errors from the volume servers. E.g. if one volume server fails to store data, it won't roll back the changes made to the other volume servers, nor do anything in the master server.
- The benchmarking tool (thrasher.go) does "10000 write/read/delete" in 16 threads/goroutines. This means that it generates 10k random keys (i.e. no contention at all), and for each key it does a PUT, then a GET, and then a DELETE. Their results show that it's capable of handling 3815.40 keys/sec (multiply by 3 for the request count, i.e. about 11,446 requests/sec).
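For anyone unfamiliar with rendezvous hashing: each node independently ranks the volume servers by hash(key, volume) and picks the top n, so no shared lookup table is needed. A minimal sketch of the idea in Go (my own illustration with made-up addresses, not the project's actual code):

```go
package main

import (
	"crypto/md5"
	"encoding/binary"
	"fmt"
	"sort"
)

// score hashes the key together with a volume name. The volumes
// with the highest scores for a given key are its replicas.
func score(key, volume string) uint64 {
	sum := md5.Sum([]byte(volume + key))
	return binary.BigEndian.Uint64(sum[:8])
}

// pickVolumes returns the n volumes with the highest score for key.
// Every caller computes the same ranking independently, so no
// coordination is needed, and adding or removing a volume only
// remaps the keys that had that volume in their top n.
func pickVolumes(key string, volumes []string, n int) []string {
	ranked := append([]string(nil), volumes...)
	sort.Slice(ranked, func(i, j int) bool {
		return score(key, ranked[i]) > score(key, ranked[j])
	})
	return ranked[:n]
}

func main() {
	volumes := []string{"localhost:3001", "localhost:3002", "localhost:3003"}
	fmt.Println(pickVolumes("wehave", volumes, 2))
}
```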
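And the per-key locking can be pictured roughly like this (again only a sketch of the mechanism, not the actual implementation):

```go
package main

import "sync"

// lockedKeys tracks keys with a PUT/DELETE in flight. While a key
// is in this set, the master won't serve GETs for it.
var (
	mu         sync.Mutex
	lockedKeys = map[string]struct{}{}
)

// tryLock marks key as being modified; it returns false if another
// request already holds the key, in which case the caller can
// reject or retry the incoming request.
func tryLock(key string) bool {
	mu.Lock()
	defer mu.Unlock()
	if _, busy := lockedKeys[key]; busy {
		return false
	}
	lockedKeys[key] = struct{}{}
	return true
}

// unlock releases the key once the modification has completed.
func unlock(key string) {
	mu.Lock()
	defer mu.Unlock()
	delete(lockedKeys, key)
}

func main() {
	if tryLock("wehave") {
		defer unlock("wehave")
		// ... write to volume servers, update the database ...
	}
}
```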
> - The API supports GET/PUT/DELETE (no range queries).
There are range queries on keys:
```
# list keys starting with "we"
curl -v -L localhost:3000/we?list
```
> - There's no handling of errors from the volume servers
If a GET/PUT/DELETE fails, the failure is communicated to the master server, which then doesn't write anything to its own database.
All in all, I believe it does quite a lot for less than 1000 lines already. Knowing its limitations, I'm curious to see how well it handles production usage.
> # list keys starting with "we" curl -v -L localhost:3000/we?list
Ah, I missed this!
> If a GET/PUT/DELETE fails it is communicated to the master server, who doesn't write anything in its own database
Yes, but there's no protocol for cleaning it up. Example: if the second replica fails to store the value, the first replica will still have the data, and there doesn't seem to be a way for it to ever be removed. This can end up "leaking" storage. Also, if you then attempt a rebuild, it seems to me that the file will be revived.
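To make the failure mode concrete, here's a hypothetical sketch of the sequence (my own illustration of the argument, not the project's code; putToVolume and the addresses are made up):

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

// putToVolume is a stand-in for the master's HTTP PUT to one volume.
func putToVolume(volume, key string, value []byte) error {
	req, err := http.NewRequest(http.MethodPut,
		"http://"+volume+"/"+key, bytes.NewReader(value))
	if err != nil {
		return err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	resp.Body.Close()
	if resp.StatusCode >= 300 {
		return fmt.Errorf("volume %s: status %d", volume, resp.StatusCode)
	}
	return nil
}

// writeReplicas PUTs the value to each replica in turn. If the second
// PUT fails, the master records nothing in its database, but the first
// replica keeps the bytes: nothing deletes them, and a rebuild that
// scans the volumes would "revive" the key.
func writeReplicas(key string, value []byte, volumes []string) error {
	for _, v := range volumes {
		if err := putToVolume(v, key, value); err != nil {
			return err // no cleanup of replicas already written
		}
	}
	return nil
}

func main() {
	err := writeReplicas("wehave", []byte("data"),
		[]string{"localhost:3001", "localhost:3002"})
	fmt.Println(err)
}
```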
> All in all I believe it does quite a lot for less than 1000 lines already.
Oh sure. I wasn't trying to be negative. I think it's great that people play around with these ideas! Distributed systems are complicated beasts, and there's only so much you can do in 1000 lines.
Another thing I realized: it buffers all received values in memory in the master server. You might need quite a lot of memory if you intend to handle uploads of big files in parallel.
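If that ever becomes a problem, the usual fix is to stream the request body through to the volume server instead of reading it fully into memory first. A rough sketch of the difference in Go (illustrative only; the volume address is made up and real proxying logic would be more involved):

```go
package main

import (
	"io"
	"net/http"
)

// handlePut forwards the upload without buffering it: r.Body is
// handed straight to the outgoing request, so memory use is bounded
// by copy buffers instead of object size. The buffered variant would
// call io.ReadAll(r.Body) first -- with 16 parallel 1 GiB uploads
// that's ~16 GiB of RAM on the master.
func handlePut(w http.ResponseWriter, r *http.Request) {
	// "localhost:3001" stands in for whichever volume the key maps to.
	req, err := http.NewRequest(http.MethodPut,
		"http://localhost:3001"+r.URL.Path, r.Body)
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	req.ContentLength = r.ContentLength
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadGateway)
		return
	}
	defer resp.Body.Close()
	w.WriteHeader(resp.StatusCode)
	io.Copy(w, resp.Body)
}

func main() {
	http.HandleFunc("/", handlePut)
	http.ListenAndServe(":3000", nil)
}
```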