Storage is Cheap, Don't Mutate - Copy on Write (hashbo.posterous.com)
70 points by hashbo on Aug 12, 2011 | 31 comments



There's a problem here in going from their specific case, where this may make sense, to the more general case they're trying to make, where it doesn't.

For instance, if you're mapping disk space to memory addresses and have large blocks of data, this would mean that every time you needed to update a single byte, you'd either have to duplicate the entire block (possibly several mega- or gigabytes) or give up linear access. Yuck. There are a lot of use cases that this would be absolutely miserable for.
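
To make that cost concrete, here's a toy Python sketch (not real memory-mapped I/O, just the copying behaviour of an immutable block):

    block = bytes(64 * 1024 * 1024)  # a 64 MB immutable block of zeroes

    def set_byte(block, offset, value):
        # bytes are immutable, so a one-byte "update" allocates an
        # entirely new 64 MB copy -- O(n) work per write.
        return block[:offset] + bytes([value]) + block[offset + 1:]

    block2 = set_byte(block, 12345, 0xFF)  # the original block is untouched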

You can get away with this sort of thing much further if you implement it at the file-system level.


There's a great presentation of these types of issues in the book Purely Functional Data Structures:

http://www.cs.cmu.edu/~rwh/theses/okasaki.pdf


This is a great read for any programmer, not only those interested in functional programming.

Also related (HN discussion): http://news.ycombinator.com/item?id=1138979

And Chris Okasaki's blog: http://okasaki.blogspot.com/


Hi.

Yep, agreed. I did, I believe, mention that it's most valuable when you mostly have creates and reads and fewer updates, as in the example of web applications. Of course there are valid in-memory uses too; quite famously in the Java world we have CopyOnWriteArrayList, which has CoW semantics. Naturally one size doesn't fit all, but then it never really does :-)


I'm all for finding ways to avoid locking, and using immutable objects is one way to help, but this:

  Remove the mutation part and now you have a version history,
  allowing conflict resolution.
is a little too "hand-wavy" for my tastes. Conflict resolution is not an easier problem than resource contention.


Fair point, maybe I should have been clearer (and an edit will be forthcoming). My point is that with a version history it's possible to let the content originator know there is a conflict and allow them to choose which version they wish to go with. This can be as simple as displaying an icon to show there has been a conflict and giving the choice to keep or revert, showing the other user's changes, like most wikis. Or much more complex, like in a version control system.

My point is more the negative here: you can't do conflict resolution if you write over the previous state.
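
A minimal sketch of that version-history idea (hypothetical names, just to make it concrete):

    history = [(0, "first draft")]  # append-only list of (version, content)

    def save(content, based_on):
        current = len(history) - 1
        if based_on != current:
            # Someone else saved since this edit began: keep both
            # versions and let the originator choose which wins.
            return ("conflict", history[current][1], content)
        history.append((len(history), content))
        return ("saved", len(history) - 1)

    print(save("my edit", based_on=0))     # ('saved', 1)
    print(save("their edit", based_on=0))  # conflict with 'my edit'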

Sorry if it was hand wavy, was just a quick Friday afternoon blog :-)


Fair enough. I'm on your side about keeping a version history and using immutable data (for some things). It's just that my biggest question when I was reading (as it is any time someone discusses concurrency) is: how does he handle updates to the "same" data? Because, ultimately, you will have state somewhere in your system.


So the simple answer is that right now we use a most-recent-wins strategy, because our data is not critical or highly correlated like relational data. For us it's simple to add UI features at a future date to allow wiki-style history. Obviously with correlated data it's not so easy :-) and that is, as I believe you were implying, where versioning gets tricky! Of course the point about the need to avoid mutation still applies :-)


I agree. Locking is used because the code needs to make sure nothing else modifies that data while it's reading it. If you don't lock, and the data is modified, the program could crash.

Adding version history to it doesn't change this fact.

Example:

You're iterating over bank records, marking them paid. Another thread adds one to the beginning of the list. Thread 1 finishes marking them paid, then moves them all to the 'paid' section... Including the non-paid one.


Actually, copy on write would help you 100% here :-) you just need to take the abstraction a level up. That is, mutations to your list would create a new list; the original thread continues reading the stale copy, reducing consistency issues. You will see that is how (in Java) CopyOnWriteArrayList works, for example. There's never a magic bullet in coding, but CoW can make life a lot easier; certainly that is my experience. So I'm just encouraging folk to think about adding it to their toolkit. Thanks for the comments though, it's interesting to talk about the use cases.
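
A rough sketch of those semantics in Python (illustrative only; the Java class additionally locks on the write side):

    records = ["alice", "bob"]

    def add(records, item):
        # A mutation copies the list and publishes a new reference;
        # the original list is never modified.
        return records + [item]

    snapshot = records               # a reader takes its snapshot
    records = add(records, "carol")  # a writer publishes a new list

    for r in snapshot:  # the reader never sees "carol", but it also
        print(r)        # never sees a half-updated list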


Actually, no it wouldn't.

The second thread makes a copy of the list and adds a new item.

The next time the first thread goes through to mark everything 'paid', it either gets an empty list, or a list that still contains everything it supposedly moved elsewhere earlier, depending on how you handle it.


Right, you need lazy lists (or generators) for this to work naturally. E.g., in Python:

    paid = (markpaid(x) for x in list_records())
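
Fleshing that out a little (markpaid and list_records are stand-ins, since the originals aren't shown):

    def list_records():
        # Stand-in for the real record source.
        return [{"id": 1, "paid": False}, {"id": 2, "paid": False}]

    def markpaid(record):
        # CoW at the record level: build a new dict rather than mutate.
        return {**record, "paid": True}

    # list_records() runs up front, fixing the snapshot; markpaid runs
    # lazily, one record at a time, as the generator is consumed.
    paid = (markpaid(x) for x in list_records())
    print(list(paid))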


I followed the link at the bottom, and it made me pretty angry. Hashbo is a very neat example of the kind of bullshit I hate in upcoming software products and services.

The only information available before asking for signup is this:

> Hashbo is a simple innovative, new way to communicate, collaborate and curate.

What the hell is that supposed to mean? Is it a blogging system? Microblogging? A social network? IM? Is it some combination, or a new thing?

People come to a site like this with the question "What is this and why should I care?". At the end of that (quite nice) animation, I have no idea, and that doesn't make me intrigued and curious, it makes me annoyed and frustrated at not getting a straight answer.

Nobody cares about your buzzwords, they care about what your site is, and what it can enable them to do that they couldn't do before.


Hey Robert

It's not attempting to create intrigue; I just really haven't figured out the right copy yet :) Take a look at some more of my blog articles if you like; they explain at least the general idea of what we're doing. We hope to get everyone on the beta soon.

The problem for me is that it's actually difficult to describe what it does; it's better experienced, and then you can make your own evaluation as to whether it suits you or not. Much like when the Internet started, it was easier to show than explain. So while we're just in the early stages of beta I didn't want to prejudice people by giving partial information.

So I’ll see if I can get some feedback from early beta testers to find out what words they use to describe it - that may help me get a more unbiased view.

Glad you liked the animation btw. We're (mostly) a one-man band in reality, and I'm a techy at that, so any compliment towards the design is gratefully accepted.

All the best and thanks for the feedback, it’s really helpful to know what people think, Neil.


It's good to know that it's not deliberate; I can totally understand that. The thing is, the site's very polished, and it looks finished. There's no real indication on the site that the information provided is less than the amount intended to be available, except for the link to the blog, which has three problems:

1. I don't want to click another link just to find out the answer to the simplest question I have - 'What is Hashbo?'. You have to remember that people don't care, and won't care unless you give them a reason to care. Every additional click, every extra second, loses people.

2. (minor point) The link opens in a popup, which is blocked by my popup blocker. It really only makes sense to open in a new tab/window if the user is very likely to want to keep the first page open, and there's basically nothing else on that first page, so no-one will want to keep it open.

3. When I get to the blog, there's still no insight into what Hashbo is, just the first blog post, which at time of writing is a post about NoSQL. What percentage of prospective users of your site have ever even heard of SQL? (Non-rhetorical question, I really don't know because I still don't know what the site actually is)

Because of this conversation, I've already spent far, far more time on Hashbo than I normally would give to a random new site, and I still don't know what it is, and I still don't care. And I don't think I'm particularly unusual in my attitudes.

I'm not a 'hater', really I'm not, I'm just pointing out that you have a pretty serious problem here. What's your elevator pitch?

http://en.wikipedia.org/wiki/Elevator_pitch


Yeah, don't mind the haters. I haven't read the copy from your page - just from the op, it does look lame. But that's fine. Work on it, make it specific, and pitch concrete benefits.

If your product involves education (a new model, etc.), you should be pitching to everyone, measuring the mean time to understanding, and relentlessly honing it downwards.


Really appreciate the combination of honest and helpful feedback. I will certainly keep doing exactly that :-)


Ironic that SQL is mentioned here - I think the idea is close to multi-version concurrency control (MVCC), which "real" databases have been doing for years now.

http://en.wikipedia.org/wiki/Multiversion_concurrency_contro...


Absolutely, but that is internal to the database and cannot be cached locally. With copy-on-write semantics at the application level, you can cache your data locally and not have to worry about distributed cache invalidation, etc.


How do your distributed nodes figure out which is the most recent version of a particular object?


Hiya, by an index. The index, of course, isn't cached. This model applies equally well to RDBMS and non-RDBMS systems. You can index which is the most recent version without losing the benefit of caching the data (and potentially its de-serialized form within the application). Not suggesting one size fits all, just highlighting a useful strategy.
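
A rough sketch of the shape of that (hypothetical names, just to make it concrete):

    cache = {}  # (id, version) -> data; immutable, safe to cache anywhere
    index = {}  # id -> latest version number; small, never cached

    def write(obj_id, data):
        version = index.get(obj_id, -1) + 1
        cache[(obj_id, version)] = data  # a brand-new key: nothing to invalidate
        index[obj_id] = version

    def read_latest(obj_id):
        return cache[(obj_id, index[obj_id])]

    write("doc", "v0 text")
    write("doc", "v1 text")
    print(read_latest("doc"))  # "v1 text"; ("doc", 0) stays cached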


And how do you keep this consistent across different objects? I.e., object A and object B are updated at the same time, going from generation 1 to generation 2. I am on a remote machine reading from the cache. How can I ensure I don't get a view where I see object A at generation 1 and object B at generation 2?

I.e., the basic point I am making here is that MVCC does a lot of relatively complicated stuff to handle this for you. I don't disagree that you are gaining a lot of power at your remote nodes to do things that would increase performance, but for 99% of applications I think it's a huge amount of work for a small amount of benefit.


> And how do you keep this consistent across different objects? I.e., object A and object B are updated at the same time, going from generation 1 to generation 2. I am on a remote machine reading from the cache. How can I ensure I don't get a view where I see object A at generation 1 and object B at generation 2?

You use caches when it doesn't matter, which is much of the time. If it really, really matters, then you use the ACID semantics of the DB to ensure that.


Hi, so I believe you're asking how CoW solves consistency issues. This seems to be a recurring theme. I suspect, and correct me if I'm wrong, that this comes from the fact that we're using a non-RDBMS database, or perhaps because our caching is by nature distributed. In either case, the questions being asked are applicable to all distributed caching: if you need an absolute guarantee of consistency (nuclear power plant, large financial transactions) then I would personally avoid caches and reconcile these relationships in the database. I might add this is not our use case :-)

My experience has been in both high-volume financial systems and consumer startups, and I personally have found that sometimes ACID behaviour is near essential, or at least near essential to keep the implementor sane :-) And other times it's just a hindrance. Nothing in my article implied that I'm out to solve all of these (interesting cases, I hope); rather that a particular pattern, which can be implemented in different ways, has a set of benefits. For the record, Neo4J supports ACID transactions, in case I implied otherwise.

What's interesting to me is that various of the comments I have received seem to extrapolate my article into, I believe, an attempt to remove or somehow change the workings of RDBMS systems. Which I can honestly say has me at a loss; goodness knows what I said to trigger that view. No offence to the commenters, I am just uncertain where this came from. What I was discussing originally was a design pattern which I have had recent experience of, in my case within the context of persistence.

Again, forgive me if I misunderstood, but I believe you are talking about the internals of your database when you're talking MVCC. This wasn't the problem we were trying to solve. In fact Neo4J is a very high-speed transactional database that we're more than happy with.

In fact we're not trying to solve a problem: we needed versioned data, so we implemented that, and now we're getting lots of benefits, including zero cache invalidations, which means we have a safe write-through cache, which of course can be caching application-style representations of the data (i.e. objects). Plus the various other benefits I mentioned.

I can only state my practical experience here, which is that we do get a massive benefit from being able to arbitrarily cache data without fear of stale data.

But hey, horses for courses. Enough from me for tonight - a huge pile of code awaits me and a beta needs to get underway. If you'd like to get back to me, please drop a comment on the original article and I will be pinged. Always happy to chat.

Thanks for the discussion!

Neil


This is what bugs me about state machines (like https://github.com/pluginaweek/state_machine).

Adding a history or audit always seems like a giant hack.


Actually, a history on a state machine gives you undo functionality, a fairly common pattern, no? You have a stack of states instead of a single state. The trouble, though, is inter-related states, isn't it :-) things can get messy if you don't have atomic states.
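
In sketch form:

    history = ["draft"]  # a stack of states; the top is the current state

    def transition(new_state):
        history.append(new_state)  # push rather than overwrite

    def undo():
        if len(history) > 1:
            history.pop()
        return history[-1]

    transition("review")
    transition("published")
    print(undo())  # back to "review"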

Actually someone else just pointed me in the direction of Event Sourcing - every pattern has a name somewhere - http://martinfowler.com/eaaDev/EventSourcing.html


There is a lot of complexity in that Event Sourcing document. The pattern can be simpler: https://github.com/iriscouch/follow/blob/master/README.md


One of the most versatile ways to get scalability and partition tolerance while retaining a useful form of consistency is eventual consistency.

And having eventual consistency with MVCC implemented with vector clocks really rocks.
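
For the unfamiliar, a bare-bones sketch of the vector-clock comparison underneath that (illustrative only):

    # A vector clock maps node -> update counter. Version a descends
    # from version b iff a's counter >= b's for every node; if neither
    # descends from the other, the writes conflict and both are kept.
    def descends(a, b):
        return all(a.get(node, 0) >= n for node, n in b.items())

    v_old = {"node1": 1, "node2": 1}
    v_new = {"node1": 2, "node2": 1}
    v_fork = {"node1": 1, "node2": 2}

    print(descends(v_new, v_old))   # True: v_new supersedes v_old
    print(descends(v_new, v_fork), descends(v_fork, v_new))  # False False: conflict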

Riak takes Brewer's CAP theorem seriously and even lets you configure the values per bucket, so that you can get more of C, more of A, or more of P.

You don't need to store all the history, just the contentious history. Riak pretty much rocks in the sense that you can add nodes to it and it just works, or take them away and it just works. Just like Dynamo. (Not like Cassandra.)


Just thinking out loud: does doing copy on write result in less filesystem fragmentation (assuming the data is not held in RAM)?


In theory yes, as it avoids holes in the databases or files used for storage. It’s also the semantics of ZFS - http://en.wikipedia.org/wiki/ZFS


... and ZFS copy-on-write has been known to cause severe performance degradation for databases. Not to mention that too many SAs keep the default ZFS block size, which is far too large for OLTP databases.



