Hacker News new | past | comments | ask | show | jobs | submit login
OrmHate (martinfowler.com)
100 points by jgrodziski on June 26, 2012 | hide | past | favorite | 134 comments



I think a lot of the frustration with relational databases in general comes from a misunderstanding.

People conflate "relational" with "SQL", because of the historical accident that SQL is the most popular way to query relational data. Then when SQL isn't a good fit for their problem, they think relational is not a good fit for their problem, which is almost certainly not true.

The original motivation for relational databases is to have path-independent access to data. This is a really powerful idea.

What I think we really need are better APIs around relational data access that embrace real relational semantics, instead of dumping them in favor of an object graph at the first opportunity (which throws away the path-independent access).


This is correct!

A relational DBMS is simply predicate logic applied to set theory[0]. The DBMS part offers logical/physical data independence, which is amazing, and no one has actually implemented that (NoSQL, neither!).

SQL has never been particularly awesome way to interact with relational database management systems; SQL isn't even strictly relational: SQL tables are bags (multisets), not sets[1].

The Rel project (http://dbappbuilder.sourceforge.net/) is probably the closest thing to a "real" RDBMS. It also implements CJ Date & Hugh Darwin's "Tutorial D" language, a truly relational language used to work with a RDBMS.

What I find most ironic with the whole "Relational / Object" mismatch is nothing in Relational Theory prohibits objects. As a matter of fact, a RELATION[2] consists of NAMED ATTRIBUTES and DOMAINS.

Domains are basically DATATYPES and OPERATIONS that can be performed on them[3]. Although the notion of CLASSES (in OO land) wouldn't map 100%, much of what you're interested in doing with Objects can and should be directly represented in a truly relational DBMS[4].

In a traditional MVC style framework the RDBMS would consist of the entire Model layer; thus the "Object/Relational Mismatch" disappears!

-- Matt

Refs:

0. http://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf

1. A bag can have duplicate rows SQL does not require that a table have keys NULL introduces three-valued logic

2. http://en.wikipedia.org/wiki/Relation_(database)

3. http://www.dbdebunk.com/page/page/627014.htm

4. http://www.dbdebunk.com/page/page/1717959.htm


People conflate "relational" with "SQL", because of the historical accident that SQL is the most popular way to query relational data. Then when SQL isn't a good fit for their problem, they think relational is not a good fit for their problem, which is almost certainly not true.

For most practical purposes, SQL is the only way to query relational data. In the absence of alternatives, it's natural to conflate the notions of SQL databases and relational databases. I agree that SQL is a mess, but I don't think an approach based on pure relational primitives would make it better; in fact, I think SQL is a mess specifically because of lack of expressiveness in the pure relational primitives. NULL, ORDER BY, LIMIT/OFFSET, opaque keys, windowing functions, transitive closures, etc fit poorly into the relational model.

The original motivation for relational databases is to have path-independent access to data. This is a really powerful idea.

I agree with both assertions, but in many applications, path-based access makes the total majority of queries, and SQL or relational model provides little means to distinguish them from other equi-joins or arbitrary join conditions. In my opinion, it would be better to start with a navigational path-based database model, and extend it to allow constructing new paths dynamically.


I'm not sure what you mean by fit poorly in the relational model.

ORDER BY, LIMIT/OFFSET have to do with presentation of data. So, although it's highly desired that a language based on the relational model supports them, they have nothing to do with the model per se.

As for opaque keys, I thought the old debate about surrogate/artificial and natural keys was over years ago. The relational model has nothing to say about it, that would be like trying to make a model understand if facts are true in the real world or not. A key is a key.

Moreover, last time I've checked transitive closure operators were defined for the relation model (people should not stop ad Codd papers, Date and Darwen wrote a lot of books, e.g. The Third Manifesto, extending on the original ideas).

NULL is a completely different beast and this is the only real thing one can consider problematic.

In some cases NULL just means the predicate for the relation is different because values for an attribute don't apply, so this is not really a problem, in some other cases we simply don't know the value when we are collecting information, and this is indeed a problem.

Date follows Wittgestein that said we should remain silent about things we can't speak about i.e. we shouldn't collect incomplete information, Codd came up with I-marks, A-marks and n-valued logic, SQL collapsed everything into NULLs and 3-valued logic.


ORDER BY, LIMIT/OFFSET have to do with presentation of data. So, although it's highly desired that a language based on the relational model supports them, they have nothing to do with the model per se.

By fit poorly I mean that you cannot express most real-world business inquiries using pure relational primitives without ORDER BY or LIMIT/OFFSET and that's why I think relational algebra is not usable per se. SQL fixed this problem by adding many non-relational constructs, but but without any sense of consistency or direction.

I also strongly disagree that ORDER BY and LIMIT/OFFSET are presentational operations since I often use them not only for wrapping the outer SELECT, but also within correlated subqueries.

To show some proof, here are a few queries which are hard or impossible to express in relational algebra:

1. Show the blog post with the largest number of comments [^].

2. Show the tags associated with the blog post with the largest number of comments.

3. For each blog category, show the 3 top blog posts by the number of comments.

[^] If more than one exist, pick the latest.

NULL is a completely different beast and this is the only real thing one can consider problematic.

I think NULL is only hard because relational model is a wrong way to look at the data. If you see an entity attribute not as a column of a tuple, but as a function from an entity set to some value domain, the fact that the attribute is nullable just means that the function is not total. There is a well developed mathematical apparatus for partial functions, in which NULL becomes a bottom value injected to the value domain, and tri-valued logic is simply a monotonic extension of regular Boolean operators.


As long as you can represent your query with predicate logic and not violate set theory (or other relational tenants), it's perfectly "relational".

It's important to note that folks have figured out how to extend Codd's original relational algebra with stuff like aggregation. As the OP mentioned, "Relational" doesn't mean "What Codd wrote in a single paper back in 1969". It has continued to evolve, both with Codd's direct involvement and from successors like Date, Darwin, and Pascal. Codd wasn't an all-seeing, all-knowing data-management demi-god - but his general theory of relational database management and the core tenants are still super awesome. Extensions to it, as long as they don't violate the RM, are just as valid as Codd's original work.

That means #1 is totally relational (as an aside, you don't need ORDER BY or LIMIT for it either). Indeed, relational algebra supports aggregations (http://en.wikipedia.org/wiki/Relational_algebra#Aggregation).

It's important to separate the query language from core RDBMS theory, as the two are orthogonal. Codd suggested Relational Algebra as a reference language but never intended for it to be the only way to communicate with a RDBMS.

See CJ Date's excellent discussion on ORDER(BY): http://books.google.com/books?id=WuZGD5tBfMwC&lpg=PA163&...

EDIT: It even seems folks have figured out how to make "LIMIT" relational operator: http://stackoverflow.com/questions/10229535/relational-algeb...

I have not read the paper, so I cannot discuss the validity of the approach.


That means #1 is totally relational (as an aside, you don't need ORDER BY or LIMIT for it either).

I would love to see it. Yes, you can do it in SQL, but I'd say it's not easy at all without ORDER BY and LIMIT or windowing functions and I don't know if you can do it in Tutorial D. For the reference, #1 is:

Show the blog post with the largest number of comments. If more than one exist, pick the latest.

The schema is:

    post(id integer, created timestamp)
    comment(id integer, post_id integer)
See CJ Date's excellent discussion on ORDER(BY)

I read it and the book as well, but I wouldn't call it excellent. What I read there is a reluctant admission of failure to incorporate an important operation to his query model. I see no attempt to analyze why it doesn't work or adapt the query model to make ORDER a regular operation.


It's been a loooong time since I wrote any longhand Relational Algebra, so I'll cheat and use SQL. All of this can be done pretty easily with relational algebra primitives.

And we can make up any operator we want as long as it uses a primitive, so (to save typing I created a view, but you could copy-pasta). I added a "Title" to post because otherwise you could skip it entirely and just use the comment table twice, but where's the fun in that?

<pre> CREATE VIEW counts AS ( select count() AS comment_count, post_id from comment group by post_id )

SELECT id, title, comment_count FROM post p INNER JOIN counts AS c1 ON p.id = c1.post_id WHERE NOT EXISTS( SELECT FROM counts AS c2 WHERE c2.post_id != p.id AND c2.comment_count > c1.comment_count ) </pre>


Thank you. I accept your answer with the note that you ignored my request to return only the latest post when there are more then one posts with the same number of comments, but it's not hard to adapt you query to satisfy this requirement.

However you can't do the same trick if I ask you to return the top 3 posts with the largest number of comments; or, to make the query more realistic, ask you to return the percentage of comments generated by the top 10% popular (by the number of comments) posts. Which is my point: pure relational algebra as advocated by Date et al in Tutorial D is less expressive than SQL, which probably explains the cold reception it got from the industry.

Edit: now that I think about it, you could do it without ORDER BY/LIMIT, but still it's harder than necessary.


(Someone figured out LIMIT for relational algebra, and ORDER can be coerced to relational if you define a bunch of edge cases. I don't think it's fair to suggest that a RDBMS can't/shouldn't/wouldn't do LIMIT or ORDER).

I agree, relational algebra sucks to code in. That wasn't the point of it, of course. Codd's goal was to prove as long as your language is reducible to relational algebra, you're relational. And with that, you get all the side benefits.

I'm not so sure Date et al are advocating coding purely in RA. At least, I've never heard them say that. Their "Tutorial D", as the name implies, is for educational and specification purposes more than a useable implementation. Again, if you can make your awesome language reducible to "Tutorial D", you get all the nice benefits of the True RDBMS, which is pretty awesome.


I was about to prepare the query without ties and yes it is a lot harder than necessary and, more important IMO, much less readable/intuitive for people than have to maintain the code and that is exactly why different SQL dialects introduced ORDER BY/LIMIT/OFFSET/TOP/RANK etc.

But let me be a little bit picky about this and Date's view on ORDER BY.

The relational model deals just with the algebra/calculus without getting into the details of a language based on the model.

In the book pointed out by Matt, Date explicitly states he's not saying ORDER BY is not useful, just that it doesn't return a relation and thus it's not included in the algebra.

However a language based on the relational model, like Tutorial D, can include such an operator. To be double sure, I checked on The Third Manifesto V2 and a LOAD operator with an ORDER specification is defined in the context of the "special cased" support for arrays.

You can see a couple of paragraphs if you go here http://books.google.it/books?id=X85QAAAAMAAJ&dq=editions... and search for ORDER and LOAD (page 118).

TBH I'm not even sure it's a good idea to introduce arrays for ordering, but, anyway, back to the quota queries: we agree that regular aggregation operators are enough although the query becomes very complex.

In the same book referenced by Matt there's an exercise (7.14) showing how to do a quota query and you can see that even in Tutorial D it's complex.

However, in the solution, Date & Darwen also propose something else: to add a more specific RANK operator which is really just syntactic sugar to simplify this kind of queries. With the important difference, compared to ORDER, that it still returns a relation and not an ordered sequence of tuples.

Unfortunately the whole solution to the excercise is not available through Google Books preview, and the operator is formally defined elsewhere, but you can see how such RANK operator would work here http://books.google.it/books?id=WuZGD5tBfMwC&lpg=PA163&#...

Just a final comment about Date being reluctant to analyze the matter, unfortunately his work is disseminated in a lot of books (and he changed his position on quite several matters throughout the years).

I love the "SQL and Relational Theory" one but, having red all of his books, I would be hesitant to suggest it unless one already knows Date. I think the latest edition (8th) of "An Introduction to Database Systems" is still the best book to start with.


I was about to prepare the query without ties and yes it is a lot harder than necessary and, more important IMO, much less readable/intuitive for people than have to maintain the code and that is exactly why different SQL dialects introduced ORDER BY/LIMIT/OFFSET/TOP/RANK etc.

Very good point.

In the book pointed out by Matt, Date explicitly states he's not saying ORDER BY is not useful, just that it doesn't return a relation and thus it's not included in the algebra.

My biggest gripe about ORDER BY, LIMIT and relational model is the fact that while Date and others made some attempts to express these operations in terms of relational algebra, they never (AFAIK) tried to do the opposite: alter the relational query model to naturally support them. It's not hard: just replace sets with sequences or arrays. It will gives you natural ORDER and SLICE operators as well as new aggregates FIRST, LAST, NTH. It solves duplicates without having to introduce bags, gives windowing functions for free and probably better represents how modern RDBMS interpret a query. Another hint why sequences may work better than sets is the fact that regular set operations such as INTERSECT and UNION (as opposed to UNION ALL, which becomes concatenation) are so rarely used in real-world queries.

I'm not even arguing that this is a good approach, but I think it deserves some discussion and it appears they never even thought of a possibility of changing the model treating it not as an instrument, but as a sacred scripture.


First I don't know why order by, limit, offset, or windowing functions, can't be said to return a relation if we define relations in a way which is sufficiently useful to include these operations. In other words, they are used in ways which returns sets of tuples (or sets of entities if you want to see it that way), based on specific selection criteria.

I would thus agree that to the extent that these are not part of the relational model this says more about the incompleteness of that model than it does about the operations themselves.


Relations being defined what they are is not accidental; sets are not bags and there are a lot of very good things that come out of a relation having a very well defined, uh, definition.

It's not a matter of "usefulness" but of "well-defined" that allows us to derive a whole lot of other interesting things.

Folks actually have defined ORDER BY, LIMIT, OFFSET, etc. in terms of the RM; it's just that the typical ORDER BY doesn't return a relation because of ordering (sets are unordered by definition) and so there was a lot of gymnastics they had to do in order to keep the set theory intact.

Sure, arbitrarily reordering is not a hard concept (or even implementation) but to make sure you cover all the bases requires a significant amount of work. A RDBMS is a complicated thing and you don't want to just add something to it without doing proper due diligence.

One could argue that NULLs are more "useful" (I disagree) but the addition of NULLs (a deceptively simple concept) has vastly overcomplicated SQL and lead to a number of inconsistencies in the spec.


Sets may be unordered by definition but that doesn't mean you can't define something interesting as an ordered set.

Consider the Pythagorean attempt to prove that all numbers were rational by trying to prove that the square root of two was rational. That they were able to prove that it was not rational meant that we ended up with a new category of numbers. Similarly once you get into the square root of -1 you get into yet another category of numbers designed to address that problem.

Our numeric model isn't complete with just rational numbers, or just rational and irrational numbers. Today we have to add imaginary and complex numbers as well. Why shouldn't we be expanding relational math in the same way?


I disagree with you on NULL. The relational model is a very good way of looking at data in many contexts.

The problem actually is that NULL has several distinct meanings and there is often no real way to differentiate between them other than to disallow all meanings but one, and that is often difficult.

For example you talk about nullable columns, and this is one aspect of NULLs. NULLs may mean missing data. They also are often used to mean the data doesn't apply. This already runs you into ambiguity problems because you can't type in a query that easily distinguishes whether the attribute doesn't apply or is merely unknown. Note that Oracle treats NULL strings as equivalent to empty strings, while PostgreSQL tries to differentiate strings by allowing empty strings which are distinct thus allowing a not-applicable value for character string fields.

Additionally, you would expect the || operator to handle unknown data differently than it does data which doesn't apply. string || not_applicable should equal string. string || unknown should equal unknown.

Now that's only the beginning of the problem. There is a third use of NULLs too, namely as a placeholder for missing rows in outer joins......

If we had three different NULL values some of this problem would be more manageable, but the problem is that as soon as you allow nulls in columns, you can't always easily tell from a query on a well-normalized database what that NULL means without a lot of additional introspection of the representative of the entity set and then you are basically guessing.


As soon as you have arbitrary one to many relationships you have relational data so there are a lot of add hock systems out there. As to larger scale ones I often use linq to do in memory query's of objects with relationships.

More limited examples often fit the hierarchical database model such as graphics API's that have can handle object -> Vertex relationship, and or recursive relationships between objects. But, games frequently need 1:M or M:M relationships between things like factions, so there are plenty of addhock API's out there that fit the relational data model.


People mean different things when they say relational model, so to clarify, by relational model I mean a model in which data is represented as sets of N-tuples of fixed structure, and queries are constructed using set-based operations such as filtering and Cartesian product.

Also, when I say path-based access, I mean access that follows predefined links between entities (in SQL, provided by FOREIGN KEY constraints). Those are well supported by object model and ORMs, as opposed to arbitrary joins, which aren't.

You don't need a relational model to represent one-to-many relations, in fact, an object model such as provided by many ORMs could represent them perfectly. In your first example, `figure.vertices` could be a list of vertices associated with a figure, and `vertex.figure` is the figure which owns the vertex. Similarly mutual object or list references could represent other singular or plural relationships. Though I agree it requires referential loops and cannot be expressed well in a pure hierarchical model such as in many novel no-sql databases.


Can you suggest any good references for alternatives to SQL or a more general discussion of path independent data access? My background is such that practically "relational" and "uses SQL" are wholly conflated. I would be very interested to get a broader view. Does datalog fit somewhere in the discussion?



In some ways, I agree with you. SQL is part of the problem. But, correctly structuring data is generally very hard, time consuming, has diminishing returns, and adds maintenance issues that never go away.

I've seen teams spend months up front trying to craft a perfect schema, only to see it demolished when a non-obvious edge case appears. I've also seen teams become over burdened with schema and data maintenance, that it takes up 20-30% of their time. That's not to say that you're not right, but sometimes ORMs are just the more pragmatic answer to get a team through the first 4-5 iterations (or even longer).

And, sometimes path-independence is important. But, often, the database just ends up being coupled with the application anyway. At least that's my experience.


> There is a lot of truth to these charges, but such charges miss a vital piece of context. The object/relational mapping problem is hard.

I suspect that context is precisely what underlies the common critiques of ORMs. Those people who best understand the inherent object-relational impedance mismatch tend to be the very people who conclude that the effort isn't worth it.

In my admittedly anecdotal experience, I have found that ORMs are the most useful for the most trivial queries. For anything complicated, I find it easier to drop into SQL and write the query directly than to compose an equivalent object using the ORM - and that becomes increasingly so as the query gets more complicated.


> In my admittedly anecdotal experience, I have found that ORMs are the most useful for the most trivial queries.

True, but 'trivial queries' generally cover the majority of use cases within your average CRUD application, and that's exactly why ORMs are useful -- you don't end up re-typing 'select bar from fu where id = 1' or 'select * from fu, bar where fu.bar_id = bar.id' etc., etc.

Beyond that, most of the common joins are trivial as well, and can be abstracted away, and a lot of those queries end up being optimized according to known SQL patterns for the common use cases, so you can avoid n+1 and so forth. For these work-a-day queries, ORMs work just fine.

That said, there are tons of situations where ORMs make no sense at all, like complex migrations, large OLAP-style analytic queries, etc. For these, it's important to keep your SQL chops up and know the pitfalls. So, I use ORMs where they work well, and drop into SQL all the time where they don't.

The trick, as with most things, is finding the right balance: don't try to abstract away complex, singular use cases, but also don't duplicate trivial code all over your code base where the ORM has done the work for you.


But ORM's don't actually help for the trivial queries.

I generally find myself writing functions to automate the trivial ones: get_single_field_by_primary_key(), get_row_by_primary_key(), get_values_by_field(), etc. There are maybe 4 or 5 of these. An entire ORM is total overkill.

As for the common joins, in my experience they rarely are that trivial. Sometimes it's a left join, sometimes an inner join. Sometimes there's a limit or not. Sometimes there's an aggregate function.

If you find yourself writing the same query structure 20 times, then create a function for it. But I've never found that to be the case in projects I work on.

I just personally find that ORM's are total overkill in the situations where they can be usefully applied. So the disdain for them kind of makes sense.


> I generally find myself writing functions to automate the trivial ones: get_single_field_by_primary_key(), get_row_by_primary_key(), get_values_by_field(), etc. There are maybe 4 or 5 of these. An entire ORM is total overkill.

Making a homebrew data layer isn't avoiding an ORM, it's building one.

> If you find yourself writing the same query structure 20 times, then create a function for it.

What do you think ORM's do? That is what they do, they provide a standard API for all the common queries.


No, ORM's provide object-relational mapping, and all sorts of functions for building up queries as part of that. They don't provide an API for common queries at all. I don't even know what "all the common queries" would be... beyond get-a-value-for-pkey, every app is totally different in the kinds of queries it needs.

What I described isn't a homebrew data layer or building an ORM, it's just a few helpful shortcut functions for queries.


I hate to break it to you, but there are about as many definitions of "ORM" as there are of things like "functional programming".

What you have described would be considered, by many ORM critics, to be ORM.


Aw, MAAAAN... Oh well, OK...


> every app is totally different in the kinds of queries it needs

There is more to SQL than queries. Every app need insert, updat, and delete, and a basic select 1 and select many matching a predicate. Those can all be automated; that's what ORM's do, they automate the repetitive simple stuff.

More complex queries that strain the ORM's abilities can be done in SQL in a view, and then mapped in by querying the view with the ORM. Or you can just drop into raw SQL when necessary. Every app is not different, they are largely all very similar.


> In my admittedly anecdotal experience, I have found that ORMs are the most useful for the most trivial queries. For anything complicated, I find it easier to drop into SQL and write the query directly

As a recovering DBA, I want to jump out of my seat and say "Yes, exactly!"

In many situations, "simple" queries can easily cover 90% of what you actually need, and simple queries are often an order of magnitude less expensive than even modestly complicated ones.

If your devs are in the habit of writing lots of SQL, they'll often decide to do things like write one complex query to avoid the need for a few trivial queries. From a developer point of view, that makes sense, but the result is that you end up chewing lots of database resources. Databases tend to become scarce resources quickly, and it's usually very expensive to scale up.

In my previous experience, the ORM helped reduce needlessly complex queries, and freed up resources on the DBA side of the house to help optimize the complex stuff that needed to be there.

Granted, I was a DBA a long time ago, and resources were far less plentiful back in the good old days. But I think this is still relevant whenever you get into "big" databases.


> If your devs are in the habit of writing lots of SQL,

> they'll often decide to do things like write one complex

> query to avoid the need for a few trivial queries.

As someone who single-handedly maintains a large codebase and a large database for a large website, I can assure you you've hit the nail on the head but completely missed the point.

Unless you're writing a hello world style to-do list, trivial queries tend to be the vast minority of the application logic needed. Most queries need some degree of tuning, whether that be to pull out specific rows, associate related data, summarise, or sort. Do this right, and you can achieve in a single query what would take multiple, dozens, perhaps hundreds of round trips to the database if written with "trivial" queries.

And let's not forget, it's particularly rare for the "complex" query to be more complex than the application code needed to implement the equivalent task. SQL isn't always beautiful, but it can often express ideas more succinctly than most general purpose languages. One GROUP BY can save you from writing big loops. One INNER JOIN or subquery can save you from subsequent database calls. One HAVING clause can save you pages of twisted logic.

A "complex" query generally avoids the need for a few trivial queries that are comparatively inefficient, both in terms of resources and latency. The goal is to allow the database to do the hard work of extracting every byte you need and nothing else. Databases are good at this shit, and even with a good ORM you're often forced to compromise or maintain two worlds within one app.

---------

That said, I acknowledge that there are multiple legitimate points of view on this debate. I tend to find that the SQL vs ORM debate tends to fall depending on the sort of code that's being written, and the sort of logic that's needed. Desktop apps tend to suit ORMs more readily than web apps. Smaller and self-contained apps tend to survive ORMs more gracefully than broad-scoped apps.

Nothing can excuse stored procedures though. Useless garbage.


I agree with you too, which may sound strange, but I think the difference is in the teams. I think in your case (and many others), my position doesn't make sense. Why? Because you're competent and grok programming and databases, which is an amazingly rare. Also, you're working on a really small team.

In a bigger or more distributed team, you need to have strong standards and governance when many people are sharing one resource. Otherwise, you run into a "tragedy of the commons" scenario where the whole business suffers.


> In my admittedly anecdotal experience, I have found that ORMs are the most useful for the most trivial queries.

Bingo. In other words, non-relational queries work well and relational queries don't. When I'm joining multiple tables and looking for specific rows/objects that match multiple restrictions, ORM turns into a migrane of epic proportions. SQL was specifically designed to solve that exact problem. Abstracting away from SQL will simply increase the pain factor.


> When I'm joining multiple tables and looking for specific rows/objects that match multiple restrictions, ORM turns into a migrane of epic proportions.

Perhaps you're doing it wrong. Put the complex query in a proc or view, and use the ORM to map the result set into memory. Using an ORM doesn't mean not using SQL when it works better.


>> When I'm joining multiple tables and looking for specific

>> rows/objects that match multiple restrictions, ORM turns into a

>> migrane of epic proportions.

>

> Perhaps you're doing it wrong. Put the complex query in a proc or

> view, and use the ORM to map the result set into memory. Using an ORM

> doesn't mean not using SQL when it works better.

that's what he sad: use sql when it works better. if you put it in a proc, a view or your application makes no difference.


That's the view I've come around to. The more I learn, the more I see that the ORM problem isn't just hard, it's effectively unsolvable. They don't map. There is no such map. It turns out to be a general principle that just because something is basically impossible doesn't mean that you can't write code that sort of kind of does it mostly most of the time, but the usually when that happens you end up with an enormous endless tar pit of a mess that even after massive resource expenditure still has lots of quirky problems and doesn't really work. It's better to build your system on foundation that doesn't involve solving an impossible problem.

Another example: "Multitasking" in Mac OS prior to OSX. No amount of hackery can turn a cooperative multitasking system into a preemptive one... but you sure can make a big mess trying. There's a lot of other examples if you look around. Look for a big, endlessly-growing mass of code that never seems to stabilize no matter how many resources are poured into it, and you'll probably find it's trying to do something impossible deep down in its core.


And yet many ORMs such as ActiveRecord let you do this trivially, maintaining as much of it's functionality as you want, such as mapping column types to appropriate pl types. In my experience, even if you had a policy of all queries being written explicitly in SQL by all developers and encapsulating these into your own data access layer, you would still have quite a bit to gain from the well-tested community boilerplate in one of the available ORM packages.


This. Use ORMs for what they do well, and move into straight SQL when it makes sense.


> Those people who best understand the inherent object-relational impedance mismatch tend to be the very people who conclude that the effort isn't worth it.

You're showing a severe bias.

> I have found that ORMs are the most useful for the most trivial queries.

That's really what ORMs are for. Three joins, max. Any more than that and you really can't trust the ORM. It's a computer, after all, not a DBA.


I strongly disagree with this. I'm consulting for a project where I cannot discuss technical decisions about database modelling; sometimes we need to query 20/30 tables or more in a tx, with very complicated mappings 5+ levels deep. If I was to write the sql for this, I'd take 10x the time and it would be wrong anyway.

Of course, this is one end of the spectrum where the ORM just enable me to shoot in the foot faster. I've also worked in the other side of the spectrum where I could use an ORM and generate exactly the sql query I wanted.

At the very least, ORMs can save you substantial time because of typesafe queries (if they are typesafe) and expressiveness (specially on join syntax).

My teacher used to say that you can solve half of your problems with an abstraction layer; I think for lots of cases an ORM does just this.


"If I was to write the sql for this, I'd take 10x the time and it would be wrong anyway."

SQL is the problem here. It's so hard to write reusable, composable fragments of SQL that we've all just internalized the idea that we have to write the query you're referring to from scratch every time, which is a bizarre and annoying intrusion of 1970s software engineering into our 2012 world. You almost certainly have patterns of access you are using, and if you could define those in a more sensible query language, and easily compose them together later, basically DRY out your queries, you might not find this such a daunting prospect, and might not need to use an ORM to paper over this deficiency.

It's SQL that's the real, core problem here, and I mean the language qua query language, not the relational model, not object orientation, not even the impossible mapping thereto, it's that SQL is a terrible, terrible language. We should stop elevating it to the status of One True Query language. If SQL didn't suck so hard it wouldn't be so necessary to try to solve the impossible mapping problem in the first place.


I would agree with you. It's impossible to write a query that can both be composable and also use all of the features of the SQL server's optimizer.


I've done the "kick a design around in my head as I'm driving home" on this problem a few times, and the optimizer is always what ends up stopping me dead. I can create a new query language, I can write the compiler to turn it into an equivalent SQL query, but I have no interest in the monstrosity the language would become if it tried to give you enough control over the resulting SQL to satisfy optimizers. A query language that produces uselessly slow SQL is itself useless, so it's hard to get up the motivation to start a project like this when there's basically no way I can produce something usable.


I've found tools like SQLAlchemy do let you compose queries pretty well.

I've been told ClojureQL has a similar property.


Interesting remark. Although I really can't see any other good alternative at the moment.


I've often felt that SQL was terrible, but I don't know what would be better.


Relational algebra or Datalog.


I guess I shouldn't have made such generalizations about how many joins, and I'll fully admit that I've written a 10+ join query in an ORM and the SQL it generated was just fine.


In other words, ORMs are at their most useful when solving the least challenging problems. To which we must ask: why bother?


I'm going to make an argument here, and you might disagree but that's fine:

If you don't understand the purpose of an ORM, you haven't worked on a sufficiently large enough software project or don't fully understand what the definition of an ORM is.

If you need to write a report of any fashion, do not use an ORM. You can help write the report with your ORM of choice, but at some point you need to get down to the language that the database speaks because you're doing the kind of set mathematics that ORMs aren't made for.

BUT, if you need to add a user permission, and you're doing anything other than:

$user->addPermission( Permissions->get( 'CAN_EAT_CAKE' ) )

you're doing something wrong.


But that doesn't require or imply an ORM. That's just a matter of having an object model.


If your object model at some point gets persisted into a database, then you're going to need to write code to (load from/save to) the database. At which point, if it walks like a duck, and quacks like a duck...


If you have a database, any object model is at some level an ORM.


>> In other words, ORMs are at their most useful when solving the least challenging problems. To which we must ask: why bother?

1. I disagree with your assertion. Ruby's ActiveRecord has scopes that make it very easy to compose complex queries from pieces; start with a simple query, then conditionally add clauses like where, having, joins, group, order, etc. I'm looking at code right now where mixing and matching clauses like that was crucial. It would have been a nightmare to do in raw SQL.

2. Even if you were right, "least challenging" != "least annoying." Washing clothes is a trivial problem, but I'm glad to have a machine do it. Same for writing the same boilerplate SQL 30 times a day.


Because most tasks are simple. If you need to write SQL to put the three most recent blog postings on the company homepage, you're doing it wrong.


Any good ORM will let you write your query directly, and still save your the work of mapping the resultset to a collection of objects.


I agree. There are some micro ORMs in .Net world that do exactly this and lately I have started using Dapper http://code.google.com/p/dapper-dot-net/ that came out of stackoverflow and I really enjoy using this over nhibernate.


And tools that help one write SQL queries more easily (like SQLAlchemy) are so much more useful than ORMs.

The fact that SQLAlchemy also has an ORM built on top of those tools only sweetens the deal.


Heh. I use SQLAlchemy, but I don't use the associated ORM.


That's exactly how I feel, in fact I was inspired to write an article about it when this topic came up on HN a few weeks ago http://verysimple.com/2012/05/14/re-orm-haters-in-defense-of...

As an ORM author I may be a bit biased, but I really think that, like all things, ORMs are a tool that can be used or abused. That's no reason to throw them out. It must have snuck up on me that there was even an anti-ORM thing happening!


    def user = User.findById(25)
    return user.getLocation().getAddress().getState() 
  
What happened during that execution? Was that 1 query or 50? Would it have been better to write a specific sql statement?

To me, this describes the biggest challenges with using or not using ORMs. I don't think there is an easy answer.


>I don't think there is an easy answer.

Uhm, ActiveRecord at least will print the SQL it generated. If you think it's inefficient, you can just… write your own sql that maps out to that query. With Arel/scopes, they're also composable and lazy loaded. What's not to like?


I find that the opposite way is way harder to debug: I see that a controller causes a bunch of SQL queries, but where do they come from? In the view? In a helper? In another library within the app?


Active record tells you where it came from in the log.


At least in Hibernate, you can set to print the actual queries in console (or text file, depending how your logger is set).

http://stackoverflow.com/a/2538954/228692

That way, during development you can see which are the actual statements executed. I say during development because after that, in production, you'd be better off filling log files with these stuff :)

And I can answer your question (for Hibernate), that will be one query.


Because people are scared of things they don't understand? Or because they had one bad experience, years ago, with some home-grown crappy ORM, and now they assume all ORMs are bad? The reasons are legion...

The second of those is what I saw at my last job. The tech-lead / architect guy was vehemently anti-ORM... because a previous group (at the same company) had rolled their own ORM in Jython and embedded it into the Java codebase, and it A. didn't perform well and B. was extremely fragile and hard to modify. Now, you can't explain to him that there are well written, productive, performant ORMs like Hibernate. He just doesn't want to hear it. sigh


Or maybe people expect to interact exclusively through the ORM and when they have to do something more complex, conclude that the ORM is useless. When using an ORM increases the complexity of the code substantially over direct SQL, I'll usually switch to more comprehensible SQL.


Well, at least in the case of Hibernate, it's easy enough to drop down to native SQL when you need it (although you obviously risk giving up database independence) and you can still keep the OOP "feel" by defining query post converters and what-not.

The points made above about "don't use ORM for reports" aside (a point I strongly agree with, btw), I've found Hibernate makes my life a lot simpler and rarely gets in the way or fails to do what I need it to do. But I do occasionally create a native query to deal with some edge case or whatever.


Let's define a good ORM though.

A good ORM should be able to handle a database designed for a different application, normalized to 5NF or BCNF, using composite primary keys, composite foreign keys, and other solid db design concepts, and be able to reliably generate queries to bridge the gap, and should do so where sets of objects are required in a way that does not unduly burden the database with small queries that could be consolidated.

That's a hard problem. There are some ORMs that come pretty darned close if not there. Many more, in my experience fall well short.


What ones do you like? The Django ORM makes me want to vomit every day or so.


The Django ORM is widely regarded as being vastly inferior compared to dedicated ORMs like SQLAlchemy. It's more of a 'get the job done, quick' solution (that works for simple use cases, but becomes difficult to use for anything more).


The best one I have looked at was DBIx::Class


I think the hate comes from two sides:

1) Relational folks who feel that ORMs don't really enable good relational design, and

2) OO folks who feel that ORM's cause a lot of headache regarding relational to operation data structure conversion.

The folks in camp 1 are usually absolutely right. Most ORMs encourage the developer to design the database around the ORM which has to be an antipattern when it comes to relational systems. This isn't always the case. DBIx::Class has done some awesome work on some things like composite primary keys and the like. But generally in most cases it is true.

On the OO side I am less convinced of the costs. So my complaints come from the relational side. Yes object to operational data structure mapping is hard to automate, and it's even harder to automate well. So maybe we shouldn't try. Maybe ORM's should be disregarded in favor of ways of designing the data so that abstractions occur at different points.

In the LedgerSMB project we decided to use stored procedures to accomplish this abstraction. We don't put "all the logic" in the database, far from it. But database operations go through the stored procedures, and these are often well tuned for performance (or as well tuned as they can be). The stored procedures thus create an API for addressing relational data, and this allows an abstraction layer to be formed which maps these into object methods. We'll probably extend this further in the future to tie object properties to complex database types, but we haven't decided the best way to do this yet.

ORMs get a lot of hate because they tackle a very difficult problem and consequently tend to produce results that are unsatisfactory particularly to db folks. They are the product of app designers looking at the db, more often than not, rather than db designers looking at the app.


> In the LedgerSMB project we decided to use stored procedures to accomplish this abstraction.

Why? Stored procedures are nothing more than a bolted-on way to store SQL queries on the database server. Yes, you've create an "API" and an "abstraction" but you've done nothing you couldn't have done in any other language.

> The stored procedures thus create an API

I consider stored procedures to be little more than a shitty "API builder" thrown in for free with your database.

* The programming language generally sucks

* Passing in complex data always sucks

* Source control is non-existent (or sucks)

You've probably had to write an API layer within your code anyway, to deal with the remaining logic requirements, why not store those queries within that layer? Or put them in a further API layer within your own source code?


With PostgreSQL the programming language can be almost anything you want (I have some stored procedures written in Python), the data structures can be reasonably arbitrary (you can build your own data types easily, the better database libraryes already support arrays and dictionaries, and in 9.2 JSON is being shipped by default), and the source control problem is easily solved by having pg_dump output your stored procedures to a file.

The key advantages are then that 1) if you need to take the output of one query and use it in another--especially if they need to be in the same transaction--you can bring the round-trip time down to nothing and 2) this part of your abstraction can now be modified as part of a transaction, making changes to table schemas (which PostgreSQL supports as parts of transactions) something to do painlessly and without downtime on production.


Most of our stored procedures are currently in PL/PGSQL. I don't know that I would say the language sucks. In fact where one is doing set operations, the language is wonderfully expressive and flexible, and one can get more done in it faster than any alternative, again where the operations are set operations.

SQL has some very serious downsides and these are persent in PL/PGSQL, but these aren't that hard to avoid with good db design. Don't get me wrong, I have seen some really horrible PL/PGSQL stored procs in my day, but it doesn't have to be bad code.

As for an additional API layer, yes there is one but it is pretty thin and most of it is pretty heavily automated. I would say that at least 75% of perl module code is just identifying methods to map to stored procedures, so there isn't a lot of code there (figure three lines to map, a sub name, a sproc name, and closing the sub).

What this allows us to do is support multiple applications in multiple languages while offering consistent security and database functionality to these applications. The DB isn't just a data store, it's essentially a meeting point potentially of a larger number of applications.

We have considered adding PL/Perl and PL/Python stored procedures but since most of these are set operations and so they are things that belong in db queries.

As for source control I don't know what really would separate this from a compiled language. You can think of loading stored procedures into the database as essentially like compiling a program. As we move to future versions this will become an even closer comparison, as we start bundling these using PostgreSQL 9.1's extension framework.

Edit: Our stored procs are defined in text files that are checked into a source repository. They are also designed to be rebuildable, and there are utilities for rebuilding stored procs included in the software.


I personally haven't heard/seen anyone griping about ORMs for a couple of years now. I thought the debate was over. Much like you rarely see anyone talk about stored procedures these days.


Most of the people I know who gripe about ORMs have either never used one, or tried to apply one in an instance where an ORM is not applicable.

ORMs fit a very specific use case - eliminating the tedium and risk associated with writing simple queries, and creating a simple layer of abstraction from the database. If you mention ORMs and someone starts griping about how it made their accounting reports slower, slap the shit out of them.


Please do come and have a talk with my (pointy-haired style) boss :) .

I take it you think stored procedures are generally good (at least that's what I think). But some people still believe that they'll migrate database architecture or some such. Also, some ORM's aren't fit for the task and give a bad name (Microsoft's Entity Framework v1 was especially horrible)

I'm definitely out of touch and I spend way too much time working with Microsoft SQL, but do people treat relational databases as "a crazy aunt" as Fowler says? (I still haven't tried NoSQL)


Re: SPs, no, I meant people used to say all your data access should be through SPs, which seemed to turn about 5-7 years ago as the general consensus moved to parametrized queries and freedom from DBAs overseeing your changes. I haven't seen anyone say that in a long time now. I used to use them for complicated TSQL and helper functions, like some crazy recursive tree building stuff, but these days I haven't actually written one now in a couple of years.

On a more individual level I'm also now finding myself even shying away from complicated SQL queries. I'm finding these days they're unnecessary 90% of the time. Often it's actually faster and more maintainable to pull out a larger amount of data and then do the more specific calculations in code. Even with good indexes, etc.


Heh, I'm probably 15 years behind then :) . Thank you for the reply :)

I strongly dislike writing "strings with sql" inside my code, though I do use parametrized queries these days.

I do appreciate freedom from DBAs as a feature :) but I have access to my stored procedures these days (though I don't know for how much longer).

I work for an insurance company with an awful legacy database structure, with mnemotechnic table names like S0001... to S99999, and disregard for naming conventions or even normalization sometimes. It's not making me a better developer :( unless learning by bad examples counts.


> In the 90's many of us (yes including me) thought that object databases would solve the problem by eliminating relations on the disk. We all know how that worked out.

Maybe I'm too young (or too old) because I don't know what he is suggesting here. I've used an object database on a large commercial project, and it was a dream: everything I like about ORMs, and none of the drawbacks.

My impression of OODBs, therefore, is that they were a technical success, but a commercial (and open-source) failure. But I don't think that's what Martin is suggesting. Can someone explain what "we all know" about database history here?


I think it's all about getting to know the tool (ORM) you work with. Look at the generated queries and their performance, use eager/lazy loading in the right places and mainly take care of the database operations that run most often. Switching to plain SQL might make sense if you don't need structured/hierarchical objects at this point in your application and a simple array/list/table will do just fine. Don't bother with that if your hand-generated query will most likely end up the same though. Use database-specific bulk imports or plain SQL if you have lots of data to insert and need it fast.

I am also often seeing comments about ORMs saying "What's so hard about SELECT..." - nothing's really hard about retrieving data, every decent developer will sooner or later end up with a working complicated SQL that gets them exactly what they wants. But I think the real benefits of ORMs start when you have to save/edit/update/extend relational data again and have to take care of all the inserts, updates, deletes across several tables, maybe including versioning or workflow (drafts or changes waiting for approval) and other additional complexity on top. That's the parts where I am glad to rely on an ORM supporting relationships and which can be extended via hooks or behaviours. (Note that I am mainly talking about 'Admin' pages here and nothing that runs too often.)


"You have a relational mapping problem. 'I know', you say. 'I'll use an ORM.' You now have two problems."

That just about sums up my experience with ORMs. Of course, like all things in the real world, experiences vary. However, I do think that generally, ORMs solve none of the difficult relational mapping problems and adds another layer of abstraction that complicates things like performance tuning to the point of negating any other benefits they may bring.


Have you actually read TFA? Fowler's point is that yes, ORMs don't solve all of the really hard mapping problems, but save you a lot of boilerplate on the other 80-90%, and the hallmark of a good ORM is that it allows itself to be bypassed with relatively little hassle for those hard problems (like performance tuning).

Also from TFA: what do you suggest using instead?


Come on - how hard is it to write

"select id, username, email, nickname, date_registered, pass_hash, first_name, last_name, middle_initial, home_phone, mobile_phone, work_phone from user where username=?" and manually do escaping whenever you want user info?

You seem to be suggesting that

user.findByUsername('joe')

is somehow more worthwhile or usable. It's certainly more noble, because an ORM doesn't solve 100% of a set of problems 100% of the time. OBVIOUSLY it's better to never use them, ever. In any circumstance.

"select id, username, email, nickname, date_registered, pass_hash, first_name, last_name, middle_initial, home_phone, mobile_phone, work_phone from user where username=?"

is really more readable because I always want to know every single piece of data, and read it every time I look at code (and update all my SQL queries every time I make a schema change) all the time. I'm finding that my monitor is often devoid of code - I have too much space on the screen just begging to be filled up with useful boilerplate code so I don't ever forget all the column names in every table.


copying my answer from virtually the identical commentary made on reddit ("why should I use an ORM when I can just write the SQL?") (not to mention Fowler's article makes this same point pretty clearly):

ORMs do not claim to be "simpler than SQL" and this has nothing to do with the purpose of an ORM. There is nothing that's ultimately "simpler than SQL", if in fact you aim to just write SQL.

The purpose of the ORM is to marshal data between an object-oriented domain model and a relational schema, to translate a wide array of common relational patterns with corresponding domain-model patterns (think foreign key to collections, for example), and to express relational queries in terms of the object model at least to some degree.

All of this is well outside the realm of just "writing SQL queries". The ORM is a tool used to integrate relational databases with an object oriented application layer, and to that degree they make this task a lot less effort than doing it by hand.


You pretty much nailed what I was going to say. Yeah, I found it a bit strange that the only alternative to using an ORM that the author could imagine was to roll your own? What's the problem with using SQL directly?

To answer the grandparent: I did read TFA, and I don't buy it. His claim that a good ORM makes 90% easier and gets out of the way of the remaining 10% doesn't match my experience with any ORM at least. Perhaps I'm wrong! All I'm saying is that from what I can tell, using an ORM buys you nothing worth buying.

edit: Ha, you're being sarcastic. No, I'm serious. Are you saying that using an ORM solves escaping problems? I honestly have never connected those two. That to me is a problem calling for a set of escaping/unescaping functions, not an object-relational mapping. To each their own...


I'll try to clarify what I mean, since my reply to the other post is somewhat confused. I should read more carefully before posting. Anyway.

First of all, is your data suitable for storage in a relational database?

If not, if you end up having to do tons of joins and every table has references to other tables, something like Redis is probably a better fit - I'd move away from ORM/relational at that point.

If it is, then what does using an ORM buy you? Simple queries are simple to write and maintain, so in my experience you don't gain much there. More complicated queries are not handled well by any ORM I've seen, so there you'll want to write them in a relational language anyway.

So, at least the situations I've had come up have both resulted in moving away from the ORM, either to a non-relational database or to keep the data in a relational-friendly format. A lot of the nitty gritty of writing relational queries by hand (as noted, escaping, listing the fields to be queried etc) can be abstracted by a set of helper functions, and maintaining these is much simpler than fixing problems with the relational mapping. For example: you have an inefficient join. Using an ORM, you'll be solving this problem indirectly either by massaging the ORM or bypassing it entirely. If your join is directly expressed in SQL, you solve the problem by modifying the join. There's no additional headache involving figuring out how the object model turns into tables and queries.


Data that's not a good fit to be represented as relations is a completely different issue, and I agree with you on that.

Simple queries maybe simple to write, but they're still boilerplate you're better off without. An ORM lets you work at a higher level of abstraction.

> A lot of the nitty gritty of writing relational queries by hand (as noted, escaping, listing the fields to be queried etc) can be abstracted by a set of helper functions, and maintaining these is much simpler than fixing problems with the relational mapping.

Except that you just started to roll your own ORM, and I can guarantee that maintaining it will not stay simple for very long.

> If your join is directly expressed in SQL, you solve the problem by modifying the join.

Yep, that's what you do with a good ORM.

>There's no additional headache involving figuring out how the object model turns into tables and queries.

Wrong - as long as you have a relational DB at one end and an object model at the other, this additional headache is unavoidable.


The problem is there's plenty of trivial work for an ORM to do (mapping types, simple queries, etc). If you roll this yourself your implementation will be buggier than a major open source ORM.


Yeah. The mismatch between how devs expect to use the ORM, and how the ORM expects to be used, causes enough problems in my experience to make the ORM not worth it in the first place. I'm tired of seeing crazy-long NHibernate stack traces and exception descriptions because, like, a column got altered.

Writing data-retrieval code is tedious, and nerds always want to automate tedious things. But sometimes the cure is worse than the disease, and the effort to eliminate one type of tedium gives rise to a new, fancier kind of tedium.


I've used both Django ORM and SQLAlchemy and they are generally excellent, allowing you to use SQL for anything that's just a bit too complicated for the ORM, but greatly simplifying day to day dynamically generated queries.


I've always wondered what the mathematical basis behind OOP is. I mean, relational databases are grounded in first-order logic and set theory. If there was a mathematical theory that grounds OOP, then I'd love to know because then there may be a way of finding a mathematical basis for ORM.

Anyone who wants to provide the answer to this one would be my new best friend :-)



The single biggest disadvantage of ORM for me is that in exchange for easier programability, they force you to learn a proprietary, arbitrary, ad hoc DSL that is less consistent, more complicated and less powerful than SQL.


It depends on what "proprietary" means. In Java, ORM is pretty much standardized with JDO/JPA.


or HQL. which is pretty much exactly what this guy means.


My why is more of a feeling/observation/theory so please feel free to share your thoughts and feelings:

Some developers hate, and are almost threatened when something fancy and complex they might do is trivialized, thereby trivializing them.

There is no question that something like ORM makes some things easier, and other things harder. It applies equally well to SQL and which flavor of database you end up using, or whether you're using ActiveRecord, etc., or not.

I would hope developers who find something that frees to solve user problems instead of technology problems would be a good thing, but maybe it's just me.


I don't like the implementation of ORMs as run-time layers. I very much prefer using a tool that generates code. It's orders of magnitude easier to maintain... for me anyway.


There's some serious miscommunication going on every time this discussion comes up.

A lot of people seem to conflate all data abstractions with the term "ORM" (which has a very specific meaning), therefore assuming everyone who thinks ORMs are a bad idea are arguing against all forms of data abstraction.

This is not the case.

One of the better articles on this: http://seldo.com/weblog/2011/08/11/orm_is_an_antipattern


I really have never understood the ORM hate. I've found them to be immensely useful in 99% of circumstances, and for the remaining 1%, a good ORM will always let you fall back to raw SQL. Aside from providing a simpler syntax for performing basic queries, there are a few features that ORMs provide that have greatly simplified my life:

1. Automatically using prepared statements and validating/escaping query arguments to prevent injection. You have to be quite a bit more careful when you're working with raw SQL.

2. Providing an clean API to construct complex queries.

This typically becomes an issue when you have a query where you are filtering and/or sorting by multiple fields which are specified by the user, some of which are conditional. If you're writing raw SQL you end up needing to do a lot of string manipulation which can get fairly messy (and makes the code more difficult to understand). An ORM which provides some sort of query builder syntax that lets you do:

  if (some condition):
    query.addWhere(clause)
  if (some other condition):
    query.addWhere(other clause)
  etc.
is pretty convenient.

3. Collating repeated result rows from joins

Whenever you're working with joins you end up with repeated data in your result set, which you generally end up having to collate before display. For example if I have Recipes <- (many to many) -> Categories and do a query to load the both of them, I might end up with something like this:

  recipe_id | recipe_name  | recipe_ingredients | category_id | category_name
  --------------------------------------------------------------------------
  1         | Shrimp Pasta | 1 cup tomato sauce | 1           | Pasta
  1         | Shrimp Pasta | 1 cup tomato sauce | 2           | Seafood
  1         | Shrimp Pasta | 1 cup tomato sauce | 3           | Shrimp
  2         | Fruit Cake   | 4 cups flour...    | 4           | Dessert
  2         | Fruit Cake   | 4 cups flour...    | 5           | Cakes
Without an ORM I have to loop through the result set to re-format the data the way I wanted before displaying it. An ORM takes care of that for me and gives me back 2 recipes with their categories accessible via recipe.categories.

4. Simplified manipulation of many-to-many relationships. Following the above example, if I want to add a new Category to a Recipe I can simply do:

  recipe.categories.add(category)
If I want to set the categories to something entirely different, I can do:

  recipe.categories = [category1, category2, category3, etc]
Without the ORM I would have to manually sync up the entries in the join table which is kind of a pain in the ass. Working with join tables in general is rather obnoxious, so I'm quite glad that the ORM takes care of that one for me.

5. Some ORMs give you notifications when an object (or collection of objects) changes. This is pretty important on the client side when you want to make sure the data you're displaying stays up to date, even as it is being manipulated. For example: if I'm viewing a recipe on my iPad and I update that recipe on my desktop. A background thread is running which keeps the two synchronized, and at some point the underlying recipe is updated in the database on my iPad.

If I'm working with raw SQL there's basically no way to know when that object is changed (short of polling it periodically, or rolling your own notification system). But my ORM will keep me notified of changes to the object so I can refresh the user interface with the updated recipe after the sync completes.

6. Some ORMs implement a unit of work that allows you to track what changes have been made to an object since it was retrieved from the database. So you can easily see which fields have been modified, and then when you go to save the object back out, it will intelligently only issue the SQL to update the columns which have changed, or won't even touch the database if nothing has actually changed.

7. Some ORMs put their objects into an identity map, so if you query for the same object under multiple different scenarios (e.g different areas of your UI), you always get the same underlying instance back. This means that you don't have multiple copies of what is semantically the same object floating around in different places of your app, and the object is always up to date with the latest changes.

Note: my use case is typically client-side database backed software, so features like (5), (6), and (7) save me from having to do a TON of work. If you're doing more web oriented stuff, I can see how those particular features may be less useful to you. Still, I think ORMs are a huge win overall.

Of course, none of this absolves you from needing to know what's going on at the database level. You still have to know what SQL your ORM is generating in order to make sure you're using it correctly. But I get really confused when people badmouth ORMs and try to tell me that it's simpler to use raw SQL, cause it never has been for me and my use cases. And the funniest thing is, if I stuck with raw SQL while attempting to solve all of the problems I listed above, I probably would end up with a half-assed version of a full-fledged ORM anyway.

(The only use case I can think of where I'd prefer to use raw SQL over an ORM is with report generation type activities: usually those types of queries aren't very dynamic, they can often be too complex to be expressed via the ORM's API, and once you have the data you're just dumping it to display without worrying about interactivity anyway.)


So now suppose we refactor your database a bit, since all cakes are deserts and we don't really need to store that the fruit cake is both a cake and a dessert.

Suppose we make category have a self-join (add a parent_id field that references category(id)).

Now we want to make sure that when we want to list all deserts, all subcategories are listed too. Assume the possibility of arbitrary depth.

How easy is tht to do in your ORM?


Good question. I can't speak for all ORMs, but both SQLAlchemy and Core Data provide direct support for self-referential relationships

SQLAlchemy lets you drop down to raw SQL as well if you need to, while still taking care of mapping the result set to objects for you.

Core Data is less powerful (and technically not an ORM), but it's the defacto standard on iOS so I'm pretty much stuck with it.


So if you have self-referential relationships of arbitrary depth then can it generate clauses like WITH RECURSIVE or CONNECT BY? Or are you functionally limited to one level of self-joins without dropping to SQL?


WITH RECURSIVE is supported in SQLAlchemy now, although the syntax is rather verbose. Not sure about CONNECT BY, I think you may need to drop to SQL for that.


that's pretty cool. (I would assume that since WITH RECURSIVE is the standard and CONNECT BY is Oracle's invention that either it would be handled by db-specific extensions or just be another cost of running Oracle)


Surprised by the pro-ORM comments so far.

My hate for ORMs grew proportionally with my usage of NoSQL. It's such a pleasant experience when your domain model fits the data model. Graphs are a good example, but Redis is a better one. When your intention fits a Redis data structure, it's just programming bliss - a few lines of explicit and simple code.

The only way I can use a relational database now is with Sequel.


From the article:

I think NoSQL is technology to be taken very seriously… But even so it only works when the fit between the application model and the NoSQL data model is good. Not all problems are technically suitable for a NoSQL database. And of course there are many situations where you're stuck with a relational model anyway. Maybe it's a corporate standard that you can't jump over, maybe you can't persuade your colleagues to accept the risks of an immature technology. In this case you can't avoid the mapping problem.


I'm confused by your statement. When your data fits the tool that was designed to work with that data...everything is great? Isn't that the point of using the right tool for the right job?


What's confusing? For years...decades...developers have been taking a one-size-fits-all approach to data storage - relational databases.

Now that we are starting to make use of more specialized tools, we're starting to go back and question (or hate) our past approach.


it depends so much on the context. i work with a wide range of different technologies. when i'm doing "enterprise" stuff, i tend to avoid the heavy orm like hibernate and go for things that give me more control (spring's helper classes, say). so you can say that i am "avoiding orm".

but then recently i've been writing c code that needs to use a small database and boy, it's tedious working with sql directly. so i'm actually writing an "orm for c".

it's all about balance...

[the orm for c thing is basically two layers - a simple library that helps make sql queries a little easier to mangle and then a python library that, given some structs and a database, will auto-generate some basic boilerplate via the library. nothing fancy, but it will save a pile of tedious work.]


That is really bad ideological myopia there. He refused to consider alternatives (for which I would adovcate functional design or something like sqlalchemy).

"Actually I think this is an inevitable consequence of using a relational database - you either have to make your in-memory model more relational, or you complicate your mapping code."

No its an inevitable consequence of trying to think with an object-oriented type system to describe a set-based data collection! In other-words the problem is attempting ORM: he's begging the question!

I would expect better from Fowler, but it does go to show the futher into an ideology you get (heavily design-patterned OOP in fowler's case) the more you see design and programming problems as problem with OOP rather than anything more fundamental.


Adding to some of the anti-orm sentiment above: I always find that orm's are trying to solve a problem that doesn't really exist. The proportion of time (and therefore cost) spent on a project writing SQL is orders of magnitude less than time spent on things like performance, scalability, user-experience, testing etc. You might have something that only maps data from your results into your language of choice but, again, if small, slightly repetitive tasks like this are affecting your ability to deliver your software, you have bigger problems.

Sure - database design is important - but that you have to do that regardless.


For the same reason PHP gets so much hate - both are tools with valid uses, but because they make hard things easier for people, that means that people who have no idea what they're doing can do very bad things.

Hating on a tool is ridiculous.


The core of fowler's article is, "what is the alternative to ORMs?" - and the answer is there isn't much other than rolling your own.

This is not at all analogous to PHP where there is an abundance of alternatives to it's mediocre design. PHP isn't hated because of its concept - a "web-based scripting language". It's hated because it's done very poorly. Just like a lot of bad ORMs Fowler refers to.


Funny you should mention PHP. I suspect that a lot of the ORM hate comes from people forced to develop PHP and witnessing the epidemic of NIH syndrome among its web frameworks, every single one of which seems to come with its own half-assed ORM.


There is nothing wrong with PHP - it runs multi-billion dollar companies and it is the most ubiquitous language on the web for a reason.


Because it has near-zero barrier to entry. That doesn't speak highly of PHP as a language, just on how its deployment strategy is designed.


I don't think Facebook or Yahoo chose PHP because it was easy to write.


I think that is exactly why Mark Zuckerburg chose PHP when working on TheFaceBook back in 2003 and 2004. It was easy to write and deploy.


you are incorrect.


Yahoo's rationale for choosing PHP:

"The short development cycles needed to stay ahead of the competition demand a web-centric scripting language that is easy to maintain and update."

http://public.yahoo.com/bfrance/radwin/talks/yahoo-phpcon200...

They didn't choose Python, or Ruby, or Java, because none of those languages are as expressive or shut-up-and-get-out-of-my-way as PHP. Just because a high schooler can write PHP doesn't mean it's a toy not also meant for world class engineers.


PHP is more expressive than python, ruby, or java. I see.


It's success is largely due to it having the correct execution model. The request based execution environment makes iterative development quick and easy.

However there are plenty of bad ideas in the language itself.


there's a lot wrong with PHP, here's a good place to start: http://me.veekun.com/blog/2012/04/09/php-a-fractal-of-bad-de...


Just because PHP doesn't 100% doom you to failure, doesn't mean there is "nothing wrong with it".


I would say there's a difference. PHP is bad because it has a wide variety of objectively awful design flaws, its standard library is a conflicted mess, and it's unperformant. Implementing it into a modern architecture is highly correlated with sloppiness and technical debt.

ORM is bad because the impedance mismatch between an object and a relational model is very difficult to get right.

Frequently, for example, one iterates over collections of objects in the OO paradigm. A naively written ORM, or one without sufficient introspection into the loop intent, can translate that into N select statements, each incurring a network round trip.

The end result is that OO programmers have to understand not only their object code, but also the relational model, and finally too the ORM's peculiarities and suboptimalities. So in attempting to solve one problem, most of the time you end up with three.

That said, there is a sweet spot in ORMs for simple code (e.g. most web apps, where ActiveRecord and Sequel and the like are fine). But there's no sweet spot for PHP code. Every line written is technical debt.


Not all problems are technically suitable for a NoSQL database.

You might as well say "I'm a big fan of JSON, but not all problems are technically suitable for a NoXML data format." There's nothing magical about SQL, as proven by the fact that so few apps actually use its abstractions! The reasons to use SQL today are pragmatic; battle-tested engines, tools, etc. Most developers will never write their own ORM or database, but the ones who are up to it are starting to explore the design space quite profitably.


Throughout history there has always been some hate towards a platform that inherits and simplifies another platform. In truth, it would be more efficient to talk directly to the processor, but that takes a lot of time. And sure, ORM isn't the fastest way to data, but the deficit is well made up in the end result. You can't utilize ORM without an understanding of data. Suggesting that developers are leaving semantics behind is absolutely not true in most cases.


I think it's basically the O part. In theory, one might have a delightful ontology of object oriented code, like Animal->Mammal->Cow , but in reality you often have something much more like

CompanyNamePersistentObjectBaseClass->Entity->ExtendedPersistentEntity->CacheFactory->Entity(but in a different package)->NeedToAddAPropertyForJustThisReleaseIPromiseThisClassIsGoingAway->IdAddThisToTheBaseClassButIReallyNeedToRelease->Shape->Triangle->XYCoordinates

and then somebody wants to cache these things in hibernate.


Your comment is screwing up the page layout. (Firefox, Win7)


I for one never touch an ORM if possible. Coding for SQL in code is not that difficult, but sure it leads to SQL lock in. But you don't expect to change your underlying data model and scheme so often, and if you ever do, you most probably need a complete redesign.


He comes close but still misses the overall general issue and condition.

These are facts:

- Relational and/or heap representations can be fairly universally represented e.g. heap on FS aka VM

- "impedance mismatch" problem is peripherally about latency and more generally present regardless of heap/store (hey..) image representation medium

So in general:

Any in-memory store process that is distinct from an in-memory heap process will still face the "impedance mistmatch" unless

a) a compiler bridges the semantic gap between the two representation domains

b) a mediator provides an interaction means between the two distinct processes

A very specific case of this general condition is:

[in-memory object language runtime] <--driver--> [FS based relational DB]

Fundamentally the condition is that of 'networking' so the question immediately presents itself as to "what is distinct from general networking that is causing the tedium of mapping?".

Ad-hoc messages of potentially unseen semantics would certainly cause an "impedance mistmatch" in a networking setup. Think of the nightmare of infrastructure to do the boiler plate, etc. (Look familiar?)

So, queries are the raw nerve that is exposed by this type of coupling.

What is a query? What is (the meaningful consequences of) a query in context of heap and/or store?

A heap typically maintains a 'collected assembly of facts' e.g. a set of related objects (eternalban's comments trail collection object and various other data) all nicely stitched up by a static compiler and/or dynamic runtime. Same set of facts is distributed in a potentially unique manner across an unknown set of store media. A query in store is 'work'. It is also more precisely 'a functional computed in a specific time and space'. (CAP anyone?)

So we know the answers to each and every one of these adjacency issues in the simple form. The runtime memory object model process <-> RDBMS is a very common system pattern that unfortunately exposes a whole truck load of these difficult interfacing issues in one lump scary form. It is a lot of 'work'.

What is necessary to get around this is entirely against the prevailing trends arguing for "simplicity" as this requires infrastructure level plumbing for code at the same level of care and attention as that given to analogous networks for IP.

tldr; ORMs (or any other magic layer) for independent actor system A-B coupling works best when the interaction patterns are bounded and statically known. Any increase in runtime uncertainties will necessitate working partially outside of the framework hand-holding and may even be more tedious. Choose your tools appropriately.




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: