
When I was inexperienced I feared ORMs because of the negative performance impacts I'd read they could have. I constantly worried about what would happen if the amount of data increased and I hit an ORM-induced problem that I could not resolve without a major rewrite of the data access layer. However, whenever I've actually hit those problems in production, I found the same thing the authors of the article did: ORM-induced performance problems can be fixed with 1-5 lines of code changes, if you know where the actual problem is. In fact, I learnt that there are no "ORM-induced performance problems", there are only problems induced by a lack of understanding of how the ORM works.
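To give an idea of how small such a fix usually is, here's a minimal sketch of the classic N+1 case (SQLAlchemy 2.x used purely as an example, the models are made up); the fix is the single added .options(...) line:

  # Hypothetical models, SQLAlchemy 2.x style
  from sqlalchemy import create_engine, ForeignKey, select
  from sqlalchemy.orm import (DeclarativeBase, Mapped, mapped_column,
                              relationship, Session, selectinload)

  class Base(DeclarativeBase):
      pass

  class Author(Base):
      __tablename__ = "authors"
      id: Mapped[int] = mapped_column(primary_key=True)
      books: Mapped[list["Book"]] = relationship(back_populates="author")

  class Book(Base):
      __tablename__ = "books"
      id: Mapped[int] = mapped_column(primary_key=True)
      author_id: Mapped[int] = mapped_column(ForeignKey("authors.id"))
      author: Mapped["Author"] = relationship(back_populates="books")

  engine = create_engine("sqlite://")
  Base.metadata.create_all(engine)

  with Session(engine) as session:
      # Before: one query for the authors, then one extra query per author
      # the moment .books is touched (the N+1 pattern).
      authors = session.scalars(select(Author)).all()

      # After: the one-line fix, eager-loading the relationship up front.
      authors = session.scalars(
          select(Author).options(selectinload(Author.books))
      ).all()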

As for knowing where the performance problem is, I find that skill (which I think should be basic) is in fact very elusive among the engineers I encounter. When I interview people, even at what would be upper intermediate to senior level in terms of years of experience, an alarmingly small number have ever done or could do (or even would do) performance profiling like that described in the article. Yet they do describe how they changed this ORM for that ORM, this DB for that DB, this language for that language, in the name of higher performance.

I've seen or heard of:

- Teams spending weeks exchanging a SQL DB for a NoSQL DB because of an unsolvable performance problem, only to hit the same problem with the NoSQL DB and find that adding a simple index is the solution in both cases (see the sketch after this list)

- Teams spending weeks exchanging Hibernate for OpenJPA in a complex application, because of performance, without doing any performance analysis, because they've read an article that says Hibernate is slow

- Teams choosing complex architectures they don't really understand, for performance reasons, without being able to articulate the performance requirements of the system they're building
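To make the index example concrete, here's a rough self-contained sketch (hypothetical schema, SQLite used only because it needs no setup) of an "unsolvable" slow query turning into a fast one with a single CREATE INDEX:

  import sqlite3, time

  conn = sqlite3.connect(":memory:")
  conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
  conn.executemany(
      "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
      [(i % 10_000, i * 0.1) for i in range(500_000)],
  )

  def timed_lookup():
      start = time.perf_counter()
      conn.execute("SELECT SUM(total) FROM orders WHERE customer_id = ?", (42,)).fetchone()
      return time.perf_counter() - start

  before = timed_lookup()  # full table scan
  conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
  after = timed_lookup()   # index lookup
  print(f"without index: {before:.4f}s, with index: {after:.4f}s")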

These days, whenever someone mentions performance as a reason for anything, I judge their competence based on their response to the question "And how are you measuring and monitoring it?"




‘Unsolvable performance problems’ that could have been fixed by adding an index??

How did these team members pass their job interviews?

By practicing algorithm puzzles?


Sadly, this occurs far more often than you might think. Large companies especially do it to themselves, given how they structure their teams, with developers and DBAs on different teams reporting to different managers (i.e. you can trace their respective management chains separately until just below the C level). These reporting structures are problematic because large companies tend to hire people with very narrow skillsets, which makes it vital that these groups work together. The end result is that you have teams throwing things over the wall to each other, not knowing or caring what is happening on the other side of the wall... dysfunction by design.


Funny story: at a, let's say, Fortune 50 tech company there was an investigation into why performance for a certain database query had become atrocious. The problem was that the query was against a table that started out quite small and then grew very large. And the database had been configured to use "query plan stability" to improve predictability of performance. However, the query plan that the db originally came up with for that nearly empty table was a full table scan, which was actually the fastest method under those conditions. Yet it continued to use a full table scan even as the table grew to many tens of millions of rows. It was a simple matter to switch the query to actually make use of the indexes that already existed.
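The database in that story presumably had a vendor-specific plan-stability feature, but the habit that catches this kind of thing is generic: look at the plan the database is actually using rather than assuming it. A tiny illustration (SQLite here only because it's self-contained; the table is made up):

  import sqlite3

  conn = sqlite3.connect(":memory:")
  conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, account_id INTEGER, payload TEXT)")
  conn.execute("CREATE INDEX idx_events_account ON events (account_id)")

  plan = conn.execute(
      "EXPLAIN QUERY PLAN SELECT * FROM events WHERE account_id = 123"
  ).fetchall()
  # Expect something like 'SEARCH events USING INDEX idx_events_account';
  # a stale or missing plan shows up as 'SCAN events' instead.
  print(plan)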


Well, knowing that you need to profile is one step. But then you have to know HOW to profile.

Many years ago, a team member got asked to figure out what the performance implications would be if a specific application were to be changed from PostgreSQL to MongoDB (Mongo was very early at the time). That's a very difficult question to answer in general, but the way he decided to approach the problem was: create two programs. One would ask to connect to PG, grab the current time, run a query (once!), grab the time again, disconnect. And that's it. The other program would do the same for Mongo.

His 'findings' confirmed what some people were already expecting: that MongoDB was magically faster than PostgreSQL. Which was odd, as both were running as a single instance with a simple data type that PG should have zero problems handling.

I pointed out that he was measuring connection time too, which is slow on PG. He said that wasn't the case, because he started the timer after connecting to PG. I replied with "No, you are starting the timer after asking to connect to PG, you don't know if the connection is made at that point. Run a simple query first to be sure." He wouldn't accept that.

It turned out that not only was that correct (the Mongo library would connect immediately, the PG one would defer until needed), but neither of the DBs had indexes defined. So none of the numbers made any sense: the tested query was not based on any observed production queries, it was only run once, etc.
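For what it's worth, a rough sketch of how I'd structure that kind of measurement (driver-agnostic on purpose; connect and run_query are placeholders for whatever client library you're testing), so that deferred connection setup and one-off noise don't end up in the numbers:

  import time
  import statistics

  def benchmark(connect, run_query, warmup_runs=5, timed_runs=100):
      conn = connect()
      # Warm-up: forces any lazily deferred connection setup and primes caches
      # before the clock starts.
      for _ in range(warmup_runs):
          run_query(conn)
      timings = []
      for _ in range(timed_runs):
          start = time.perf_counter()
          run_query(conn)
          timings.append(time.perf_counter() - start)
      return statistics.median(timings)

And even then, the query being timed should come from observed production traffic, which his didn't.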

I have not seen more meaningless 'performance' testing since. But the PowerPoint graphs looked pretty; had only management been in the room, they would likely have been convinced to migrate.

All that for a badly written application that a single box with SQLite would have zero problems handling.

On the other side, I had a coworker implement some simple but effective profiling of a Rails application. The slow path was traced to the caching code, specifically the code that generated the hash key. That was replaced with a better version and we got massive speedups.
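Not the Rails tooling from that story, but the same idea sketched in Python: profile the hot path, sort by cumulative time, and a hotspot like a naive cache-key generator shows up immediately (the functions below are made up for the example):

  import cProfile
  import pstats
  import hashlib

  def make_cache_key(params: dict) -> str:
      # Deliberately naive key generation -- the kind of hotspot profiling exposes.
      return hashlib.sha256(repr(sorted(params.items())).encode()).hexdigest()

  def handle_request():
      params = {f"param_{i}": i for i in range(50)}
      for _ in range(1_000):
          make_cache_key(params)

  profiler = cProfile.Profile()
  profiler.enable()
  handle_request()
  profiler.disable()
  pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)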

Alright, I think I'll add some questions on profiling in my current team's interview process.


How would you test a CRUD app with a DB?

The way I do it now is just to use some simple functional tests that GET/POST and measure the time in a pytest script; I'm sure there are better ways to do it.
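Roughly something like this, if it helps (the base URL and the /items endpoint are placeholders for whatever the app under test exposes, and the threshold is arbitrary):

  import time
  import requests

  BASE_URL = "http://localhost:8000"  # placeholder for the app under test

  def test_crud_roundtrip_is_fast_enough():
      start = time.perf_counter()
      for i in range(100):
          resp = requests.post(f"{BASE_URL}/items", json={"name": f"item-{i}"})
          assert resp.status_code in (200, 201)
          resp = requests.get(f"{BASE_URL}/items")
          assert resp.status_code == 200
      elapsed = time.perf_counter() - start
      # Crude threshold; a real setup would also look at per-query timings in the DB.
      assert elapsed < 5.0

The end-to-end time only tells you that something is slow, though; a profiler or the DB's slow-query log tells you where.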


The fact that you feared ORMs probably pushed you to use the raw database, and that experience really helps you understand how ORMs work.

Same is true with any abstraction – if you want to learn to use it right, first learn to make do without it.


Like so many things, it's useful to roll your own, not so you can use it, but so you can understand the complexities and trade-offs inherent in the problem space.

I've written a few ORMs before, and I can do it quickly with fairly succinct and concise code if needed. I still reach for the full ORM from the beginning, because swapping out that layer is painful, and you always want to do it sooner rather than later.


> Teams spending weeks exchanging a SQL DB for a NoSQL DB because of an unsolvable performance problem, only to hit the same problem with the NoSQL DB and find that adding a simple index is the solution in both cases

I don't get the idea that one has to pick either SQL or NoSQL, but a lot of people seem to think this way. Why not use both? The SQL portion can more or less be treated as a rich index of relationships, and the NoSQL portion can handle performance when it makes more sense for data to be embedded in parent documents.


Indeed, there is no reason to pick a "side". All big systems I have seen either have both or would benefit from both.

In fact, our current system has PG and several NoSQL databases, because each product has its own strengths.



