
When I was inexperienced I feared ORMs because of the negative performance impacts I'd read they could have. I constantly worried about what would happen if the amount of data increased and I hit an ORM-induced problem that I could not resolve without a major rewrite of the data access layer. However, whenever I've actually hit those problems in production, I found the same thing the authors of the article did: ORM-induced performance problems can be fixed with 1-5 lines of code changes, if you know where the actual problem is. In fact, I learnt that there are no "ORM-induced performance problems", there are only problems induced by a lack of understanding of how the ORM works.
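To give an idea of how small such a fix usually is, here's a minimal sketch of the classic N+1 case (SQLAlchemy 2.x used purely as an example, the models are made up); the fix is the single added .options(...) line:

  # Hypothetical models, SQLAlchemy 2.x style
  from sqlalchemy import create_engine, ForeignKey, select
  from sqlalchemy.orm import (DeclarativeBase, Mapped, mapped_column,
                              relationship, Session, selectinload)

  class Base(DeclarativeBase):
      pass

  class Author(Base):
      __tablename__ = "authors"
      id: Mapped[int] = mapped_column(primary_key=True)
      books: Mapped[list["Book"]] = relationship(back_populates="author")

  class Book(Base):
      __tablename__ = "books"
      id: Mapped[int] = mapped_column(primary_key=True)
      author_id: Mapped[int] = mapped_column(ForeignKey("authors.id"))
      author: Mapped["Author"] = relationship(back_populates="books")

  engine = create_engine("sqlite://")
  Base.metadata.create_all(engine)

  with Session(engine) as session:
      # Before: one query for the authors, then one extra query per author
      # the moment .books is touched (the N+1 pattern).
      authors = session.scalars(select(Author)).all()

      # After: the one-line fix, eager-loading the relationship up front.
      authors = session.scalars(
          select(Author).options(selectinload(Author.books))
      ).all()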

As for knowing where the performance problem is, I find that skill (which I think should be basic) is in fact very elusive among the engineers I encounter. When I interview people, even at what would be upper intermediate to senior level in terms of years of experience, an alarmingly small number have ever done or could do (or even would do) performance profiling like that described in the article. Yet they do describe how they changed this ORM for that ORM, this DB for that DB, this language for that language, in the name of higher performance.

I've seen or heard of:

- Teams spending weeks exchanging a SQL DB for a NoSQL DB because of an unsolvable performance problem, only to hit the same problem with the NoSQL DB and find that adding a simple index is the solution in both cases (see the sketch after this list)

- Teams spending weeks exchanging Hibernate for OpenJPA in a complex application, because of performance, without doing any performance analysis, because they've read an article that says Hibernate is slow

- Teams choosing complex architectures they don't really understand, for performance reasons, without being able to articulate the performance requirements of the system they're building
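To make the index example concrete, here's a rough self-contained sketch (hypothetical schema, SQLite used only because it needs no setup) of an "unsolvable" slow query turning into a fast one with a single CREATE INDEX:

  import sqlite3, time

  conn = sqlite3.connect(":memory:")
  conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
  conn.executemany(
      "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
      [(i % 10_000, i * 0.1) for i in range(500_000)],
  )

  def timed_lookup():
      start = time.perf_counter()
      conn.execute("SELECT SUM(total) FROM orders WHERE customer_id = ?", (42,)).fetchone()
      return time.perf_counter() - start

  before = timed_lookup()  # full table scan
  conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
  after = timed_lookup()   # index lookup
  print(f"without index: {before:.4f}s, with index: {after:.4f}s")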

These days, whenever someone mentions performance as a reason for anything, I judge their competence based on their response to the question "And how are you measuring and monitoring it?"




‘Unsolvable performance problems’ that could have been fixed by adding an index??

How did these team members pass their job interviews?

By practicing algorithm puzzles?


Sadly, this occurs far more often than you might think. Large companies especially do it to themselves, given how they structure their teams, with developers and DBAs on different teams reporting to different managers (i.e. you can trace their respective management chains separately until just below the C level). These reporting structures are problematic because large companies tend to hire people with very narrow skillsets, which makes it vital that these groups work together. The end result is that you have teams throwing things over the wall to each other, not knowing or caring what is happening on the other side of the wall... dysfunction by design.


Funny story: at a, let's say, Fortune 50 tech company there was an investigation into why performance for a certain database query had become atrocious. The problem was that the query was against a table that started out quite small and then grew very large. And the database had been configured to use "query plan stability" to improve predictability of performance. However, the query plan that the db originally came up with for that nearly empty table was a full table scan, which was actually the fastest method under those conditions. Yet it continued to use a full table scan even as the table grew to many tens of millions of rows. It was a simple matter to switch the query to actually make use of the indexes that already existed.
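The database in that story presumably had a vendor-specific plan-stability feature, but the habit that catches this kind of thing is generic: look at the plan the database is actually using rather than assuming it. A tiny illustration (SQLite here only because it's self-contained; the table is made up):

  import sqlite3

  conn = sqlite3.connect(":memory:")
  conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, account_id INTEGER, payload TEXT)")
  conn.execute("CREATE INDEX idx_events_account ON events (account_id)")

  plan = conn.execute(
      "EXPLAIN QUERY PLAN SELECT * FROM events WHERE account_id = 123"
  ).fetchall()
  # Expect something like 'SEARCH events USING INDEX idx_events_account';
  # a stale or missing plan shows up as 'SCAN events' instead.
  print(plan)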


Well, knowing that you need to profile is one step. But then you have to know HOW to profile.

Many years ago, a team member got asked to figure out what the performance implications would be if a specific application were to be changed from PostgreSQL to MongoDB (Mongo was very early at the time). That's a very difficult question to answer in general, but the way he decided to approach the problem was: create two programs. One would ask to connect to PG, grab the current time, run a query (once!), grab the time again, disconnect. And that's it. The other program would do the same for Mongo.

His 'findings' confirmed what some people were already expecting: that MongoDB was magically faster than PostgreSQL. Which was odd, as both were running as a single instance with a simple data type that PG should have zero problems handling.

I pointed out that he was measuring connection time too, which is slow on PG. He said that wasn't the case, because he started the timer after connecting to PG. I replied with "No, you are starting the timer after asking to connect to PG, you don't know if the connection is made at that point. Run a simple query first to be sure." He wouldn't accept that.

It turned out that not only was that correct (the Mongo library would connect immediately, the PG one would defer until needed), but neither of the DBs had indexes defined. So none of the numbers made any sense: the tested query was not based on any observed production queries, it was only run once, etc.
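For what it's worth, a rough sketch of how I'd structure that kind of measurement (driver-agnostic on purpose; connect and run_query are placeholders for whatever client library you're testing), so that deferred connection setup and one-off noise don't end up in the numbers:

  import time
  import statistics

  def benchmark(connect, run_query, warmup_runs=5, timed_runs=100):
      conn = connect()
      # Warm-up: forces any lazily deferred connection setup and primes caches
      # before the clock starts.
      for _ in range(warmup_runs):
          run_query(conn)
      timings = []
      for _ in range(timed_runs):
          start = time.perf_counter()
          run_query(conn)
          timings.append(time.perf_counter() - start)
      return statistics.median(timings)

And even then, the query being timed should come from observed production traffic, which his didn't.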

I have not seen more meaningless 'performance' testing since. But the PowerPoint graphs looked pretty; had only management been in the room, they would likely have been convinced to migrate.

All that for a badly written application that a single box with SQLite would have zero problems handling.

On the other side, I had a coworker implement some simple but effective profiling of a Rails application. The slow path was traced to the caching code, specifically the code that generated the hash key. That was replaced with a better version and we got massive speedups.
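Not the Rails tooling from that story, but the same idea sketched in Python: profile the hot path, sort by cumulative time, and a hotspot like a naive cache-key generator shows up immediately (the functions below are made up for the example):

  import cProfile
  import pstats
  import hashlib

  def make_cache_key(params: dict) -> str:
      # Deliberately naive key generation -- the kind of hotspot profiling exposes.
      return hashlib.sha256(repr(sorted(params.items())).encode()).hexdigest()

  def handle_request():
      params = {f"param_{i}": i for i in range(50)}
      for _ in range(1_000):
          make_cache_key(params)

  profiler = cProfile.Profile()
  profiler.enable()
  handle_request()
  profiler.disable()
  pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)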

Alright, I think I'll add some questions on profiling in my current team's interview process.


How would you test a CRUD app with a DB?

The way I do it now is just to use some simple functional tests that GET/POST and measure the time in a pytest script; I'm sure there are better ways to do it.
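Roughly something like this, if it helps (the base URL and the /items endpoint are placeholders for whatever the app under test exposes, and the threshold is arbitrary):

  import time
  import requests

  BASE_URL = "http://localhost:8000"  # placeholder for the app under test

  def test_crud_roundtrip_is_fast_enough():
      start = time.perf_counter()
      for i in range(100):
          resp = requests.post(f"{BASE_URL}/items", json={"name": f"item-{i}"})
          assert resp.status_code in (200, 201)
          resp = requests.get(f"{BASE_URL}/items")
          assert resp.status_code == 200
      elapsed = time.perf_counter() - start
      # Crude threshold; a real setup would also look at per-query timings in the DB.
      assert elapsed < 5.0

The end-to-end time only tells you that something is slow, though; a profiler or the DB's slow-query log tells you where.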


The fact that you feared ORMs probably pushed you to use the raw database, and that experience really helps you understand how ORMs work.

Same is true with any abstraction – if you want to learn to use it right, first learn to make do without it.


Like so many things, it's useful to roll your own, not so you can use it, but so you can understand the complexities and trade-offs inherent in the problem space.

I've written a few ORMs before, and I can do it quickly with fairly succinct and concise code if needed. I still reach for the full ORM from the beginning, because swapping out that layer is painful, and you always want to do it sooner rather than later.


> Teams spending weeks exchanging a SQL DB for a NoSQL DB because of an unsolvable performance problem, only to hit the same problem with the NoSQL DB and find that adding a simple index is the solution in both cases

I don't get the idea that one has to pick either SQL or NoSQL, but a lot of people seem to think this way. Why not use both? The SQL portion can more or less be treated as a rich index of relationships, and the NoSQL portion can handle performance when it makes more sense for data to be embedded in parent documents.


Indeed, there is no reason to pick a "side". All big systems I have seen either have both or would benefit from both.

In fact, our current system has PG and several NoSQL databases, because each product has its own strengths.



