In my programming life I've barely used SQL databases so far, and there is something I've never understood about these recommendations. Maybe someone here can explain that to me.
I thought databases are designed to organize the data optimally internally and execute all operations on them as efficiently as possible. Why do you then need to organize databases so stringently? Doesn't that mean that there is something wrong with how the db works?
To me the advice always seemed similar to bit twiddling instead of relying on the compiler to optimize. Shouldn't the database lay out the data optimally internally?
I'm not here to say the author is right. S/he is making a lot of assumptions, and as you can see in this thread, everyone is saying it depends.
But one fact is that reducing redundancy in a database is a GOOD thing.
To make a comparison, it's like maintaining a large codebase where some constant (say, ERR_STATUS = 1) is used by functions scattered across various folders.
I wouldn't want to have to define it in every single file. You'd want it in a header, in a library, or in a file of shared constants somewhere; anywhere, as long as there is exactly one place that defines it.
A database is a bit like a dumping ground. Once the business gets big enough, a lot of other applications are going to plunge in and do things with the data. I wouldn't want to have inconsistent data. As a programmer, if you had inconsistent things in your source code, you could deal with it. But data is different: data is produced by users and interactions and so on. Once you begin to have inconsistent data, you can't trust it anymore until you find out exactly what caused it. That's a lot of wasted time, and it leads to a lot of problems.
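To make that concrete, here's a minimal sketch of what reducing redundancy buys you (the table and column names are made up for illustration): instead of copying a customer's email onto every order row, you keep one copy and reference it, so the copies can never drift apart.

    -- redundant: the email is repeated on every order and the copies can diverge
    CREATE TABLE orders_denormalized (
        order_id       INTEGER PRIMARY KEY,
        customer_email TEXT,
        amount         NUMERIC
    );

    -- normalized: one copy of the email, orders just reference the customer
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        email       TEXT NOT NULL
    );

    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        amount      NUMERIC
    );

Fix a typo in customers.email once and every order sees the correction, and the foreign key stops an order from pointing at a customer that doesn't exist.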
But I agree with your point on performance. However, by organising the data correctly, putting indexes on the things that matter, and joining in the right way, you're using business knowledge and programming knowledge to optimise the database. And that's not something it can do 100% on its own (as of now).
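For example (hypothetical tables again, just to show what "indexes on things that matter" means): the database can choose the join strategy by itself, but it can't guess which columns your application filters on all day long.

    -- business knowledge: we constantly look up orders by customer
    CREATE INDEX idx_orders_customer_id ON orders (customer_id);

    -- the planner decides how to execute this; the index just gives it a fast option
    SELECT c.email, o.amount
    FROM   orders o
    JOIN   customers c ON c.customer_id = o.customer_id
    WHERE  o.customer_id = 42;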
The schema establishes the contract between you the user and the database.
You have to help the database so that it can help you.
The information you are giving it is mostly things that it can't reliably know or decide on your behalf (such as whether you want to store two things together as one thing or not, like first name and last name).
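A toy illustration of that kind of decision (made-up table names): both schemas are perfectly valid, and only you know which one your application needs, e.g. whether you will ever sort or search by last name.

    -- one thing: fine if you only ever display the name as given
    CREATE TABLE people_v1 (
        person_id INTEGER PRIMARY KEY,
        full_name TEXT
    );

    -- two things: needed if you sort, search or greet people by last name
    CREATE TABLE people_v2 (
        person_id  INTEGER PRIMARY KEY,
        first_name TEXT,
        last_name  TEXT
    );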
The database still does a ton of stuff for you that you don't have to think about. SQL itself is, by definition, like that: you declare what you want to get out, and the database figures out how to do it for you (this part is similar to the compiler optimisation in your example).
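As a small sketch of what "declare what you want" means (hypothetical query): you never say which index to use, in what order to read the tables, or which join algorithm to run; the planner decides all of that, and you can peek at its decision with EXPLAIN.

    -- you state the result you want...
    SELECT c.email, COUNT(*) AS order_count
    FROM   customers c
    JOIN   orders o ON o.customer_id = c.customer_id
    GROUP  BY c.email;

    -- ...and the database shows you the execution plan it chose on its own
    EXPLAIN
    SELECT c.email, COUNT(*) AS order_count
    FROM   customers c
    JOIN   orders o ON o.customer_id = c.customer_id
    GROUP  BY c.email;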
Going by your example, not having a schema is similar to saying "why do I have to write code, can't the compiler just write the code for me?".
In both cases, yes, the compiler or the database does the magic for you, but at the same time it can't work without your input and decisions (writing the code, or the schema).
The DB will work just fine regardless of how you organize your data; some queries might just not be possible, or not efficient. A well-designed database will easily serve many different types of applications at the same time.
The power of databases lies in their ability to easily transform or restructure the same source data in many different ways. The catch is that the data has to be arranged in a specific way (normalization) for this to happen. Normalization is actually based on math and set theory.
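A small illustration of "the same source data in many different ways" (again with made-up tables): once customers and orders are normalized, each consumer of the data can reshape them however it likes without copying anything.

    -- one view for finance: revenue per customer
    CREATE VIEW revenue_per_customer AS
    SELECT c.customer_id, c.email, SUM(o.amount) AS total_revenue
    FROM   customers c
    JOIN   orders o ON o.customer_id = c.customer_id
    GROUP  BY c.customer_id, c.email;

    -- another view for support: every order with the customer's email attached
    CREATE VIEW orders_with_email AS
    SELECT o.order_id, o.amount, c.email
    FROM   orders o
    JOIN   customers c ON c.customer_id = o.customer_id;

Both are just rearrangements of the same two tables; nothing is duplicated, so they can never disagree with each other.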