NULL is one of the most powerful features of SQL. It simply means a lack of data. This means it is neither equal nor not equal to anything else (including NULL). NULL is not a value; it's a state of non-existence.
So, for example, if you were providing a survey with an optional question with a yes/no answer. NULL would mean "no answer", false would mean "no", and true would mean "yes". Storing the "no answer" as a false would be incorrect since they did not answer the question.
It could also happen if you were adding a new column. Existing rows do not have data and would deserve a NULL unless you had a deterministic way to fill in a true or false value.
> NULL is one of the most powerful features of SQL.
I'd argue that NULL is, from a logical perspective, the single most broken feature of SQL.
> It simply means a lack of data.
The semantics of NULLs are less straightforward than that, and have a poor relationship to how SQL actually treats them. Every table with one or more nullable columns really should be a table with all the non nullable columns, plus additional table with each combination of columns that would never be missing together, each of which has a foreign key relationship back to the first table.
That's for the simple case, where the semantics of missing data are always consistent for any set of columns; in real-world databases there are often more than one reason data that can be missing might be missing, and those different reasons (because they are different classes of fact), for any given column or set of columns to which they apply, each call for another table with a foreign key reference to the table containing only the mandatory columns.
> So, for example, if you were providing a survey with an optional question with a yes/no answer. NULL would mean "no answer", false would mean "no", and true would mean "yes". Storing the "no answer" as a false would be incorrect since they did not answer the question.
Sure, storing it as one table with all the questions as columns and storing the "no" answer when the answer was missing would be an error. If all the questions aren't required for the survey to be valid, then -- from a logical perspective -- the problem is presenting the whole thing as a single relation in the first place. Its a set of relations, that share a key (but not necessarily all values of the key.)
And how much overhead in logic, code and frustration would that cause in terms of development and support.. right now, I'm dealing with an over-normalized database close to what you are describing and needing over 20 joins in a single query to get a complete record for display (not including actual sub-records) but to get a complete set of properties, where null means not there...
> And how much overhead in logic, code and frustration would that cause in terms of development and support..
Depends on the competencies of the people doing dev and support. Personally -- both as a developer and a technical user -- I've had more problems dealing with situations where NULLS had ambiguous semantics, where the typical naive use of nullable columns instead of normalization into logical units of data that must all be present or absent together resulted in avoidable data inconsistencies, etc., than I've ever had with overnormalized tables.
Joins for queries are a solve-once development problem; data inconsistencies and ambiguities resulting from the problems with NULL are an ongoing problem.
So, for example, if you were providing a survey with an optional question with a yes/no answer. NULL would mean "no answer", false would mean "no", and true would mean "yes". Storing the "no answer" as a false would be incorrect since they did not answer the question.
It could also happen if you were adding a new column. Existing rows do not have data and would deserve a NULL unless you had a deterministic way to fill in a true or false value.