> If two relations have identical designs, the difference can only be encoded in the relation (and possibly attribute) names, in violation of the IP [information principle]. As such, it is inaccessible to the DBMS, which cannot rely on the RPs [relation predicates] to accept/reject tuples as correct/incorrect. Users--with little, or no help from the DBMS--must ensure that the correct tuples are inserted in the proper relation. Moreover, because information is not represented explicitly (as data values), relational operations lose it: if you UNION the two relations, you get users that either viewed or downloaded items, or both.
The author appears to be suggesting that the name of a relation has nothing to do with the meaning of a tuple in it. I've read dozens of books and papers on database design and I have never come across anyone ever suggesting any such thing.
He also seems to be saying:
(1) The name of a relation is inaccessible to the DBMS.
(2) You should be able to throw tuples at a DBMS and it should just figure out where they go.
(3) If there are tables A and B in your database and you can't reconstitute A and B separately from (A UNION B), then there's something wrong with your database design.
> The author appears to be suggesting that the name of a relation has nothing to do with the meaning of a tuple in it.
The name of a relation has to be a good mnemonic for human use, but ultimately, the real meaning of a relation is what the DBMS can enforce about its tuples, just like the meaning of a type is what the type system can enforce about its inhabitants. You can't prove much about how humans will actually use a relation (or a type), since that depends on their whims, but you can prove useful things about the limits of what humans can do with a relation (or type), even in principle. And the purpose of a good database schema (or typeful design) is to make nonsensical things impossible, even in principle.
>And the purpose of a good database schema (or typeful design) is to make nonsensical things impossible, even in principle.
Can you elaborate on what is the nonsensical thing possible with two tables of the same type?
To me the (A UNION B) example makes very much sense exactly in the case that you want all x s.t. (x is in A) OR (x is in B). With differently named id columns you might have to write this as (excuse my pseudo code):
(RENAME(A, a_id TO a_or_b_id) UNION RENAME(B, b_id TO a_or_b_id))
which is indeed more explicit. Having written that I might see the rub here. In the article the column "item_id" is kind of a "view_or_download_id" and which it is is only specified the namespace (i.e. the name of the table ITEMS or DOWNLOADS the tuple belongs to). In an OO world this would be the difference between:
class Person { public string Name { get; set; } }
...
var employees = new List<Person>(...);
var managers = new List<Person>(...);
and the version where employee and manager types are incompatible:
class Person { public string Name { get; set; } }
class Employee : Person {}
class Manager : Person {}
...
var employees = new List<Employee>(...);
var managers = new List<Manager>(...);
The latter would indeed prevent e.g. employees.Concat(managers) on accident. However, in many programming scenarios the former would be preferred for the increased code reusability. Is this not an objective in database design (with e.g. stored procedures that could be applied to more than one set of tables)?
> Can you elaborate on what is the nonsensical thing possible with two tables of the same type?
There's nothing intrinsically wrong with it a priori. But see below.
> To me the (A UNION B) example makes very much sense exactly in the case that you want all x s.t. (x is in A) OR (x is in B).
To me, the very idea of making a collection whose elements have different logical types is completely nonsensical. I consider it a weakness of SQL that I can form the relation:
select customer_id as id, customer_name as name
from customer
union
select employee_id as id, employee_name as name
from employee
The attributes customer_id and employee_id should have different abstract types, so that they can't be accidentally conflated with one another, even if their internal representation is the same (say, an autoincremental int). Then, if you want to make a derived relation containing both customers and employers, you should be forced to use a sum type (à la Haskell's Either) as the primary key.
> In an OO world this would be the difference between: (snippet)
Object-orientation is simply the wrong paradigm for building data-oriented applications, at least if you care about automatically enforced data integrity and declarative data manipulation (which you should): https://news.ycombinator.com/item?id=11861520
For what it's worth, I seldom use dynamic SQL in stored procedures, and then seldom for the purpose of changing the names of the objects used; usually it is to vary the criteria applied to a query.
No, the real meaning of a relation is the intended interpretation it was given when it was designed (and which is also the criterion by which the database user decides whether a tuple should be present or not, thus by which the database user decides which updates to carry through).
The DBMS doesn't read your mind before interpreting relations. It interprets relations according to its own data integrity enforcement rules.
Most programmers are used to expecting language constructs to mean what a human would mean (“I called my class `Customer`, so of course its instances are customers, right?”), but the formal semantics of the language in question seldom, if ever, justifies this. If you want to establish that your code means what you mean, you have to prove it.
> The DBMS doesn't read your mind before interpreting relations.
The DBMS doesn't interpret relations at all. It performs relational operations on them; humans interpret them (often using other software as a tool to present an interpretation.)
> Most programmers are used to expecting language constructs to mean what a human would mean (“I called my class `Customer`, so of course its instances are customers, right?”), but the formal semantics of the language in question seldom, if ever, justifies this.
Which "language in question". If you mean the DB query language, that's true. OTOH, in a sense, most programming involves creating specialized languages (often layered on top of existing languages) with their own (in the layered case, additional) semantics. In the constructed language of a particular library or database, that the instances of a Customer class or members of a Customer relation are representations of customers is part of the semantics of that language, even if it isn't part of the semantics of the host language.
> The DBMS doesn't interpret relations at all. It performs relational operations on them; humans interpret them (often using other software as a tool to present an interpretation.)
A database schema is a multisorted axiomatic system (each base relation is its own sort), of which the database's state is a model (in the logician's sense: a concrete interpretation of a theory). Unlike the interpretation in your head, whose only justification is your personal wishes, the DBMS' interpretation is backed by a concrete formal system: the language in which your database schema is expressed.
> In the constructed language of a particular library or database, that the instances of a Customer class or members of a Customer relation are representations of customers is part of the semantics of that language, even if it isn't part of the semantics of the host language.
This is only true if you can actually prove that your database or program's “Customers” actually behave like customers in your problem domain.
The DBMS doesn't read your mind before interpreting relations.
Which is why the real interpretation happens outside the DBMS.
And which is also why the DBMS needs "help" from its users when deciding whether TUPLE{} represents "The shop is closed" and should go in THE_SHOP_IS_CLOSED rather than "The alarm is set" and should go in THE_ALARM_IS_SET.
It interprets relations according to its own data integrity enforcement rules.
If this is your way of saying that the only thing a DBMS does (and can ever be expected to do) is algebraic computation (e.g. to verify constraint satisfaction of a given update to a given existing db state), completely devoid of any form of "interpretation" in the sense of "human interpretation", I agree. Hell, it's why those things were called "computers". They compute. Nothing more.
> Which is why the real interpretation happens outside the DBMS.
Unfortunately, computer programs mean what they actually do, not what you wish they did. The field of formal semantics of programming languages exists to study the real meaning of programming languages, not the “intended” one, whatever the latter might be.
> If this is your way of saying that the only thing a DBMS does (and can ever be expected to do) is algebraic computation (e.g. to verify constraint satisfaction of a given update to a given existing db state)
The programmer is expected to make the DBMS enforce business rules. A database schema is a model (in the informal, everyday sense of the word) of the real world to the extent the real world's rules are enforced constraints in the schema. Otherwise, it's just wishful thinking.
Placing this in its context of things said previously, "The DBMS doesn't read your mind before interpreting relations.", "Which is why the real interpretation happens outside the DBMS.", "Unfortunately, computer programs mean what they actually do", it is entirely irrelevant.
The meaning of what a DDL script does is that it asserts/ensures the (non-)existence of database objects in the database. This has absolutely nothing to do with the intended interpretation that the designer attaches to those of the database objects that are base tables/relations/relvars and derived ones in his design.
> The meaning of what a DDL script does is that it asserts/ensures the (non-)existence of database objects in the database.
The DDL also defines what database objects mean. It's the programmer's job to make sure database objects actually mean what the specification says they should mean. Calling it a “customer” isn't enough: it has to have the same attributes, as well as obey the same constraints, as a customer in your problem domain. If your DBMS can't enforce this, it's not a customer, no matter what you call it.
> This has absolutely nothing to do with the intended interpretation that the designer attaches to those of the database objects that are base tables/relations/relvars and derived ones in his design.
If your DBMS's DDL is sufficiently expressive, you can arrange things so that the behavior of your database objects matches your intended interpretation. (And, if it isn't sufficiently expressive, you should switch DBMSes.) This is the whole point to studying database models.
Unless it is possible for a RDBMS to make that decision instead of relying on the users as we know them to be. Which it is, but just because people are working in the db field without an ounce of introduction to logic is not good enough reason to defend reliance on user discipline.
The author appears to be suggesting that the name of a relation has nothing to do with the meaning of a tuple in it. I've read dozens of books and papers on database design and I have never come across anyone ever suggesting any such thing.
He also seems to be saying:
(1) The name of a relation is inaccessible to the DBMS.
(2) You should be able to throw tuples at a DBMS and it should just figure out where they go.
(3) If there are tables A and B in your database and you can't reconstitute A and B separately from (A UNION B), then there's something wrong with your database design.