Database Design: Relation Predicates and “Identical Relations”

gnud · on Sept 6, 2016

Wow, this was hard to read. Not because it was very complex, just because of the incomplete sentences and unessescary use of abbreviations.

In the other post [1], linked from this one, he nit-picks because the question uses the terms "tables" and "fields" instead of "relations" and "attributes".

I'm sure the author knows more about relational modelling than me, but I'm not sure I would enjoy asking him for advice.

1: http://www.dbdebunk.com/2012/09/NormalOrthogDBDesign_3.html

sgeneris · on Sept 7, 2016

> Wow, this was hard to read.

Well maybe the reader had something to do with it.

mhluongo · on Sept 7, 2016

Are you trying to communicate, or be right on the internet? If you're trying to communicate, readability matters.

Retra · on Sept 7, 2016

I think the point there is that if you're thinking of a database as a collection of fields and tables, then there's a chance you might be failing to understand why we use relational databases in the first place, rather than just... tables of data which are indexed as fields.

Anyway, in my experience, when someone is insisting on some specific terminology, they tend to have a good reason to do so (at least to the end of highlighting the specific confusions that lead to avoiding yours), and the mental hoops you have to jump through to figure out why their terminology is better are the kinds of things one needs to do to learn.

> I'm not sure I would enjoy asking him for advice.

I'm not sure you should be optimizing for your own enjoyment when it comes to asking people for advice.

Chris2048 · on Sept 7, 2016

> I'm not sure you should be optimizing for your own enjoyment when it comes to asking people for advice.

Who talked of optimisation? If you can deliver the same information without the pain, why not do so?

garyclarke27 · on Sept 6, 2016

Article is poorly written, long winded, nonsense. I am happy for Relation names to distinguish entities. He is basically arguing that this violates some sacred principle and that all meaning should be available from column names, so that relations names can be ignored. I don't think that even Chris Date who is a zealous RM fanatic would argue this - his books are actually quite good much more logical than Fabian Pascall. I'm also sure that Joe Celko would not agree with Fabian on this.

kahirsch · on Sept 6, 2016

> If two relations have identical designs, the difference can only be encoded in the relation (and possibly attribute) names, in violation of the IP [information principle]. As such, it is inaccessible to the DBMS, which cannot rely on the RPs [relation predicates] to accept/reject tuples as correct/incorrect. Users--with little, or no help from the DBMS--must ensure that the correct tuples are inserted in the proper relation. Moreover, because information is not represented explicitly (as data values), relational operations lose it: if you UNION the two relations, you get users that either viewed or downloaded items, or both.

The author appears to be suggesting that the name of a relation has nothing to do with the meaning of a tuple in it. I've read dozens of books and papers on database design and I have never come across anyone ever suggesting any such thing.

He also seems to be saying:

(1) The name of a relation is inaccessible to the DBMS.

(2) You should be able to throw tuples at a DBMS and it should just figure out where they go.

(3) If there are tables A and B in your database and you can't reconstitute A and B separately from (A UNION B), then there's something wrong with your database design.

catnaroek · on Sept 6, 2016

> The author appears to be suggesting that the name of a relation has nothing to do with the meaning of a tuple in it.

The name of a relation has to be a good mnemonic for human use, but ultimately, the real meaning of a relation is what the DBMS can enforce about its tuples, just like the meaning of a type is what the type system can enforce about its inhabitants. You can't prove much about how humans will actually use a relation (or a type), since that depends on their whims, but you can prove useful things about the limits of what humans can do with a relation (or type), even in principle. And the purpose of a good database schema (or typeful design) is to make nonsensical things impossible, even in principle.

ulber · on Sept 6, 2016

>And the purpose of a good database schema (or typeful design) is to make nonsensical things impossible, even in principle.

Can you elaborate on what is the nonsensical thing possible with two tables of the same type?

To me the (A UNION B) example makes very much sense exactly in the case that you want all x s.t. (x is in A) OR (x is in B). With differently named id columns you might have to write this as (excuse my pseudo code):

  (RENAME(A, a_id TO a_or_b_id) UNION RENAME(B, b_id TO a_or_b_id))

which is indeed more explicit. Having written that I might see the rub here. In the article the column "item_id" is kind of a "view_or_download_id" and which it is is only specified the namespace (i.e. the name of the table ITEMS or DOWNLOADS the tuple belongs to). In an OO world this would be the difference between:

  class Person { public string Name { get; set; } }
  ...
  var employees = new List<Person>(...);
  var managers = new List<Person>(...);

and the version where employee and manager types are incompatible:

  class Person { public string Name { get; set; } }
  class Employee : Person {}
  class Manager : Person {}
  ...
  var employees = new List<Employee>(...);
  var managers = new List<Manager>(...);

The latter would indeed prevent e.g. employees.Concat(managers) on accident. However, in many programming scenarios the former would be preferred for the increased code reusability. Is this not an objective in database design (with e.g. stored procedures that could be applied to more than one set of tables)?

catnaroek · on Sept 7, 2016

> Can you elaborate on what is the nonsensical thing possible with two tables of the same type?

There's nothing intrinsically wrong with it a priori. But see below.

> To me the (A UNION B) example makes very much sense exactly in the case that you want all x s.t. (x is in A) OR (x is in B).

To me, the very idea of making a collection whose elements have different logical types is completely nonsensical. I consider it a weakness of SQL that I can form the relation:

    select customer_id as id, customer_name as name
    from customer
    union
    select employee_id as id, employee_name as name
    from employee

The attributes customer_id and employee_id should have different abstract types, so that they can't be accidentally conflated with one another, even if their internal representation is the same (say, an autoincremental int). Then, if you want to make a derived relation containing both customers and employers, you should be forced to use a sum type (à la Haskell's Either) as the primary key.

> In an OO world this would be the difference between: (snippet)

Object-orientation is simply the wrong paradigm for building data-oriented applications, at least if you care about automatically enforced data integrity and declarative data manipulation (which you should): https://news.ycombinator.com/item?id=11861520

cafard · on Sept 7, 2016

For what it's worth, I seldom use dynamic SQL in stored procedures, and then seldom for the purpose of changing the names of the objects used; usually it is to vary the criteria applied to a query.

ErwinSmout · on Sept 7, 2016

No, the real meaning of a relation is the intended interpretation it was given when it was designed (and which is also the criterion by which the database user decides whether a tuple should be present or not, thus by which the database user decides which updates to carry through).

catnaroek · on Sept 7, 2016

> intended interpretation

The DBMS doesn't read your mind before interpreting relations. It interprets relations according to its own data integrity enforcement rules.

Most programmers are used to expecting language constructs to mean what a human would mean (“I called my class `Customer`, so of course its instances are customers, right?”), but the formal semantics of the language in question seldom, if ever, justifies this. If you want to establish that your code means what you mean, you have to prove it.

For more information: https://en.wikipedia.org/wiki/Semantics_(computer_science)

dragonwriter · on Sept 7, 2016

> The DBMS doesn't read your mind before interpreting relations.

The DBMS doesn't interpret relations at all. It performs relational operations on them; humans interpret them (often using other software as a tool to present an interpretation.)

> Most programmers are used to expecting language constructs to mean what a human would mean (“I called my class `Customer`, so of course its instances are customers, right?”), but the formal semantics of the language in question seldom, if ever, justifies this.

Which "language in question". If you mean the DB query language, that's true. OTOH, in a sense, most programming involves creating specialized languages (often layered on top of existing languages) with their own (in the layered case, additional) semantics. In the constructed language of a particular library or database, that the instances of a Customer class or members of a Customer relation are representations of customers is part of the semantics of that language, even if it isn't part of the semantics of the host language.

catnaroek · on Sept 7, 2016

> The DBMS doesn't interpret relations at all. It performs relational operations on them; humans interpret them (often using other software as a tool to present an interpretation.)

A database schema is a multisorted axiomatic system (each base relation is its own sort), of which the database's state is a model (in the logician's sense: a concrete interpretation of a theory). Unlike the interpretation in your head, whose only justification is your personal wishes, the DBMS' interpretation is backed by a concrete formal system: the language in which your database schema is expressed.

> In the constructed language of a particular library or database, that the instances of a Customer class or members of a Customer relation are representations of customers is part of the semantics of that language, even if it isn't part of the semantics of the host language.

This is only true if you can actually prove that your database or program's “Customers” actually behave like customers in your problem domain.

ErwinSmout · on Sept 8, 2016

The DBMS doesn't read your mind before interpreting relations.

Which is why the real interpretation happens outside the DBMS.

And which is also why the DBMS needs "help" from its users when deciding whether TUPLE{} represents "The shop is closed" and should go in THE_SHOP_IS_CLOSED rather than "The alarm is set" and should go in THE_ALARM_IS_SET.

It interprets relations according to its own data integrity enforcement rules.

If this is your way of saying that the only thing a DBMS does (and can ever be expected to do) is algebraic computation (e.g. to verify constraint satisfaction of a given update to a given existing db state), completely devoid of any form of "interpretation" in the sense of "human interpretation", I agree. Hell, it's why those things were called "computers". They compute. Nothing more.

catnaroek · on Sept 8, 2016

> Which is why the real interpretation happens outside the DBMS.

Unfortunately, computer programs mean what they actually do, not what you wish they did. The field of formal semantics of programming languages exists to study the real meaning of programming languages, not the “intended” one, whatever the latter might be.

> If this is your way of saying that the only thing a DBMS does (and can ever be expected to do) is algebraic computation (e.g. to verify constraint satisfaction of a given update to a given existing db state)

The programmer is expected to make the DBMS enforce business rules. A database schema is a model (in the informal, everyday sense of the word) of the real world to the extent the real world's rules are enforced constraints in the schema. Otherwise, it's just wishful thinking.

ErwinSmout · on Sept 8, 2016

I thought it was the database designer. But OK, maybe that's just semantics and I'm not very good at those.

catnaroek · on Sept 8, 2016

The database designer is a programmer, and the DBMS's data definition language is a programming language.

ErwinSmout · on Sept 9, 2016

Placing this in its context of things said previously, "The DBMS doesn't read your mind before interpreting relations.", "Which is why the real interpretation happens outside the DBMS.", "Unfortunately, computer programs mean what they actually do", it is entirely irrelevant.

The meaning of what a DDL script does is that it asserts/ensures the (non-)existence of database objects in the database. This has absolutely nothing to do with the intended interpretation that the designer attaches to those of the database objects that are base tables/relations/relvars and derived ones in his design.

catnaroek · on Sept 9, 2016

> The meaning of what a DDL script does is that it asserts/ensures the (non-)existence of database objects in the database.

The DDL also defines what database objects mean. It's the programmer's job to make sure database objects actually mean what the specification says they should mean. Calling it a “customer” isn't enough: it has to have the same attributes, as well as obey the same constraints, as a customer in your problem domain. If your DBMS can't enforce this, it's not a customer, no matter what you call it.

> This has absolutely nothing to do with the intended interpretation that the designer attaches to those of the database objects that are base tables/relations/relvars and derived ones in his design.

If your DBMS's DDL is sufficiently expressive, you can arrange things so that the behavior of your database objects matches your intended interpretation. (And, if it isn't sufficiently expressive, you should switch DBMSes.) This is the whole point to studying database models.

ErwinSmout · on Sept 9, 2016

My DBMS is sufficiently expressive, thank you.

sgeneris · on Sept 7, 2016

Unless it is possible for a RDBMS to make that decision instead of relying on the users as we know them to be. Which it is, but just because people are working in the db field without an ounce of introduction to logic is not good enough reason to defend reliance on user discipline.

combatentropy · on Sept 7, 2016

Hey, I can talk like that too.

  A mental construct has occurred to my consciousness
  subsequent to the exposure of said mind to myriad
  graphological artifacts (GE's) conjecting observations
  that are encapsulated in a word structure (WS) of such
  syllabic cornucopia that the reliable transmission of
  said observation to the recipients takes on a high
  degree of uncertainty.

Now it's your fault if you don't understand it.

sgeneris · on Sept 7, 2016

I wouldn't expose your inability to understand if I were you.

combatentropy · on Sept 7, 2016

Why is that?

burnstek · on Sept 6, 2016

I would say the biggest decision point here is how either entity might evolve. If these are truly different domain entities then one may end up with different attributes than the other in the future, meaning these should definitely be different tables. They just happen to have the same set of fields at this point in time.

dragonwriter · on Sept 7, 2016

Wow, for a site that claims to be "database fundamentals made accessible", that's a fairly inaccessible, overwritten piece.

I think the fundamental argument it is making has some value; if 2 or more tables [0] are structurally different only in the name of the table, then the facts in those tables are almost certainly related instances of some general common supertype, such that there should be one table, with an additional field disambiguating the subtype (and which probably a foreign key into a new table specifying the valid subtypes, which have a 1:1 relationship to the tables in the "bad" schema.)

OTOH, in lots of practical applications no application would ever care about that "ideal" base table, and all interaction would be through views that exactly correspond to the tables in the "bad" design. Outside of a DB serving as an ideal, application-independent store (which is an important role, and even in single-app DBs designing for this can have advantages in dealing with growth and change and unexpected future uses), the value of this ideal transformation may be minimal.

[0] base relations, if you prefer.

cafard · on Sept 6, 2016

Fabian Pascal tends to write as if the point of database work were to construct perfect ontologies. Perhaps we would be better off if we did that, perhaps not. In this case, perhaps it would be a cleaner design to have a single table with ID, USER_ID, FIRST_VIEWED, FIRST_DOWNLOADED. But would I lose sleep over this? No.

catnaroek · on Sept 6, 2016

> Fabian Pascal tends to write as if the point of database work were to construct perfect ontologies.

Wait, is it not?

makebelieve · on Sept 6, 2016

I found this approach deeply troubling because it moves us away form semantic design towards logic design... which always runs into problems when the database itself becomes semantic content. The write up was rather confusing.

The solution seems straightforward. A single table that captures the meaning expressed by the separate VIEWS and DOWNLOADS tables. eg. USERACTION (USER_ID, ITEM_ID, ACTIONTYPE) where ACTIONTYPE is a value like V for view and D for download. Of course, that solution is hard to see because it's a synthesis of meanings occurring at different levels and not the product of predicate logic.

sgeneris · on Sept 7, 2016

Database design IS logic. That's the point of the RDM: to formalize and symbolize semantics such that the DBMS can enforce integrity on and manipulate data, such that logical and semantic correctness is guaranteed. Leave that to users in apps at your peril. We used to do this before the RDM and the whole shabang collapsed. And we're still doing it because practitioners know nothing beyond SQL and coding.

dragonwriter · on Sept 7, 2016

> We used to do this before the RDM and the whole shabang collapsed.

No, it didn't, and relational-theory-purists aren't going to sell their ideas to practitioners in the real world by pretending that it did. The RDM certainly offers all kinds of abstract benefits, which practitioners often do not fully understand or leverage, and there is a very real problem when the not fully leveraging is due to not fully understanding (rather than weighing practical costs and benefits in the particular use case.)

OTOH, the reason that things built on the relational model took off in practice wasn't that non-relational systems had reached a point of catastrophic logical failure that led to their rejection, but because the relational model had a convenient mapping to implementations that were convenient and efficient in the technology of the day (particularly, hard disk storage), combined with some of the structural improvements over other approaches being particularly attractive for important application domains.

> And we're still doing it because practitioners know nothing beyond SQL and coding.

Yeah, look, we're probably never going to have a time when most practitioners are deep theoreticians rather than expert tool users, and if you want to sell practitioners on deeper consideration of the underlying theoretical models, you're going to need to make explanations of the practical benefits much more accessible than you have in the source article or your comments in this thread (and you're going to need to be a lot less personally abusive.)

sgeneris · on Sept 7, 2016

Right, there is never time to do it right and lots of time to do it over.

I do not think that your reading of history of the field is anywhere close to reality. I do suggest that you read as carefully my comments as I write them: I did not say practitioners ought to be theoreticians, I said they should no engage in a field founded on logic without ANY intro to logic. Big difference.

In fact, the initial mapping to implementation--direct image SQL implementations--was not in the relational spirit at all and is in large part responsible to logical-physical confusion and confusion of tables with relations. And to call those initial implementations efficient in the technology of the day is from another planet. IBM would not budge implementing the RDM until Oracle forced it. It just so happened that even the limited relational fidelity of SQL proved superior to the rigidity, complexity and lack of soundness of hierarchic and network technologies. Have you ever seen IMS or Codasyl code?

Listen, have done nothing but exactly making the practical implications of the theory for the last 40 years. I suggest you read my stuff and tell me exactly what is wrong with it. The problem is lack of fundamental education which has been replaced by tool training. Practitioners are not even aware that there is something beyond experience with tools that they need to know.

So pls.

dragonwriter · on Sept 7, 2016

> Right, there is never time to do it right and lots of time to do it over.

Purists like to snear when they say this, but in point of fact its often true: its often more efficient to do things good enough for now and fix the things that turn out to need to fixing later (because the real pace of change often means the things that are broken-in-theory, but good-in-enough-in-practice are going to need completely replaced because of requirements changes before they become problematic in practice.)

But its true, OTOH, that lack of knowledge of relational theory, the anomalies that it identifies that are tied to improper data models, and the practical impact of these can lead to poor analysis of the tradeoffs, and that the common cargo-cult rules of thumb (say, for degree of normalization to pursue) that are frequently used in practice are poor substitutes for deep understanding of the relational model and the problem of concern, and analysis of the real risks in the system under design of taking shortcuts.

> I do suggest that you read as carefully my comments as I write them

One of the biggest problems with your comments is that they don't appear to be written carefully -- particularly, if you hope to influence the practitioners that you treat with such condescension, you have failed to put due care into consideration of your approach to the audience, which is a central element of any communication.

> I did not say practitioners ought to be theoreticians, I said they should no engage in a field founded on logic without ANY intro to logic.

You seem to also, however, keep suggesting that either a lack of deep familiarity with relational theory or a disagreement with your interpretation of how that theory ought to shape practice are equivalent to (or can only be a result of) a complete lack of grounding in logic. Whether you are actually conflating these things or just engaging in particularly obnoxious condescension and personal abuse when you do this is less than clear, but neither is helpful or useful.

> In fact, the initial mapping to implementation--direct image SQL implementations--was not in the relational spirit at all and is in large part responsible to logical-physical confusion and confusion of tables with relations.

I'm not sure what you mean by the "relational spirit". SQL's design was clearly shaped by the relational data model, though SQL itself (in its current form as well as its early forms) is certainly not ideal from a relational perspective, even before considering the whole NULL controversy.

> And to call those initial implementations efficient in the technology of the day is from another planet.

The initial implementations weren't what became popular though; the implementations that were efficient were key to driving popularity.

> It just so happened that even the limited relational fidelity of SQL proved superior to the rigidity, complexity and lack of soundness of hierarchic and network technologies.

Sure. I just think that its easy for theory-purists to overstate the degree to which the "lack of soundness" was the problem driving adoption, rather than "rigidity and complexity". Insofar as the linguistic and expressive features of SQL and the relational model proved attractive, simplicity and flexibility were particularly important, and while the capacity for soundness is an important improvement, its one that's been underused since day one. Its simply not the case, as you seem to present, that we've "fallen" to a state where that is ignored from some rosier days when that aspect of RDBMS capacity was strongly embraced and effectively and rigorously used by practitioners.

> Have you ever seen IMS or Codasyl code?

I've even had to write (well, modify) some IMS code, far more recently than I'd prefer to have.

> The problem is lack of fundamental education which has been replaced by tool training.

I don't think that's true. I think that the number of people with "fundamental education" in the field is probably greater than ever before. Sure, the number of people with tool training has increased faster, but that's not tool training replacing fundamental education, its just that with any technology, the first generation of users will all (or, at least, disproportionately) be versed in the underlying principles because they are also the builders of the technology, but over time that's going to fade as, even with more people educated in the principles, people who are just pragmatic users of the technology with a more limited focus are going to grow at a faster rate.

> Practitioners are not even aware that there is something beyond experience with tools that they need to know.

Most practitioners I've encountered seem to be aware that relational theory exists. Sure, lots of them aren't well versed in it or what light it has to shine on their craft, but abstruse descriptions without clearly explained pragmatic benefits aren't an effective way to correct that and motivate them to dig more into theory, and neither is condescension and abuse.

sgeneris · on Sept 9, 2016

> Purists like to snear when they say this, but in point of fact its often true: its often more efficient to do things good enough for now and fix the things that turn out to need to fixing later (because the real pace of change often means the things that are broken-in-theory, but good-in-enough-in-practice are going to need completely replaced because of requirements changes before they become problematic in practice.)

Do you have stats that prove your point, or is it based on the fact that this is how it is usually done because that is what is possible given the poor level of education and knowledge in the industry? If you're not part of that, don't underestimate its size. I spent 40+ years demonstrating the ignorance and its consequences.

By the way what does "impure theory" mean?

> But its true, OTOH, that lack of knowledge of relational theory, the anomalies that it identifies that are tied to improper data models, and the practical impact of these can lead to poor analysis of the tradeoffs, and that the common cargo-cult rules of thumb (say, for degree of normalization to pursue) that are frequently used in practice are poor substitutes for deep understanding of the relational model and the problem of concern, and analysis of the real risks in the system under design of taking shortcuts.

> Glad we agree on something. My claim is that there's more to that than you seem to think.

> One of the biggest problems with your comments is that they don't appear to be written carefully a-- particularly, if you hope to influence the practitioners that you treat with such condescension, you have failed to put due care into consideration of your approach to the audience, which is a central element of any communication.

The title of one of my books is "for the THINKING practitioner". He is the one I try to influence and aren't that many. It's not entirely their fault--it's how the industry and business in general operates. I can detect very easily the difference between a thinker who is uninformed and a non-thinker and I treat them differently. It's just that there's many more of the latter than the former.

> You seem to also, however, keep suggesting that either a lack of deep familiarity with relational theory or a disagreement with your interpretation of how that theory ought to shape practice are equivalent to (or can only be a result of) a complete lack of grounding in logic. Whether you are actually conflating these things or just engaging in particularly obnoxious condescension and personal abuse when you do this is less than clear, but neither is helpful or useful.

So according to you it's not possible to detect the difference between a poor argument due to ignorance of logic and one grounded in logic? Again, I have spent 4 decades doing this and pls permit to believe that I discern quite readily who should be treated with respect and who not. You're entitled to disagree.

> I'm not sure what you mean by the "relational spirit". SQL's design was clearly shaped by the relational data model, though SQL itself (in its current form as well as its early forms) is certainly not ideal from a relational perspective, even before considering the whole NULL controversy.

The authors of SQL did not have a good grasp of the RDM, which is why SQL cannot be considered truly relational. It violates too many rel. principles. This according to Codd and Date who were both at IBM when SQL was developed. The specific spirit I was referring to is physical independence, a core objective of the RDM which a direct image implementation is not in the spirit of.

> The initial implementations weren't what became popular though; the implementations that were efficient were key to driving popularity.

Yes, but many of the reasons for which SQL was slow to make efficient was its poor relational fidelity. I've written a few articles on that subject.

> Sure. I just think that its easy for theory-purists to overstate the degree to which the "lack of soundness" was the problem driving adoption, rather than "rigidity and complexity". Insofar as the linguistic and expressive features of SQL and the relational model proved attractive, simplicity and flexibility were particularly important, and while the capacity for soundness is an important improvement, its one that's been underused since day one. Its simply not the case, as you seem to present, that we've "fallen" to a state where that is ignored from some rosier days when that aspect of RDBMS capacity was strongly embraced and effectively and rigorously used by practitioners.

I did not say soundness drove the adoption. In fact, there is no way this can happen given that a vast majority of practitioners have no clue of how a formal foundation for db mgmt is different and superior to non-formal ones. This is not different than the notion that data science is science. That's precisely why I keep stressing that substituting training for education killed the capacity to appreciate the difference.

> I've even had to write (well, modify) some IMS code, far more recently than I'd prefer to have.

My sympathy.

> I don't think that's true. I think that the number of people with "fundamental education" in the field is probably greater than ever before. Sure, the number of people with tool training has increased faster, but that's not tool training replacing fundamental education, its just that with any technology, the first generation of users will all (or, at least, disproportionately) be versed in the underlying principles because they are also the builders of the technology, but over time that's going to fade as, even with more people educated in the principles, people who are just pragmatic users of the technology with a more limited focus are going to grow at a faster rate.

Well, having spent so many years documenting the deterioration of education, I believe my evidence than your perceptions.

> Most practitioners I've encountered seem to be aware that relational theory exists.

Very well put. That's about the gist of it. And whatever little they know about it is wrong.

> Sure, lots of them aren't well versed in it or what light it has to shine on their craft, but abstruse descriptions without clearly explained pragmatic benefits aren't an effective way to correct that and motivate them to dig more into theory, and neither is condescension and abuse.

They should not be motivated to do it on their own. There should be a basic level of education required to be inducted into the profession by faculty that are themselves proficient in the material and not industry hires that are teaching coding and tools because that's what univs teach now to be "relevant".

Roboprog · on Sept 10, 2016

I hear you on the education thing. Intuitively, the "two tables with same structure" approach is redundant, but it's helpful to have the math/logic to unambiguously define what "redundant" is.

Having 2 tuple types with 3 values seems harmless. This explodes quickly when done with many columns or many tables. Having some kind of foreign key to act as a "discriminator" scales much better to constrain the number of value types (# of tables x # of columns) that must exist.

But I only say this because of having to deal with a legacy DB at work that violates the Hell out of this :-)

... which brings us full circle to the tragedy of a lack of education.

sgeneris · on Sept 7, 2016

Correction: I meant to say I made the practical implications of theory accessible via articles, seminars, books. The only way to claim they are not accessible is in the absence of basic foundation knowledge.

ErwinSmout · on Sept 7, 2016

"If two relations have identical designs, the difference can only be encoded in the relation (and possibly attribute) names,"

True, though the (and possibly attribute names) part seems dubious to say the least. Different attribute names makes the designs non-identical, no ?

in violation of the IP [information principle].

This interpretation of the IP is outright absurd.

One, the "I" in "IP" was obviously intended to cover only ever the [end-]user's own business information. The "information" that is being "hidden" under this absurd interpretation is the mapping that applies from relation names to intended interpretation. That information is never part of the [end-]users "genuine business information". How could it ? The mapping in question arises only when the models are being developed. As long as no computers are involved, no information models and no mapping, but the "genuine business information" stays the same.

Two, the relation names are present as a value of an attribute in a tuple in a relation. In the catalog that documents the structure of the database that will contain the [end-]users "genuine business information".

As such, it is inaccessible to the DBMS, which cannot rely on the RPs [relation predicates] to accept/reject tuples as correct/incorrect.

Not sure what is intended here. Is it claimed that because "table names are inaccessible to the DBMS", it is impossible for the DBMS to enforce declared constraints ? Ouch. The SQL REFERENCES clause ought to suffice to counter that.

Users--with little, or no help from the DBMS--must ensure that the correct tuples are inserted in the proper relation.

So they must know the mapping from relation names to intended interpretation. That is not an insurmountable problem. Hundreds of thousands of developers have already been doing that [or something extremely similar] since before databases even existed.

Moreover, because information is not represented explicitly (as data values), relational operations lose it: if you UNION the two relations, you get users that either viewed or downloaded items, or both.

Then don't union the two together. (It is alas left unclear whether the usage of the term UNION here refers to invocations of that relational operator on the two relations in the original design (in which case the "loss of information" is (a) intentional and (b) not really loss of information because the original relations have not magically disappeared by computing the union), or in the sense of blindly mergeing the two tables together at the schema level without adding an indication of view vs. download. In which case it's just a stupid design mistake even my cat probably wouldn't commit.)

sgeneris · on Sept 7, 2016

The RDM is an attempt to maximize database management by the DBMS. What is left only in the mind of users--particularly the kind of users we witness here--is nothing but trouble. Reliance on sheer user discipline--for no good reason, for the only reason that SQL is not truly relational and people don't know, understand and appreciate the theoretical foundations of the RDM is what is really absurd here.

That's the reason the IP insists that ALL information is represented EXPLICITLY and in EXACTLY one way, to maximize soundness, power and simplicity. Whoever does not understand this will get nothing better than SQL.

sgeneris · on Sept 7, 2016

Erwin,

To those without a background in logic lots of things can seem absurd.

sgeneris · on Sept 7, 2016

[flagged]

aleyan · on Sept 7, 2016

Normally I would just downvote this kind of comment and this kind of article and move on. However because of your earnestness in educating people on the subject, I will not do that. Instead I will explain what is wrong with it and why it deserves the downvotes.

Quite simply the dbdebunked article is hard to read. Too hard to read in fact. It isn't hard to read because the content is complicated, but rather because of the organization of the content is poor. It starts off with a call to action to read another article and immediately follows it up with what appears to be somehow related but poorly referenced block quote. There was actually several triply nested quotes in there amongst other writing smells. Taking the time to edit is important.

When an author is writing for an audience it is their duty to be understood. Having the audience exert extra energy in reading because the author is unwilling to exert energy in writing is inefficient.

There might be something brilliant in the dbdebunked article, but I am unwilling to puzzle it out or subject others to it.

sgeneris · on Sept 7, 2016

May have something to do with the reader's ability to comprehend.