Response to A critical reading of the Third Manifesto
by Maurice Gittens
Hugh Darwen <hd@thethirdmanifesto.com>
1st October 2004
For further information relating to this issue, visit Maurice Gittens's web site.
Maurice Gittens's article A critical reading of the Third Manifesto appeared in the September 2004 issue of Database Magazine. The article criticises certain aspects of the book Foundation for Future Database Systems: The Third Manifesto by C.J. Date and Hugh Darwen (2nd edition, Addison-Wesley, 2000). At the invitation of the editor, Hugh Darwen has written this response, which has been carefully reviewed by Chris Date.
The response might appear to be rather pernickety in places. We believe that in that respect it matches the vein of some parts of the article we are responding to. We take no offence from pernicketiness and we hope none is given by this response.
This is a general response, attempting to cover the most important points without giving a blow-by-blow commentary. The response's structure is only very loosely keyed to that of the article. Appendix A gives a blow-by-blow commentary in the form of a copy of the article with embedded annotations by Hugh Darwen.
Overall Assessment
The article does not live up to its title. A genuine "critical reading" of The Third Manifesto would mention each of its numbered Prescriptions, Proscriptions and Very Strong Suggestions, or at least those that the author takes issue with. The article does not specifically mention any of them. It makes an incorrect assumption in what appears to be an argument to the effect that relation variables are types. In what appears to be an argument against our rejection of pointers, it defines the term "identity" to refer to a concept we believe we fully embrace and claims that we reject identity. To justify its possible rejection of our rejection of something, it gives some hints about a proposed operator without showing in what way The Third Manifesto does not allow that operator to be supported. It claims a certain equivalence between tuples and relations that is based on at least one demonstrably incorrect assumption. In view of these findings, we obviously reject all of its conclusions. Specific criticisms follow.
Introduction
The introductory section refers to our stated maxim, All Logical Differences Are Big Differences. It attempts to claim that some of our work is inconsistent with this maxim and its corollary, All Logical Mistakes Are Big Mistakes. We recognize that stating our maxim up front exposes us to ridicule by anybody who finds logical inconsistency in our work. By the same token anybody who tries to throw our maxim back at us in this manner is similarly exposed. Gittens has taken that risk. We leave it for others to judge on whose face the egg is.
By the way, our corollary (originally proposed by Darwen) has been claimed to be a logical mistake! It was unkindly pointed out to us that a small basketball player isn't necessarily a small person. In other words, smallness and bigness are "overloaded" for different types and the "big" in "big difference" is not necessarily the same "big" as the one in "big mistake". (This kind of overloading, by the way, is not supported by The Third Manifesto's proposed model of type inheritance.)
Logical Consistency
Gittens appears to think we are guilty of logical inconsistency in The Third Manifesto. According to our understanding, a theory T is logically inconsistent if the set of propositions that can be concluded to be true under T includes a proposition p such that ¬p is also a member of that set; otherwise T is consistent. It is not clear to us which statements of ours are claimed by Gittens to lead to inconsistency.
The First Great Blunder
Gittens appears to be disputing our claim that it is a blunder to equate relation variables (relvars) and object classes. (Our justification for this claim is that a type is neither a relation nor a variable.) In what we take to be an attempt to bolster his argument, Gittens makes a correct statement about relation types, where we are expecting him to say something about relation variables. But of course a relation type is a type! We cannot conclude from that fact that a value of that type is a type; it follows a fortiori that we cannot conclude that a variable of that type is a type. But in any case, Gittens does not explain why he thinks The Third Manifesto is weakened by its rejection of "the wrong equation", if indeed he does think that. (And if he doesn't, why bother to quibble with our argument?)
The Second Great Blunder
Gittens claims that we "reject identity" but offers a definition of that term that appears to make it refer, not to the pointers that are the subject of The Second Great Blunder, but to a concept we fully and earnestly embrace! But in any case, Gittens's explanation of why he thinks The Third Manifesto is weakened by its rejection of pointers concerns a certain operator (called foreach) that he would like to see in his ideal D but believes would be prohibited under the terms of The Third Manifesto. He does not tell us exactly which Prescriptions or Proscriptions he thinks militate against inclusion of that operator. From the little we can see at the moment, we have no reason to suppose that something like foreach could not be supported if desired. Indeed, it is surely made reasonably clear on pages 200-201 that we do not wish to preclude such operators.
Identity
Gittens writes, "Identity is a fundamental property of all things by which they can be counted. If the elements of mathematical sets did not have identity they would not be countable." We wouldn't argue with this, but we note that in most textbooks on logic, the term identity refers to a predicate. This does not necessarily conflict with Gittens's view. The following extract from Wilfrid Hodges's eminently approachable Logic, published by Penguin Books Ltd. in 1977, might illuminate:
One particularly important predicate is the 2-place predicate
x1 is one and the same thing as x2
This predicate is called identity; in symbols it's written 'x1 = x2', and the symbol '=' is read 'equals'. A sentence got by putting designators in place of 'x1' and 'x2' in [the identity predicate] is called an equation.
Various English phrases can be paraphrased by means of identity. For example:
Everest is the highest mountain in the world.
Everest = the highest mountain in the world
Cassius Clay and Muhammed Ali are the same person.
Cassius Clay = Muhammed Ali.
This is none other than the lost city.
This = the lost city.
Two plus two equals four.
Two plus two = four.
The word identical is normally used in English to express close similarity rather than identity. For example, identical twins are not one and the same twin and two women wearing identical dresses are not wearing one and the same dress.
The Third Manifesto is in complete accord with Hodges here. Note that Hodges does not refer to counting, though we would have to agree that it is not possible to count things we cannot distinguish from each other. (By the way, the elements of the "mathematical set" of real numbers cannot be counted. Perhaps this is because in Gittens's view they do not all have identity?)
More importantly, note Hodges's use of the term designator for terms that can be substituted for the free variables in a predicate to yield a proposition. His examples of instantiations of the identity predicate show how different designators can refer to the same thing. Does Gittens's "identity" refer to some method of referring to something other than by using a designator? We reject any such notion: Whereof we cannot speak, thereof we must remain silent! Or does Gittens think that for every object in the domain of discourse there must be some special designator for that object, which he calls its identity? We reject that notion, too, in general. The same number is designated by 2 in decimal notation and 10 in binary; we have no reason to pick out either of those, or any other possible designator of the second counting number, as being special.
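The point about designators can be made concrete with a small Python sketch, offered purely as an illustration (it is not D and not taken from the book). The numerals below, written in hypothetical choices of base, are different pieces of text, yet all designate one and the same number; equality is the only operator needed to say so, and none of the numerals is singled out as "the" identity of the number.

```python
# Illustrative only: different designators (numerals in different bases)
# that all designate one and the same number.
designators = {
    "decimal": ("2", 10),
    "binary": ("10", 2),
    "ternary": ("2", 3),
}

# int(text, base) interprets each numeral; the result is the number designated.
denoted = {name: int(text, base) for name, (text, base) in designators.items()}

# Equality is enough to state that they all designate the same value.
assert len(set(denoted.values())) == 1
print(denoted)  # {'decimal': 2, 'binary': 2, 'ternary': 2}
```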
In Codd's model, attribute values are to be interpreted as designators. Furthermore, it was essential to his model that they represent what we have seen referred to as rigid designators. An occurrence of a designator is rigid if it refers to the same thing in all possible situations. Terms like "two", "Everest", and "Cassius Clay" are normally rigid, whereas "the highest mountain in the world", "the heavyweight champion of the world", and "the president of the USA" are not normally rigid (they all potentially refer to different things at different times or in different contexts). Rigidity depends on the use: "the president of the USA" is nonrigid in "Hugh Darwen is the president of the USA" but rigid in "The president of the USA is the person most recently elected to that post". (The truth or falsehood of that proposition is, of course, irrelevant here.)
Variable names, pointers, and object identifiers (in the O-O sense of that term) are all nonrigid designators, precisely because they designate variables. Contrary to Gittens's often-repeated claim, we do not reject the logical concept of identity; on the contrary, we wholeheartedly embrace it. But we firmly reject the use of nonrigid designators in relations.
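A minimal Python sketch, offered as an analogy only (the names Var and height, and the replacement figure, are hypothetical), contrasts a set of tuples whose attribute holds a value with one whose attribute holds a reference to a variable. Updating the variable changes what the second "relation" asserts although nobody updated that relation, which is the sense in which a nonrigid designator inside a relation is objectionable.

```python
# Illustrative analogy, not D syntax: a relation body whose attribute holds
# a value versus one whose attribute holds a reference to a variable.

class Var:
    """A mutable cell standing in for a variable (hypothetical)."""
    def __init__(self, value):
        self.value = value

height = Var(8848)

# Attribute value is the number itself: a rigid designator.
by_value = {("Everest", height.value)}

# Attribute holds a reference to the variable: a nonrigid designator.
by_reference = {("Everest", height)}

# Update the variable; neither relation is touched.
height.value = 9000

print(by_value)                                 # {('Everest', 8848)}
print({(m, h.value) for m, h in by_reference})  # {('Everest', 9000)}
```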
Gittens correctly mentions a secondary reason we have found for rejecting object identifiers:
[They] describe a problem with object identity and their inheritance model. It is a fallacy to assume that this problem would exist with some other inheritance model.
But we do not make that assumption.
We refer to the work of Zdonik and Maier in which they present four "desiderata" (for a type system), namely: substitutability; static type checking; mutability; and "specialisation via constraint". (We put that last desideratum in quotes because the authors appear to mean simply "type constraints", but that is not important: specialisation by constraint as we mean it implies the existence of type constraints.) Zdonik and Maier conjecture that it is possible to support any three of these desiderata but not all four together. In Part IV of the book we believe we refute this conjecture by defining a model of type inheritance in which all four desiderata are supported. We go on to explain, in Appendix G, that it appears that Zdonik and Maier are tacitly assuming that a fifth desideratum (for object identifiers) is always supported.
We find that support for object identifiers cannot coexist with specialisation by constraint. We observe that object identifiers had already been rejected on our behalf, so to speak, by E.F. Codd himself. Therefore there is no reason for The Third Manifesto to reject specialisation by constraint, and it doesn't (nor does it require it, by the way, though it can only be omitted if type inheritance is omitted altogether).
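One way to picture the conflict is the following Python sketch. It assumes an ELLIPSE/CIRCLE example of the kind commonly used to illustrate specialisation by constraint; the class and function names are hypothetical and the sketch is not the book's own argument. Under value semantics the most specific type is computed from the value, so assigning an ellipse with equal semiaxes yields a circle. Under object-identifier semantics the object is updated in place, keeps its identity, and keeps the class it was created with.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Ellipse:
    """An ellipse value, given by its two semiaxes (hypothetical type)."""
    a: float
    b: float

def most_specific_type(e: Ellipse) -> str:
    # Specialisation by constraint: an ellipse value with equal semiaxes IS a circle.
    return "CIRCLE" if e.a == e.b else "ELLIPSE"

# Value semantics: updating variable v means assigning it a new value,
# whose most specific type follows from the value alone.
v = Ellipse(3.0, 2.0)
assert most_specific_type(v) == "ELLIPSE"
v = Ellipse(3.0, 3.0)
assert most_specific_type(v) == "CIRCLE"

# Object-identifier semantics: the same object is mutated in place.
class EllipseObject:
    def __init__(self, a: float, b: float):
        self.a, self.b = a, b

obj = EllipseObject(3.0, 2.0)
oid = id(obj)   # stands in for the object identifier
obj.b = 3.0     # state now satisfies the circle constraint...
# ...but the object keeps its identity and the class it was created with.
assert id(obj) == oid and type(obj) is EllipseObject
```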
Types, Domains and Object Classes
With reference to Gittens's Section 3.2, Introduction to predicate logical models, we observe that the domain of discourse of a database, under the interpretation intended by The Third Manifesto, is the set consisting of every value that could legally appear as an attribute value of some tuple of some relation derivable from that database by evaluation of an expression in whatever D is being used to access it. That domain of discourse is partitioned into subsets called types. (Codd called them domains, and thus generated a certain amount of confusion, which is why we no longer use that term for the concept in question.)
Given that attribute values represent designators whereas tuples represent propositions, and given that logicians clearly use the terms designator and proposition to refer to importantly different concepts, we think we are justified in claiming that there is a logical difference between the concept of an attribute value and the concept of a tuple (even though some attribute values happen to be tuples). Now, the set of permissible values for a given attribute is called a type, whereas the body of a relation is a set of tuples. To equate type and relation, therefore, is to equate two logically different concepts. To equate two logically different concepts has to be a logical mistake. Moreover, the logical mistake in question is compounded by the fact that it is not actually types and relations that are being equated, but types and relation variables. So we have two logical mistakes here. We feel fully justified in calling that a blunder, and a great one at that. We are not moved by Gittens to retreat from that position.
Predicate Constants
Gittens expresses a desire for predicate constants to be able to appear as values. In other words, he wants D to support a type, or perhaps a type generator, whose values can be operated on by whatever operators he would like to be available on predicate constants. He does not tell us what those operators are, so we cannot tell if such a type or type generator would be in contravention of The Third Manifesto. In his examples, "Mark loves to love" and "Jane loves to miss", it seems clear that the noun phrases (hence designators) "to love" and "to miss" do not refer to the propositional value of the predicates x loves y and x misses y. They are rigid designators.
Regarding operators on predicate constants, what operators does Gittens expect to be available on, for example, the predicate constant of the triadic predicate a + b = c? In any case, what exactly is that predicate constant? Surely not what remains when we strike out the variables, for that would yield "+ =", which would also be the predicate constant of the dyadic predicate a + b = a. (According to the definition given in http://en.wikipedia.org/wiki/First-order_predicate_calculus, a + b = c includes an appearance of the predicate constant = and an appearance of the function constant +. It is not clear from this that every predicate has exactly one predicate constant. For example, a = b ∧ b = c has two appearances of =.) [Added 22nd November, 2004: further thoughts on predicate constants are given in Appendix A.]
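The "strike out the variables" reading can be checked mechanically. The Python sketch below uses a hypothetical helper, strike_out_variables, that treats a predicate as whitespace-separated tokens (a deliberate simplification of real parsing) and removes the variables. The triadic and dyadic predicates leave the same residue, so that residue cannot serve as the predicate constant of either; a conjunction, meanwhile, contains two appearances of =.

```python
def strike_out_variables(predicate: str, variables: set) -> str:
    """Hypothetical helper: delete the variable tokens and keep everything else."""
    tokens = predicate.split()
    return " ".join(t for t in tokens if t not in variables)

triadic = strike_out_variables("a + b = c", {"a", "b", "c"})
dyadic = strike_out_variables("a + b = a", {"a", "b"})
print(repr(triadic), repr(dyadic))  # '+ =' '+ ='

# A conjunction may contain more than one appearance of the predicate constant "=".
conjunction = "a = b ∧ b = c"
print(conjunction.split().count("="))  # 2
```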
The Proposed foreach operator
The Third Manifesto demands the existence in D of certain operators. For example, RM Prescription 8 mandates support for equals and RM Prescription 18 mandates support for "the usual operators of the relational algebra". The RM Proscriptions mention certain kinds of operator that D is expressly forbidden to support, but we cannot find any that would clearly militate against inclusion of foreach. Whether foreach is really a good idea or not, we cannot judge on the evidence available. We remark that it seems peculiar to have an operator that sometimes returns a value when it is invoked and sometimes does not, and whose operands are, even in the cases where a value is returned, required to be variable references in particular and not expressions of arbitrary complexity in general. Thus, some of the varieties of foreach might be in contravention of The Third Manifesto's definition of read-only operator. But that is not a reason given by Gittens for foreach being in contravention of The Third Manifesto.
The Expressive Equivalence of Relation Values and Tuple Values
The relevance of Gittens's Section 4 on this subject is not clear to us, nor is the importance, if any, that he attaches to its conclusions. But we reject it anyway because it appears to contain a logical mistake. In Section 4.4, his proposed ordered triple representation of the body of a relation appears to allow two or more tuples in the same body to have the same value.
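The objection rests on the fact that the body of a relation is a set. The Python sketch below does not reproduce Gittens's ordered-triple encoding; it only contrasts a sequence, which can contain the same tuple twice, with a set, which cannot. Tuples are modelled as frozensets of attribute/value pairs (so attribute order is immaterial), and the helper tup and the supplier data are hypothetical.

```python
def tup(**attrs):
    """A tuple in the relational sense: an unordered set of <attribute, value> pairs."""
    return frozenset(attrs.items())

# A sequence-based encoding can hold the "same" tuple twice
# (attribute order is immaterial, so the first two entries are equal).
sequence = [
    tup(S="S1", CITY="London"),
    tup(CITY="London", S="S1"),
    tup(S="S2", CITY="Paris"),
]

# A relation body is a set, so the duplicate collapses.
body = set(sequence)
print(len(sequence), len(body))  # 3 2
```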
A Footnote on "A Codd inspired amendment ...".
[This footnote does not appear in the response published in Database Magazine]
In "A Codd inspired amendment to my critical reading of The Third Manifesto", Gittens claims that a certain amount of support for his position is expressed in E.F. Codd's "Extending The Relational Model to Capture More Meaning" (1979). I have never expressed any support for the referenced work. Indeed, I recoiled from it at the time, precisely because I had always thought, as I still do, that the strength of the Relational Model, like that of logic, lies partly in its disregard for meaning. Much as we admire Codd's original work, Date and I have found ourselves in disagreement with him on a number of issues that he subsequently addressed.
Regarding the First Great Blunder, it is not clear to me that Codd can be interpreted as having expressed support for the equation relvar = class. In any case I continue to think that equation to be a grave error.
Regarding the Second Great Blunder, I do think that Codd might have overemphasised the importance of surrogate keys, but I do not accept that surrogate key values are object identifiers in a different guise. Any distinction between surrogate keys and nonsurrogate keys is not a logical difference. There is a logical difference between key values and object identifiers. Apart from anything else, an object identifier in general identifies a variable; a key value certainly does no such thing, and Codd would certainly never have proposed or condoned the possible existence of variables other than relvars in a relational database.
HD:
My comments are embedded in this style. Any that look like questions can be regarded as rhetorical. The text in which they are embedded was copied from this PDF file, sent to me by the editor of Database Magazine. With the author's permission, I have corrected two or three awkward typographical errors in that draft.
HD:
This title is misleading. The Third Manifesto per se (Chapter 3 of the book) consists of six sections containing in all 58 numbered points. These 58 points form the basis of the dissertation. One would expect a critical reading to refer explicitly to some or all of these 58 points. This paper refers to none of them, but only to some of the book's introductory material.
Maurice Gittens <maurice at gittens dot nl>
14th July 2003
Abstract
According to the authors, Hugh Darwen and C.J. Date, of the book entitled "Foundation for Future Database Systems: The Third Manifesto", the maxim All logical differences are big differences and its corollary All logical mistakes are big mistakes have been central to their work on this book. Respecting the standard set by this maxim and its corollary, this paper will proceed to identify a number of issues with the logical consistency of the dissertation presented in The Third Manifesto, using maxims such as: logical conclusions should only be drawn from premises which are both valid and relevant.
HD:
It is not exactly incorrect to place the maxim at the centre of the work, but the maxim is only that: a maxim. To suggest that the work has a mere maxim at its core without mentioning its solid technical basis (The Relational Model of Data) might give the impression that the work is somewhat frivolous in nature. We could delete the maxim and all references to it without altering the substance of the book.
The copyright of this document belongs to its author. Making complete and
unmodified copies of this document is allowed.
Status: draft
· July 14 2003; Fix typo in the title of the document
· April 7 2003; More cleanups
· February 26 2003; Based on comments by Hugh Darwen I reworded a few sentences which seemed to cause confusion; I also fixed a few typographical errors
· January 8 2003; Rene Jansen made me aware of another reason for the dismissal of ObjectIDs provided by The Third Manifesto. Add this to the section about the alleged second great blunder. Thanks Rene.
· January 6 2003; A first draft of this document
Contents
1 Introduction
1.1 Background information
1.2 On a personal note
2 About the claims made by the author
2.1 Regarding the first great blunder
2.2 Regarding the second great blunder
3 About Predicates, Relations and their identity
3.1 Some examples
3.2 Introduction to predicate logical models
3.3 Why is identity deemed a necessity?
3.4 Summary
4 On the expressive equivalence of relation values and tuple values
4.1 Introduction
4.2 Defining tuple values
4.3 Defining relation values
4.4 Showing that all relation values are tuple values
4.5 Showing that all tuple values are relation values
4.6 Summary
5 Conclusions
1 Introduction
1.1 Background information
A web page at http://www.gittens.nl/OOR.html raised a number of issues the author found with the logical consistency of the dissertation presented in the second edition of the book "Foundation for Future Database Systems"[1] by C.J. Date and Hugh Darwen. In a personal communication Mr. Hugh Darwen requested that I clarify my use of certain English words and also that I be more specific as to the issues I found with the dissertation presented in The Third Manifesto. This paper is written as an attempt to comply with the request of Mr. Darwen.
The main issue and my primary claim
The main issue I perceive with the logical consistency of the dissertation presented in The Third Manifesto follows from the maxim Date and Darwen presented as central to their work in The Third Manifesto. Date and Darwen presented the maxim All logical differences are big differences and its corollary All logical mistakes are big mistakes as a guiding principle in their work on The Third Manifesto. The Third Manifesto proceeded to identify what it refers to as the Two Great Blunders:
HD: If the dissertation suffers from logical inconsistency, then it must be possible to discover some proposition p such that both p and ¬p can be concluded from the dissertation. Gittens does not actually show such a proposition, so I would argue that the claim that the dissertation is logically inconsistent is not justified by this paper.
· Equating relvars and classes
· Mixing pointers and relations (or more specifically allowing database relvars to contain object IDs)
However, in my humble opinion, the argumentation used to substantiate the claim that the alleged blunders are truly to be viewed as blunders is somewhat weak.[2] This opinion is based on the following maxim: logical conclusions should only be drawn from premises which are both logically valid[3] and relevant.
HD: We would clarify "valid" as "agreed to be true". We don't see the relevance of "relevant" here: surely one can draw a valid conclusion from an irrelevant premise?
Put another way, keeping in mind the maxim All logical differences are big differences, logically valid conclusions are conclusions based on premises free of fallacies, including fallacies of relevance.[4]
HD: Why fallacies? Why not just falsehoods?
I think it important to state explicitly that my claim in this regard is not that Date and Darwen are wrong in their opinions. My claim, in this regard, is that the substantiation they provide as justification for their statement that the alleged great blunders are indeed great blunders is rather weak, relative to the high standard they claim to be central to their work.
HD: There's a big difference (a logical one, indeed!) between logical inconsistency and "rather weak" substantiation. Regardless of which of those two Gittens really means, it is not clear, yet, whether he
· agrees that the Great Blunders are indeed blunders but seeks a stronger justification for the appellation; or
· agrees that the characteristics referred to by the Great Blunders are indeed undesirable, but thinks that inclusion of those characteristics is not so undesirable as to merit the term "blunder"; or
· disagrees that the characteristics referred to are undesirable (though perhaps this possibility is supposed to be ruled out by his assurance that he does not claim we "are wrong in [our] opinions").
It seems likely that the second reading is the intended one, in which case we question whether the paper really is raising a big issue with our work. It seems more like a quibble.
In this regard the question is asked whether or not the alleged two great blunders are indeed blunders. This subject matter will be addressed in subsequent sections of this paper.
1.2 On a personal note
I think it appropriate to state my appreciation for the fact that Mr. Hugh Darwen has thought it appropriate to spend time communicating with me about the issues I raised.
2 About the claims made by the author
2.1 Regarding the first great blunder
The first alleged great blunder identified in The Third Manifesto follows:
relvar = class
Now please consider the question: What arguments that adhere to the strict discipline of logic does The Third Manifesto provide for the claim that this equation is indeed a blunder? In considering an answer to this question it is noted that this claim[5] is first made on page 15 of the second edition of The Third Manifesto, which by its own admission (on page 14) is informal in nature. Still lacking a mathematically sound definition of what an object class is, page 21 decides that object classes and domains are, I quote: "the same thing".
HD: Agreed that this claim by us is weakened by the lack of a rigorous (never mind "mathematically sound") definition of object class. So we are merely arguing that they are the same thing, rather than rigorously proving it. It is not clear whether Gittens would argue differently.
Since the statement is made in an informal context, one wonders if classes and domains are informally the same thing or also formally[6] the same thing. The reason given to substantiate the claim that object classes and domains are the same thing is the fact that, for both domains and object classes, values are manipulated by operators defined for the type in question. However, the same argument can be made for relations. Is it not the case that relation values are manipulated by a set of operators defined specifically for their types?
HD: Yes. Actually, each of the operators of the relational algebra is defined for all relation types.
Yes, relation types have a set of predefined operators; does this make them logically different from a specific class of domains which have a predesignated set of operators? No, it does not!
HD: We definitely agree that relation types are types. We make the point very strongly in the book.
OK, since this argumentation is said to be informal, the author proceeded to seek out the formal arguments presented for labeling the first great blunder as such. Apart from reiterations of the alleged first great blunder, no such argument has currently been found by the author.[7]
Additionally, Date and Darwen seem to assume[8] that there is one so-called right way in which objects and relations should be integrated. Does not the discipline of logic dictate that one must prove that there exists only one right way before one could even claim to provide the one right way? Is it not possible that there are different ways, each with its own merit, to achieve the integration between objects and relations?
HD: Gittens has not even questioned the correctness of The First Great Blunder here. He agrees with us that a relation type is a type, but The First Great Blunder is to regard a relation variable as a type. Does Gittens wonder if it's reasonable to regard a variable as a type? A relation as a type? We do not think either is reasonable or desirable, for reasons stated in the book.
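As a rough illustration of the HD remark earlier in this section that each operator of the relational algebra is defined for all relation types, the Python sketch below defines projection and natural join once, generically over any heading, and applies them to relations of two different types. Relations are modelled as sets of frozensets of attribute/value pairs; the function names and the supplier data are hypothetical, and this is not how any particular D implements these operators.

```python
def tup(**attrs):
    """A relational tuple: an unordered set of <attribute, value> pairs."""
    return frozenset(attrs.items())

def project(r, attrs):
    """Projection: keep only the named attributes of every tuple."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in r}

def join(r1, r2):
    """Natural join: combine tuples that agree on all common attributes."""
    result = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                result.add(frozenset({**d1, **d2}.items()))
    return result

# Two relations of different relation types, handled by the same operators.
S = {tup(S="S1", CITY="London"), tup(S="S2", CITY="Paris")}
SP = {tup(S="S1", P="P1"), tup(S="S1", P="P2"), tup(S="S2", P="P1")}

print(sorted(sorted(t) for t in project(join(S, SP), {"P", "CITY"})))
```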
2.2 Regarding the second great blunder
The Third Manifesto identifies the second great blunder as:
Mixing pointers and relations (or more specifically allowing database relvars to contain object IDs).
Using the index of The Third Manifesto I have found the following reasons why object IDs or references[9] are unwanted by Date and Darwen.
1. Codd's information principle: All information in the database at any time must be cast explicitly in terms of values in relations and in no other way or All interrelating between different parts of a database must be achieved by comparison of value.
2. The reason Codd removed pointers from the relational model is stated as: It is safe to assume that all kinds of users [including end users in particular] understand the act of comparing values, but that relatively few understand the complexities of pointers [including the complexities of referencing and dereferencing in particular]. The relational model is based on this fundamental principle... [The] manipulation of pointers is more bug-prone than is the act of comparing values, even if the user happens to understand the complexities of pointers.
3. On page 417 of the second edition, the paragraph entitled "OBJECT IDS UNDERMINE INHERITANCE".
Concerning the first two points I ask the question: Of what logical value are these arguments?[10] The fact that Codd rejects pointers on the grounds that they are "difficult to understand" and "bug-prone" is of no logical value, and as such, in the context of providing a justification for the so-called second great blunder, these arguments represent a fallacy of relevance. This is not to say that the statement is not true by some measure. It is only to say that such statements do not provide logically valid grounds for the dismissal of object IDs.
HD: We wouldn't argue with Gittens on this point. Do we claim anywhere in the book that we have a rigorous proof of the fact that The Second Great Blunder is a blunder? Anyway, perhaps we should have added that according to our understanding the mathematical theory of relations requires the designators represented by attribute values to be purely referential (Hodges). Pointers certainly are not purely referential.
Concerning the third point, the following issues:
· First, a fallacy is exposed by this quote from the page referenced: "Pointers can lead to a serious problem if type inheritance is also supported". This provides the insight that Date and Darwen confuse object identity and pointers.
HD: Actually, we claim that an object identifier (as defined in typical OO programming languages) has to all intents and purposes the same behaviour as a pointer. It is not clear to us whether Gittens's concept of object identity is this same OO programming language concept.
· Second, they proceed to describe a problem with object identity and their inheritance model. It is a fallacy to assume that this problem would exist with some other inheritance model.
HD: We make no such assumption. We refer to the work of Zdonik and Maier in which they present four "desiderata" (for a type system), namely, substitutability, static type checking, mutability, and specialisation by constraint. They conjecture that it is possible to support any three of these desiderata but not all four. In Part IV of the book we believe we refute this conjecture by defining a model of type inheritance in which all four desiderata are supported. We go on to explain, in Appendix G, that it appears that Zdonik and Maier are tacitly assuming that a fifth desideratum (for object identifiers) is always supported. We find that support for object identifiers cannot coexist with specialisation by constraint. We observe that object identifiers had already been rejected for us, so to speak, by E.F. Codd himself. Therefore there is no reason for The Third Manifesto to reject specialisation by constraint, and it doesn't (nor does it require it, by the way, unless some kind of type inheritance is supported).
These two points identify the third reason supplied for the dismissal of ObjectIDs as a fallacy.
HD: We do not understand the point being made by this sentence.
Please consider the position of Hugh Darwen on identity as it was
presented in a personal communication [ref 3].
We
do not recognize any concept of identity of a value v other than v itself. A
truth-valued expression of the form x = y is true if and only if the values
denoted by the expressions x and y are identical, that is, are in fact one and the same
value. Given equality, we do not need any other concept to do with distinction
of values. In case the distinction you are referring to is the one found in
some OO programming languages, I remark that in such languages equality is as
in our definition (though "=" is sometimes sacrificed, with
unpleasant consequences, in favor of an operator with the same name but meaning
"approximately equal to"), whereas what you call identity is equality
of pointers (usually called object identifiers), and a pointer points to a
variable, not a value. As you note in your very next section, we do not admit
pointers.
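The distinction drawn in the quoted communication can be seen in a minimal sketch in which Python stands in for "some OO programming languages" (an assumption made only for illustration): `==` compares values, whereas `is` compares references to objects, which behave like pointers to variables rather than values.

```python
# Python stands in here for "some OO programming languages".
# x and y are two distinct list objects (variables) that hold equal values.
x = [1, 2]
y = [1, 2]

values_equal = x == y      # equality: do x and y currently denote the same value?
same_object = x is y       # reference comparison: do x and y refer to the same object?

# Updating x changes the value it holds, but not which object it is.
x.append(3)
equal_after_update = x == y

z = x                      # z now refers to the very same object as x
z.append(4)                # so an update made through z is visible through x

print(values_equal, same_object, equal_after_update, x)
# prints: True False False [1, 2, 3, 4]
```

Here `is` distinguishes x from y even while their values are equal, precisely because it compares references to variables, not values.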
Identity is a fundamental property of all things by which they can be
counted. If elements of mathematical
sets did not have identity they would not be countable.[11] To put this another way, identity can be viewed as a property of an
element of a set. HD: Is this
relevant to the real issue at hand (The Second Great Blunder)? Anyway, we remark that the set of real
numbers is not countable. Fortunately,
The Third Manifesto does not deal with such sets; as far as the sets it does
deal with are concerned, we agree with Gittens. Equality, on the other hand, is a correspondence between two or more
elements of a set. For example, let v1, v2, ..., vn represent a set of relation variables.
Each variable in this set has a distinct identity, otherwise it would not be
possible to distinguish it from other variables in this example set. The
identity of these variables is orthogonal to the issue of whether or not some
of these relation variables are equal or not. HD: We agree with that, too. We require the existence of a comparison
operator (called "=") to determine whether two expressions, including
in particular two relvar references, denote the same value. We do not require the existence of an
operator for the specific purpose of determining if two expressions denote the
same variable. This would require the
existence of a type each of whose values is a possible variable name. We do not require the existence of such a
type and in fact we explicitly prohibit it if the operators for that type were
to include what is commonly called "dereferencing". And for the sake of completeness I wish to
state that it should be evident that the fundamental concept of identity has nothing to do with the
concept of pointers as they are known
in different programming languages.[12] HD: Yes, that is evident.
3 About Predicates, Relations and their identity
This section presents what is in the opinion of the author a logically
sound motivation for the support of identity in future databases. To comply
with Hugh Darwen's request for specific examples I start with some examples.
3.1 Some examples
Consider some example functionality. Let rv1, rv2, ..., rvk be relation variables of the same relation type.
Let v1, v2, ..., vk be the relation values of rv1, rv2, ..., rvk. I would like
to be able to ask the equivalent of the following questions in the query language
of the database: Which set of relation
variables has the value v2? Or, which relation variable has the greatest number of tuples with a
particular property? The database system would in turn respond with a properly typed set of identities
corresponding to the result of the query. The catalog of common SQL databases
might be used for such purposes; however, in current relational database systems
the types of the objects returned would, as you know, be incorrect. This forces
people working on business repositories, data-mining applications, etc., to build
much logic into their applications which, in my opinion, should be gracefully
handled by databases of the future. If the result of a query can be an entity
representing a relation variable, or a type, or a tuple variable etc, the
logical expressiveness[13]
of the database is increased. If the information in the catalog of relational
databases were properly typed much of the necessary machinery would be present
in database systems. HD: We do not deny
that these are interesting requirements, but nor are we aware of having written
anything in The Third Manifesto to militate against their
fulfilment. Perhaps Gittens believes
that OO Proscription 2 is the obstacle in question. In that case, we would draw his attention to the discussion of
that Proscription in Chapter 9, on pages 198-201.
HD: It would
be helpful if Gittens could show exactly which Prescription or Proscription of The
Third Manifesto militates against the existence in D of the proposed operator, and why. From the evidence given here, we have no reason to suppose that
the operator is prohibited.
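HD's point can be illustrated with a sketch in which relvar names are simply values of an ordinary type (strings here) and a hypothetical catalog maps each name to its current relation value. Both of the questions posed above then become ordinary queries over that catalog, with no pointers involved. The catalog layout and the function names are assumptions for illustration only.

```python
# Hypothetical catalog mapping relvar names (ordinary string values) to current
# relation values. A relation value is a frozenset of tuples; a tuple is a frozenset
# of (attribute name, value) pairs. All three relvars share the heading {SNO, CITY}.
def tup(**attrs):
    return frozenset(attrs.items())

catalog = {
    "rv1": frozenset({tup(SNO="S1", CITY="London"), tup(SNO="S2", CITY="Paris"),
                      tup(SNO="S4", CITY="London")}),
    "rv2": frozenset({tup(SNO="S3", CITY="London")}),
    "rv3": frozenset({tup(SNO="S3", CITY="London")}),
}

def relvars_with_value(value):
    """Which set of relvars currently has the given relation value?"""
    return {name for name, v in catalog.items() if v == value}

def relvar_with_most(pred):
    """Which relvar has the greatest number of tuples satisfying pred?"""
    return max(catalog, key=lambda name: sum(1 for t in catalog[name] if pred(dict(t))))

v2 = catalog["rv2"]
print(sorted(relvars_with_value(v2)))                      # ['rv2', 'rv3']
print(relvar_with_most(lambda t: t["CITY"] == "London"))   # rv1
```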
An operator which in my opinion is necessary, in one form or another,
in future databases is the foreach operator.
This would be the database counterpart of the universal quantifier operator
known from predicate logic.[14] Hopefully
self-explanatory, informal[15] examples, using this operator in an SQL-like
language, follow:
Example statement | Description
foreach relation r select * from r; | select all tuples in the default schema
foreach relation r in schema example_schema delete from r; | remove all tuples from relation variables in a schema
foreach relation r in schema example_schema delete r; | drop all relations from a schema
foreach schema s foreach relation r in s select * from r | select all tuples in the database
foreach relation r select * from attributes(r) | select the attributes of all relations
foreach relation r where r.someProperty() == true select * from r | select all attributes of relations with some property
It is important to note that the type
of r in a statement like foreach relation r ... is a relation type. HD: We note that the term relation appears to stand for relation
variable in each of these examples.
It seems that the type of r is
actually several relation types, on account of the fact that several relvars
are typically not of the same type.
Also, it is not clear that what is returned, in the first example at
least, is a relation. (If it is claimed
that it is a relation, then what is its heading?) We cannot comment further on these suggestions without seeing
them fleshed out. The
logical variable r is said to be
bound to a predicate constant, representing the identity but not the propositional value of the predicate[16].
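The difficulty HD raises about the first example (what heading would its result have?) can be made concrete. In the sketch below, which uses a hypothetical catalog whose names and representation are assumptions, collecting every tuple of every relvar yields tuples with several different headings, so the collection is not the body of any single relation.

```python
# Hypothetical catalog: relvar name -> list of tuples, each tuple a dict from
# attribute name to value. The three relvars have three different headings.
catalog = {
    "S":  [{"SNO": "S1", "CITY": "London"}],
    "P":  [{"PNO": "P1", "WEIGHT": 12}],
    "SP": [{"SNO": "S1", "PNO": "P1"}],
}

def foreach_select_all(cat):
    """Collect every tuple of every relvar, as 'foreach relation r select * from r' might."""
    return [t for body in cat.values() for t in body]

def headings_of(tuples):
    """The distinct headings (sets of attribute names) among the collected tuples."""
    return {frozenset(t) for t in tuples}

result = foreach_select_all(catalog)
print(len(result), len(headings_of(result)))   # 3 3
```

Three tuples with three different headings leave no single heading that the result could be said to have.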
Please note that, by supporting the foreach
operator, SQL statements like ALTER and DROP may be replaced by
appropriate uses of UPDATE and DELETE statements, thus showing these and
similar statements to be redundant. HD: It is not
clear why "delete r" should
be interpreted as "drop r",
nor why DROP is made redundant. It
seems that DROP has merely undergone a change in spelling, to DELETE (without
the FROM). The Third Manifesto
has a Prescription (RM Prescription 25) to support DROP and ALTER via DELETE
and UPDATE on catalog tables.
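The observation that DROP has merely changed its spelling can be sketched as follows. The sketch assumes, for illustration only, a catalog relvar with one tuple per database relvar, where deleting a catalog tuple drops the corresponding relvar; the class and method names are hypothetical and are not taken from The Third Manifesto.

```python
# Hypothetical in-memory database. The catalog is modelled as a relvar with one
# tuple (name,) per database relvar; deleting catalog tuples drops the relvars
# they describe, so no separate DROP operator is needed.
class Database:
    def __init__(self):
        self.relvars = {}   # relvar name -> current relation value (a set of tuples)

    @property
    def catalog(self):
        """Current value of the catalog relvar: one tuple per database relvar."""
        return {(name,) for name in self.relvars}

    def delete_from_catalog(self, pred):
        """DELETE on the catalog: drops every relvar whose catalog tuple satisfies pred."""
        for (name,) in [t for t in self.catalog if pred(t)]:
            del self.relvars[name]

db = Database()
db.relvars.update({"S": set(), "P": set(), "SP": set()})
db.delete_from_catalog(lambda t: t[0] == "SP")   # what SQL spells DROP TABLE SP
print(sorted(db.catalog))                         # [('P',), ('S',)]
```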
3.2 Introduction to predicate logical models
A model M for a first order
predicate logical language L is a
pair (D, I) such that:
·
D represents the domain of discourse of the model M. This is the set of objects which can be bound to variables in L. In relational systems, objects in the
domain of discourse may be viewed as domain values. In relational systems,
domains partition the domain of discourse into a set of disjoint subsets,
such that the union of all domain values in an RM database is exactly
equal to the set of objects in D. HD: Agreed.
·
I represents the interpretation function of the model M. Since I is a mathematical function it by
definition has a domain and a co-domain, denoted dom(I) and codom(I) respectively. HD: Agreed, though reference to the concept as a function is
a new idea for me.
In first order predicate logic each object d in the domain of discourse D
has an associated constant c in
the predicate logical language L which
represents it in the language L. Using the interpretation function I, each predicate P of arity n in the
predicate logical language L assigns
the property represented by P to a
set of n tuples HD: n-tuples?
{t1, ..., tk}, where each ti (1 ≤ i ≤ k)
can be written as ti = (d1, ..., dn), where each dj is an object in the domain of discourse D. As an example, let us consider a model
M for a predicate logical language L with constants {a, b, c, Mark, Jane} and predicates {odd, love, miss, rich}. In this example the domain of discourse D of M
is D = {1, 2, 3, "Mark", "Jane"}, while an example interpretation function I for M is presented in the following table.
dom(I) | codom(I)
a | 1
b | 2
c | 3
Mark | "Mark"
Jane | "Jane"
love | {("Jane", "Jane"), ("Jane", "Mark")}
miss | {("Mark", "Jane")}
rich | {"Jane", "Mark"}
odd | {1, 3}
This example represents statements like:
·
Jane loves both herself and Mark
·
Mark misses Jane
·
Jane and Mark are both rich
HD: And,
crucially, what statements are represented by the first five lines in the
table?
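The example model can be written down directly as data, which also shows how the interpretation function is used to decide atomic statements such as "Jane loves Mark". This is a minimal sketch under representational choices of our own: I is a Python dict, and unary extensions are written as sets of 1-tuples where the table lists bare elements.

```python
# The example model M = (D, I). I is a dict from the constants of L (dom(I)) to what
# they denote (codom(I)). Unary extensions are written as sets of 1-tuples so that
# every predicate is handled uniformly; the table lists bare elements instead.
D = {1, 2, 3, "Mark", "Jane"}
I = {
    "a": 1, "b": 2, "c": 3, "Mark": "Mark", "Jane": "Jane",
    "love": {("Jane", "Jane"), ("Jane", "Mark")},
    "miss": {("Mark", "Jane")},
    "rich": {("Jane",), ("Mark",)},
    "odd": {(1,), (3,)},
}

def holds(pred, *consts):
    """Truth value in M of the atomic sentence pred(c1, ..., cn)."""
    args = tuple(I[c] for c in consts)   # interpret each object constant as an element of D
    assert all(d in D for d in args)
    return args in I[pred]               # is that n-tuple in the predicate's extension?

print(holds("love", "Jane", "Mark"))   # True:  Jane loves Mark
print(holds("love", "Mark", "Jane"))   # False: Mark does not love Jane
print(holds("odd", "c"))               # True:  c denotes 3, which is odd
```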
The identity of the predicate love is captured by the predicate constant love, which, in the example above,
appears in the domain of the interpretation function I. The propositional value, or
the value, of the predicate love in this example is the set of
tuples {("Jane", "Jane"), ("Jane", "Mark")}. When The Third Manifesto
speaks of the relation value it is referring to the propositional value of a
predicate. HD: Correct. In this example love and
miss are binary predicates, so the
interpretation function I maps them
to sets of binary tuples. The interpretation function I maps the constants in the language L to elements of D and
unary predicates are mapped to subsets of D. The information contained in the
interpretation function of a predicate logical model can be viewed as the
predicate logical equivalent of a database. Relational algebra can thus be
viewed as an algebra defining operations on a subset of the co-domain of
interpretation functions of predicate logical models, more specifically relational algebra defines a number of operations
on the propositional value of predicates.
HD: We
would not dispute any of this, even if we don't use such terminology ourselves.
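The remark that relational algebra operates on the propositional values of predicates can be illustrated with the extensions from the example table. In this sketch the operator names are our own, and tuples are positional, as in the predicate-logic presentation above, rather than sets of named attributes as in The Third Manifesto.

```python
# The propositional values (extensions) from the example, treated as relations.
# Tuples are positional here, as in the predicate-logic presentation above, rather
# than sets of named attributes as in The Third Manifesto.
love = {("Jane", "Jane"), ("Jane", "Mark")}   # (lover, beloved)
miss = {("Mark", "Jane")}                     # (misser, missed)
rich = {("Jane",), ("Mark",)}

def restrict(rel, cond):
    return {t for t in rel if cond(t)}

def project(rel, *positions):
    return {tuple(t[i] for i in positions) for t in rel}

# Whom does Jane love?  Restrict love to lover = "Jane", then project on beloved.
loved_by_jane = project(restrict(love, lambda t: t[0] == "Jane"), 1)

# Who misses someone who loves them?  Join miss with love, then project on the misser.
missers_loved_back = {(m[0],) for m in miss for l in love if m[1] == l[0] and m[0] == l[1]}

# Who is loved by Jane and also rich?  Intersection of two unary relations.
print(sorted(loved_by_jane & rich))   # [('Jane',), ('Mark',)]
print(missers_loved_back)             # {('Mark',)}
```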
3.3 Why is identity deemed a necessity?
The predicate logical language L in
the previous section was based on the object constants {a, b, c, Mark, Jane} and the predicate constants {odd, love,
miss, rich}. Now please notice that the codomain of the interpretation function I in the previous example contains no appearances of either object
constants or predicate constants. Put another way, there are no appearances of
elements of dom(I) in codom(I). The reason for this is quite
simply that first-order predicate logic[17] does not allow
object constants and predicate constants to be part of the domain of discourse D. HD: Agreed. As far as my understanding reaches, Codd's information principle is, at least in
spirit, referring to this fact. HD: I never thought of it that way.
When value substitution is not enough
Now please consider a modified interpretation function as an extension of the
previous example. This example will attempt to illustrate that by allowing
so-called predicate constants to appear in the co-domain of the interpretation
function, more sophisticated HD: i.e., second-order? and higher? logical
statements can be made[18].
dom(I) | codom(I)
a | 1
b | 2
c | 3
Mark | "Mark"
Jane | "Jane"
love | {("Jane", miss), ("Mark", love)}
miss | {("Mark", "Jane")}
rich | {"Jane", "Mark"}
odd | {1, 3}
In this example the predicate love is
used to make the statement that Mark
loves to love and also the statement Jane
loves to miss. Notice that it would be incorrect to substitute {("Mark", "Jane")}, which is the propositional value of the miss relation, for the predicate
constant miss in this example. Such a
substitution would represent the claim Jane
loves the set {("Mark", "Jane")}, which is clearly a
different statement from the statement Jane
loves to miss. HD: Agreed, but we don't see where this is going. We are still in first order.
Of course one could argue that such expressiveness is not necessary. HD: We wouldn't
argue that way, because you get it anyway with relational completeness. Or do we misunderstand something? This,
however, does not seem prudent when the purpose is to define a foundation for future databases. By allowing predicate
constants into the domain of discourse it now becomes possible to ask questions
like: What does Mark love to do? Or: Select all the people who like to love
people or miss people. And also: What
do people love to do? I would hope that models for future databases, however
they are called, would at least define operators that allow the
manipulation of, and access to, objects in the domain dom(I) of the interpretation function I. It is also desired that predicate constants be added to the
domain of discourse.[19]
HD:
Again,
surely this "expressiveness" comes with
relational completeness. The terms
presented to us as standing for predicate constants don't seem to have anything
special about them. If the system is
aware that they stand for predicate constants, then we need to know what
operators are envisaged to operate on values of type PREDICATE_CONSTANT.
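HD's remark that this "expressiveness" comes with relational completeness can be illustrated by treating the terms presented as predicate constants as ordinary values (here, strings) in an ordinary binary relation. The sketch below, whose relation names are our own, answers the questions quoted above with restriction and projection alone, and also shows why substituting the value of miss for its name would state something different.

```python
# A first-order encoding of the modified example: the names "love" and "miss" are
# ordinary values (strings), and LOVES_TO is an ordinary binary relation.
LOVES_TO = {("Jane", "miss"), ("Mark", "love")}   # (person, activity name)

# The propositional value each name currently stands for.
EXTENSION = {
    "love": {("Jane", "Jane"), ("Jane", "Mark")},
    "miss": {("Mark", "Jane")},
}

# What does Mark love to do?  Restriction followed by projection.
mark_loves_to = {act for (p, act) in LOVES_TO if p == "Mark"}

# Who likes to love people or miss people?
lovers_or_missers = {p for (p, act) in LOVES_TO if act in {"love", "miss"}}

# What do people love to do?  Projection on the second attribute.
activities = {act for (_, act) in LOVES_TO}

# Substituting the value of "miss" for its name yields a different statement
# ("Jane loves the set {("Mark", "Jane")}"), which is not in LOVES_TO.
substituted = ("Jane", frozenset(EXTENSION["miss"]))

print(mark_loves_to, substituted in LOVES_TO)   # {'love'} False
```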
Generic data-mining applications which search for "trends" in
databases, generic business repositories, generic database applications, which
automatically generate user interfaces allowing user friendly access to
databases, intelligent agents which master the art of speech, etc. are examples
of applications which would benefit from this.
3.4 Summary
The fact that The Third Manifesto rejects constants representing the
identity of objects in databases is in my opinion a logical error and as a
consequence a big mistake. This rejection of identity is a logical error on the
following counts: HD: Gittens has failed to explain what
"constants representing the identity of objects" are, and why he
thinks we reject them.
·
The
Third Manifesto rejects identity on
grounds which are not relevant in mathematical logic[20] HD: We do not
"equate identity to pointers".
We equate object identifiers (as found in languages like Java and C++)
to pointers.
·
Key
concepts like relation variables and candidate keys are not recognized within
the relational algebra of The Third Manifesto. Since these concepts are,
according to The Third Manifesto, required
in future databases, it is an
error to not give them a sound mathematical foundation[21]. HD: It doesn't make sense to us for an algebra to
"recognize" variables, nor does it make sense for a relational
algebra in particular to "recognize" candidate keys. By the way, it is not clear what Gittens
means by the relational algebra of The Third Manifesto. RM Pre 18 requires "the usual operators
of the relational algebra (or some logical equivalent thereof)" and lists
some specific operators that are required to be supported "without
excessive circumlocution".
·
In the
definition of a tuple value, it is evident that tuple values include an object
identifier called an attribute name. HD: Now it is
clear that what Gittens means by object identifier is not what we mean! Contrary
to what is claimed by Date and Darwen[22],
the value of a triple, or tuple with an arity of three, representing an
attribute does not define its identity HD: We do not
refer to the concept of defining something's identity. The object
identifier attribute name defines the
identity of this triple in a tuple because it is the attribute name that must be unique.[23] HD: Agreed that the attribute name uniquely identifies the
triple within a given tuple. Disagree
that this contradicts anything else we have written.
Adding insult to injury, the rejection of identity also limits the logical expressiveness of the algebra
upon which future databases might be based. HD: We deny that we are "guilty"
of rejecting "identity". This opinion has been
substantiated by illustrating the correspondence between relational and
predicate logical knowledge representation models. In terms of relational
database systems the following suggestions are made in this regard:
·
Allow
for a properly typed equivalent of predicate constants, representing the
identity of a predicate. HD: Consider the
predicate a + b = c. According to my recent reading on the
subject, this predicate contains an appearance of the predicate constant = and
also an appearance of the function constant +. I see no predicate constant representing the identity of the
predicate. That said, I think it's a
nice idea to consider a relvar name as being a special case of a predicate
constant, and in this special case the predicate constant can perhaps be
considered to represent the identity of the predicate. But if we want special operators for operating
on relvar names, and I'm right in guessing that such operators can't be
extended to apply to predicate constants in general, why not just call the
operands relvar names? Properly typed object identifiers serve this
purpose well. HD: We reject OO-style object identifiers for
the reasons given in the book. In any
case, I don't see how every oid can be considered to represent the identity of
a predicate. In Java, the invocation
"new point(1.0, 1.0)" returns the oid of a point object (variable)
that has been initialized to the indicated value. In what sense is that oid a predicate constant?
·
Allow
for operators which provide access to, and the manipulation of, the equivalent
of the domain of predicate logical interpretation functions[24]. HD: We subdivide this domain into subsets called types. The operators defined for values and
variables of those types provide "access to" every value in each type
and hence every element in the domain of discourse.
tuple values
HD: We do not understand the purpose of this section. It seems to lead nowhere.
We do not understand "expressive equivalence".
4.1 Introduction
This section will show that every tuple value has a corresponding
representation as a relation value. Conversely every relation value will be
shown to have a corresponding representation as a tuple value. This exercise
will be performed using liberties allowed by The Third Manifesto.
HD: The demonstration
that follows is merely a feat of prestidigitation. We reject it because, even if we accept its validity, it is only
a demonstration of structural similarity.
For language design purposes we distinguish things by their
"behaviour" (i.e., the operators defined for them), not their
perceived structure. We define
structure in order to be able to define operators.
4.2 Defining tuple values
Let us consider tuple values and relation values as they are defined in
chapter 3 of The Third Manifesto. A tuple t
is defined as a set of ordered triples (I,
T, V), called attributes, such that:
·
I is an identifier called the
name of an attribute. No two
attributes in t share a common name.
·
T is an identifier
representing the type of an attribute.
·
V is a value of type T,
called the attribute value.
The set of pairs obtained by eliminating the attribute value from triples
in t is called the heading of t. The heading of a tuple t will
be denoted heading(t). When the purpose is to show that
relation values and tuple values are basically appearances of one and the same
thing, one is inclined to demonstrate that any relation value can also be
represented by a set of triples. So, please read on...
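The definition of a tuple given above can be sketched directly. In this minimal sketch, Python type objects stand in for the types T (an assumption for illustration only), the constructor enforces that no two attributes share a name, and the heading is obtained by dropping the values.

```python
# A tuple as a set of (I, T, V) triples: attribute name, type, value. Python type
# objects stand in for TTM types purely for illustration.
def make_tuple(*triples):
    names = [i for (i, _, _) in triples]
    if len(names) != len(set(names)):
        raise ValueError("no two attributes of a tuple may share a name")
    for (i, t, v) in triples:
        if not isinstance(v, t):
            raise TypeError(f"value of attribute {i} is not of type {t.__name__}")
    return frozenset(triples)

def heading(tup):
    """The set of (name, type) pairs left when attribute values are removed."""
    return frozenset((i, t) for (i, t, _) in tup)

s1 = make_tuple(("SNO", str, "S1"), ("STATUS", int, 20))
print(sorted(name for (name, _) in heading(s1)))   # ['SNO', 'STATUS']
```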
4.3 Defining relation values
The Third Manifesto defines a relation r as a pair (h, b) where:
·
h represents the heading of r. The heading h is defined to be a tuple heading.
·
b represents the body of r. b is
a set of tuples all conforming to the heading h.
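The pair (h, b) can likewise be sketched, under our own representational choices: the heading is a set of (name, type) pairs, and the body is a set, so a duplicate row supplied to the constructor simply collapses into the tuple already present.

```python
# A relation as a pair (h, b). h is a heading, a set of (name, type) pairs; b is a set
# of tuples, each tuple a frozenset of (name, value) pairs that must conform to h.
def conforms(tup, h):
    types = dict(h)
    return set(dict(tup)) == set(types) and all(isinstance(v, types[n]) for n, v in tup)

def make_relation(h, rows):
    h = frozenset(h)
    body = frozenset(frozenset(row.items()) for row in rows)   # a set: duplicates collapse
    if not all(conforms(t, h) for t in body):
        raise ValueError("every tuple in the body must conform to the heading")
    return (h, body)

h = {("SNO", str), ("CITY", str)}
r = make_relation(h, [{"SNO": "S1", "CITY": "London"},
                      {"SNO": "S1", "CITY": "London"},   # the same tuple again
                      {"SNO": "S2", "CITY": "Paris"}])
print(len(r[1]))   # 2
```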
In the following it will be demonstrated that every relation value[25] can be represented
by a mathematically equivalent tuple value.[26]
4.4 Showing that all relation values are tuple values
The purpose of this section is to illustrate that, by the liberties
provided by The Third Manifesto, all relation values are tuple values. Let r = (h,
b) be a relation value with heading h and
body b. Since tuple values are sets of ordered triples it becomes necessary to
demonstrate that all relation values are similarly representable as sets of
ordered triples. The body b of the
relation r will now be defined as a
set ts of ordered triples (I, T, V), such that:
·
I is an identifier called an object
identifier HD:
This construct appears to have no counterpart in The Third Manifesto. Therefore we reject it.
·
T is an identifier representing name of a type.
·
V is a value of type T.
To ensure that this set of triples ts forms a valid tuple, the following conditions must hold for all triples (I, T, V) in ts:
·
no two
object identifiers in ts are equal
·
the type
T is defined to have the same type as
the heading h of r, which is to say: T = heading(r).
·
V is a tuple value of type T.
HD: This
3-part definition does not accord with The Third Manifesto and we reject
it out of hand. It permits two or more
tuples to have the same V, contrary
to RM Proscription 3. Furthermore, the
first listed component, I, is
something we do not recognize and would in any case be redundant.
When these conditions hold, ts will
be a valid tuple, containing the same information found in r. Can this exercise not be performed for any and every relation
value? HD: No, not for any of them!
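HD's objection that the construction permits two or more tuples to have the same V can be checked directly. In the sketch below, where the representation of identifiers and types is our own assumption, a set of triples satisfies every listed condition yet carries the same tuple value twice; it therefore encodes a bag rather than a relation body, contrary to RM Proscription 3.

```python
# Gittens's proposed encoding of a relation body as a set ts of (I, T, V) triples,
# with I an "object identifier" (here just an int) and V a tuple value.
# Tuples V are frozensets of (attribute name, value) pairs.
v = frozenset({("SNO", "S1"), ("CITY", "London")})
T = "TUPLE {SNO, CITY}"                      # stand-in for the tuple type heading(r)

ts = {(1, T, v), (2, T, v)}                  # distinct identifiers, same V
# The listed condition that no two identifiers are equal is satisfied...
assert len({i for (i, _, _) in ts}) == len(ts)
# ...yet ts holds two triples for one tuple, which no relation body can do.
body = {V for (_, _, V) in ts}
print(len(ts), len(body))                    # 2 1
```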
4.5 Showing that all tuple values are relation values
Let t be a tuple value the
heading of which is denoted heading(t). A relation value r = (heading(t), t)
is quickly recognized as a relation value which is in no logically significant
way different from t. HD: We disagree. r
is something that can be operated on by the relational restriction operator,
whereas t is not. Furthermore, the union of tuples t1 and t2 is called "join" and is defined to return a tuple whose
heading is possibly different from that of t1,
that of t2, or both; the union of
relations r1 and r2 is called "union" (not "join") and is
defined to return a relation whose heading is the same as that of both r1 and r2.
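The contrast HD draws between the two "unions" can be sketched as follows (the representation and function names are our own choices): tuple union, i.e., tuple join, may yield a heading that differs from both operands' headings, whereas relation union is defined only for relations of the same heading and preserves it.

```python
# Tuples as dicts (attribute name -> value). Relations as (heading, body) pairs, with the
# heading a frozenset of attribute names and the body a set of frozensets of
# (name, value) pairs. Types are omitted for brevity.
def tuple_join(t1, t2):
    """Tuple union ("join"): defined when t1 and t2 agree on their common attributes;
    the result's heading is the union of both headings."""
    if any(t1[a] != t2[a] for a in t1.keys() & t2.keys()):
        raise ValueError("tuples disagree on a common attribute")
    return {**t1, **t2}

def relation_union(r1, r2):
    """Relation union: defined only for relations with the same heading, which the
    result keeps."""
    (h1, b1), (h2, b2) = r1, r2
    if h1 != h2:
        raise ValueError("relation union requires identical headings")
    return (h1, b1 | b2)

t = tuple_join({"SNO": "S1"}, {"CITY": "London"})
print(sorted(t))   # ['CITY', 'SNO']: different from both operands' headings

h = frozenset({"SNO"})
r = relation_union((h, {frozenset({("SNO", "S1")})}), (h, {frozenset({("SNO", "S2")})}))
print(r[0] == h, len(r[1]))   # True 2
```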
4.6 Summary
This section illustrated that tuples and relations as defined by The
Third Manifesto are appearances of one and the same thing. This implies that
from a logical point of view, only one of the two concepts is a necessity.
Noticing that user-defined types, in principle, allow types of arbitrary
complexity supporting a diverse set of operators, domains, HD: No, domains are not operators. Or is this some kind of typographical error?
as defined by the Third Manifesto would seem to be the most general of the
types supported by The Third Manifesto. Domains have been equated to object
classes by The Third Manifesto; it is interesting to contemplate the logical
implications of these findings. HD: We do not
understand the point being made by this final sentence. As the section is headed
"Summary", it should refer to something written in the preceding
subsections of Section 4. If instead it
is referring to a candidate area for subsequent investigation, it should say
so. In any case, we would say that much
of the material in the book is the result of our contemplation of the logical
consequences in question. We should
add, though, that "equate", if meant literally, is too strong. The classes of Java and C++ are
OO counterparts of our types but are not logically equivalent to them.
With regard to the subject matter of this article the following
conclusions are drawn:
·
The
Third Manifesto, has provided no logically valid substantiation for the claim
that the alleged first great blunder is
indeed a blunder. HD: We would agree with this claim if it can
be shown that our substantiation includes a contradiction. We have not been shown anything written in The
Third Manifesto that we would accept as a contradiction. If somebody who properly understands our
substantiation wishes to claim that it is weak, we would merely disagree with
that person on that particular point and there is not much more to be said. If somebody who seems not to properly
understand our substantiation claims it to be weak, we would try to point out the
apparent misunderstandings and ask that person to reconsider.
·
The
Third Manifesto, has provided no logically valid substantiation for the claim
that the alleged second great blunder is
indeed a blunder. HD: Our response to the first bullet can stand for this one too.
·
From the
perspective of the relational algebra presented in The Third Manifesto, the requirement that each relation variable
must have at least one candidate key, is an arbitrary one. HD: First, that each relvar has at least one candidate key
is not a requirement; it is an observed property of every relation, and
therefore of every relation that might ever be the value of a relvar (which is
what we mean by the candidate key of a relvar). Anything that does not have a candidate key is not a relvar! Second, as already noted, candidate keys
have nothing to do with relational algebra.
Third, this conclusion appears to come right out of the blue—it does not
seem to follow from anything that has been written elsewhere in the paper.
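HD's point that having at least one candidate key is an observed property of every relation can be checked mechanically on any relation value. The sketch below, whose representation and names are our own, finds the irreducible unique attribute subsets of a given value; because the body is a set, the full heading is always unique, so the search can never come back empty.

```python
from itertools import combinations

# A relation value: a heading (attribute names) and a body (set of tuples, each a
# frozenset of (name, value) pairs).
heading = ("SNO", "PNO", "QTY")
body = {
    frozenset({("SNO", "S1"), ("PNO", "P1"), ("QTY", 300)}),
    frozenset({("SNO", "S1"), ("PNO", "P2"), ("QTY", 200)}),
    frozenset({("SNO", "S2"), ("PNO", "P1"), ("QTY", 300)}),
}

def is_unique(attrs, body):
    """Do the tuples of body all differ in their values for attrs?"""
    projected = {tuple(dict(t)[a] for a in attrs) for t in body}
    return len(projected) == len(body)

def candidate_keys(heading, body):
    """Irreducible unique subsets of the heading, as observed in this one value."""
    keys = []
    for size in range(len(heading) + 1):
        for attrs in combinations(heading, size):
            if is_unique(attrs, body) and not any(set(k) <= set(attrs) for k in keys):
                keys.append(attrs)
    return keys

print(candidate_keys(heading, body))   # [('SNO', 'PNO'), ('SNO', 'QTY')]
```

Note that {SNO, QTY} is a candidate key of this particular value only by accident; a candidate key of a relvar is a separate matter, namely a constraint on the values that may be assigned to it.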
·
In the
relational algebra of the Third Manifesto identity is rejected by Date and
Darwen, while it is reified, as a requirement,
in the form of candidate keys in a context foreign to this relational algebra.
This fact makes the rejection of identity in relational algebra an arbitrary
one. HD: Candidate keys certainly do not represent
a reification of identity. A candidate
key of a relation is an observed property of that relation (like parity being
an observed property of an integer) and a candidate key of a relvar is a
constraint, restricting the values that might be assigned to that relvar.
·
Domains,
which have been equated to object classes by The Third Manifesto, have been
established to represent a more general class of types than relation
types. HD: We call them types, not domains. The set of all types is obviously a proper
superset of the set of all relation types; is it really worth mentioning as a
conclusion? If so, why? (It is unfortunate that Gittens keeps on
using the term "domains" for what The Third Manifesto calls
"types". In Section 3, he
uses the terms "domain" and "co-domain" in their usual
mathematical sense, which is not exactly the sense intended by E.F. Codd when
he used the term in his Relational Model of Data.)
·
By
rejecting identity in the algebra of future database systems The Third
Manifesto also limits the logical expressiveness of future databases. HD: We disagree without further comment, for reasons already
given.
References
[1] C.J. Date, Hugh Darwen [2000] Foundation for Future Database Systems: The Third Manifesto, Addison-Wesley Publishing Company.
[2] J. van Eijck, E. Thijsse [1989] Logica voor alfa's en informatici, Academic Service.
[3] Hugh Darwen [2002] "Gittens000.pdf", a personal communication.
[4] Maurice Gittens [2002] An anatomy of knowledge representation and a theory of meaning, a document available at http://www.gittens.nl
[1] This paper will refer to this book as "The Third Manifesto".
[2] I invite the reader to verify for herself or himself that The Third Manifesto provides no logically valid arguments to justify labeling the presented propositions as "blunders".
[3] A logically valid premise is one that is substantiated in terms of mathematical logic.
[4] or layering violations, if you prefer.
[5] Also, much of the argumentation that supports this claim is in this section of the book.
[6] as in mathematically equivalent abstractions
[7] I will gracefully acknowledge being in error if such arguments do exist.
[8] on page 14 of the second edition of The Third Manifesto
[9] There seems to be some confusion in which object IDs, references and pointers are taken to represent one and the same thing. It should, for example, be recognized that identity is a property of every element of a mathematical set. Confusing the identity of an object with the notion of a pointer is a logical error.
[10] The reference to the logical value of an argument refers to the degree to which an argument can be used to draw logically valid conclusions.
[11] Similarly, symbols in mathematical strings also have identity. In the string "aaa" there are three instances of the symbol a, each with its own identity. The fact that the symbols are all the same is not relevant to their identity in this string.
[12] Of course, based on the identity of objects, pointers can distinguish between them. But this does not equate pointers to identity.
[13] In this regard a formalism f1 is said to be more expressive than a formalism f2 when the set of statements that can be represented using f1 is a superset of the statements that can be represented using f2.
[14] In the context of databases I would suggest this operator be used for quantifying objects which are elements of the domain of predicate logical interpretation functions.
[15] and thus appealing to the goodwill of the reader
[16] The following section will elaborate, so that the distinction becomes clear
[17] The same is true for higher order predicate logic
[18] A superset of higher-order logic, called extensional type logic, is based on allowing predicate constants in the domain of discourse.
[19] This is to say that in my opinion future databases should be firmly based on extensional type or intensional type logic, at least by supplying the necessary primitives that allow extensional and intensional phenomena to be captured.
[20] For some reason unknown to the author, Date and Darwen equate identity to pointers.
[21] Otherwise, many could claim that The Third Manifesto judges object-oriented systems by different standards from relational ones.
[22] See quote in section 2.2
[23] Also, please see the next section for an illustration that, given the liberties provided by The Third Manifesto, tuple values and relation values are appearances of one and the same thing
[24] No, the catalog of commercial relational databases does not get it right. Have you ever noticed that, given the operators of relational algebra, it is impossible to perform a trivial operation like selecting every tuple in a relational database?
[25] under the definition of relation values dictated by The Third Manifesto
[26] under the definition of tuple values dictated by The Third Manifesto