Relational Tables and Databases were invented in the 1970s. They have a poweful and well-founded mathematical Basis and a high Maturity. They are used in every Sector of Information Technologie, from Mainframe down to Mobile Phone.
The Basics are easy to understand. Mastering them yields a better Understanding for any Software-System.
In Spoc-Web this Theory is seamlessly connected to the new Ideas of the semantic Web. This Combination of proven and novel Technologies yield a System, that is both extremely flexible and highly performant.
Tables are 2-dimensional Data Stores, consisting of Columns and Rows.
Every Row represents an Object/ Thing with their own Identity.
The Columns each contain Data / Values of the same Type with same Meaning.
Allowed Data-Types are:
Especially these References allow for the Flexibility and Power of the relational Modell and contribute to its great Success.
Rows and Columns form the so-called "Cells" at their Crossings. The Cell Content should be very "atomic", for both the User and the Database can process the "Inner" of Cells only with higher Effort. This means for Example, that You should create separate Columns/Cells for first and last Name, if You want to use them separately.
ID | Column 2 | ... | Column N |
---|---|---|---|
1 | Row 1 | ... | ... |
... | ... | ||
m | Row m | ... | ... |
ID |
FirstName |
LastName |
Address |
City |
State |
ZIP |
---|---|---|---|---|---|---|
3 |
John |
Doe |
569 Peach St. |
Martin |
TN |
38237 |
6 |
Jane |
Doe |
569 Peach St. |
Martin |
TN |
38237 |
8 |
Jack |
Doe |
569 Peach St. |
Martin |
TN |
38237 |
9 |
Jill |
Doe |
569 Peach St. |
Martin |
TN | 38237 |
When describing Objects we explained, how important a unique Name is. The same is true for Tables: the unique Name is called "Key" there. Tables usually have at least two disjoint Keys: a technical ID (usually an integer Number counting up) and a unique Combination of Domain Columns.
Every Row should have a single, unique Name, that allows to distinguish if from all other Rows. A specific Column is created for this Purpose: the so-called "ID" or "Identity" Column. Usually an
integer Number is created for this Purpose and counted up for each new Row. It is paramount, that this ID is NEVER changed. It must also NOT be re-used for other Rows, should the
original Row be deleted. This is why a Number is so well suited, because there should be no Need to whimsily change a Number. Usually these Numbers are not published either, so there is no desire
to change it too.
When these Numbers are published though, like Account Numbers, these are usually started with at least six digits, so that Customers don't ask to change them.
Apart from the technical Key there is at least one unique "natural Key" consisting of one or more Columns that allow to uniquely identify a Row. Because the technical Key is usually not published, the natural Key is needed to distinguish the Rows without resorting to the ID, using only publicly available Properties. With People a common Combination is: First and Last Name together with Birth Date and Place.
The Choice of Columns fundamentally depends on the Business Domain, especially on which Information is actually available.
It shows that the Correspondence between relational Columns to the Classes described here is complete and NOT accidental. The following Table compares the structural Elements of relational Tables and the semantic Conzepts of Individuals and Collections.
Table | Row | Cell | Column | Reference | Reference-Column | Key | ID |
Semantic | Thing | Property | Predicate | Connection | Relation | Name | ID |
Relational Databases always use several Tables to store their Data. This is different from simple, tabular Lists and Spreadsheet Applications that use only a single Table. References within these Tables correspond to Connections between the Row-Objects. These References allow to avoid Duplication of Data, reducing the Load, but even more important: modelling Reality better, because real- World Objects are never duplicated.
In the Example above not only the Adresses is repeated, but also the Family Name.
Introducing a new Table called "Family" eliminates repeating Data and reflects the Fact that Families usually live together at the same Address:
Normalized-Table:
ID |
FirstName |
FamilyId |
---|---|---|
3 |
John |
4 |
6 |
Jane |
4 |
8 |
Jack |
4 |
9 |
Jill |
4 |
The Table to the left repeats the one from avove, but in a normalized Form: Family and Address are given by the Reference to Family#4 using the FamilyId Column.
The Data are uuniquely stored there, next to those from other Families.
Associated Family-Tabelle:
ID |
LastName |
Address |
City |
State |
ZIP |
---|---|---|---|---|---|
... |
... |
... |
... |
... |
... |
4 |
Doe |
569 Peach St. |
Martin |
TN |
38237 |
... | ... | ... | ... | ... | ... |
As You can see, Address and Family Name are repeated only once. This has several Advantages:
It may seem as if this are Problems of human Operations, that should not appear with Software, but exactly this happens, when the Database-Structures don't reflect the real Situation (any more). The Liste of Examples where Software-Users are forced to repeat or re-enter Data, because of wrong or outdated Structures, is endless.
As shown in the Chapter about Normalization, a Relation from Table A to Table B is created ba adding a Column to A (e.g. "B_ID"), that uses ("references") the unique Values of Column B.ID. Since the unique (1) Values from B.ID can be used in in any Number (N) of Rows of Table A, we have a N (many A) to 1(one B) Relation from A to B.
Vice versa we also have the so called "inverse" 1:N Relation from Table B to Table A, which is NOT separately maintained in Spoc-Web. This Change in Direction/Perspective corresponds to the Change between active and passive Voice: Subject and Object switch Positions and Roles, the Verb changes between active and passive Conjugation.
By Default, all Relations can be traced down to 1:N Types. Despite this, the following special Cases should be considered:
Pure N:M Relations are rather rare. Very often during Development more Data appears that need to be stored for the Relationship, like e.g. the Duration of the Participation in a Course. These Values can only be stored in Table P.
As soon as these additional Data also contain Relations to the Link Record, References to Rows of Table P must be enabled. So P receives a techncal ID Column, promoting it to a full Table/Class and giving its Rows unique Identities.
Spalten have a lot of Properties. Apart from the Data Type, especially the Constraints determine the Role of the Column. These Constraints are described in the following Sections.
Relational Databases have a special Value "NULL", which can be set to demonstrate that the Cell is not filled or the Value is unknown. NULL can be used in Columns of any Data Type. NULL is unequal to any other Value including itself, i.e. NULL <> NULL, different from Programming Languages. Comparing with NULL always yields "false", because You cannot tell with unknown Values. Actually this would require ternary Logic, that allows for NULL, in addition to true and false. Instead, SQL offers special Operators: IS NULL, IS NOT NULL and the Function isNull().
Not-Null-Constraints ensure, that there always is a Value entered into this Column. This is usually the Case for Key Columns and for Fields and Relations that are mandatory due to Business Reasons.
Unique Constraints make sure, that no duplicate Values are entered into the specified Column(s).
Uniqueness should be configured for every Key:
According to the SQL Standards, You can define Unique Constraints with Nullable Columns. Rows with NULL-Values are exempt from Uniqueness, because comparing with NULL always yields false (see above). SQL Server (Microsoft) only allows for a single Row with NULL, except You activate SET ANSI_NULL ON. Alternatively You can define a "filtered Unique Constraint" that ignores Rows with NULL Values.
Eindeutigkeit sollte für jeden Schlüssel eingestellt werden, sowohl für den technischen wie auch für alle fachlichen Schlüssel. Gemäß SQL Standard kann man Unique Constraints auch auf Nullable Spalten definieren. SQL Server (Microsoft) erlaubt allerdings nur eine Zeile mit NULL, außer man aktiviert SET ANSI_NULL ON. Alternativ kann man dort auch einen "filtered Unique Constraint" setzen.
Foreign Key Constraints make sure, that References from this Column always exist. Defined on the Foreign Key Column, they only allow for NULL or Values, that exist in the referenced ID-Column.
Vice versa, Foreign Key Constraints also prevent accidental Deletion or ID-Modification of Rows while there are Cells pointing to them. Instead of preventing Deletion or Modification the followind Behaviors can be configured:
Behavior | Result |
CASCADE |
|
SET DEFAULT | resets the referencing Cells to the Default-Values defined on their Columns |
SET NULL | resets the referencing Cells to NULL |
prevents Update and Deletion |
Constraints on Foreign Key Columns are very important: They essentially formen the Cardinalities of the allowed Relations.
The Tabelle to the right shows the different Combinations of Constraints for a Foreign-Key-Column and the emerging Types of Relations.
FK | Unique | Nullable | Relation |
X | X | X | 0/1 : 0/1 |
X | X | O | 1 : 0/1 |
X | O | X | 0/1 : N |
X | O | O | 1 : N |