From data model to tables
In the introductory chapter we also used the name relation instead of table. To be a relation, some extra conditions must hold.
； rows contain data about an entity
columns contain data about attributes of entities ；
； all entries in a column are of the same kind
； each column has a unique name
； cells of the table hold a single value
； the order of the columns is unimportant
； the order of the rows is unimportant
no two rows may be identical ；
2 Functional dependency
Consider s=f(t), z=g(x,y), total=count*price.
s depends on t, z depends on x and y, total depends on count and price. We say ； s is functionally dependent on t
； z is functionally dependent on x and y
； total is functionally dependent on count and price
and we write
This notation stands for the functional dependency and not the formula. What‟s to the left of the arrow is called a determinant. A determinant can be composite or not.
A functional dependency exists when the value of one or more attributes determines the value of another attribute (or other attributes).
Not every functional dependency can be expressed by an equation. In fact this holds true in most cases because if an attribute can be simply computed, it can be simply omitted. E.g. consider tables and chairs where tables always are white or black and chairs red or green. Based on the color we then know if we deal with a table or a chair. Thus
； color;kind (table or chair)
As another example consider (for books)
； ISBN;(Title, Author)
A key is a combination of one or more columns to identify particular rows from a table. If a key contains more than one column, it is called a composite key.
A candidate key is a determinant that determines all other columns.
Candidate keys identify precisely one row of a relation (table). Given the value of a candidate key, we can find one and only one row in the relation that has this value.
From data model to tables 1
Consider the table WHO_TEACHES from the introductory chapter.
SubjectCode GroupCode Professor
ALG 1BACA Hellings
ALG 1BACB Verraedt
ALG 1BACE Vanbeveren
ANA 1BACA Van Huyck
ANA 1BACB Van Huyck
ANA 1BACE Vanbeveren
A given value for a SubjectCode identifies a number of rows. A SubjectCode equal to ALG identifies the first three rows and SubjectCode ANA identifies the last three rows. (SubjectCode, GoupCode) is a candidate key. Given a value for SubjectCode (e.g. ALG) and a value for GroupCode (e.g. 1BACE) there is only one row containing those values for the columns given namely ALG, 1BACE, Vanbeveren.
During the design of the database one must chose one candidate key as primary key. The DBMS
will use this primary key. Each table has only one primary key (composite or not). Because no row of our database contains identical rows, each table has at least one candidate key namely the combination of all columns.
A surrogate key is an artificial column that is added to a table to serve as the primary key. This key has no meaning to the users and can be added automatically by the DBMS when the row is created. In case the primary key contains many columns, a surrogate key is recommended. A foreign key is a column or combination of columns that is the primary key of a table other than the one in which it appears. It represents a relation.
Figure 1 foreign key
Consider the situation of Figure 1. sporting club is the primary key of the second table. sportclub s added as foreign key to the first table. i
In most cases the values of the foreign key must correspond to an existing value of the primary key. This is a referential integrity constraint.
From data model to tables 2
4 From Entities and Relations to Tables
Each entity becomes a table and an identifier of an entity is chosen as primary key of the table.
Composite attributes of an entity must be split in several attributes. One can also transform a
composite attribute to an entity if this attribute plays a role on its own. Consider the entity EMPLOYEE.
Figure 2 entity EMPLOYEE
We then get the following table.
EMPLOYEE(EmployeeNr, Name, FirstName, Function, MonthSalary, BirthDate) After the name of the table follow inside parenthesis the column names and the primary key is
Consider the entity DEPARTMENT.
Figure 3 entity DEPARTMENT
This becomes a table
DEPARTMENT(DepartmentNr, Name, Location)
The relations from an ERD must also represented. We will add extra columns to the tables and/or
4.2 One to one relations
The primary key of one table is stored in another table as a foreign key. Consider the example EMPLOYEE is manager of/has as manager DEPARTMENT.
Figure 4 the relation "is manager of/has as manager"
Two possibilities arise.
； The primary key of the table DEPARTMENT (thus DepartmentNr) can be stored as foreign
key of the table EMPLOYEE. In this case the table becomes
EMPLOYEE(EmployeeNr, Name, FirstName, Function, MonthSalary, BirthDate, HeadOf)
From data model to tables 3
where the column HeadOf contains the DepartmentNr of the department of which he/she is
the head. We underlined the foreign key with a dotted line.
This choice is not that good because most of the employees are not the head of a department
and the column hence will contain a NULL value.
； The primary key of the table EMPLOYEE (thus EmployeeNr) can be stored as a foreign key
of the DEPARTMENT table. In this case the table becomes
, Name, Location, Head) where the column Head contains DEPARTMENT(DepartmentNr
the EmployeeNr of the head of the department.
An attribute of the relation gives rise to an extra column.
4.3 One to many relation: the general case
The primary key on the one side (the parent) is stored as foreign key of the table on the many side (the child).
Consider the example EMPLOYEE works in/has as employee DEPARTMENT
Figure 5 the relation "works in/has as employee"
DEPARTMENT is the parent and EMPLOYEE is the child. We store the primary key of DEPARTMENT (thus DepartmentNr) as foreign key of EMPLOYEE.
We then get
EMPLOYEE(EmployeeNr, Name, FirstName, Function, MonthSalary, BirthDate, DepartmentNr)
Then consider EMPLOYEE is manager of/has as manager EMPLOYEE. The same table is parent and child. The primary key of the parent (thus EmployeeNr) is stored once again in the table as foreign key. We must choose a good column name. We then get.
EMPLOYEE(EmployeeNr, Name, FirstName, Function, Manager, MonthSalary, BirthDate,
EmployeeNr and Manager both contain the number of an employee. EmployeeNr plays the role of primary key and Manager the role of foreign key (tells who is the manager of the employee given).
4.4 One to many relation: ID-dependent entity as a special case.
We consider the other one-to-many relations from the IEE-example. We have EMPLOYEE teaches/is taught by IMPLEMENTATION and IMPLEMENTATION is implementation of/has as implementation COURSE.
From data model to tables 4
Figure 6 two one-to-many relations
There are no problems handling the first relation. We just follow the general rule. The primary key of the parent (EmployeeNr from EMPLOYEE) is stored as a foreign key with name Prof to the child (IMPLEMENTATION).
The second relation is a special one. The entity IMPLEMENTATION namely is a ID-dependent that cannot exist without the existence of the entity COURSE. In this case the primary key of the parent (thus Code from COURSE, we will use the name Course) becomes part of the primary key of the child.
Hence we get
IMPLEMENTATION(Course, StartDate, Location, Prof).
4.5 Many to many relation
In the case of a many-to-many relation a new table is constructed and the primary keys of both tables involved are part of the primary key of the new table. The attributes of the relation become new columns of this table.
We consider the relation EMPLOYEE enrolls for/has as participator IMPLEMENTATION. From the above we already have the following tables.
EMPLOYEE(EmployeeNr, Name, FirstName, Function, Manager, MonthSalary, BirthDate,
IMPLEMENTATION(Course, StartDate, Location, Prof)
The relation also has an attribute Evaluation.
We construct a new table ENROLMENT
with key EmployeeNr (we will use CourseMember as name), Course, StartDate ENROLMENT(CourseMember, Course, Evaluation, StartDate).
As a second example we take the library example and the relation READER borrows/is borrowed by BOOK.
Based on the tables
READER(ReaderNumber, ReaderName, ReaderAddress, ReaderCity)
BOOK(BookNumber, Title, Author, ISBN)
we construct the new table BORROWING. We let ReaderNumber and BookNumber be part of the primary key. As a primary key we then can take e.g. the combination (ReaderNumber, BookNumber, BorrowDate). (ReaderNumber, BookNumber) is not sufficient because the same reader can borrow the book more than once. We then get
BORROWING(ReaderNumber, BookNumber, BorrowDate, ReturnDate)
Another possibility is to use a different ERD. In this ERD we add an extra BORROWING entity with a unique BorrowNumber, BorrowDate and ReturnDate. Then there will be a one to many relation between READER and BORROWING and a many to many relation between From data model to tables 5
BORROWING and BOOK. The attributes which previously belonged to the relation, disappear; they now are attributes of the new entity. The conversion of the many to many relation gives rise to a new table with as primary key the combination of BorrowNumber and BookNumber. 5 Normalization
It is important to construct a good data model; this means a simple data model without redundancy. Simple means no multivalued attributes and attributes that logically belong to an entity.
A good ER-diagram mostly gives rise to normalized tables. If not, tables need to be normalized. Normalization is a process that converts tables to other tables without anomalies. For an existing database the following holds: the higher the normal form (see below) of the tables, the better the database (less problems).
Consider the table PROF_SUB_POINT(Prof, Subject, Points)
Prof Subject Point
Hellings Algebra 4
Van Huyck Analysis 6
Verraedt Comp & Data 4
Vanbeveren Analysis 6
Verraedt Algebra 4
If we delete the row “Verraedt, Comp & Data, 4”, we lose not only the fact that Verraedt teaches the subject Comp & Data but also the fact that the subject Comp & Data has a weight of 4 points. This is called a deletion anomaly: deleting data of one entity (the fact that Verraedt teaches
Comp & Data) means that unintentionally we also delete data of another entity (the fact that Comp & Data counts for 4 points).
If we want to add that Chemistry has a weight of 8, this is only possible if we also have a prof who teaches Chemistry.
Data concerning a particular entity can be added only if we also have information about another entity. This is called an insertion anomaly.
In the table above some data are recorded more than once. This is the case for the points of a subject if there are more than one professor who teaches this subject.
We can solve the problem by splitting the table into 2 tables PROF_SUB(Prof, Subject) and
From data model to tables 6
Van Huyck Analysis
Verraedt Comp & Data
Comp & Data 4
For the two tables some referential integrity constraints can be added.
； e.g. each subject from the column Subject of SUBJECT_POINT must exist in the column
Subject of PROF_SUBJECT (each “existent” subject must be taught)
； e.g. each column Subject of PROF_SUBJECT must exist in the column Subject of
SUB_POINT (each subject that is taught must have a weight)
； both conditions must hold
The real problem arises from the fact that the original table contains data of two different subjects (entities). A normalized table only can have one subject.
Anomalies can be avoided by normalizing the tables. Several normal forms exist: the first, second, third, Boy-Codd, fourth and fifth normal form where each next one implies the previous one.
5.2 First and second normal form
A table is in first normal form if the table represents a relation. Hence all tables considered are in first normal form.
The previous example contains a dependency where only part of the key is needed. (Prof, Subject) is the key and Subject ;Point. Subject is a determinant and only a part of the key or Point is partly dependent on the key.
A table is in the second normal form if it is in the first normal form and all attributes not included in the key are dependent on the whole key.
The table can be transformed to tables in second normal form by splitting it as explained above. If the key is simple, the table automatically is in second normal form. A table only can be not in second normal form is it has a composite key.
From data model to tables 7
5.3 Third normal form
Consider the following table.
EmployeeNr Department Location
7499 sales Kontich
7521 sales Kontich
7654 sales Kontich
7782 main Antwerpen
7788 education Leuven
We have EmployeeNr;Department and Department;Location because each employee works
in one department and each department has only one location. Thus EmployeeNr;Location is a transitive dependency.
Also in this case deletion and insertion anomalies occur. E.g. if we delete the data of employee
nr. 7782, we also lose the fact that the main office is located in Antwerp. And how do we record
that the department human resources is located in Heverlee?
We can delete the transitive dependency by splitting the table in two EMP_DEP(EmployeeNr, Department) and DEP_LOC(Department, Location).
7788 human resources
human resources Leuven
A table is in the third normal form if it is in the second normal form and it contains no
From data model to tables 8
5.4 Boyce-Codd normal form
Consider the table GROUP_SUBJECT_PROF Group Subject Prof
1 mathematics Cauchy
1 physics Maxwell
2 chemistry Curie
2 mathematics Euler
3 physics Einstein
3 chemistry Curie
3 mathematics Cauchy
Each group studies more subjects, each subject is taught by more than one prof but every prof
teaches only one subject.
Group determines neither Subject nor Prof, thus Goup is no candidate key. (Group, Subject) and
(Group, Prof) both are candidate keys and Prof;Subject.
The table is in the third normal form. Once again anomalies occur. E.g. if we eliminate group 3, we lose the fact that Einstein teaches
We say that the table is in Boyce-Codd normal form (BCNF) if the table is in the third
normal form and each determinant is a candidate key. We can transform the above table to BCNF by splitting it in two.GROUP_PROF(Group, Prof) and PROF_SUBJECT(Prof, Subject).
From data model to tables 9
5.5 More normal forms
As already mentioned, a fourth and a fifth normal form exist. It hereby concerns multivalued dependencies (a generalization of functional dependency) and generalizations of multivalued dependencies. We will not discuss this topic in this course.
1We end by mentioning the result of Fagin who defines the domain/key normal form. A table is
in domain/key normal form if every constraint is a logical consequence of candidate keys and domains (domain = set of possible values of an attribute). Fagin proves that the domain/key normal form implies all previous normal forms and excludes anomalies.
1 Fagin R. „A Normal Form for Relational Databases that is based on Domains and Keys.‟ Transactions on Database Systems 6 (September 1981).
From data model to tables 10