DOCX

# datamodel_to_tables_English_2008_2009

By Mario Payne,2014-01-19 23:03
9 views 0
datamodel_to_tables_English_2008_2009

From data model to tables

1 Relations

In the introductory chapter we also used the name relation instead of table. To be a relation, some extra conditions must hold.

rows contain data about an entity

columns contain data about attributes of entities

all entries in a column are of the same kind

each column has a unique name

cells of the table hold a single value

the order of the columns is unimportant

the order of the rows is unimportant

no two rows may be identical

2 Functional dependency

Consider s=f(t), z=g(x,y), total=count*price.

s depends on t, z depends on x and y, total depends on count and price. We say s is functionally dependent on t

z is functionally dependent on x and y

total is functionally dependent on count and price

and we write

t;s

(x,y);z

(count,price);total

This notation stands for the functional dependency and not the formula. What‟s to the left of the arrow is called a determinant. A determinant can be composite or not.

A functional dependency exists when the value of one or more attributes determines the value of another attribute (or other attributes).

Not every functional dependency can be expressed by an equation. In fact this holds true in most cases because if an attribute can be simply computed, it can be simply omitted. E.g. consider tables and chairs where tables always are white or black and chairs red or green. Based on the color we then know if we deal with a table or a chair. Thus

color;kind (table or chair)

As another example consider (for books)

ISBN;(Title, Author)

3 Keys

A key is a combination of one or more columns to identify particular rows from a table. If a key contains more than one column, it is called a composite key.

A candidate key is a determinant that determines all other columns.

Candidate keys identify precisely one row of a relation (table). Given the value of a candidate key, we can find one and only one row in the relation that has this value.

From data model to tables 1

Consider the table WHO_TEACHES from the introductory chapter.

WHO_TEACHES

SubjectCode GroupCode Professor

ALG 1BACA Hellings

ALG 1BACB Verraedt

ALG 1BACE Vanbeveren

ANA 1BACA Van Huyck

ANA 1BACB Van Huyck

ANA 1BACE Vanbeveren

A given value for a SubjectCode identifies a number of rows. A SubjectCode equal to ALG identifies the first three rows and SubjectCode ANA identifies the last three rows. (SubjectCode, GoupCode) is a candidate key. Given a value for SubjectCode (e.g. ALG) and a value for GroupCode (e.g. 1BACE) there is only one row containing those values for the columns given namely ALG, 1BACE, Vanbeveren.

During the design of the database one must chose one candidate key as primary key. The DBMS

will use this primary key. Each table has only one primary key (composite or not). Because no row of our database contains identical rows, each table has at least one candidate key namely the combination of all columns.

A surrogate key is an artificial column that is added to a table to serve as the primary key. This key has no meaning to the users and can be added automatically by the DBMS when the row is created. In case the primary key contains many columns, a surrogate key is recommended. A foreign key is a column or combination of columns that is the primary key of a table other than the one in which it appears. It represents a relation.

Figure 1 foreign key

Consider the situation of Figure 1. sporting club is the primary key of the second table. sportclub s added as foreign key to the first table. i

In most cases the values of the foreign key must correspond to an existing value of the primary key. This is a referential integrity constraint.

From data model to tables 2

4 From Entities and Relations to Tables

4.1 Entities

Each entity becomes a table and an identifier of an entity is chosen as primary key of the table.

Composite attributes of an entity must be split in several attributes. One can also transform a

composite attribute to an entity if this attribute plays a role on its own. Consider the entity EMPLOYEE.

Figure 2 entity EMPLOYEE

We then get the following table.

EMPLOYEE(EmployeeNr, Name, FirstName, Function, MonthSalary, BirthDate) After the name of the table follow inside parenthesis the column names and the primary key is

underlined.

Consider the entity DEPARTMENT.

Figure 3 entity DEPARTMENT

This becomes a table

DEPARTMENT(DepartmentNr, Name, Location)

The relations from an ERD must also represented. We will add extra columns to the tables and/or

extra tables.

4.2 One to one relations

The primary key of one table is stored in another table as a foreign key. Consider the example EMPLOYEE is manager of/has as manager DEPARTMENT.

Figure 4 the relation "is manager of/has as manager"

Two possibilities arise.

The primary key of the table DEPARTMENT (thus DepartmentNr) can be stored as foreign

key of the table EMPLOYEE. In this case the table becomes

EMPLOYEE(EmployeeNr, Name, FirstName, Function, MonthSalary, BirthDate, HeadOf)

From data model to tables 3

where the column HeadOf contains the DepartmentNr of the department of which he/she is

the head. We underlined the foreign key with a dotted line.

This choice is not that good because most of the employees are not the head of a department

and the column hence will contain a NULL value.

The primary key of the table EMPLOYEE (thus EmployeeNr) can be stored as a foreign key

of the DEPARTMENT table. In this case the table becomes

the EmployeeNr of the head of the department.

An attribute of the relation gives rise to an extra column.

4.3 One to many relation: the general case

The primary key on the one side (the parent) is stored as foreign key of the table on the many side (the child).

Consider the example EMPLOYEE works in/has as employee DEPARTMENT

Figure 5 the relation "works in/has as employee"

DEPARTMENT is the parent and EMPLOYEE is the child. We store the primary key of DEPARTMENT (thus DepartmentNr) as foreign key of EMPLOYEE.

We then get

EMPLOYEE(EmployeeNr, Name, FirstName, Function, MonthSalary, BirthDate, DepartmentNr)

Then consider EMPLOYEE is manager of/has as manager EMPLOYEE. The same table is parent and child. The primary key of the parent (thus EmployeeNr) is stored once again in the table as foreign key. We must choose a good column name. We then get.

EMPLOYEE(EmployeeNr, Name, FirstName, Function, Manager, MonthSalary, BirthDate,

DepartmentNr)

EmployeeNr and Manager both contain the number of an employee. EmployeeNr plays the role of primary key and Manager the role of foreign key (tells who is the manager of the employee given).

4.4 One to many relation: ID-dependent entity as a special case.

We consider the other one-to-many relations from the IEE-example. We have EMPLOYEE teaches/is taught by IMPLEMENTATION and IMPLEMENTATION is implementation of/has as implementation COURSE.

From data model to tables 4

Figure 6 two one-to-many relations

There are no problems handling the first relation. We just follow the general rule. The primary key of the parent (EmployeeNr from EMPLOYEE) is stored as a foreign key with name Prof to the child (IMPLEMENTATION).

The second relation is a special one. The entity IMPLEMENTATION namely is a ID-dependent that cannot exist without the existence of the entity COURSE. In this case the primary key of the parent (thus Code from COURSE, we will use the name Course) becomes part of the primary key of the child.

Hence we get

IMPLEMENTATION(Course, StartDate, Location, Prof).

4.5 Many to many relation

In the case of a many-to-many relation a new table is constructed and the primary keys of both tables involved are part of the primary key of the new table. The attributes of the relation become new columns of this table.

We consider the relation EMPLOYEE enrolls for/has as participator IMPLEMENTATION. From the above we already have the following tables.

EMPLOYEE(EmployeeNr, Name, FirstName, Function, Manager, MonthSalary, BirthDate,

DepartmentNr)

IMPLEMENTATION(Course, StartDate, Location, Prof)

The relation also has an attribute Evaluation.

We construct a new table ENROLMENT

with key EmployeeNr (we will use CourseMember as name), Course, StartDate ENROLMENT(CourseMember, Course, Evaluation, StartDate).

As a second example we take the library example and the relation READER borrows/is borrowed by BOOK.

Based on the tables

BOOK(BookNumber, Title, Author, ISBN)

we construct the new table BORROWING. We let ReaderNumber and BookNumber be part of the primary key. As a primary key we then can take e.g. the combination (ReaderNumber, BookNumber, BorrowDate). (ReaderNumber, BookNumber) is not sufficient because the same reader can borrow the book more than once. We then get

Another possibility is to use a different ERD. In this ERD we add an extra BORROWING entity with a unique BorrowNumber, BorrowDate and ReturnDate. Then there will be a one to many relation between READER and BORROWING and a many to many relation between From data model to tables 5

BORROWING and BOOK. The attributes which previously belonged to the relation, disappear; they now are attributes of the new entity. The conversion of the many to many relation gives rise to a new table with as primary key the combination of BorrowNumber and BookNumber. 5 Normalization

5.1 Anomalies

It is important to construct a good data model; this means a simple data model without redundancy. Simple means no multivalued attributes and attributes that logically belong to an entity.

A good ER-diagram mostly gives rise to normalized tables. If not, tables need to be normalized. Normalization is a process that converts tables to other tables without anomalies. For an existing database the following holds: the higher the normal form (see below) of the tables, the better the database (less problems).

Consider the table PROF_SUB_POINT(Prof, Subject, Points)

PROF_SUB_POINT

Prof Subject Point

Hellings Algebra 4

Van Huyck Analysis 6

Verraedt Comp & Data 4

Vanbeveren Analysis 6

Verraedt Algebra 4

If we delete the row “Verraedt, Comp & Data, 4”, we lose not only the fact that Verraedt teaches the subject Comp & Data but also the fact that the subject Comp & Data has a weight of 4 points. This is called a deletion anomaly: deleting data of one entity (the fact that Verraedt teaches

Comp & Data) means that unintentionally we also delete data of another entity (the fact that Comp & Data counts for 4 points).

If we want to add that Chemistry has a weight of 8, this is only possible if we also have a prof who teaches Chemistry.

Data concerning a particular entity can be added only if we also have information about another entity. This is called an insertion anomaly.

In the table above some data are recorded more than once. This is the case for the points of a subject if there are more than one professor who teaches this subject.

We can solve the problem by splitting the table into 2 tables PROF_SUB(Prof, Subject) and

SUB_POINT(Subject, Point).

From data model to tables 6

PROF_SUB

Prof Subject

Hellings Algebra

Van Huyck Analysis

Verraedt Comp & Data

Vanbeveren Analysis

Verraedt Algebra

SUB_POINT

Subject Point

Algebra 4

Analysis 6

Comp & Data 4

For the two tables some referential integrity constraints can be added.

e.g. each subject from the column Subject of SUBJECT_POINT must exist in the column

Subject of PROF_SUBJECT (each “existent” subject must be taught)

e.g. each column Subject of PROF_SUBJECT must exist in the column Subject of

SUB_POINT (each subject that is taught must have a weight)

both conditions must hold

The real problem arises from the fact that the original table contains data of two different subjects (entities). A normalized table only can have one subject.

Anomalies can be avoided by normalizing the tables. Several normal forms exist: the first, second, third, Boy-Codd, fourth and fifth normal form where each next one implies the previous one.

5.2 First and second normal form

A table is in first normal form if the table represents a relation. Hence all tables considered are in first normal form.

The previous example contains a dependency where only part of the key is needed. (Prof, Subject) is the key and Subject ;Point. Subject is a determinant and only a part of the key or Point is partly dependent on the key.

A table is in the second normal form if it is in the first normal form and all attributes not included in the key are dependent on the whole key.

The table can be transformed to tables in second normal form by splitting it as explained above. If the key is simple, the table automatically is in second normal form. A table only can be not in second normal form is it has a composite key.

From data model to tables 7

5.3 Third normal form

Consider the following table.

EMP_DEP_LOC

EmployeeNr Department Location

7499 sales Kontich

7521 sales Kontich

7654 sales Kontich

7782 main Antwerpen

7788 education Leuven

We have EmployeeNr;Department and Department;Location because each employee works

in one department and each department has only one location. Thus EmployeeNr;Location is a transitive dependency.

Also in this case deletion and insertion anomalies occur. E.g. if we delete the data of employee

nr. 7782, we also lose the fact that the main office is located in Antwerp. And how do we record

that the department human resources is located in Heverlee?

We can delete the transitive dependency by splitting the table in two EMP_DEP(EmployeeNr, Department) and DEP_LOC(Department, Location).

EMP_DEP

EmployeeNr Department

7499 sales

7521 sales

7654 sales

7782 main

7788 human resources

DEP_LOC

Department Location

sales Kontich

main Antwerpen

human resources Leuven

A table is in the third normal form if it is in the second normal form and it contains no

transitive dependencies.

From data model to tables 8

5.4 Boyce-Codd normal form

Consider the table GROUP_SUBJECT_PROF Group Subject Prof

1 mathematics Cauchy

1 physics Maxwell

2 chemistry Curie

2 mathematics Euler

3 physics Einstein

3 chemistry Curie

3 mathematics Cauchy

Each group studies more subjects, each subject is taught by more than one prof but every prof

teaches only one subject.

Group determines neither Subject nor Prof, thus Goup is no candidate key. (Group, Subject) and

(Group, Prof) both are candidate keys and Prof;Subject.

The table is in the third normal form. Once again anomalies occur. E.g. if we eliminate group 3, we lose the fact that Einstein teaches

physics.

We say that the table is in Boyce-Codd normal form (BCNF) if the table is in the third

normal form and each determinant is a candidate key. We can transform the above table to BCNF by splitting it in two.GROUP_PROF(Group, Prof) and PROF_SUBJECT(Prof, Subject).

GROUP_ PROF

Group Prof

1 Cauchy

1 Maxwell

2 Curie

2 Euler

3 Einstein

3 Curie

3 Cauchy

From data model to tables 9

PROF_SUBJECT

Prof Subject

Cauchy mathematics

Maxwell physics

Curie chemistry

Euler mathematics

Einstein physics

5.5 More normal forms

As already mentioned, a fourth and a fifth normal form exist. It hereby concerns multivalued dependencies (a generalization of functional dependency) and generalizations of multivalued dependencies. We will not discuss this topic in this course.

1We end by mentioning the result of Fagin who defines the domain/key normal form. A table is

in domain/key normal form if every constraint is a logical consequence of candidate keys and domains (domain = set of possible values of an attribute). Fagin proves that the domain/key normal form implies all previous normal forms and excludes anomalies.

1 Fagin R. „A Normal Form for Relational Databases that is based on Domains and Keys.‟ Transactions on Database Systems 6 (September 1981).

From data model to tables 10

Report this document

For any questions or suggestions please email
cust-service@docsford.com