Key Words: database, database
management system, DBMS, relational database management system, RDBMS,
entities, relations, attributes, queries, structured query language,
SQL, Microsoft Access, forms, reports, rows, tuples, table, columns,
fields, data type, Text
, Number
,
identifier,
inconsistent, table design, table data, primary key, foreign key,
normalize, AutoNumber
, One-To-Many
See Wikipedia (http://www.wikipedia.org/) for a
detailed description of the key words.
You must read Chapter 13 ("Getting to First Base") before starting this Lab. Also skim through Chapter 14 (part about "Queries: Creating Views").
(Suggestion: After the lab session, print this and read it with highlighter in hand, to pick out the terms you want to remember.)
Refer to Chapter 13 in the book and Wikipedia for more detailed
explanations of the database terminology. (A caution: The
use of the word "relationship" by the text in Chaper 14, and by the
software we'll be using (MS Access), can be confusing, as it sounds
like it should have something to do with "relational" in "relational
database". It doesn't, and in fact, the two refer to
contradictory things. We are trying to be precise in our usage
here. Just watch out for "relationship" versus "relational" and
"relation" - "relational" and "relation" are formal terms in math and
relational databases; "relationship" isn't - it mainly has its normal
English meaning.)
A database is a structured collection of information
that allows users to store, retrieve, change, and ask questions about
("perform queries on")
structured information. A relational
database is the most common type of database. The word
"relational" comes from "relational algebra", which in turn comes from
the mathematical term "relation". The relational algebra
describes precisely results of doing queries that just ask questions
about the data, and the changes that are made in the contents of the
database as a result of doing queries that insert, delete, or update
information.
If you remember your introductory math, a "relation" is a set of
"tuples", and a tuple is a list of values, each of which has an
assigned role in the tuple. Tuples are usually written with
parentheses around them, with the values separated by commas. For
instance, in the mathematical relation "greater than", we could express
the fact that 3 is greater than 1 by writing (3,1) and putting that in
a set called Greater, or perhaps more clearly, just writing
Greater(3,1). This is fine so long as we are
clear that the first "slot" in the tuple means "the number on the left
side of the greater than sign" and the second slot is the number on the
right. More complicated relations have more slots.
In a relational database, the information is split up into different types or categories of data, and that each type is described by attributes. For instance, "people" might be one type, and "books" another, and "companies" a third. Some attributes of a person might be age, gender, employer, mother, father, address.
A member of a type - an individual person or book or company, etc.
- is referred to as an entity.
Each entity is described by specific
values of attributes. What binds a collection of entities
together as one type is that they all
have the same generic attributes.
Where this ties up to "relations" in math is that each entity's
attributes form a tuple - those particular values are "related to"
each other because they belong to
one actual entity. The entire collection of entities of a
particular type would be a "relation" in the mathematical sense.
In a relational database, the collection of entities of one type is
called a table - think of
information laid out in rows and columns, where each entity is
represented by one row in the
table, and attributes are lined up in columns.
We put all of one sort of attribute in the same column - this is like
each slot in a tuple having a fixed meaning. Rows may also be
called records, and columns
are also called fields.
Some attributes are just simple values, e.g. age is a number.
But attributes can also be references
to other entities. For instance, the value of a person's
"employer" attribute would be a reference to a "company" entity.
We say that the employer attribute refers to the company table. A
reference can point into the same table - "mother" and "father" are
also people.
In a relational database, an attribute that refers to a row in
another table actually contains some information that identifies that
row. Almost without exception, each table includes a field that
is a unique identifier (usually a number) for each row - this is called
the key (or primary key). The name of
the table and the value of the key together allow finding the
row. So if one table needs to refer to another, it can have a
field for the key of the other table. (An attribute that is a key
belonging to some other table is called a foreign key.)
There is
another, quite different, way of describing the structure of data,
called the entity-relationship (ER)
model. Instead of merely saying that one entity
refers to another, an ER model for the data gives a name to each
reference, that describes the relationship
between one entity and the
other that is implied by the reference. (Here, "relationship" is a formal term, but this is for
ER models, not for relational models...) For instance, for the
person and their address, the relationship could be called "lives
at". ER descriptions of data are often shown as diagrams, with
boxes for entities and diamonds for relationships - we'll see one
below. Here, we might write [person] --- <lives at> ---
[address].
Pulling out the relationship by itself can make some structure in
data more clear than if we tried to go straight to tables and
fields. For instance, think about this relationship:
[consultant] --- <consults for> --- [company]. A consultant
may very well be consulting for more than one company at a time, and
that number is arbitrary, so we wouldn't want to put a field in the
consultant table for the company's key - we'd have to put in enough
fields for the most prolific consultant. So we might think, we'll
put the consultant's id in the company table. Wrong! A
company could have lots of consultants working for it. Instead,
what we can do is have a separate table, just for the consultant /
company relationship! It can hold pairs of a consultant id and a
company id. Now a consultant or a company can have as many
references to the other type as it needs, because they're stored in the
rows of a separate table.
A database management system, or DBMS, is
software that allows users to store, retrieve, change, and ask
questions about ("perform queries
on") information stored in the database.
A relational database management system,
or RDBMS, which handles these tasks for relational databases,
is the most common type of DBMS.
Once a
database is created, you can enter data, modify the data, and view the
data by combining different entities. This is done by sending
queries to the RDBMS. There is a standard language for
writing queries - Structured
Query Language, or SQL,
but it is also
common for an RDBMS to provide a graphical user interface (GUI) that
hides the actual
SQL. (Not all RDBMSs have GUIs, and if they do,
they are unlikely to be much alike.)
Microsoft Access is a RDBMS that lets you manage your database using a GUI. You can also create graphical forms to manipulate your data and create reports to show results from a search/query.
Example: Persons could be stored in a table called
tPersons
(where we're including the "t
" to
help you remember
that it is a table). The table may have the fields Name
(data type: text), BirthYear
(data type: number), and
ShoeSize
(data type: number) that will store a person's
name, birth year, and shoe size, respectively. To uniquely
identify a
person, it is good practice to add another field as a key (normally
data type: number) - we'll call that PersonId
. The
table tAddresses
is
set up in a similiar manner, but with its own fields.
To hold the
reference from a person to their address, tPersons
includes a field, AddressId
, which contains the key for
an address in the table tAddresses
. It isn't the
fact that the name of the AddressId
field is the same in
both tPersons
and tAddresses
that tells the
RDBMS that you intend one to refer to the other - the names of the
fields could be different. Instead, the RDBMS must be told
explicitly that the AddressId
field in tPersons
is intended to refer to the AddressId
field in tAddresses
.
In the Access screen shot that follows, this is shown by the line
between the fields in the two tables' lists of fields.
There are several good reasons to split up the persons and the
addresses into two tables. One is that the number of columns in
tPersons
is greatly reduced and easier to manage.
More important is that (as mentioned above) we might have several
persons living at the same
address. By putting the persons and the addresses in separate
tables,
the address only needs to be stored once in the table
tAddresses
instead of in several places in the table
tPersons
. If the same address was stored in
several places, one could easily enter a slightly different address by
mistake, which would cause the database to become
inconsistent.
Enough chat -- let's get started.
But let's not start blindly - let's think about our data, to figure out how we should split it into tables, and how the tables might need to refer to each other.
First, what entities are there? Well, CDs of course, but also the performers ("artists" in the example below) or groups who recorded the CDs, the songs on them, etc. For classical music, we might have not only performers, but a composer.
lab8
directory on dante
, go
to the
new directory and create a text file called notes.txt
with your favorite text editor (e.g. Notepad, Pico, Emacs). Put
all of
your
comments, hints and reminders to yourself, questions and answers (if
there are any in this lab ;-) in this
file!lab8.mdb
.
Do not forget to copy this file to your lab8
directory on
dante
when you are
done
with the lab!
Become familiar with this window! You will use it often. If you lose it, you can get it back by selecting View>Database Objects>Tables.
You can right click almost anywhere in MS Access to get a popup menu. Sometimes you need to select an item before right clicking.
CDId
with the data type AutoNumber
,
CDName
, ArtistName
and StyleName
all with the data type Text
.
StyleName is for
the style of music, e.g., Jazz, Rock, Classical, etc. If you
like, you
may put a description for each one, but since this table is (currently)
pretty simple, you don't have to. (The description here serves
the
same purpose as comments on variable declarations in JavaScript - it
tells what the field will be used for.)AutoNumber
? We want every CDId to be
unique, and one way to get that is to start off at 1, and every time we
add a CD, we choose the next number for the CDId. All RDBMSs
provide a function that will remember what number we're currently at,
and give us the next one. For Access, making the type AutoNumber
will automatically put in the next number each time we add a new CD.)
CDId
field. Right click and select Primary Key (or click
directly on
the
primary key icon in the main MS Access
window). A
key symbol should appear immediately to the left of the field name CDId
.tDraftCDs
. After the table is saved
you
can close the table design window. Under Tables in your
database dialog window you should now see your tDraftCDs
table. Once a table is created, you can view it, add data to it,
and
modify
it.tDraftCDs
is correctly saved by
right
clicking
on the table and selecting Design View. Do not change
anything. It should look the same as it did before you closed
it. Close the design view again when you are done.tDraftCDs
to enter data by right clicking
on the
table and selecting Open. A window with your table data
will
appear. Each row corresponds to one CD. Notice that you do
not
need to explicitly fill in the CDId field. Since it is an AutoNumber
field, it will be
filled in automatically.
The
other columns represent the name, artist and musical style for the
CD.
The table should currently be empty. Now add the following CDs
(you can replace them with others later...). Just start typing
the CD name in the CDName field, and Access will automatically assign
that row's CDId number, and display a new blank row below the one
you're working on.
Data that is entered into a database table is saved directly after you leave the field. So there is no need (or possibility) to save it manually, like you must do in the design view. However, if you modify the style of the table, such as the width of the columns, Access will ask if you want to save the layout (which you probably want to do).
You have now created a database with information about music
CDs!
There are, however, a few undesirable things to fix before you will
spend your nights entering information about all your CDs:
Normalizing a database means to remove
redundant data, that is, to remove multiple occurrencies of the same
information. Instead of writing the same information several
times in
one table, the table is redesigned to have a relation to another table
in which the information is stored only once. The following
diagram shows what we
ultimately want.
This is an entity-relationship (ER) diagram. The
rectangles are entities, the rounded boxes are their attributes, and
the diamonds are relationships between them. If you'll recall the
discussion of references and relationships above, you'll likely guess
that each relationship will just turn into a reference from one
entity's table to the other.
To relate each CD in a CD table to an artist in an artist table, each CD must have a field with the identifier (aka foreign key) corresponding to a particular artist in the artist table. In the artist table, an artist is identified by its value in the primary key field.
In this section you will create a better designed CD table, move
the artists and styles into their own tables, and set up relationships
between the three tables. All of the data in the old CD table
(tDraftCDs
) will be used. By using some features of
a
RDBMS,
you will not have to type in the data again!
tDraftCDs
, but the artist and musical style should
have
slightly different names and use different data types. It is also
nice to have a comment that explains what the IDs will be used for:
Do not forget to specify the primary key and to save your new
table!
Save the new table as tCDs
.
tArtists
. It should have two fields: ArtistId
(data type AutoNumber
) and ArtistName
(data
type: Text
). The primary key should be ArtistId
.
Do not
forget to save the table!tStyles
. It should be very
similiar to tArtists
,
but use slightly different names for the
fields: StyleId
(data type: AutoNumber
) and
StyleName
(data type Text
). The primary key should be StyleId
.
Do not forget to save the table!tArtists
, tCDs
and tStyles
.
Do not
include the old CD table. To select multiple items in a list, you
can
press and hold the Ctrl
key while you click on the items
you
would like to select. End with clicking on Add. You
should
see the relationships window with three tables and their fields.
Bold
fields indicate primary keys. It is also possible to rearrange
the
tables if you do not like the automaticaly generated order.
Example:
If you forgot a table or picked the wrong table, you can add more tables by right clicking on background and selecting Show Table...
Create one relationship at a time by dragging the primary key
from
one table and dropping it on a foreign key in another table, for
example drag the ArtistId
from the table tArtists
to the ArtistId
in the table tCDs
. The
following relationship
window will appear:
Make sure "Enforce Referential Integrity" is
checked.
This means that the ArtistId
in the tCDs table must refer
to an actual artist in the tArtists
table. Also
note the
Relationship Type at the bottom of the pop-up window: One-To-Many.
Click on Create when you are ready to
add
the relationship.
You have now successfully designed a relational database!
The
next step is to extract the data from the old CD table tDraftCDs
and put it into the three new tables tArtists
, tStyles
and tCDs
! Close the
relationships window
(answer yes on the "save layout" question).
In your old table you had only four CDs. In reality, the number of rows would be many more. It would be a painful experience to retype the information. In this section you will extract and transfer the data without any unnecessary typing!
The key is to use a query to view (or extract) certain columns from a table, and to paste the results into the new tables.
tDraftCDs
,
since that is where all data is located.ArtistName
from
the
table tDraftCDs
. In the first column, select the desired
field.
Save the query and close the query design window (or close the
query design window and save the query upon request). Give this query
the name qArtistNames
. The "q" is a reminder that
it is a
query.
Your query should give you the names of all of the artists in
your
database. When you open the query you should see the rows and
columns
that you requested in the query. A query is just another way of
viewing your table data and is, therefore, sometimes simply called a view.
If you change any information in the query result,
the
original table (tDraftCDs
) will be updated.
These names will be used later in the table tArtists
.
Before you can use them, you need to get rid of unwanted
duplicates.
This means you must modify the query.
Change the settings for Unique Values (a few lines down from the top) from No to Yes. This will give you only the unique values in the table. When you are done, close the property window, close the query design window (save changes to the query) and open your query to see the results. It should now have no duplicate names.
ArtistName
).
All rows should be selected (dark or black colored). Copy by
pressing Ctrl-c
or by using Edit>Copy. Once the rows are copied, you
can close the query.tArtists
table: Open the tArtists
table. The table
should be empty. The first column corresponds to the ArtistId
and will get values automatically. The second column is where
you'll add
all of the rows from the query!
Select the second column by clicking on the column title ArtistName
.
Paste the previously copied rows by
pressing Ctrl-v
or use Edit>Paste.
Answer Yes
to
the question if you want to paste the rows. After a successful
paste,
you can close the tArtists
table (and the query if you
didn't close it before).
StyleName
field
and the tStyles
table! Save the query as qStyleNames
.
The
result should only be two rows, since there are only two styles of
music in the old table (i.e., Jazz and Pop/Rock).tArtists
and tStyles
.
You
need a query
that takes all of the rows in the tDraftCDs
, finds the
row in tArtists
that has a matching ArtistName
,
finds the row
in tStyles
that has a matching StyleName
,
combines these
rows, and gives the fields CDName
, ArtistId
and StyleId
as a result.
Open a new query in design view. Add the tables tDraftCDs
,
tArtists
and tStyles
. Connect the
fields
that should
match. That is, drag ArtistName
in the tArtists
to ArtistName
in tDraftCDs
. Do the
equivalent for StyleName
. Then make sure the query
shows
the three
fields CDName
, ArtistId
and StyleId
in the bottom half
of the query window. Make sure the order is correct. Save
the query
design as qDraftCDs
.
Select all rows and all columns. The easiest way is to
click
and
drag over the column titles. Copy the data with Ctrl-c
or Edit>Copy. Open tCDs
to enter data
and
select the
three columns CDName
, ArtistId
and StyleId
.
Paste in the data with Ctrl-v
or Edit>Paste.
Accept
to insert the four rows. Close tCDs
.
tCDs
, tArtists
and tStyles
tables.
Note that you do not need to specify any relationships or matching
requirements between the tables. They are already set up in the
design. Have the query show the fields CDName
, ArtistName
and StyleName
. Save the query
as qCDs
and open it to see the query result. It should contain the following
four rows:
lab8.mdb
.
This will remove unneeded space in your
database file, which otherwise can become very large!lab8.mdb
to your lab8
directory on dante
.
Select Tools>Database Utilities>Compact and Repair Database... before you close your database, after you have worked with the database for a while. This will decrease the size of your database file. Also close your database file and Access before copying your database file to your dante account, otherwise you may end up with an empty file!
Ensure that all of your files are accessible from a web browser.
|
I understand the key words for this lab and can give examples of what they mean. |
|
I know how to design tables and how to set up relationships between them. |
|
I know how to add data to a table. |
|
I know how to create a query. |
|
I know how to view the results from a query. |