Comp 521 Fall 2015 Final Exam

Don't Panic!

You may use only your book (print or pdf), your notes, homework, anything else you have downloaded to your computer and the IPython command prompt.

You may not access the web, instant messaging, phones, or spirit guides. You will pledge on your honor that you have neither given nor received unauthorized help.

You must submit your exam before you leave the room.

Enter your onyen below.

Honor Pledge

I certify that no unauthorized assistance has been received or given in the completion of this work.

Enter your full name as the value of the Pledge variable below.

Part 1

The first part of the exam will consist of multiple choice questions. You'll answer each question by assigning a single character string to the variable just below the question. For example:

Q0_1 How many letters are in SQL?

A) 1
B) 3
C) 2
D) 42

A0_1 = 'B'

Your answer must be one of: 'A' 'B' 'C' 'D' or 'E'. No other answer is acceptable. Each question has a single best answer.

Q1_1. Of the following, which best characterizes the objective of database table normalization?

A) Evaluating queries more efficiently
B) Efficient testing of functional dependencies
C) Creating more independent relations
D) Minimizing redundancy

Q1_2. What is the primary technique used to normalize tables?

A) Vertical decomposition
B) Horizontal decomposition
C) Table Merging
D) SQL Assertions

Q1_3. How does concurrency improve database performance?

A) By making transactions durable
B) By making transactions atomic
C) By isolating transactions
D) By allowing aborts of unnecessary transactions
E) By scheduling CPU and disk resources for maximum utilization

Q1_4. Suppose that the Strict 2PL scheme is modified to use only exclusive locks, which are requested before both reads and writes. What is the result of this change?

A) Concurrent transactions always deadlock
B) The system functions correctly, but with lower performance
C) The resulting schedules are not serialized
D) It behaves identically to Strict 2PL

Q1_5. When a transaction is aborted, which of the following may be performed by the DBMS?

A) Changes to pages are undone
B) Its locks are freed
C) It may be rescheduled
D) Other transactions may be aborted
E) All of the above

Q1_6. Consider the following valid instance of a relation:

sid    name    userid        age  gpa
50000  Dave    dave@cs        19  3.3
53666  Jones   jones@cs       18  3.4
53688  Smith   smith@ee       18  3.2
53650  Smith   smith@math     19  3.8
53831  Madayan mada@music     19  1.8
53832  Guldu   guldu@music    22  2.0

Which of the following can you infer is not a candidate key based on this instance?

A) sid
B) name
C) userid
D) <name, age>

Q1_7. Which of the following database properties states that, once committed, data should persist in the database, even if the system crashes?

A) Atomicity
B) Buffer Writes
C) Consistency
D) Durability
E) Equality

Q1_8. Of the following disk-transfer timing characteristics, which is likely to be greatest when reading a random block?

A) Seek time
B) Rotational latency
C) Data-transfer time
D) Parity checking

Q1_9. Which of the following database enhancements created the possibility of deadlock?

A) Error recovery
B) Locks
C) Aborting transactions
D) Committing transactions
E) All of the above

Q1_10. Which of the following are not consequences of database redundancy?

A) Redundant storage
B) Update anomalies
C) Insertion anomalies
D) Deletion anomalies
E) Lossless-Join Decompositions

Q1_11. If relation R1 has degree D1 and cardinatily C1 and relation R2 had degree D2 and cardinality C2, what is the degree DJ and cardinatily CJ of their natural join (that is joining on equality between fields with the same name)?

A) max(D1, D2) <= DJ <= D1+D2, 0 <= CJ <= C1*C2
B) min(D1, D2) <= DJ <= D1*D2, min(C1,C2) < CJ < max(C1,C2)
C) D1+D2 < DJ <= D1*D2, CJ = C1+C2
D) D1 < DJ < D2, C1 < CJ < C2

Q1_12. What can you say about the cardinality Co and degree Do of the output of a relational algebra sigma operator when applied to a relation with cardinality Ci and degree Di?

A) Co > Ci, Do < Di
B) Co <= Ci, Do = Di
C) Co >= Ci, Do > Di
D) Co < Ci, Do = Di
E) Co < Ci, Do < Di

Part 2

For this part of the exam you are given a database final.db with records resulting from analysis of one day of the server logs for our Tar Heel Reader web site. The database has several tables as shown below:

create table Reads ( -- a record for each book that was read

    -- these next 6 fields are when the book was read
    -- broken into parts for easy access
    year integer,    -- 2015
    month integer,   -- (1-12)
    day integer,     -- (1-31)
    hour integer,    -- (0-23)
    minute integer,  -- (0-59)
    second integer,  -- (0-59)

    ip integer,      -- ip address of the computer
    slug text,       -- key to Books table

    primary key (time, ip),
    foreign key (slug) references Books,
    foreign key (ip) references Locations)

create table Books (       -- real information about each book
    slug text primary key, -- unique key for each book
    title text,            -- book title
    author integer,        -- key to Authors table
    reviewed integer,      -- 1 if the book is reviewed, 0 otherwise
    pagecount integer,     -- number of pages in the book
    foreign key (author) references Authors)

create table Authors ( -- fake information about each author
    id integer primary key,
    name text)         -- author name (these may not be unique)

create table Locations (    -- real information about geolocation
    ip integer primary key, -- ip address of the computer
    country text)           -- country eg 'United States'


For simplicity assume that the ip address identifies a unique user. Of course it doesn't for a variety of reasons but we'll simplify for the exam.

For each question you must fill in the SQL query string that will provide the requested result. Simply write your query between the provided quotes, for example:

A2_0 = '''
select * from Authors
'''

The following cell will print a few lines from each table so you can see the format of the fields.

Q2_1. How many different books were read from Australia?

Q2_2. How many different authors wrote books that are reviewed?

Q2_3. How many different countries read books?

Q2_4. What were the top three countries by number of books read? List the name of the country and the number of books they read sorted in descending order by count then ascending order by country name. Don't worry about countries possibly having the same number of books read.

Q2_5. How many books were read per hour for each hour in the day? List the hour (0-23) and the number of books read in order by hour.

Q2_6. What was the average number of pages of all the books read in each country that read at least 100 books? List the country name and the average number of pages in the books they read. I'm asking for all books they read, not different books.

Q2_7. How many times were books read that have the three-letter sequence 'dog' anywhere in the title? List the title of the book and the number of times it was read. Sort by the title and then the number of times read in descending order.

Q2_8. How many ip addresses outside of the United States read at least five different books?

Q2_9. Which authors wrote books that were read in 10 or more countries. List the author name and the number of different countries in which their books were read. Sort by number of countries in descending order and then by author name in ascending order.

Done!

Now, to be really certain you've got it all right, you should:

  1. save (File -> Save and Checkpoint),
  2. check your work by restarting the kernel and rerunning each cell with shift-Enter,
  3. save your work,
  4. submit your notebook,
  5. verify that it was successfully submitted in the box below, and
  6. check that you got the confirmation email.

I hope you have a great break!