Comp 521 Fall 2015 Midterm 2

Don't Panic!

You may use only your book (print or pdf), your notes, homework, anything else you have downloaded to your computer and your Jupyter notebook.

You may not access the web, instant messaging, phones, or spirit guides. You will pledge on your honor that you have neither given nor received unauthorized help.

You must submit your exam before you leave the room. You may only submit once! Get it right the first time.

Enter your onyen below.

Honor Pledge

I certify that no unauthorized assistance has been received or given in the completion of this work.

Enter your full name as the value of the Pledge variable below.

1. Which of the following statements concerning record ids is false?

  1. They identify the disk address of the page containing the record
  2. They appear frequently in indexes
  3. They are used to determine if the page is saved in a page buffer
  4. Given one, only one I/O is required to read the desired record
  5. They uniquely identify a record

2. What is the point of data striping in RAID configurations?

  1. All of these.
  2. It supports the recovery from a failed drive
  3. It increases the effective disk transfer rate
  4. It provides redundancy
  5. It reduces the mean time between failures.

3. Which of the following is NOT an advantage of an external merge sort over simply using a conventional in-memory quicksort in conjunction with a huge virtual memory space? That is, why not allow the virtual memory system to handle the I/O?

  1. Merge sort is faster than quicksort
  2. We can limit the amount of real memory used with the external sort
  3. We can double-buffer and pre-fetch to overlap work with disk operations in the external sort.
  4. We can do many fewer disk operations using an external sort

4. Which of the following keys is central to a database index?

  1. A search key
  2. A candidate key
  3. A superkey
  4. A primary key
  5. A foreign key

5. Which following advantage does a hash-based index have over a tree-based index, in general?

  1. Immune to “data-skew” effects
  2. Fewer page accesses per lookup
  3. More records per page
  4. Overflow pages
  5. Supports a broader range of queries

6. Why does scanning a file with a clustered B+ tree take about 50% longer than scanning a heap file?

  1. Because the typical tree occupancy is only about 2/3
  2. Because we have to follow the links in the tree for every record
  3. Because we have to compute the hash for each key
  4. Because successive records may be far apart on disk

7. The rough estimate for the time to insert a record into a sorted file is Search + B*D; where B is the number of data pages and D is the average time to read or write a page. Why is B*D in the equation?

  1. We have to copy half the file on average to insert a record in order
  2. We insert the record at the end of the file
  3. We have to copy the entire file do the insert
  4. Searching for the record requires B*D I/O operations.

8. Using an index always makes a query faster.

  1. true
  2. false

9. What does the acronym RAID stand for?

  1. Rolaids Are Icky Disks
  2. Redundant Arrays of Independent Disks
  3. Reduced Area Ideal Data
  4. Random Access Interleaved Disks

10. What advantage does an ISAM tree index have over a B+ tree index?

  1. It better suits dynamic data sets with lots of inserts and deletes
  2. Higher typical utilization
  3. No locking required on the index
  4. None, a B+ Tree is always better

11. We learned that the introduction of NULL values requires a 3-valued logic with values true, false, unknown. What is the value of the expression: (true or (true and unknown))?

  1. true
  2. false
  3. unknown

12. Why is an SQL query that uses the distinct keyword likely to be slower than one that doesn't even though it produces fewer results?

  1. Processing that additional word in the query takes longer
  2. Redundancy is bad for performance
  3. Detecting and eliminating duplicates requires extra computation
  4. Relational algebra does not allow duplicates

13. An index contains a collection of data entries and supports efficient retrieval of all data entires k* with a given key value k. Given a data entry k*, what is the maximum number of disk I/O operations required to fetch the record from a file with N records?

  1. 0
  2. sqrt(N)
  3. N
  4. 2*N
  5. 1

14. Why may we have only one clustered index on a relation?

  1. To do otherwise would violate the third law of thermodynamics.
  2. It is a limit of SQL
  3. We can only store the data in one sorted order without introducing redundancy
  4. It is a limit of relational algebra

Database info

In the setup code below I will connect to a small database that is included in the zip file for this exam and will create a cursor for your use. There is no need for you to create a new connection object or cursor.

The schema of the data base is identical to that in the first midterm.

create table Sailors 
    (id integer primary key, 
     name text, 
     rating integer, 
     age integer)
create table Boats
    (id integer primary key, 
     name text, 
     color text)
create table Reserves 
    (sid integer, 
     bid integer, 
     day date, 
     primary key (sid, day) 
     foreign key (sid) references Sailors(id)
     foreign key (bid) references Boats(id))

15. How many reservations were made by sailors with each different rating?

That is, produce a list of tuples where the first number is the rating and the second number is the number of reservations by sailors with that rating.

The list should be ordered by increasing rating.

You should do this with a single query using the aggregating operators. Only include ratings that actually occur in the database.

16. Which sailors have reserved red boats more than once?

Reserving the same red boat more than once counts. I'm expecting a list of sailor names in alphabetical order. Each name should occur no more than once. You'll need to write a bit of Python code to unpack the result returned by the query but that is all your Python code should do.

17. How many different boats were reserved by sailors with each different rating?

Again, I'm expecting a list of tuples of 2 integers, the first is the rating, the second is the number of different boats reserved by all sailors with that rating.

Sort the list by the number of reservations in decreasing order then by rating.

Again, you should do this with a single query and only include ratings that actually occur.

18. What is the average age of the sailors with ratings higher than 5?

I'm expecting a tuple with one floating point number to be returned by this query.

19. What is the average age of the sailors who made more than 3 reservations?

Again, I'm expecting a tuple with a single floating point number.

Done!

Now, you should:

  1. save (File -> Save and Checkpoint),
  2. restart the kernel (Kernel menu -> Restart),
  3. run all the cells,
  4. check your report,
  5. save again,
  6. and submit your notebook below.