Assignment 3

Assigned: 20 October 2015
Due: 3 November 2015

Part 1

Suppose you have a roster from a census return with $N$ entries in alphabetical order by first name. Each page has $P$ entries.

1. What is the lower bound $B_l$ and the upper bound $B_u$ on the number of pages that must be examined to find a given first name using a binary search? Assume that having computed a page number, you can open directly to that page.

2. Now suppose a 1-page table of contents is added to the roster. It lists the first "first name" on each of the other pages (Assume no first name has more than 1 page of entries). How many pages that must be examined to look up a given first name?

3. Now suppose that the roster is reorganized such that each page contains a list of names with the same initials. For example, 'Thomas Jam Andrew' falls on the same page as 'Travis Jam Anand' (both have initials 'TJA'). Assume that no pattern takes more than one page, and there are a total of $M$ pages. Finally, assume that given the pattern you can go directly to the page. How many pages must be checked to find a particular name?

4. Using the same organization as question 3, assuming a page still has room for up to $P$ entries, what is the occupancy percentage of an average page in the new roster?

5. Now suppose we add a two-level index to the roster. The one-page primary index lists the first name on each of the secondary index pages. The secondary index pages list the first name on each of the roster pages. Assume no first name has more than 1 page of entries. How many pages must be examined to look up a given first name?

Part 2

Consider a disk with a total disk capacity of $C_{disk}$ TB, a surface capacity of $C_{surf}$ GB, and a sector size of $C_{sect}$ bytes. There are $S_t$ sectors per track. The average seek time is $T_{seek}$ seconds.

I'm using the binary versions of Tera, Mega, etc. $T = 2^{40}$, $G = 2^{30}$, $M = 2^{20}$, etc.

6. What is the capacity of a single track in MB?

7. How many tracks are on each surface?

8. How many platters are in this disk? Each platter has 2 surfaces.

9. How many cylinders are in this disk?

10. If the maximum rotational delay is $T_{rotation}$ what is the spinning speed of the disk in revolutions per minute (RPM)?

11. Suppose that a file containing $N_{file}$ records of 250 bytes each is to be stored on such a disk, and that no record is allowed to span two blocks. Assume a block size of $C_{block}$ bytes. How many blocks are required to store the entire file?

Part 3

Download and unzip actors.db.zip to produce an actors.db file in your working folder. This database has the following schema.

CREATE TABLE Actors (
    aid  INTEGER PRIMARY KEY,
    FName TEXT,
    LName TEXT);
CREATE TABLE Casts (
    aid  INTEGER,
    mid INTEGER,
    role TEXT,
    PRIMARY KEY(aid, mid),
    FOREIGN KEY(aid) REFERENCES Actors
    FOREIGN KEY(mid) REFERENCES Movies);
CREATE TABLE Movies (    mid  INTEGER PRIMARY KEY,
    Title TEXT,
    Year INTEGER);

In this part, you will try to speed up a few queries using indexes.

The basic syntax for creating indexes using SQLite is: CREATE INDEX index_name on table_name(column1, column2,...)

Queries

Here are the queries we'll run. Add your own if you like.

Add indexes

Can you do better adding indexes? I was able to get the speedups shown below by adding two indexes on single columns. Experiment with it. I actually found one index that reduced the speedup (a bad thing) from about 800 to about 1.3. You'll want to avoid that one.

Report your results

Now I want you to report your results. The times for the queries without indexes should be in variable TimeWithout and with indexes in TimeWith. I want you to compute the speedup (TimeWithout[i] / TimeWith[i]) for each query. If you're doing it right, you should be able to achieve speedups of greater than 1 (duh) but they will vary considerably depending on your disk drives, processor speed, memory size, etc.

Write your report by changing the numbers in the second column of the table below.

QuerySpeedup
024.6
13.1
2783

This part will count 40 points but I'm not including in the checker results.