Assignment 5

Assigned: 6 November 2015
Due: 24 November 2015

In this assignment you will design a database and write functions to query it. I have provided data for 999 books from Tar Heel Reader. The books are provided in JSON format which is easy to read in Python.

Now the variable books is an array of dictionaries. A typical book looks like this:

 {u'categories': [u'Animals and Nature'],
 u'created': u'2015-04-02 07:57:35',
 u'email': u'cwhitei0@miitbeian.gov.cn',
 u'first_name': u'Christopher',
 u'language': u'es',
 u'last_name': u'White',
 u'login': u'cwhitei0',
 u'pages': [{u'height': 310,
   u'caption': u'La rosa m\xe1s hermosa',
   u'url': u'/cache/images/48/3329876048_2b480e2243.jpg',
   u'width': 500},
  {u'height': 375,
   u'caption': u'Aunque crezca en un desierto',
   u'url': u'/cache/images/19/5886437819_01932b8f38.jpg',
   u'width': 500},
  {u'height': 500,
   u'caption': u'siempre tiene donde estar',
   u'url': u'/cache/images/17/5886448317_fcbf6c7d4d.jpg',
   u'width': 375}],
 u'title': u'La rosa'}

categories is a list of strings each indicating a library-like classification for the book. The list may be empty. Assume that additional categories may be created in the future.

email, first_name, last_name, and login relate to the author of the book. Of course, one author may write many books and the information must remain consistent. login is unique for every author.

language is the two-letter code for the language of the book.

title is, of course, the title of the book and certainly not unique.

pages is an array. For each page of the book we have several fields. caption is the caption or text for this page of the book. url is the URL for the picture and its dimensions are in width and height. The dimensions of a particular image are always the same. There is significant reuse of pictures across books. The url for a picture sometimes changes and must be changed across all instances.

Your mission

Your mission is to design a database in BCNF, write code to load it from the books array, and code to query it in various ways as described below.

Imagine you are implemeting the database part of the backend for Tar Heel Reader.

1. Design the Database

Create a set of tables with schemas that are in Boyce-Codd Normal Form (the values are determined by the key, the whole key, and nothing but the key) or as close to it as you can reasonably get. Use the same names for attributes as in the JSON format data. Give any new attributes you create reasonably descriptive names.

I'm going to assume below that you can find a given book by a unique integer id that you (or the DB) will assign.

2. Load the data into your database.

You should not need additional datastructures besides the db itself to accomplish this.

Verify your tables

I'm going to include some code below to help you verify the sizes of your tables. I think that if you've factored and loaded them up as expected they should have the same number of entiries as mine.

3. List the categories with the number of books in each for a given language.

A visitor to Tar Heel Reader might want to see how many books in their native language are available in each category.

Fill in the body of the function so that it uses a single query to produce a list of tuples with the category first and the number of books in that category second. Sort by category in ascending order.

For my table design I was able to do this with a single SQL statement.

Always use the ? parameter substitution form to avoid SQL injection attacks.

4. List the books in a category.

The next step for our visitor is to select a category and browse the available books. We don't want to send all the books, that would take too long, so we will send only 12 books (a chunk) at a time.

Fill in the body of the function below so that it returns a list of tuples with the book's unique id, title, author's last name, page count, and the URL of the picture on the first page of the book. The frontend will use this information to construct a grid of book previews with a link to read each book.The books should be ordered with the most recently created books first.

You should do this with a single query.

Your result should look like this:

[(u'Farm Animals', u'Murphy', 980, 8, u'/cache/images/89/291005289_a05ae73053.jpg'),
 ...
]

5. Fetch a single book

Now that our vistor has clicked on a book, we want to fetch all of its content so the Javascript in the browser can render it without further reference to the server. We want to produce a result formatted like the ones we read in earlier so create a Python dictionary and assign values to keys with the same names. Exclude the id from the result to make testing easier. Finally return a JSON encoded string, I've included that code for you inside the function.

I can't see any way to do this step with only one query. I got it down to three. That isn't so bad because after this step our visitor will be happily reading a book without bothering us again for a while.

Fill in the body of the function below so that it produces a JSON encoded string that looks like one of the entries in your original books array. Do not simply return an item from that array, you must reconstruct the book from the data in your database.

6. List the books by an author.

The authors at our site like to see a list of the books they have written. Fill in the code below to return a list of the books in the same format as listBooks above in part 3.

Done!

Get your report, save, and upload your notebook on Sakai.