Assigned: 15 September 2015
Due: 29 September 2015
# fill in your info here
onyen = "jpuccio"
collaborators = ['Zen Yang']
Download and unzip movies.db.zip to produce a movies.db file in your working folder. This database is about 836 megabytes so some of the queries may take a while, start early. This database has the following schema:
Customers (
cardNo INTEGER PRIMARY KEY,
first TEXT,
last TEXT,
sex CHAR,
dob DATE
)
Movies (
movieId INTEGER PRIMARY KEY,
title TEXT,
year INTEGER
)
Rentals (
cardNo INTEGER,
movieId INTEGER,
date DATE,
rating INTEGER,
PRIMARY KEY(cardNo, movieID, date),
FOREIGN KEY (cardNo) REFERENCES Customers,
FOREIGN KEY (movieId) REFERENCES Movies
)
Write code in the cells below to answer each question.
# setup
import sqlite3
conn = sqlite3.connect('movies.db')
cursor = conn.cursor()
import comp521
check, report = comp521.start('A2')
1a. How many male customers are there?
# write your query as a single string here
a1a = """SELECT COUNT(sex) FROM Customers WHERE sex='M'"""
# code to execute your query, leave this alone
q1a = cursor.execute(a1a).fetchone()[0]
check('1a', q1a, points=10)
1a correct
1b. How many movies were released in either 2004 or 2006?
a1b = """SELECT COUNT(movieId) FROM Movies WHERE (year='2004' OR year='2006')"""
# test code
q1b = cursor.execute(a1b).fetchone()[0]
check('1b', q1b, points=10)
1b correct
1c. What are the first and last names of the oldest customers in alphabetical order by last and first name?
a1c = """SELECT first,last FROM Customers WHERE dob=(SELECT MIN(dob) FROM Customers) ORDER BY last,first"""
# test
q1c = cursor.execute(a1c).fetchall()
check('1c', q1c, points=10)
1c correct
1d. How many customers were born before January 1st, 1950 (not inclusive)?
a1d = """SELECT COUNT(cardNo) FROM CUSTOMERS WHERE dob<'1950-01-01'"""
# test
q1d = cursor.execute(a1d).fetchone()[0]
check('1d', q1d, points=10)
1d correct
1e. What is the average rating for all movies rented between January 1st 2000, and December 31st 2000 (inclusive)?
#Question is, are these distinct movies? Probably. Apparently not.
a1e = """SELECT AVG(rating) FROM Rentals WHERE (date>='2000-01-01' AND date<='2000-12-31')"""
# test
q1e = cursor.execute(a1e).fetchone()[0]
check('1e', q1e, points=10)
1e correct
2a. Fill in code to make this function return a list with the titles of the films released in the given year, sorted alphabetically.
def f2a(year):
queryText = "SELECT title FROM Movies WHERE year = '"+ str(year) +"' ORDER BY title"
return [row[0] for row in cursor.execute(queryText).fetchall()]
#your code here
q2a = f2a(1935)
check('2a', q2a, points=10)
2a correct
2b. Write a function that takes a string representing some part of a film title, and returns a list of tuples with the first and last names of customers who rented those films and rated them 3 or higher. Sort this list alphabetically by last name and first names.
def f2b(title):
#queryText = "SELECT C.first,C.last FROM Customers C, Movies M, Rentals R WHERE M.title \
#LIKE '%"+title+"%' AND R.rating >= 3 ORDER BY C.last,C.first"
queryText = "SELECT C.first,C.last FROM Customers C WHERE C.cardNo IN (SELECT R.cardNo FROM Rentals R WHERE R.movieId IN (SELECT M.movieId FROM Movies M WHERE M.title LIKE '%"+title+"%') AND R.rating >=3) ORDER BY C.last, C.first"
return cursor.execute(queryText).fetchall()
q2b = f2b('rash Di')
check('2b', q2b, points=20)
2b correct
2c. Write a function that takes year as input, and returns a list of the titles of the most rented films among the films released that year.
#I guess think of group by as partitioning into distinct groups
#We essentially repeat the query twice.
#Inner one is to select the maxNumberOfRenters,
#the outer one is to select the movieIds that have that number of renters
def f2c(year):
queryText = "SELECT M.title FROM Movies M WHERE M.movieId IN (SELECT R.movieId FROM Rentals R WHERE R.movieId IN \
(SELECT M.movieId FROM Movies M WHERE M.year='"+str(year)+"')\
GROUP BY R.movieId HAVING COUNT(*) = \
(SELECT MAX(numberOfRenters) FROM (SELECT COUNT(*) AS numberOfRenters, R.movieId FROM Rentals R WHERE R.movieId IN \
(SELECT M.movieId FROM Movies M WHERE M.year='"+str(year)+"')\
GROUP BY R.movieId)))"
return [row[0] for row in cursor.execute(queryText).fetchall()]
q2c = f2c(2001)
check('2c', q2c, points=20)
2c correct
Now get your report, save, and submit your notebook. I recommend you restart the kernel, and run all, just to be sure things are going to work correctly when I run it.
report(onyen, collaborators)
1a 1b 1c 1d 1e 2a 2b 2c Report for jpuccio Collaborators: ['Zen Yang'] 8 of 8 correct, 100 of 100 points
Click the button below to submit your assignment. Watch for a confirmation message that your notebook was successfully uploaded. You may submit as often as you wish, only the last submission will count.