Assigned: 15 September 2015
Due: 29 September 2015
# fill in your info here onyen = "jpuccio" collaborators = ['Zen Yang']
Download and unzip movies.db.zip to produce a movies.db file in your working folder. This database is about 836 megabytes so some of the queries may take a while, start early. This database has the following schema:
Customers ( cardNo INTEGER PRIMARY KEY, first TEXT, last TEXT, sex CHAR, dob DATE ) Movies ( movieId INTEGER PRIMARY KEY, title TEXT, year INTEGER ) Rentals ( cardNo INTEGER, movieId INTEGER, date DATE, rating INTEGER, PRIMARY KEY(cardNo, movieID, date), FOREIGN KEY (cardNo) REFERENCES Customers, FOREIGN KEY (movieId) REFERENCES Movies )
Write code in the cells below to answer each question.
# setup import sqlite3 conn = sqlite3.connect('movies.db') cursor = conn.cursor() import comp521 check, report = comp521.start('A2')
1a. How many male customers are there?
# write your query as a single string here a1a = """SELECT COUNT(sex) FROM Customers WHERE sex='M'""" # code to execute your query, leave this alone q1a = cursor.execute(a1a).fetchone() check('1a', q1a, points=10)
1b. How many movies were released in either 2004 or 2006?
a1b = """SELECT COUNT(movieId) FROM Movies WHERE (year='2004' OR year='2006')""" # test code q1b = cursor.execute(a1b).fetchone() check('1b', q1b, points=10)
1c. What are the first and last names of the oldest customers in alphabetical order by last and first name?
a1c = """SELECT first,last FROM Customers WHERE dob=(SELECT MIN(dob) FROM Customers) ORDER BY last,first""" # test q1c = cursor.execute(a1c).fetchall() check('1c', q1c, points=10)
1d. How many customers were born before January 1st, 1950 (not inclusive)?
a1d = """SELECT COUNT(cardNo) FROM CUSTOMERS WHERE dob<'1950-01-01'""" # test q1d = cursor.execute(a1d).fetchone() check('1d', q1d, points=10)
1e. What is the average rating for all movies rented between January 1st 2000, and December 31st 2000 (inclusive)?
#Question is, are these distinct movies? Probably. Apparently not. a1e = """SELECT AVG(rating) FROM Rentals WHERE (date>='2000-01-01' AND date<='2000-12-31')""" # test q1e = cursor.execute(a1e).fetchone() check('1e', q1e, points=10)
2a. Fill in code to make this function return a list with the titles of the films released in the given year, sorted alphabetically.
def f2a(year): queryText = "SELECT title FROM Movies WHERE year = '"+ str(year) +"' ORDER BY title" return [row for row in cursor.execute(queryText).fetchall()] #your code here q2a = f2a(1935) check('2a', q2a, points=10)
2b. Write a function that takes a string representing some part of a film title, and returns a list of tuples with the first and last names of customers who rented those films and rated them 3 or higher. Sort this list alphabetically by last name and first names.
def f2b(title): #queryText = "SELECT C.first,C.last FROM Customers C, Movies M, Rentals R WHERE M.title \ #LIKE '%"+title+"%' AND R.rating >= 3 ORDER BY C.last,C.first" queryText = "SELECT C.first,C.last FROM Customers C WHERE C.cardNo IN (SELECT R.cardNo FROM Rentals R WHERE R.movieId IN (SELECT M.movieId FROM Movies M WHERE M.title LIKE '%"+title+"%') AND R.rating >=3) ORDER BY C.last, C.first" return cursor.execute(queryText).fetchall() q2b = f2b('rash Di') check('2b', q2b, points=20)
2c. Write a function that takes year as input, and returns a list of the titles of the most rented films among the films released that year.
#I guess think of group by as partitioning into distinct groups #We essentially repeat the query twice. #Inner one is to select the maxNumberOfRenters, #the outer one is to select the movieIds that have that number of renters def f2c(year): queryText = "SELECT M.title FROM Movies M WHERE M.movieId IN (SELECT R.movieId FROM Rentals R WHERE R.movieId IN \ (SELECT M.movieId FROM Movies M WHERE M.year='"+str(year)+"')\ GROUP BY R.movieId HAVING COUNT(*) = \ (SELECT MAX(numberOfRenters) FROM (SELECT COUNT(*) AS numberOfRenters, R.movieId FROM Rentals R WHERE R.movieId IN \ (SELECT M.movieId FROM Movies M WHERE M.year='"+str(year)+"')\ GROUP BY R.movieId)))" return [row for row in cursor.execute(queryText).fetchall()] q2c = f2c(2001) check('2c', q2c, points=20)
Now get your report, save, and submit your notebook. I recommend you restart the kernel, and run all, just to be sure things are going to work correctly when I run it.
1a 1b 1c 1d 1e 2a 2b 2c Report for jpuccio Collaborators: ['Zen Yang'] 8 of 8 correct, 100 of 100 points
Click the button below to submit your assignment. Watch for a confirmation message that your notebook was successfully uploaded. You may submit as often as you wish, only the last submission will count.