Lecture 2 - How to operate with data from SQL

The Sakila Database -

Similarly to how we read a CSV file, we can do this as well with SQL databases which are commonly used for storing data.

In order to do this we need the following imports :

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import sqlite3

%matplotlib inline

conn = sqlite3.connect('data/sakila.db')

df = pd.read_sql('''
    SELECT
        rental.rental_id, rental.rental_date, rental.return_date,
        customer.last_name AS customer_lastname,
        store.store_id,
        city.city AS rental_store_city,
        film.title AS film_title, film.rental_duration AS film_rental_duration,
        film.rental_rate AS film_rental_rate, film.replacement_cost AS film_replacement_cost,
        film.rating AS film_rating
    FROM rental
    INNER JOIN customer ON rental.customer_id == customer.customer_id
    INNER JOIN inventory ON rental.inventory_id == inventory.inventory_id
    INNER JOIN store ON inventory.store_id == store.store_id
    INNER JOIN address ON store.address_id == address.address_id
    INNER JOIN city ON address.city_id == city.city_id
    INNER JOIN film ON inventory.film_id == film.film_id
    ;
''', conn, index_col='rental_id', parse_dates=['rental_date', 'return_date'])

We can use this to load our data and like CSV we can use the following commands :

df.head()
df.info
df.shape
df.describe()

we can create arrays in Python that holds data that we specifically outline to be held. An example of this is lets say i want to create a list of films with the highest replacement cost. These arrays are formed through the following

df['Film_replacement_cost'].max() #Finds the maximum (aka highest/biggest)
df.loc[df['film_replacement_cost'] = df['film_replacement_cost'].max() , 'film_title'].unique()