Similarly to how we read a CSV file, we can do this as well with SQL databases which are commonly used for storing data.
In order to do this we need the following imports :
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import sqlite3
%matplotlib inline
conn = sqlite3.connect('data/sakila.db')
df = pd.read_sql('''
SELECT
rental.rental_id, rental.rental_date, rental.return_date,
customer.last_name AS customer_lastname,
store.store_id,
city.city AS rental_store_city,
film.title AS film_title, film.rental_duration AS film_rental_duration,
film.rental_rate AS film_rental_rate, film.replacement_cost AS film_replacement_cost,
film.rating AS film_rating
FROM rental
INNER JOIN customer ON rental.customer_id == customer.customer_id
INNER JOIN inventory ON rental.inventory_id == inventory.inventory_id
INNER JOIN store ON inventory.store_id == store.store_id
INNER JOIN address ON store.address_id == address.address_id
INNER JOIN city ON address.city_id == city.city_id
INNER JOIN film ON inventory.film_id == film.film_id
;
''', conn, index_col='rental_id', parse_dates=['rental_date', 'return_date'])
We can use this to load our data and like CSV we can use the following commands :
df.head()
df.info
df.shape
df.describe()
we can create arrays in Python that holds data that we specifically outline to be held. An example of this is lets say i want to create a list of films with the highest replacement cost. These arrays are formed through the following
df['Film_replacement_cost'].max() #Finds the maximum (aka highest/biggest)
df.loc[df['film_replacement_cost'] = df['film_replacement_cost'].max() , 'film_title'].unique()