import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
Pandas can read data stored in many file formats, such as CSV, JSON, XML or even Excel. Parsing always involves specifying the correct structure, encoding and other details. The read_csv function reads CSV files and accepts many parameters for describing them. In a notebook, pd.read_csv? displays its full documentation.
pd.read_csv?
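This is a small sketch of how those parameters describe a file's structure. It parses an in-memory CSV instead of a file on disk. The semicolon-separated text and its values are made up for illustration, and io.StringIO stands in for a real file, since read_csv accepts file-like objects as well as paths.

```python
import io

import pandas as pd

# Hypothetical semicolon-separated data with a header row (made-up values).
raw = "timestamp;price\n2017-09-28 00:00:00;4150.5\n2017-09-29 00:00:00;4190.25\n"

df = pd.read_csv(
    io.StringIO(raw),  # any file-like object works, not only a path
    sep=";",           # field separator; the default is ","
    header=0,          # row 0 holds the column names (also the default)
)

print(df.shape)             # (2, 2)
print(df.columns.tolist())  # ['timestamp', 'price']
```

Without sep=";" each line would be read as a single column, because the default separator is a comma.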
df = pd.read_csv('data/btc-market-price.csv')  # Read the CSV file into a DataFrame
df.head()  # Display the first 5 rows
The CSV file we are reading has only two columns: timestamp and price. It doesn't have a header row, and its values are separated by commas. By default, pandas treated the first row of data as the header, which is incorrect. We can override this behaviour with the header parameter.
df = pd.read_csv('data/btc-market-price.csv', header=None)  # Treat every row as data
df.head()
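The effect of the header parameter can be seen side by side. This sketch assumes a made-up headerless sample in the same timestamp,price layout (the prices are not real market data). The variable names wrong and right are illustrative.

```python
import io

import pandas as pd

# Hypothetical headerless CSV: timestamp,price on every line (made-up values).
raw = (
    "2017-09-27 00:00:00,4200.0\n"
    "2017-09-28 00:00:00,4150.5\n"
    "2017-09-29 00:00:00,4190.25\n"
    "2017-09-30 00:00:00,4350.0\n"
    "2017-10-01 00:00:00,4400.0\n"
)

# Default (header=0): the first data row is consumed as column names.
wrong = pd.read_csv(io.StringIO(raw))
# header=None: every line is data, and columns get integer labels 0, 1, ...
right = pd.read_csv(io.StringIO(raw), header=None)

print(wrong.columns.tolist())  # the first row's values, used as names
print(len(wrong), len(right))  # 4 5: one data row was lost to the header
```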
We can then set the column names to make the data easier to read:
df.columns = ['Timestamp','Price'] #Columns for the data set
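A sketch of what that assignment requires: the list must contain exactly one name per column, otherwise pandas raises a ValueError. DataFrame.rename is shown as an alternative that changes only the labels you list. The sample data, the mismatch_raised flag and the "PriceUSD" name are hypothetical.

```python
import io

import pandas as pd

# Hypothetical headerless sample (made-up values).
raw = "2017-09-28 00:00:00,4150.5\n2017-09-29 00:00:00,4190.25\n"
df = pd.read_csv(io.StringIO(raw), header=None)

# Replace the default integer labels 0 and 1 with descriptive names.
df.columns = ["Timestamp", "Price"]

# A list of the wrong length is rejected and the existing names are kept.
mismatch_raised = False
try:
    df.columns = ["Timestamp"]
except ValueError as exc:
    mismatch_raised = True
    print("Rejected:", exc)

# rename() returns a new DataFrame with only the listed labels changed.
renamed = df.rename(columns={"Price": "PriceUSD"})
print(renamed.columns.tolist())  # ['Timestamp', 'PriceUSD']
```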
df.shape
df.head()
df.tail(3) #Retrieves last 3 rows
df.dtypes
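To make the inspection calls above concrete, here is a sketch on a small made-up headerless sample (the values are not real BTC prices). It also checks that, right after reading, the Timestamp column is not yet a datetime type, which is why the next step converts it. The variable names are illustrative.

```python
import io

import pandas as pd

# Hypothetical headerless sample in the timestamp,price layout (made-up values).
raw = (
    "2017-09-27 00:00:00,4200.0\n"
    "2017-09-28 00:00:00,4150.5\n"
    "2017-09-29 00:00:00,4190.25\n"
    "2017-09-30 00:00:00,4350.0\n"
    "2017-10-01 00:00:00,4400.0\n"
)
df = pd.read_csv(io.StringIO(raw), header=None)
df.columns = ["Timestamp", "Price"]

n_rows, n_cols = df.shape  # (rows, columns) tuple
last_three = df.tail(3)    # last 3 rows; head(n) returns the first n (default 5)
print(df.dtypes)

# Before conversion, Timestamp holds plain strings, not datetimes.
print(pd.api.types.is_datetime64_any_dtype(df["Timestamp"]))  # False
print(pd.api.types.is_float_dtype(df["Price"]))               # True
```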
pd.to_datetime(df['Timestamp']).head()
df['Timestamp'] = pd.to_datetime(df['Timestamp'])  # Overwrite the string column with parsed datetimes
df.set_index('Timestamp', inplace=True)  # Use Timestamp as the index, modifying df directly
df.loc['2017-09-29']
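The conversion only works if the result is written back to the same column name; assigning it to a different name, such as 'TimeStamp', would add a new column and leave the strings in place. This sketch, on a made-up sample, shows the conversion, the resulting DatetimeIndex, an exact lookup, and a partial string such as a month selecting several rows. The names price_sep_29 and september are illustrative.

```python
import io

import pandas as pd

# Hypothetical headerless sample (made-up values, not real BTC prices).
raw = (
    "2017-09-27 00:00:00,4200.0\n"
    "2017-09-28 00:00:00,4150.5\n"
    "2017-09-29 00:00:00,4190.25\n"
    "2017-09-30 00:00:00,4350.0\n"
    "2017-10-01 00:00:00,4400.0\n"
)
df = pd.read_csv(io.StringIO(raw), header=None)
df.columns = ["Timestamp", "Price"]

# Convert in place under the same column name, then make it the index.
df["Timestamp"] = pd.to_datetime(df["Timestamp"])
df.set_index("Timestamp", inplace=True)

# Exact lookup of a single value by timestamp and column.
price_sep_29 = df.loc[pd.Timestamp("2017-09-29"), "Price"]

# Partial string indexing: a month string selects every row in that month.
september = df.loc["2017-09"]

print(price_sep_29)     # 4190.25
print(len(september))   # 4
```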
Putting it all together, these are the steps needed to parse our CSV file:
df = pd.read_csv('data/btc-market-price.csv', header=None)
df.columns = ['Timestamp', 'Price']
df['Timestamp'] = pd.to_datetime(df['Timestamp'])
df.set_index('Timestamp', inplace=True)
However, running these steps one by one is repetitive. read_csv can perform all of them in a single call, which is faster to write and easier to read:
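df = pd.read_csv(
    'data/btc-market-price.csv',
    header=None,
    names=['Timestamp', 'Price'],  # Column names to use, since the file has no header
    index_col=0,  # Use the first column (Timestamp) as the index
    parse_dates=True  # Parse the index values as datetimes
)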
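To check that the single call really matches the step-by-step version, this sketch builds both from the same made-up sample and compares them with pd.testing.assert_frame_equal, which raises an AssertionError if values, labels or dtypes differ. The sample and the names steps and one_call are hypothetical.

```python
import io

import pandas as pd

# Hypothetical headerless sample (made-up values).
raw = (
    "2017-09-27 00:00:00,4200.0\n"
    "2017-09-28 00:00:00,4150.5\n"
    "2017-09-29 00:00:00,4190.25\n"
    "2017-09-30 00:00:00,4350.0\n"
    "2017-10-01 00:00:00,4400.0\n"
)

# Step-by-step version.
steps = pd.read_csv(io.StringIO(raw), header=None)
steps.columns = ["Timestamp", "Price"]
steps["Timestamp"] = pd.to_datetime(steps["Timestamp"])
steps.set_index("Timestamp", inplace=True)

# Single-call version.
one_call = pd.read_csv(
    io.StringIO(raw),
    header=None,
    names=["Timestamp", "Price"],
    index_col=0,       # first column becomes the index
    parse_dates=True,  # with index_col set, parse the index as datetimes
)

pd.testing.assert_frame_equal(steps, one_call)
print("Both approaches produce the same DataFrame")
```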
df.plot()  # Plot each column (here, Price) against the index (Timestamp)
Behind the scenes, df.plot() draws with matplotlib, through the matplotlib.pyplot interface. We can create a similar plot with the plt.plot() function:
plt.plot(df.index, df['Price'])
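One way to confirm that df.plot() is built on matplotlib is to check what it returns: a matplotlib Axes object, which can then be adjusted with ordinary matplotlib calls. This sketch assumes a made-up sample in place of the real file and selects the non-interactive Agg backend so it also runs outside a notebook; the title and label text are arbitrary.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.axes import Axes

# Hypothetical headerless sample (made-up values).
raw = (
    "2017-09-27 00:00:00,4200.0\n"
    "2017-09-28 00:00:00,4150.5\n"
    "2017-09-29 00:00:00,4190.25\n"
)
df = pd.read_csv(io.StringIO(raw), header=None, names=["Timestamp", "Price"],
                 index_col=0, parse_dates=True)

# DataFrame.plot() returns the matplotlib Axes it drew on.
ax = df.plot()
print(type(ax))

# The same Axes can be customised with ordinary matplotlib methods.
ax.set_ylabel("Price")
ax.set_title("Price over time")
plt.close(ax.figure)  # free the figure when running as a script
```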
x = np.arange(-10, 11)  # 1D array of the integers -10 to 10 (the stop value 11 is excluded)
plt.plot(x, x ** 2)  # y = x^2: an upward-opening parabola
plt.plot(x, -1 * (x ** 2))  # y = -x^2: a downward-opening parabola
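The arrays behind those two plots can be checked numerically. This sketch recomputes them without plotting; the names y_up and y_down are illustrative.

```python
import numpy as np

x = np.arange(-10, 11)  # integers -10, -9, ..., 10; the stop value 11 is excluded
y_up = x ** 2           # element-wise square: upward-opening parabola
y_down = -1 * (x ** 2)  # the same values negated: downward-opening parabola

print(len(x))                  # 21
print(x[0], x[-1])             # -10 10
print(y_up.min(), y_up.max())  # 0 100
print(y_down[10])              # 0, the vertex at x = 0
```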