Pandas

Getting Started

I'll explore time series analytics with Python 3 and Pandas.

$ mkdir timeseries-analytics && cd timeseries-analytics
$ python3 -mvenv venv
$ source venv/bin/activate
$ python3 -mpip install pandas jupyter
$ ipython
Python 3.7.2 (default, Jan 13 2019, 12:50:01)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.4.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import pandas as pd

Data Structures

Pandas provides two fundamental data structures, Series and DataFrame. You can think of them as a column and a table, respectively.
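To make the column/table analogy concrete, here is a minimal sketch, assuming pandas is installed as in the setup above. The column names "n" and "square" are hypothetical, chosen only for illustration. Selecting a single column from a DataFrame gives you back a Series.

```python
import pandas as pd

# A small table with two hypothetical columns
df = pd.DataFrame({"n": [1, 2, 3], "square": [1, 4, 9]})

# Selecting one column by name returns a Series
col = df["square"]

print(type(df).__name__)   # DataFrame
print(type(col).__name__)  # Series
print(col.tolist())        # [1, 4, 9]
```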

Series

Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.).

To create a Series, pass a list to pd.Series():

In [2]: pd.Series([2, 3, 5, 7, 11])
Out[2]:
0     2
1     3
2     5
3     7
4    11
dtype: int64

pd.Series() also accepts an optional index argument to set the labels explicitly, for example, pd.Series([2, 3, 5, 7, 11], index=[1, 2, 3, 4, 5]). Without it, pandas assigns a default integer index starting at 0, as in the output above.
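Once labels and positions diverge, it helps to be explicit about which one you mean. A minimal sketch using the index from the example above: .loc looks values up by label, while .iloc looks them up by position.

```python
import pandas as pd

# Labels 1..5 instead of the default 0..4
s = pd.Series([2, 3, 5, 7, 11], index=[1, 2, 3, 4, 5])

print(s.index.tolist())  # [1, 2, 3, 4, 5]
print(s.loc[1])          # 2  (value with label 1)
print(s.iloc[1])         # 3  (value at position 1)
```

Using .loc and .iloc avoids the ambiguity of plain s[1] when the index itself is made of integers.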

It's also essential to know that each Series has a dtype, the single data type shared by all of its values. For example, it's int64 in the code snippet above.
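Pandas infers the dtype from the values you pass in. A minimal sketch of a few common cases: integers, floats, mixed types (which fall back to the generic object dtype), and integers with a missing value (which are upcast to float64 so the missing entry can be stored as NaN). You can also convert explicitly with astype().

```python
import pandas as pd

ints = pd.Series([2, 3, 5, 7, 11])
floats = pd.Series([2.0, 3.5])
mixed = pd.Series([1, "a"])           # no common type, so object
with_missing = pd.Series([1, None])   # None becomes NaN, forcing float64

print(ints.dtype)          # int64
print(floats.dtype)        # float64
print(mixed.dtype)         # object
print(with_missing.dtype)  # float64

# Explicit conversion to another dtype
print(ints.astype("float64").dtype)  # float64
```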

A Series is dict-like: you can get a value by its index label, for example s["key"] when the index consists of strings.
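A minimal sketch of the dict-like behaviour, using hypothetical labels "a", "b" and "c". Note that, as with a dict, the `in` operator checks the keys (index labels), not the values, and get() returns a default instead of raising when a label is missing.

```python
import pandas as pd

s = pd.Series([2, 3, 5], index=["a", "b", "c"])

print(s["b"])         # 3
print("c" in s)       # True: "c" is an index label
print(3 in s)         # False: 3 is a value, not a label
print(s.get("z", 0))  # 0: default for a missing label
```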

A Series is also ndarray-like, so it supports NumPy-style operations such as slicing, boolean masks like s[s > 4], and vectorized arithmetic.
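Here is a minimal sketch of several ways to slice a Series, using hypothetical string labels. One detail worth remembering: a positional slice with .iloc excludes its end point, like a Python list, while a label slice with .loc includes it.

```python
import pandas as pd

s = pd.Series([2, 3, 5, 7, 11], index=["a", "b", "c", "d", "e"])

print(s.iloc[1:3].tolist())     # [3, 5]     positional, end excluded
print(s.loc["b":"d"].tolist())  # [3, 5, 7]  by label, end included
print(s[s > 4].tolist())        # [5, 7, 11] boolean mask
print((s * 2).tolist())         # [4, 6, 10, 14, 22] element-wise
```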

Last but not least, each Series can have a name. You can set it with the name argument, such as pd.Series([2, 3, 5, 7, 11], name="prime").
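The name matters once a Series becomes part of a table. A minimal sketch: to_frame() uses the name as the column label, and rename() with a scalar returns a copy carrying a new name (the name "p" here is arbitrary).

```python
import pandas as pd

primes = pd.Series([2, 3, 5, 7, 11], name="prime")
print(primes.name)  # prime

# The name becomes the column label when converting to a DataFrame
df = primes.to_frame()
print(df.columns.tolist())  # ['prime']

# rename() with a scalar returns a new Series with a different name
renamed = primes.rename("p")
print(renamed.name)  # p
print(primes.name)   # prime (original unchanged)
```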

DataFrame