I'll explore time series analytics with Python 3 and Pandas.
$ mkdir timeseries-analytics && cd timeseries-analytics
$ python3 -mvenv venv
$ source venv/bin/activate
$ python3 -mpip install pandas jupyter
$ ipython
Python 3.7.2 (default, Jan 13 2019, 12:50:01)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.4.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import pandas as pd
Pandas provides two fundamental data structures, Series and DataFrame. You can think of them as a column and a table, respectively.
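To make the column-and-table analogy concrete, here is a minimal sketch. The column names "n" and "prime" are made up for illustration and are not from any dataset used in this post.

```python
import pandas as pd

# A DataFrame is a table; the column names here are hypothetical.
table = pd.DataFrame({"n": [1, 2, 3], "prime": [2, 3, 5]})

# Selecting a single column from the table gives back a Series.
column = table["prime"]

print(type(table).__name__)   # DataFrame
print(type(column).__name__)  # Series
```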
Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.).
To create a Series, use pd.Series():
In [2]: pd.Series([2, 3, 5, 7, 11])
Out[2]:
0     2
1     3
2     5
3     7
4    11
dtype: int64
pd.Series() accepts an optional index argument to set the labels explicitly, for example, pd.Series([2, 3, 5, 7, 11], index=[1, 2, 3, 4, 5]).
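As a quick sketch of the index argument in action, the snippet below builds the same Series with labels 1 through 5 and looks a value up by label. The variable name primes is only for illustration, and .loc is used to make it explicit that the lookup is by label rather than by position.

```python
import pandas as pd

# Same values as before, but labelled 1..5 instead of the default 0..4.
primes = pd.Series([2, 3, 5, 7, 11], index=[1, 2, 3, 4, 5])

print(primes.index.tolist())  # [1, 2, 3, 4, 5]

# .loc looks up by label, so label 1 is the first element here.
print(primes.loc[1])  # 2
print(primes.loc[5])  # 11
```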
It's also essential to know that each Series has a dtype, which is the data type of its values. For example, it's int64 in the code snippet above.
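A small sketch of how dtype follows from the values, and how it can be changed with astype(). The variable names are illustrative only.

```python
import pandas as pd

ints = pd.Series([2, 3, 5, 7, 11])
floats = pd.Series([2.0, 3.5, 5.25])

print(ints.dtype)    # int64
print(floats.dtype)  # float64

# astype() returns a new Series with the requested dtype;
# the original Series is left unchanged.
as_float = ints.astype("float64")
print(as_float.dtype)  # float64
```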
Series is dict-like, so when the index is a list of strings you can look up a value by its label, for example, s["key"].
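Here is a minimal sketch of the dict-like behaviour, assuming a Series with the made-up string labels "a", "b", and "c".

```python
import pandas as pd

# The labels "a", "b", "c" are hypothetical, chosen for illustration.
s = pd.Series([2, 3, 5], index=["a", "b", "c"])

# Look up a value by its label, as with a dict key.
print(s["b"])  # 3

# Membership tests check the index labels, not the values.
print("c" in s)  # True
print(5 in s)    # False

# get() returns None (or a supplied default) when the label is missing.
print(s.get("z"))     # None
print(s.get("z", 0))  # 0
```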
Series is ndarray-like, so you can select elements from it in several ways. For example:
s[0]: Get the first element.
s[:3]: Get the first, second, and third elements.
s[s > s.median()]: Get the elements whose value is greater than the median.
s[[0, 2]]: Get the first and third elements.
Last but not least, each Series can have a name. You can set it with the optional name argument, such as pd.Series([2, 3, 5, 7, 11], name="prime").
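The selections above, together with the name argument, can be sketched as follows. This version uses .iloc for the positional selections to make the intent explicit; with the default integer index used here, I'd expect the plain bracket forms to select the same elements, but that is an assumption about the pandas version you have installed.

```python
import pandas as pd

s = pd.Series([2, 3, 5, 7, 11], name="prime")

# Positional selection, written explicitly with .iloc.
first = s.iloc[0]        # first element
head = s.iloc[:3]        # first, second, and third elements
picked = s.iloc[[0, 2]]  # first and third elements

# Boolean mask: keep the values greater than the median (5).
above_median = s[s > s.median()]

print(first)                  # 2
print(head.tolist())          # [2, 3, 5]
print(picked.tolist())        # [2, 5]
print(above_median.tolist())  # [7, 11]

# The name is kept on the Series and on selections taken from it.
print(s.name)     # prime
print(head.name)  # prime
```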