I'll explore time series analytics with Python 3 and Pandas.
    $ mkdir timeseries-analytics && cd timeseries-analytics
    $ python3 -m venv venv
    $ source venv/bin/activate
    $ python3 -m pip install pandas jupyter
    $ ipython
    Python 3.7.2 (default, Jan 13 2019, 12:50:01)
    Type 'copyright', 'credits' or 'license' for more information
    IPython 7.4.0 -- An enhanced Interactive Python. Type '?' for help.

    In [1]: import pandas as pd
Pandas provides two fundamental data structures, Series and DataFrame. You can think of them as a column and a table, respectively.
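To make the column-and-table analogy concrete, here is a minimal sketch. The column names "n" and "prime" are made up for illustration.

```python
import pandas as pd

# A small table with two columns (hypothetical names, for illustration only).
df = pd.DataFrame({"n": [1, 2, 3], "prime": [2, 3, 5]})

# Selecting a single column of the table gives back a Series.
col = df["prime"]
print(type(col).__name__)  # Series
print(df.shape)            # (3, 2): three rows, two columns
```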
Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.).
To create a Series, use pd.Series():

    In [2]: pd.Series([2, 3, 5, 7, 11])
    Out[2]:
    0     2
    1     3
    2     5
    3     7
    4    11
    dtype: int64
pd.Series() supports an additional index option to set labels explicitly, for example, pd.Series([2, 3, 5, 7, 11], index=[1, 2, 3, 4, 5]).
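As a quick sketch of what the index option changes, the snippet below builds the same Series with explicit labels and then looks a value up by label using .loc:

```python
import pandas as pd

# Same values as before, but labeled 1..5 instead of the default 0..4.
s = pd.Series([2, 3, 5, 7, 11], index=[1, 2, 3, 4, 5])

print(s.index.tolist())  # [1, 2, 3, 4, 5]
print(s.loc[1])          # 2, because label 1 now points at the first value
```

Using .loc here makes it explicit that the lookup is by label rather than by position.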
It's also essential to know that each Series has a dtype, which is the data type shared by all of its values. For example, it's int64 in the snippet above.
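A small sketch of how dtype follows from the data, and how it can be changed with astype(). The variable names are only illustrative.

```python
import pandas as pd

ints = pd.Series([2, 3, 5])
floats = pd.Series([2.0, 3.5])

print(ints.dtype)    # an integer dtype
print(floats.dtype)  # float64

# astype() returns a new Series with the requested dtype.
converted = ints.astype("float64")
print(converted.dtype)  # float64
```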
A Series is dict-like, so you can look up a value by its index label, for example s["key"] when the index is a list of strings.
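The dict-like behavior can be sketched as follows, assuming a Series with the string labels "a", "b", and "c" (chosen only for this example):

```python
import pandas as pd

s = pd.Series([2, 3, 5], index=["a", "b", "c"])

print(s["b"])      # 3, lookup by label
print("c" in s)    # True, membership is tested against the index labels
print(s.get("z"))  # None, like dict.get() for a missing key
```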
A Series is also ndarray-like, so you can index and slice it in several ways. For example,
s[0]: Get the first element.
s[:3]: Get the first, second, and third elements.
s[s > s.median()]: Get the elements whose values are greater than the median.
s[[0, 2]]: Get the first and third elements.
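The indexing forms listed above can be tried out as follows. This is a sketch that assumes the default integer index (0 to 4), where the integers in s[0] and s[[0, 2]] match both the labels and the positions. With a custom index, .iloc is the explicit way to select by position.

```python
import pandas as pd

s = pd.Series([2, 3, 5, 7, 11])

first = s[0]                    # 2
head = s[:3]                    # values 2, 3, 5
above_median = s[s > s.median()]  # the median is 5, so this keeps 7 and 11
picked = s[[0, 2]]              # values 2 and 5

# Explicit positional access, independent of the index labels.
also_first = s.iloc[0]          # 2
```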
Last but not least, each Series can have a name. You can set it with the additional name option, such as pd.Series([2, 3, 5, 7, 11], name="prime").
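One place the name shows up is when a Series is turned into a table: it becomes the column label. A minimal sketch (the new name "p" is only an example):

```python
import pandas as pd

s = pd.Series([2, 3, 5, 7, 11], name="prime")
print(s.name)  # prime

# to_frame() builds a one-column DataFrame that uses the name as the column label.
df = s.to_frame()
print(df.columns.tolist())  # ['prime']

# rename() with a scalar returns a new Series with a different name.
renamed = s.rename("p")
print(renamed.name)  # p
```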