Data Analyst Role -
Getting the data
- Depending on the company, getting the data can be as simple as running a SQL query or as difficult as scraping an entire website.
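For the simple end of that range, here is a minimal sketch of pulling data with a SQL query. It uses Python's built-in sqlite3 module with an in-memory database so it runs on its own; the `orders` table, its columns, and the sample rows are made up for illustration.

```python
import sqlite3

# In-memory database so the example is self-contained.
# The "orders" table and its contents are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 20.0), ("bob", 35.5), ("alice", 12.5)],
)

# "Getting the data" is then a single query: total amount per customer.
rows = conn.execute(
    "SELECT customer, SUM(amount) FROM orders "
    "GROUP BY customer ORDER BY customer"
).fetchall()
conn.close()

print(rows)  # [('alice', 32.5), ('bob', 35.5)]
```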
Parsing and cleaning the data
- Depending on the sources, you will need to do some preparation, such as excluding outliers, filling in null values, and translating values.
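The three preparation steps named above can be sketched with NumPy roughly as follows. The sample readings, the 0 to 100 plausible range, and the code-to-label mapping are all assumptions chosen for the example, not rules to apply blindly.

```python
import numpy as np

# Hypothetical raw readings: np.nan marks a missing value,
# and 250.0 is an obvious outlier.
raw = np.array([10.0, 12.0, np.nan, 11.0, 250.0, 9.0])

# Filling in null values: replace NaN with the median of the known values.
median = np.nanmedian(raw)
filled = np.where(np.isnan(raw), median, raw)

# Excluding outliers: keep only values inside an assumed plausible range.
cleaned = filled[(filled >= 0) & (filled <= 100)]

# Translating values: map hypothetical numeric codes to readable labels.
codes = np.array([0, 1, 1, 0])
labels = np.array(["no", "yes"])[codes]

print(cleaned)
print(labels)
```

Filling with the median is only one option; the mean, a constant, or dropping the row may suit a given dataset better.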
Doing the analysis
- This draws on your own domain expertise as well as the tools available for the job. It is important to know the principles of statistics, but you can also use statsmodels to simplify the work. The analysis part is usually iterative and involves other
Building Models
- The whole point of the analysis is to find patterns and, in particular cases, to build more general models. Your models can be predictions, clusterings, or just automated reports. In a general sense, they are the result of all the previous phases.
NumPy: computing library -
- NumPy allows for numeric computations with C primitives
- It provides efficient collections with vectorized operations
First, we need to import NumPy, because it is a library rather than a built-in part of Python. We can do this as follows:
import numpy as np
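To make the two points above concrete, the sketch below compares a vectorized operation with the equivalent Python loop. The variable names are illustrative only.

```python
import numpy as np

values = np.array([1, 2, 3, 4])

# Vectorized: the operation is applied to every element at once,
# without writing a Python-level loop.
doubled = values * 2
total = values.sum()

# The same result with a plain Python list needs an explicit loop.
doubled_loop = [v * 2 for v in [1, 2, 3, 4]]

print(doubled)       # [2 4 6 8]
print(total)         # 10
print(values.dtype)  # a fixed-size C integer type; the exact width depends on the platform
```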
Basic NumPy Arrays -
Arrays allow us to hold data and work with it, similarly to a CSV file. To create an array with the NumPy library, we can do the following:
# Create an array from a list of items
np.array([1, 2, 3, 4])
# Assign arrays to variables
a = np.array([1, 2, 3, 4])
b = np.array([0, 2, 12, 0.4, 2, 1.2])
# Retrieve individual elements by index (indices start at 0)
a[0], a[1]
# Slice from an index to the end, or between two indices (the end index is excluded)
a[0:]
a[1:3]
# Select several elements at once by passing a list of indices (-1 is the last element)
b[[0, 2, -1]]
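The snippets above can be put together into one runnable example that shows what each expression returns. The variable names on the left are added here only to hold the results.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([0, 2, 12, 0.4, 2, 1.2])

first, second = a[0], a[1]  # single elements: 1 and 2
from_start = a[0:]          # every element from index 0 to the end
middle = a[1:3]             # elements at index 1 and 2 (index 3 is excluded)
picked = b[[0, 2, -1]]      # elements at index 0, index 2 and the last index

print(from_start)  # [1 2 3 4]
print(middle)      # [2 3]
print(picked)      # [ 0.  12.   1.2]
print(b.dtype)     # float64
```

Note that `b` mixes integers and decimals in its input list, so NumPy stores every element as a float.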