Intro

Types of Plots

There are numerous types of plots used in data visualization, each serving different purposes. Here are some common types:

  1. Line Plot: Shows data points connected by straight lines. Useful for visualizing trends over time.
  2. Bar Chart: Displays categorical data with rectangular bars. Suitable for comparing values across different categories.
  3. Histogram: Represents the distribution of continuous data by dividing it into intervals and showing the frequency of observations within each interval.
  4. Pie Chart: Illustrates the proportion of each category in a dataset as a circular chart divided into slices.
  5. Scatter Plot: Displays individual data points as dots on a two-dimensional graph, useful for exploring relationships between two variables.
  6. Box Plot (Box-and-Whisker Plot): Shows the distribution of a dataset by displaying the median, quartiles, and outliers.
  7. Heatmap: Visualizes data in a matrix format using colors to represent values, often used for displaying correlation matrices or large datasets.
  8. Area Plot: Similar to a line plot but with the area below the line filled with color, useful for representing cumulative values over time.
  9. Violin Plot: Combines the features of a box plot and a kernel density plot to show the distribution of data across different categories.
  10. Bubble Plot: Represents data points as bubbles with varying sizes and/or colors, often used to visualize three-dimensional data.

Types of Plot Libraries

Library Main Purpose Key Features Programming Language Level of Customization Dashboard Capabilities Types of Plots Possible
Matplotlib General - purpose plotting Comprehensive plot types and variety of customization options Python High Requires additional components and customization Line plots, scatter plots, bar charts, histograms, pie charts, box plots, heatmaps, etc.
Pandas Fundamentally used for data manipulation but also has plotting functionality Easy to plot directly on Panda data structures Python Medium Can be combined with web frameworks for creating dashboards Line plots, scatter plots, bar charts, histograms, pie charts, box plots, etc.
Seaborn Statistical data visualization Stylish, specialized statistical plot types Python Medium Can be combined with other libraries to display plots on dashboards Heatmaps, violin plots, scatter plots, bar plots, count plots, etc.
Plotly Interactive data visualization Interactive web - based visualizations Python, R, JavaScript High Dash framework is dedicated for building interactive dashboards Line plots, scatter plots, bar charts, pie charts, 3D plots, choropleth maps, etc.
Folium Geospatial data visualization Interactive, customizable maps Python Medium For incorporating maps into dashboards, it can be integrated with other frameworks/libraries Choropleth maps, point maps, heatmaps, etc.
PyWaffle Plotting Waffle charts Waffle charts Python Low Can be combined with other libraries to display waffle chart on dashboards Waffle charts, square pie charts, donut charts, etc.

Data Processing — Using Pandas

Task Syntax Description Example
Load CSV data pd.read_csv('filename.csv') Read data from a CSV file into a Pandas DataFrame df_can = pd.read_csv('data.csv')
Handling Missing Values df.dropna Drop rows with missing values df_can.dropna()
df.fillna(value) Fill missing values with a specified value df_can.fillna(0)
Removing Duplicates df.drop_duplicates() Remove duplicate rows df_can.drop_duplicates()
Renaming Columns df.rename(columns={'old_name': 'new_name'}) Rename one or more columns df_can.rename(columns={'Age': 'Years'})
Selecting Columns df['column_name'] or df.column_name Select a single column df_can.Age or df_can['Age']
df[['col1', 'col2']] Select multiple columns df_can[['Name', 'Age']]
Filtering Rows df[df['column'] > value] Filter rows based on a condition df_can[df_can['Age'] > 30]
Applying Functions to Columns df['column'].apply(function_name) Apply a function to transform values in a column df_can['Age'].apply(lambda x: x + 1)
Creating New Columns df['new_column'] = expression Create a new column with values derived from existing ones df_can['Total'] = df_can['Quantity'] * df_can['Price']
Grouping and Aggregating df.groupby('column').agg({'col1':'sum', 'col2':'mean'}) Group rows by a column and apply aggregate functions df_can.groupby('Category').agg({'Total':'mean'})
Sorting Rows df.sort_values('column', ascending=True/False) Sort rows based on a column df_can.sort_values('Date', ascending=True)
Displaying First n Rows df.head(n) Show the first n rows of the DataFrame df_can.head(3)
Displaying Last n Rows df.tail(n) Show the last n rows of the DataFrame df_can.tail(3)
Checking for Null Values df.isnull() Check for null values in the DataFrame df_can.isnull()
Selecting Rows by Index df.iloc[index] Select rows based on integer index df_can.iloc[3]
df.iloc[start:end] Select rows in a specified range df_can.iloc[2:5]
Selecting Rows by Label df.loc[label] Select rows based on label/index name df_can.loc['Label']
df.loc[start:end] Select rows in a specified label/index range df_can.loc['Age':'Quantity']
Summary Statistics df.describe() Generates descriptive statistics for numerical columns df_can.describe()

Basic and Specialized Visualization Tools