References:

General functions — pandas 2.2.3 documentation

Pandas Cheat Sheet & Quick Reference

Pandas is an essential library for data manipulation and analysis in Python, known for its ability to handle and process large datasets efficiently. Whether you're a beginner or a seasoned data professional, Pandas provides flexible data structures and functions that make data analysis more straightforward.

Pandas is an open-source library that gives Python programmers an efficient way to analyze data, create visualizations, and manipulate data sets. Although its primary use is data analysis, Pandas also supports machine learning workflows by letting you prepare the data that you will ultimately use to train your model.

The Pandas library has several features that can simplify your work. When working with large data sets, you can use Pandas to filter the data and find the records that match specific conditions. It also helps you improve data quality by removing irrelevant values, dropping empty rows or columns, and filling in or correcting missing values. When you need to manipulate your data, Pandas offers features for restructuring and combining data sets. Additionally, you can create data visualizations with the Pandas plotting tools or integrate Pandas with other Python visualization libraries.

Pandas also has applications beyond data analysis. Machine learning models built with other frequently used Python libraries, such as TensorFlow, can use the structured data sets prepared in Pandas. The library is popular in the data science community because it integrates well with other data science Python libraries, which gives you more options for what you can accomplish with your data.
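As a rough illustration of the filtering, cleaning, and combining features described above, here is a minimal sketch. The `orders` and `customers` tables, their column names, and their values are all made up for this example; they are not from any real data set.

```python
import pandas as pd

# Hypothetical sample data, invented purely for illustration.
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "customer_id": [10, 11, 10, 12],
    "amount": [25.0, None, 40.0, 15.0],
})
customers = pd.DataFrame({
    "customer_id": [10, 11, 12],
    "name": ["Ana", "Ben", "Cy"],
})

# Find rows that match a condition (boolean indexing).
large_orders = orders[orders["amount"] > 20]

# Handle missing values: either drop incomplete rows or fill the gaps.
complete_orders = orders.dropna(subset=["amount"])
filled_orders = orders.fillna({"amount": 0.0})

# Combine two data sets on a shared key column.
merged = complete_orders.merge(customers, on="customer_id", how="left")

print(large_orders)
print(merged)
```

Whether to drop or fill missing values depends on the analysis; both are shown here only to contrast the two approaches.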

Install Pandas in PyCharm:

Run pip install pandas in the Terminal tool window

Or:

Search for the pandas package in the Python Packages tool window and install it from there

Then import it in your script: import pandas as pd
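A quick way to confirm the installation worked is to import the library and run a trivial operation. This is only a sanity-check sketch; the version printed will depend on what pip installed in your environment.

```python
import pandas as pd  # "pd" is the conventional alias for pandas

# Print the installed version to confirm the import succeeded.
print(pd.__version__)

# Build a tiny Series and run one operation on it.
s = pd.Series([1, 2, 3])
print(s.sum())
```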

The following cheat sheet is a quick reference guide for some of the most common operations you might perform with the Pandas library:

Importing Data

| Action | Definition | Example Code Snippet |
| --- | --- | --- |
| Import | Standard import statement to bring Pandas into the script | `import pandas as pd` |
| Read_CSV | Reads a comma-separated values (CSV) file into a DataFrame | `df = pd.read_csv('file.csv')` |
| Read_Table | Reads a general delimited file (tab-separated by default) into a DataFrame | `df = pd.read_table('file.txt')` |
| Read_Excel | Reads an Excel file into a DataFrame | `df = pd.read_excel('file.xlsx')` |
| Read_SQL | Reads a SQL query or database table into a DataFrame | `df = pd.read_sql('SELECT * FROM table', conn)` |
| Read_JSON | Reads a JSON file (or file-like object) into a DataFrame | `df = pd.read_json('file.json')` |
| Read_HTML | Reads the HTML tables on a page into a list of DataFrames | `dfs = pd.read_html('url')` |
| Clipboard | Reads text from the clipboard into a DataFrame | `df = pd.read_clipboard()` |
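The readers listed above can be tried without any files on disk. The sketch below feeds in-memory text to `read_csv`, `read_table`, and `read_json` through `io.StringIO`, and uses an in-memory SQLite database for `read_sql`. The sample data and the `scores` table name are invented for illustration; the Excel, HTML, and clipboard readers are left out because they rely on optional dependencies or system features.

```python
import io
import sqlite3

import pandas as pd

# CSV: comma-separated text (sample data is made up).
csv_text = "name,score\nAna,90\nBen,85\n"
df_csv = pd.read_csv(io.StringIO(csv_text))

# Delimited text: read_table uses a tab as its default separator.
tsv_text = "name\tscore\nAna\t90\nBen\t85\n"
df_tab = pd.read_table(io.StringIO(tsv_text))

# JSON: wrap the string in StringIO so it is treated as a file-like object.
json_text = '[{"name": "Ana", "score": 90}, {"name": "Ben", "score": 85}]'
df_json = pd.read_json(io.StringIO(json_text))

# SQL: an in-memory SQLite database stands in for a real connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
conn.executemany("INSERT INTO scores VALUES (?, ?)", [("Ana", 90), ("Ben", 85)])
df_sql = pd.read_sql("SELECT * FROM scores", conn)
conn.close()

print(df_csv)
print(df_sql)
```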

Exporting Data

| Action | Definition | Example Code Snippet |
| --- | --- | --- |
| To_CSV | Writes DataFrame to a comma-separated values (CSV) file | `df.to_csv('file.csv')` |
| To_Excel | Writes DataFrame to an Excel file | `df.to_excel('file.xlsx')` |
| To_SQL | Writes DataFrame to a table in a SQL database | `df.to_sql('table_name', conn)` |
| To_JSON | Writes DataFrame to a JSON file (returns a JSON string if no path is given) | `df.to_json('file.json')` |
| To_HTML | Renders DataFrame as an HTML table | `df.to_html('file.html')` |
| To_Clipboard | Copies DataFrame to the clipboard | `df.to_clipboard()` |
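To see a few of these writers in action, the sketch below exports one small DataFrame to a temporary CSV file, a JSON string, an HTML string, and an in-memory SQLite table. The data, the `scores.csv` file name, and the `scores` table name are hypothetical. Passing `index=False` is a choice made here to keep the row index out of the output; by default the writers include it.

```python
import sqlite3
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"name": ["Ana", "Ben"], "score": [90, 85]})

# CSV: write to a temporary file, then read it back to check the result.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "scores.csv"
    df.to_csv(csv_path, index=False)
    csv_roundtrip = pd.read_csv(csv_path)

# JSON and HTML: with no path argument, these methods return a string.
json_string = df.to_json(orient="records")
html_string = df.to_html(index=False)

# SQL: write to a table in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
df.to_sql("scores", conn, index=False)
row_count = conn.execute("SELECT COUNT(*) FROM scores").fetchone()[0]
conn.close()

print(csv_roundtrip)
print(json_string)
print(row_count)
```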