Jupyter | Python | Pandas

This project is a hands-on exploration of what data engineering can do in the sports world, from scraping and cleaning large-scale NBA data to analyzing and visualizing it. Using Python, Jupyter Notebooks, and libraries such as nba_api and pandas, I walk through a working pipeline designed to uncover patterns between NBA player performance and the climates of the cities they’ve played in.

Our guiding question is simple but revealing:

Do players perform differently in hot vs. cold NBA cities?

Why This Project?

Modern data analytics have revolutionized professional basketball. From load management to shot selection, data drives real decisions. But beyond expected stats and averages, I wanted to explore the game through an unconventional lens — climate. Could temperature correlate with performance? Are there patterns that teams or analysts overlook?

A Thematic Hook: Team Fire vs Team Ice

We’ll be building toward a creative goal: dividing players into two fictional teams, Fire (warm-weather standouts) and Ice (cold-weather performers), and comparing their styles, trends, and potential matchups. But first, we need a foundation. All great data systems start small and scale. By combining freely accessible game data with city climate data, we have the chance to uncover compelling, data-driven stories about the game.

Task 1 - Proof of Concept: LeBron James

Before working with many players, it’s important to validate the entire pipeline. That’s where LeBron James comes in — the King himself, a perfect test case with a long, geographically diverse career. If we can make this process work for him, we can scale it.

This section walks through the complete end-to-end flow for retrieving, enriching, cleaning, and analyzing data for a single player.

Step 1: Data Collection - Retrieving Player Game Logs

To query any player data from the NBA API, we first need their unique player ID. The nba_api library lets us search for player metadata and extract this identifier — in this case, for LeBron James.
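
As a rough illustration of this lookup, the sketch below uses nba_api's static player list through `players.find_players_by_full_name`. The helper names `find_player_id` and `lookup_player_id` are hypothetical and not taken from the original notebook. The matching logic is kept separate from the library call so it can be checked without nba_api installed.

```python
def find_player_id(full_name, candidates):
    """Return the 'id' of the one candidate whose 'full_name' matches exactly.

    `candidates` is a list of dicts with 'id' and 'full_name' keys, which is
    the shape nba_api's static player search returns.
    """
    wanted = full_name.casefold()
    matches = [p for p in candidates if p["full_name"].casefold() == wanted]
    if len(matches) != 1:
        raise LookupError(
            f"expected exactly one match for {full_name!r}, found {len(matches)}"
        )
    return matches[0]["id"]


def lookup_player_id(full_name):
    """Hypothetical wrapper: search nba_api's bundled player list by name."""
    # Imported lazily so the helper above works without nba_api installed.
    from nba_api.stats.static import players

    return find_player_id(full_name, players.find_players_by_full_name(full_name))
```

Calling `lookup_player_id("LeBron James")` should return his player ID, which later steps pass to the game log endpoint. Raising an error on zero or multiple matches guards against silently picking the wrong player when names collide.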

Step 2: Pull LeBron’s Game Logs by Season

Using that player ID, we loop through each season from 2003 to 2024, pulling game-level data including points, assists, rebounds, and opponent info. A short delay, time.sleep(0.6), is built into the loop to respect the API’s rate limits, an important detail in any real-world data pipeline.
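
The loop described above might look roughly like the sketch below. It assumes nba_api's `PlayerGameLog` endpoint, which takes seasons as strings such as "2003-04", and it assumes "2003 to 2024" means season start years, inclusive. The names `season_label` and `fetch_game_logs` are illustrative and not from the original notebook.

```python
import time


def season_label(start_year):
    """Format a season start year the way the NBA stats API expects: 2003 -> '2003-04'."""
    return f"{start_year}-{(start_year + 1) % 100:02d}"


def fetch_game_logs(player_id, first_year=2003, last_year=2024, pause=0.6):
    """Hypothetical helper: pull one game log per season and stack them.

    Assumption: first_year and last_year are season START years, inclusive.
    """
    # Imported lazily so season_label can be used without these packages.
    import pandas as pd
    from nba_api.stats.endpoints import playergamelog

    frames = []
    for year in range(first_year, last_year + 1):
        log = playergamelog.PlayerGameLog(
            player_id=player_id, season=season_label(year)
        )
        frames.append(log.get_data_frames()[0])
        time.sleep(pause)  # short pause between requests, as described above
    return pd.concat(frames, ignore_index=True)
```

Keeping the pause as a parameter makes it easy to slow the loop down if requests start failing when the pipeline scales to many players.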

Step 3 + 4: Map NBA Team Abbreviations to City Names | Add Average Temperatures for Each City

Each NBA team is represented by a three-letter abbreviation. To identify where games were played, we’ll map these abbreviations to their corresponding cities (e.g., 'LAL' → 'Los Angeles').
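
One possible way to do this mapping is sketched below. It assumes the game log's MATCHUP column uses the forms "CLE vs. BOS" for a home game and "CLE @ BOS" for an away game, so the host team can be read from the string. The dictionary is deliberately partial, and the names `TEAM_CITY`, `host_team`, and `game_city` are hypothetical.

```python
# Partial mapping for illustration; a full version would cover every team.
TEAM_CITY = {
    "LAL": "Los Angeles",
    "CLE": "Cleveland",
    "MIA": "Miami",
    "BOS": "Boston",
}


def host_team(matchup):
    """Return the abbreviation of the team hosting the game.

    Assumed formats: 'CLE vs. BOS' (CLE at home) and 'CLE @ BOS' (CLE away).
    """
    if " @ " in matchup:
        return matchup.split(" @ ")[1].strip()
    if " vs. " in matchup:
        return matchup.split(" vs. ")[0].strip()
    raise ValueError(f"unrecognized matchup format: {matchup!r}")


def game_city(matchup, team_city=TEAM_CITY):
    """Map a matchup string to the host city, or None if the team is unmapped."""
    return team_city.get(host_team(matchup))
```

Returning None for unmapped abbreviations, rather than raising, makes it easy to list the teams still missing from the dictionary before moving on.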

We then manually add the historical average temperature (in Fahrenheit) for each NBA city. This simplifies the analysis by associating each city with a representative climate baseline, allowing us to later evaluate how performance varies by temperature.
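
A minimal sketch of the enrichment step follows, written over plain dictionaries so the logic is easy to check. The column names CITY and AVG_TEMP_F and the helper names are assumptions. The temperature values are made-up placeholders, not real climate data, and would be replaced by the manually collected averages.

```python
# PLACEHOLDER numbers for illustration only; NOT real climate data.
CITY_AVG_TEMP_F = {
    "Los Angeles": 70.0,
    "Cleveland": 50.0,
}


def add_avg_temp(games, city_temps):
    """Return copies of the game rows with an AVG_TEMP_F value attached.

    Cities missing from `city_temps` get None, so gaps stay visible.
    """
    enriched = []
    for game in games:
        row = dict(game)  # copy so the input rows are left unchanged
        row["AVG_TEMP_F"] = city_temps.get(game.get("CITY"))
        enriched.append(row)
    return enriched


def cities_missing_temp(games, city_temps):
    """List the cities that appear in the games but have no temperature yet."""
    seen = {g.get("CITY") for g in games if g.get("CITY") is not None}
    return sorted(city for city in seen if city not in city_temps)
```

With a pandas DataFrame `df` that has a CITY column, the same lookup can be written as `df["AVG_TEMP_F"] = df["CITY"].map(CITY_AVG_TEMP_F)`, which leaves NaN for any unmapped city.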