
GitHub
https://github.com/LeeHaEun1/OTT_Log_Analysis
Project Overview
Data Overview
- The dataset contains user interaction logs from an OTT platform, capturing key steps in the content discovery and engagement journey. It includes:
- User context (e.g., country, watching device)
- Whether the content was auto-recommended
- Search-related behavior (e.g., search queries, titles surfaced through search)
- User engagement following recommendations or search results
- Timestamps of each interaction (UTC)
Objectives
- Clarifying Field Definitions
- The original dataset lacked clear definitions for many fields and values. To ensure proper interpretation, I refined the definitions using a combination of online research and logical inference.
- Data Preprocessing
- Based on the refined field definitions, I filtered out rows with invalid or unjustifiable null values to maintain data integrity.
- Multi-dimensional Visualization Design
- While the analysis relied primarily on bar charts, I tried to incorporate multiple dimensions within each figure.
- For example, in Figure 2, bars were grouped by country (x-axis), subdivided by interaction type, and color-coded by discovery path, enabling layered insights in a single plot.
- Extracting Business Insights
- Based on the patterns observed in exploratory visualizations, I drew insights for improving user engagement.
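The layered layout described for Figure 2 comes down to counting events along three dimensions at once. The following is a minimal sketch of that aggregation in plain Python. The field names (`country`, `interaction`, `path`) and all sample values are hypothetical placeholders, not the dataset's actual columns, and the project's own plotting code may be organized differently.

```python
from collections import Counter

# Hypothetical log records: field names and values are illustrative only.
events = [
    {"country": "US", "interaction": "play", "path": "search"},
    {"country": "US", "interaction": "play", "path": "search"},
    {"country": "US", "interaction": "play", "path": "recommendation"},
    {"country": "US", "interaction": "browse", "path": "search"},
    {"country": "KR", "interaction": "play", "path": "recommendation"},
]

# Count events for every (country, interaction, path) combination.
counts = Counter((e["country"], e["interaction"], e["path"]) for e in events)


def bar_groups(counts):
    """Nest flat counts as country -> interaction -> path -> n.

    The outer key maps to an x-axis group, the middle key to a bar
    within that group, and the inner key to a color segment.
    """
    nested = {}
    for (country, interaction, path), n in sorted(counts.items()):
        nested.setdefault(country, {}).setdefault(interaction, {})[path] = n
    return nested


groups = bar_groups(counts)
```

Each level of the nested dictionary corresponds to one visual encoding, so a plotting library can walk it directly to draw grouped, color-coded bars.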
Data Preprocessing

- As a preliminary check, I performed a simple missing value analysis, which revealed that two fields, Query Typed and Displayed Name, contained null values. These fields appeared to be semantically related to the Section field, which indicates how the result was generated. (Table 1)

- The Section field was a categorical variable with three values: Prequery Results, Suggestion Results, and Title Results. However, since these categories were not clearly defined, I clarified their meanings through a combination of online research and logical inference before proceeding with the analysis. (Table 2)

- According to the clarified definitions in Table 2, the three cases listed in Table 3 were logically inconsistent, so I removed the corresponding rows from the dataset.
- Cases 1 and 2: The user did not type a query (Query Typed = null), but the result was categorized as generated based on user input (Section = Suggestion Results or Title Results).
- Case 3: There was no record of an autocomplete result (Displayed Name = null), yet the result was categorized as system-generated (Section = Suggestion Results).