1. In this folder, read round5_wiki.md carefully to understand the algorithmic trading task, and read Trader_class.md and overall.py to understand the format for writing the strategy. Write an EDA (exploratory data analysis) of the tickers Instant Translators and Construction Panels: understand and identify the price patterns. After this, propose practical, profitable trading strategies (taker, maker, statistical arbitrage, signal, momentum, etc.), visualize whenever you can, and return a Jupyter Notebook file named round5_translators_panel_EDA.ipynb.
  2. Read the dashboard.html file. Currently, it cannot parse multiple days of log data. We want tabs at the top to switch between days when we upload a logbook that contains multiple days. Each day's page should still look and function as it currently does, but the dashboard needs to be able to parse all 3 days. Provide me with a new dashboard as an .html file.
  3. You are helping me research IMC Prosperity 4 Round 5 algorithmic trading data.

Goal:

I want an exploratory research notebook/script for the 50 Round 5 products. Focus only on data analysis and visualization:

Do NOT implement trading rules.

Do NOT generate buy/sell signals.

Do NOT optimize thresholds.

Do NOT use trader_id, buyer, seller, or any identity fields.

Context:

Product groups:

GROUPS = {
    "GALAXY_SOUNDS": [
        "GALAXY_SOUNDS_DARK_MATTER",
        "GALAXY_SOUNDS_BLACK_HOLES",
        "GALAXY_SOUNDS_PLANETARY_RINGS",
        "GALAXY_SOUNDS_SOLAR_WINDS",
        "GALAXY_SOUNDS_SOLAR_FLAMES",
    ],
    "SLEEP_POD": [
        "SLEEP_POD_SUEDE",
        "SLEEP_POD_LAMB_WOOL",
        "SLEEP_POD_POLYESTER",
        "SLEEP_POD_NYLON",
        "SLEEP_POD_COTTON",
    ],
    "MICROCHIP": [
        "MICROCHIP_CIRCLE",
        "MICROCHIP_OVAL",
        "MICROCHIP_SQUARE",
        "MICROCHIP_RECTANGLE",
        "MICROCHIP_TRIANGLE",
    ],
    "PEBBLES": [
        "PEBBLES_XS",
        "PEBBLES_S",
        "PEBBLES_M",
        "PEBBLES_L",
        "PEBBLES_XL",
    ],
    "ROBOT": [
        "ROBOT_VACUUMING",
        "ROBOT_MOPPING",
        "ROBOT_DISHES",
        "ROBOT_LAUNDRY",
        "ROBOT_IRONING",
    ],
    "UV_VISOR": [
        "UV_VISOR_YELLOW",
        "UV_VISOR_AMBER",
        "UV_VISOR_ORANGE",
        "UV_VISOR_RED",
        "UV_VISOR_MAGENTA",
    ],
    "TRANSLATOR": [
        "TRANSLATOR_SPACE_GRAY",
        "TRANSLATOR_ASTRO_BLACK",
        "TRANSLATOR_ECLIPSE_CHARCOAL",
        "TRANSLATOR_GRAPHITE_MIST",
        "TRANSLATOR_VOID_BLUE",
    ],
    "PANEL": [
        "PANEL_1X2",
        "PANEL_2X2",
        "PANEL_1X4",
        "PANEL_2X4",
        "PANEL_4X4",
    ],
    "OXYGEN_SHAKE": [
        "OXYGEN_SHAKE_MORNING_BREATH",
        "OXYGEN_SHAKE_EVENING_BREATH",
        "OXYGEN_SHAKE_MINT",
        "OXYGEN_SHAKE_CHOCOLATE",
        "OXYGEN_SHAKE_GARLIC",
    ],
    "SNACKPACK": [
        "SNACKPACK_CHOCOLATE",
        "SNACKPACK_VANILLA",
        "SNACKPACK_PISTACHIO",
        "SNACKPACK_STRAWBERRY",
        "SNACKPACK_RASPBERRY",
    ],
}

Step 1: Load and clean data

  1. Inspect the local directory and find all Round 5 price/orderbook CSVs and trade CSVs.

  2. Standard IMC orderbook files may contain columns like:

  3. Standard trade files may contain:

  4. Normalize column names:

  5. Completely ignore buyer, seller, trader_id, and any identity-related columns.

  6. Create clean DataFrames:

  7. Add a global time index:

    so that the three days can be plotted continuously.

  8. Save cleaned versions to:
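
For items 5 and 7 of Step 1, a minimal sketch of what the cleaning could look like. The brief does not spell out the global time formula, so the `day * 1_000_000 + timestamp` convention, the constant `TICKS_PER_DAY`, the column names `day` and `timestamp`, and both helper names are assumptions for illustration only.

```python
import pandas as pd

# Identity fields named in the brief; they are dropped, never analysed.
IDENTITY_COLUMNS = {"buyer", "seller", "trader_id"}

# Assumption: one day covers timestamps 0..999_900, i.e. 1_000_000 units.
TICKS_PER_DAY = 1_000_000


def drop_identity(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy without any identity-related columns."""
    keep = [c for c in df.columns if c.lower() not in IDENTITY_COLUMNS]
    return df[keep].copy()


def add_global_time(df: pd.DataFrame) -> pd.DataFrame:
    """Add a `global_t` column that keeps increasing across days."""
    out = df.copy()
    # Shift days so that the earliest day in the data starts at offset 0.
    day_offset = out["day"] - out["day"].min()
    out["global_t"] = day_offset * TICKS_PER_DAY + out["timestamp"]
    return out.sort_values("global_t").reset_index(drop=True)


# Tiny hypothetical frame standing in for one loaded trade CSV.
raw = pd.DataFrame({
    "day": [3, 2, 2, 4],
    "timestamp": [0, 0, 100, 999_900],
    "price": [10.0, 11.0, 12.0, 13.0],
    "buyer": ["a", "b", "c", "d"],
})
clean = add_global_time(drop_identity(raw))
print(clean)
```

If the real files use a different timestamp range per day, only `TICKS_PER_DAY` needs to change.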

Step 2: Compute price, spread, and order book features

For every product and timestamp, compute:

  1. best_bid:

  2. best_ask:

  3. mid:

  4. spread:

  5. relative_spread:

  6. top_level_imbalance:

  7. microprice:

  8. wall_mid:

  9. log_mid:

  10. returns:
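
The feature list above could be computed roughly as follows. The brief leaves the formulas blank, so everything here is an assumption: the input columns `bid_price_1`, `ask_price_1`, `bid_volume_1`, `ask_volume_1`, the helper name `add_book_features`, and the textbook definitions used for imbalance, microprice, and log returns. `wall_mid` is left out because its definition is not given.

```python
import numpy as np
import pandas as pd


def add_book_features(book: pd.DataFrame) -> pd.DataFrame:
    """Add top-of-book features; rows must be time-sorted within product."""
    out = book.copy()
    out["best_bid"] = out["bid_price_1"]
    out["best_ask"] = out["ask_price_1"]
    out["mid"] = (out["best_bid"] + out["best_ask"]) / 2
    out["spread"] = out["best_ask"] - out["best_bid"]
    out["relative_spread"] = out["spread"] / out["mid"]

    # abs() guards against feeds that report ask volumes as negatives.
    bid_v = out["bid_volume_1"].abs()
    ask_v = out["ask_volume_1"].abs()
    total = bid_v + ask_v  # a zero total yields NaN features for that row

    # Assumed definition: +1 means all top-level size is on the bid.
    out["top_level_imbalance"] = (bid_v - ask_v) / total
    # Assumed definition: each price is weighted by the opposite side's size.
    out["microprice"] = (out["best_ask"] * bid_v + out["best_bid"] * ask_v) / total

    out["log_mid"] = np.log(out["mid"])
    # Log returns per product; the first row of each product is NaN.
    out["returns"] = out.groupby("product")["log_mid"].diff()
    return out


# Two hypothetical book snapshots for one product.
book = pd.DataFrame({
    "product": ["PANEL_1X2", "PANEL_1X2"],
    "bid_price_1": [99.0, 100.0],
    "ask_price_1": [101.0, 102.0],
    "bid_volume_1": [30, 10],
    "ask_volume_1": [10, 10],
})
feats = add_book_features(book)
print(feats[["mid", "spread", "top_level_imbalance", "microprice", "returns"]])
```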

Save product-level feature data to:

Create summary tables:

A. outputs/product_summary.csv

For each product:

B. outputs/group_summary.csv

For each group:

Step 3: Group factor and residual research

For each group of 5 products:

  1. Create a timestamp-aligned matrix of mid prices:

    rows = timestamps

    columns = products

  2. Create normalized price plots:

    A. normalized by first value:

    indexed_price_i,t = 100 * mid_i,t / mid_i,0

    B. log-normalized:

    log_mid_i,t - log_mid_i,0

  3. Compute group common factors:

    A. mean_log_factor:

    average of the five log_mid series

    B. median_log_factor:

    median of the five log_mid series

    C. mean_indexed_factor:

    average of indexed prices

    D. optional PCA factor:

    if sklearn is available, compute first principal component of standardized log prices

  4. For each product in each group, fit a simple OLS residual model:

    log_mid_i,t = alpha_i + beta_i * group_factor_t + residual_i,t

    Do this separately for:

  5. Store for each product:

  6. Compute residual diagnostics:

    For each product residual:

  7. Compute pairwise correlations:

    For each group:

  8. Save outputs:
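
Items 2 to 4 of Step 3 can be sketched as below for a single group. This is only an illustration under assumptions: the function name `group_factor_residuals` and the synthetic random-walk prices are hypothetical, the OLS fit is shown against `mean_log_factor` only, and the optional PCA factor is omitted.

```python
import numpy as np
import pandas as pd


def group_factor_residuals(mid: pd.DataFrame) -> dict:
    """mid: rows = timestamps, columns = the five products of one group."""
    log_mid = np.log(mid)
    indexed = 100 * mid / mid.iloc[0]      # indexed_price_i,t
    log_norm = log_mid - log_mid.iloc[0]   # log_mid_i,t - log_mid_i,0

    factors = pd.DataFrame({
        "mean_log_factor": log_mid.mean(axis=1),
        "median_log_factor": log_mid.median(axis=1),
        "mean_indexed_factor": indexed.mean(axis=1),
    })

    # OLS per product: log_mid_i,t = alpha_i + beta_i * factor_t + residual_i,t
    f = factors["mean_log_factor"].to_numpy()
    X = np.column_stack([np.ones_like(f), f])
    # One least-squares call fits all five products (columns) at once.
    coef, *_ = np.linalg.lstsq(X, log_mid.to_numpy(), rcond=None)
    residuals = log_mid - X @ coef
    params = pd.DataFrame(coef.T, index=mid.columns, columns=["alpha", "beta"])

    return {"indexed": indexed, "log_norm": log_norm, "factors": factors,
            "params": params, "residuals": residuals}


# Synthetic stand-in data: a shared random walk plus idiosyncratic noise.
rng = np.random.default_rng(0)
common = rng.normal(0, 0.01, size=500).cumsum()
names = ["PANEL_1X2", "PANEL_2X2", "PANEL_1X4", "PANEL_2X4", "PANEL_4X4"]
mid = pd.DataFrame({
    n: 10_000 * np.exp(common + rng.normal(0, 0.005, size=500).cumsum())
    for n in names
})
out = group_factor_residuals(mid)
print(out["params"])
```

Because the factor here is the average of the same five series, the fitted betas average to one and the alphas to zero by construction; that is a property of this factor choice, not a finding about the data.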

Visualizations:

Create a figures/ directory and save the following.

For each group:

  1. figures/normalized_prices_{group}.png

  2. figures/log_normalized_prices_{group}.png

  3. figures/spreads_{group}.png

  4. figures/relative_spreads_{group}.png

  5. figures/return_correlation_heatmap_{group}.png

  6. figures/level_correlation_heatmap_{group}.png

  7. figures/residuals_{group}.png

  8. figures/residual_zscores_{group}.png

  9. figures/residual_acf_{group}.png

  10. figures/return_acf_{group}.png

  11. figures/residual_future_change_scatter_{group}.png

  12. figures/orderbook_dashboard_{group}.html
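
The numbers behind the ACF, zero-crossing, and residual-versus-future-change figures could be computed along these lines. The brief does not define these diagnostics, so the three helper names and their definitions are assumptions; in particular, "gamma" is taken here to be the OLS slope of the future residual change on the current residual. The output is purely descriptive and involves no thresholds or trade signals.

```python
import numpy as np


def acf(x, max_lag: int) -> np.ndarray:
    """Sample autocorrelation for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    lags = [np.dot(x[:-k], x[k:]) / denom for k in range(1, max_lag + 1)]
    return np.array([1.0] + lags)


def zero_crossing_rate(resid) -> float:
    """Fraction of consecutive non-zero observations that change sign."""
    s = np.sign(np.asarray(resid, dtype=float))
    s = s[s != 0]
    return float(np.mean(s[1:] != s[:-1]))


def mean_reversion_gamma(resid, horizon: int = 1) -> float:
    """Slope of (resid[t+h] - resid[t]) regressed on resid[t], with intercept."""
    r = np.asarray(resid, dtype=float)
    x = r[:-horizon]
    y = r[horizon:] - r[:-horizon]
    xc = x - x.mean()
    return float(np.dot(xc, y - y.mean()) / np.dot(xc, xc))


# Hypothetical residual: an AR(1) series with coefficient 0.9.
rng = np.random.default_rng(1)
resid = np.zeros(5000)
for t in range(1, resid.size):
    resid[t] = 0.9 * resid[t - 1] + rng.normal()

print("lag-1 ACF:", acf(resid, 5)[1])
print("zero-crossing rate:", zero_crossing_rate(resid))
print("gamma (h=1):", mean_reversion_gamma(resid))
```

For an AR(1) residual with coefficient phi, this gamma should sit near phi - 1, so more negative values indicate faster pull toward zero.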

Final written summary:

At the end of the notebook, print a concise research summary with:

  1. Which groups have the strongest common movement?

  2. Which groups have the highest average pairwise return correlation?

  3. Which groups have the cleanest residual structure?

  4. Which products have the most negative residual mean-reversion gamma?

  5. Which products have the strongest negative return ACF?

  6. Which products have residuals that frequently cross zero?

  7. Which products have spreads too wide to be useful?

  8. Which groups/products should be prioritized for the next step of strategy research?

    Important:

Result

Products