Innovation AI

DataHandler / Feature Engineer / Signal / Portfolio Constructor / Execution Model / Performance Analyzer

  1. (Engineered a modular Python/SQL pipeline processing 15K+ macroeconomic and sector ETF time series for real-time data ingestion and backtesting)

Data frequency

  1. Stock: daily frequency, adjusted daily close (log returns), Yahoo Finance

Drop missing data on non-trading days.

Carry forward any remaining gaps, though missing values are rare for ETFs.
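The return calculation above can be sketched as follows. This is a minimal illustration, assuming adjusted closes arrive as a pandas Series indexed by date; the function name `log_returns` and the sample prices are hypothetical, not part of the original pipeline.

```python
import numpy as np
import pandas as pd

def log_returns(adj_close: pd.Series) -> pd.Series:
    """Daily log returns from adjusted closes; rows with no price are dropped first."""
    prices = adj_close.dropna()  # non-trading days carry no price
    return np.log(prices / prices.shift(1)).dropna()

# Hypothetical prices; 2024-01-06 is a Saturday with no quote.
idx = pd.to_datetime(["2024-01-04", "2024-01-05", "2024-01-06", "2024-01-08"])
adj_close = pd.Series([100.0, 101.0, np.nan, 102.01], index=idx)
rets = log_returns(adj_close)
```

Dropping the empty row before differencing means the Monday return is measured against Friday's close rather than against a missing value.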

  1. Macro data: FRED, keyed on the release date

We have to store the macro indicator in two files: 1. release_date 2. effective_date

Align it to the next trading day.

Macro data are aligned using their actual release dates. Signals only become tradable from the next trading day after the release, to avoid look-ahead bias.

Forward-fill until the next release date.
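A minimal sketch of this alignment, assuming the releases sit in a DataFrame with hypothetical columns `release_date` and `value`, and the trading calendar is a sorted `DatetimeIndex`. The helper name `align_macro` is illustrative only.

```python
import pandas as pd

def align_macro(releases: pd.DataFrame, calendar: pd.DatetimeIndex) -> pd.Series:
    """Map each release to the first trading day strictly after it, then forward-fill."""
    # side="right" returns the position of the first calendar date > release_date.
    pos = calendar.searchsorted(releases["release_date"].to_numpy(), side="right")
    keep = pos < len(calendar)  # drop releases that fall after the calendar ends
    effective = calendar[pos[keep]]
    values = pd.Series(releases["value"].to_numpy()[keep], index=effective)
    # If two releases land on the same effective date, keep the later one.
    values = values[~values.index.duplicated(keep="last")]
    return values.reindex(calendar).ffill()

# Hypothetical example: two releases on a business-day calendar.
calendar = pd.bdate_range("2024-01-01", "2024-01-10")
releases = pd.DataFrame({
    "release_date": pd.to_datetime(["2024-01-02", "2024-01-05"]),
    "value": [3.7, 3.9],
})
aligned = align_macro(releases, calendar)
```

A value released on Tuesday 2024-01-02 first appears on Wednesday 2024-01-03, and the Friday release first appears on the following Monday. Days before the first effective date stay empty instead of being back-filled.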

Backtest trigger and fill assumptions

Signal Computed at t close

Trade executed at t+1 open
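The timing convention can be sketched as below. The notes do not state the holding period, so this assumes a one-day hold from the open of t+1 to the open of t+2; the function name `next_open_pnl` and the sample data are hypothetical.

```python
import pandas as pd

def next_open_pnl(open_px: pd.Series, signal: pd.Series) -> pd.Series:
    """PnL per unit of signal, entering at the t+1 open (assumed exit: t+2 open)."""
    # Stored at row t so it lines up with the signal computed at the close of t.
    fwd_ret = open_px.shift(-2) / open_px.shift(-1) - 1.0
    return (signal * fwd_ret).dropna()

# Hypothetical opens and a +1 / -1 signal.
idx = pd.bdate_range("2024-01-01", periods=4)
open_px = pd.Series([100.0, 102.0, 104.04, 104.04], index=idx)
signal = pd.Series([1.0, -1.0, 1.0, 1.0], index=idx)
pnl = next_open_pnl(open_px, signal)
```

The last two rows are dropped because their entry or exit open is not yet known, which keeps the backtest from using prices that were unavailable when the signal was formed.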

  1. Developed machine-learning signals (Random Forest, XGBoost) using technical indicators (RSI, MACD, etc.), achieving an R² of 0.67 after optimizing hyperparameters via grid search and cross-validation

What is the prediction target: next-day return

Is the R² on the training set or the validation set, and from a time-series split or a random split? (no random split)

The R² of 0.67 is computed on a walk-forward validation set using expanding windows. Validation is strictly out of sample. It is a sanity check rather than a performance claim.
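An expanding-window walk-forward split can be sketched in plain Python as follows. The function name and the parameters `initial_train` and `test_size` are illustrative assumptions; the notes do not give the actual window sizes.

```python
from typing import Iterator, Tuple

def expanding_window_splits(
    n_obs: int, initial_train: int, test_size: int
) -> Iterator[Tuple[range, range]]:
    """Yield (train, validation) index ranges; the training window grows each fold."""
    start = initial_train
    while start + test_size <= n_obs:
        # Training always begins at 0 and ends just before the validation block.
        yield range(0, start), range(start, start + test_size)
        start += test_size

# Hypothetical sizes: 10 observations, 4 in the first training window, folds of 2.
splits = list(expanding_window_splits(n_obs=10, initial_train=4, test_size=2))
```

Every validation index is later than every training index in the same fold, which is the property a random split would break.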

R² is not the primary metric for return prediction but a diagnostic tool to check that the signal captures something beyond noise. The actual evaluation focuses on IC, RankIC, turnover-adjusted PnL, Sharpe ratio, and hit ratio.

For each day, we calculate the correlation between the signal at time t and the realized excess return at t+1.
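A minimal sketch of the daily IC and RankIC, assuming the signal and excess returns are stored as DataFrames with dates as rows and assets as columns. The function name `daily_ic` and the toy numbers are hypothetical. RankIC is computed here as a Pearson correlation on cross-sectional ranks, which is equivalent to a Spearman correlation.

```python
import pandas as pd

def daily_ic(signal: pd.DataFrame, excess_ret: pd.DataFrame, rank: bool = False) -> pd.Series:
    """Cross-sectional correlation per day between signal at t and excess return at t+1."""
    fwd = excess_ret.shift(-1)  # row t now holds the return realized at t+1
    if rank:
        signal, fwd = signal.rank(axis=1), fwd.rank(axis=1)
    return signal.corrwith(fwd, axis=1).dropna()

# Hypothetical toy data: 3 days, 3 assets.
idx = pd.bdate_range("2024-01-01", periods=3)
cols = ["A", "B", "C"]
signal = pd.DataFrame([[1, 2, 3], [3, 2, 1], [1, 2, 3]], index=idx, columns=cols, dtype=float)
excess_ret = pd.DataFrame(
    [[0.0, 0.0, 0.0], [0.01, 0.02, 0.03], [0.01, 0.02, 0.03]], index=idx, columns=cols
)
ic = daily_ic(signal, excess_ret)
rank_ic = daily_ic(signal, excess_ret, rank=True)
```

The last date has no t+1 return and is dropped. The mean of `ic` and its ratio to its standard deviation are the usual summary statistics built on top of this series.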

XGBoost and hyperparameters

  1. Control complexity (prevent overfitting): max_depth 3-6, subsample 0.6-0.9 and colsample_bytree, eta = 0.01-0.1, reg_lambda > 1
  2. Do not tune parameters arbitrarily
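One way to keep tuning disciplined is to fix a small grid inside the ranges above and enumerate it once. The sketch below uses only the standard library. The specific grid points are assumptions chosen inside the stated ranges, and the `colsample_bytree` and `reg_lambda` values in particular are placeholders because the notes give no exact values for them.

```python
from itertools import product
from typing import Dict, Iterator, List

PARAM_GRID: Dict[str, List[float]] = {
    "max_depth": [3, 4, 5, 6],         # stated range: 3-6
    "subsample": [0.6, 0.75, 0.9],     # stated range: 0.6-0.9
    "colsample_bytree": [0.8],         # assumption: the notes give no range
    "eta": [0.01, 0.05, 0.1],          # stated range: 0.01-0.1
    "reg_lambda": [2.0, 5.0],          # assumption: any values > 1
}

def iter_param_sets(grid: Dict[str, List[float]]) -> Iterator[Dict[str, float]]:
    """Yield one parameter dict per point of the Cartesian grid."""
    keys = list(grid)
    for combo in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, combo))

param_sets = list(iter_param_sets(PARAM_GRID))
```

Each dict in `param_sets` can be passed to a model inside the walk-forward loop, so every candidate is scored on the same out-of-sample folds.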