DataHandler / Feature Engineer / Signal / Portfolio Constructor / Execution Model / Performance Analyzer
Data frequency
drop rows with missing data on non-trading days
otherwise carry the last value forward; for ETFs, missing values are rare
we have to store each macro indicator with two dates: 1. release date 2. effective date
align each release to the next trading day
Macro data are aligned using their actual release dates; a signal only becomes tradable from the next trading day after the release, to avoid look-ahead bias
forward-fill each value until the next release becomes effective
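A minimal sketch of the alignment described above, assuming pandas and a macro table with hypothetical columns `release_date` and `value` (the function name and the sample data are illustrative, not taken from the actual pipeline). Each release is mapped to the first trading day strictly after its release date, then forward-filled until the next release takes effect.

```python
import numpy as np
import pandas as pd


def align_macro_to_trading_days(macro: pd.DataFrame,
                                trading_days: pd.DatetimeIndex) -> pd.Series:
    """Map each release to the next trading day, then forward-fill.

    `macro` is assumed to have columns `release_date` and `value`
    (hypothetical names chosen for this sketch).
    """
    td = pd.DatetimeIndex(trading_days).sort_values()
    macro = macro.sort_values("release_date")
    release = pd.to_datetime(macro["release_date"]).to_numpy()

    # side="right" gives the first trading day strictly AFTER the release,
    # so a value released on day t is never usable on day t itself.
    pos = td.searchsorted(release, side="right")
    valid = pos < len(td)  # drop releases after the last trading day
    effective = td[pos[valid]]

    s = pd.Series(macro["value"].to_numpy()[valid], index=effective)
    # If two releases land on the same effective day, keep the latest one.
    s = s[~s.index.duplicated(keep="last")]
    return s.reindex(td).ffill()


# Illustrative data: business days only, two releases.
trading_days = pd.bdate_range("2024-01-01", "2024-01-12")
macro = pd.DataFrame({
    "release_date": ["2024-01-05", "2024-01-10"],  # Friday, Wednesday
    "value": [1.0, 2.0],
})
aligned = align_macro_to_trading_days(macro, trading_days)
print(aligned)
```

Days before the first effective date stay NaN rather than being back-filled, since back-filling would itself introduce look-ahead.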
Backtest trigger and fill assumptions
Signal computed at the close of day t
Trade executed at the open of day t+1
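The timing convention above can be sketched as follows, assuming pandas and daily open prices. The holding period (enter at the t+1 open, exit at the t+2 open) is an assumption made for this sketch; the notes only fix the entry time. The function name and the sample prices are illustrative.

```python
import pandas as pd


def next_open_pnl(open_px: pd.Series, signal: pd.Series) -> pd.Series:
    """Per-signal return, indexed by the signal date t.

    The signal known at the close of t is traded at the open of t+1.
    ASSUMPTION: the position is held until the open of t+2.
    """
    # Return from open[t+1] to open[t+2], stored on row t.
    open_to_open = open_px.shift(-2) / open_px.shift(-1) - 1.0
    return signal * open_to_open


idx = pd.bdate_range("2024-01-01", periods=4)
open_px = pd.Series([100.0, 110.0, 121.0, 133.1], index=idx)
signal = pd.Series([1.0, -1.0, 1.0, 0.0], index=idx)  # +1 long, -1 short
pnl = next_open_pnl(open_px, signal)
print(pnl)
```

The last two rows are NaN because the prices needed to fill and close those trades do not exist yet; no price from day t or earlier is used to fill the trade.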
What is the prediction target: next-day return
Is R^2 computed on the training set or the validation set, and with a time-series split or a random split? (no random split)
The R^2 of 0.67 is computed on a walk-forward validation set using expanding windows; validation is strictly out of sample. It is a sanity check rather than a performance claim.
R^2 is not the primary metric for return prediction but a diagnostic tool to confirm that the signal captures something beyond noise; the actual evaluation focuses on IC, RankIC, turnover-adjusted PnL, Sharpe ratio, and hit ratio.
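A minimal sketch of expanding-window walk-forward validation with an out-of-sample R^2, using only NumPy. Ordinary least squares stands in for the real model here, the data are synthetic, and the names `expanding_window_splits` and `walk_forward_r2` are hypothetical; it illustrates the split logic and does not reproduce the 0.67 figure.

```python
import numpy as np


def expanding_window_splits(n_samples: int, min_train: int, test_size: int):
    """Yield (train_idx, test_idx); training always starts at 0 and grows."""
    start = min_train
    while start + test_size <= n_samples:
        yield np.arange(0, start), np.arange(start, start + test_size)
        start += test_size


def walk_forward_r2(X: np.ndarray, y: np.ndarray,
                    min_train: int, test_size: int) -> float:
    """Pooled out-of-sample R^2 across all walk-forward folds."""
    preds = np.full(len(y), np.nan)
    for train_idx, test_idx in expanding_window_splits(len(y), min_train, test_size):
        # Stand-in model: OLS with intercept, fitted on past data only.
        A = np.column_stack([np.ones(len(train_idx)), X[train_idx]])
        beta, *_ = np.linalg.lstsq(A, y[train_idx], rcond=None)
        B = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
        preds[test_idx] = B @ beta
    mask = ~np.isnan(preds)  # only rows that were ever in a test window
    ss_res = np.sum((y[mask] - preds[mask]) ** 2)
    ss_tot = np.sum((y[mask] - y[mask].mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic, time-ordered data purely for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 1))
y = 2.0 * X[:, 0] + 0.1 * rng.normal(size=300)
r2 = walk_forward_r2(X, y, min_train=100, test_size=50)
print(round(r2, 4))
```

Every test index is later in time than every training index in its fold, which is the property a random split would break.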
For each day, we calculate the cross-sectional correlation between the signal at time t and the realized excess return at t+1.
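A minimal sketch of the daily IC and RankIC calculation, assuming pandas and two DataFrames laid out as dates by assets (that layout, the function names, and the sample numbers are assumptions for illustration). RankIC is computed as the Pearson correlation of cross-sectional ranks, which equals the Spearman correlation.

```python
import numpy as np
import pandas as pd


def daily_ic(signal: pd.DataFrame, excess_ret: pd.DataFrame) -> pd.Series:
    """Pearson correlation per day between signal[t] and excess_ret[t+1]."""
    fwd = excess_ret.shift(-1)  # row t now holds the return realized at t+1
    return signal.corrwith(fwd, axis=1)


def daily_rank_ic(signal: pd.DataFrame, excess_ret: pd.DataFrame) -> pd.Series:
    """Rank correlation per day: Pearson on cross-sectional ranks."""
    fwd = excess_ret.shift(-1)
    return signal.rank(axis=1).corrwith(fwd.rank(axis=1), axis=1)


dates = pd.bdate_range("2024-01-01", periods=3)
assets = ["A", "B", "C"]
signal = pd.DataFrame([[1.0, 2.0, 3.0],
                       [3.0, 2.0, 1.0],
                       [1.0, 3.0, 2.0]], index=dates, columns=assets)
excess_ret = pd.DataFrame([[0.00, 0.00, 0.00],
                           [0.01, 0.02, 0.03],
                           [0.01, 0.02, 0.04]], index=dates, columns=assets)

ic = daily_ic(signal, excess_ret)
rank_ic = daily_rank_ic(signal, excess_ret)
print(ic.mean(), rank_ic.mean())  # means skip the final NaN day
```

The last date has no t+1 return, so its IC is NaN and is excluded from the averages.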
XGBoost and hyperparameters