The end of notebook 11-5 is as follows; the resulting 'predictions.h5' file contains only 'test' data.
for lookahead in [1, 5, 10, 21]:
    if lookahead > 1:
        continue
    print(f'\nLookahead: {lookahead:02}')
    data = (pd.read_hdf('data.h5', 'stooq/japan/equities'))
    labels = sorted(data.filter(like='fwd').columns)
    features = data.columns.difference(labels).tolist()
    label = f'fwd_ret_{lookahead:02}'
    data = data.loc[:, features + [label]].dropna()
・
・
・
    by_day = test_predictions.groupby(level='date')
    for position in range(10):
        if position == 0:
            ic_by_day = by_day.apply(lambda x: spearmanr(x.y_test, x[position])[0]).to_frame()
        else:
            ic_by_day[position] = by_day.apply(lambda x: spearmanr(x.y_test, x[position])[0])
@y5432 @anthonberg Notebook 5 in ch11 creates predictions.h5; you're right that the code only saves the test data, while 11-7 then asks for train as well.
You can store the train data in 11-5 as well if you like and then use both in 11-7 to run the backtest over the train and test periods. It's not difficult to adapt the code accordingly. Hope this helps.
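A minimal sketch of that adaptation, assuming the train and test predictions are DataFrames indexed by (symbol, date). The key names 'train/01' and 'test/01', the helper names, and the toy data are hypothetical and not taken from the notebooks; the two HDF helpers need PyTables and are only defined here, not called.

```python
import numpy as np
import pandas as pd


def make_toy_predictions(dates, symbols, seed):
    """Stand-in for real model output: random outcomes and predictions."""
    rng = np.random.default_rng(seed)
    idx = pd.MultiIndex.from_product([symbols, dates], names=['symbol', 'date'])
    return pd.DataFrame({'y_test': rng.normal(size=len(idx)),
                         'prediction': rng.normal(size=len(idx))}, index=idx)


def combine(train, test):
    """Stack train and test predictions and sort by symbol and date."""
    return pd.concat([train, test]).sort_index()


def save_predictions(path, train, test, lookahead=1):
    """Write both splits to one HDF5 file under hypothetical keys."""
    train.to_hdf(path, key=f'train/{lookahead:02}')
    test.to_hdf(path, key=f'test/{lookahead:02}')


def load_predictions(path, lookahead=1):
    """Read both splits back and combine them for the backtest."""
    train = pd.read_hdf(path, key=f'train/{lookahead:02}')
    test = pd.read_hdf(path, key=f'test/{lookahead:02}')
    return combine(train, test)


symbols = ['A', 'B']
train = make_toy_predictions(pd.date_range('2017-01-02', periods=5, freq='B'), symbols, seed=0)
test = make_toy_predictions(pd.date_range('2018-01-02', periods=3, freq='B'), symbols, seed=1)
combined = combine(train, test)  # covers both the train and the test period
```

In 11-5 this would mean calling something like save_predictions('predictions.h5', train, test) once both frames exist, and in 11-7 reading them back with load_predictions('predictions.h5').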
@Stefan
Thank you for your reply.
I also referenced the following.
Does it mean using the train data created in 11-5?
The train predictions are created during cross-validation and stored in the last line of the cell currently labeled 43 with pd.concat(predictions).to_hdf(cv_store, 'predictions/' + key).
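The pattern behind that line can be sketched roughly as follows: one predictions frame per cross-validation fold is appended to a list, and the list is concatenated before it is written. The fold layout, symbols, and column names below are made up for illustration; only the final pd.concat(predictions).to_hdf(...) call mirrors the notebook, and it is left as a comment because it needs PyTables and the notebook's cv_store and key variables.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
dates = pd.date_range('2017-01-02', periods=6, freq='B')

predictions = []  # one frame per cross-validation fold
for fold in range(3):
    fold_dates = dates[2 * fold: 2 * fold + 2]  # toy folds of two days each
    idx = pd.MultiIndex.from_product([['A', 'B'], fold_dates],
                                     names=['symbol', 'date'])
    predictions.append(pd.DataFrame({'y_test': rng.normal(size=len(idx)),
                                     'y_pred': rng.normal(size=len(idx)),
                                     'fold': fold}, index=idx))

cv_predictions = pd.concat(predictions)

# In the notebook the result is then stored, roughly as:
# cv_predictions.to_hdf(cv_store, key='predictions/' + key)
```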