I think the spread is being calculated incorrectly in Chapter 9

I’m looking at this function in Chapter 9, Notebook 6, and the logic for calculating the hedge ratio is unlike anything I’ve seen. Typically, in a pairs-based strategy you estimate a hedge ratio using linear regression, minimum variance, VAR, PCA, dollar value, or beta, and use it to build the spread. In all of these approaches the hedge ratio is determined once and held fixed for the duration of a period; you then look for deviations of the spread from its mean, measured in standard deviations, to create long/short opportunities. In the code below, the hedge ratio changes dynamically each day based on the prices of x and y. I’m not familiar with this approach and I’m wondering where it is derived from. @Stefan, is this something you created and found to be successful? Or is it an error in the calculation?
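
To make clear what I mean by the conventional approach, here is a minimal sketch of a fixed hedge ratio estimated once by ordinary least squares, with the spread then expressed as a z-score. The function name `static_spread_zscore` and the synthetic price series are my own illustrative assumptions; none of this comes from the notebook.

```python
import numpy as np
import pandas as pd


def static_spread_zscore(y, x):
    """Fit y = beta * x + alpha once, then z-score the spread y - beta * x."""
    # np.polyfit with deg=1 returns the slope first, then the intercept
    beta, alpha = np.polyfit(x.values, y.values, deg=1)
    spread = y - beta * x
    z_score = (spread - spread.mean()) / spread.std()
    return beta, spread, z_score


# Synthetic pair (assumption): x is a random walk, y tracks 2 * x plus noise
rng = np.random.default_rng(0)
x = pd.Series(100 + rng.normal(size=500).cumsum())
y = 2.0 * x + rng.normal(scale=0.5, size=500)

beta, spread, z_score = static_spread_zscore(y, x)
print(f'hedge ratio: {beta:.3f}')
```

With a single `beta` for the whole period, entries and exits would then be triggered when `z_score` crosses some threshold, for example plus or minus two standard deviations.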

It would seem more natural to me to take the initial value of the hedge ratio and use it to calculate the spread.

            dynamic_hedge_ratios = KFHedgeRatio(x=KFSmoother(prices.loc[t:test_end, x]),
                                                y=KFSmoother(prices.loc[t:test_end, y]))[:, 0]
            
            fixed_hedge_ratio = dynamic_hedge_ratios[0]

Or perhaps use the average of the hedge ratios over some previous period.

            dynamic_hedge_ratios = KFHedgeRatio(x=KFSmoother(prices.loc[t:test_end, x]),
                                                y=KFSmoother(prices.loc[t:test_end, y]))[:, 0]
            
            average_hedge_ratio = np.mean(dynamic_hedge_ratios)
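
To compare the alternatives side by side, here is a rough sketch that builds the spread three ways from the same array of hedge ratios. It follows the sign convention used in `get_spread` (spread = y + hedge_ratio * x). The helper `compare_spreads`, the `n_formation` parameter, and the synthetic inputs are hypothetical stand-ins; in practice `dynamic_hedge_ratios` would come from `KFHedgeRatio`.

```python
import numpy as np
import pandas as pd


def compare_spreads(y, x, dynamic_hedge_ratios, n_formation):
    """Build the spread with a dynamic, an initial-value, and an averaged hedge ratio."""
    hr = np.asarray(dynamic_hedge_ratios)
    fixed_hedge_ratio = hr[0]                      # initial value only
    average_hedge_ratio = hr[:n_formation].mean()  # mean over the formation window
    return pd.DataFrame({
        'dynamic': y + x * hr,
        'fixed': y + x * fixed_hedge_ratio,
        'average': y + x * average_hedge_ratio,
    })


# Synthetic stand-ins (assumption): a drifting hedge ratio around -2
rng = np.random.default_rng(42)
idx = pd.bdate_range('2020-01-01', periods=300)
x = pd.Series(100 + rng.normal(size=300).cumsum(), index=idx)
y = 2.0 * x + rng.normal(scale=0.5, size=300)
hedge_ratios = -2.0 + np.linspace(-0.05, 0.05, 300)

spreads = compare_spreads(y, x, hedge_ratios, n_formation=250)
print(spreads.std())
```

Comparing the standard deviation of each column gives a first impression of how much the choice of hedge ratio changes the spread that the z-score is later computed from.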

Here is the original function I’m referencing.

def get_spread(candidates, prices):
    pairs = []
    half_lives = []

    periods = pd.DatetimeIndex(sorted(candidates.test_end.unique()))
    start = time()
    for p, test_end in enumerate(periods, 1):
        start_iteration = time()

        period_candidates = candidates.loc[candidates.test_end == test_end, ['y', 'x']]
        trading_start = test_end + pd.DateOffset(days=1)
        t = trading_start - pd.DateOffset(years=2)
        T = trading_start + pd.DateOffset(months=6) - pd.DateOffset(days=1)
        max_window = len(prices.loc[t: test_end].index)
        print(test_end.date(), len(period_candidates))
        for i, (y, x) in enumerate(zip(period_candidates.y, period_candidates.x), 1):
            if i % 1000 == 0:
                msg = f'{i:5.0f} | {time() - start_iteration:7.1f} | {time() - start:10.1f}'
                print(msg)
            pair = prices.loc[t: T, [y, x]]
            pair['hedge_ratio'] = KFHedgeRatio(y=KFSmoother(prices.loc[t: T, y]),
                                               x=KFSmoother(prices.loc[t: T, x]))[:, 0]
            pair['spread'] = pair[y].add(pair[x].mul(pair.hedge_ratio))
            half_life = estimate_half_life(pair.spread.loc[t: test_end])                

            spread = pair.spread.rolling(window=min(2 * half_life, max_window))
            pair['z_score'] = pair.spread.sub(spread.mean()).div(spread.std())
            pairs.append(pair.loc[trading_start: T].assign(s1=y, s2=x, period=p, pair=i).drop([x, y], axis=1))

            half_lives.append([test_end, y, x, half_life])
    return pairs, half_lives