Chapter 22: Q-learning for trading

I get an error when running this chunk of code:

trading_environment = gym.make('trading-v0', 
                               ticker='AAPL',
                               trading_days=trading_days,
                               trading_cost_bps=trading_cost_bps,
                               time_cost_bps=time_cost_bps)
trading_environment.seed(42)

In particular, the error I am getting is:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[86], line 1
----> 1 trading_environment = gym.make('trading-v0', 
      2                                ticker='AAPL',
      3                                trading_days=trading_days,
      4                                trading_cost_bps=trading_cost_bps,
      5                                time_cost_bps=time_cost_bps)
      6 trading_environment.seed(42)

File ~/anaconda3/lib/python3.11/site-packages/gym/envs/registration.py:640, in make(id, max_episode_steps, autoreset, apply_api_compatibility, disable_env_checker, **kwargs)
    637     render_mode = None
    639 try:
--> 640     env = env_creator(**_kwargs)
    641 except TypeError as e:
    642     if (
    643         str(e).find("got an unexpected keyword argument 'render_mode'") >= 0
    644         and apply_human_rendering
    645     ):

File ~/Library/CloudStorage/OneDrive-UniversidaddeLaRioja/CEMFI/Second_year/material_master_thesis/CODE/Machine-Learning-for-Algorithmic-Trading-Second-Edition-master/22_deep_reinforcement_learning/trading_env.py:242, in TradingEnvironment.__init__(self, trading_days, trading_cost_bps, time_cost_bps, ticker)
    238 self.simulator = TradingSimulator(steps=self.trading_days,
    239                                   trading_cost_bps=self.trading_cost_bps,
    240                                   time_cost_bps=self.time_cost_bps)
    241 self.action_space = spaces.Discrete(3)
...
    101     )
    103 # Capture the boundedness information before replacing np.inf with get_inf
    104 _low = np.full(shape, low, dtype=float) if is_float_integer(low) else low

ValueError: Box shape is inferred from low and high, expect their types to be np.ndarray, an integer or a float, actual type low: <class 'pandas.core.series.Series'>, high: <class 'pandas.core.series.Series'>

To solve it, I tried modifying the TradingEnvironment class in trading_env.py. In particular, I replaced this line of code:

self.observation_space = spaces.Box(self.data_source.min_values,self.data_source.max_values)

with this one:

self.observation_space = spaces.Box(np.array(self.data_source.min_values), np.array(self.data_source.max_values), dtype=np.float32)

But the error still persists.

I had a similar issue. I changed the line to use
self.data_source.min_values.to_numpy()
instead of np.array(self.data_source.min_values).

I was able to move on after that.
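
For anyone comparing the two conversions, here is a minimal sketch of what each one produces. The Series below are hypothetical stand-ins for data_source.min_values and data_source.max_values (the feature names and numbers are made up), and the final spaces.Box line is left as a comment so the snippet runs without gym installed.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for data_source.min_values / max_values,
# which the traceback reports as pandas Series.
min_values = pd.Series({"returns": -0.5, "ret_2": -0.6, "rsi": 0.0})
max_values = pd.Series({"returns": 0.5, "ret_2": 0.7, "rsi": 100.0})

# Both conversions yield a plain np.ndarray, one of the types the
# Box error message says it expects for low and high.
low_a = np.array(min_values, dtype=np.float32)
low_b = min_values.to_numpy(dtype=np.float32)
high = max_values.to_numpy(dtype=np.float32)

print(type(low_a).__name__, type(low_b).__name__)
print(np.array_equal(low_a, low_b))
print(low_b.shape == high.shape, bool(np.all(low_b <= high)))

# In trading_env.py the line would then read (not executed here):
# self.observation_space = spaces.Box(low_b, high, dtype=np.float32)
```

Since both conversions give the same array, if the error persists after the edit, one thing worth checking (a guess, not something confirmed in this thread) is whether the notebook is still using the previously imported trading_env module. Restarting the kernel and re-running the registration cell would rule that out.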
I reported an issue on GitHub this morning, including my modified notebook.

I'm running on Google Colab, which brings other problems (version issues).

I consulted ChatGPT 4 and have been able to close the issue reported against 04_q_learning_for_trading in Chapter 22.
I’ve updated the GitHub issue accordingly.
I’m currently training the model for 1000 episodes using AAPL, as the chapter indicates. I don’t know why there was a size mismatch initially, and it’s disappointing to have to essentially increase the size of targets every episode (it can’t be an economical thing to do).
At some point I may investigate further, but I’m happy it’s running for now.
Currently on episode 160 after 57 minutes (about 21 seconds per episode). So roughly 6 hours to run all 1000 episodes, assuming the per-episode time stays constant? I need to save the model periodically so that training can be picked up on another day…
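
One way to make a long run resumable is to write a checkpoint every few episodes and reload it at startup. The following is only a sketch under assumptions: save_checkpoint, load_checkpoint, and train are hypothetical helpers (not part of the chapter's code), the per-episode work is a placeholder, and the state is a plain dict pickled with the standard library rather than the actual agent.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, state):
    """Write state atomically: dump to a temp file, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the saved state, or None if no checkpoint exists yet."""
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return pickle.load(f)


def train(path, max_episodes, save_every=10, stop_after=None):
    """Hypothetical training loop that resumes from the last checkpoint.

    stop_after simulates an interruption (e.g. closing the notebook).
    """
    state = load_checkpoint(path) or {"episode": 0, "rewards": []}
    for episode in range(state["episode"] + 1, max_episodes + 1):
        if stop_after is not None and episode > stop_after:
            break
        reward = float(episode)  # placeholder for running one real episode
        state["rewards"].append(reward)
        state["episode"] = episode
        # Save every save_every episodes and at the very end.
        if episode % save_every == 0 or episode == max_episodes:
            save_checkpoint(path, state)
    return state


if __name__ == "__main__":
    ckpt = os.path.join(tempfile.mkdtemp(), "ckpt.pkl")
    train(ckpt, max_episodes=1000, stop_after=160)  # interrupted run
    resumed = train(ckpt, max_episodes=1000)        # picks up at episode 161
    print(resumed["episode"], len(resumed["rewards"]))
```

For the network itself, Keras models provide save_weights and load_weights, which could be called at the same points. How that fits the chapter's agent class is not shown here, and the replay memory and exploration rate would also need saving for a faithful resume.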

Thanks, keep me updated if you figure out anything else!