I’m a beginner with RL and wanted to ask a silly question about the following aspect of the Trading Agent in the q learning notebook:
minibatch = map(np.array, zip(*sample(self.experience, self.batch_size)))
Since this minibatch is a random sample, it will produce batches in which the done array has interleaved entries, e.g.:
[0, 1, 1, 0, 1, …]
Since the done flag is used to compute the TD target:
td_target = rewards + done * self.gamma * target_q
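For concreteness, here is a minimal sketch of what I understand the sampling and target computation to be doing (only the `zip(*sample(...))` unpacking and the td_target line follow the notebook; the buffer contents and `target_q` values are my own toy stand-ins):

```python
import random
from collections import deque
import numpy as np

np.random.seed(0)
rng = random.Random(0)

# Toy replay buffer of (state, action, reward, next_state, done) tuples,
# where done acts as a continuation mask (0 = episode ended at this step).
experience = deque(maxlen=100)
for t in range(20):
    state = np.random.rand(4)
    next_state = np.random.rand(4)
    reward = np.random.rand()
    done = 0 if (t + 1) % 5 == 0 else 1  # episode ends every 5 steps
    experience.append((state, 0, reward, next_state, done))

batch_size = 8
gamma = 0.99

# Same unpacking pattern as the notebook line: a random, order-shuffled batch.
states, actions, rewards, next_states, done = map(
    np.array, zip(*rng.sample(list(experience), batch_size)))

# done is now interleaved, e.g. [0, 1, 1, 0, 1, ...] -- but each transition
# still carries its own next_state, so the targets remain elementwise consistent.
target_q = np.random.rand(batch_size)  # stand-in for max_a' Q_target(s', a')
td_target = rewards + done * gamma * target_q
print(td_target.shape)  # (8,)
```

Entries where done is 0 collapse to the bare reward, and the rest bootstrap from their own next state, regardless of the shuffled ordering.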
I’m trying to understand whether this will affect learning adversely, since the underlying problem is sequential in nature.