Hi,
I am not sure, but I think there is an error in the example on page 681. The episode is BBCCCCBAT -> Fail, so we have S0=B, S1=B, ..., S8=T, and therefore rewards R1, R2, ..., R8. That gives G0 = R1 + g*R2 + ... + g^{7}*R8, with R8=+1. So are we off by one in time?
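
To make the indexing concrete, here is a small sketch of the count I have in mind. It assumes the convention that reward R_{t+1} follows state S_t; the value g = 0.9 and the zero intermediate rewards are only placeholders I made up for illustration, and R8 = +1 is taken from the reading above.

```python
def discounted_return(rewards, gamma):
    """Return G0 = R1 + gamma*R2 + ... + gamma^(T-1)*R_T for rewards = [R1, ..., R_T]."""
    return sum(gamma ** k * r for k, r in enumerate(rewards))

states = list("BBCCCCBAT")       # S0 .. S8, nine states in total
num_rewards = len(states) - 1    # one reward per transition: R1 .. R8

# Assumption for illustration only: every reward is 0 except the last, R8 = +1.
rewards = [0.0] * (num_rewards - 1) + [1.0]

gamma = 0.9                      # hypothetical discount factor, not from the book
g0 = discounted_return(rewards, gamma)

# Exponent applied to the last reward R8 in G0.
final_exponent = num_rewards - 1

print(num_rewards, final_exponent, g0)
```

Under these assumptions the last reward is discounted by g^{7}, not g^{8}, which is where the off-by-one question comes from.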