I know that this function is used to calculate the extrinsic reward. However, when PPO updates the network, the advantage appears to include only the intrinsic reward (advantages = rollouts.returns[:-1] - rollouts.value_preds[:-1]). How, then, does the extrinsic reward influence the policy network, and what does this function actually do?
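
To make the question concrete, here is a minimal sketch of how returns are commonly built from the rewards stored in a rollout buffer, and how the advantage in the expression above follows from them. It is an assumption about the usual pattern, not the actual code of this repository: compute_returns, gamma, and the sample numbers are all hypothetical, and plain lists stand in for the tensors in rollouts.

```python
def compute_returns(rewards, value_preds, gamma=0.99):
    """Discounted returns, bootstrapped from the last value prediction.

    Hypothetical helper: rewards has T entries, value_preds has T + 1
    (the final entry is the bootstrap value for the state after the rollout).
    """
    returns = [0.0] * (len(rewards) + 1)
    returns[-1] = value_preds[-1]
    # Walk backwards so each return accumulates the discounted future rewards.
    for t in reversed(range(len(rewards))):
        returns[t] = rewards[t] + gamma * returns[t + 1]
    return returns


# Whatever reward signal is written into this list (extrinsic, intrinsic,
# or a sum of both) is the one that ends up inside the returns.
rewards = [1.0, 0.0, 2.0]
value_preds = [0.5, 0.5, 0.5, 0.0]

returns = compute_returns(rewards, value_preds, gamma=0.5)

# Same shape as: advantages = rollouts.returns[:-1] - rollouts.value_preds[:-1]
advantages = [r - v for r, v in zip(returns[:-1], value_preds[:-1])]
```

Under this assumption, the advantage depends on a reward only if that reward was stored in the buffer before the returns were computed, so the thing to check is which reward the code writes into rollouts.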