Optimism in Reinforcement Learning and Kullback-Leibler Divergence