diff --git a/docs/arxiv/empirical/main.tex b/docs/arxiv/empirical/main.tex
index a9b2a0b..87d2dbf 100644
--- a/docs/arxiv/empirical/main.tex
+++ b/docs/arxiv/empirical/main.tex
@@ -171,6 +171,7 @@
 \section{Introduction}
 \todo[inline]{Introduction to DDPG and recent advances in deep RL.}
+[INSERT OPENING SENTENCE HERE]
 The current state-of-the-art in deep reinforcement learning is the Deep Deterministic Policy Gradient (DDPG) algorithm [\cite{lillicrap2015ddpg}], which extended the deterministic policy gradient (DPG) algorithm [\cite{silver2014dpg}] to continuous, high-dimensional action spaces with considerable success.
 The basic idea of DDPG is an actor-critic architecture built on the DPG algorithm, in which the critic $Q(s, a)$ is learned as in deep Q-network (DQN) learning [\cite{mnih2013dqn}], a model-free learning regime, and the actor $\mu(s)$ is updated by sampling the policy gradient of [\cite{silver2014dpg}].
 On many physical control problems, this algorithm achieved performance comparable to planning-based solvers.
 \todo[inline]{Biological diffusion of dopamine in the brain $\implies$ error backpropagation is not biologically feasible.}
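
The actor update the added paragraph refers to is the deterministic policy gradient of Silver et al.; as a sketch (using the paper's $Q(s,a)$ and $\mu(s)$, with $\theta$ denoting the actor's parameters and $\rho$ a state-visitation distribution — both symbols are assumptions, since the diff does not name them):

```latex
% Deterministic policy gradient (Silver et al., 2014), as used by DDPG:
% the actor mu_theta is improved by ascending the sampled gradient of Q
% with respect to the actor parameters theta.
\nabla_{\theta} J(\mu_{\theta})
  \approx \mathbb{E}_{s \sim \rho}\!\left[
    \nabla_{a} Q(s, a)\big|_{a = \mu_{\theta}(s)}
    \, \nabla_{\theta} \mu_{\theta}(s)
  \right]
```

In DDPG this expectation is estimated over minibatches drawn from a replay buffer, with $Q$ given by the learned critic rather than the true action-value function.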