From 372065d390f3d889134c37b608c77cca6294332e Mon Sep 17 00:00:00 2001
From: James Bartlett
Date: Thu, 22 Sep 2016 11:09:54 -0700
Subject: [PATCH] First draft of intro paragraph on DDPG and DQNs

---
 docs/arxiv/empirical/main.tex | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docs/arxiv/empirical/main.tex b/docs/arxiv/empirical/main.tex
index a9b2a0b..87d2dbf 100644
--- a/docs/arxiv/empirical/main.tex
+++ b/docs/arxiv/empirical/main.tex
@@ -171,6 +171,7 @@
 \section{Introduction}
 \todo[inline]{Introduction to DDPG and recent advances in deep RL.}
+[INSERT OPENING SENTENCE HERE]
 The current state of the art in deep reinforcement learning is the Deep Deterministic Policy Gradient (DDPG) algorithm \cite{lillicrap2015ddpg}, which extends the deterministic policy gradient (DPG) algorithm \cite{silver2014dpg} to continuous, high-dimensional action spaces.
 DDPG is an actor-critic algorithm built on DPG: the critic $Q(s, a)$ is learned model-free, as in deep Q-network (DQN) learning \cite{mnih2013dqn}, while the actor $\mu(s)$ is updated by following the sampled deterministic policy gradient of \cite{silver2014dpg}.
 On many physical control problems, DDPG achieves performance comparable to that of planning-based solvers.
 \todo[inline]{Biological diffusion of dopamine in the brain $\implies$ error backpropagation is not biologically feasible.}
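
The actor-critic scheme described in the patched paragraph can be made concrete with the two standard DDPG update equations from the cited works (a sketch in the notation of Lillicrap et al.; the target networks $Q'$, $\mu'$ and parameter vectors $\theta^{Q}$, $\theta^{\mu}$ are assumed from that paper, not defined in this draft):

```latex
% DQN-style critic target for a sampled transition (s_i, a_i, r_i, s_{i+1}),
% using slowly-updated target networks Q' and mu':
y_i = r_i + \gamma \, Q'\!\bigl(s_{i+1}, \mu'(s_{i+1} \mid \theta^{\mu'}) \,\big|\, \theta^{Q'}\bigr)

% Sampled deterministic policy gradient for the actor, averaged over a
% minibatch of N states, per Silver et al. (2014):
\nabla_{\theta^{\mu}} J \approx
  \frac{1}{N} \sum_{i}
  \nabla_{a} Q(s, a \mid \theta^{Q}) \big|_{s = s_i,\, a = \mu(s_i)}
  \; \nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu}) \big|_{s = s_i}
```

The critic is regressed toward $y_i$ by minimizing the squared Bellman error, and the chain rule through $Q$ into $\mu$ gives the actor its gradient signal.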