From 372065d390f3d889134c37b608c77cca6294332e Mon Sep 17 00:00:00 2001
From: James Bartlett
Date: Thu, 22 Sep 2016 11:09:54 -0700
Subject: [PATCH] First draft of intro paragraph on DDPG and DQNs

---
 docs/arxiv/empirical/main.tex | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docs/arxiv/empirical/main.tex b/docs/arxiv/empirical/main.tex
index a9b2a0b..87d2dbf 100644
--- a/docs/arxiv/empirical/main.tex
+++ b/docs/arxiv/empirical/main.tex
@@ -171,6 +171,7 @@
 \section{Introduction}
 \todo[inline]{Introduction to DDPG and recent advances in deep RL.}
+[INSERT OPENING SENTENCE HERE]
 The current state of the art in deep reinforcement learning is the Deep Deterministic Policy Gradient (DDPG) algorithm \cite{lillicrap2015ddpg}, which extends the deterministic policy gradient (DPG) algorithm \cite{silver2014dpg} to continuous, high-dimensional action spaces.
 DDPG is an actor-critic algorithm built on DPG: the critic $Q(s, a)$ is learned model-free, as in deep Q-network (DQN) learning \cite{mnih2013dqn}, while the actor $\mu(s)$ is updated by following the sampled deterministic policy gradient of \cite{silver2014dpg}.
 On many physical control problems, DDPG achieves performance comparable to that of planning-based solvers.
 \todo[inline]{Biological diffusion of dopamine in the brain $\implies$ error backpropagation is not biologically feasible.}
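
The actor-critic scheme described in the patched paragraph can be made concrete with the two standard DDPG update equations from the cited works (a sketch in the notation of Lillicrap et al.; the target networks $Q'$, $\mu'$ and parameter vectors $\theta^{Q}$, $\theta^{\mu}$ are assumed from that paper, not defined in this draft):

```latex
% DQN-style critic target for a sampled transition (s_i, a_i, r_i, s_{i+1}),
% using slowly-updated target networks Q' and mu':
y_i = r_i + \gamma \, Q'\!\bigl(s_{i+1}, \mu'(s_{i+1} \mid \theta^{\mu'}) \,\big|\, \theta^{Q'}\bigr)

% Sampled deterministic policy gradient for the actor, averaged over a
% minibatch of N states, per Silver et al. (2014):
\nabla_{\theta^{\mu}} J \approx
  \frac{1}{N} \sum_{i}
  \nabla_{a} Q(s, a \mid \theta^{Q}) \big|_{s = s_i,\, a = \mu(s_i)}
  \; \nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu}) \big|_{s = s_i}
```

The critic is regressed toward $y_i$ by minimizing the squared Bellman error, and the chain rule through $Q$ into $\mu$ gives the actor its gradient signal.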