To sync papers, run:
python scripts/sync_papers --sync_path {path_name}
- bold: important
- tag format: keywords- links (paper, article, note, code)
- Gaussian Process
Supervised,Regression- note
- Importance Sampling
Approximate- notes
- Information Theory: A Tutorial Introduction (2018. 2)
Shannon's Theory- arXiv
- Deep Learning (2015)
Review- nature, note
- Explaining and Harnessing Adversarial Examples (2014. 12)
FGSM (Fast Gradient Sign Method),Adversarial Training- arXiv
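The FGSM keyword above reduces to a one-line update: perturb the input by `eps` times the sign of the loss gradient. A minimal NumPy sketch (the `eps` value and the toy gradient are illustrative, not from the paper):

```python
import numpy as np

def fgsm(x, grad_wrt_x, eps=0.01):
    """Fast Gradient Sign Method: push the input in the sign direction
    of the loss gradient to craft an adversarial example."""
    return x + eps * np.sign(grad_wrt_x)

x = np.array([0.2, 0.5])
g = np.array([-1.3, 0.7])        # pretend gradient of the loss w.r.t. x
x_adv = fgsm(x, g, eps=0.1)      # -> [0.1, 0.6]
```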
- The Limitations of Deep Learning in Adversarial Settings (2015. 11)
JSMA (Jacobian-based Saliency Map Approach),Adversarial Training- arXiv
- Understanding Adversarial Training: Increasing Local Stability of Neural Nets through Robust Optimization (2015. 11)
Adversarial Training (generated adversarial examples),Proactive Defense- arXiv
- Practical Black-Box Attacks against Machine Learning (2016. 2)
Black-Box (No Access to Gradient),Synthetic Data Generation- arXiv
- Adversarial Patch (2017. 12)
Patch,White Box,Black Box- arXiv, the_morning_paper
- Machine Theory of Mind (2018. 2)
ToMnet,Meta-Learning,General Model,Agent- arXiv
- Building Machines That Learn and Think Like People (2016. 4)
Human-Like,Learn,Think- arXiv, note, the_morning_paper
- Network In Network (2013. 12)
- Fractional Max-Pooling (2014. 12)
- Deep Residual Learning for Image Recognition (2015. 12)
- Spherical CNNs (2018. 1)
Spherical Correlation,3D Model,Fast Fourier Transform (FFT)- arXiv, open_review
- Taskonomy: Disentangling Task Transfer Learning (2018. 4)
Taskonomy,Transfer Learning,Computational modeling of task relations- arXiv
- AutoAugment: Learning Augmentation Policies from Data (2018. 5)
Search Algorithm (RL),Sub-Policy- arXiv
- Exploring Randomly Wired Neural Networks for Image Recognition (2019. 4)
Randomly wired neural networks,Random Graph Models (ER, BA and WS)- arXiv
- MixMatch: A Holistic Approach to Semi-Supervised Learning (2019. 5)
MixMatch,Semi-Supervised,Augmentation -> Label Guessing -> Average -> Sharpening- arXiv
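The pipeline in the keywords ends with a sharpening step: average the model's guessed labels over augmented copies, then lower the distribution's temperature. A minimal NumPy sketch (the toy `guesses` and `T=0.5` are illustrative):

```python
import numpy as np

def sharpen(p, T=0.5):
    """MixMatch's sharpening step: raise the averaged guessed-label
    distribution to the power 1/T and renormalize."""
    p = p ** (1.0 / T)
    return p / p.sum()

# average predictions over augmented copies, then sharpen
guesses = np.array([[0.6, 0.4], [0.8, 0.2]])
q = sharpen(guesses.mean(axis=0), T=0.5)
```

With T < 1 the peak grows: the average [0.7, 0.3] becomes roughly [0.84, 0.16].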
- Snorkel: Rapid Training Data Creation with Weak Supervision (2017. 11)
Labelling Functions,Data Programming- arXiv, the_morning_paper
- Training classifiers with natural language explanations (2018. 5)
Babble Labble,Data Programming- arXiv, the_morning_paper
- Dropout (2012, 2014)
Regularizer,Ensemble- arXiv (2012), arXiv (2014), note
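The regularizer/ensemble keywords above correspond to randomly zeroing units at train time. A minimal NumPy sketch of the common "inverted" variant (the `keep_prob` value and fixed seed are illustrative):

```python
import numpy as np

def inverted_dropout(x, keep_prob=0.5, rng=None):
    """Inverted dropout: drop each unit with prob (1 - keep_prob) during
    training and rescale survivors by 1/keep_prob, so the expected
    activation is unchanged and no rescaling is needed at test time."""
    rng = rng or np.random.default_rng(0)
    mask = rng.uniform(size=x.shape) < keep_prob
    return x * mask / keep_prob

out = inverted_dropout(np.ones(100000), keep_prob=0.5)
```

The rescaling keeps `out.mean()` close to 1.0, matching the no-dropout expectation.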
- Regularization of Neural Networks using DropConnect (2013)
Regularizer,Ensemble- paper, note, wanli_summary
- Recurrent Neural Network Regularization (2014. 9)
RNN,Dropout to Non-Recurrent Connections- arXiv
- Batch Normalization (2015. 2)
- Training Very Deep Networks (2015. 7)
- A Theoretically Grounded Application of Dropout in Recurrent Neural Networks (2015. 12)
Variational RNN,Dropout - RNN,Bayesian interpretation- arXiv
- Deep Networks with Stochastic Depth (2016. 3)
- Adaptive Computation Time for Recurrent Neural Networks (2016. 3)
ACT,Dynamically,Logic Task- arXiv
- Layer Normalization (2016. 7)
- Recurrent Highway Networks (2016. 7)
- Using Fast Weights to Attend to the Recent Past (2016. 10)
- Professor Forcing: A New Algorithm for Training Recurrent Networks (2016. 10)
- Equality of Opportunity in Supervised Learning (2016. 10)
Equalized Odds,Demographic Parity,Bias- arXiv, the_morning_paper
- Categorical Reparameterization with Gumbel-Softmax (2016. 11)
Gumbel-Softmax distribution,Reparameterization,Smooth relaxation- arXiv, open_review
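The keywords above describe a smooth relaxation of categorical sampling: add Gumbel(0, 1) noise to the logits and apply a temperature-controlled softmax. A minimal NumPy sketch (the toy logits, `tau`, and fixed seed are illustrative):

```python
import numpy as np

def gumbel_softmax(logits, tau=1.0, rng=None):
    """Gumbel-Softmax: perturb logits with Gumbel(0, 1) noise, then take
    a softmax at temperature tau, yielding a differentiable (relaxed)
    sample from the underlying categorical distribution."""
    rng = rng or np.random.default_rng(0)
    g = -np.log(-np.log(rng.uniform(size=np.shape(logits))))
    y = (np.asarray(logits) + g) / tau
    y = np.exp(y - y.max())          # numerically stable softmax
    return y / y.sum()

sample = gumbel_softmax(np.log([0.1, 0.3, 0.6]), tau=0.5)
```

As `tau` approaches 0 the relaxed sample approaches a one-hot vector.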
- Understanding deep learning requires rethinking generalization (2016. 11)
Generalization Error,Role of Regularization- arXiv
- Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer (2017. 1)
- A simple neural network module for relational reasoning (2017. 6)
- On Calibration of Modern Neural Networks (2017. 6)
Confidence calibration,Maximum Calibration Error (MCE)- arXiv
- When is a Convolutional Filter Easy To Learn? (2017. 9)
Conv + ReLU,Non-Gaussian Case,Polynomial Time- arXiv, open_review
- mixup: Beyond Empirical Risk Minimization (2017. 10)
Data Augmentation,Vicinal Risk Minimization,Generalization- arXiv, open_review
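The vicinal-risk idea in the keywords above is simply training on convex combinations of example pairs. A minimal NumPy sketch (the `alpha` value, toy inputs, and fixed seed are illustrative):

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """mixup: form a convex combination of two examples and their
    one-hot labels, with mixing weight lam ~ Beta(alpha, alpha)."""
    rng = rng or np.random.default_rng(0)
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

x, y = mixup(np.array([0.0, 0.0]), np.array([1.0, 0.0]),
             np.array([1.0, 1.0]), np.array([0.0, 1.0]))
```

The mixed label is soft: its entries still sum to 1, encoding the mixing ratio.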
- Measuring the tendency of CNNs to Learn Surface Statistical Regularities (2017. 11)
not learn High Level Semantics,learn Surface Statistical Regularities- arXiv, the_morning_paper
- MentorNet: Regularizing Very Deep Neural Networks on Corrupted Labels (2017. 12)
MentorNet - StudentNet,Curriculum Learning,Output is Weight- arXiv
- Deep Learning Scaling is Predictable, Empirically (2017. 12)
Power-Law Exponents,Grow Training Sets- arXiv, the_morning_paper
- Sensitivity and Generalization in Neural Networks: an Empirical Study (2018. 2)
Robustness,Data Perturbations,Survey- arXiv, open_review
- Can recurrent neural networks warp time? (2018. 2)
RNN,Learnable Gate,Chrono Initialization- open_review
- Spectral Normalization for Generative Adversarial Networks (2018. 2)
GAN,Training Discriminator,Constrain Lipschitz,Power Method- open_review
- On the importance of single directions for generalization (2018. 3)
Importance,Confusing Neurons,Selective Neuron,DeepMind- arXiv, deepmind_blog
- Group Normalization (2018. 3)
Group Normalization (GN),Batch (BN),Layer (LN),Instance (IN),Independent Batch Size- arXiv
- Fast Decoding in Sequence Models using Discrete Latent Variables (2018. 3)
Autoregressive,Latent Transformer,Discretization- arXiv
- Delayed Impact of Fair Machine Learning (2018. 3)
Outcome Curve,Max Profit, Demographic Parity, Equal Opportunity- arXiv, the_morning_paper, bair_blog
- How Does Batch Normalization Help Optimization? (No, It Is Not About Internal Covariate Shift) (2018. 5)
Smoothing Effect,BatchNorm’s Reparametrization- arXiv
- When Recurrent Models Don't Need To Be Recurrent (2018. 5)
- Relational inductive biases, deep learning, and graph networks (2018. 6)
Survey,Relation,Graph- arXiv
- Universal Transformers (2018. 7)
Transformer,Weight Sharing,Adaptive Computation Time (ACT)- arXiv, google_ai_blog
- Identifying Generalization Properties in Neural Networks (2018. 9)
Generalization,PAC-Bayes,Hessian,Perturbation- arXiv, salesforce_blog
- No Multiplication? No Floating Point? No Problem! Training Networks for Efficient Inference (2018. 9)
Quantization,Store Multiplication Table,Memory/Power Resources- arXiv
- Distributed Representations of Words and Phrases and their Compositionality (2013. 10)
Word2Vec,CBOW,Skip-gram- arXiv
- GloVe: Global Vectors for Word Representation (2014)
Word2Vec,GloVe,Co-Occurrence- paper
- Convolutional Neural Networks for Sentence Classification (2014. 8)
- Neural Machine Translation by Jointly Learning to Align and Translate (2014. 9)
- Text Understanding from Scratch (2015. 2)
CNN,Character-level- arXiv
- Ask Me Anything: Dynamic Memory Networks for Natural Language Processing (2015. 6)
- Pointer Networks (2015. 6)
- Skip-Thought Vectors (2015. 6)
- A Neural Conversational Model (2015. 6)
Seq2Seq,Conversation- arXiv
- Teaching Machines to Read and Comprehend (2015. 6)
- Effective Approaches to Attention-based Neural Machine Translation (2015. 8)
- Character-Aware Neural Language Models (2015. 8)
CNN,Character-level- arXiv
- Neural Machine Translation of Rare Words with Subword Units (2015. 8)
- A Diversity-Promoting Objective Function for Neural Conversation Models (2015. 10)
- Multi-task Sequence to Sequence Learning (2015. 11)
- Multilingual Language Processing From Bytes (2015. 12)
- Strategies for Training Large Vocabulary Neural Language Models (2015. 12)
- Incorporating Structural Alignment Biases into an Attentional Neural Translation Model (2016. 1)
Seq2Seq,Attention with Structural Biases,Translation- arXiv
- Long Short-Term Memory-Networks for Machine Reading (2016. 1)
LSTMN,Intra-Attention,RNN- arXiv
- Recurrent Memory Networks for Language Modeling (2016. 1)
RMN,Memory Bank- arXiv
- Exploring the Limits of Language Modeling (2016. 2)
- Swivel: Improving Embeddings by Noticing What's Missing (2016. 2)
Word2Vec,Swivel,Co-Occurrence- arXiv
- Incorporating Copying Mechanism in Sequence-to-Sequence Learning (2016. 3)
- Achieving Open Vocabulary Neural Machine Translation with Hybrid Word-Character Models (2016. 4)
- Adversarial Training Methods for Semi-Supervised Text Classification (2016. 5)
- SQuAD: 100,000+ Questions for Machine Comprehension of Text (2016. 6)
- Sequence-Level Knowledge Distillation (2016. 6)
- Attention-over-Attention Neural Networks for Reading Comprehension (2016. 7)
- Recurrent Neural Machine Translation (2016. 7)
Translation,Attention (RNN)- arXiv
- An Actor-Critic Algorithm for Sequence Prediction (2016. 7)
- Pointer Sentinel Mixture Models (2016. 9)
- Multiplicative LSTM for sequence modelling (2016. 10)
mLSTM,Language Modeling,Character-Level- arXiv
- Diverse Beam Search: Decoding Diverse Solutions from Neural Sequence Models (2016. 10)
- Fully Character-Level Neural Machine Translation without Explicit Segmentation (2016. 10)
- Neural Machine Translation in Linear Time (2016. 10)
- Bidirectional Attention Flow for Machine Comprehension (2016. 11)
- Dynamic Coattention Networks For Question Answering (2016. 11)
QA,DCN,Coattention Encoder,Machine Comprehension- arXiv
- Dual Learning for Machine Translation (2016. 11)
- Neural Machine Translation with Reconstruction (2016. 11)
- Quasi-Recurrent Neural Networks (2016. 11)
- A recurrent neural network without chaos (2016. 12)
RNN,CFN,Dynamic,Chaos- arXiv
- Comparative Study of CNN and RNN for Natural Language Processing (2017. 2)
Systematic Comparison,CNN vs RNN- arXiv
- A Structured Self-attentive Sentence Embedding (2017. 3)
- Dynamic Word Embeddings for Evolving Semantic Discovery (2017. 3)
Word Embedding,Temporal,Alignment- arXiv, the_morning_paper
- Learning to Generate Reviews and Discovering Sentiment (2017. 4)
Sentiment,Unsupervised,OpenAI- arXiv
- Ask the Right Questions: Active Question Reformulation with Reinforcement Learning (2017. 5)
QA,Active Question Answering,RL,Agent (Reformulate, Aggregate)- arXiv, open_review
- Reinforced Mnemonic Reader for Machine Reading Comprehension (2017. 5)
QA,Mnemonic (Syntactic, Lexical),RL,Machine Comprehension- arXiv
- Attention Is All You Need (2017. 6)
- Depthwise Separable Convolutions for Neural Machine Translation (2017. 6)
SliceNet,Super-Separable Conv,Depthwise + Conv 1x1- arXiv, open_review, note
- MEMEN: Multi-layer Embedding with Memory Networks for Machine Comprehension (2017. 7)
MEMEN,QA(MC),Embedding(skip-gram),Full-Orientation Matching- arXiv
- On the State of the Art of Evaluation in Neural Language Models (2017. 7)
Standard LSTM,Regularisation,Hyperparameter- arXiv
- Text Summarization Techniques: A Brief Survey (2017. 7)
- Adversarial Examples for Evaluating Reading Comprehension Systems (2017. 7)
Concatenative Adversaries (AddSent, AddOneSent),SQuAD- arXiv
- Learned in Translation: Contextualized Word Vectors (2017. 8)
Word Embedding,CoVe,Context Vector- arXiv
- Simple and Effective Multi-Paragraph Reading Comprehension (2017. 10)
- Unsupervised Neural Machine Translation (2017. 10)
Train in both directions (tandem),Shared Encoder,Denoising Auto-Encoder- arXiv, open_review
- Word Translation Without Parallel Data (2017. 10)
Unsupervised,Multilingual Embedding,Parallel Dictionary Induction- arXiv, open_review
- Unsupervised Machine Translation Using Monolingual Corpora Only (2017. 11)
Unsupervised,Adversarial,Monolingual Corpora- arXiv, open_review
- Neural Text Generation: A Practical Guide (2017. 11)
- Breaking the Softmax Bottleneck: A High-Rank RNN Language Model (2017. 11)
MoS (Mixture of Softmaxes),Softmax Bottleneck- arXiv
- Neural Speed Reading via Skim-RNN (2017. 11)
Skim-RNN,Speed Reading,Big(Read)-Small(Skim),Dynamic- arXiv, open_review
- Generalization without systematicity: On the compositional skills of sequence-to-sequence recurrent networks (2017. 11)
SCAN,Compositional,Mix-and-Match- arXiv
- The NarrativeQA Reading Comprehension Challenge (2017. 12)
- Hierarchical Text Generation and Planning for Strategic Dialogue (2017. 12)
End2End Strategic Dialogue,Latent Sentence Representations,Planning + RL- arXiv
- Recent Advances in Recurrent Neural Networks (2018. 1)
RNN,Recent Advances,Review- arXiv
- Personalizing Dialogue Agents: I have a dog, do you have pets too? (2018. 1)
Chit-chat,Profile Memory,Persona-Chat Dataset,ParlAI- arXiv
- Generating Wikipedia by Summarizing Long Sequences (2018. 1)
Multi-Document Summarization,Extractive-Abstractive Stage,T-DMCA,WikiSum,Google Brain- arXiv, note, open_review
- MaskGAN: Better Text Generation via Filling in the______ (2018. 1)
MaskGAN,Neural Text Generation,RL Approach- arXiv, open_review, note
- Beyond Word Importance: Contextual Decomposition to Extract Interactions from LSTMs (2018. 1)
Contextual Decomposition (CD),Disambiguate interactions between Gates- arXiv, open_review
- Universal Language Model Fine-tuning for Text Classification (2018. 1)
ULMFiT,Pre-trained,Transfer Learning- arXiv
- DeepType: Multilingual Entity Linking by Neural Type System Evolution (2018. 2)
DeepType,Symbolic Information,Type System,OpenAI- arXiv, openai_blog
- Deep contextualized word representations (2018. 2)
- Ranking Sentences for Extractive Summarization with Reinforcement Learning (2018. 2)
Document-Summarization,Cross-Entropy vs RL,Extractive- arXiv
- code2vec: Learning Distributed Representations of Code (2018. 3)
code2vec,Code Embedding,Predicting method name- arXiv
- Universal Sentence Encoder (2018. 3)
Transformer,Deep Averaging Network (DAN),Transfer- arXiv
- An efficient framework for learning sentence representations (2018. 3)
Sentence Representation,True Context,Unsupervised- arXiv, open_review
- An Analysis of Neural Language Modeling at Multiple Scales (2018. 3)
LSTM vs QRNN,Hyperparameter,AWD-QRNN- arXiv
- Analyzing Uncertainty in Neural Machine Translation (2018. 3)
Uncertainty,Beam Search Degradation,Copy Mode- arXiv
- An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling (2018. 3)
Temporal Convolutional Network (TCN),CNN vs RNN- arXiv
- Training Tips for the Transformer Model (2018. 4)
Transformer,Hyperparameter,Multiple GPU- arXiv
- QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension (2018. 4)
QA,Conv - Self-Attention,Backtranslation (Data Augmentation)- arXiv, open_review, note
- SimpleQuestions Nearly Solved: A New Upperbound and Baseline Approach (2018. 4)
Top-K Subject Recognition,Relation Classification- arXiv
- Delete, Retrieve, Generate: A Simple Approach to Sentiment and Style Transfer (2018. 4)
Sentiment Transfer,Disentangle Attribute,Unsupervised- arXiv
- Parsing Tweets into Universal Dependencies (2018. 4)
Universal Dependencies (UD),TWEEBANK v2- arXiv
- Subword Regularization: Improving Neural Network Translation Models with Multiple Subword Candidates (2018. 4)
SR,Subword Sampling + Hyperparameter,Segmentation (BPE, Unigram)- arXiv
- Phrase-Indexed Question Answering: A New Challenge for Scalable Document Comprehension (2018. 4)
PI-SQuAD,Challenge,Document Encoder,Scalability- arXiv
- GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding (2018. 4)
GLUE,Benchmark,Understanding- arXiv, leaderboard
- On the Practical Computational Power of Finite Precision RNNs for Language Recognition (2018. 5)
Unbounded counting,IBFP-LSTM- arXiv
- Paper Abstract Writing through Editing Mechanism (2018. 5)
Writing-editing Network,Attentive Revision Gate- arXiv
- A robust self-learning method for fully unsupervised cross-lingual mappings of word embeddings (2018. 5)
Unsupervised initialization scheme,Robust self-learning- arXiv
- Efficient and Robust Question Answering from Minimal Context over Documents (2018. 5)
- Global-Locally Self-Attentive Dialogue State Tracker (2018. 5)
GLAD,WoZ and DSTC2 Dataset- arXiv
- Learning to Ask Good Questions: Ranking Clarification Questions using Neural Expected Value of Perfect Information (2018. 5)
Dataset,EVPI,ACL 2018 Best Paper- arXiv
- Know What You Don't Know: Unanswerable Questions for SQuAD (2018. 6)
SQuAD 2.0,Negative Example,ACL 2018 Best Paper- arXiv, leaderboard
- The Natural Language Decathlon: Multitask Learning as Question Answering (2018. 6)
decaNLP,Multitask Question Answering Network (MQAN),Transfer Learning- arXiv
- GLoMo: Unsupervisedly Learned Relational Graphs as Transferable Representations (2018. 6)
Transfer Learning Framework,Structured Graphical Representations- arXiv
- Improving Language Understanding by Generative Pre-Training (2018. 6)
Transformer,Generative Pre-Training,Discriminative Fine-Tuning- paper, openai_blog
- Finding Syntax in Human Encephalography with Beam Search (2018. 6)
RNNG+beam search,ACL 2018 Best Paper- arXiv
- Let's do it "again": A First Computational Approach to Detecting Adverbial Presupposition Triggers (2018. 6)
Task,Dataset,Weighted-Pooling (WP),ACL 2018 Best Paper- arXiv
- QuAC : Question Answering in Context (2018. 8)
Information-Seeking dialog,Challenge,Without Evidence- arXiv, leaderboard
- CoQA: A Conversational Question Answering Challenge (2018. 8)
Abstractive with Extractive Rationale,Challenge,Coreference and Pragmatic Reasoning- arXiv, leaderboard
- Contextual Parameter Generation for Universal Neural Machine Translation (2018. 8)
Parameter Generation,Language Embedding,EMNLP 2018- arXiv
- Evaluating Theory of Mind in Question Answering (2018. 8)
Dataset,Higher-order Beliefs,EMNLP 2018- arXiv
- Open Domain Question Answering Using Early Fusion of Knowledge Bases and Text (2018. 9)
GRAFT-Net,KB+Text Fusion,EMNLP 2018- arXiv
- HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering (2018. 9)
Dataset,Multi-hop,Sentence-level Supporting Fact,EMNLP 2018- arXiv
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (2018. 10)
BERT,Discriminative,Pre-trained,Transfer Learning,NAACL 2019 Best- arXiv
- Trellis Networks for Sequence Modeling (2018. 10)
TrellisNet,Structural bridge between TCN and RNN,NAACL 2019- arXiv
- CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge (2018. 11)
CommonsenseQA,Dataset,Multiple-Choice,NAACL 2019 Best- arXiv
- Cross-lingual Language Model Pretraining (2019. 1)
XLM,MLM + TLM,Cross-lingual Pre-trained,Low-Resource- arXiv
- Better Language Models and Their Implications (2019. 2)
- Parameter-Efficient Transfer Learning for NLP (2019. 2)
Adapter tuning,Bottleneck,BERT- arXiv
- To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks (2019. 3)
Fine-tuning vs Feature,BERT and ELMo,Empirically analyze- arXiv
- Linguistic Knowledge and Transferability of Contextual Representations (2019. 3)
Analysis CWRs,LSTM, Transformer,Transferable,NAACL 2019- arXiv
- ERNIE: Enhanced Representation through Knowledge Integration (2019. 4)
ERNIE,Masking Strategies,Dialog Language Model,Pre-trained,Transfer Learning- arXiv
- CNM: An Interpretable Complex-valued Network for Matching (2019. 4)
CNM,Quantum Physics,Interpretable,NAACL 2019 Best- arXiv
- Unsupervised Recurrent Neural Network Grammars (2019. 4)
RNNG,Syntax Tree,Variational Inference- arXiv
- The Curious Case of Neural Text Degeneration (2019. 4)
Nucleus Sampling,Decoding Method,Generation- arXiv
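Nucleus (top-p) sampling, named in the keywords above, truncates the distribution to the smallest high-probability prefix before sampling. A minimal NumPy sketch (the toy distribution and `p=0.9` are illustrative):

```python
import numpy as np

def top_p_filter(probs, p=0.9):
    """Nucleus (top-p) sampling: keep the smallest set of tokens, sorted
    by probability, whose cumulative mass reaches p; renormalize and
    sample only from that set (the rest are zeroed out)."""
    order = np.argsort(probs)[::-1]
    k = np.searchsorted(np.cumsum(probs[order]), p) + 1
    filtered = np.zeros_like(probs)
    filtered[order[:k]] = probs[order[:k]]
    return filtered / filtered.sum()

filtered = top_p_filter(np.array([0.5, 0.3, 0.15, 0.05]), p=0.9)
```

Here the 0.05 tail token is dropped and the remaining mass is renormalized, which is what suppresses the degenerate low-probability continuations the paper studies.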
- Unified Language Model Pre-training for Natural Language Understanding and Generation (2019. 5)
UniLM,Uni + Bi + S2S,Generation- arXiv
- SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems (2019. 5)
SuperGLUE,Benchmark,Understanding- arXiv, leaderboard
- SpanBERT: Improving Pre-training by Representing and Predicting Spans (2019. 7)
SpanBERT,Span Boundary Objective (SBO),Pre-train,Transformer- arXiv
- RoBERTa: A Robustly Optimized BERT Pretraining Approach (2019. 7)
RoBERTa,Data-BatchSize,Pre-train,Transformer- arXiv
- ERNIE 2.0: A Continual Pre-training Framework for Language Understanding (2019. 7)
ERNIE,Continual Pre-training,Word-Struct-Semantic,Transformer- arXiv
- StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding (2019. 8)
StructBERT (ALICE),Language Structure,Pre-train,Transformer- arXiv
- Matching Networks for One Shot Learning (2016. 6)
Matching Nets,Non-Parametric,DeepMind- arXiv, the_morning_paper
- Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks (2017. 3)
- SMASH: One-Shot Model Architecture Search through HyperNetworks (2017. 8)
SMASH,HyperNet,Prior Knowledge- arXiv, open_review
- Reptile: a Scalable Metalearning Algorithm (2018. 3)
Reptile,Meta-Learning,Few-Shot,OpenAI- arXiv, openai_blog
- Understanding the difficulty of training deep feedforward neural networks (2010)
- On the difficulty of training Recurrent Neural Networks (2012. 11)
Gradient Clipping,RNN- arXiv
- Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification (2015. 2)
- A Simple Way to Initialize Recurrent Networks of Rectified Linear Units (2015. 4)
Weight Initialization,RNN,Identity Matrix- arXiv
- Cyclical Learning Rates for Training Neural Networks (2015. 6)
CLR,Triangular,ExpRange,Long-term Benefit- arXiv
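The triangular policy named in the keywords above ramps the learning rate linearly between two bounds each cycle. A minimal sketch following the paper's formula (the `base_lr`, `max_lr`, and `step_size` values are illustrative):

```python
def triangular_clr(it, base_lr=1e-4, max_lr=1e-2, step_size=2000):
    """Triangular cyclical learning rate: the LR climbs linearly from
    base_lr to max_lr over step_size iterations, then descends back,
    repeating every 2 * step_size iterations."""
    cycle = it // (2 * step_size)
    x = abs(it / step_size - 2 * cycle - 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)
```

At iteration 0 the LR sits at `base_lr`, peaks at `max_lr` after `step_size` steps, and returns to `base_lr` at the end of each cycle.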
- On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima (2016. 9)
Generalization,Sharpness of Minima- arXiv
- Neural Optimizer Search with Reinforcement Learning (2017. 9)
Neural Optimizer Search (NOS),PowerSign,AddSign- arXiv
- On the Convergence of Adam and Beyond (2018. 2)
AMSGrad,Convex optimization- open_review
- Adafactor: Adaptive Learning Rates with Sublinear Memory Cost (2018. 4)
Adafactor,Adaptive Method,Update Clipping- arXiv
- Revisiting Small Batch Training for Deep Neural Networks (2018. 4)
Generalization Performance,Training Stability- arXiv
- Reconciling modern machine learning and the bias-variance trade-off (2018. 12)
Double Descent Risk Curve,Highly Complex Models- arXiv
- Progressive Neural Networks (2016. 6)
ProgNN,Incorporate Prior Knowledge- arXiv, the_morning_paper
- Neural Architecture Search with Reinforcement Learning (2016. 11)
NAS,Google AutoML,Google Brain- arXiv
- Third-Person Imitation Learning (2017. 3)
Imitation Learning,Unsupervised (Third-Person),GAN + Domain Confusion- arXiv
- Noisy Networks for Exploration (2017. 6)
- Efficient Neural Architecture Search via Parameter Sharing (2018. 2)
ENAS,Google AutoML,Google Brain- arXiv
- Learning by Playing - Solving Sparse Reward Tasks from Scratch (2018. 2)
- Investigating Human Priors for Playing Video Games (2018. 2)
prior knowledge,key factor- open_review
- World Models (2018. 3)
Generative + RL,VAE (V),MDN-RNN (M),Controller (C)- arXiv
- Unsupervised Predictive Memory in a Goal-Directed Agent (2018. 3)
MERLIN,Memory + RL + Inference,Partial Observability- arXiv, google_ai_blog
- ...
- Auto-Encoding Variational Bayes (2013. 12)
- Generative Adversarial Networks (2014. 6)
- Deep Variational Bayes Filters: Unsupervised Learning of State Space Models from Raw Data (2016. 5)
DVBF,Variational Inference,SGVB- arXiv, open_review
- SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient (2016. 9)
- Structured Inference Networks for Nonlinear State Space Models (2016. 9)
Structured Variational Approximation,SGVB- arXiv
- beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework (2016. 11)
Beta-VAE,Disentangled- open_review
- A Disentangled Recognition and Nonlinear Dynamics Model for Unsupervised Learning (2017. 10)
Kalman VAE,LGSSM- arXiv
- Self-Attention Generative Adversarial Networks (2018. 5)
SAGAN,Attention-Driven,Spectral Normalization- arXiv
- Unsupervised Data Augmentation (2019. 4)
UDA,TSA Schedule,Semi-Supervised- arXiv