jinmang2/Awesome-Papers

Awesome-Papers

❓ Objective of jinmang2/Awesome-Papers Repo.

💡 To become an AI researcher, an artist, and a good person!

2021 Papers to Read

  • Learning to Learn without Gradient Descent by Gradient Descent
  • Massively Multitask Networks for Drug Discovery
  • One-Shot Imitation Learning
  • Few-Shot Autoregressive Density Estimation: Towards Learning to Learn Distributions
  • Meta-Learning for Low-Resource Neural Machine Translation
  • Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics
  • SYNTHESIZER: Rethinking Self-Attention in Transformer Models
  • Fine-tune BERT for Extractive Summarization
  • ALBERT: A Lite BERT for Self-Supervised Learning of Language Representations
  • Is MAP Decoding All You Need? The Inadequacy of the Mode in Neural Machine Translation

2020 Reading Papers

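  • Papers I only skimmed are not listed
  • Only papers I read in full and followed up on with my own additional research are listed
  • For example, word2vec is not listed because, although I know the concept, I have not worked through the paper itself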

Reinforcement Learning

  • Asynchronous Methods for Deep Reinforcement Learning
    • A3C, DeepMind & Montreal
  • Continuous Control With Deep Reinforcement Learning
    • DDPG, DQN+DPG, Replay Buffer, Soft-Update via Polyak Averaging, Ornstein-Uhlenbeck process, White Gaussian Random process, DeepMind
  • Deterministic Policy Gradient Algorithms
    • DeepMind, Policy Gradient, Actor-Critic, Deterministic Policy
  • Policy Gradient Methods for Reinforcement Learning with Function Approximation
    • Compatible Function Approximation, Policy Gradient, Sutton
  • Approximately Optimal Approximate Reinforcement Learning
    • Kakade & Langford, Mixture Policy, Policy Improvement
  • Trust Region Policy Optimization
    • Trust Region, Natural Policy, Kakade & Langford Thm, Policy Improvement, OpenAI
  • Proximal Policy Optimization Algorithms
    • OpenAI, Practical TRPO, Clipped Surrogate Objective
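
The DDPG entry above mentions a soft update via Polyak averaging. As a rough illustration only, the sketch below moves a target parameter list a fraction `tau` toward the online parameters on each call. The function name `soft_update`, the flat-list parameter representation, and the `tau` values are assumptions made for this example and are not taken from the paper's code.

```python
def soft_update(target, online, tau=0.005):
    """Polyak averaging: return tau * online + (1 - tau) * target, elementwise.

    `target` and `online` are plain lists of floats standing in for network
    weights (a simplification assumed for this sketch).
    """
    return [tau * w + (1.0 - tau) * t for t, w in zip(target, online)]


# Toy run: the target slowly tracks the fixed online parameters.
target = [0.0, 0.0]
online = [1.0, -2.0]
for _ in range(3):
    target = soft_update(target, online, tau=0.5)

print(target)  # [0.875, -1.75]
```

A small `tau` keeps the target network changing slowly, which is the stabilizing effect the soft update is meant to provide; `tau=0.5` is used here only so the numbers are easy to follow.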

Meta-Learning

  • Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
    • MAML, Optimization-Based Meta-Learning

NLP

  • Efficient Estimation of Word Representations in Vector Space
    • Word2Vec, CBOW, Skip-Gram
  • Distributed Representations of Words and Phrases and their Compositionality
    • Enhanced vec repr quality, SubSampling, Negative Sampling, Hierarchical Softmax
  • Deep contextualized word representations
    • ELMo, Feature-Based, Pre-ELMo + Linear Combination, SubWord Information by ConvNet
  • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
    • Transformer's Encoder, MLM, NSP
  • Neural Machine Translation By Jointly Learning to Align and Translate
    • GRU, Seq2Seq with Attention, Bahdanau Attention
  • Attention Is All You Need
    • Transformer, Self-Attention, Scaled Dot-Product Attention, Seq2Seq
  • Advances in Pre-Training Distributed Word Representations
    • FastText
  • Enriching Word Vectors with Subword Information
    • FastText
  • Minimum Risk Training for Neural Machine Translation
    • MRT, NMT
  • Bag of Tricks for Efficient Text Classification
    • FastText for Text Classification, Fast!
  • A Fast and Accurate Dependency Parser using Neural Networks
    • Parsing
  • MaltParser: A Data-Driven Parser-Generator for Dependency Parsing
    • Parsing
  • Incrementality in Deterministic Dependency Parsing
    • Parsing
  • A Neural Probabilistic Language Model
    • NPLM
  • Universal Language Model Fine-tuning for Text Classification
    • ULMFiT, Fine-Tuning
  • The Natural Language Decathlon: Multitask Learning as Question Answering
    • MultiTask Learning, anti-curriculum learning
  • Phrase-Based & Neural Unsupervised Machine Translation
    • Initialization, Language Modeling, Back-Translation
  • A Structured Self-Attentive Sentence Embedding
    • Self-Attentive

Graph

  • Graph Attention Networks
    • GNN, Attention
  • MAGNET: Multi-Label Text Classification using Attention-based Graph Neural Network
    • GAT, MLTC

Conversational AI

  • Memory Networks
  • End-To-End Memory Networks
  • Learning Through Dialogue Interactions By Asking Questions
  • Hierarchical Attention Networks for Document Classification
  • Conversational Decision-Making Model for Predicting the King's Decision in the Annals of the Joseon Dynasty

Fundamental

  • Decoupled Neural Interfaces using Synthetic Gradients
  • Decoupled Weight Decay Regularization
  • Neural Network Ensembles, Cross Validation, and Active Learning
  • Sharp Minima Can Generalize For Deep Nets
  • Long short-term memory
  • Highway Networks
  • Recurrent Highway Networks

ETC

  • LSTM-SAE; Unsupervised Pre-training of a Deep LSTM-based Stacked Autoencoder for Multivariate Time Series Forecasting Problems
  • C3D; Learning Spatiotemporal Features with 3D Convolutional Networks

🏢 NLP

Tokenization

  • BPE (Byte-Pair Encoding); A New Algorithm for Data Compression (C Users Journal, 1994) paper
  • Adapting BPE to NMT; Neural Machine Translation of Rare Words with Subword Units (ACL 2016) paper
    • Compares character n-gram and byte-pair encoding segmentations
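
To make the BPE entries above more concrete, here is a minimal sketch of learning merge operations: count adjacent symbol pairs weighted by word frequency, merge the most frequent pair, and repeat. The helper names (`get_pair_counts`, `merge_pair`, `learn_bpe`), the `</w>` end-of-word marker, and the toy vocabulary are illustrative assumptions, not code from either paper.

```python
import collections


def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by each word's frequency."""
    pairs = collections.Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(pair, vocab):
    """Replace every occurrence of the symbol pair with one merged symbol."""
    merged = {}
    for word, freq in vocab.items():
        symbols = word.split()
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[" ".join(out)] = freq
    return merged


def learn_bpe(vocab, num_merges):
    """Greedily learn up to `num_merges` merge operations."""
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        # Ties are broken by first occurrence (dict insertion order).
        best = max(pairs, key=pairs.get)
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges, vocab


# Toy vocabulary (assumed for illustration): space-separated symbols -> count.
vocab = {
    "l o w </w>": 5,
    "l o w e r </w>": 2,
    "n e w e s t </w>": 6,
    "w i d e s t </w>": 3,
}
merges, segmented = learn_bpe(vocab, 3)
print(merges)  # [('e', 's'), ('es', 't'), ('est', '</w>')]
```

After three merges the frequent suffix `est</w>` has become a single symbol, which is how BPE lets rare words share subword units with common ones.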

Wordpiece

SentencePiece

Morphological

Word Vector Representation

  • NPLM; A Neural Probabilistic Language Model (JMLR 2003) paper
    • NPLM's Reference -> learning the role of words in a sentence
      • Modeling High-Dimensional Discrete Data with Multi-Layer Neural Networks (NIPS 2000) paper
        • Proposes the idea of using neural networks for high-dimensional binary distributed representations
      • Extracting distributed representations of concepts and relations from positive and negative propositions (IEEE 2000) link
        • A case where Prof. Hinton's research was applied successfully
      • Natural Language Processing With Modular PDP Networks and Distributed Lexicon (Cognitive Science 1991 July) link
        • An attempt to apply neural networks to language modeling
    • NPLM's Reference -> learning a statistical model of the word sequence distribution
      • Sequential neural text compression (IEEE 1996) link
        • I Love Schmidhuber a lot :)
  • Word2Vec 2013a; Efficient Estimation of Word Representations in Vector Space (ICLR 2013) paper
    • Introduce Skip-Gram & CBOW
    • Google Team
  • Word2Vec 2013b; Distributed Representations of Words and Phrases and their Compositionality (NIPS 2013) paper
    • Proposes training optimizations such as negative sampling
  • GloVe (Global Vectors); GloVe: Global Vectors for Word Representation (EMNLP 2014) paper
    • Stanford Univ.
    • Addresses the limitations of Word2Vec and LSA
  • Swivel(Submatrix-Wise Vector Embedding Learner); Swivel: Improving Embeddings by Noticing What’s Missing () paper
  • FastText; Enriching Word Vectors with Subword Information (17.06.16, arxiv) paper

NLP Tasks

A large annotated corpus for learning natural language inference, Bowman et al., 2015 (EMNLP)

A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference, Williams et al., 2018

SQuAD: 100,000+ Questions for Machine Comprehension of Text, Rajpurkar et al., 2016

Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition, Tjong Kim Sang and De Meulder, 2003

Dependency Parsing

  • Incrementality in Deterministic Dependency Parsing (ACL Workshop, 2004) paper
  • MaltParser: A Data-Driven Parser-Generator for Dependency Parsing (LREC, 2006) paper
  • A Fast and Accurate Dependency Parser using Neural Networks (EMNLP, 2014) paper

Neural Machine Translation

  • MRT(Minimum Risk Training); Minimum Risk Training for Neural Machine Translation (ACL 2016) paper

Text Classification

  • FastText for classification; Bag of Tricks for Efficient Text Classification (EACL 2017) link
  • ULMFiT; Universal Language Model Fine-tuning for Text Classification (18.05.23, arxiv) paper

Question Answering

Stochastic Answer Networks for Machine Reading Comprehension https://arxiv.org/abs/1712.03556

Textual Entailment

Enhanced LSTM for Natural Language Inference https://arxiv.org/abs/1609.06038

Semantic Role Labeling

Deep Semantic Role Labeling: What Works and What’s Next https://www.aclweb.org/anthology/P17-1044/

Summarization

Extractive

  • BertSum; Fine-tune BERT for Extractive Summarization (19.03.25, arxiv) paper
  • BertSum-Full Paper; Text Summarization with Pretrained Encoders (19.08.22, arxiv) paper

Pre-trained NLP Architecture

  • Semi-supervised sequence learning (NIPS 2015) paper

Word Representations: A Simple and General Method for Semi-Supervised Learning

| Institute | Model | Title | Venue | Published | Link |
|---|---|---|---|---|---|
| AllenAI | ELMo | Deep contextualized word representations | NAACL | 2018 | paper |
| AllenAI | LongFormer | Longformer: The Long-Document Transformer | arXiv | 20.04.10 | paper |
| GoogleAI | BERT | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | NAACL | 2019 | paper |
| GoogleAI | ALBERT | ALBERT: A Lite BERT for Self-Supervised Learning of Language Representations | ICLR | 19.09.26 | paper |
| GoogleAI | T5 | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | JMLR | 19.10.23 | paper |
| GoogleAI | PEGASUS | PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization | ICML | 2020 | paper |
| GoogleAI | ELECTRA | ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators | ICLR | 2020 | paper |
| DeepMind | Compressive Transformers | Compressive Transformers for Long-Range Sequence Modelling | arXiv | 19.11.13 | paper |
| UNC Chapel Hill | LXMERT | LXMERT: Learning Cross-Modality Encoder Representations from Transformers | arXiv | 19.08.20 | paper |
| OpenAI | GPT-1 | Improving language understanding with unsupervised learning | OpenAI | 2018 | paper |
| OpenAI | GPT-2 | Language Models are Unsupervised Multitask Learners | OpenAI | 2019 | paper |
| OpenAI | GPT-3 | Language Models are Few-Shot Learners | OpenAI | 2020 | paper |
| FAIR | FastText | Advances in Pre-Training Distributed Word Representations | arXiv | 17.12.26 | paper |
| FAIR | XLM | Cross-lingual Language Model Pretraining | arXiv | 19.01.22 | paper |
| FAIR | FSMT | Facebook FAIR's WMT19 News Translation Task Submission | arXiv | 19.07.15 | paper |
| FAIR | RoBERTa | RoBERTa: A Robustly Optimized BERT Pretraining Approach | arXiv | 19.07.26 | paper |
| FAIR | MMBT | Supervised Multimodal Bitransformers for Classifying Images and Text | arXiv | 19.09.06 | paper |
| FAIR | BART | BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension | arXiv | 19.10.29 | paper |
| FAIR | CamemBERT | CamemBERT: a Tasty French Language Model | arXiv | 19.11.10 | paper |
| FAIR | mBART | Multilingual Denoising Pre-training for Neural Machine Translation | arXiv | 20.01.22 | paper |
| FAIR | RAG | Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks | arXiv | 20.05.22 | paper |
| Hugging Face | DistilBERT | DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter | arXiv | 19.10.02 | paper |
| Microsoft | Marian | Marian: Cost-effective High-Quality Neural Machine Translation in C++ | ACL | 2018 | paper |
| Microsoft | MT-DNN | Multi-Task Deep Neural Networks for Natural Language Understanding | arXiv | 19.05.30 | paper |
| Microsoft | LayoutLM | LayoutLM: Pre-training of Text and Layout for Document Image Understanding | arXiv | 19.12.31 | paper |
| NVIDIA | MegatronLM | Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism | arXiv | 19.09.17 | paper |
| Univ. of Washington | Grover-Mega | Defending Against Neural Fake News | arXiv | 19.10.29 | paper |
| Carnegie Mellon, Google Brain | Transformer-XL | Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | arXiv | 19.06.02 | paper |
| Carnegie Mellon, Google Brain | XLNet | XLNet: Generalized Autoregressive Pretraining for Language Understanding | arXiv | 19.06.19 | paper |
| Carnegie Mellon, Google Brain | Funnel | Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing | arXiv | 20.06.05 | paper |
| Salesforce | CTRL | CTRL: A Conditional Transformer Language Model for Controllable Generation | arXiv | 19.09.11 | paper |
| Anonymous authors | MobileBERT | MobileBERT: Task-Agnostic Compression of BERT by Progressive Knowledge Transfer | ICLR | 2020 | paper |

✨ Attention Mechanism

  • Bahdanau Attention; Neural Machine Translation by Jointly Learning to Align and Translate (ICLR 2015) paper

  • Multi-Head Attention; Attention Is All You Need (NIPS 2017) paper

  • Google Research-Synthesizer; SYNTHESIZER: Rethinking Self-Attention in Transformer Models (20.05.02, arxiv) paper
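
As a rough companion to the Multi-Head Attention entry above, the sketch below computes single-head scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V, in plain Python. It omits the learned projections, multiple heads, and masking of the full Transformer; the function names and the list-of-lists representation are assumptions made for this example.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Single-head attention over lists of vectors (no masking, no projections).

    Returns (outputs, weights); weights[i][j] is how much query i attends
    to key j.
    """
    d_k = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Dot product of the query with every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys
        ]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of the value vectors, one output component at a time.
        outputs.append(
            [sum(wj * v[c] for wj, v in zip(w, values)) for c in range(len(values[0]))]
        )
    return outputs, weights


# Two identical keys receive equal weight, so the output is the mean value.
outputs, weights = scaled_dot_product_attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 1.0], [1.0, 1.0]],
    values=[[0.0, 2.0], [4.0, 6.0]],
)
print(weights)  # [[0.5, 0.5]]
print(outputs)  # [[2.0, 4.0]]
```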

💆 Conversational AI

Memory-Based Research

  • Following the research of Sumit Chopra and Jason Weston
  • Memory Networks (14.10.15, arxiv; ICLR 2015) paper
  • End-To-End Memory Networks (NIPS 2015) paper
  • Learning Through Dialogue Interactions By Asking Questions (16.12.15, ICLR 2017) paper

Open-Domain

  • Real-Time Open-Domain Question Answering with Dense-Sparse Phrase Index, ACL
  • Kelvin Guu's REALM, ACL
  • DPR; Dense Passage Retrieval for Open-Domain Question Answering (20.04.10) paper

🎨 Generative Model

GAN

  • Original GAN; Generative Adversarial Nets (NIPS 2014) paper

🐵 Meta Learning

  • MAML; Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks (ICML 2017) paper

Curiosity Algorithms

  • https://ai.googleblog.com/2018/10/curiosity-and-procrastination-in.html
  • Meta-learning curiosity algorithms
  • Rapid Learning or Feature Reuse? Towards Understanding the Effectiveness of MAML
  • Novelty search (Lehman & Stanley, 2008)
  • Buffers and Nearest Neighbors (Fu et al., 2017)
  • Generating goals (Srivastava et al., 2013; Kulkarni et al., 2016)
  • Learning progress (Oudeyer et al., 2007; Schmidhuber, 2008)
  • Generating diverse skills (Eysenbach et al., 2018)
  • Stochastic neural networks (Florensa et al., 2017; Fortunato et al., 2017)
  • Count-based exploration (Tang et al., 2017)
  • Object-based curiosity measures (Forestier & Oudeyer, 2016)
  • Bonus-based (Taiga et al., 2019)

Road to General Intelligence

  • AutoML Style Approach
    • Neural Architecture Search (NAS)
    • Hyperparameter optimization for deep networks
    • Auto-sklearn, Learning loss functions to replace cross-entropy for training a fixed architecture on MNIST and CIFAR
  • Meta-learning with genetic programming, evolutionary computing
  • Programming Automation
    • Searching over mathematical operations within neural networks
    • Neural networks that learn programs
  • Modular Meta-Learning / Hierarchical Meta-Learning, Reinforcement Learning
  • Inspired by Cognitive/Brain Science (Attention, Curiosity, Common Sense, etc.)
  • Agent57 (DeepMind)

🧠 Reinforcement Learning

  • Policy Gradient Theorem; Policy Gradient Methods for Reinforcement Learning with Function Approximation (NIPS 2000) paper
  • Deterministic Policy Gradient Algorithms
  • Continuous Control with Deep Reinforcement Learning
  • Approximately Optimal Approximate Reinforcement Learning
  • Trust Region Policy Optimization
  • Proximal Policy Optimization Algorithms

RL.start() Paper of the Day series

  • Accelerated Methods for Deep Reinforcement Learning () paper
  • Implementation Matters In Deep RL () paper
  • CURL: Contrastive Unsupervised Representations for Reinforcement Learning () paper
  • Dream to Control: Learning Behaviors by Latent Imagination () paper

📈 Financial Mathematics & Engineering

🎨 Neuromorphic

🐈 Theoretical Deep Learning

  • Neural Network Ensembles, Cross Validation, and Active Learning (NIPS 1995) paper

Batch Normalization

Lipschitz gradient

Global Batch Normalization

Internal Covariate Shift

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

How Does Batch Normalization Help Optimization?

Layer Normalization https://arxiv.org/abs/1607.06450

LeCun Initialization; Efficient BackProp

Xavier Initialization; Understanding the difficulty of training deep feedforward neural networks

He Initialization; Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification

Nesterov Optimizer (optimization-related papers)

Weight Standardization

😍 Schmidhuber

Juergen Schmidhuber's Google Scholar

  • Long short-term memory (Neural Computation 1997) paper
  • LSTM: A Search Space Odyssey (IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2017) paper
  • Highway Networks (15.05.03, arxiv) paper
    • Full Paper: Training Very Deep Networks link
  • Recurrent Highway Networks (ICML 2017) paper
  • Gradient flow in recurrent nets: the difficulty of learning long-term dependencies (IEEE 2001) paper paper
  • Bidirectional LSTM networks for improved phoneme classification and recognition (International Conference on Artificial Neural Networks 05.09.11)
  • Sequential neural text compression (IEEE 1996) paper
  • Neural expectation maximization (NIPS 2017) paper
  • Accelerated Neural Evolution through Cooperatively Coevolved Synapses (JMLR 2008) paper
  • World Models (18.05.09, arxiv) paper

ETC

LSTM-SAE; Unsupervised Pre-training of a Deep LSTM-based Stacked Autoencoder for Multivariate Time Series Forecasting Problems

C3D; Learning Spatiotemporal Features with 3D Convolutional Networks

n-gram related papers

  • Estimation of Probabilities from Sparse Data for the Language Model Component of a Speech Recognizer
  • Interpolated estimation of Markov source parameters from sparse data

Pointing the Unknown Words (Université de Montréal)

Seq2Seq; Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation

Real-World Anomaly Detection in Surveillance Videos

self-attention on classification - A Structured Self-Attentive Sentence Embedding