Downloads 2020
Number of events: 710
- 2020 Vision: Reimagining the Default Settings of Technology & Society
- A Baseline for Few-Shot Image Classification
- Abductive Commonsense Reasoning
- Abstract Diagrammatic Reasoning with Multiplex Graph Networks
- Accelerating SGD with momentum for over-parameterized learning
- A Closer Look at Deep Policy Gradients
- A closer look at the approximation capabilities of neural networks
- A Closer Look at the Optimization Landscapes of Generative Adversarial Networks
- A Constructive Prediction of the Generalization Error Across Scales
- A critical analysis of self-supervision, or what we can learn from a single image
- Action Semantics Network: Considering the Effects of Actions in Multiagent Systems
- Actor-Critic Provably Finds Nash Equilibria of Linear-Quadratic Mean-Field Games
- Adaptive Correlated Monte Carlo for Contextual Categorical Sequence Generation
- Adaptive Structural Fingerprints for Graph Attention Networks
- Additive Powers-of-Two Quantization: An Efficient Non-uniform Discretization for Neural Networks
- Adjustable Real-time Style Transfer
- AdvectiveNet: An Eulerian-Lagrangian Fluidic Reservoir for Point Cloud Processing
- Adversarial AutoAugment
- Adversarial Lipschitz Regularization
- Adversarially Robust Representations with Smooth Encoders
- Adversarially robust transfer learning
- Adversarial Policies: Attacking Deep Reinforcement Learning
- Adversarial Training and Provable Defenses: Bridging the Gap
- AE-OT: A New Generative Model Based on Extended Semi-Discrete Optimal Transport
- A Fair Comparison of Graph Neural Networks for Graph Classification
- A Framework for Robustness Certification of Smoothed Classifiers Using F-Divergences
- AfricaNLP - Unlocking Local Languages
- A Function Space View of Bounded Norm Infinite Width ReLU Nets: The Multivariate Case
- A Generalized Training Approach for Multiagent Learning
- AI + Africa = Global Innovation
- AI for Affordable Healthcare
- AI for Earth Sciences
- AI for Overcoming Global Disparities in Cancer Care
- AI Systems That Can See And Talk
- A Latent Morphology Model for Open-Vocabulary Neural Machine Translation
- ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
- A Learning-based Iterative Method for Solving Vehicle Routing Problems
- A Meta-Transfer Objective for Learning to Disentangle Causal Mechanisms
- AMRL: Aggregated Memory For Reinforcement Learning
- A Mutual Information Maximization Perspective of Language Representation Learning
- Analysis of Video Feature Learning in Two-Stream CNNs on the Example of Zebrafish Swim Bout Classification
- And the Bit Goes Down: Revisiting the Quantization of Neural Networks
- A Neural Dirichlet Process Mixture Model for Task-Free Continual Learning
- An Exponential Learning Rate Schedule for Deep Learning
- An Inductive Bias for Distances: Neural Nets that Respect the Triangle Inequality
- A Probabilistic Formulation of Unsupervised Text Style Transfer
- Are Pre-trained Language Models Aware of Phrases? Simple but Strong Baselines for Grammar Induction
- Are Transformers universal approximators of sequence-to-sequence functions?
- A Signal Propagation Perspective for Pruning Neural Networks at Initialization
- AssembleNet: Searching for Multi-Stream Neural Connectivity in Video Architectures
- A Stochastic Derivative Free Optimization Method with Momentum
- Asymptotics of Wide Networks from Feynman Diagrams
- A Target-Agnostic Attack on Deep Models: Exploiting Security Vulnerabilities of Transfer Learning
- A Theoretical Analysis of the Number of Shots in Few-Shot Learning
- A Theory of Usable Information under Computational Constraints
- AtomNAS: Fine-Grained End-to-End Neural Architecture Search
- At Stability's Edge: How to Adjust Hyperparameters to Preserve Minima Selection in Asynchronous Training of Neural Networks?
- Augmenting Genetic Algorithms with Deep Neural Networks for Exploring the Chemical Space
- Augmenting Non-Collaborative Dialog Systems with Explicit Semantic and Strategic Dialog History
- AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty
- Automated curriculum generation through setter-solver interactions
- Automated Relational Meta-learning
- Automatically Discovering and Learning New Visual Categories with Ranking Statistics
- AutoQ: Automated Kernel-Wise Neural Network Quantization
- BackPACK: Packing more into Backprop
- BatchEnsemble: an Alternative Approach to Efficient Ensemble and Lifelong Learning
- Batch-shaping for learning conditional channel gated networks
- Bayesian Meta Sampling for Fast Uncertainty Adaptation
- BayesOpt Adversarial Attack
- Behaviour Suite for Reinforcement Learning
- BERTScore: Evaluating Text Generation with BERT
- Beyond Linearization: On Quadratic and Higher-Order Approximation of Wide Neural Networks
- Beyond “tabula rasa” in reinforcement learning: agents that remember, adapt, and generalize
- BinaryDuo: Reducing Gradient Mismatch in Binary Activation Network by Coupling Binary Activations
- Biologically inspired sleep algorithm for increased generalization and adversarial robustness in deep neural networks
- Black-Box Adversarial Attack with Transferable Model-based Embedding
- Black-box Off-policy Estimation for Infinite-Horizon Reinforcement Learning
- BlockSwap: Fisher-guided Block Substitution for Network Compression on a Budget
- Bounds on Over-Parameterization for Guaranteed Existence of Descent Paths in Shallow ReLU Networks
- Breaking Certified Defenses: Semantic Adversarial Examples With Spoofed Robustness Certificates
- Bridging AI and Cognitive Science (BAICS)
- Bridging Mode Connectivity in Loss Landscapes and Adversarial Robustness
- B-Spline CNNs on Lie groups
- Budgeted Training: Rethinking Deep Neural Network Training Under Resource Constraints
- Building Deep Equivariant Capsule Networks
- Can gradient clipping mitigate label noise?
- Capsules with Inverted Dot-Product Attention Routing
- CAQL: Continuous Action Q-Learning
- CATER: A diagnostic dataset for Compositional Actions & TEmporal Reasoning
- Causal Discovery with Reinforcement Learning
- Certified Defenses for Adversarial Patches
- Certified Robustness for Top-k Predictions against Adversarial Perturbations via Randomized Smoothing
- Chameleon: Adaptive Code Optimization for Expedited Deep Neural Network Compilation
- Classification-Based Anomaly Detection for General Data
- CLEVRER: Collision Events for Video Representation and Reasoning
- CLN2INV: Learning Loop Invariants with Continuous Logic Networks
- CM3: Cooperative Multi-goal Multi-stage Multi-agent Reinforcement Learning
- Co-Attentive Equivariant Neural Networks: Focusing Equivariance On Transformations Co-Occurring in Data
- Coherent Gradients: An Approach to Understanding Generalization in Gradient Descent-based Optimization
- Combining Q-Learning and Search with Amortized Value Estimates
- Comparing Rewinding and Fine-tuning in Neural Network Pruning
- Composing Task-Agnostic Policies with Deep Reinforcement Learning
- Compositional Language Continual Learning
- Compositional languages emerge in a neural iterated learning model
- Composition-based Multi-Relational Graph Convolutional Networks
- Compression based bound for non-compressed network: unified generalization error analysis of large compressible deep neural network
- Compressive Transformers for Long-Range Sequence Modelling
- Computation Reallocation for Object Detection
- Computer Vision for Agriculture (CV4A)
- Conditional Learning of Fair Representations
- Conservative Uncertainty Estimation By Fitting Prior Networks
- Consistency Regularization for Generative Adversarial Networks
- Continual Learning with Adaptive Weights (CLAW)
- Continual Learning with Bayesian Neural Networks for Non-Stationary Data
- Continual learning with hypernetworks
- Contrastive Learning of Structured World Models
- Contrastive Representation Distillation
- Controlling generative models with continuous factors of variations
- Convergence of Gradient Methods on Bilinear Zero-Sum Games
- Convolutional Conditional Neural Processes
- CoPhy: Counterfactual Learning of Physical Dynamics
- Counterfactuals uncover the modular structure of deep generative models
- Critical initialisation in continuous approximations of binary neural networks
- Cross-Domain Few-Shot Classification via Learned Feature-Wise Transformation
- Cross-Lingual Ability of Multilingual BERT: An Empirical Study
- Cross-lingual Alignment vs Joint Training: A Comparative Study and A Simple Unified Framework
- Curriculum Loss: Robust Learning and Generalization against Label Corruption
- Curvature Graph Network
- Cyclical Stochastic Gradient MCMC for Bayesian Deep Learning
- Data-dependent Gaussian Prior Objective for Language Generation
- Data-Independent Neural Pruning via Coresets
- DBA: Distributed Backdoor Attacks against Federated Learning
- DD-PPO: Learning Near-Perfect PointGoal Navigators from 2.5 Billion Frames
- DDSP: Differentiable Digital Signal Processing
- Decentralized Deep Learning with Arbitrary Communication Compression
- Decoding As Dynamic Programming For Recurrent Autoregressive Models
- Decoupling Representation and Classifier for Long-Tailed Recognition
- Deep 3D Pan via Local adaptive "t-shaped" convolutions with global and local adaptive dilations
- Deep Audio Priors Emerge From Harmonic Convolutional Networks
- Deep Batch Active Learning by Diverse, Uncertain Gradient Lower Bounds
- Deep Double Descent: Where Bigger Models and More Data Hurt
- Deep Graph Matching Consensus
- DeepHoyer: Learning Sparser Neural Network with Differentiable Scale-Invariant Sparsity Measures
- Deep Imitative Models for Flexible Inference, Planning, and Control
- Deep Learning For Symbolic Mathematics
- Deep Learning of Determinantal Point Processes via Proper Spectral Sub-gradient
- Deep Network Classification by Scattering and Homotopy Dictionary Learning
- Deep neuroethology of a virtual rodent
- Deep Orientation Uncertainty Learning based on a Bingham Loss
- Deep probabilistic subsampling for task-adaptive compressed sensing
- Deep Semi-Supervised Anomaly Detection
- DeepSphere: a graph-based spherical CNN
- Deep Symbolic Superoptimization Without Human Knowledge
- DeepV2D: Video to Depth with Differentiable Structure from Motion
- Defending Against Physically Realizable Attacks on Image Classification
- DeFINE: Deep Factorized Input Token Embeddings for Neural Sequence Modeling
- Deformable Kernels: Adapting Effective Receptive Fields for Object Deformation
- Demystifying Inter-Class Disentanglement
- Denoising and Regularization via Exploiting the Structural Bias of Convolutional Generators
- Depth-Adaptive Transformer
- Depth-Width Trade-offs for ReLU Networks via Sharkovsky's Theorem
- Detecting and Diagnosing Adversarial Images with Class-Conditional Capsule Reconstructions
- Detecting Extrapolation with Local Ensembles
- Difference-Seeking Generative Adversarial Network--Unseen Sample Generation
- Differentiable learning of numerical rules in knowledge graphs
- Differentiable Reasoning over a Virtual Knowledge Base
- Differentially Private Meta-Learning
- Differentiation of Blackbox Combinatorial Solvers
- DiffTaichi: Differentiable Programming for Physical Simulation
- Directional Message Passing for Molecular Graphs
- Disagreement-Regularized Imitation Learning
- Discovering Motor Programs by Recomposing Demonstrations
- Discrepancy Ratio: Evaluating Model Performance When Even Experts Disagree on the Truth
- Discriminative Particle Filter Reinforcement Learning for Complex Partial observations
- Disentanglement by Nonlinear ICA with General Incompressible-flow Networks (GIN)
- Disentangling Factors of Variations Using Few Labels
- Disentangling neural mechanisms for perceptual grouping
- Distance-Based Learning from Errors for Confidence Calibration
- Distributed Bandit Learning: Near-Optimal Regret with Efficient Communication
- Distributionally Robust Neural Networks
- Diverse Trajectory Forecasting with Determinantal Point Processes
- DivideMix: Learning with Noisy Labels as Semi-supervised Learning
- Doing for Our Robots What Nature Did For Us
- Domain Adaptive Multibranch Networks
- Don't Use Large Mini-batches, Use Local SGD
- Double Neural Counterfactual Regret Minimization
- Doubly Robust Bias Reduction in Infinite Horizon Off-Policy Estimation
- Drawing Early-Bird Tickets: Toward More Efficient Training of Deep Networks
- Dream to Control: Learning Behaviors by Latent Imagination
- DropEdge: Towards Deep Graph Convolutional Networks on Node Classification
- Duration-of-Stay Storage Assignment under Uncertainty
- Dynamical Distance Learning for Semi-Supervised and Unsupervised Skill Discovery
- Dynamically Pruned Message Passing Networks for Large-scale Knowledge Graph Reasoning
- Dynamic Model Pruning with Feedback
- Dynamics-Aware Embeddings
- Dynamics-Aware Unsupervised Skill Discovery
- Dynamic Sparse Training: Find Efficient Sparse Network From Scratch With Trainable Masked Layers
- Dynamic Time Lag Regression: Predicting What & When
- Economy Statistical Recurrent Units For Inferring Nonlinear Granger Causality
- Editable Neural Networks
- Effect of Activation Functions on the Training of Overparametrized Neural Nets
- Efficient and Information-Preserving Future Frame Prediction and Beyond
- Efficient Probabilistic Logic Reasoning with Graph Neural Networks
- Efficient Riemannian Optimization on the Stiefel Manifold via the Cayley Transform
- ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
- Emergence of functional and structural properties of the head direction system by optimization of recurrent neural networks
- Emergent Tool Use From Multi-Agent Autocurricula
- EMPIR: Ensembles of Mixed Precision Deep Networks for Increased Robustness Against Adversarial Attacks
- Empirical Bayes Transductive Meta-Learning with Synthetic Gradients
- Empirical Studies on the Properties of Linear Regions in Deep Neural Networks
- Enabling Deep Spiking Neural Networks with Hybrid Conversion and Spike Timing Dependent Backpropagation
- Encoding word order in complex embeddings
- End to End Trainable Active Contours via Differentiable Rendering
- Energy-based models for atomic-resolution protein conformations
- Enhancing Adversarial Defense by k-Winners-Take-All
- Enhancing Transformation-Based Defenses Against Adversarial Attacks with a Distribution Classifier
- Ensemble Distribution Distillation
- Environmental drivers of systematicity and generalization in a situated agent
- Episodic Reinforcement Learning with Associative Memory
- Escaping Saddle Points Faster with Stochastic Momentum
- ES-MAML: Simple Hessian-Free Meta Learning
- Estimating counterfactual treatment outcomes over time through adversarially balanced representations
- Estimating Gradients for Discrete Random Variables by Sampling without Replacement
- Evaluating The Search Phase of Neural Architecture Search
- Evolutionary Population Curriculum for Scaling Multi-Agent Reinforcement Learning
- Expected Information Maximization: Using the I-Projection for Mixture Density Estimation
- Explain Your Move: Understanding Agent Actions Using Salient and Relevant Feature Attribution
- Explanation by Progressive Exaggeration
- Exploration in Reinforcement Learning with Deep Covering Options
- Exploratory Not Explanatory: Counterfactual Analysis of Saliency Maps for Deep Reinforcement Learning
- Exploring Model-based Planning with Policy Networks
- Extreme Classification via Adversarial Softmax Approximation
- Extreme Tensoring for Low-Memory Preconditioning
- Fair Resource Allocation in Federated Learning
- Fantastic Generalization Measures and Where to Find Them
- FasterSeg: Searching for Faster Real-time Semantic Segmentation
- Fast is better than free: Revisiting adversarial training
- Fast Neural Network Adaptation via Parameter Remapping and Architecture Search
- Fast Task Inference with Variational Intrinsic Successor Features
- Feature Interaction Interpretability: A Case for Explaining Ad-Recommendation Systems via Neural Interaction Detection
- Federated Adversarial Domain Adaptation
- Federated Learning with Matched Averaging
- Few-Shot Learning on Graphs via Super-Classes Based on Graph Spectral Measures
- Few-shot Text Classification with Distributional Signatures
- Finding and Visualizing Weaknesses of Deep Reinforcement Learning Agents
- Finite Depth and Width Corrections to the Neural Tangent Kernel
- Fooling Detection Alone is Not Enough: Adversarial Attack against Multiple Object Tracking
- Four Things Everyone Should Know to Improve Batch Normalization
- FreeLB: Enhanced Adversarial Training for Natural Language Understanding
- Frequency-based Search-control in Dyna
- From Inference to Generation: End-to-end Fully Self-supervised Generation of Human Face from Speech
- From Variational to Deterministic Autoencoders
- FSNet: Compression of Deep Convolutional Neural Networks by Filter Summary
- FSPool: Learning Set Representations with Featurewise Sort Pooling
- Functional Regularisation for Continual Learning with Gaussian Processes
- Functional vs. parametric equivalence of ReLU networks
- Fundamental Science in the era of AI
- Gap-Aware Mitigation of Gradient Staleness
- GAT: Generative Adversarial Training for Adversarial Example Detection and Classification
- GenDICE: Generalized Offline Estimation of Stationary Values
- Generalization bounds for deep convolutional neural networks
- Generalization of Two-layer Neural Networks: An Asymptotic Viewpoint
- Generalization through Memorization: Nearest Neighbor Language Models
- Generalized Convolutional Forest Networks for Domain Generalization and Visual Recognition
- Generative Models for Effective ML on Private, Decentralized Datasets
- Generative Ratio Matching Networks
- GENESIS: Generative Scene Inference and Sampling with Object-Centric Latent Representations
- Geometric Analysis of Nonconvex Optimization Landscapes for Overcomplete Learning
- Geometric Insights into the Convergence of Nonlinear TD Learning
- Geom-GCN: Geometric Graph Convolutional Networks
- GLAD: Learning Sparse Graph Recovery
- Global Relational Models of Source Code
- Gradient $\ell_1$ Regularization for Quantization Robustness
- Gradient-Based Neural DAG Learning
- Gradient Descent Maximizes the Margin of Homogeneous Neural Networks
- Gradientless Descent: High-Dimensional Zeroth-Order Optimization
- Gradients as Features for Deep Representation Learning
- GraphAF: a Flow-based Autoregressive Model for Molecular Graph Generation
- Graph Constrained Reinforcement Learning for Natural Language Action Spaces
- Graph Convolutional Reinforcement Learning
- Graph inference learning for semi-supervised classification
- Graph Neural Networks Exponentially Lose Expressive Power for Node Classification
- GraphSAINT: Graph Sampling Based Inductive Learning Method
- GraphZoom: A Multi-level Spectral Approach for Accurate and Scalable Graph Embedding
- Guiding Program Synthesis by Learning to Generate Examples
- Hamiltonian Generative Networks
- Harnessing Structures for Value-Based Planning and Reinforcement Learning
- Harnessing the Power of Infinitely Wide Deep Nets on Small-data Tasks
- Hierarchical Foresight: Self-Supervised Learning of Long-Horizon Tasks via Visual Subgoal Generation
- Higher-Order Function Networks for Learning Composable 3D Object Representations
- High Fidelity Speech Synthesis with Adversarial Networks
- HiLLoC: lossless image compression with hierarchical latent variable models
- Hoppity: Learning Graph Transformations to Detect and Fix Bugs in Programs
- How much Position Information Do Convolutional Neural Networks Encode?
- How to 0wn the NAS in Your Spare Time
- Hypermodels for Exploration
- Hyper-SAGNN: a self-attention based graph neural network for hypergraphs
- I Am Going MAD: Maximum Discrepancy Competition for Comparing Classifiers Adaptively
- Identifying through Flows for Recovering Latent Representations
- Identity Crisis: Memorization and Generalization Under Extreme Overparameterization
- Image-guided Neural Object Rendering
- Imitation Learning via Off-Policy Distribution Matching
- IMPACT: Importance Weighted Asynchronous Architectures with Clipped Target Networks
- Implementation Matters in Deep RL: A Case Study on PPO and TRPO
- Implementing Inductive bias for different navigation tasks through diverse RNN attractors
- Implicit Bias of Gradient Descent based Adversarial Training on Separable Data
- Improved memory in recurrent neural networks with sequential non-normal dynamics
- Improved Sample Complexities for Deep Neural Networks and Robust Classification via an All-Layer Margin
- Improving Adversarial Robustness Requires Revisiting Misclassified Examples
- Improving Generalization in Meta Reinforcement Learning using Learned Objectives
- Improving Neural Language Generation with Spectrum Control
- Incorporating BERT into Neural Machine Translation
- Inductive and Unsupervised Representation Learning on Graph Structured Objects
- Inductive Matrix Completion Based on Graph Neural Networks
- Inductive representation learning on temporal graphs
- Infinite-Horizon Differentiable Model Predictive Control
- Infinite-horizon Off-Policy Policy Evaluation with Multiple Behavior Policies
- Influence-Based Multi-Agent Exploration
- InfoGraph: Unsupervised and Semi-supervised Graph-Level Representation Learning via Mutual Information Maximization
- Information Geometry of Orthogonal Initializations and Training
- Input Complexity and Out-of-distribution Detection with Likelihood-based Generative Models
- In Search for a SAT-friendly Binarized Neural Network Architecture
- Integration of Deep Neural Models and Differential Equations
- Intensity-Free Learning of Temporal Point Processes
- Interpretable Complex-Valued Neural Networks for Privacy Protection
- Intriguing Properties of Adversarial Training at Scale
- Intrinsically Motivated Discovery of Diverse Patterns in Self-Organizing Systems
- Intrinsic Motivation for Encouraging Synergistic Behavior
- Invertible Models and Normalizing Flows
- Is a Good Representation Sufficient for Sample Efficient Reinforcement Learning?
- Iterative energy-based projection on a normal data manifold for anomaly localization
- Jacobian Adversarially Regularized Networks for Robustness
- Jelly Bean World: A Testbed for Never-Ending Learning
- Kaleidoscope: An Efficient, Learnable Representation For All Structured Linear Maps
- Keep Doing What Worked: Behavior Modelling Priors for Offline Reinforcement Learning
- Kernelized Wasserstein Natural Gradient
- Kernel of CycleGAN as a principal homogeneous space
- Knowledge Consistency between Neural Networks and Beyond
- Lagrangian Fluid Simulation with Continuous Convolutions
- LambdaNet: Probabilistic Type Inference using Graph Neural Networks
- LAMOL: LAnguage MOdeling for Lifelong Language Learning
- Language GANs Falling Short
- Large Batch Optimization for Deep Learning: Training BERT in 76 minutes
- Latent Normalizing Flows for Many-to-Many Cross-Domain Mappings
- Lazy-CFR: fast and near-optimal regret minimization for extensive games with imperfect information
- Learned Step Size Quantization
- Learning-Augmented Data Stream Algorithms
- Learning Compositional Koopman Operators for Model-Based Control
- Learning deep graph matching with channel-independent embedding and Hungarian attention
- Learning Disentangled Representations for CounterFactual Regression
- Learning Efficient Parameter Server Synchronization Policies for Distributed SGD
- Learning Execution Through Neural Code Fusion
- Learning Expensive Coordination: An Event-Based Deep RL Approach
- Learning from Explanations with Neural Execution Tree
- Learning from Rules Generalizing Labeled Exemplars
- Learning from Unlabelled Videos Using Contrastive Predictive Neural 3D Mapping
- Learning Heuristics for Quantified Boolean Formulas through Reinforcement Learning
- Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech
- Learning Nearly Decomposable Value Functions Via Communication Minimization
- Learning representations for binary-classification without backpropagation
- Learning Robust Representations via Multi-View Information Bottleneck
- Learning Self-Correctable Policies and Value Functions from Demonstrations with Negative Sampling
- Learning Space Partitions for Nearest Neighbor Search
- Learning the Arrow of Time for Problems in Reinforcement Learning
- Learning The Difference That Makes A Difference With Counterfactually-Augmented Data
- Learning to Balance: Bayesian Meta-Learning for Imbalanced and Out-of-distribution Tasks
- Learning to Control PDEs with Differentiable Physics
- Learning to Coordinate Manipulation Skills via Skill Behavior Diversification
- Learning To Explore Using Active Neural SLAM
- Learning to Group: A Bottom-Up Framework for 3D Part Discovery in Unseen Categories
- Learning to Guide Random Search
- Learning to Learn by Zeroth-Order Oracle
- Learning to Link
- Learning to Move with Affordance Maps
- Learning to Plan in High Dimensions via Neural Exploration-Exploitation Trees
- Learning to Represent Programs with Property Signatures
- Learning to Retrieve Reasoning Paths over Wikipedia Graph for Question Answering
- Learning to solve the credit assignment problem
- Learning transport cost from subset correspondence
- Learn to Explain Efficiently via Neural Logic Inductive Learning
- Linear Symmetric Quantization of Neural Networks for Low-precision Integer Hardware
- Lipschitz constant estimation of Neural Networks via sparse polynomial optimization
- Lite Transformer with Long-Short Range Attention
- Locality and Compositionality in Zero-Shot Learning
- Locally Constant Networks
- Logic and the 2-Simplicial Transformer
- Lookahead: A Far-sighted Alternative of Magnitude-based Pruning
- Low-dimensional statistical manifold embedding of directed graphs
- Low-Resource Knowledge-Grounded Dialogue Generation
- MACER: Attack-free and Scalable Robust Training via Maximizing Certified Radius
- Machine Learning: Changing the future of healthcare
- Making Efficient Use of Demonstrations to Solve Hard Exploration Problems
- Making Sense of Reinforcement Learning and Probabilistic Inference
- Masked Based Unsupervised Content Transfer
- Massively Multilingual Sparse Word Representations
- Mathematical Reasoning in Latent Space
- Maximum Likelihood Constraint Inference for Inverse Reinforcement Learning
- Maxmin Q-learning: Controlling the Estimation Bias of Q-learning
- Measuring and Improving the Use of Graph Information in Graph Neural Networks
- Measuring Compositional Generalization: A Comprehensive Method on Realistic Data
- Measuring the Reliability of Reinforcement Learning Algorithms
- MEMO: A Deep Network for Flexible Combination of Episodic Memories
- Memory-Based Graph Networks
- Meta-Dataset: A Dataset of Datasets for Learning to Learn from Few Examples
- Meta Dropout: Learning to Perturb Latent Features for Generalization
- Meta-Learning Acquisition Functions for Transfer Learning in Bayesian Optimization
- Meta-learning curiosity algorithms
- Meta-Learning Deep Energy-Based Memory Models
- Meta-Learning without Memorization
- Meta-Learning with Warped Gradient Descent
- MetaPix: Few-Shot Video Retargeting
- Meta-Q-Learning
- Meta Reinforcement Learning with Autonomous Inference of Subtask Dependencies
- Minimizing FLOPs to Learn Efficient Sparse Representations
- Mirror-Generative Neural Machine Translation
- Mixed-curvature Variational Autoencoders
- Mixed Precision DNNs: All you need is a good parametrization
- Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models
- Mixup Inference: Better Exploiting Mixup to Defend Adversarial Attacks
- ML-IRL: Machine Learning in Real Life
- MMA Training: Direct Input Space Margin Maximization through Adversarial Training
- Model-Augmented Actor-Critic: Backpropagating through Paths
- Model Based Reinforcement Learning for Atari
- Model-based reinforcement learning for biological sequence design
- Mogrifier LSTM
- Monotonic Multihead Attention
- Multi-Agent Interactions Modeling with Correlated Policies
- Multi-agent Reinforcement Learning for Networked System Control
- Multilingual Alignment of Contextual Word Representations
- Multiplicative Interactions and Where to Find Them
- Multi-Scale Representation Learning for Spatial Feature Distributions using Grid Cells
- Mutual Information Gradient Estimation for Representation Learning
- Mutual Mean-Teaching: Pseudo Label Refinery for Unsupervised Domain Adaptation on Person Re-identification
- NAS-Bench-1Shot1: Benchmarking and Dissecting One-shot Neural Architecture Search
- NAS-Bench-201: Extending the Scope of Reproducible Neural Architecture Search
- NAS evaluation is frustratingly hard
- N-BEATS: Neural basis expansion analysis for interpretable time series forecasting
- Nesterov Accelerated Gradient and Scale Invariance for Adversarial Attacks
- Network Deconvolution
- Network Randomization: A Simple Technique for Generalization in Deep Reinforcement Learning
- Neural Architecture Search
- Neural Arithmetic Units
- Neural Epitome Search for Architecture-Agnostic Network Compression
- Neural Execution of Graph Algorithms
- Neural Machine Translation with Universal Visual Representation
- Neural Module Networks for Reasoning over Text
- Neural Network Branching for Neural Network Verification
- Neural Oblivious Decision Ensembles for Deep Learning on Tabular Data
- Neural Outlier Rejection for Self-Supervised Keypoint Learning
- Neural Policy Gradient Methods: Global Optimality and Rates of Convergence
- Neural Stored-program Memory
- Neural Symbolic Reader: Scalable Integration of Distributed and Symbolic Representations for Reading Comprehension
- Neural tangent kernels, transportation mappings, and universal approximation
- Neural Tangents: Fast and Easy Infinite Neural Networks in Python
- Neural Text Generation With Unlikelihood Training
- NeurQuRI: Neural Question Requirement Inspector for Answerability Prediction in Machine Reading Comprehension
- Never Give Up: Learning Directed Exploration Strategies
- Non-Autoregressive Dialog State Tracking
- Novelty Detection Via Blurring
- Observational Overfitting in Reinforcement Learning
- On Bonus Based Exploration Methods In The Arcade Learning Environment
- Once for All: Train One Network and Specialize it for Efficient Deployment
- On Computation and Generalization of Generative Adversarial Imitation Learning
- One-Shot Pruning of Recurrent Neural Networks by Jacobian Spectrum Evaluation
- On Generalization Error Bounds of Noisy Gradient Methods for Non-Convex Learning
- On Identifiability in Transformers
- Online and stochastic optimization beyond Lipschitz continuity: A Riemannian approach
- On Mutual Information Maximization for Representation Learning
- On Robustness of Neural Ordinary Differential Equations
- On Solving Minimax Optimization Locally: A Follow-the-Ridge Approach
- On the Convergence of FedAvg on Non-IID Data
- On the Equivalence between Positional Node Embeddings and Structural Graph Representations
- On the Global Convergence of Training Deep Linear ResNets
- On the interaction between supervision and self-play in emergent communication
- On the Need for Topology-Aware Generative Models for Manifold-Based Defenses
- On the Relationship between Self-Attention and Convolutional Layers
- On the "steerability" of generative adversarial networks
- On the Variance of the Adaptive Learning Rate and Beyond
- On the Weaknesses of Reinforcement Learning for Neural Machine Translation
- On Universal Equivariant Set Networks
- Optimal Strategies Against Generative Attacks
- Optimistic Exploration even with a Pessimistic Initialisation
- Option Discovery using Deep Skill Chaining
- Order Learning and Its Application to Age Estimation
- Overlearning Reveals Sensitive Attributes
- PAC Confidence Sets for Deep Neural Networks via Calibrated Prediction
- Padé Activation Units: End-to-end Learning of Flexible Activation Functions in Deep Networks
- PairNorm: Tackling Oversmoothing in GNNs
- Pay Attention to Features, Transfer Learn Faster CNNs
- PC-DARTS: Partial Channel Connections for Memory-Efficient Architecture Search
- PCMC-Net: Feature-based Pairwise Choice Markov Chains
- Permutation Equivariant Models for Compositional Generalization in Language
- Phase Transitions for the Information Bottleneck in Representation Learning
- Physics-as-Inverse-Graphics: Unsupervised Physical Parameter Estimation from Video
- Physics-aware Difference Graph Networks for Sparsely-Observed Dynamics
- Picking Winning Tickets Before Training by Preserving Gradient Flow
- Piecewise linear activations substantially shape the loss surfaces of neural networks
- Pitfalls of In-Domain Uncertainty Estimation and Ensembling in Deep Learning
- Playing the lottery with rewards and multiple languages: lottery tickets in RL and NLP
- Plug and Play Language Models: A Simple Approach to Controlled Text Generation
- Poly-encoders: Architectures and Pre-training Strategies for Fast and Accurate Multi-sentence Scoring
- Polylogarithmic width suffices for gradient descent to achieve arbitrarily small test error with shallow ReLU networks
- Population-Guided Parallel Policy Search for Reinforcement Learning
- Posterior sampling for multi-agent reinforcement learning: solving extensive games with imperfect information
- Practical ML for Developing Countries: learning under limited/low resource scenarios
- Precision Gating: Improving Neural Network Efficiency with Dynamic Dual-Precision Activations
- Prediction, Consistency, Curvature: Representation Learning for Locally-Linear Control
- Prediction Poisoning: Towards Defenses Against DNN Model Stealing Attacks
- Pretrained Encyclopedia: Weakly Supervised Knowledge-Pretrained Language Model
- Pre-training Tasks for Embedding-based Large-scale Retrieval
- Principled Weight Initialization for Hypernetworks
- Probabilistic Connection Importance Inference and Lossless Compression of Deep Neural Networks
- Probability Calibration for Knowledge Graph Embedding Models
- Program Guided Agent
- Progressive Learning and Disentanglement of Hierarchical Representations
- Progressive Memory Banks for Incremental Domain Adaptation
- Projection-Based Constrained Policy Optimization
- Provable Benefit of Orthogonal Initialization in Optimizing Deep Linear Networks
- Provable Filter Pruning for Efficient Neural Networks
- Provable robustness against all adversarial $l_p$-perturbations for $p\geq 1$
- ProxSGD: Training Structured Neural Networks under Regularization and Constraints
- Pruned Graph Scattering Transforms
- Pseudo-LiDAR++: Accurate Depth for 3D Object Detection in Autonomous Driving
- Pure and Spurious Critical Points: a Geometric Study of Linear Networks
- Q-learning with UCB Exploration is Sample Efficient for Infinite-Horizon MDP
- Quantifying Point-Prediction Uncertainty in Neural Networks via Residual Estimation with an I/O Kernel
- Quantifying the Cost of Reliable Photo Authentication via High-Performance Learned Lossy Representations
- Quantum Algorithms for Deep Convolutional Neural Networks
- Query2box: Reasoning over Knowledge Graphs in Vector Space Using Box Embeddings
- Query-efficient Meta Attack to Deep Neural Networks
- RaCT: Toward Amortized Ranking-Critical Training For Collaborative Filtering
- Ranking Policy Gradient
- Rapid Learning or Feature Reuse? Towards Understanding the Effectiveness of MAML
- RaPP: Novelty Detection with Reconstruction along Projection Pathway
- Real or Not Real, that is the Question
- Reanalysis of Variance Reduced Temporal Difference Learning
- ReClor: A Reading Comprehension Dataset Requiring Logical Reasoning
- Reconstructing continuous distributions of 3D protein structure from cryo-EM images
- Recurrent neural circuits for contour detection
- Reducing Transformer Depth on Demand with Structured Dropout
- Reflections from the Turing Award Winners
- Reformer: The Efficient Transformer
- Regularizing activations in neural networks via distribution matching with the Wasserstein metric
- Reinforced active learning for image segmentation
- Reinforced Genetic Algorithm Learning for Optimizing Computation Graphs
- Reinforcement Learning Based Graph-to-Sequence Model for Natural Question Generation
- Reinforcement Learning with Competitive Ensembles of Information-Constrained Primitives
- Relational State-Space Model for Stochastic Multi-Object Systems
- ReMixMatch: Semi-Supervised Learning with Distribution Matching and Augmentation Anchoring
- Rényi Fair Inference
- Residual Energy-Based Models for Text Generation
- Restricting the Flow: Information Bottlenecks for Attribution
- Rethinking Softmax Cross-Entropy Loss for Adversarial Robustness
- Rethinking the Hyperparameters for Fine-tuning
- Revisiting Self-Training for Neural Sequence Generation
- RGBD-GAN: Unsupervised 3D Representation Learning From Natural Image Datasets via RGBD Image Synthesis
- RIDE: Rewarding Impact-Driven Exploration for Procedurally-Generated Environments
- Ridge Regression: Structure, Cross-Validation, and Sketching
- RNA Secondary Structure Prediction By Learning Unrolled Algorithms
- RNNs Incrementally Evolving on an Equilibrium Manifold: A Panacea for Vanishing and Exploding Gradients?
- Robust And Interpretable Blind Image Denoising Via Bias-Free Convolutional Neural Networks
- Robust anomaly detection and backdoor attack detection via differential privacy
- Robust Local Features for Improving the Generalization of Adversarial Training
- Robustness Verification for Transformers
- Robust Reinforcement Learning for Continuous Control with Model Misspecification
- Robust Subspace Recovery Layer for Unsupervised Anomaly Detection
- Robust training with ensemble consensus
- Rotation-invariant clustering of neuronal responses in primary visual cortex
- RTFM: Generalising to New Environment Dynamics via Reading
- SAdam: A Variant of Adam for Strongly Convex Functions
- Sample Efficient Policy Gradient Methods with Recursive Variance Reduction
- Sampling-Free Learning of Bayesian Quantized Neural Networks
- Scalable and Order-robust Continual Learning with Additive Parameter Decomposition
- Scalable Model Compression by Entropy Penalized Reparameterization
- Scalable Neural Methods for Reasoning With a Symbolic Knowledge Base
- Scale-Equivariant Steerable Networks
- Scaling Autoregressive Video Models
- SCALOR: Generative World Models with Scalable Object Representations
- SEED RL: Scalable and Efficient Deep-RL with Accelerated Central Inference
- Selection via Proxy: Efficient Data Selection for Deep Learning
- Self-Adversarial Learning with Comparative Discrimination for Text Generation
- Self-labelling via simultaneous clustering and representation learning
- SELF: Learning to Filter Noisy Labels with Self-Ensembling
- Self-Supervised Learning of Appliance Usage
- Semantically-Guided Representation Learning for Self-Supervised Monocular Depth
- Semi-Supervised Generative Modeling for Controllable Speech Synthesis
- Sequential Latent Knowledge Selection for Knowledge-Grounded Dialogue
- Sharing Knowledge in Multi-Task Deep Reinforcement Learning
- Shifted and Squeezed 8-bit Floating Point format for Low-Precision Training of Deep Neural Networks
- Short and Sparse Deconvolution --- A Geometric Approach
- Sign Bits Are All You Need for Black-Box Attacks
- Sign-OPT: A Query-Efficient Hard-label Adversarial Attack
- Simple and Effective Regularization Methods for Training on Noisily Labeled Data with Generalization Guarantee
- Simplified Action Decoder for Deep Multi-Agent Reinforcement Learning
- Single Episode Policy Transfer in Reinforcement Learning
- Skip Connections Matter: On the Transferability of Adversarial Examples Generated with ResNets
- Sliced Cramer Synaptic Consolidation for Preserving Deeply Learned Representations
- SlowMo: Improving Communication-Efficient Distributed SGD with Slow Momentum
- Smooth markets: A basic mechanism for organizing gradient-based learners
- Smoothness and Stability in GANs
- SNODE: Spectral Discretization of Neural ODEs for System Identification
- SNOW: Subscribing to Knowledge via Channel Pooling for Transfer & Lifelong Learning of Convolutional Neural Networks
- SPACE: Unsupervised Object-Oriented Scene Representation via Spatial Attention and Decomposition
- Span Recovery for Deep Neural Networks with Applications to Input Obfuscation
- Sparse Coding with Gated Learned ISTA
- Spectral Embedding of Regularized Block Models
- Spike-based causal inference for weight alignment
- SpikeGrad: An ANN-equivalent Computation Model for Implementing Backpropagation with Spikes
- SQIL: Imitation Learning via Reinforcement Learning with Sparse Rewards
- Stable Rank Normalization for Improved Generalization in Neural Networks and GANs
- State Alignment-based Imitation Learning
- State-only Imitation with Transition Dynamics Mismatch
- Stochastic AUC Maximization with Deep Neural Networks
- Stochastic Conditional Generative Networks with Basis Decomposition
- Stochastic Weight Averaging in Parallel: Large-Batch Training That Generalizes Well
- Strategies for Pre-training Graph Neural Networks
- StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding
- StructPool: Structured Graph Pooling via Conditional Random Fields
- Structured Object-Aware Physics Prediction for Video Modeling and Planning
- Sub-policy Adaptation for Hierarchical Reinforcement Learning
- SUMO: Unbiased Estimation of Log Marginal Probability for Latent Variable Models
- SVQN: Sequential Variational Soft Q-Learning Networks
- Symplectic ODE-Net: Learning Hamiltonian Dynamics with Control
- Symplectic Recurrent Neural Networks
- Synthesizing Programmatic Policies that Inductively Generalize
- TabFact: A Large-scale Dataset for Table-based Fact Verification
- Tackling Climate Change with ML
- Target-Embedding Autoencoders for Supervised Representation Learning
- Tensor Decompositions for Temporal Knowledge Base Completion
- The asymptotic spectrum of the Hessian of DNN throughout training
- The Break-Even Point on Optimization Trajectories of Deep Neural Networks
- The Curious Case of Neural Text Degeneration
- The Decision-Making Side of Machine Learning: Dynamical, Statistical and Economic Perspectives
- The Early Phase of Neural Network Training
- The Gambler's Problem and Beyond
- The Implicit Bias of Depth: How Incremental Learning Drives Generalization
- The Ingredients of Real World Robotic Reinforcement Learning
- The intriguing role of module criticality in the generalization of deep networks
- The Local Elasticity of Neural Networks
- The Logical Expressiveness of Graph Neural Networks
- Theory and Evaluation Metrics for Learning Disentangled Representations
- The Shape of Data: Intrinsic Distance for Data Distributions
- The Variational Bandwidth Bottleneck: Stochastic Evaluation on an Information Budget
- Thieves on Sesame Street! Model Extraction of BERT-based APIs
- Thinking While Moving: Deep Reinforcement Learning with Concurrent Control
- To Relieve Your Headache of Training an MRF, Take AdVIL
- Toward Evaluating Robustness of Deep Reinforcement Learning with Continuous Control
- Towards a Deep Network Architecture for Structured Smoothness
- Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets
- Towards Fast Adaptation of Neural Architectures with Meta Learning
- Towards Hierarchical Importance Attribution: Explaining Compositional Semantics for Neural Sequence Models
- Towards neural networks that provably know when they don't know
- Towards Stabilizing Batch Statistics in Backward Propagation of Batch Normalization
- Towards Stable and Efficient Training of Verifiably Robust Neural Networks
- Towards Trustworthy ML: Rethinking Security and Privacy for ML
- Towards Verified Robustness under Text Deletion Interventions
- Training binary neural networks with real-to-binary convolutions
- Training Generative Adversarial Networks from Incomplete Observations using Factorised Discriminators
- Training individually fair ML models with sensitive subspace robustness
- Training Recurrent Neural Networks Online by Learning Explicit State Variables
- Tranquil Clouds: Neural Networks for Learning Temporally Coherent Features in Point Clouds
- Transferable Perturbations of Deep Feature Distributions
- Transferring Optimality Across Data Distributions via Homotopy Methods
- Transformer-XH: Multi-Evidence Reasoning with eXtra Hop Attention
- Tree-Structured Attention with Hierarchical Accumulation
- Triple Wins: Boosting Accuracy, Robustness and Efficiency Together by Enabling Input-Adaptive Inference
- Truth or backpropaganda? An empirical investigation of deep learning theory
- U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation
- Unbiased Contrastive Divergence Algorithm for Training Energy-Based Latent Variable Models
- Uncertainty-guided Continual Learning with Bayesian Neural Networks
- Understanding and Improving Information Transfer in Multi-Task Learning
- Understanding and Robustifying Differentiable Architecture Search
- Understanding Architectures Learnt by Cell-based Neural Architecture Search
- Understanding Generalization in Recurrent Neural Networks
- Understanding Knowledge Distillation in Non-autoregressive Machine Translation
- Understanding l4-based Dictionary Learning: Interpretation, Stability, and Robustness
- Understanding the Limitations of Conditional Generative Models
- Understanding the Limitations of Variational Mutual Information Estimators
- Understanding Why Neural Networks Generalize Well Through GSNR of Parameters
- Universal Approximation with Certified Networks
- Unpaired Point Cloud Completion on Real Scans using Adversarial Training
- Unrestricted Adversarial Examples via Semantic Manipulation
- Unsupervised Clustering using Pseudo-semi-supervised Learning
- Unsupervised Model Selection for Variational Disentangled Representation Learning
- V4D: 4D Convolutional Neural Networks for Video-level Representation Learning
- Variance Reduction With Sparse Gradients
- Variational Autoencoders for Highly Multivariate Spatial Point Processes Intensities
- Variational Hetero-Encoder Randomized GANs for Joint Image-Text Modeling
- Variational Recurrent Models for Solving Partially Observable Control Tasks
- Variational Template Machine for Data-to-Text Generation
- VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning
- Vid2Game: Controllable Characters Extracted from Real-World Videos
- VideoFlow: A Conditional Flow-Based Model for Stochastic Video Generation
- VL-BERT: Pre-training of Generic Visual-Linguistic Representations
- V-MPO: On-Policy Maximum a Posteriori Policy Optimization for Discrete and Continuous Control
- vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations
- Watch the Unobserved: A Simple Approach to Parallelizing Monte Carlo Tree Search
- Watch, Try, Learn: Meta-Learning from Demonstrations and Rewards
- Weakly Supervised Clustering by Exploiting Unique Class Count
- Weakly Supervised Disentanglement with Guarantees
- What Can Neural Networks Reason About?
- What graph neural networks cannot learn: depth vs width
- White Noise Analysis of Neural Networks
- Why Gradient Clipping Accelerates Training: A Theoretical Justification for Adaptivity
- Why Not to Use Zero Imputation? Correcting Sparsity Bias in Training Neural Networks
- word2ket: Space-efficient Word Embeddings inspired by Quantum Entanglement
- Workshop on Causal Learning For Decision Making
- You CAN Teach an Old Dog New Tricks! On Training Knowledge Graph Embeddings
- You Only Train Once: Loss-Conditional Training of Deep Networks
- Your classifier is secretly an energy based model and you should treat it like one