Showing 1–29 of 29 results for author: Novikov, A

Search v0.5.6 released 2020-02-24

arXiv:2402.14396 [pdf, other]

quant-ph cs.LG

Quantum Circuit Optimization with AlphaTensor

Authors: Francisco J. R. Ruiz, Tuomas Laakkonen, Johannes Bausch, Matej Balog, Mohammadamin Barekatain, Francisco J. H. Heras, Alexander Novikov, Nathan Fitzpatrick, Bernardino Romera-Paredes, John van de Wetering, Alhussein Fawzi, Konstantinos Meichanetzidis, Pushmeet Kohli

Abstract: A key challenge in realizing fault-tolerant quantum computers is circuit optimization. Focusing on the most expensive gates in fault-tolerant quantum computation (namely, the T gates), we address the problem of T-count optimization, i.e., minimizing the number of T gates that are needed to implement a given circuit. To achieve this, we develop AlphaTensor-Quantum, a method based on deep reinforcem… ▽ More A key challenge in realizing fault-tolerant quantum computers is circuit optimization. Focusing on the most expensive gates in fault-tolerant quantum computation (namely, the T gates), we address the problem of T-count optimization, i.e., minimizing the number of T gates that are needed to implement a given circuit. To achieve this, we develop AlphaTensor-Quantum, a method based on deep reinforcement learning that exploits the relationship between optimizing T-count and tensor decomposition. Unlike existing methods for T-count optimization, AlphaTensor-Quantum can incorporate domain-specific knowledge about quantum computation and leverage gadgets, which significantly reduces the T-count of the optimized circuits. AlphaTensor-Quantum outperforms the existing methods for T-count optimization on a set of arithmetic benchmarks (even when compared without making use of gadgets). Remarkably, it discovers an efficient algorithm akin to Karatsuba's method for multiplication in finite fields. AlphaTensor-Quantum also finds the best human-designed solutions for relevant arithmetic computations used in Shor's algorithm and for quantum chemistry simulation, thus demonstrating it can save hundreds of hours of research by optimizing relevant quantum circuits in a fully automated way. △ Less

Submitted 5 March, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

Comments: 25 pages main paper + 19 pages appendix
arXiv:2310.12990 [pdf, other]

cs.CV cs.LG eess.SP math.OC

Wave-informed dictionary learning for high-resolution imaging in complex media

Authors: Miguel Moscoso, Alexei Novikov, George Papanicolaou, Chrysoula Tsogka

Abstract: We propose an approach for imaging in scattering media when large and diverse data sets are available. It has two steps. Using a dictionary learning algorithm the first step estimates the true Green's function vectors as columns in an unordered sensing matrix. The array data comes from many sparse sets of sources whose location and strength are not known to us. In the second step, the columns of t… ▽ More We propose an approach for imaging in scattering media when large and diverse data sets are available. It has two steps. Using a dictionary learning algorithm the first step estimates the true Green's function vectors as columns in an unordered sensing matrix. The array data comes from many sparse sets of sources whose location and strength are not known to us. In the second step, the columns of the estimated sensing matrix are ordered for imaging using Multi-Dimensional Scaling with connectivity information derived from cross-correlations of its columns, as in time reversal. For these two steps to work together we need data from large arrays of receivers so the columns of the sensing matrix are incoherent for the first step, as well as from sub-arrays so that they are coherent enough to obtain the connectivity needed in the second step. Through simulation experiments, we show that the proposed approach is able to provide images in complex media whose resolution is that of a homogeneous medium. △ Less

Submitted 21 September, 2023; originally announced October 2023.
arXiv:2210.10855 [pdf, other]

cs.LG eess.SP math.PR stat.ML

Dictionary Learning for the Almost-Linear Sparsity Regime

Authors: Alexei Novikov, Stephen White

Abstract: Dictionary learning, the problem of recovering a sparsely used matrix $\mathbf{D} \in \mathbb{R}^{M \times K}$ and $N$ $s$-sparse vectors $\mathbf{x}_i \in \mathbb{R}^{K}$ from samples of the form $\mathbf{y}_i = \mathbf{D}\mathbf{x}_i$, is of increasing importance to applications in signal processing and data science. When the dictionary is known, recovery of $\mathbf{x}_i$ is possible even for s… ▽ More Dictionary learning, the problem of recovering a sparsely used matrix $\mathbf{D} \in \mathbb{R}^{M \times K}$ and $N$ $s$-sparse vectors $\mathbf{x}_i \in \mathbb{R}^{K}$ from samples of the form $\mathbf{y}_i = \mathbf{D}\mathbf{x}_i$, is of increasing importance to applications in signal processing and data science. When the dictionary is known, recovery of $\mathbf{x}_i$ is possible even for sparsity linear in dimension $M$, yet to date, the only algorithms which provably succeed in the linear sparsity regime are Riemannian trust-region methods, which are limited to orthogonal dictionaries, and methods based on the sum-of-squares hierarchy, which requires super-polynomial time in order to obtain an error which decays in $M$. In this work, we introduce SPORADIC (SPectral ORAcle DICtionary Learning), an efficient spectral method on family of reweighted covariance matrices. We prove that in high enough dimensions, SPORADIC can recover overcomplete ($K > M$) dictionaries satisfying the well-known restricted isometry property (RIP) even when sparsity is linear in dimension up to logarithmic factors. Moreover, these accuracy guarantees have an ``oracle property" that the support and signs of the unknown sparse vectors $\mathbf{x}_i$ can be recovered exactly with high probability, allowing for arbitrarily close estimation of $\mathbf{D}$ with enough samples in polynomial time. To the author's knowledge, SPORADIC is the first polynomial-time algorithm which provably enjoys such convergence guarantees for overcomplete RIP matrices in the near-linear sparsity regime. △ Less

Submitted 27 March, 2023; v1 submitted 19 October, 2022; originally announced October 2022.
arXiv:2205.06175 [pdf, other]

cs.AI cs.CL cs.LG cs.RO

A Generalist Agent

Authors: Scott Reed, Konrad Zolna, Emilio Parisotto, Sergio Gomez Colmenarejo, Alexander Novikov, Gabriel Barth-Maron, Mai Gimenez, Yury Sulsky, Jackie Kay, Jost Tobias Springenberg, Tom Eccles, Jake Bruce, Ali Razavi, Ashley Edwards, Nicolas Heess, Yutian Chen, Raia Hadsell, Oriol Vinyals, Mahyar Bordbar, Nando de Freitas

Abstract: Inspired by progress in large-scale language modeling, we apply a similar approach towards building a single generalist agent beyond the realm of text outputs. The agent, which we refer to as Gato, works as a multi-modal, multi-task, multi-embodiment generalist policy. The same network with the same weights can play Atari, caption images, chat, stack blocks with a real robot arm and much more, dec… ▽ More Inspired by progress in large-scale language modeling, we apply a similar approach towards building a single generalist agent beyond the realm of text outputs. The agent, which we refer to as Gato, works as a multi-modal, multi-task, multi-embodiment generalist policy. The same network with the same weights can play Atari, caption images, chat, stack blocks with a real robot arm and much more, deciding based on its context whether to output text, joint torques, button presses, or other tokens. In this report we describe the model and the data, and document the current capabilities of Gato. △ Less

Submitted 11 November, 2022; v1 submitted 12 May, 2022; originally announced May 2022.

Comments: Published at TMLR, 42 pages

Journal ref: Transactions on Machine Learning Research, 11/2022, https://openreview.net/forum?id=1ikK0kHjvj
arXiv:2103.16596 [pdf, other]

cs.LG stat.ML

Benchmarks for Deep Off-Policy Evaluation

Authors: Justin Fu, Mohammad Norouzi, Ofir Nachum, George Tucker, Ziyu Wang, Alexander Novikov, Mengjiao Yang, Michael R. Zhang, Yutian Chen, Aviral Kumar, Cosmin Paduraru, Sergey Levine, Tom Le Paine

Abstract: Off-policy evaluation (OPE) holds the promise of being able to leverage large, offline datasets for both evaluating and selecting complex policies for decision making. The ability to learn offline is particularly important in many real-world domains, such as in healthcare, recommender systems, or robotics, where online data collection is an expensive and potentially dangerous process. Being able t… ▽ More Off-policy evaluation (OPE) holds the promise of being able to leverage large, offline datasets for both evaluating and selecting complex policies for decision making. The ability to learn offline is particularly important in many real-world domains, such as in healthcare, recommender systems, or robotics, where online data collection is an expensive and potentially dangerous process. Being able to accurately evaluate and select high-performing policies without requiring online interaction could yield significant benefits in safety, time, and cost for these applications. While many OPE methods have been proposed in recent years, comparing results between papers is difficult because currently there is a lack of a comprehensive and unified benchmark, and measuring algorithmic progress has been challenging due to the lack of difficult evaluation tasks. In order to address this gap, we present a collection of policies that in conjunction with existing offline datasets can be used for benchmarking off-policy evaluation. Our tasks include a range of challenging high-dimensional continuous control problems, with wide selections of datasets and policies for performing policy selection. The goal of our benchmark is to provide a standardized measure of progress that is motivated from a set of principles designed to challenge and test the limits of existing OPE methods. We perform an evaluation of state-of-the-art algorithms and provide open-source access to our data and code to foster future research in this area. △ Less

Submitted 30 March, 2021; originally announced March 2021.

Comments: ICLR 2021 paper. Policies and evaluation code are available at https://github.com/google-research/deep_ope
arXiv:2103.14974 [pdf, other]

math.OC cs.LG cs.MS math.NA

Automatic differentiation for Riemannian optimization on low-rank matrix and tensor-train manifolds

Authors: Alexander Novikov, Maxim Rakhuba, Ivan Oseledets

Abstract: In scientific computing and machine learning applications, matrices and more general multidimensional arrays (tensors) can often be approximated with the help of low-rank decompositions. Since matrices and tensors of fixed rank form smooth Riemannian manifolds, one of the popular tools for finding low-rank approximations is to use Riemannian optimization. Nevertheless, efficient implementation of… ▽ More In scientific computing and machine learning applications, matrices and more general multidimensional arrays (tensors) can often be approximated with the help of low-rank decompositions. Since matrices and tensors of fixed rank form smooth Riemannian manifolds, one of the popular tools for finding low-rank approximations is to use Riemannian optimization. Nevertheless, efficient implementation of Riemannian gradients and Hessians, required in Riemannian optimization algorithms, can be a nontrivial task in practice. Moreover, in some cases, analytic formulas are not even available. In this paper, we build upon automatic differentiation and propose a method that, given an implementation of the function to be minimized, efficiently computes Riemannian gradients and matrix-by-vector products between an approximate Riemannian Hessian and a given vector. △ Less

Submitted 23 October, 2021; v1 submitted 27 March, 2021; originally announced March 2021.
arXiv:2103.11893 [pdf, ps, other]

eess.SP cs.IT math.PR

Thresholding Greedy Pursuit for Sparse Recovery Problems

Authors: Hai Le, Alexei Novikov

Abstract: We study here sparse recovery problems in the presence of additive noise. We analyze a thresholding version of the CoSaMP algorithm, named Thresholding Greedy Pursuit (TGP). We demonstrate that an appropriate choice of thresholding parameter, even without the knowledge of sparsity level of the signal and strength of the noise, can result in exact recovery with no false discoveries as the dimension… ▽ More We study here sparse recovery problems in the presence of additive noise. We analyze a thresholding version of the CoSaMP algorithm, named Thresholding Greedy Pursuit (TGP). We demonstrate that an appropriate choice of thresholding parameter, even without the knowledge of sparsity level of the signal and strength of the noise, can result in exact recovery with no false discoveries as the dimension of the data increases to infinity. △ Less

Submitted 17 March, 2021; originally announced March 2021.

Comments: First version
arXiv:2012.06899 [pdf, other]

cs.LG cs.AI cs.RO

Semi-supervised reward learning for offline reinforcement learning

Authors: Ksenia Konyushkova, Konrad Zolna, Yusuf Aytar, Alexander Novikov, Scott Reed, Serkan Cabi, Nando de Freitas

Abstract: In offline reinforcement learning (RL) agents are trained using a logged dataset. It appears to be the most natural route to attack real-life applications because in domains such as healthcare and robotics interactions with the environment are either expensive or unethical. Training agents usually requires reward functions, but unfortunately, rewards are seldom available in practice and their engi… ▽ More In offline reinforcement learning (RL) agents are trained using a logged dataset. It appears to be the most natural route to attack real-life applications because in domains such as healthcare and robotics interactions with the environment are either expensive or unethical. Training agents usually requires reward functions, but unfortunately, rewards are seldom available in practice and their engineering is challenging and laborious. To overcome this, we investigate reward learning under the constraint of minimizing human reward annotations. We consider two types of supervision: timestep annotations and demonstrations. We propose semi-supervised learning algorithms that learn from limited annotations and incorporate unlabelled data. In our experiments with a simulated robotic arm, we greatly improve upon behavioural cloning and closely approach the performance achieved with ground truth rewards. We further investigate the relationship between the quality of the reward model and the final policies. We notice, for example, that the reward models do not need to be perfect to result in useful policies. △ Less

Submitted 12 December, 2020; originally announced December 2020.

Comments: Accepted to Offline Reinforcement Learning Workshop at Neural Information Processing Systems (2020)
arXiv:2011.13885 [pdf, other]

cs.LG cs.AI cs.RO stat.ML

Offline Learning from Demonstrations and Unlabeled Experience

Authors: Konrad Zolna, Alexander Novikov, Ksenia Konyushkova, Caglar Gulcehre, Ziyu Wang, Yusuf Aytar, Misha Denil, Nando de Freitas, Scott Reed

Abstract: Behavior cloning (BC) is often practical for robot learning because it allows a policy to be trained offline without rewards, by supervised learning on expert demonstrations. However, BC does not effectively leverage what we will refer to as unlabeled experience: data of mixed and unknown quality without reward annotations. This unlabeled data can be generated by a variety of sources such as human… ▽ More Behavior cloning (BC) is often practical for robot learning because it allows a policy to be trained offline without rewards, by supervised learning on expert demonstrations. However, BC does not effectively leverage what we will refer to as unlabeled experience: data of mixed and unknown quality without reward annotations. This unlabeled data can be generated by a variety of sources such as human teleoperation, scripted policies and other agents on the same robot. Towards data-driven offline robot learning that can use this unlabeled experience, we introduce Offline Reinforced Imitation Learning (ORIL). ORIL first learns a reward function by contrasting observations from demonstrator and unlabeled trajectories, then annotates all data with the learned reward, and finally trains an agent via offline reinforcement learning. Across a diverse set of continuous control and simulated robotic manipulation tasks, we show that ORIL consistently outperforms comparable BC agents by effectively leveraging unlabeled experience. △ Less

Submitted 27 November, 2020; originally announced November 2020.

Comments: Accepted to Offline Reinforcement Learning Workshop at Neural Information Processing Systems (2020)
arXiv:2011.09269 [pdf, other]

cs.CR

doi 10.1109/ISPRAS51486.2020.00014

Sydr: Cutting Edge Dynamic Symbolic Execution

Authors: Alexey Vishnyakov, Andrey Fedotov, Daniil Kuts, Alexander Novikov, Darya Parygina, Eli Kobrin, Vlada Logunova, Pavel Belecky, Shamil Kurmangaleev

Abstract: The security development lifecycle (SDL) is becoming an industry standard. Dynamic symbolic execution (DSE) has enormous amount of applications in computer security (fuzzing, vulnerability discovery, reverse-engineering, etc.). We propose several performance and accuracy improvements for dynamic symbolic execution. Skipping non-symbolic instructions allows to build a path predicate 1.2--3.5 times… ▽ More The security development lifecycle (SDL) is becoming an industry standard. Dynamic symbolic execution (DSE) has enormous amount of applications in computer security (fuzzing, vulnerability discovery, reverse-engineering, etc.). We propose several performance and accuracy improvements for dynamic symbolic execution. Skipping non-symbolic instructions allows to build a path predicate 1.2--3.5 times faster. Symbolic engine simplifies formulas during symbolic execution. Path predicate slicing eliminates irrelevant conjuncts from solver queries. We handle each jump table (switch statement) as multiple branches and describe the method for symbolic execution of multi-threaded programs. The proposed solutions were implemented in Sydr tool. Sydr performs inversion of branches in path predicate. Sydr combines DynamoRIO dynamic binary instrumentation tool with Triton symbolic engine. We evaluated Sydr features on 64-bit Linux executables. △ Less

Submitted 26 January, 2021; v1 submitted 18 November, 2020; originally announced November 2020.

Comments: 9 pages

Journal ref: 2020 Ivannikov ISPRAS Open Conference (ISPRAS), IEEE, 2020, pp. 46-54
arXiv:2010.07012 [pdf, other]

eess.SP cs.LG math.NA math.PR

doi 10.1109/TSP.2021.3067140

Fast signal recovery from quadratic measurements

Authors: Miguel Moscoso, Alexei Novikov, George Papanicolaou, Chrysoula Tsogka

Abstract: We present a novel approach for recovering a sparse signal from cross-correlated data. Cross-correlations naturally arise in many fields of imaging, such as optics, holography and seismic interferometry. Compared to the sparse signal recovery problem that uses linear measurements, the unknown is now a matrix formed by the cross correlation of the unknown signal. Hence, the bottleneck for inversion… ▽ More We present a novel approach for recovering a sparse signal from cross-correlated data. Cross-correlations naturally arise in many fields of imaging, such as optics, holography and seismic interferometry. Compared to the sparse signal recovery problem that uses linear measurements, the unknown is now a matrix formed by the cross correlation of the unknown signal. Hence, the bottleneck for inversion is the number of unknowns that grows quadratically. The main idea of our proposed approach is to reduce the dimensionality of the problem by recovering only the diagonal of the unknown matrix, whose dimension grows linearly with the size of the problem. The keystone of the methodology is the use of an efficient {\em Noise Collector} that absorbs the data that come from the off-diagonal elements of the unknown matrix and that do not carry extra information about the support of the signal. This results in a linear problem whose cost is similar to the one that uses linear measurements. Our theory shows that the proposed approach provides exact support recovery when the data is not too noisy, and that there are no false positives for any level of noise. Moreover, our theory also demonstrates that when using cross-correlated data, the level of sparsity that can be recovered increases, scaling almost linearly with the number of data. The numerical experiments presented in the paper corroborate these findings. △ Less

Submitted 11 October, 2020; originally announced October 2020.
arXiv:2007.09055 [pdf, other]

cs.LG cs.AI stat.ML

Hyperparameter Selection for Offline Reinforcement Learning

Authors: Tom Le Paine, Cosmin Paduraru, Andrea Michi, Caglar Gulcehre, Konrad Zolna, Alexander Novikov, Ziyu Wang, Nando de Freitas

Abstract: Offline reinforcement learning (RL purely from logged data) is an important avenue for deploying RL techniques in real-world scenarios. However, existing hyperparameter selection methods for offline RL break the offline assumption by evaluating policies corresponding to each hyperparameter setting in the environment. This online execution is often infeasible and hence undermines the main aim of of… ▽ More Offline reinforcement learning (RL purely from logged data) is an important avenue for deploying RL techniques in real-world scenarios. However, existing hyperparameter selection methods for offline RL break the offline assumption by evaluating policies corresponding to each hyperparameter setting in the environment. This online execution is often infeasible and hence undermines the main aim of offline RL. Therefore, in this work, we focus on \textit{offline hyperparameter selection}, i.e. methods for choosing the best policy from a set of many policies trained using different hyperparameters, given only logged data. Through large-scale empirical evaluation we show that: 1) offline RL algorithms are not robust to hyperparameter choices, 2) factors such as the offline RL algorithm and method for estimating Q values can have a big impact on hyperparameter selection, and 3) when we control those factors carefully, we can reliably rank policies across hyperparameter choices, and therefore choose policies which are close to the best policy in the set. Overall, our results present an optimistic view that offline hyperparameter selection is within reach, even in challenging tasks with pixel observations, high dimensional action spaces, and long horizon. △ Less

Submitted 17 July, 2020; originally announced July 2020.
arXiv:2006.15134 [pdf, other]

cs.LG cs.AI stat.ML

Critic Regularized Regression

Authors: Ziyu Wang, Alexander Novikov, Konrad Zolna, Jost Tobias Springenberg, Scott Reed, Bobak Shahriari, Noah Siegel, Josh Merel, Caglar Gulcehre, Nicolas Heess, Nando de Freitas

Abstract: Offline reinforcement learning (RL), also known as batch RL, offers the prospect of policy optimization from large pre-recorded datasets without online environment interaction. It addresses challenges with regard to the cost of data collection and safety, both of which are particularly pertinent to real-world applications of RL. Unfortunately, most off-policy algorithms perform poorly when learnin… ▽ More Offline reinforcement learning (RL), also known as batch RL, offers the prospect of policy optimization from large pre-recorded datasets without online environment interaction. It addresses challenges with regard to the cost of data collection and safety, both of which are particularly pertinent to real-world applications of RL. Unfortunately, most off-policy algorithms perform poorly when learning from a fixed dataset. In this paper, we propose a novel offline RL algorithm to learn policies from data using a form of critic-regularized regression (CRR). We find that CRR performs surprisingly well and scales to tasks with high-dimensional state and action spaces -- outperforming several state-of-the-art offline RL algorithms by a significant margin on a wide range of benchmark tasks. △ Less

Submitted 22 September, 2021; v1 submitted 26 June, 2020; originally announced June 2020.

Comments: 24 pages; presented at NeurIPS 2020
arXiv:2006.13888 [pdf, other]

cs.LG stat.ML

RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning

Authors: Caglar Gulcehre, Ziyu Wang, Alexander Novikov, Tom Le Paine, Sergio Gomez Colmenarejo, Konrad Zolna, Rishabh Agarwal, Josh Merel, Daniel Mankowitz, Cosmin Paduraru, Gabriel Dulac-Arnold, Jerry Li, Mohammad Norouzi, Matt Hoffman, Ofir Nachum, George Tucker, Nicolas Heess, Nando de Freitas

Abstract: Offline methods for reinforcement learning have a potential to help bridge the gap between reinforcement learning research and real-world applications. They make it possible to learn policies from offline datasets, thus overcoming concerns associated with online data collection in the real-world, including cost, safety, or ethical concerns. In this paper, we propose a benchmark called RL Unplugged… ▽ More Offline methods for reinforcement learning have a potential to help bridge the gap between reinforcement learning research and real-world applications. They make it possible to learn policies from offline datasets, thus overcoming concerns associated with online data collection in the real-world, including cost, safety, or ethical concerns. In this paper, we propose a benchmark called RL Unplugged to evaluate and compare offline RL methods. RL Unplugged includes data from a diverse range of domains including games (e.g., Atari benchmark) and simulated motor control problems (e.g., DM Control Suite). The datasets include domains that are partially or fully observable, use continuous or discrete actions, and have stochastic vs. deterministic dynamics. We propose detailed evaluation protocols for each domain in RL Unplugged and provide an extensive analysis of supervised learning and offline RL methods using these protocols. We will release data for all our tasks and open-source all algorithms presented in this paper. We hope that our suite of benchmarks will increase the reproducibility of experiments and make it possible to study challenging tasks with a limited computational budget, thus making RL research both more systematic and more accessible across the community. Moving forward, we view RL Unplugged as a living benchmark suite that will evolve and grow with datasets contributed by the research community and ourselves. Our project page is available on https://git.io/JJUhd. △ Less

Submitted 12 February, 2021; v1 submitted 24 June, 2020; originally announced June 2020.

Comments: NeurIPS paper. 21 pages including supplementary material, the github link for the datasets: https://github.com/deepmind/deepmind-research/rl_unplugged
arXiv:2006.00979 [pdf, other]

cs.LG cs.AI

Acme: A Research Framework for Distributed Reinforcement Learning

Authors: Matthew W. Hoffman, Bobak Shahriari, John Aslanides, Gabriel Barth-Maron, Nikola Momchev, Danila Sinopalnikov, Piotr Stańczyk, Sabela Ramos, Anton Raichuk, Damien Vincent, Léonard Hussenot, Robert Dadashi, Gabriel Dulac-Arnold, Manu Orsini, Alexis Jacq, Johan Ferret, Nino Vieillard, Seyed Kamyar Seyed Ghasemipour, Sertan Girgin, Olivier Pietquin, Feryal Behbahani, Tamara Norman, Abbas Abdolmaleki, Albin Cassirer, Fan Yang , et al. (14 additional authors not shown)

Abstract: Deep reinforcement learning (RL) has led to many recent and groundbreaking advances. However, these advances have often come at the cost of both increased scale in the underlying architectures being trained as well as increased complexity of the RL algorithms used to train them. These increases have in turn made it more difficult for researchers to rapidly prototype new ideas or reproduce publishe… ▽ More Deep reinforcement learning (RL) has led to many recent and groundbreaking advances. However, these advances have often come at the cost of both increased scale in the underlying architectures being trained as well as increased complexity of the RL algorithms used to train them. These increases have in turn made it more difficult for researchers to rapidly prototype new ideas or reproduce published RL algorithms. To address these concerns this work describes Acme, a framework for constructing novel RL algorithms that is specifically designed to enable agents that are built using simple, modular components that can be used at various scales of execution. While the primary goal of Acme is to provide a framework for algorithm development, a secondary goal is to provide simple reference implementations of important or state-of-the-art algorithms. These implementations serve both as a validation of our design decisions as well as an important contribution to reproducibility in RL research. In this work we describe the major design decisions made within Acme and give further details as to how its components can be used to implement various algorithms. Our experiments provide baselines for a number of common and state-of-the-art algorithms as well as showing how these algorithms can be scaled up for much larger and more complex environments. This highlights one of the primary advantages of Acme, namely that it can be used to implement large, distributed RL algorithms that can run at massive scales while still maintaining the inherent readability of that implementation. This work presents a second version of the paper which coincides with an increase in modularity, additional emphasis on offline, imitation and learning from demonstrations algorithms, as well as various new agents implemented as part of Acme. △ Less

Submitted 20 September, 2022; v1 submitted 1 June, 2020; originally announced June 2020.

Comments: This work presents a second version of the paper which coincides with an increase in modularity, additional emphasis on offline, imitation and learning from demonstrations algorithms, as well as various new agents implemented as part of Acme
arXiv:1910.01077 [pdf, other]

cs.LG cs.AI cs.RO stat.ML

Task-Relevant Adversarial Imitation Learning

Authors: Konrad Zolna, Scott Reed, Alexander Novikov, Sergio Gomez Colmenarejo, David Budden, Serkan Cabi, Misha Denil, Nando de Freitas, Ziyu Wang

Abstract: We show that a critical vulnerability in adversarial imitation is the tendency of discriminator networks to learn spurious associations between visual features and expert labels. When the discriminator focuses on task-irrelevant features, it does not provide an informative reward signal, leading to poor task performance. We analyze this problem in detail and propose a solution that outperforms sta… ▽ More We show that a critical vulnerability in adversarial imitation is the tendency of discriminator networks to learn spurious associations between visual features and expert labels. When the discriminator focuses on task-irrelevant features, it does not provide an informative reward signal, leading to poor task performance. We analyze this problem in detail and propose a solution that outperforms standard Generative Adversarial Imitation Learning (GAIL). Our proposed method, Task-Relevant Adversarial Imitation Learning (TRAIL), uses constrained discriminator optimization to learn informative rewards. In comprehensive experiments, we show that TRAIL can solve challenging robotic manipulation tasks from pixels by imitating human operators without access to any task rewards, and clearly outperforms comparable baseline imitation agents, including those trained via behaviour cloning and conventional GAIL. △ Less

Submitted 12 November, 2020; v1 submitted 2 October, 2019; originally announced October 2019.

Comments: Accepted to CoRL 2020 (see presentation here: https://youtu.be/ZgQvFGuEgFU )
arXiv:1909.12200 [pdf, other]

cs.RO cs.LG

Scaling data-driven robotics with reward sketching and batch reinforcement learning

Authors: Serkan Cabi, Sergio Gómez Colmenarejo, Alexander Novikov, Ksenia Konyushkova, Scott Reed, Rae Jeong, Konrad Zolna, Yusuf Aytar, David Budden, Mel Vecerik, Oleg Sushkov, David Barker, Jonathan Scholz, Misha Denil, Nando de Freitas, Ziyu Wang

Abstract: We present a framework for data-driven robotics that makes use of a large dataset of recorded robot experience and scales to several tasks using learned reward functions. We show how to apply this framework to accomplish three different object manipulation tasks on a real robot platform. Given demonstrations of a task together with task-agnostic recorded experience, we use a special form of human… ▽ More We present a framework for data-driven robotics that makes use of a large dataset of recorded robot experience and scales to several tasks using learned reward functions. We show how to apply this framework to accomplish three different object manipulation tasks on a real robot platform. Given demonstrations of a task together with task-agnostic recorded experience, we use a special form of human annotation as supervision to learn a reward function, which enables us to deal with real-world tasks where the reward signal cannot be acquired directly. Learned rewards are used in combination with a large dataset of experience from different tasks to learn a robot policy offline using batch RL. We show that using our approach it is possible to train agents to perform a variety of challenging manipulation tasks including stacking rigid objects and handling cloth. △ Less

Submitted 4 June, 2020; v1 submitted 26 September, 2019; originally announced September 2019.

Comments: Project website: https://sites.google.com/view/data-driven-robotics/

Journal ref: Robotics: Science and Systems Conference 2020
arXiv:1908.04412 [pdf, other]

eess.SP cs.LG stat.ML

doi 10.1073/pnas.1913995117

The Noise Collector for sparse recovery in high dimensions

Authors: Miguel Moscoso, Alexei Novikov, George Papanicolaou, Chrysoula Tsogka

Abstract: The ability to detect sparse signals from noisy high-dimensional data is a top priority in modern science and engineering. A sparse solution of the linear system $A ρ= b_0$ can be found efficiently with an $l_1$-norm minimization approach if the data is noiseless. Detection of the signal's support from data corrupted by noise is still a challenging problem, especially if the level of noise must be… ▽ More The ability to detect sparse signals from noisy high-dimensional data is a top priority in modern science and engineering. A sparse solution of the linear system $A ρ= b_0$ can be found efficiently with an $l_1$-norm minimization approach if the data is noiseless. Detection of the signal's support from data corrupted by noise is still a challenging problem, especially if the level of noise must be estimated. We propose a new efficient approach that does not require any parameter estimation. We introduce the Noise Collector (NC) matrix $C$ and solve an augmented system $A ρ+ C η= b_0 + e$, where $ e$ is the noise. We show that the $l_1$-norm minimal solution of the augmented system has zero false discovery rate for any level of noise and with probability that tends to one as the dimension of $ b_0$ increases to infinity. We also obtain exact support recovery if the noise is not too large, and develop a Fast Noise Collector Algorithm which makes the computational cost of solving the augmented system comparable to that of the original one. Finally, we demonstrate the effectiveness of the method in applications to passive array imaging. △ Less

Submitted 5 August, 2019; originally announced August 2019.
arXiv:1908.01479 [pdf, other]

eess.IV cs.LG math.NA physics.comp-ph

doi 10.1088/1361-6420/ab5a21

Imaging with highly incomplete and corrupted data

Authors: Miguel Moscoso, Alexei Novikov, George Papanicolaou, Chrysoula Tsogka

Abstract: We consider the problem of imaging sparse scenes from a few noisy data using an $l_1$-minimization approach. This problem can be cast as a linear system of the form $A \, ρ=b$, where $A$ is an $N\times K$ measurement matrix. We assume that the dimension of the unknown sparse vector $ρ\in {\mathbb{C}}^K$ is much larger than the dimension of the data vector $b \in {\mathbb{C}}^N$, i.e, $K \gg N$. We… ▽ More We consider the problem of imaging sparse scenes from a few noisy data using an $l_1$-minimization approach. This problem can be cast as a linear system of the form $A \, ρ=b$, where $A$ is an $N\times K$ measurement matrix. We assume that the dimension of the unknown sparse vector $ρ\in {\mathbb{C}}^K$ is much larger than the dimension of the data vector $b \in {\mathbb{C}}^N$, i.e, $K \gg N$. We provide a theoretical framework that allows us to examine under what conditions the $\ell_1$-minimization problem admits a solution that is close to the exact one in the presence of noise. Our analysis shows that $l_1$-minimization is not robust for imaging with noisy data when high resolution is required. To improve the performance of $l_1$-minimization we propose to solve instead the augmented linear system $ [A \, | \, C] ρ=b$, where the $N \times Σ$ matrix $C$ is a noise collector. It is constructed so as its column vectors provide a frame on which the noise of the data, a vector of dimension $N$, can be well approximated. Theoretically, the dimension $Σ$ of the noise collector should be $e^N$ which would make its use not practical. However, our numerical results illustrate that robust results in the presence of noise can be obtained with a large enough number of columns $Σ\approx 10 K$. △ Less

Submitted 5 August, 2019; originally announced August 2019.
arXiv:1905.05308 [pdf, other]

physics.comp-ph cs.CE math.NA

Synthetic aperture imaging with intensity-only data

Authors: Miguel Moscoso, Alexei Novikov, George Papanicolaou, Chrysoula Tsogka

Abstract: We consider imaging the reflectivity of scatterers from intensity-only data recorded by a single moving transducer that both emits and receives signals, forming a synthetic aperture. By exploiting frequency illumination diversity, we obtain multiple intensity measurements at each location, from which we determine field cross-correlations using an appropriate phase controlled illumination strategy… ▽ More We consider imaging the reflectivity of scatterers from intensity-only data recorded by a single moving transducer that both emits and receives signals, forming a synthetic aperture. By exploiting frequency illumination diversity, we obtain multiple intensity measurements at each location, from which we determine field cross-correlations using an appropriate phase controlled illumination strategy and the inner product polarization identity. The field cross-correlations obtained this way do not, however, provide all the missing phase information because they are determined up to a phase that depends on the receiver's location. The main result of this paper is an algorithm with which we recover the field cross-correlations up to a single phase that is common to all the data measured over the synthetic aperture, so all the data are synchronized. Thus, we can image coherently with data over all frequencies and measurement locations as if full phase information was recorded. △ Less

Submitted 13 May, 2019; originally announced May 2019.
arXiv:1807.02437 [pdf, other]

cs.CV

doi 10.1109/TMI.2018.2881678

Deep Sequential Segmentation of Organs in Volumetric Medical Scans

Authors: Alexey Novikov, David Major, Maria Wimmer, Dimitrios Lenis, Katja Bühler

Abstract: Segmentation in 3D scans is playing an increasingly important role in current clinical practice supporting diagnosis, tissue quantification, or treatment planning. The current 3D approaches based on convolutional neural networks usually suffer from at least three main issues caused predominantly by implementation constraints - first, they require resizing the volume to the lower-resolutional refer… ▽ More Segmentation in 3D scans is playing an increasingly important role in current clinical practice supporting diagnosis, tissue quantification, or treatment planning. The current 3D approaches based on convolutional neural networks usually suffer from at least three main issues caused predominantly by implementation constraints - first, they require resizing the volume to the lower-resolutional reference dimensions, second, the capacity of such approaches is very limited due to memory restrictions, and third, all slices of volumes have to be available at any given training or testing time. We address these problems by a U-Net-like architecture consisting of bidirectional convolutional LSTM and convolutional, pooling, upsampling and concatenation layers enclosed into time-distributed wrappers. Our network can either process the full volumes in a sequential manner, or segment slabs of slices on demand. We demonstrate performance of our architecture on vertebrae and liver segmentation tasks in 3D CT scans. △ Less

Submitted 11 March, 2019; v1 submitted 6 July, 2018; originally announced July 2018.

Journal ref: Published in IEEE Transactions on Medical Imaging on 16 November 2018, URL: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8537944&isnumber=4359023
arXiv:1801.01928 [pdf, ps, other]

cs.MS math.NA

Tensor Train decomposition on TensorFlow (T3F)

Authors: Alexander Novikov, Pavel Izmailov, Valentin Khrulkov, Michael Figurnov, Ivan Oseledets

Abstract: Tensor Train decomposition is used across many branches of machine learning. We present T3F -- a library for Tensor Train decomposition based on TensorFlow. T3F supports GPU execution, batch processing, automatic differentiation, and versatile functionality for the Riemannian optimization framework, which takes into account the underlying manifold structure to construct efficient optimization meth… ▽ More Tensor Train decomposition is used across many branches of machine learning. We present T3F -- a library for Tensor Train decomposition based on TensorFlow. T3F supports GPU execution, batch processing, automatic differentiation, and versatile functionality for the Riemannian optimization framework, which takes into account the underlying manifold structure to construct efficient optimization methods. The library makes it easier to implement machine learning papers that rely on the Tensor Train decomposition. T3F includes documentation, examples and 94% test coverage. △ Less

Submitted 2 March, 2020; v1 submitted 5 January, 2018; originally announced January 2018.
arXiv:1711.00811 [pdf, other]

cs.LG

Expressive power of recurrent neural networks

Authors: Valentin Khrulkov, Alexander Novikov, Ivan Oseledets

Abstract: Deep neural networks are surprisingly efficient at solving practical tasks, but the theory behind this phenomenon is only starting to catch up with the practice. Numerous works show that depth is the key to this efficiency. A certain class of deep convolutional networks -- namely those that correspond to the Hierarchical Tucker (HT) tensor decomposition -- has been proven to have exponentially hig… ▽ More Deep neural networks are surprisingly efficient at solving practical tasks, but the theory behind this phenomenon is only starting to catch up with the practice. Numerous works show that depth is the key to this efficiency. A certain class of deep convolutional networks -- namely those that correspond to the Hierarchical Tucker (HT) tensor decomposition -- has been proven to have exponentially higher expressive power than shallow networks. I.e. a shallow network of exponential width is required to realize the same score function as computed by the deep architecture. In this paper, we prove the expressive power theorem (an exponential lower bound on the width of the equivalent shallow network) for a class of recurrent neural networks -- ones that correspond to the Tensor Train (TT) decomposition. This means that even processing an image patch by patch with an RNN can be exponentially more efficient than a (shallow) convolutional network with one hidden layer. Using theoretical results on the relation between the tensor decompositions we compare expressive powers of the HT- and TT-Networks. We also implement the recurrent TT-Networks and provide numerical evidence of their expressivity. △ Less

Submitted 7 February, 2018; v1 submitted 2 November, 2017; originally announced November 2017.

Comments: Accepted as a conference paper at ICLR 2018
arXiv:1710.07324 [pdf, other]

cs.LG stat.ML

Scalable Gaussian Processes with Billions of Inducing Inputs via Tensor Train Decomposition

Authors: Pavel Izmailov, Alexander Novikov, Dmitry Kropotov

Abstract: We propose a method (TT-GP) for approximate inference in Gaussian Process (GP) models. We build on previous scalable GP research including stochastic variational inference based on inducing inputs, kernel interpolation, and structure exploiting algebra. The key idea of our method is to use Tensor Train decomposition for variational parameters, which allows us to train GPs with billions of inducing… ▽ More We propose a method (TT-GP) for approximate inference in Gaussian Process (GP) models. We build on previous scalable GP research including stochastic variational inference based on inducing inputs, kernel interpolation, and structure exploiting algebra. The key idea of our method is to use Tensor Train decomposition for variational parameters, which allows us to train GPs with billions of inducing inputs and achieve state-of-the-art results on several benchmarks. Further, our approach allows for training kernels based on deep neural networks without any modifications to the underlying GP model. A neural network learns a multidimensional embedding for the data, which is used by the GP to make the final prediction. We train GP and neural network parameters end-to-end without pretraining, through maximization of GP marginal likelihood. We show the efficiency of the proposed approach on several regression and classification benchmark datasets including MNIST, CIFAR-10, and Airline. △ Less

Submitted 17 January, 2018; v1 submitted 19 October, 2017; originally announced October 2017.
arXiv:1701.08816 [pdf, other]

cs.CV cs.LG

Fully Convolutional Architectures for Multi-Class Segmentation in Chest Radiographs

Authors: Alexey A. Novikov, Dimitrios Lenis, David Major, Jiri Hladůvka, Maria Wimmer, Katja Bühler

Abstract: The success of deep convolutional neural networks on image classification and recognition tasks has led to new applications in very diversified contexts, including the field of medical imaging. In this paper we investigate and propose neural network architectures for automated multi-class segmentation of anatomical organs in chest radiographs, namely for lungs, clavicles and heart. We address seve… ▽ More The success of deep convolutional neural networks on image classification and recognition tasks has led to new applications in very diversified contexts, including the field of medical imaging. In this paper we investigate and propose neural network architectures for automated multi-class segmentation of anatomical organs in chest radiographs, namely for lungs, clavicles and heart. We address several open challenges including model overfitting, reducing number of parameters and handling of severely imbalanced data in CXR by fusing recent concepts in convolutional networks and adapting them to the segmentation problem task in CXR. We demonstrate that our architecture combining delayed subsampling, exponential linear units, highly restrictive regularization and a large number of high resolution low level abstract features outperforms state-of-the-art methods on all considered organs, as well as the human observer on lungs and heart. The models use a multi-class configuration with three target classes and are trained and tested on the publicly available JSRT database, consisting of 247 X-ray images the ground-truth masks for which are available in the SCR database. Our best performing model, trained with the loss function based on the Dice coefficient, reached mean Jaccard overlap scores of 95.0\% for lungs, 86.8\% for clavicles and 88.2\% for heart. This architecture outperformed the human observer results for lungs and heart. △ Less

Submitted 13 February, 2018; v1 submitted 30 January, 2017; originally announced January 2017.

Comments: Final pre-print version accepted for publication in TMI Added new content: * additional evaluations * additional figures * improving the old content
arXiv:1611.03214 [pdf, other]

cs.LG

Ultimate tensorization: compressing convolutional and FC layers alike

Authors: Timur Garipov, Dmitry Podoprikhin, Alexander Novikov, Dmitry Vetrov

Abstract: Convolutional neural networks excel in image recognition tasks, but this comes at the cost of high computational and memory complexity. To tackle this problem, [1] developed a tensor factorization framework to compress fully-connected layers. In this paper, we focus on compressing convolutional layers. We show that while the direct application of the tensor framework [1] to the 4-dimensional kerne… ▽ More Convolutional neural networks excel in image recognition tasks, but this comes at the cost of high computational and memory complexity. To tackle this problem, [1] developed a tensor factorization framework to compress fully-connected layers. In this paper, we focus on compressing convolutional layers. We show that while the direct application of the tensor framework [1] to the 4-dimensional kernel of convolution does compress the layer, we can do better. We reshape the convolutional kernel into a tensor of higher order and factorize it. We combine the proposed approach with the previous work to compress both convolutional and fully-connected layers of a network and achieve 80x network compression rate with 1.1% accuracy drop on the CIFAR-10 dataset. △ Less

Submitted 10 November, 2016; originally announced November 2016.

Comments: NIPS 2016 workshop: Learning with Tensors: Why Now and How?
arXiv:1605.03795 [pdf, other]

stat.ML cs.LG

Exponential Machines

Authors: Alexander Novikov, Mikhail Trofimov, Ivan Oseledets

Abstract: Modeling interactions between features improves the performance of machine learning solutions in many domains (e.g. recommender systems or sentiment analysis). In this paper, we introduce Exponential Machines (ExM), a predictor that models all interactions of every order. The key idea is to represent an exponentially large tensor of parameters in a factorized format called Tensor Train (TT). The T… ▽ More Modeling interactions between features improves the performance of machine learning solutions in many domains (e.g. recommender systems or sentiment analysis). In this paper, we introduce Exponential Machines (ExM), a predictor that models all interactions of every order. The key idea is to represent an exponentially large tensor of parameters in a factorized format called Tensor Train (TT). The Tensor Train format regularizes the model and lets you control the number of underlying parameters. To train the model, we develop a stochastic Riemannian optimization procedure, which allows us to fit tensors with 2^160 entries. We show that the model achieves state-of-the-art performance on synthetic data with high-order interactions and that it works on par with high-order factorization machines on a recommender system dataset MovieLens 100K. △ Less

Submitted 8 December, 2017; v1 submitted 12 May, 2016; originally announced May 2016.

Comments: ICLR-2017 workshop track paper
arXiv:1509.06569 [pdf, ps, other]

cs.LG cs.NE

Tensorizing Neural Networks

Authors: Alexander Novikov, Dmitry Podoprikhin, Anton Osokin, Dmitry Vetrov

Abstract: Deep neural networks currently demonstrate state-of-the-art performance in several domains. At the same time, models of this class are very demanding in terms of computational resources. In particular, a large amount of memory is required by commonly used fully-connected layers, making it hard to use the models on low-end devices and stopping the further increase of the model size. In this paper w… ▽ More Deep neural networks currently demonstrate state-of-the-art performance in several domains. At the same time, models of this class are very demanding in terms of computational resources. In particular, a large amount of memory is required by commonly used fully-connected layers, making it hard to use the models on low-end devices and stopping the further increase of the model size. In this paper we convert the dense weight matrices of the fully-connected layers to the Tensor Train format such that the number of parameters is reduced by a huge factor and at the same time the expressive power of the layer is preserved. In particular, for the Very Deep VGG networks we report the compression factor of the dense weight matrix of a fully-connected layer up to 200000 times leading to the compression factor of the whole network up to 7 times. △ Less

Submitted 20 December, 2015; v1 submitted 22 September, 2015; originally announced September 2015.
arXiv:cs/0205058 [pdf, ps, other]

cs.NI

Content Distribution in Unicast Replica Meshes

Authors: Alexei Novikov

Abstract: We propose centralized algorithm of data distribution in the unicast p2p network. Good example of such networks are meshes of WWW and FTP mirrors. Simulation of data propogation for different network topologies is performed and it is shown that proposed method performs up to 200% better then common apporaches We propose centralized algorithm of data distribution in the unicast p2p network. Good example of such networks are meshes of WWW and FTP mirrors. Simulation of data propogation for different network topologies is performed and it is shown that proposed method performs up to 200% better then common apporaches △ Less

Submitted 21 May, 2002; originally announced May 2002.

Comments: 14 pages, 8 figures, submitted to ACM Transactions on Internet Technology

ACM Class: H.3.3; H.3.4; H.3.5

Search v0.5.6 released 2020-02-24