Showing 1–50 of 73 results for author: Chow, Y

arXiv:2408.01562 [pdf]

cs.CY

Welfare, sustainability, and equity evaluation of the New York City Interborough Express using spatially heterogeneous mode choice models

Authors: Hai Yang, Hongying Wu, Lauren Whang, Xiyuan Ren, Joseph Y. J. Chow

Abstract: The Metropolitan Transit Authority (MTA) proposed building a new light rail route called the Interborough Express (IBX) to provide a direct, fast transit linkage between Queens and Brooklyn. An open-access synthetic citywide trip agenda dataset and a block-group-level mode choice model are used to assess the potential impact IBX could bring to New York City (NYC). IBX could save 28.1 minutes to potential riders across the city. For travelers either going to or departing from areas close to IBX, the average time saving is projected to be 29.7 minutes. IBX is projected to have more than 254 thousand daily ridership after its completion (69% higher than reported in the official IBX proposal). Among those riders, more than 78 thousand people (30.8%) would come from low-income households while 165 thousand people (64.7%) would start or end along the IBX corridor. The addition of IBX would attract more than 50 thousand additional daily trips to transit mode, among which more than 16 thousand would be switched from using private vehicles, reducing potential greenhouse gas (GHG) emissions by 29.28 metric tons per day. IBX can also bring significant consumer surplus benefits to the communities, which are estimated to be $1.25 USD per trip, or as high as $1.64 per trip made by a low-income traveler. While benefits are proportionately higher for lower-income users, the service does not appear to significantly reduce the proportion of travelers whose consumer surpluses fall below 10% of the population average (already quite low).

Submitted 2 August, 2024; originally announced August 2024.
arXiv:2406.00024 [pdf, other]

cs.CL cs.AI cs.ET cs.LG

Embedding-Aligned Language Models

Authors: Guy Tennenholtz, Yinlam Chow, Chih-Wei Hsu, Lior Shani, Ethan Liang, Craig Boutilier

Abstract: We propose a novel approach for training large language models (LLMs) to adhere to objectives defined within a latent embedding space. Our method leverages reinforcement learning (RL), treating a pre-trained LLM as an environment. Our embedding-aligned guided language (EAGLE) agent is trained to iteratively steer the LLM's generation towards optimal regions of the latent embedding space, w.r.t. some predefined criterion. We demonstrate the effectiveness of the EAGLE agent using the MovieLens 25M dataset to surface content gaps that satisfy latent user demand. We also demonstrate the benefit of using an optimal design of a state-dependent action set to improve EAGLE's efficiency. Our work paves the way for controlled and grounded text generation using LLMs, ensuring consistency with domain-specific knowledge and data representations.

Submitted 24 May, 2024; originally announced June 2024.
arXiv:2404.05053 [pdf, ps, other]

math.CO cs.GT

Cooking Poisons: Thinking Laterally with Game Theory

Authors: Timothy Y. Chow

Abstract: We revive an old lateral-thinking puzzle by Michael Rabin, involving poisons with strange properties. We show that the puzzle admits several unintended solutions that are just as interesting as the intended solution. Analyzing these alternative solutions using game theory yields surprisingly subtle results and several unanswered questions.

Submitted 7 April, 2024; originally announced April 2024.

Comments: 7 pages, to be published in Mathematics Magazine

MSC Class: 91A05
arXiv:2402.15957 [pdf, other]

cs.LG

DynaMITE-RL: A Dynamic Model for Improved Temporal Meta-Reinforcement Learning

Authors: Anthony Liang, Guy Tennenholtz, Chih-wei Hsu, Yinlam Chow, Erdem Bıyık, Craig Boutilier

Abstract: We introduce DynaMITE-RL, a meta-reinforcement learning (meta-RL) approach to approximate inference in environments where the latent state evolves at varying rates. We model episode sessions - parts of the episode where the latent state is fixed - and propose three key modifications to existing meta-RL methods: consistency of latent information within sessions, session masking, and prior latent conditioning. We demonstrate the importance of these modifications in various domains, ranging from discrete Gridworld environments to continuous-control and simulated robot assistive tasks, demonstrating that DynaMITE-RL significantly outperforms state-of-the-art baselines in sample efficiency and inference returns.

Submitted 24 February, 2024; originally announced February 2024.
arXiv:2402.14925 [pdf, other]

cs.IT cs.LG math.ST

Efficient Unbiased Sparsification

Authors: Leighton Barnes, Stephen Cameron, Timothy Chow, Emma Cohen, Keith Frankston, Benjamin Howard, Fred Kochman, Daniel Scheinerman, Jeffrey VanderKam

Abstract: An unbiased $m$-sparsification of a vector $p\in \mathbb{R}^n$ is a random vector $Q\in \mathbb{R}^n$ with mean $p$ that has at most $m<n$ nonzero coordinates. Unbiased sparsification compresses the original vector without introducing bias; it arises in various contexts, such as in federated learning and sampling sparse probability distributions. Ideally, unbiased sparsification should also minimize the expected value of a divergence function $\mathsf{Div}(Q,p)$ that measures how far away $Q$ is from the original $p$. If $Q$ is optimal in this sense, then we call it efficient. Our main results describe efficient unbiased sparsifications for divergences that are either permutation-invariant or additively separable. Surprisingly, the characterization for permutation-invariant divergences is robust to the choice of divergence function, in the sense that our class of optimal $Q$ for squared Euclidean distance coincides with our class of optimal $Q$ for Kullback-Leibler divergence, or indeed any of a wide variety of divergences.

Submitted 24 July, 2024; v1 submitted 22 February, 2024; originally announced February 2024.
arXiv:2401.06619 [pdf, other]

cs.SE

doi 10.1145/3597503.3639184

PyTy: Repairing Static Type Errors in Python

Authors: Yiu Wai Chow, Luca Di Grazia, Michael Pradel

Abstract: Gradual typing enables developers to annotate types of their own choosing, offering a flexible middle ground between no type annotations and a fully statically typed language. As more and more code bases get type-annotated, static type checkers detect an increasingly large number of type errors. Unfortunately, fixing these errors requires manual effort, hampering the adoption of gradual typing in practice. This paper presents PyTy, an automated program repair approach targeted at statically detectable type errors in Python. The problem of repairing type errors deserves specific attention because it exposes particular repair patterns, offers a warning message with hints about where and how to apply a fix, and because gradual type checking serves as an automatic way to validate fixes. We address this problem through three contributions: (i) an empirical study that investigates how developers fix Python type errors, showing a diverse set of fixing strategies with some recurring patterns; (ii) an approach to automatically extract type error fixes, which enables us to create a dataset of 2,766 error-fix pairs from 176 GitHub repositories, named PyTyDefects; (iii) the first learning-based repair technique for fixing type errors in Python. Motivated by the relative data scarcity of the problem, the neural model at the core of PyTy is trained via cross-lingual transfer learning. Our evaluation shows that PyTy offers fixes for ten frequent categories of type errors, successfully addressing 85.4% of 281 real-world errors. This effectiveness outperforms state-of-the-art large language models asked to repair type errors (by 2.1x) and complements a previous technique aimed at type errors that manifest at runtime. Finally, 20 out of 30 pull requests with PyTy-suggested fixes have been merged by developers, showing the usefulness of PyTy in practice.

Submitted 12 January, 2024; originally announced January 2024.

Journal ref: ICSE 2024
arXiv:2311.02085 [pdf, other]

cs.IR cs.AI

Preference Elicitation with Soft Attributes in Interactive Recommendation

Authors: Erdem Biyik, Fan Yao, Yinlam Chow, Alex Haig, Chih-wei Hsu, Mohammad Ghavamzadeh, Craig Boutilier

Abstract: Preference elicitation plays a central role in interactive recommender systems. Most preference elicitation approaches use either item queries that ask users to select preferred items from a slate, or attribute queries that ask them to express their preferences for item characteristics. Unfortunately, users often wish to describe their preferences using soft attributes for which no ground-truth semantics is given. Leveraging concept activation vectors for soft attribute semantics, we develop novel preference elicitation methods that can accommodate soft attributes and bring together both item and attribute-based preference elicitation. Our techniques query users using both items and soft attributes to update the recommender system's belief about their preferences to improve recommendation quality. We demonstrate the effectiveness of our methods vis-a-vis competing approaches on both synthetic and real-world datasets.

Submitted 22 October, 2023; originally announced November 2023.
arXiv:2310.17475 [pdf]

cs.CY

Analytical model for large-scale design of sidewalk delivery robot systems

Authors: Hai Yang, Yuchen Du, Tho V. Le, Joseph Y. J. Chow

Abstract: With the rise in demand for local deliveries and e-commerce, robotic deliveries are being considered as efficient and sustainable solutions. However, the deployment of such systems can be highly complex due to numerous factors involving stochastic demand, stochastic charging and maintenance needs, complex routing, etc. We propose a model that uses continuous approximation methods for evaluating service trade-offs that consider the unique characteristics of large-scale sidewalk delivery robot systems used to serve online food deliveries. The model captures both the initial cost and the operation cost of the delivery system and evaluates the impact of constraints and operation strategies on the deployment. By minimizing the system cost, variables related to the system design can be determined. First, the minimization problem is formulated based on a homogeneous area, and the optimal system cost can be derived as a closed-form expression. By evaluating the expression, relationships between variables and the system cost can be directly obtained. We then apply the model in neighborhoods in New York City to evaluate the cost of deploying the sidewalk delivery robot system in a real-world scenario. The results shed light on the potential of deploying such a system in the future.

Submitted 26 October, 2023; originally announced October 2023.
arXiv:2310.06176 [pdf, other]

cs.AI

Factual and Personalized Recommendations using Language Models and Reinforcement Learning

Authors: Jihwan Jeong, Yinlam Chow, Guy Tennenholtz, Chih-Wei Hsu, Azamat Tulepbergenov, Mohammad Ghavamzadeh, Craig Boutilier

Abstract: Recommender systems (RSs) play a central role in connecting users to content, products, and services, matching candidate items to users based on their preferences. While traditional RSs rely on implicit user feedback signals, conversational RSs interact with users in natural language. In this work, we develop a comPelling, Precise, Personalized, Preference-relevant language model (P4LM) that recommends items to users while putting emphasis on explaining item characteristics and their relevance. P4LM uses the embedding space representation of a user's preferences to generate compelling responses that are factually-grounded and relevant w.r.t. the user's preferences. Moreover, we develop a joint reward function that measures precision, appeal, and personalization, which we use as AI-based feedback in a reinforcement learning-based language model framework. Using the MovieLens 25M dataset, we demonstrate that P4LM delivers compelling, personalized movie narratives to users.

Submitted 9 October, 2023; originally announced October 2023.
arXiv:2310.04475 [pdf, other]

cs.CL cs.AI cs.LG

Demystifying Embedding Spaces using Large Language Models

Authors: Guy Tennenholtz, Yinlam Chow, Chih-Wei Hsu, Jihwan Jeong, Lior Shani, Azamat Tulepbergenov, Deepak Ramachandran, Martin Mladenov, Craig Boutilier

Abstract: Embeddings have become a pivotal means to represent complex, multi-faceted information about entities, concepts, and relationships in a condensed and useful format. Nevertheless, they often preclude direct interpretation. While downstream tasks make use of these compressed representations, meaningful interpretation usually requires visualization using dimensionality reduction or specialized machine learning interpretability methods. This paper addresses the challenge of making such embeddings more interpretable and broadly useful, by employing Large Language Models (LLMs) to directly interact with embeddings -- transforming abstract vectors into understandable narratives. By injecting embeddings into LLMs, we enable querying and exploration of complex embedding data. We demonstrate our approach on a variety of diverse tasks, including: enhancing concept activation vectors (CAVs), communicating novel embedded entities, and decoding user preferences in recommender systems. Our work couples the immense information potential of embeddings with the interpretative power of LLMs.

Submitted 13 March, 2024; v1 submitted 6 October, 2023; originally announced October 2023.

Comments: Accepted to ICLR 2024
arXiv:2305.09452 [pdf]

cs.AI cs.CY

A sequential transit network design algorithm with optimal learning under correlated beliefs

Authors: Gyugeun Yoon, Joseph Y. J. Chow

Abstract: Mobility service route design requires demand information to operate in a service region. Transit planners and operators can access various data sources including household travel survey data and mobile device location logs. However, when implementing a mobility system with emerging technologies, estimating demand becomes harder because of limited data resulting in uncertainty. This study proposes an artificial intelligence-driven algorithm that combines sequential transit network design with optimal learning to address the operation under limited data. An operator gradually expands its route system to avoid risks from inconsistency between designed routes and actual travel demand. At the same time, observed information is archived to update the knowledge that the operator currently uses. Three learning policies are compared within the algorithm: multi-armed bandit, knowledge gradient, and knowledge gradient with correlated beliefs. For validation, a new route system is designed on an artificial network based on public use microdata areas in New York City. Prior knowledge is reproduced from the regional household travel survey data. The results suggest that exploration considering correlations can achieve better performance compared to greedy choices in general. In future work, the problem may incorporate more complexities such as demand elasticity to travel time, no limitations to the number of transfers, and costs for expansion.

Submitted 26 January, 2024; v1 submitted 16 May, 2023; originally announced May 2023.
arXiv:2305.04324 [pdf]

cs.CY eess.SY

A generalized network level disruption strategy selection model for urban public transport systems

Authors: Qi Liu, Joseph Y. J. Chow

Abstract: A fast recovery from disruptions is of vital importance for the reliability of transit systems. This study presents a new attempt to tackle the transit disruption mitigation problem in a comprehensive and hierarchical way. A network level strategy selection optimization model is formulated as a joint routing and resource allocation (nJRRA) problem. By constraining the problem further into an epsilon-constrained nJRRA problem, classic solution algorithms can be applied to solve the quadratically constrained quadratic program (QCQP). On top of this "basic model", we propose adding a decision to delay the resource allocation decisions up to a maximum initiation time when the incident duration is stochastic. To test the models, a quasi-dynamic evaluation program with a given incident duration distribution is constructed using discretized time steps and discrete distributions. Five different demand patterns and four different disruption duration distributions (20 combinations) are tested on a toy transit network. The results show that the two models outperform benchmark strategies such as using only line level adjustment or only bus bridging. They also highlight conditions when delaying the decision is preferred.

Submitted 7 May, 2023; originally announced May 2023.
arXiv:2305.00818 [pdf]

cs.GT cs.CY

On-demand Mobility-as-a-Service platform assignment games with guaranteed stable outcomes

Authors: Bingqing Liu, Joseph Y. J. Chow

Abstract: Mobility-as-a-Service (MaaS) systems are two-sided markets, with two mutually exclusive sets of agents, i.e., travelers/users and operators, forming a mobility ecosystem in which multiple operators compete or cooperate to serve customers under a governing platform provider. This study proposes a MaaS platform equilibrium model based on many-to-many assignment games incorporating both fixed-route transit services and mobility-on-demand (MOD) services. The matching problem is formulated as a convex multicommodity flow network design problem under congestion that captures the cost of accessing MOD services. The local stability conditions reflect a generalization of Wardrop's principles that include operators' decisions. Due to the presence of congestion, the problem may result in non-stable designs, and a subsidy mechanism from the platform is proposed to guarantee local stability. A new exact solution algorithm to the matching problem is proposed based on a branch and bound framework with a Frank-Wolfe algorithm integrated with Lagrangian relaxation and subgradient optimization, which guarantees the optimality of the matching problem but not stability. A heuristic which integrates stability conditions and subsidy design is proposed, which reaches either an optimal MaaS platform equilibrium solution with global stability, or a feasible locally stable solution that may require subsidy. For the heuristic, a worst-case bound and condition for obtaining an exact solution are both identified. An expanded Sioux Falls network test with 82 nodes and 748 links derives generalizable insights about the model for coopetitive interdependencies between operators sharing the platform, handling congestion effects in MOD services, effects of local stability on investment impacts, and illustrating inequities that may arise under heterogeneous populations.

Submitted 21 June, 2024; v1 submitted 1 May, 2023; originally announced May 2023.
arXiv:2303.05126 [pdf, other]

eess.IV cs.CV

Hybrid Dual Mean-Teacher Network With Double-Uncertainty Guidance for Semi-Supervised Segmentation of MRI Scans

Authors: Jiayi Zhu, Bart Bolsterlee, Brian V. Y. Chow, Yang Song, Erik Meijering

Abstract: Semi-supervised learning has made significant progress in medical image segmentation. However, existing methods primarily utilize information acquired from a single dimensionality (2D/3D), resulting in sub-optimal performance on challenging data, such as magnetic resonance imaging (MRI) scans with multiple objects and highly anisotropic resolution. To address this issue, we present a Hybrid Dual Mean-Teacher (HD-Teacher) model with hybrid, semi-supervised, and multi-task learning to achieve highly effective semi-supervised segmentation. HD-Teacher employs a 2D and a 3D mean-teacher network to produce segmentation labels and signed distance fields from the hybrid information captured in both dimensionalities. This hybrid learning mechanism allows HD-Teacher to combine the 'best of both worlds', utilizing features extracted from either 2D, 3D, or both dimensions to produce outputs as it sees fit. Outputs from 2D and 3D teacher models are also dynamically combined, based on their individual uncertainty scores, into a single hybrid prediction, where the hybrid uncertainty is estimated. We then propose a hybrid regularization module to encourage both student models to produce results close to the uncertainty-weighted hybrid prediction. The hybrid uncertainty suppresses unreliable knowledge in the hybrid prediction, leaving only useful information to improve network performance further. Extensive experiments of binary and multi-class segmentation conducted on three MRI datasets demonstrate the effectiveness of the proposed framework. Code is available at https://github.com/ThisGame42/Hybrid-Teacher.

Submitted 9 March, 2023; originally announced March 2023.
arXiv:2302.10850 [pdf, other]

cs.LG cs.AI cs.CL

Offline Reinforcement Learning for Mixture-of-Expert Dialogue Management

Authors: Dhawal Gupta, Yinlam Chow, Aza Tulepbergenov, Mohammad Ghavamzadeh, Craig Boutilier

Abstract: Reinforcement learning (RL) has shown great promise for developing dialogue management (DM) agents that are non-myopic, conduct rich conversations, and maximize overall user satisfaction. Despite recent developments in RL and language models (LMs), using RL to power conversational chatbots remains challenging, in part because RL requires online exploration to learn effectively, whereas collecting novel human-bot interactions can be expensive and unsafe. This issue is exacerbated by the combinatorial action spaces facing these algorithms, as most LM agents generate responses at the word level. We develop a variety of RL algorithms, specialized to dialogue planning, that leverage recent Mixture-of-Expert Language Models (MoE-LMs) -- models that capture diverse semantics, generate utterances reflecting different intents, and are amenable for multi-turn DM. By exploiting MoE-LM structure, our methods significantly reduce the size of the action space and improve the efficacy of RL-based DM. We evaluate our methods in open-domain dialogue to demonstrate their effectiveness w.r.t. the diversity of intent in generated utterances and overall DM performance.

Submitted 29 October, 2023; v1 submitted 21 February, 2023; originally announced February 2023.

Comments: Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS 2023)
arXiv:2301.10545 [pdf, other]

cs.SE cs.CR cs.PL

Beware of the Unexpected: Bimodal Taint Analysis

Authors: Yiu Wai Chow, Max Schäfer, Michael Pradel

Abstract: Static analysis is a powerful tool for detecting security vulnerabilities and other programming problems. Global taint tracking, in particular, can spot vulnerabilities arising from complicated data flow across multiple functions. However, precisely identifying which flows are problematic is challenging, and sometimes depends on factors beyond the reach of pure program analysis, such as conventions and informal knowledge. For example, learning that a parameter "name" of an API function "locale" ends up in a file path is surprising and potentially problematic. In contrast, it would be completely unsurprising to find that a parameter "command" passed to an API function "execaCommand" is eventually interpreted as part of an operating-system command. This paper presents Fluffy, a bimodal taint analysis that combines static analysis, which reasons about data flow, with machine learning, which probabilistically determines which flows are potentially problematic. The key idea is to let machine learning models predict from natural language information involved in a taint flow, such as API names, whether the flow is expected or unexpected, and to inform developers only about the latter. We present a general framework and instantiate it with four learned models, which offer different trade-offs between the need to annotate training data and the accuracy of predictions. We implement Fluffy on top of the CodeQL analysis framework and apply it to 250K JavaScript projects. Evaluating on five common vulnerability types, we find that Fluffy achieves an F1 score of 0.85 or more on four of them across a variety of datasets.

Submitted 25 January, 2023; originally announced January 2023.

Journal ref: International Symposium on Software Testing and Analysis (ISSTA), 2023
arXiv:2212.14800 [pdf, other]

cs.LG cs.AI

A deep real options policy for sequential service region design and timing

Authors: Srushti Rath, Joseph Y. J. Chow

Abstract: As various city agencies and mobility operators navigate toward innovative mobility solutions, there is a need for strategic flexibility in well-timed investment decisions in the design and timing of mobility service regions, i.e. cast as "real options" (RO). This problem becomes increasingly challenging with multiple interacting RO in such investments. We propose a scalable machine learning based RO framework for multi-period sequential service region design & timing problem for mobility-on-demand services, framed as a Markov decision process with non-stationary stochastic variables. A value function approximation policy from literature uses multi-option least squares Monte Carlo simulation to get a policy value for a set of interdependent investment decisions as deferral options (CR policy). The goal is to determine the optimal selection and timing of a set of zones to include in a service region. However, prior work required explicit enumeration of all possible sequences of investments. To address the combinatorial complexity of such enumeration, we propose a new variant "deep" RO policy using an efficient recurrent neural network (RNN) based ML method (CR-RNN policy) to sample sequences to forego the need for enumeration, making network design & timing policy tractable for large scale implementation. Experiments on multiple service region scenarios in New York City (NYC) show the proposed policy substantially reduces the overall computational cost (time reduction for RO evaluation of > 90% of total investment sequences is achieved), with zero to near-zero gap compared to the benchmark. A case study of sequential service region design for expansion of MoD services in Brooklyn, NYC shows that using the CR-RNN policy to determine optimal RO investment strategy yields a similar performance (0.5% within CR policy value) with significantly reduced computation time (about 5.4 times faster).

Submitted 30 December, 2022; originally announced December 2022.
arXiv:2212.00289 [pdf]

cs.CY

Dial-a-ride problem with modular platooning and en-route transfers

Authors: Zhexi Fu, Joseph Y. J. Chow

Abstract: Modular vehicles (MV) possess the ability to physically connect/disconnect with each other and travel in platoon with less energy consumption. A fleet of demand-responsive transit vehicles with such technology can serve passengers door to door or have vehicles deviate to platoon with each other to travel at lower cost and allow for en-route passenger transfers before splitting. A mixed integer linear programming (MILP) model is formulated to solve this "modular dial-a-ride problem" (MDARP). A heuristic algorithm based on Steiner-tree-inspired large neighborhood search is developed to solve the MDARP for practical scenarios. A set of small-scale synthetic numerical experiments are tested to evaluate the optimality gap and computation time between exact solutions of the MDARP using commercial software and the proposed heuristic. Large-scale experiments are conducted on the Anaheim network with 378 candidate join/split nodes to further explore the potentials and identify the ideal operation scenarios of MVs. The results show that MV technology can save up to 52.0% in vehicle travel cost, 35.6% in passenger service time, and 29.4% in total cost against existing on-demand mobility services in the scenarios tested. Results suggest that MVs best benefit from platooning by serving "enclave pairs" as a hub-and-spoke service.

Submitted 23 December, 2022; v1 submitted 1 December, 2022; originally announced December 2022.
arXiv:2208.02294 [pdf, other]

cs.CL cs.LG

Dynamic Planning in Open-Ended Dialogue using Reinforcement Learning

Authors: Deborah Cohen, Moonkyung Ryu, Yinlam Chow, Orgad Keller, Ido Greenberg, Avinatan Hassidim, Michael Fink, Yossi Matias, Idan Szpektor, Craig Boutilier, Gal Elidan

Abstract: Despite recent advances in natural language understanding and generation, and decades of research on the development of conversational bots, building automated agents that can carry on rich open-ended conversations with humans "in the wild" remains a formidable challenge. In this work we develop a real-time, open-ended dialogue system that uses reinforcement learning (RL) to power a bot's conversational skill at scale. Our work pairs the succinct embedding of the conversation state generated using SOTA (supervised) language models with RL techniques that are particularly suited to a dynamic action space that changes as the conversation progresses. Trained using crowd-sourced data, our novel system is able to substantially exceed the (strong) baseline supervised model with respect to several metrics of interest in a live experiment with real users of the Google Assistant.

Submitted 25 July, 2022; originally announced August 2022.
arXiv:2206.13441 [pdf, other]

cs.AI eess.SY

doi 10.1016/j.trc.2022.103955

EMVLight: a Multi-agent Reinforcement Learning Framework for an Emergency Vehicle Decentralized Routing and Traffic Signal Control System

Authors: Haoran Su, Yaofeng D. Zhong, Joseph Y. J. Chow, Biswadip Dey, Li Jin

Abstract: Emergency vehicles (EMVs) play a crucial role in responding to time-critical calls such as medical emergencies and fire outbreaks in urban areas. Existing methods for EMV dispatch typically optimize routes based on historical traffic-flow data and design traffic signal pre-emption accordingly; however, we still lack a systematic methodology to address the coupling between EMV routing and traffic signal control. In this paper, we propose EMVLight, a decentralized reinforcement learning (RL) framework for joint dynamic EMV routing and traffic signal pre-emption. We adopt the multi-agent advantage actor-critic method with policy sharing and spatial discounted factor. This framework addresses the coupling between EMV navigation and traffic signal control via an innovative design of multi-class RL agents and a novel pressure-based reward function. The proposed methodology enables EMVLight to learn network-level cooperative traffic signal phasing strategies that not only reduce EMV travel time but also shorten the travel time of non-EMVs. Simulation-based experiments indicate that EMVLight enables up to a $42.6\%$ reduction in EMV travel time as well as a $23.5\%$ shorter average travel time compared with existing approaches.

Submitted 29 June, 2022; v1 submitted 27 June, 2022; originally announced June 2022.

Comments: 19 figures, 10 tables. Manuscript extended on previous work arXiv:2109.05429, arXiv:2111.00278

Journal ref: Transportation Research Part C: Emerging Technologies Volume 146, January 2023, 103955
arXiv:2206.00059 [pdf, other]

cs.CL cs.AI

A Mixture-of-Expert Approach to RL-based Dialogue Management

Authors: Yinlam Chow, Aza Tulepbergenov, Ofir Nachum, MoonKyung Ryu, Mohammad Ghavamzadeh, Craig Boutilier

Abstract: Despite recent advancements in language models (LMs), their application to dialogue management (DM) problems and ability to carry on rich conversations remain a challenge. We use reinforcement learning (RL) to develop a dialogue agent that avoids being short-sighted (outputting generic utterances) and maximizes overall user satisfaction. Most existing RL approaches to DM train the agent at the word level, and thus have to deal with a combinatorially complex action space even for a medium-size vocabulary. As a result, they struggle to produce a successful and engaging dialogue even if they are warm-started with a pre-trained LM. To address this issue, we develop an RL-based DM using a novel mixture of expert language model (MoE-LM) that consists of (i) an LM capable of learning diverse semantics for conversation histories, (ii) a number of specialized LMs (or experts) capable of generating utterances corresponding to a particular attribute or personality, and (iii) an RL-based DM that performs dialogue planning with the utterances generated by the experts. Our MoE approach provides greater flexibility to generate sensible utterances with different intents and allows RL to focus on conversational-level DM. We compare it with SOTA baselines on open-domain dialogues and demonstrate its effectiveness both in terms of the diversity and sensibility of the generated utterances and the overall DM performance.

Submitted 31 May, 2022; originally announced June 2022.
arXiv:2205.05138 [pdf, other]

cs.LG

Efficient Risk-Averse Reinforcement Learning

Authors: Ido Greenberg, Yinlam Chow, Mohammad Ghavamzadeh, Shie Mannor

Abstract: In risk-averse reinforcement learning (RL), the goal is to optimize some risk measure of the returns. A risk measure often focuses on the worst returns out of the agent's experience. As a result, standard methods for risk-averse RL often ignore high-return strategies. We prove that under certain conditions this inevitably leads to a local-optimum barrier, and propose a soft risk mechanism to bypass it. We also devise a novel Cross Entropy module for risk sampling, which (1) preserves risk aversion despite the soft risk; (2) independently improves sample efficiency. By separating the risk aversion of the sampler and the optimizer, we can sample episodes with poor conditions, yet optimize with respect to successful strategies. We combine these two concepts in CeSoR - Cross-entropy Soft-Risk optimization algorithm - which can be applied on top of any risk-averse policy gradient (PG) method. We demonstrate improved risk aversion in maze navigation, autonomous driving, and resource allocation benchmarks, including in scenarios where standard risk-averse PG completely fails.

Submitted 12 October, 2022; v1 submitted 10 May, 2022; originally announced May 2022.

Comments: Accepted to NeurIPS 2022
arXiv:2204.05193 [pdf, other]

cs.CL cs.LG

Worldwide city transport typology prediction with sentence-BERT based supervised learning via Wikipedia

Authors: Srushti Rath, Joseph Y. J. Chow

Abstract: An overwhelming majority of the world's human population lives in urban areas and cities. Understanding a city's transportation typology is immensely valuable for planners and policy makers whose decisions can potentially impact millions of city residents. Despite the value of understanding a city's typology, labeled data (city and its typology) is scarce, and spans at most a few hundred cities in the current transportation literature. To break this barrier, we propose a supervised machine learning approach to predict a city's typology given the information in its Wikipedia page. Our method leverages recent breakthroughs in natural language processing, namely sentence-BERT, and shows how the text-based information from Wikipedia can be effectively used as a data source for city typology prediction tasks that can be applied to over 2000 cities worldwide. We propose a novel method for low-dimensional city representation using a city's Wikipedia page, which makes supervised learning of city typology labels tractable even with a few hundred labeled samples. These features are used with labeled city samples to train binary classifiers (logistic regression) for four different city typologies: (i) congestion, (ii) auto-heavy, (iii) transit-heavy, and (iv) bike-friendly cities, resulting in reasonably high AUC scores of 0.87, 0.86, 0.61 and 0.94 respectively. Our approach provides sufficient flexibility for incorporating additional variables in the city typology models and can be applied to study other city typologies as well. Our findings can assist a diverse group of stakeholders in transportation and urban planning fields, and open up new opportunities for using text-based information from Wikipedia (or similar platforms) as data sources in such fields.

Submitted 28 March, 2022; originally announced April 2022.
arXiv:2202.04849 [pdf, other]

cs.LG

SAFER: Data-Efficient and Safe Reinforcement Learning via Skill Acquisition

Authors: Dylan Slack, Yinlam Chow, Bo Dai, Nevan Wichers

Abstract: Methods that extract policy primitives from offline demonstrations using deep generative models have shown promise at accelerating reinforcement learning (RL) for new tasks. Intuitively, these methods should also help to train safe RL agents because they enforce useful skills. However, we identify that these techniques are not well equipped for safe policy learning because they ignore negative experiences (e.g., unsafe or unsuccessful), focusing only on positive experiences, which harms their ability to generalize to new tasks safely. Rather, we model the latent safety context using principled contrastive training on an offline dataset of demonstrations from many tasks, including both negative and positive experiences. Using this latent variable, our RL framework, SAFEty skill pRiors (SAFER), extracts task-specific safe primitive skills to safely and successfully generalize to new tasks. In the inference stage, policies trained with SAFER learn to compose safe skills into successful policies. We theoretically characterize why SAFER can enforce safe policy learning and demonstrate its effectiveness on several complex safety-critical robotic grasping tasks inspired by the game Operation, in which SAFER outperforms state-of-the-art primitive learning methods in success and safety.

Submitted 30 June, 2022; v1 submitted 10 February, 2022; originally announced February 2022.
arXiv:2202.02830 [pdf, other]

cs.IR cs.AI cs.LG

doi 10.1145/1122445.1122456

Discovering Personalized Semantics for Soft Attributes in Recommender Systems using Concept Activation Vectors

Authors: Christina Göpfert, Alex Haig, Yinlam Chow, Chih-wei Hsu, Ivan Vendrov, Tyler Lu, Deepak Ramachandran, Hubert Pham, Mohammad Ghavamzadeh, Craig Boutilier

Abstract: Interactive recommender systems have emerged as a promising paradigm to overcome the limitations of the primitive user feedback used by traditional recommender systems (e.g., clicks, item consumption, ratings). They allow users to express intent, preferences, constraints, and contexts in a richer fashion, often using natural language (including faceted search and dialogue). Yet more research is needed to find the most effective ways to use this feedback. One challenge is inferring a user's semantic intent from the open-ended terms or attributes often used to describe a desired item, and using it to refine recommendation results. Leveraging concept activation vectors (CAVs) [26], a recently developed approach for model interpretability in machine learning, we develop a framework to learn a representation that captures the semantics of such attributes and connects them to user preferences and behaviors in recommender systems. One novel feature of our approach is its ability to distinguish objective and subjective attributes (both subjectivity of degree and of sense), and associate different senses of subjective attributes with different users. We demonstrate on both synthetic and real-world data sets that our CAV representation not only accurately interprets users' subjective semantics, but can also be used to improve recommendations through interactive item critiquing.

Submitted 2 June, 2023; v1 submitted 6 February, 2022; originally announced February 2022.
arXiv:2109.14138 [pdf]

cs.CY

doi 10.1007/s12205-022-0995-3

A simulation sandbox to compare fixed-route, semi-flexible-transit, and on-demand microtransit system designs

Authors: Gyugeun Yoon, Joseph Y. J. Chow, Srushti Rath

Abstract: With advances in emerging technologies, options for operating public transit services have broadened from conventional fixed-route service through semi-flexible service to on-demand microtransit. Nevertheless, guidelines for deciding between these services remain limited in real implementations. An open-source simulation sandbox is developed that can compare state-of-the-practice methods for evaluating the different types of public transit operations. For the case of the semi-flexible service, the Mobility Allowance Shuttle Transit (MAST) system is extended to include passenger deviations. A case study demonstrates the sandbox by evaluating an existing B63 bus route in Brooklyn, NY and comparing its performance with the four other system designs spanning the three service types for three different demand scenarios.

Submitted 19 January, 2022; v1 submitted 28 September, 2021; originally announced September 2021.

Journal ref: KSCE Journal of Civil Engineering 26, 3043-3062 (2022)
arXiv:2107.08217 [pdf]

cs.CY math.OC

doi 10.1080/21680566.2022.2060370

A congested schedule-based dynamic transit passenger flow estimator using stop count data

Authors: Qi Liu, Joseph Y. J. Chow

Abstract: A dynamic transit flow estimation model based on congested schedule-based transit equilibrium assignment is proposed using observations from stop count data. A solution algorithm is proposed for the mathematical program with schedule-based transit equilibrium constraints (MPEC) with polynomial computational complexity. The equilibrium constraints corresponding to the schedule-based hyperpath flow are modified from the literature to fit into an estimation problem. Computational experiments are conducted first to verify the methodology with two synthetic data sets (one of which is Sioux Falls), followed by a validation of the method using bus data from Qingpu District in Shanghai, China, with 4 bus lines, 120 segments, 55 bus stops, and 120 one-minute intervals. The estimation model converged to 0.005 tolerance of relative change in 10 iterations. The estimated average of segment flows is only 2.5% off from the average of the observed segment flows; relative errors among segments are 42.5%.

Submitted 16 August, 2021; v1 submitted 17 July, 2021; originally announced July 2021.

Journal ref: Transportmetrica B: Transport Dynamics (2022)
arXiv:2102.05851 [pdf]

cs.CY

doi 10.1080/15568318.2022.2029633

An electric vehicle charging station access equilibrium model with M/D/C queueing

Authors: Bingqing Liu, Theodoros P. Pantelidis, Stephanie Tam, Joseph Y. J. Chow

Abstract: Despite the dependency of electric vehicle (EV) fleets on charging station availability, charging infrastructure remains limited in many cities. Three contributions are made. First, we propose an EV-to-charging station user equilibrium (UE) assignment model with an M/D/C queue approximation as a nondifferentiable nonlinear program. Second, to address the non-differentiability of the queue delay function, we propose an original solution algorithm based on the derivative-free Method of Successive Averages. Computational tests with a toy network show that the model converges to a UE. A working code in Python is provided free on GitHub with detailed test cases. Third, the model is applied to the large-scale case study of New York City Department of Citywide Administrative Services (NYC DCAS) fleet and EV charging station configuration as of July 8, 2020, which includes unique, real data for 563 Level 2 chargers and 4 Direct Current Fast Chargers (DCFCs) and 1484 EVs distributed over 512 Traffic Analysis Zones. The arrival rates of the assignment model are calibrated in the base scenario to fit an observed average utilization ratio of 7.6% in NYC. The model is then applied to compare charging station investment policies of DCFCs to Level 2 charging stations based on two alternative criteria. Results suggest a policy based on selecting locations with high utilization ratio instead of with high queue delay.

Submitted 3 September, 2021; v1 submitted 11 February, 2021; originally announced February 2021.

Journal ref: International Journal of Sustainable Transportation (2022)
arXiv:2012.00386 [pdf, other]

cs.LG cs.AI

Non-Stationary Latent Bandits

Authors: Joey Hong, Branislav Kveton, Manzil Zaheer, Yinlam Chow, Amr Ahmed, Mohammad Ghavamzadeh, Craig Boutilier

Abstract: Users of recommender systems often behave in a non-stationary fashion, due to their evolving preferences and tastes over time. In this work, we propose a practical approach for fast personalization to non-stationary users. The key idea is to frame this problem as a latent bandit, where the prototypical models of user behavior are learned offline and the latent state of the user is inferred online from its interactions with the models. We call this problem a non-stationary latent bandit. We propose Thompson sampling algorithms for regret minimization in non-stationary latent bandits, analyze them, and evaluate them on a real-world dataset. The main strength of our approach is that it can be combined with rich offline-learned models, which can be misspecified, and are subsequently fine-tuned online using posterior sampling. In this way, we naturally combine the strengths of offline and online learning.

Submitted 1 December, 2020; originally announced December 2020.

Comments: 15 pages, 4 figures
arXiv:2010.11652 [pdf, other]

cs.LG stat.ML

CoinDICE: Off-Policy Confidence Interval Estimation

Authors: Bo Dai, Ofir Nachum, Yinlam Chow, Lihong Li, Csaba Szepesvári, Dale Schuurmans

Abstract: We study high-confidence behavior-agnostic off-policy evaluation in reinforcement learning, where the goal is to estimate a confidence interval on a target policy's value, given only access to a static experience dataset collected by unknown behavior policies. Starting from a function space embedding of the linear program formulation of the $Q$-function, we obtain an optimization problem with generalized estimating equation constraints. By applying the generalized empirical likelihood method to the resulting Lagrangian, we propose CoinDICE, a novel and efficient algorithm for computing confidence intervals. Theoretically, we prove the obtained confidence intervals are valid, in both asymptotic and finite-sample regimes. Empirically, we show in a variety of benchmarks that the confidence interval estimates are tighter and more accurate than existing methods.

Submitted 22 October, 2020; originally announced October 2020.

Comments: To appear at NeurIPS 2020 as spotlight
arXiv:2010.09648 [pdf]

cs.MA cs.CV eess.IV physics.soc-ph

Agent-based Simulation Model and Deep Learning Techniques to Evaluate and Predict Transportation Trends around COVID-19

Authors: Ding Wang, Fan Zuo, Jingqin Gao, Yueshuai He, Zilin Bian, Suzana Duran Bernardes, Chaekuk Na, Jingxing Wang, John Petinos, Kaan Ozbay, Joseph Y. J. Chow, Shri Iyer, Hani Nassif, Xuegang Jeff Ban

Abstract: The COVID-19 pandemic has affected travel behaviors and transportation system operations, and cities are grappling with what policies can be effective for a phased reopening shaped by social distancing. This edition of the white paper updates travel trends and highlights an agent-based simulation model's results to predict the impact of proposed phased reopening strategies. It also introduces a real-time video processing method to measure social distancing through cameras on city streets.

Submitted 23 September, 2020; originally announced October 2020.
arXiv:2010.05150 [pdf, other]

cs.CL cs.AI cs.LG cs.RO

Safe Reinforcement Learning with Natural Language Constraints

Authors: Tsung-Yen Yang, Michael Hu, Yinlam Chow, Peter J. Ramadge, Karthik Narasimhan

Abstract: While safe reinforcement learning (RL) holds great promise for many practical applications like robotics or autonomous cars, current approaches require specifying constraints in mathematical form. Such specifications demand domain expertise, limiting the adoption of safe RL. In this paper, we propose learning to interpret natural language constraints for safe RL. To this end, we first introduce HazardWorld, a new multi-task benchmark that requires an agent to optimize reward while not violating constraints specified in free-form text. We then develop an agent with a modular architecture that can interpret and adhere to such textual constraints while learning new tasks. Our model consists of (1) a constraint interpreter that encodes textual constraints into spatial and temporal representations of forbidden states, and (2) a policy network that uses these representations to produce a policy achieving minimal constraint violations during training. Across different domains in HazardWorld, we show that our method achieves higher rewards (up to 11x) and fewer constraint violations (by 1.8x) compared to existing approaches. However, in terms of absolute performance, HazardWorld still poses significant challenges for agents to learn efficiently, motivating the need for future work.

Submitted 3 August, 2021; v1 submitted 10 October, 2020; originally announced October 2020.

Comments: The first two authors contributed equally
arXiv:2009.14018 [pdf]

physics.soc-ph cs.SI

Toward the "New Normal": A Surge in Speeding, New Volume Patterns, and Recent Trends in Taxis/For-Hire Vehicles

Authors: Jingqin Gao, Abhinav Bhattacharyya, Ding Wang, Nick Hudanich, Siva Sooryaa, Muruga Thambiran, Suzana Duran Bernardes, Chaekuk Na, Fan Zuo, Zilin Bian, Kaan Ozbay, Shri Iyer, Hani Nassif, Joseph Y. J. Chow

Abstract: Six months into the pandemic and one month after the phase four reopening in New York City (NYC), restrictions are lifting, businesses and schools are reopening, but global infections are still rising. This white paper updates travel trends observed in the aftermath of the COVID-19 outbreak in NYC and highlights some findings toward the "new normal."

Submitted 23 September, 2020; originally announced September 2020.
arXiv:2008.04762 [pdf]

physics.soc-ph cs.CY

doi 10.1016/j.tranpol.2020.12.011

A validated multi-agent simulation test bed to evaluate congestion pricing policies on population segments by time-of-day in New York City

Authors: Brian Yueshuai He, Jinkai Zhou, Ziyi Ma, Ding Wang, Di Sha, Mina Lee, Joseph Y. J. Chow, Kaan Ozbay

Abstract: Evaluation of the demand for emerging transportation technologies and policies can vary by time of day due to spillbacks on roadways, rescheduling of travelers' activity patterns, and shifting to other modes that affect the level of congestion. These effects are not well captured with static travel demand models. We calibrate and validate the first open-source multi-agent simulation model for New York City, called MATSim-NYC, to support agencies in evaluating policies such as congestion pricing. The simulation-based virtual test bed is loaded with an 8M+ synthetic 2016 population calibrated in a prior study. The road network is calibrated to INRIX speed data and average annual daily traffic for a screenline along the East River crossings, resulting in average speed differences of 7.2% on freeways and 17.1% on arterials, leading to an average difference of +1.8% from the East River screenline. Validation against transit stations shows an 8% difference from observed counts and a median difference of 29% for select road link counts. The model is used to evaluate a congestion pricing plan proposed by the Regional Plan Association and suggests a much higher (127K) car trip reduction compared to their report (59K). The pricing policy would impact the population segment making trips within Manhattan differently from the population segment of trips outside Manhattan. The multi-agent simulation can show that 37.3% of the Manhattan segment would be negatively impacted by the pricing compared to 39.9% of the non-Manhattan segment, which has implications for redistribution of congestion pricing revenues. The citywide travel consumer surplus decreases when the congestion pricing goes up from $9.18 to $14 both ways even as it increases for the Charging-related population segment. This implies that increasing pricing from $9.18 to $14 benefits Manhattanites at the expense of the rest of the city.

Submitted 21 December, 2020; v1 submitted 31 July, 2020; originally announced August 2020.

Journal ref: Transport Policy 101 (2021) 145-161
arXiv:2008.00335 [pdf, other]

cs.AI cs.LG eess.SY

V2I Connectivity-Based Dynamic Queue-Jump Lane for Emergency Vehicles: A Deep Reinforcement Learning Approach

Authors: Haoran Su, Kejian Shi, Li Jin, Joseph Y. J. Chow

Abstract: Emergency vehicle (EMV) service is a key function of cities and is exceedingly challenging due to urban traffic congestion. A main reason behind EMV service delay is the lack of communication and cooperation between vehicles blocking EMVs. In this paper, we study the improvement of EMV service under V2I connectivity. We consider the establishment of dynamic queue jump lanes (DQJLs) based on real-time coordination of connected vehicles. We develop a novel Markov decision process formulation for the DQJL problem, which explicitly accounts for the uncertainty of drivers' reaction to approaching EMVs. We propose a deep neural network-based reinforcement learning algorithm that efficiently computes the optimal coordination instructions. We also validate our approach on a micro-simulation testbed using Simulation of Urban Mobility (SUMO). Validation results show that with our proposed methodology, the centralized control system reduces EMV passing time by approximately 15% relative to the benchmark system.

Submitted 29 May, 2021; v1 submitted 1 August, 2020; originally announced August 2020.

Comments: 20 pages, 6 figures
arXiv:2006.14518 [pdf]

cs.GT cs.CY

Mobility operator service capacity sharing contract design to risk-pool against network disruptions

Authors: Theodoros P. Pantelidis, Joseph Y. J. Chow, Oded Cats

Abstract: We propose a new mechanism to design risk-pooling contracts between operators to facilitate horizontal cooperation to mitigate those costs and improve service resilience during disruptions. We formulate a novel two-stage stochastic multicommodity flow model to determine the cost savings of a coalition under different disruption scenarios and solve it using L-shaped method along with sample average approximation. Computational tests of the L-shaped method against deterministic equivalent method with sample average approximation are conducted for network instances with up to 64 nodes, 10 OD pairs, and 1024 scenarios. The results demonstrate that the solution algorithm only becomes computationally effective for larger size instances (above 128 nodes) and that SAA maintains a close approximation. The proposed model is applied to a regional multi-operator network in the Randstad area of the Netherlands, for four operators, 40 origin-destination pairs, and over 1400 links where disruption data is available. Using the proposed method, we identify stable cost allocations among four operating agencies that could yield a 66% improvement in overall network performance over not having any risk-pooling contract in place. Furthermore, the model allows policymakers to evaluate the sensitivity of any one operator's bargaining power to different network structures and disruption scenario distributions, as we illustrate for the HTM operator in Randstad.

Submitted 1 May, 2023; v1 submitted 25 June, 2020; originally announced June 2020.
arXiv:2006.13408 [pdf, other]

cs.LG cs.AI stat.ML

Control-Aware Representations for Model-based Reinforcement Learning

Authors: Brandon Cui, Yinlam Chow, Mohammad Ghavamzadeh

Abstract: A major challenge in modern reinforcement learning (RL) is efficient control of dynamical systems from high-dimensional sensory observations. Learning controllable embedding (LCE) is a promising approach that addresses this challenge by embedding the observations into a lower-dimensional latent space, estimating the latent dynamics, and utilizing it to perform control in the latent space. Two important questions in this area are how to learn a representation that is amenable to the control problem at hand, and how to achieve an end-to-end framework for representation learning and control. In this paper, we take a few steps towards addressing these questions. We first formulate an LCE model to learn representations that are suitable to be used by a policy iteration style algorithm in the latent space. We call this model control-aware representation learning (CARL). We derive a loss function for CARL that has close connection to the prediction, consistency, and curvature (PCC) principle for representation learning. We derive three implementations of CARL. In the offline implementation, we replace the locally-linear control algorithm (e.g., iLQR) used by the existing LCE methods with an RL algorithm, namely model-based soft actor-critic, and show that it results in significant improvement. In online CARL, we interleave representation learning and control, and demonstrate further gain in performance.
Finally, we propose value-guided CARL, a variation in which we optimize a weighted version of the CARL loss function, where the weights depend on the TD-error of the current policy. We evaluate the proposed algorithms by extensive experiments on benchmark tasks and compare them with several LCE baselines.

Submitted 23 June, 2020; originally announced June 2020.
arXiv:2006.13368 [pdf]

econ.GN cs.MA physics.soc-ph

doi 10.1016/j.ijtst.2021.01.003

Impact of COVID-19 behavioral inertia on reopening strategies for New York City Transit

Authors: Ding Wang, Brian Yueshuai He, Jingqin Gao, Joseph Y. J. Chow, Kaan Ozbay, Shri Iyer

Abstract: The COVID-19 pandemic has affected travel behaviors and transportation system operations, and cities are grappling with what policies can be effective for a phased reopening shaped by social distancing. A baseline model was previously developed and calibrated for pre-COVID conditions as MATSim-NYC. A new COVID model is calibrated that represents travel behavior during the COVID-19 pandemic by recalibrating the population agendas to include work-from-home and re-estimating the mode choice model for MATSim-NYC to fit observed traffic and transit ridership data. Assuming the change in behavior exhibits inertia during reopening, we analyze the increase in car traffic due to the phased reopen plan guided by the state government of New York. Four reopening phases and two reopening scenarios (with and without transit capacity restrictions) are analyzed. A Phase 4 reopening with 100% transit capacity may only see as much as 73% of pre-COVID ridership and an increase in the number of car trips by as much as 142% of pre-pandemic levels. Limiting transit capacity to 50% would decrease transit ridership further from 73% to 64% while increasing car trips to as much as 143% of pre-pandemic levels. While the increase appears small, the impact on consumer surplus is disproportionately large due to already increased traffic congestion. Many of the trips also get shifted to other modes like micromobility.
The findings imply that a transit capacity restriction policy during reopening needs to be accompanied by (1) support for micromobility modes, particularly in non-Manhattan boroughs, and (2) congestion alleviation policies that focus on reducing traffic in Manhattan, such as cordon-based pricing.

Submitted 11 February, 2021; v1 submitted 23 June, 2020; originally announced June 2020.

Journal ref: International Journal of Transportation Science & Technology 10(2) 197-211 (2021)
arXiv:2006.08714 [pdf, other]

cs.LG cs.AI stat.ML

Latent Bandits Revisited

Authors: Joey Hong, Branislav Kveton, Manzil Zaheer, Yinlam Chow, Amr Ahmed, Craig Boutilier

Abstract: A latent bandit problem is one in which the learning agent knows the arm reward distributions conditioned on an unknown discrete latent state. The primary goal of the agent is to identify the latent state, after which it can act optimally. This setting is a natural midpoint between online and offline learning (complex models can be learned offline with the agent identifying latent state online) of practical relevance in, say, recommender systems. In this work, we propose general algorithms for this setting, based on both upper confidence bounds (UCBs) and Thompson sampling. Our methods are contextual and aware of model uncertainty and misspecification. We provide a unified theoretical analysis of our algorithms, which have lower regret than classic bandit policies when the number of latent states is smaller than actions. A comprehensive empirical study showcases the advantages of our approach.

Submitted 15 June, 2020; originally announced June 2020.

Comments: 16 pages, 2 figures
arXiv:2006.08236 [pdf, other]

cs.LG cs.AI stat.ML

Non-Stationary Off-Policy Optimization

Authors: Joey Hong, Branislav Kveton, Manzil Zaheer, Yinlam Chow, Amr Ahmed

Abstract: Off-policy learning is a framework for evaluating and optimizing policies without deploying them, from data collected by another policy. Real-world environments are typically non-stationary and the offline learned policies should adapt to these changes. To address this challenge, we study the novel problem of off-policy optimization in piecewise-stationary contextual bandits. Our proposed solution has two phases. In the offline learning phase, we partition logged data into categorical latent states and learn a near-optimal sub-policy for each state. In the online deployment phase, we adaptively switch between the learned sub-policies based on their performance. This approach is practical and analyzable, and we provide guarantees on both the quality of off-policy optimization and the regret during online deployment. To show the effectiveness of our approach, we compare it to state-of-the-art baselines on both synthetic and real-world datasets. Our approach outperforms methods that act only on observed context.

Submitted 4 April, 2021; v1 submitted 15 June, 2020; originally announced June 2020.

Comments: AISTATS 2021; 16 pages, 2 figures
arXiv:2006.05443 [pdf, other]

cs.LG cs.AI stat.ML

Variational Model-based Policy Optimization

Authors: Yinlam Chow, Brandon Cui, MoonKyung Ryu, Mohammad Ghavamzadeh

Abstract: Model-based reinforcement learning (RL) algorithms allow us to combine model-generated data with those collected from interaction with the real system in order to alleviate the data efficiency problem in RL. However, designing such algorithms is often challenging because the bias in simulated data may overshadow the ease of data generation. A potential solution to this challenge is to jointly learn and improve model and policy using a universal objective function. In this paper, we leverage the connection between RL and probabilistic inference, and formulate such an objective function as a variational lower-bound of a log-likelihood. This allows us to use expectation maximization (EM) and iteratively fix a baseline policy and learn a variational distribution, consisting of a model and a policy (E-step), followed by improving the baseline policy given the learned variational distribution (M-step). We propose model-based and model-free policy iteration (actor-critic) style algorithms for the E-step and show how the variational distribution learned by them can be used to optimize the M-step in a fully model-based fashion. Our experiments on a number of continuous control tasks show that despite being more complex, our model-based (E-step) algorithm, called variational model-based policy optimization (VMBPO), is more sample-efficient and robust to hyper-parameter tuning than its model-free (E-step) counterpart.
Using the same control tasks, we also compare VMBPO with several state-of-the-art model-based and model-free RL algorithms and show its sample efficiency and performance.

Submitted 23 June, 2020; v1 submitted 9 June, 2020; originally announced June 2020.
arXiv:2005.03465 [pdf]

physics.soc-ph cs.CE stat.AP

doi 10.1109/FISTS46898.2020.9264887

A stochastic user-operator assignment game for microtransit service evaluation: A case study of Kussbus in Luxembourg

Authors: Tai-Yu Ma, Joseph Y. J. Chow, Sylvain Klein, Ziyi Ma

Abstract: This paper proposes a stochastic variant of the stable matching model from Rasulkhani and Chow [1] which allows microtransit operators to evaluate their operation policy and resource allocations. The proposed model takes into account the stochastic nature of users' travel utility perception, resulting in a probabilistic stable operation cost allocation outcome to design ticket price and ridership forecasting. We applied the model for the operation policy evaluation of a microtransit service in Luxembourg and its border area. The methodology for the model parameters estimation and calibration is developed. The results provide useful insights for the operator and the government to improve the ridership of the service.

Submitted 8 April, 2020; originally announced May 2020.

Comments: arXiv admin note: substantial text overlap with arXiv:1912.01984
arXiv:2003.01086 [pdf, other]

cs.LG eess.SY stat.ML

Predictive Coding for Locally-Linear Control

Authors: Rui Shu, Tung Nguyen, Yinlam Chow, Tuan Pham, Khoat Than, Mohammad Ghavamzadeh, Stefano Ermon, Hung H. Bui

Abstract: High-dimensional observations and unknown dynamics are major challenges when applying optimal control to many real-world decision making tasks. The Learning Controllable Embedding (LCE) framework addresses these challenges by embedding the observations into a lower dimensional latent space, estimating the latent dynamics, and then performing control directly in the latent space. To ensure the learned latent dynamics are predictive of next-observations, all existing LCE approaches decode back into the observation space and explicitly perform next-observation prediction, a challenging high-dimensional task that furthermore introduces a large number of nuisance parameters (i.e., the decoder) which are discarded during control. In this paper, we propose a novel information-theoretic LCE approach and show theoretically that explicit next-observation prediction can be replaced with predictive coding. We then use predictive coding to develop a decoder-free LCE model whose latent dynamics are amenable to locally-linear control. Extensive experiments on benchmark tasks show that our model reliably learns a controllable latent space that leads to superior performance when compared with state-of-the-art LCE baselines.

Submitted 2 March, 2020; originally announced March 2020.
arXiv:2003.01025 [pdf, other]

cs.AI eess.SY

Dynamic Queue-Jump Lane for Emergency Vehicles under Partially Connected Settings: A Multi-Agent Deep Reinforcement Learning Approach

Authors: Haoran Su, Kejian Shi, Joseph Y. J. Chow, Li Jin

Abstract: Emergency vehicle (EMV) service is a key function of cities and is exceedingly challenging due to urban traffic congestion. The main reason behind EMV service delay is the lack of communication and cooperation between vehicles blocking EMVs. In this paper, we study the improvement of EMV service under V2X connectivity. We consider the establishment of dynamic queue jump lanes (DQJLs) based on real-time coordination of connected vehicles in the presence of non-connected human-driven vehicles. We develop a novel Markov decision process formulation for the DQJL coordination strategies, which explicitly accounts for the uncertainty of drivers' yielding pattern to approaching EMVs. Based on pairs of neural networks representing actors and critics for agent vehicles, we develop a multi-agent actor-critic deep reinforcement learning algorithm that handles a varying number of vehicles and a random proportion of connected vehicles in the traffic. Approaching the optimal coordination strategies via indirect and direct reinforcement learning, we present two schemata to address multi-agent reinforcement learning on this connected vehicle application. Both approaches are validated, on a micro-simulation testbed SUMO, to establish a DQJL quickly and safely. Validation results reveal that, with DQJL coordination strategies, EMVs take up to 30% less time to pass a link-level intelligent urban roadway than in the baseline scenario.

Submitted 15 January, 2021; v1 submitted 2 March, 2020; originally announced March 2020.

Comments: 42 pages, 13 figures, 7 tables
arXiv:2002.05522 [pdf, other]

cs.LG cs.AI stat.ML

BRPO: Batch Residual Policy Optimization

Authors: Sungryull Sohn, Yinlam Chow, Jayden Ooi, Ofir Nachum, Honglak Lee, Ed Chi, Craig Boutilier

Abstract: In batch reinforcement learning (RL), one often constrains a learned policy to be close to the behavior (data-generating) policy, e.g., by constraining the learned action distribution to differ from the behavior policy by some maximum degree that is the same at each state. This can cause batch RL to be overly conservative, unable to exploit large policy changes at frequently-visited, high-confidence states without risking poor performance at sparsely-visited states. To remedy this, we propose residual policies, where the allowable deviation of the learned policy is state-action-dependent. We derive a new RL method, BRPO, which learns both the policy and allowable deviation that jointly maximize a lower bound on policy performance. We show that BRPO achieves the state-of-the-art performance in a number of tasks.

Submitted 28 March, 2020; v1 submitted 7 February, 2020; originally announced February 2020.
arXiv:2001.07282 [pdf]

cs.DS cs.CY math.OC

doi 10.1287/trsc.2021.1058

A node-charge graph-based online carshare rebalancing policy with capacitated electric charging

Authors: Theodoros P. Pantelidis, Li Li, Tai-Yu Ma, Joseph Y. J. Chow, Saif Eddin G. Jabari

Abstract: Viability of electric car-sharing operations depends on rebalancing algorithms. Earlier methods in the literature suggest a trend toward non-myopic algorithms using queueing principles. We propose a new rebalancing policy using cost function approximation. The cost function is modeled as a p-median relocation problem with minimum cost flow conservation and path-based charging station capacities on a static node-charge graph structure. The cost function is NP-complete, so a heuristic is proposed that ensures feasible solutions that can be solved in an online system. The algorithm is validated in a case study of electric carshare in Brooklyn, New York, with demand data shared from BMW ReachNow operations in September 2017 (262 vehicle fleet, 231 pickups per day, 303 traffic analysis zones (TAZs)) and charging station location data (18 charging stations with 4 port capacities). The proposed non-myopic rebalancing heuristic reduces the cost increase compared to myopic rebalancing by 38%. Other managerial insights are further discussed.

Submitted 14 March, 2021; v1 submitted 20 January, 2020; originally announced January 2020.

Journal ref: Transportation Science (2021)
arXiv:1912.09004 [pdf]

cs.CY

doi 10.1109/TITS.2021.3105230

Online route choice modeling for Mobility-as-a-Service networks with non-separable, congestible link capacity effects

Authors: Susan Jia Xu, Joseph Y. J. Chow

Abstract: With the prevalence of MaaS systems, route choice models need to consider characteristics unique to them. MaaS systems tend to involve service systems with fleets of vehicles; as a result, the available service capacity depends on the choices of other travelers in different parts of the system. We model this with a new concept of "congestible capacity"; that is, link capacities are a function of flow instead of link costs. This dependency is also non-separable; the capacity in one link can depend on flows from multiple links. An offline-online estimation method is introduced to capture the structural effects that flows have on capacities and the resulting impacts on route choice utilities. The method is first applied to obtain unique congestible capacity shadow prices in a multimodal network to verify the capability to capture congestion effects on capacities. The capacities are shown to vary and impact the utility of a route. The method is validated using real system data from Citi Bike in New York City. The results show that the model can fit to the data quite well and performs better than a baseline modeling approach that ignores congestible capacity effects. By relating the route choice to congestible capacities using a random utility model, modelers can monitor and quantify the impacts to traveler consumer surplus in real time.
Applications of the model and online method include monitoring capacity effects on consumer surplus, using the model to direct incentives programs for rebalancing and other revenue management strategies, and to guide resource allocation to mitigate consumer surplus impacts due to disruptions from incidents.

Submitted 9 July, 2021; v1 submitted 18 December, 2019; originally announced December 2019.

Journal ref: IEEE Transactions on Intelligent Transportation Systems, 2021
arXiv:1912.02074 [pdf, other]

cs.LG cs.AI

AlgaeDICE: Policy Gradient from Arbitrary Experience

Authors: Ofir Nachum, Bo Dai, Ilya Kostrikov, Yinlam Chow, Lihong Li, Dale Schuurmans

Abstract: In many real-world applications of reinforcement learning (RL), interactions with the environment are limited due to cost or feasibility. This presents a challenge to traditional RL algorithms since the max-return objective involves an expectation over on-policy samples. We introduce a new formulation of max-return optimization that allows the problem to be re-expressed by an expectation over an arbitrary behavior-agnostic and off-policy data distribution. We first derive this result by considering a regularized version of the dual max-return objective before extending our findings to unregularized objectives through the use of a Lagrangian formulation of the linear programming characterization of Q-values. We show that, if auxiliary dual variables of the objective are optimized, then the gradient of the off-policy objective is exactly the on-policy policy gradient, without any use of importance weighting. In addition to revealing the appealing theoretical properties of this approach, we also show that it delivers good practical performance.

Submitted 4 December, 2019; originally announced December 2019.
arXiv:1911.04435 [pdf]

cs.CY econ.GN

doi 10.1016/j.trb.2020.08.002

A many-to-many assignment game and stable outcome algorithm to evaluate collaborative Mobility-as-a-Service platforms

Authors: Theodoros P. Pantelidis, Joseph Y. J. Chow, Saeid Rasulkhani

Abstract: As Mobility as a Service (MaaS) systems become increasingly popular, travel is changing from unimodal trips to personalized services offered by a platform of mobility operators. Evaluation of MaaS platforms depends on modeling both user route decisions as well as operator service and pricing decisions. We adopt a new paradigm for traffic assignment in a MaaS network of multiple operators using the concept of stable matching to allocate costs and determine prices offered by operators corresponding to user route choices and operator service choices without resorting to nonconvex bilevel programming formulations. Unlike our prior work, the proposed model allows travelers to make multimodal, multi-operator trips, resulting in stable cost allocations between competing network operators to provide MaaS for users. An algorithm is proposed to efficiently generate stability conditions for the stable outcome model. Extensive computational experiments demonstrate the use of the model for handling pricing responses of MaaS operators in technological and capacity changes, government acquisition, consolidation, and firm entry, using the classic Sioux Falls network. The proposed algorithm replicates the same stability conditions as explicit path enumeration while taking only 17 seconds compared to explicit path enumeration timing out over 2 hours.

Submitted 28 June, 2020; v1 submitted 11 November, 2019; originally announced November 2019.

Journal ref: Transportation Research Part B 104 (2020) 79-100
arXiv:1911.03779 [pdf]

physics.soc-ph cs.LG stat.ML

doi 10.1109/MITS.2020.3037324

Empirical validation of network learning with taxi GPS data from Wuhan, China

Authors: Susan Jia Xu, Qian Xie, Joseph Y. J. Chow, Xintao Liu

Abstract: In prior research, a statistically cheap method was developed to monitor transportation network performance by using only a few groups of agents without having to forecast the population flows. The current study validates this "multi-agent inverse optimization" method using taxi GPS probe data from the city of Wuhan, China. Using a controlled 2062-link network environment and different GPS data processing algorithms, an online monitoring environment is simulated using the real data over a 4-hour period. Results show that using only samples from one OD pair, the multi-agent inverse optimization method can learn network parameters such that forecasted travel times have a 0.23 correlation with the observed travel times. By increasing the monitoring to just two OD pairs, the correlation improves further to 0.56.

Submitted 17 August, 2020; v1 submitted 9 November, 2019; originally announced November 2019.

Journal ref: IEEE Intelligent Transportation Systems Magazine 13(1) (2021) 42-58
