paper edits from MWC #140

Merged
merged 2 commits into from
Jan 25, 2021

mattwigway
Contributor

Here are some edits I've made to the paper. The biggest change is the addition of a new section at the end on computing accessibility. I think this is a major use case and useful to have in the paper, but if other authors don't think it belongs in this paper I'm open to removing it. If it does stay we need to use actual data: I assigned random values to each of the points in the grid and computed accessibility with those. Maybe an employment, population, or POI dataset?
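
For reference, one common way to compute accessibility from a travel time matrix is a cumulative-opportunities count: for each origin, sum the opportunities at all destinations reachable within a travel time cutoff. The sketch below is a minimal base R illustration under stated assumptions, not the code used in the paper. The column names (`from_id`, `to_id`, `travel_time`), the three-point matrix, the random `jobs` values, and the 30-minute cutoff are all made up for the example.

```r
set.seed(1)

# Hypothetical travel time matrix (minutes) between three grid points.
# expand.grid() varies from_id fastest, so the vector below is filled
# column by column: all origins to "a", then to "b", then to "c".
ttm <- expand.grid(from_id = c("a", "b", "c"),
                   to_id = c("a", "b", "c"),
                   stringsAsFactors = FALSE)
ttm$travel_time <- c(0, 25, 50,
                     25, 0, 35,
                     50, 35, 0)

# Placeholder opportunities: random values, as in the draft section.
opportunities <- data.frame(id = c("a", "b", "c"),
                            jobs = sample(1:100, 3))

# Keep origin-destination pairs within the cutoff, attach the
# opportunities at each destination, and sum them per origin.
cutoff <- 30
reachable <- ttm[ttm$travel_time <= cutoff, ]
reachable <- merge(reachable, opportunities, by.x = "to_id", by.y = "id")
accessibility <- aggregate(jobs ~ from_id, data = reachable, FUN = sum)

print(accessibility)
```

Swapping the random `jobs` column for a real employment, population, or POI count would leave the rest of the computation unchanged.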

Most of my other changes are just minor rewordings for clarity or grammar. I did add the computation time for the travel time matrix to the text, although I'm running r5r in a docker container on a 6-year-old MacBook, so we might get better numbers from someone else's machine. I also put a place to write the version numbers of r5r and r5 used in the final paper. We may want to take the code that actually measures the computation time out before we publish.

In the figures, I added a point to show where the central bus station is in Figure 2, and in both Figures 1 and 2 I switched them to have R5 calculate the median travel time directly, rather than using the median of the calculated percentiles (which would actually have been the 60th percentile, since there were two percentiles below and two percentiles above requested from R5).
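
The point about the median can be checked numerically. The sketch below uses simulated travel times (not r5r output) and base R's `quantile()` as a stand-in for R5's percentile calculation, which may differ in detail. It shows that taking the median of the five values returned for percentiles 20, 40, 60, 80 and 99 simply picks the middle one, which is the 60th percentile.

```r
set.seed(42)

# Simulated travel times (minutes) for one origin-destination pair,
# one value per departure minute in a 120-minute window.
travel_times <- round(runif(120, min = 20, max = 60))

# Percentiles originally requested in the paper's example.
requested <- c(20, 40, 60, 80, 99)
pct_values <- quantile(travel_times, probs = requested / 100,
                       type = 1, names = FALSE)

# Median of the five percentile values versus the 50th percentile itself.
median_of_percentiles <- median(pct_values)
true_median <- quantile(travel_times, probs = 0.5, type = 1, names = FALSE)

# Percentile values are non-decreasing, so the median of five of them
# is the third one, i.e. the value for the 60th percentile.
stopifnot(median_of_percentiles == pct_values[3])

cat("median of percentiles:", median_of_percentiles, "\n")
cat("50th percentile:      ", true_median, "\n")
```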

A few things I think we may want to revisit, but that I didn't change:

  • I'm not sure people will agree that other open-source routing packages can't handle large networks. For instance, OSRM can handle networks (not transit networks) way bigger than R5 can. Maybe the real problem is ease of use? Getting routing software set up is a huge pain.
  • I'm not sure that R5 is really set up for country-level routing as we say in the text. It can easily handle, say, the Netherlands, but the whole US not so much. (An aside: the Southern California Association of Governments region has both more people and more land area than the Netherlands.)
  • I feel like there should be a way to crop the maps so the edges of the network don't show, but I don't know what it is.

Right now the text is about 1100 words not including the code listings. I'm not sure how Transport Findings will treat the code listings. David King is on the editorial board and a friend of mine - I could reach out to him if you like. Also, FWIW, I've heard that they are not really enforcing the word limit and have also reviewed papers for them that were significantly over the word limit.

codecov bot commented Jan 25, 2021

Codecov Report

Merging #140 (73d29ab) into master (1e2fee5) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master     #140   +/-   ##
=======================================
  Coverage   93.16%   93.16%           
=======================================
  Files           7        7           
  Lines         322      322           
=======================================
  Hits          300      300           
  Misses         22       22           


While there is a growing number of open source routing models [@opentripplanneropentripplanner; @lovelace2019stplanr; @padgham2019dodgr], most options available do not efficiently process large transport networks. This paper presents [r5r](https://ipeagit.github.io/r5r/), a new open source R package for routing on multimodal transport networks based on the [Rapid Realistic Routing on Real-world and Reimagined networks (R<sup>5</sup>)](https://github.com/conveyal/r5). R<sup>5</sup> is a powerful next-generation routing engine written in Java and developed at Conveyal [@conway2017evidencebased; @conway2018accounting] to provide an efficient backend for analytic applications - such as accessibility analysis. The r5r package provides a simple and friendly interface to run R<sup>5</sup> locally from within R, which allows users to efficiently calculate travel time matrices or generate multiple route alternatives between origins and destinations using seamless parallel computing.
<!-- MWC: I'm not sure I agree with this - OTP can handle large networks (and OTP 2.0 actually shares a lot of algorithms with R5), and there are also tools like osrm that can handle really large street routing problems. Is the problem more that the tools that can handle really large networks are inaccessible to researchers? -->
Member

I agree. A key contribution of r5r is making it super easy to do routing analysis

Member

In any case, it's still true that most available options 'do not efficiently process large public transport networks'.

Contributor Author

Agreed, I think the contributions are making it easy to do it fast (b/c most easy options are not fast, and most fast options are not easy), and handling large PT networks.


<!-- More importantly, those tools generate deterministic estimates of travel times without accounting for variability in results (due to service reliability issues, for example) ... [rafa: the idea here is to emphasize the power of R5 with the time_window parameter. We certainly need to rephrase things, but perhaps we should stress this idea here] -->

# METHODS AND DATA

The r5r package has low data requirements and is easily scalable, allowing fast computation of routes and travel times at either city or country-level analysis. It creates a routable transport network using street network data from [OpenStreetMap](https://www.openstreetmap.org/) (OSM), and it automatically combines public transport data if they are provided by the user in the standard [General Transit Feed Specification](https://developers.google.com/transit/gtfs/) (GTFS) format.
<!-- Country-level analysis takes a ton of RAM (well, depending on the country)---Maybe we should say region-level instead? Like, the Netherlands is possible on high-end consumer hardware, but I suspect that, say, the US would overwhelm most commodity machines -->
Member

Ok. I'll change to 'region-level' in the text

Collaborator

@mattwigway do you think this is related to issue #141? Have you tried running R5 on a country like Germany?


For this article, we used r5r version XX and R5 version XX. <!-- TODO I'm running r5r from github, but we should use a prod release here. I do think we should include these version numbers -->
Member

Good point. I'll add this to the version numbers text

@rafapereirabr
Member

Thanks, Matt. I'll go over your comments/suggestions

@rafapereirabr rafapereirabr merged commit a693e32 into ipeaGIT:master Jan 25, 2021
-The `travel_time_matrix()` function takes, as inputs, the spatial location of origins/destinations (either as a spatial `sf POINT` object, or as a `data.frame` containing the columns `id`, `lon` and `lat`) and a few travel parameters such as *maximum trip duration* or *walking distance*. It outputs travel time estimates for each origin-destination pair at a set `departure_datetime`.
+The `travel_time_matrix()` function takes, as inputs, the spatial location of origins/destinations (either as a spatial `sf POINT` object, or as a `data.frame` containing the columns `id`, `lon` and `lat`) and a few travel parameters such as *maximum trip duration*, or *walking distance*. It outputs travel time estimates for each origin-destination pair at a set `departure_datetime`.

<!-- It might be worthwhile to consider changing the defaults to use a longer time window, and to return more percentiles, so users don't have to set parameters to see variation -MWC -->
Member

I believe most users are not yet familiar with the idea of using a time_window, so the general user will likely use the function assuming the results refer to the exact departure time given as input. I'm afraid that if we change the default time_window to 30 min or 1 hour, most users (who don't read the documentation) will misinterpret the output. If we leave the time_window as it is now, the result is more intuitive for general users, but we still allow advanced users to tweak the time_window parameter and fully benefit from the function's capability.
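
To make the trade-off concrete, here is a toy base R sketch of why the two settings give different answers. It does not call r5r or R5. The bus schedule, the 15-minute ride, and the helper `travel_time_at()` are hypothetical, and the real engine's sampling of departure minutes may differ. A single departure time yields one travel time that depends on how long the traveller happens to wait, while a 120-minute window yields a distribution that percentiles can summarise.

```r
# Hypothetical bus leaving the stop every 20 minutes
# (minutes after the requested departure time).
bus_departures <- seq(10, 130, by = 20)
in_vehicle <- 15  # assumed ride time in minutes

# Total travel time for a traveller reaching the stop at `minute`:
# wait for the next bus, then ride.
travel_time_at <- function(minute) {
  wait <- min(bus_departures[bus_departures >= minute]) - minute
  wait + in_vehicle
}

# Exact departure time only: one estimate.
single_departure <- travel_time_at(0)

# 120-minute window: one estimate per departure minute.
window_minutes <- 0:119
window_times <- vapply(window_minutes, travel_time_at, numeric(1))

cat("single departure:", single_departure, "minutes\n")
print(quantile(window_times, probs = c(0.05, 0.25, 0.5, 0.75, 0.95), type = 1))
```

In this toy schedule the single-departure result reflects a 10-minute wait, whereas over the window the wait ranges from 0 to 19 minutes, which is the variation the percentiles are meant to expose.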

@@ -97,10 +108,10 @@ departure_datetime <- as.POSIXct("13-05-2019 14:00:00",
 format = "%d-%m-%Y %H:%M:%S")
 time_window <- 120 # in minutes
-percentiles <- c(20, 40, 60, 80, 99)
+percentiles <- c(20, 40, 50, 60, 80, 99)
Member

@mattwigway, I'm not sure we can add more thresholds here. I believe the R5 algorithm now imposes a limit of up to 5 thresholds. Perhaps @mvpsaraiva can confirm this.

Contributor Author

Huh, it seemed to work and give reasonable numbers for all percentiles.

Collaborator

AnalysisWorkerTask now has a limit of five percentiles, you can see it on line 46. The validatePercentiles() function (line 245) enforces that limit. It's a very recent change, from December 2020.

Contributor Author

Oh, that's interesting. Maybe just use percentiles (5, 25, 50, 75, 95) here? That would make the analysis symmetrical as well. Taking the median of the percentiles isn't correct though.

Member

Ok. I will do that. Thanks
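
The two constraints discussed in this thread (at most five percentiles, and a set that is symmetric around the median) can be checked before calling the routing function. The helpers below, `check_percentiles()` and `is_symmetric()`, are hypothetical and are not part of r5r or R5. They only mirror the five-value limit described above and may not match everything the engine validates.

```r
# Hypothetical helper: reject more percentiles than the limit
# described in the thread (five).
check_percentiles <- function(percentiles, max_n = 5L) {
  if (length(percentiles) > max_n) {
    stop("at most ", max_n, " percentiles can be requested, got ",
         length(percentiles))
  }
  invisible(percentiles)
}

# Hypothetical helper: a set is symmetric around the median when the
# k-th smallest and k-th largest values always add up to 100.
is_symmetric <- function(percentiles) {
  p <- sort(percentiles)
  all(p + rev(p) == 100)
}

proposed <- c(5, 25, 50, 75, 95)
check_percentiles(proposed)
cat("proposed set symmetric:", is_symmetric(proposed), "\n")

# The six-value set from the diff exceeds the limit.
six_values <- tryCatch(
  {
    check_percentiles(c(20, 40, 50, 60, 80, 99))
    "accepted"
  },
  error = function(e) "rejected"
)
cat("six percentiles:", six_values, "\n")
```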

In our example, we can visualize the isochrone (area reachable within a certain amount of time) departing from the central bus station as follows:
<!-- Travel time percentile 50 is the median, we shouldn't be calculating it from the other travel times -MWC -->
<!-- For this map and the next, I couldn't figure it out but there must be a way to crop the map view so only the shaded
Member

Good point. @mvpsaraiva , do you know a simple way to do this?

Collaborator

@mvpsaraiva mvpsaraiva Jan 25, 2021

done in 965fca8
