Predict arrival times #15

Closed
elliottwilliams opened this issue Aug 10, 2016 · 4 comments

@elliottwilliams
Member

I don't think we've discussed much about how schedule data is going to be sourced. I'm now at a point where I could be incorporating that information into the UI, so let's figure out how Proper will get schedule information.

I see two different schedule-related needs:

  • ETAs for a vehicle:station association (high priority): This is obviously a must-have for the station arrival scene. From my perspective it should be part of Shark's data model, though it might pose some design challenges because it's data on the association itself.
  • Time-independent schedule data (low priority): Allows Proper to answer questions like "When does this route run?", "What time does a vehicle on this route come to my station every morning?", "How frequently does this stop here?". I don't think these data are tied to the event model of Shark (though feel free to correct me if you do), so it might make sense to write a Timetable server that exposes RPCs which provide schedule data about a given entity.
@faultyserver
Member

I agree with all of this, but I'll address your two points directly, since my ideas are likely a little different than what you have in mind:

ETAs for vehicles

My original idea was to simply have a schedule_delta property on vehicles that indicates the time disparity between a vehicle's last departure time and the time it was scheduled to depart. The reasoning here was that it is impossible to predict delays, so the understood delay that a vehicle has to the next station is the same as the understood delay it has to the next 5 stations.

Essentially, delay seems like an intrinsic property of vehicles, not their stations, routes, or any combination of the three. It also fits more nicely on the Vehicle class as an attribute than creating a special association type that has its own properties (which would likely only be used for this vehicle:station association).

The issue with the schedule_delta attribute is that it requires your second point (time-independent schedule data) to be in some way core to Shark. I'll address a possibility at the end of this post.
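
A minimal sketch of the schedule_delta idea, assuming the delta is stored in seconds and applied uniformly to every upcoming scheduled arrival. The Vehicle struct, the helper names, the stop codes, and the times are all hypothetical; only the schedule_delta attribute comes from the comment above.

```ruby
# Hypothetical stand-in for Shark's Vehicle class. Only schedule_delta is
# taken from the discussion; everything else is an assumption.
Vehicle = Struct.new(:name, :schedule_delta)

# Delay in seconds: positive means the vehicle departed later than scheduled.
def schedule_delta(scheduled_departure, actual_departure)
  (actual_departure - scheduled_departure).round
end

# Apply the same delta to every upcoming scheduled arrival, because the delay
# is treated as a property of the vehicle rather than of any one station.
def estimated_arrivals(vehicle, scheduled_arrivals)
  scheduled_arrivals.transform_values { |time| time + vehicle.schedule_delta }
end

scheduled = Time.utc(2016, 8, 10, 8, 0, 0)
actual    = Time.utc(2016, 8, 10, 8, 3, 30)
bus = Vehicle.new("4001", schedule_delta(scheduled, actual))

# Made-up stop codes mapped to their scheduled arrival times.
upcoming = {
  "BUS100" => Time.utc(2016, 8, 10, 8, 10, 0),
  "BUS200" => Time.utc(2016, 8, 10, 8, 18, 0)
}
etas = estimated_arrivals(bus, upcoming)
etas.each { |stop, eta| puts "#{stop}: #{eta.strftime('%H:%M:%S')}" }
```

The scheduled times still have to come from somewhere, which is the dependency on time-independent schedule data noted above.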

Time-independent schedule data

At one point, we either decided (or I stated) that, for the sake of efficiency and scalability, Shark will never be responsible for RPCs. Because of that, a separate server is likely the best solution.

Getting schedule data is pretty simple compared to what Shark has to handle. For simplicity, I think it makes sense to limit schedule data to agencies which provide a GTFS document (since it's basically the official standard of transit scheduling). All that the server would have to do is parse GTFS data into memory (there are Ruby and Python libraries that do this already). Using GTFS also means that the server can supply other information such as fare rates and transfers.

The difficulty with this server will be creating RPCs that map our representation of objects (stations identified by stop_code, routes identified by short_name, or really anything, given that these are configurable) to the GTFS representation, and can also locate the appropriate records in the GTFS tables (this seems like the legitimately difficult part).
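
As a rough illustration of that mapping, the sketch below resolves a stop_code and a route short name to GTFS ids and then joins trips.txt and stop_times.txt to find scheduled arrivals. The column names follow the GTFS reference, but the Timetable class, its arrivals method, and the sample rows are hypothetical. It uses Ruby's csv library rather than a GTFS gem, and it ignores service calendars (calendar.txt) entirely.

```ruby
require "csv"

# Tiny inline stand-ins for GTFS files. Rows are made up for illustration.
STOPS_TXT = <<~GTFS
  stop_id,stop_code,stop_name
  s1,BUS100,Main St
  s2,BUS200,Oak Ave
GTFS

ROUTES_TXT = <<~GTFS
  route_id,route_short_name
  r7,7
GTFS

TRIPS_TXT = <<~GTFS
  route_id,service_id,trip_id
  r7,weekday,t1
  r7,weekday,t2
GTFS

STOP_TIMES_TXT = <<~GTFS
  trip_id,arrival_time,departure_time,stop_id,stop_sequence
  t1,08:00:00,08:00:30,s1,1
  t1,08:10:00,08:10:30,s2,2
  t2,09:00:00,09:00:30,s1,1
GTFS

# Hypothetical lookup layer, not an existing library API.
class Timetable
  def initialize(stops:, routes:, trips:, stop_times:)
    @stops      = parse(stops)
    @routes     = parse(routes)
    @trips      = parse(trips)
    @stop_times = parse(stop_times)
  end

  # Scheduled arrival times for a route at a station, as "HH:MM:SS" strings.
  # Sorting as strings assumes the times are zero-padded.
  def arrivals(stop_code:, short_name:)
    stop  = @stops.find  { |s| s["stop_code"] == stop_code }
    route = @routes.find { |r| r["route_short_name"] == short_name }
    return [] unless stop && route

    trip_ids = @trips.select { |t| t["route_id"] == route["route_id"] }
                     .map { |t| t["trip_id"] }
    @stop_times
      .select { |st| st["stop_id"] == stop["stop_id"] && trip_ids.include?(st["trip_id"]) }
      .map { |st| st["arrival_time"] }
      .sort
  end

  private

  # Parse CSV text into an array of header => value hashes.
  def parse(text)
    CSV.parse(text, headers: true).map(&:to_h)
  end
end

timetable = Timetable.new(stops: STOPS_TXT, routes: ROUTES_TXT,
                          trips: TRIPS_TXT, stop_times: STOP_TIMES_TXT)
main_st_arrivals = timetable.arrivals(stop_code: "BUS100", short_name: "7")
puts main_st_arrivals.inspect
```

Because the identifying fields are configurable per agency, a real server would need to take the lookup columns (stop_code, route_short_name here) from configuration rather than hard-coding them.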

I'd recommend taking a look at CityBus's GTFS data through one of those libraries and getting an idea of how it works for yourself. From there we can collaborate on the new server.

Proposal (tl;dr)

Create a Timetable server that can provide RPCs for schedule information based on GTFS sources (widely available at https://transit.land). This server may (eventually) also provide fare and transfer information.

Shark will then have a Schedule source type that calls these RPCs and fills out schedule_delta on Vehicle objects (plus any other attributes we may add later).

Agencies that want to provide realtime schedule-relative information can then include that source in their configuration:

Shark::Agency.configure do |agency|
  ...
  agency.use_manager :vehicle_manager do |manager|
    ...
    manager.source_from :schedule
  end
end

Clients should probably always try to contact the Timetable server to get schedule information. There could potentially be a has_schedule_information(<agency>) RPC that tells whether the agency has schedule information available, and determines whether the client will poll for actual information as it runs.
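
One way the client-side check could look, assuming the RPC returns a boolean and that an unreachable server is treated the same as "no schedule data". FakeTimetableClient is an in-process stand-in written for this sketch; only the has_schedule_information name comes from the proposal, and the transport is left out entirely.

```ruby
# Hypothetical in-process stand-in for the Timetable server.
class FakeTimetableClient
  def initialize(agencies_with_schedules)
    @agencies = agencies_with_schedules
  end

  # Assumed to return true or false for a given agency identifier.
  def has_schedule_information(agency)
    @agencies.include?(agency)
  end
end

# Decide once, at startup, whether the client should poll for schedule data.
def schedule_polling_enabled?(client, agency)
  client.has_schedule_information(agency)
rescue StandardError
  # Treat a failed call the same as "no schedule data available".
  false
end

client = FakeTimetableClient.new(["citybus"])
puts schedule_polling_enabled?(client, "citybus")
puts schedule_polling_enabled?(client, "other-agency")
```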

@faultyserver
Member

I think this should actually be a middleware (No, seriously this time).

The reason is that Source classes are used to discover Objects and manage their lifetimes. This means that (for the most part) Sources don't take a list of existing Objects and perform queries based on that. Instead, they perform a query and create that list, then apply it to a manager.

In the case of Schedule, it would have to take a list of existing Vehicle objects and query the Timetable server for those vehicles in particular, meaning it doesn't fit the idea of a Source.

As a middleware, it makes sense to perform actions based on existing Objects (that's what an event handler is), so Schedule fits here perfectly.
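
The difference in shape can be sketched as follows. Every class and method name here is hypothetical and is not Shark's actual API: the Source-style class takes no existing objects and produces the list, while the Middleware-style class receives an object that already exists and annotates it.

```ruby
# Hypothetical record type used only for this sketch.
VehicleRecord = Struct.new(:id, :schedule_delta)

# Source shape: runs a query with no knowledge of existing objects and
# returns the full list that a manager would then adopt.
class ExampleVehicleSource
  def initialize(feed)
    @feed = feed # stand-in for a realtime feed of vehicle ids
  end

  def discover
    @feed.map { |id| VehicleRecord.new(id, nil) }
  end
end

# Middleware shape: handles one existing object at a time and amends it.
class ExampleScheduleMiddleware
  def initialize(deltas_by_vehicle)
    @deltas = deltas_by_vehicle # stand-in for a Timetable server query
  end

  def call(vehicle)
    vehicle.schedule_delta = @deltas[vehicle.id]
    vehicle
  end
end

vehicles   = ExampleVehicleSource.new(%w[4001 4002]).discover
middleware = ExampleScheduleMiddleware.new("4001" => 120)
annotated  = vehicles.map { |vehicle| middleware.call(vehicle) }
annotated.each { |v| puts "#{v.id}: #{v.schedule_delta.inspect}" }
```

In this sketch a vehicle with no schedule match simply keeps a nil schedule_delta, which is one possible way to represent "no schedule information".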

@faultyserver
Member

As always, I'm debating whether Schedule should be a Middleware or a Source. As far as I can tell, there are two different mindsets:

  1. (Source) Schedule data as the gold standard: With this mindset, we inherently assume that the scheduled times of events are more useful than their realtime counterparts. For the most part, agencies are good about sticking to their schedules, or updating their schedules when foreseeable delays are coming, so this isn't an insane idea.

    In this case, the schedule data can be queried generically (conforming to the idea of a Source explained above: no parameters or lists). Sources of realtime information can then amend the results from that Source with schedule_deltas and other real-time attributes.

  2. (Middleware) Schedule data as an annex: With this mindset, we assume that realtime information is the most useful, and that schedule information is more of a nicety.

    In this case, the schedule data is queried individually for each object (more in line with the idea of a Middleware), and is attached to those objects to provide the schedule_delta attribute.

An advantage of the first approach is that users can see all of the normal operations of an agency, even when things are very not-normal. For example, if CityBus for some reason couldn't run Route 7 one day, users would still see that there is a route that goes to those stops, and that vehicles do regularly service them, even though no vehicles are currently traveling the route. An obstacle to this approach, though, is how to relay this information to users: how do we inform them that vehicles aren't just delayed, but simply aren't traveling the route? This is definitely a client-side issue, but one that I think would be affected by the design of the server.

With the second approach, the situation is reversed. Using the same example, users would see that no buses are coming to stops on Route 7, but would have no idea if that's normal or not. The question here is: how do we inform them that vehicles do normally travel this route, but currently are not?

One final caveat: I think both methodologies should be implemented, such that an agency can decide which mindset it wants to take simply by writing its configuration accordingly. My question is more about what we should default to, and what our mindset should be when starting work with new agencies.

@elliottwilliams what do you think?

@faultyserver
Member

The advent of Timetable and Providence (not yet created) has abstracted all schedule information out of Shark. That is, Shark will only ever be responsible for Realtime information, and Schedule information will be supplied where possible via external services.

In the future, we may add a GTFS Source that treats the schedule data as if it were realtime information (with anonymous Vehicles), so that agencies without realtime information still provide heartbeated events.

With that, I'm going to close this issue as punted, so that we can reference it when the GTFS Source is going to be implemented.
