Jason M. Carter edited this page Jun 20, 2017 · 6 revisions

Connected Vehicle Data Privacy Wiki

This wiki will contain information related to connected vehicle data and privacy at a higher level than the details of the privacy protection module (PPM).

Privacy Preserving Data Publishing (PPDP) and Connected Vehicle Applications

Privacy preserving data publishing (PPDP) aims to generate data that does not present a privacy risk to the individuals that the data describes. For this discussion, this data will be called de-identified data, keeping in mind that no direct, persistent identifiers exist in the Basic Safety Message (BSM) specification. Perhaps the most important question concerning PPDP is whether the data remains useful after it has been generated. This section describes connected vehicle data use cases and efforts to determine whether the data generated by the privacy protection module and other tools remains useful.

Over the past 25 years, as Intelligent Transportation Systems (ITS) brought technology to transportation, data has grown to be a critical input into the operations and system efficiency of State, local, and cross-jurisdictional systems. Data is a key input into real-time system operations and safety, provides a means to measure performance against target goals, and gives us a historical view that supports long-term planning processes. On a daily basis, planners and operators seek insight into the ongoing and forecasted performance of the transportation system (system performance metrics), and they use various sources of data for various types of planning (long-term, freight, etc.). Some preliminary examples of typical use cases for which models are employed include:

  • Origin-destination patterns,
  • Movements through an intersection,
  • Delay on road segments and at intersections, and
  • Speed on road segments.

This list is not comprehensive.

Connected Vehicle Data Utility Questions

The utility of de-identified data for various transportation modeling, forecasting, and other operational uses is a key consideration. If the resulting data provides little benefit, then other means of control must be considered, so the original data can be analyzed to generate accurate results.

The following data utility research questions have been posed:

  • Does de-identified data help practitioners make decisions? Can those decisions be made with the same or better certainty than when using existing data resources (e.g., loop detectors)?

  • Which use cases are not affected, or minimally affected, by using de-identified data?

  • What uses of the data are not possible due to the de-identification process?

  • What location resolution (i.e., latitude and longitude measurement fidelity) is needed to facilitate each particular use case?

  • Is the use case confined to a set of isolated regions, or is a data set that covers all roads required?

  • What specific data is required to conduct the analysis needed that is not provided in the de-identified data?

  • How was the usefulness of the data determined?

  • Besides using the original data (data that has not been de-identified), how could a de-identification procedure be improved to provide more useful data?
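The location-resolution question above can be explored by coarsening coordinates and measuring how far each point moves. The following is a minimal sketch, assuming resolution is reduced by rounding latitude and longitude to a fixed number of decimal places. That is only one possible approach and is not necessarily what the PPM does. The function names are hypothetical, and the distance uses a spherical-Earth approximation.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, spherical approximation


def coarsen(lat, lon, decimals):
    """Round a coordinate pair to the given number of decimal places."""
    return round(lat, decimals), round(lon, decimals)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def displacement_m(lat, lon, decimals):
    """Distance in meters that a point moves when its coordinates are coarsened."""
    clat, clon = coarsen(lat, lon, decimals)
    return haversine_m(lat, lon, clat, clon)
```

Running `displacement_m` over a sample of points at several values of `decimals` gives a rough picture of the position error each use case would have to tolerate.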

Transportation Use Cases (Contribution by Volpe)

Locations of hard-braking events to identify potential safety issues

Business Case (why should anyone care):  Crashes are rare events.  At locations with low traffic volumes, there may not be enough actual crashes to identify safety issues.  Near-misses occur more frequently, and are typically characterized by exceptionally hard braking.  A plot of hard-braking events on a map can help identify areas which may benefit from some type of safety intervention (e.g., warning signage).

Use Case (definition): Pre-determined areas (bound by latitude and longitude) of analysis will be established to create the data set. A cut-off filter will cull the data to only those records with an acceleration below a cut-off value (e.g., -11 m/s^2). The research panel will need to determine how much data immediately preceding (or following) these cut-off points is required to ensure value. Also, to reduce behavioral noise, would it be desirable to retain machine and trip information so that perpetual hard-brakers can be eliminated?
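The filter described above can be sketched as follows. This is an illustration only: the `BsmSample` and `BoundingBox` types and their field names are hypothetical and do not reflect the actual BSM record layout. The default cut-off is the example value from the text, and acceleration is assumed to be in m/s^2 with negative values meaning deceleration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BsmSample:
    """Hypothetical, simplified record; not the real BSM layout."""
    time_s: float
    lat: float
    lon: float
    accel_mps2: float  # longitudinal acceleration; negative means braking


@dataclass(frozen=True)
class BoundingBox:
    """Analysis area bounded by latitude and longitude, in degrees."""
    south: float
    west: float
    north: float
    east: float

    def contains(self, lat, lon):
        return self.south <= lat <= self.north and self.west <= lon <= self.east


def hard_braking_events(samples, box, cutoff_mps2=-11.0):
    """Return samples inside the box whose acceleration is below the cut-off."""
    return [s for s in samples
            if box.contains(s.lat, s.lon) and s.accel_mps2 < cutoff_mps2]
```

Retaining the samples immediately before or after each event, as the text asks the research panel to consider, would require an additional time-window parameter that is not shown here.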

Questions: What is in the area that may be contributing to the hard-braking? (e.g., stop sign (unexpected), short yellow light, speed-trap, strip malls, hills)

BSM Fields: GPS_Elevation (D), GPS_Latitude (H), GPS_Longitude (I), InVehicle_ABS_State (Q), InVehicle_Brake_Status (R), InVehicle_Longitudinal_Accel (T), InVehicle_Longitudinal_Speed (U), InVehicle_Stability_Control_Status (W), InVehicle_Traction_Control_Status (Z), InVehicle_Yaw_Rate (AD), Lane Track Suite

What alternative data sources exist for this use case?: probe vehicle accelerometer data (e.g., smart phone)

Number of turning movements at intersections

Business Case (why should anyone care): On non-freeway facilities, congestion generally occurs in conjunction with intersections. To mitigate intersection-related congestion, it is essential to understand the performance of the intersection in detail, including turning movements. This may aid in the design of simple interventions (pocket lanes, signal timings) that will provide great benefit.

Use Case (definition): Pre-determined areas (bound by latitude and longitude) of analysis will be established to create the data set. Entries (classified by approach) and passage through the exits can be aggregated to usage stats. A time-series may also be generated based on the additional dimension of time of day.
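One way to turn entries and exits into usage statistics is to classify each pass through the intersection by the change in heading between entry and exit. The sketch below assumes headings are in degrees measured clockwise from north, and the 45 and 135 degree thresholds are arbitrary illustrative choices. The function names and the shape of the input tuples are hypothetical.

```python
from collections import Counter


def heading_change(entry_deg, exit_deg):
    """Signed heading change in degrees, in [-180, 180); positive is clockwise."""
    return (exit_deg - entry_deg + 180.0) % 360.0 - 180.0


def classify_movement(entry_deg, exit_deg, through_limit=45.0, turn_limit=135.0):
    """Label a pass as 'through', 'left', 'right', or 'u-turn'."""
    delta = heading_change(entry_deg, exit_deg)
    if abs(delta) <= through_limit:
        return "through"
    if abs(delta) > turn_limit:
        return "u-turn"
    return "right" if delta > 0 else "left"


def count_movements(passes):
    """Count movements per approach.

    passes: iterable of (approach, entry_heading_deg, exit_heading_deg).
    Returns a Counter keyed by (approach, movement).
    """
    return Counter((approach, classify_movement(entry, exit_))
                   for approach, entry, exit_ in passes)
```

Adding an hour-of-day element to the counter key would give the time-series dimension mentioned above.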

Questions: Is one heading more frequented than another (e.g., east vs. west)? Within a small region, is there a pattern to the usage of certain intersections during one time of day versus another? Are the vehicles that are broadcasting BSMs representative of all vehicles?

BSM Fields: Time (C), GPS_Heading (G), GPS_Latitude (H), GPS_Longitude (I), GPS_Speed (L), DAS_Pitch_Rate (O), DAS_Roll_Rate (P), InVehicle_Longitudinal_Accel (T), InVehicle_Longitudinal_Speed (U), InVehicle_Stability_Control_Status (W), InVehicle_Steering_Position (X), InVehicle_Throttle_Position (Y), InVehicle_Turn_Signal_Left Device (AA), InVehicle_Turn_Signal_Right (AB), InVehicle_Yaw_Rate (AD)

What alternative data sources exist for this use case?: Loop detectors, traffic monitoring cameras, manual turning movement counts

Delays at intersections, by turning movement

Business Case (why should anyone care): Relieving congestion on roadways can help to reduce travel time for people on the road. This may be analyzed by identifying areas where blockages occur, namely intersections that experience delays. A plot of delays at intersections on a map can help identify areas which may benefit from some type of intervention (e.g., a sign reading "no left turn between 6-9 a.m.", a longer timed light in a certain direction during certain hours, or extended pocket lanes).

Use Case (definition): Pre-determined areas (bound by latitude and longitude) of analysis will be established to create the data set. Intersections will be evaluated through distribution analysis and tracking of delays, as indicated by speed or elapsed time over the designated space.
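Delay measured by elapsed time over the designated space can be sketched as the observed traversal time minus a free-flow traversal time. This is an assumption for illustration: the text does not define a delay formula, and the free-flow speed, path length, and function names below are hypothetical inputs. The median is used as one simple summary of the delay distribution for each turning movement.

```python
from collections import defaultdict
from statistics import median


def traversal_delay_s(entry_time_s, exit_time_s, path_length_m, free_flow_mps):
    """Observed traversal time minus free-flow traversal time, floored at zero."""
    observed = exit_time_s - entry_time_s
    free_flow = path_length_m / free_flow_mps
    return max(0.0, observed - free_flow)


def median_delay_by_movement(traversals):
    """Summarize delay per turning movement.

    traversals: iterable of (movement, delay_s).
    Returns a dict mapping each movement to its median delay in seconds.
    """
    grouped = defaultdict(list)
    for movement, delay in traversals:
        grouped[movement].append(delay)
    return {movement: median(delays) for movement, delays in grouped.items()}
```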

Questions: Is one direction of entry to the intersection (and desired exit point) more delayed than another? Does this occurrence shift to a different entry/exit combination at a different time of day? Do lane blockages occur?

BSM Fields: Time (C), GPS_Heading (G), GPS_Latitude (H), GPS_Longitude (I), GPS_Speed (L), DAS_Pitch_Rate (O), DAS_Roll_Rate (P), InVehicle_Brake_Status (R), InVehicle_Longitudinal_Accel (T), InVehicle_Longitudinal_Speed (U), InVehicle_Stability_Control_Status (W), InVehicle_Steering_Position (X), InVehicle_Throttle_Position (Y), InVehicle_Turn_Signal_Left Device (AA), InVehicle_Turn_Signal_Right (AB), InVehicle_Yaw_Rate (AD)

What alternative data sources exist for this use case?: Loop detectors, traffic monitoring cameras, manual turning movement observations

Signal cycle failures (where a green signal fails to clear a queue)

Business Case (why should anyone care): Some intersections are known to take several signal cycles to clear. This can cause drivers to block the intersection and create more congestion. It can also encourage dangerous behavior, like running red lights, if the signal length is known to be long. Identifying these failures could prompt planners to re-evaluate signal timing where appropriate, or to systematically evaluate the downstream effects of adjusting that timing to determine whether changing the setting on one signal has harsher ramifications elsewhere. A plot of signal cycle failure events on a map can help identify areas which may benefit from some type of intervention.

Use Case (definition): Pre-determined areas (bound by latitude and longitude) will be established to create the data set. Wait times can be compared with the associated signal times and plotted as a heat map to indicate where the actual wait time is the same as, greater than, or less than the signal time, in order to identify and categorize each intersection (NOTE: turning movements may have additional implications for this analysis).
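The comparison between wait time and signal time can be sketched as a three-way categorization. The tolerance used to decide when the two are "the same" is an assumption, as are the function names. How wait time and signal time are actually derived from BSM data is not specified in the text and is outside this sketch.

```python
def categorize_wait(wait_s, signal_s, tolerance_s=2.0):
    """Compare an observed wait with the signal time.

    Returns 'same', 'greater', or 'less'; waits within tolerance_s of the
    signal time count as 'same'.
    """
    diff = wait_s - signal_s
    if abs(diff) <= tolerance_s:
        return "same"
    return "greater" if diff > 0 else "less"


def cycle_failure_share(waits_s, signal_s, tolerance_s=2.0):
    """Fraction of observed waits that exceed the signal time."""
    if not waits_s:
        return 0.0
    over = sum(1 for w in waits_s
               if categorize_wait(w, signal_s, tolerance_s) == "greater")
    return over / len(waits_s)
```

The resulting share per intersection is one candidate value for the heat-map plot.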

Questions: Can cameras or loop detectors be installed or utilized to identify in real time when this is happening (after being identified here) so that signals can be adjusted to mitigate the occurrence? Is one direction more prone to this phenomenon than another? Does that change at a different time of day?

BSM Fields: Time (C), GPS_Latitude (H), GPS_Longitude (I), GPS_Speed (L), InVehicle_Brake_Status (R), InVehicle_Longitudinal_Accel (T), InVehicle_Longitudinal_Speed (U), InVehicle_Throttle_Position (Y), InVehicle_Traction_Control_Status (Z), InVehicle_Turn_Signal_Left Device (AA), InVehicle_Turn_Signal_Right (AB)

What models exist that will exploit this use case?: Planning, OpenStreetMap

What alternative data sources exist for this use case?: Traffic monitoring cameras

Queues at known and unknown locations

Business Case (why should anyone care): If people know when queues are likely to occur at given locations, they are likely to change their driving behavior if they have the flexibility to do so. Given this information, people can plan accordingly and travel at a less congested time (e.g., the online DMV wait times for services at different locations). Queues at rush hour leading to a major highway interchange are a reasonable thing to quantify and consider when planning new interchanges in other locations. Queues at other locations can also be indicative of a signal failure, an off ramp, a light, a stop sign, a left-turn or U-turn, or many other cases. This may help developers gauge congestion around an area they are considering for business purposes and consider going elsewhere, because the chance of balking is higher where a previously detected queue indicates existing congestion. It may also be an area for marketers to consider for placing advertisements: if traffic is queued anyway, advertisers can elaborate their message because onlookers have more time to read it. A plot of queues on a map can help identify areas which may benefit from some type of signage about wait time associated with the particular time of day or real-time conditions (e.g., a variable message sign reading 10 miles, xx minutes to I-95).

Use Case (definition): Pre-determined areas (bound by latitude and longitude) of analysis will be established to create the data set.

Questions: Is there some type of attractive nuisance/ curiosity factor contributing to this? How about other types of natural congestion associated with town centers or strips?

BSM Fields: Time (C), GPS_Latitude (H), GPS_Longitude (I), GPS_Speed (L), InVehicle_Brake_Status (R), InVehicle_Longitudinal_Accel (T), InVehicle_Longitudinal_Speed (U), InVehicle_Throttle_Position (Y), InVehicle_Traction_Control_Status (Z), InVehicle_Turn_Signal_Left Device (AA), InVehicle_Turn_Signal_Right (AB)

What alternative data sources exist for this use case?: Loop detectors

Shockwaves on a freeway

Business Case (why should anyone care): Can we determine whether road characteristics are to blame for the origin of shockwaves, whether driver behavior is the more likely cause, or whether a combination of factors contributes to this phenomenon? Mitigating measures, like speed adjustments, could be applied and the efficacy of such approaches evaluated.

Use Case (definition): Pre-determined areas (bound by latitude and longitude) will be established to create the data set.

Questions: Are there identifiable things that cause this as opposed to purely phantom cases? (e.g., billboards, turn-offs, speed traps, hills, turns, exits, etc.)

BSM Fields: Time (C), GPS_Elevation (D), GPS_Heading (G), GPS_Latitude (H), GPS_Longitude (I), GPS_Speed (L), InVehicle_ABS_State (Q), InVehicle_Brake_Status (R), InVehicle_Longitudinal_Accel (T), InVehicle_Longitudinal_Speed (U), InVehicle_Stability_Control_Status (W), InVehicle_Steering_Position (X), InVehicle_Throttle_Position (Y), InVehicle_Traction_Control_Status (Z), InVehicle_Turn_Signal_Left Device (AA), InVehicle_Turn_Signal_Right (AB), InVehicle_Wiper_Status (AC), InVehicle_Yaw_Rate (AD), Lane Track Suite

What alternative data sources exist for this use case?: probe vehicle data

Space mean speed

Business Case (why should anyone care): Route optimization may favor a circuitous route with more mileage over a more direct one if fastest time is desired over shortest distance. When space is divided into a grid, or roads are segmented for evaluation, the space mean speed can indicate usage and normal operating conditions for a stretch of road. Over time, slowdowns can indicate heavier traffic volumes or adverse conditions in which people are exercising more caution and therefore driving slower. A plot of mean speed by space on a map can help determine routing.

Use Case (definition): Pre-determined areas (bound by latitude and longitude) of analysis will be established to create the data set.
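Space mean speed for a segment is the total distance traveled divided by the total time spent traveling it. For a fixed-length segment this equals the harmonic mean of the individual vehicle speeds, which is never greater than the arithmetic (time) mean. A minimal sketch follows, assuming each traversal has already been reduced to a distance and a travel time. The function names and input shape are hypothetical.

```python
from statistics import mean


def space_mean_speed(traversals):
    """Total distance divided by total travel time.

    traversals: iterable of (distance_m, travel_time_s), one per vehicle pass.
    Returns speed in m/s.
    """
    total_distance = 0.0
    total_time = 0.0
    for distance_m, travel_time_s in traversals:
        total_distance += distance_m
        total_time += travel_time_s
    if total_time <= 0.0:
        raise ValueError("total travel time must be positive")
    return total_distance / total_time


def time_mean_speed(traversals):
    """Arithmetic mean of the individual traversal speeds, for comparison."""
    return mean(d / t for d, t in traversals)
```

For example, two vehicles crossing a 1000 m segment in 50 s and 100 s have individual speeds of 20 m/s and 10 m/s. The time mean speed is 15 m/s, while the space mean speed is 2000 m / 150 s, or about 13.3 m/s.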

Questions: If this data is segmented and no device (A)/ trip (B) are given, do we need to worry about PII?

BSM Fields: Time (C), GPS_Heading (G), GPS_Latitude (H), GPS_Longitude (I), GPS_Speed (L), InVehicle_Longitudinal_Accel (T), InVehicle_Longitudinal_Speed (U), InVehicle_Throttle_Position (Y), InVehicle_Wiper_Status (AC)

What alternative data sources exist for this use case?: probe vehicle accelerometer data (e.g., smart phone), GPS devices – when aggregated over time, Waze

Delay

Business Case (why should anyone care):

Use Case (definition):

What constitutes a delay versus an interim destination stop?

Questions: This depends on the answer to the main question

BSM Fields: Time (C), GPS_Latitude (H), GPS_Longitude (I), GPS_Speed (L), InVehicle_Brake_Status (R), InVehicle_Longitudinal_Accel (T), InVehicle_Longitudinal_Speed (U), InVehicle_Throttle_Position (Y)

What alternative data sources exist for this use case?: Waze

Detailed driving cycles (second-by-second speed, acceleration, deceleration), as input to energy/ emissions models

Business Case (why should anyone care): Understanding how people use their cars can help to determine their vehicles' energy usage and, once known, may also help to encourage different driving behaviors. Detailed driving cycles in aggregate may inform energy/emissions models or projections of energy offsets for a given trip or time of travel. When analyzed, this information may also reveal behavior patterns: driver behavior and tendencies that can be generalized from a sample of driving data. For example, hard brakers and quick accelerators may learn that they can reduce their fuel usage through slower acceleration and reduced speed, avoiding the fuel wasted when quick acceleration is followed by quick deceleration. Additionally, this type of collection and analysis may be of great use to insurance companies when assessing which driver characteristics may be linked to a certain risk profile.

Use Case (definition): In addition to the suite of InVehicle instrument readings, it would be helpful to understand external conditions such as elevation changes to determine additional factors which may contribute to greater or less usage of fuel.

Questions: Think holistically; compare to other geographic areas for similarities/ differences.

BSM Fields: All BSM data points may be useful here, especially with extensions of this for use in other areas than energy/ emissions

What alternative data sources exist for this use case?: drive cycles for emissions models, on-board computer data

Origin-destination locations

Business Case (why should anyone care): First Mile – Last Mile information is critical in transportation planning to determine how people's mobility and transportation needs can best be served. Understanding where people are coming from and where they are going supports the design of the best transportation solutions for the people who will use them.

Use Case (definition):

Questions: How important is “first mile/last mile” information to anyone? Perhaps this depends on whether the setting is a private residence in the suburbs or a densely populated city with multiple places of interest.

BSM Fields: Time (C), GPS_Heading (G), GPS_Latitude (H), GPS_Longitude (I)

What alternative data sources exist for this use case?: Voluntary “check-in” data via smart phones can also indicate where people are going, business records of transactions/ sales

Detailed trip routing

Business Case (why should anyone care):

Use Case (definition): Pre-determined areas (bound by latitude and longitude) will be established to create the data set.

Driver/human behavior patterns are of great interest to many fields, not just transportation research. With detailed trip routing information, it is conceivable that consumer behavior, social behavior, and demographic characteristics can all be inferred from a detailed trip route. When PII is removed, there is less value in the data that remains for the researcher.

Questions: Can’t this information be obtained through other means? Is this not already being done by companies like Waze (real time) and Google Maps/MapQuest (statically)?

BSM Fields: All BSM points may be useful here

What alternative data sources exist for this use case?: Voluntary “check-in” data via smart phones can also indicate where people are going

Travel time

Business Case (why should anyone care): If people know how long their intended trip will take at different times of the day, they may change their driving behavior if they have the flexibility to do so. Given this information, people can plan accordingly and travel at a less congested time (e.g., the online DMV wait times for services at different locations). Point-to-point travel time may help determine the most efficient routing for a multi-stop delivery route, if historical data is sufficient to support predictive modeling. Multi-criteria TSP or other types of delivery models could be built to serve the different objectives that commercial (or even private) functions have. A plot of queues on a map can help identify areas which may benefit from some type of signage about wait time associated with the particular time of day or real-time conditions (e.g., leave at 7 a.m. – anticipated travel time: 1 hour 20 minutes; leave at 8:30 a.m. – anticipated travel time: 55 minutes).

Use Case (definition): Pre-determined areas (bound by latitude and longitude) will be established to create the data set.

Questions: Does the total travel time matter if segments can be broken up and then pieced back together? Can we reduce PII exposure with this approach?

BSM Fields: Time (C), GPS_Latitude (H), GPS_Longitude (I), GPS_Speed (L), InVehicle_Brake_Status (R), InVehicle_Longitudinal_Accel (T), InVehicle_Longitudinal_Speed (U)

What alternative data sources exist for this use case?: Loop detectors, traffic cameras, existing probe vehicles