Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementation to read OSM files directly. #63

Merged
merged 2 commits into from
Jun 14, 2020
Merged

Conversation

dkondor
Copy link
Contributor

@dkondor dkondor commented Apr 17, 2020

Implementation based on https://github.com/dkondor/osmconvert/, with simplifications.

I've split the new functionality to a new file to make things cleaner.

It performs the following steps:

  1. Read all OSM nodes in the input and read ways, but filter them to only include those that are suitable for car travel
    (similar to the 'drive' filter in osmnx -- https://github.com/gboeing/osmnx/blob/46ed807f870e3bede78f7e494e98490df470bd42/osmnx/downloader.py#L19 )

  2. Process ways and store a graph between the OSM nodes.

  3. Process the graph by excluding OSM nodes with degree == 2 and creating edges
    between nodes with degree != 2 along chains. Create a LineString for each edge
    that contains the original geometry.

  4. Add these edges to the FMM::NETWORK::Network class.

Limitations:

  • Edges are assigned arbitrary IDs during the processing. This is necessary
    since OSM way IDs are not guaranteed to be unique (a way object can span
    multiple intersections). This way, the IDs in the output are meaningless.
    There should be some way to avoid this problem, either by outputting the
    assigned edge IDs separately or maybe outputting the OSM node IDs in the path
    instead of edge IDs, which are indeed unique and can be used to identify the
    path later.

  • Filtering for ways is not consistent with osmnx -- way_filter class could be
    modified to reproduce the behavior of e.g. the 'drive' filter. Also, there could
    be an option to avoid filtering if it was done by an external program before.

  • It is not possible to cut the network; this needs to be done externally as
    well. Also, no attempt is made to detect connected components; this might be
    better done inside the Network class.

Implementation based on https://github.com/dkondor/osmconvert/, with simplifications.

It performs the following steps:

1. Read all OSM nodes in the input and read ways, but filter them to only include those that are suitable for car travel
(similar to the 'drive' filter in osmnx -- https://github.com/gboeing/osmnx/blob/46ed807f870e3bede78f7e494e98490df470bd42/osmnx/downloader.py#L19 )

2. Process ways and store a graph between the OSM nodes.

3. Process the graph by excluding OSM nodes with degree == 2 and creating edges
between nodes with degree != 2 along chains. Create a LineString for each edge
that contains the original geometry.

4. Add these edges to the FMM::NETWORK::Network class.

Limitations:

 - Edges are assigned arbitrary IDs during the processing. This is necessary
since OSM way IDs are not guaranteed to be unique (a way object can span
multiple intersections). This way, the IDs in the output are meaningless.
There should be some way to avoid this problem, either by outputting the
assigned edge IDs separately or maybe outputting the OSM node IDs in the path
instead of edge IDs, which are indeed unique and can be used to identify the
path later.

 - Filtering for ways is not consistent with osmnx -- way_filter class could be
modified to reproduce the behavior of e.g. the 'drive' filter. Also, there could
be an option to avoid filtering if it was done by an external program before.

 - It is not possible to cut the network; this needs to be done externally as
well. Also, no attempt is made to detect connected components; this might be
better done inside the Network class.
@cyang-kth
Copy link
Owner

@dkondor

Thanks for this quick update. Here are some comments.

  • I find that you place the source code into a file called network_osm_reader.cpp. Since there is no corresponding header, it seems to be a weird way to organize the code as this file and network.cpp file provide implementations of the network.hpp.
    • Either we move everything of network_osm_reader.cpp into the network.cpp file. It may make that file too big or the second choice will be better.
    • We actually create a separate header network_osm_reader.hpp for network_osm_reader.cpp and include that header in the network.cpp
  • About handling of the edge ID, I think it depends on how people want to perform map matching on OSM data. It might be better to keep it consistent with the original ID in the OSM file since the user may still have some other information stored/processed from other resources. But the too long edge is a problem. I think I need to check how other prevalent tools handle this issue before determine a final solution.
  • About the filtering of network edges, there could be an extra option called filter provided to the network configuration class. But I am not quite sure if this will make things overcomplicated or not because preprocessing can still be a clean way to address this issue.
  • Since the branch is not a master branch so it does not matter if I merge it now or latter. I think I can merge it and make some changes then merge into master.
  • Before merging this feature into the master branch, I need to make sure that it can be built successfully on the virtual machine of travis where the icon comes from build

@dkondor
Copy link
Contributor Author

dkondor commented Apr 17, 2020

  • Regarding the files, you can choose either; I didn't want to add to network.cpp so much code at first
  • For node IDs, what I noticed for osmnx is that if creating shapefile output, the 'osmid' field is actually a list of way IDs that belong to that edge, e.g. [628487417, 661665981, 661665982]
  • For filtering, I suggest to have a default road filter (maybe replicating the default behavior from osmnx) and on option to disable it for advanced users. I'll add this to the implementation

default behavior is the same as network_type == 'drive_service' in osmnx

bool flag in the constructor of way_filter and OSMHandler can be set false to disable filtering; this could be made into a commend line parameter for advanced users who have an OSM dataset already filtered
Copy link
Owner

@cyang-kth cyang-kth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dkondor
Some comments are given here about the codes. You do not need to refactor the code as I will separate the header and source code for the osm reader and make some other changes.

For some questions about the code I will still need your clarification or suggestion.


public:
/** test if the given way matches the filters */
bool operator () (const osmium::Way& way) const {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am trying to merge the code. This function is not clear for me.

What do you mean by matching the filter? There are two tags include and exclude, if it matches, should the edge be filtered out or be kept.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps it is inappropriate to use an operator function here, which is a bit unclear. Perhaps rename it as a function like check_include or check_prune

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically the logic is the following:

  1. include_tags are checked: the way must have all the keys in include_tags. If no values are given (empty set), then any value is accepted at this stage. If a set of values are given, only those are accepted.
    Note that in practice, this step only checks that the "highway" tag is present with any value. So we actually don't need a generalized implementation here and this step could be replaced with a simple check for the "highway" tag.
    I'll suggest such a simplified implementation.

  2. exclude_tags are checked: we want to exclude way with specific tag values (key-value pairs), e.g. highway=footway is not suitable for driving

In the generalized implementation, a tag can be in both include_tags and exclude_tags. For highway, the first step makes it sure that it is present (with any value) and then the second step filters out values that are not suitable for driving.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I agree to rename the function if that makes it's purpose more clear :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dkondor@8756815
I suggest this as a simplification of way_filter -- it eliminates include_tags, since that is only used for testing for the presence of the "highway" tag key.
(note: I have not pushed it to this branch yet -- you can decide if this is an improvement here)

* where n is the size of include_tags, and m is the number of tags
* the way has; since the size of include_tags will be typically
* small (1), this will not be a practical problem */
for(const auto& p : include_tags) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part is also weird.
If the way finds an include tag and does not find it. Then false is returned (excluded).
What if the way contains one of the include tags?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As before: if a way does not contain an include tag, it should be filtered out (the reasoning is the all roads should have the "highway" tag).
Either we want to filter for specific values or just the general presence of a tag with any value. I agree this is too general for this case, so it can be a good idea to simplify it.

int dir = is_oneway(way);

/* process all nodes in this way */
osmium::unsigned_object_id_type last_node_id = 18446744073709551615UL;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the meaning of this special ID?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is the largest possible value, which I assume no OSM node has (I believe I read it is invalid value).
If it makes more sense, this code could be refactored to have a separate bool flag to determine if we have processed the first node already or to get the first node ID separately outside of the loop.
(The point is that the loop has N iterations, but we have only N-1 edges, so on the first iteration, we cannot add a new edge yet in the current logic)

### 1. Download test data

```
mkdir osm_test
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will create a separate folder in this path example/osm for the osm demo and move this readme to that folder.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!

```
mkdir osm_test
cd osm_test
wget https://www.dropbox.com/s/a3d8hmtkflckw10/fmm_osm_test_data.zip
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one seems to be a personal shared folder. Perhaps it is better to use a permanent link from OSM to download the data.

As we are demonstrating OSM example. the shapefile is not needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I meant this as a quick test, for a case where I know that the shapefile and the OSM file should give the exact same result.

I agree it is not suitable for example code.

I suggest that the example code can only work on OSM data and leave out the shapefile part.

We could additionally have a more complete test case with instructions to download OSM data and either extract a shapefile with osmnx or use the OSM data directly. It could be a good to have a testcase that is expected to work in general and give exact same output, so that the OSM reader code can be validated.

Since doing the test case in the Dropbox, I changed the filters to match the behavior of osmnx, so now a test based on an osmnx shapefile should be feasible.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I have created a separate repo https://github.com/cyang-kth/osm_mapmatching for instructions about working with OSM data where I use OSMNX and PostGIS for preprocessing.

Perhaps some command-line instructions to process osm file should also be placed there. A link can be provided here.

../../../build/stmatch --network ways.shp --gps test_points.csv --output stmatch_result.csv -k 4 -r 0.4 -e 0.5 --gps_geom WKT --gps_id id --source from --target to
```

### 3. Run stmatch with OSM input
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will also perhaps provide a jupyter-notebook here as an example.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps also add a figure for illustration.


Test data contains:
- OSM data for Singapore (defined by bounding box)
- Road network in shapefile format (converted from OSM with https://github.com/dkondor/osmconvert/ )
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As osmconvert is already used for this tool https://wiki.openstreetmap.org/wiki/Osmconvert.
Creating a routable road network from OSM is always a challenging problem. I suggest that maybe you can create a more powerful tool called osm2network which addresses the problems we have discussed. This is just a suggestion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for noting this, I'll rename my code to distinguish -- of course it is a work in progress, so far I use it mainly to create TSV format networks, and will add features as I need them :)

* This function identifies the edges and adds them to an
* FMM::NETWORK::Network class.
*/
void generate_simplified_graph(Network& network) const {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the filter is always needed but creating simplified graph should also be designed as one option. As sometimes merging the edges will lose some detailed edge information.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree.

I see there could be three options:

-- current behavior, edges are defined between OSM nodes that have degree != 2
-- similar, but an edge can contain only one OSM way, so a graph node is kept if the way ID changes on a path even if this node has degree == 2; this way, each edge has only one OSM way associated with it, but an OSM way might correspond to multiple edges (if there is an intersection along it)
-- OSM ways correspond directly to edges: this is the simplest case (no need to store a temporary graph), but could result in problems if a way includes an intersection in the middle

I think the current behavior is good for users who want it to work "out of the box" for any OSM network they download, but it presents the problem of relabeling edges. I think one way to deal with that is to have an option to output OSM node IDs for a path instead of edge ID, since those need not be relabeled (if using 64-bit node IDs, see below).

It could be good to give options for more advanced users who will want to perform filtering or editing themselves with external tools. I can implement the above three options that could be selected with further arguments in the constructor.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before you try to integrate this feature into the constructor, I am a bit concerned about it as it will make this configuration over-complicated. We want to make it both general and simple.

  • I think whenever you need to change the network data, that should go to a preprocessing stage.
  • For the problems with the OSM data, if the user really needs to do such a cleaning task, then it is better to switch to creating a shapefile as input rather than using OSM file as input. That also addresses the problem saving extra output information.
  • What I expect for reading OSM file is very simple. The user just downloads an OSM file and provides it directly to the program and we can run map matching on it. That is also why I am more positive about providing filter command instructions for the user to filter out the data in need.
  • Perhaps in some regions, the network quality is much higher, like this one used https://map.project-osrm.org/ or the problem does not exist or in the future it is corrected by the users. I think it is better for the network processing functions to go to a preprocessing stage.
  • That will make this program inherent and it will also be much easier to be maintained.
  • I think this feature only concerns with reading OSM data. Preprocessing or cleaning road network should be treated as another feature or perhaps a separate app called osm2network can be provided for the user.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point. In this case, a much simplified version will be sufficient, more in line with your original implementation. In this case, I think the main change needed is to ensure 64-bit IDs are used everywhere where OSM IDs are encountered. You can of course refer people to my repository if they need to preprocess the network differently :)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think that is a good solution. The filtering of OSM ways is still needed here. Too complicated preprocessing such as simplifying graph can be removed here. But they can go to your program as a function.

@@ -19,7 +19,7 @@
namespace FMM {
namespace NETWORK{

typedef int NodeID; /**< Node ID in the network, can be discontinuous int */
typedef uint64_t NodeID; /**< Node ID in the network, can be discontinuous int */
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will think more about this part.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I needed to change this since OSM node IDs are 64-bit. At the same time, I understand this might introduce other issues (if other places still expect 32-bit integer or in format strings when writing output).

Alternatively, we could have an extra mapping from 64-bit to 32-bit node IDs (e.g. as an std::unordered_map), but that will also add complexity.

I think it would add convenience for users if the output could list the original OSM node IDs along the path, so that there is no extra relabeling step that needs to be considered.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this issue, if you check this matched information output https://fmm-wiki.github.io/docs/documentation/output/
I think one extra field should be designed there so you can easily export the node list. Then it becomes a problem whether you need to the start node ID or end node id of each edge.



/** Helper class to filter out OSM ways that are not roads */
class way_filter {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To keep it consistent with the styles, I will rename the class to WayFilter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I agree

@cyang-kth cyang-kth merged commit f19cde7 into cyang-kth:osm Jun 14, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants