-
Notifications
You must be signed in to change notification settings - Fork 211
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implementation to read OSM files directly. #63
Conversation
Implementation based on https://github.com/dkondor/osmconvert/, with simplifications. It performs the following steps: 1. Read all OSM nodes in the input and read ways, but filter them to only include those that are suitable for car travel (similar to the 'drive' filter in osmnx -- https://github.com/gboeing/osmnx/blob/46ed807f870e3bede78f7e494e98490df470bd42/osmnx/downloader.py#L19 ) 2. Process ways and store a graph between the OSM nodes. 3. Process the graph by excluding OSM nodes with degree == 2 and creating edges between nodes with degree != 2 along chains. Create a LineString for each edge that contains the original geometry. 4. Add these edges to the FMM::NETWORK::Network class. Limitations: - Edges are assigned arbitrary IDs during the processing. This is necessary since OSM way IDs are not guaranteed to be unique (a way object can span multiple intersections). This way, the IDs in the output are meaningless. There should be some way to avoid this problem, either by outputting the assigned edge IDs separately or maybe outputting the OSM node IDs in the path instead of edge IDs, which are indeed unique and can be used to identify the path later. - Filtering for ways is not consistent with osmnx -- way_filter class could be modified to reproduce the behavior of e.g. the 'drive' filter. Also, there could be an option to avoid filtering if it was done by an external program before. - It is not possible to cut the network; this needs to be done externally as well. Also, no attempt is made to detect connected components; this might be better done inside the Network class.
Thanks for this quick update. Here are some comments.
|
|
default behavior is the same as network_type == 'drive_service' in osmnx bool flag in the constructor of way_filter and OSMHandler can be set false to disable filtering; this could be made into a commend line parameter for advanced users who have an OSM dataset already filtered
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dkondor
Some comments are given here about the codes. You do not need to refactor the code as I will separate the header and source code for the osm reader and make some other changes.
For some questions about the code I will still need your clarification or suggestion.
|
||
public: | ||
/** test if the given way matches the filters */ | ||
bool operator () (const osmium::Way& way) const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am trying to merge the code. This function is not clear for me.
What do you mean by matching the filter? There are two tags include and exclude, if it matches, should the edge be filtered out or be kept.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps it is inappropriate to use an operator function here, which is a bit unclear. Perhaps rename it as a function like check_include
or check_prune
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically the logic is the following:
-
include_tags are checked: the way must have all the keys in include_tags. If no values are given (empty set), then any value is accepted at this stage. If a set of values are given, only those are accepted.
Note that in practice, this step only checks that the "highway" tag is present with any value. So we actually don't need a generalized implementation here and this step could be replaced with a simple check for the "highway" tag.
I'll suggest such a simplified implementation. -
exclude_tags are checked: we want to exclude way with specific tag values (key-value pairs), e.g. highway=footway is not suitable for driving
In the generalized implementation, a tag can be in both include_tags and exclude_tags. For highway, the first step makes it sure that it is present (with any value) and then the second step filters out values that are not suitable for driving.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I agree to rename the function if that makes it's purpose more clear :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dkondor@8756815
I suggest this as a simplification of way_filter -- it eliminates include_tags, since that is only used for testing for the presence of the "highway" tag key.
(note: I have not pushed it to this branch yet -- you can decide if this is an improvement here)
* where n is the size of include_tags, and m is the number of tags | ||
* the way has; since the size of include_tags will be typically | ||
* small (1), this will not be a practical problem */ | ||
for(const auto& p : include_tags) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part is also weird.
If the way finds an include tag and does not find it. Then false is returned (excluded).
What if the way contains one of the include tags?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As before: if a way does not contain an include tag, it should be filtered out (the reasoning is the all roads should have the "highway" tag).
Either we want to filter for specific values or just the general presence of a tag with any value. I agree this is too general for this case, so it can be a good idea to simplify it.
int dir = is_oneway(way); | ||
|
||
/* process all nodes in this way */ | ||
osmium::unsigned_object_id_type last_node_id = 18446744073709551615UL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the meaning of this special ID?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is the largest possible value, which I assume no OSM node has (I believe I read it is invalid value).
If it makes more sense, this code could be refactored to have a separate bool flag to determine if we have processed the first node already or to get the first node ID separately outside of the loop.
(The point is that the loop has N iterations, but we have only N-1 edges, so on the first iteration, we cannot add a new edge yet in the current logic)
### 1. Download test data | ||
|
||
``` | ||
mkdir osm_test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will create a separate folder in this path example/osm
for the osm demo and move this readme to that folder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great!
``` | ||
mkdir osm_test | ||
cd osm_test | ||
wget https://www.dropbox.com/s/a3d8hmtkflckw10/fmm_osm_test_data.zip |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one seems to be a personal shared folder. Perhaps it is better to use a permanent link from OSM to download the data.
As we are demonstrating OSM example. the shapefile is not needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I meant this as a quick test, for a case where I know that the shapefile and the OSM file should give the exact same result.
I agree it is not suitable for example code.
I suggest that the example code can only work on OSM data and leave out the shapefile part.
We could additionally have a more complete test case with instructions to download OSM data and either extract a shapefile with osmnx or use the OSM data directly. It could be a good to have a testcase that is expected to work in general and give exact same output, so that the OSM reader code can be validated.
Since doing the test case in the Dropbox, I changed the filters to match the behavior of osmnx, so now a test based on an osmnx shapefile should be feasible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I have created a separate repo https://github.com/cyang-kth/osm_mapmatching for instructions about working with OSM data where I use OSMNX and PostGIS for preprocessing.
Perhaps some command-line instructions to process osm file should also be placed there. A link can be provided here.
../../../build/stmatch --network ways.shp --gps test_points.csv --output stmatch_result.csv -k 4 -r 0.4 -e 0.5 --gps_geom WKT --gps_id id --source from --target to | ||
``` | ||
|
||
### 3. Run stmatch with OSM input |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will also perhaps provide a jupyter-notebook here as an example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps also add a figure for illustration.
|
||
Test data contains: | ||
- OSM data for Singapore (defined by bounding box) | ||
- Road network in shapefile format (converted from OSM with https://github.com/dkondor/osmconvert/ ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As osmconvert is already used for this tool https://wiki.openstreetmap.org/wiki/Osmconvert.
Creating a routable road network from OSM is always a challenging problem. I suggest that maybe you can create a more powerful tool called osm2network which addresses the problems we have discussed. This is just a suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for noting this, I'll rename my code to distinguish -- of course it is a work in progress, so far I use it mainly to create TSV format networks, and will add features as I need them :)
* This function identifies the edges and adds them to an | ||
* FMM::NETWORK::Network class. | ||
*/ | ||
void generate_simplified_graph(Network& network) const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the filter is always needed but creating simplified graph should also be designed as one option. As sometimes merging the edges will lose some detailed edge information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree.
I see there could be three options:
-- current behavior, edges are defined between OSM nodes that have degree != 2
-- similar, but an edge can contain only one OSM way, so a graph node is kept if the way ID changes on a path even if this node has degree == 2; this way, each edge has only one OSM way associated with it, but an OSM way might correspond to multiple edges (if there is an intersection along it)
-- OSM ways correspond directly to edges: this is the simplest case (no need to store a temporary graph), but could result in problems if a way includes an intersection in the middle
I think the current behavior is good for users who want it to work "out of the box" for any OSM network they download, but it presents the problem of relabeling edges. I think one way to deal with that is to have an option to output OSM node IDs for a path instead of edge ID, since those need not be relabeled (if using 64-bit node IDs, see below).
It could be good to give options for more advanced users who will want to perform filtering or editing themselves with external tools. I can implement the above three options that could be selected with further arguments in the constructor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before you try to integrate this feature into the constructor, I am a bit concerned about it as it will make this configuration over-complicated. We want to make it both general and simple.
- I think whenever you need to change the network data, that should go to a preprocessing stage.
- For the problems with the OSM data, if the user really needs to do such a cleaning task, then it is better to switch to creating a shapefile as input rather than using OSM file as input. That also addresses the problem saving extra output information.
- What I expect for reading OSM file is very simple. The user just downloads an OSM file and provides it directly to the program and we can run map matching on it. That is also why I am more positive about providing filter command instructions for the user to filter out the data in need.
- Perhaps in some regions, the network quality is much higher, like this one used https://map.project-osrm.org/ or the problem does not exist or in the future it is corrected by the users. I think it is better for the network processing functions to go to a preprocessing stage.
- That will make this program inherent and it will also be much easier to be maintained.
- I think this feature only concerns with reading OSM data. Preprocessing or cleaning road network should be treated as another feature or perhaps a separate app called
osm2network
can be provided for the user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your point. In this case, a much simplified version will be sufficient, more in line with your original implementation. In this case, I think the main change needed is to ensure 64-bit IDs are used everywhere where OSM IDs are encountered. You can of course refer people to my repository if they need to preprocess the network differently :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think that is a good solution. The filtering of OSM ways is still needed here. Too complicated preprocessing such as simplifying graph can be removed here. But they can go to your program as a function.
@@ -19,7 +19,7 @@ | |||
namespace FMM { | |||
namespace NETWORK{ | |||
|
|||
typedef int NodeID; /**< Node ID in the network, can be discontinuous int */ | |||
typedef uint64_t NodeID; /**< Node ID in the network, can be discontinuous int */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will think more about this part.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I needed to change this since OSM node IDs are 64-bit. At the same time, I understand this might introduce other issues (if other places still expect 32-bit integer or in format strings when writing output).
Alternatively, we could have an extra mapping from 64-bit to 32-bit node IDs (e.g. as an std::unordered_map), but that will also add complexity.
I think it would add convenience for users if the output could list the original OSM node IDs along the path, so that there is no extra relabeling step that needs to be considered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this issue, if you check this matched information output https://fmm-wiki.github.io/docs/documentation/output/
I think one extra field should be designed there so you can easily export the node list. Then it becomes a problem whether you need to the start node ID or end node id of each edge.
|
||
|
||
/** Helper class to filter out OSM ways that are not roads */ | ||
class way_filter { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To keep it consistent with the styles, I will rename the class to WayFilter
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I agree
Implementation based on https://github.com/dkondor/osmconvert/, with simplifications.
I've split the new functionality to a new file to make things cleaner.
It performs the following steps:
Read all OSM nodes in the input and read ways, but filter them to only include those that are suitable for car travel
(similar to the 'drive' filter in osmnx -- https://github.com/gboeing/osmnx/blob/46ed807f870e3bede78f7e494e98490df470bd42/osmnx/downloader.py#L19 )
Process ways and store a graph between the OSM nodes.
Process the graph by excluding OSM nodes with degree == 2 and creating edges
between nodes with degree != 2 along chains. Create a LineString for each edge
that contains the original geometry.
Add these edges to the FMM::NETWORK::Network class.
Limitations:
Edges are assigned arbitrary IDs during the processing. This is necessary
since OSM way IDs are not guaranteed to be unique (a way object can span
multiple intersections). This way, the IDs in the output are meaningless.
There should be some way to avoid this problem, either by outputting the
assigned edge IDs separately or maybe outputting the OSM node IDs in the path
instead of edge IDs, which are indeed unique and can be used to identify the
path later.
Filtering for ways is not consistent with osmnx -- way_filter class could be
modified to reproduce the behavior of e.g. the 'drive' filter. Also, there could
be an option to avoid filtering if it was done by an external program before.
It is not possible to cut the network; this needs to be done externally as
well. Also, no attempt is made to detect connected components; this might be
better done inside the Network class.