Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Plot point type geopandas.GeoDataFrames as points directly instead of as lines #1373

Closed
weiji14 opened this issue Jul 6, 2021 · 20 comments · Fixed by #1405 or #1438
Closed

Plot point type geopandas.GeoDataFrames as points directly instead of as lines #1373

weiji14 opened this issue Jul 6, 2021 · 20 comments · Fixed by #1405 or #1438
Assignees
Labels
feature request New feature wanted scipy-sprint Things to work on during the SciPy sprint!

Comments

@weiji14
Copy link
Member

weiji14 commented Jul 6, 2021

Description of the desired feature

⚠️ Note to SciPy 2021 sprinters: this is a bit more of an advanced feature enhancement request, which you are happy to help with if you good with GeoPandas and PyGMT. But if not, please take a look at other scipy-sprint issues which are easier ⚠️

PyGMT added initial support for plotting geopandas.GeoDataFrame objects in #1000 (see also #608). However, GMT by default will plot lines even if you give it points like so:

import geopandas as gpd
import pygmt

cities: gpd.GeoDataFrame = gpd.read_file(gpd.datasets.get_path("naturalearth_cities"))
cities.head()
# 	name	geometry
# 0	Vatican City	POINT (12.45339 41.90328)
# 1	San Marino	POINT (12.44177 43.93610)
# 2	Vaduz	POINT (9.51667 47.13372)
# 3	Luxembourg	POINT (6.13000 49.61166)
# 4	Palikir	POINT (158.14997 6.91664)

fig = pygmt.Figure()
fig.plot(data=cities, color="black")
fig.show()

cities as lines

Having lines show up can be counter-intuitive, especially to new users unfamiliar with how GMT works. The workaround is to use fig.plot(data=cities, style="s0.2c", color="black") so that circles are plotted instead of lines. Ideally however, a POINT based GeoDataFrame should just be plotted as that, points.

cities as points

What needs to be done 🧐

To enable point GeoDataFrames to be plotted directly as points, there are probably a few ways we can update PyGMT's code.

Solution 1

Update the plot.py code to add a default style when points are given but no style/-S argument is given. This would involve changing the code somewhere along here:

pygmt/pygmt/src/plot.py

Lines 210 to 211 in d90b3fc

if "S" in kwargs and kwargs["S"][0] in "vV" and direction is not None:
extra_arrays.extend(direction)

Note that the code for plot3d would also need to be updated too at:

pygmt/pygmt/src/plot3d.py

Lines 179 to 180 in d90b3fc

if "S" in kwargs and kwargs["S"][0] in "vV" and direction is not None:
extra_arrays.extend(direction)

Solution 2

Alternatively, we could set the style in the temporary OGR_GMT file somehow, which would involve modifying the code somewhere along here:

# Using geopandas.to_file to directly export to OGR_GMT format
geojson.to_file(**ogrgmt_kwargs)

Solution 3

Anyone have a better idea? Let's discuss!

Are you willing to help implement and maintain this feature? Happy to mentor someone on tackling this issue.

@weiji14 weiji14 added question Further information is requested feature request New feature wanted scipy-sprint Things to work on during the SciPy sprint! labels Jul 6, 2021
@yohaimagen
Copy link
Contributor

I think I can do that. I'll prefer Solution 2, it seems cleaner and does not involve the plot function that is used in many cases that do not involve pointed Geopandas DataFrame.

@weiji14
Copy link
Member Author

weiji14 commented Aug 1, 2021

Cool, I've assigned you to the task, feel free to open a Pull Request to work on it and let us know if you need any help 😁

@yohaimagen
Copy link
Contributor

I am trying to find a way to specify the style attributes in the temporary OGR_GMT file looking in the following GMT doc page and didn't find the solution. any suggestion on how to set the style in the temporary OGR_GMT file?

@weiji14
Copy link
Member Author

weiji14 commented Aug 1, 2021

Good question 😅 I'm not sure if it works to be honest, but I know .gmt files can have header lines setting the region @R, projection @J and so on (see e.g. https://github.com/weiji14/nzasc2021/blob/v1.0.0/antarctic_subglacial_lakes_3031.gmt), so I though maybe the style can be set using @S? Another reference to use would be https://docs.generic-mapping-tools.org/6.2/cookbook/ogrgmt-format.html. See if it works, and if not, we might need to think of another way.

@yohaimagen
Copy link
Contributor

yohaimagen commented Aug 1, 2021

So, I tried already your suggestion and you can not set -S in the header. trying to set "# @sc1c" for example will lead to the following error:
plot [ERROR]: Bad OGR/GMT: @s not allowed before FEATURE_DATA
from https://docs.generic-mapping-tools.org/6.2/cookbook/ogrgmt-format.html we should set geometry with "# @g" option.
I don't know if it is a GMT issue but with the following simple script

gmt begin temp
    gmt basemap -R0/10/0/10 -JX10 -Bf1
    gmt plot ttt.gmt
gmt end show

with ttt.gmt being

# @VGMT1.0
# @GPOINT
# @R1/5/1/6UB       
# @Nx|y
# @Tstring|string
# FEATURE_DATA @S1c1
# @D1|6 
1 6
# @D2|3
2 3
# @D3|2
3 2
# @D4|1
4 1
# @D5|3
5 3

we get

temp.pdf

and when "# @g" set to POLYGON as follow

# @VGMT1.0
# @GPOLYGON
# @R1/5/1/6UB       
# @Nx|y
# @Tstring|string
# FEATURE_DATA @S1c1
# @D1|6 
1 6
# @D2|3
2 3
# @D3|2
3 2
# @D4|1
4 1
# @D5|3
5 3

we get

temp.pdf

@weiji14
Copy link
Member Author

weiji14 commented Aug 1, 2021

Hmm I see. As I understand it, GMT defaults to plotting lines (even when using @GPOINT), and there's probably a historic reason for this default behaviour.

Since @S doesn't work, we might need to go with Option 1 then and modify the code in plot.py. Sorry for taking you down the wrong rabbit hole!

@seisman
Copy link
Member

seisman commented Aug 8, 2021

As I understand it, GMT defaults to plotting lines (even when using @GPOINT), and there's probably a historic reason for this default behaviour.

Ping @PaulWessel for comments on this before we merge #1405.

@PaulWessel
Copy link
Member

This behavior predates our understanding of GIS and geometries. However, GMT is generic and if the user wants to plot points, lines, or polygons then that is up the them, not a data set. So I think the only real solution here is for PyGMT to decide if you want to intervene and impose geometry rules.

@seisman
Copy link
Member

seisman commented Aug 9, 2021

I agree with @weiji14 that it's more intuitive to plot symbols rather than lines if the data geometry is Point or MultiPoint.

I'm not familiar with geopandas, but my big concern is the inconsistency between passing a GeoDataFrame and passing a file.

If I understand it correctly, after PR #1405 is merged, when a GeoDataFrames with a Point/MultiPoint geometry is passed, the data points will be plotted as symbols (square in 2D and cube in 3D). However, if someone saves the GeoDataFrame into a OGR/GMT file, and passes the file to PyGMT, data points will be plotted as lines. Am I understanding it right?

@weiji14
Copy link
Member Author

weiji14 commented Aug 10, 2021

If I understand it correctly, after PR #1405 is merged, when a GeoDataFrames with a Point/MultiPoint geometry is passed, the data points will be plotted as symbols (square in 2D and cube in 3D).

Yes, this is correct, and is probably more intuitive for a user used to plotting Point/MultiPoint geometries as point symbols.

However, if someone saves the GeoDataFrame into a OGR/GMT file, and passes the file to PyGMT, data points will be plotted as lines. Am I understanding it right?

The problem here is that an OGR/GMT file with a header line containing @GPOINT won't be plotted as point symbols (in GMT or PyGMT) but as lines. It may well be considered a 'bug', or a 'feature' if backwards compatibility is desired.

@seisman
Copy link
Member

seisman commented Aug 10, 2021

However, if someone saves the GeoDataFrame into a OGR/GMT file, and passes the file to PyGMT, data points will be plotted as lines. Am I understanding it right?

The problem here is that an OGR/GMT file with a header line containing @GPOINT won't be plotted as point symbols (in GMT or PyGMT) but as lines. It may well be considered a 'bug', or a 'feature' if backwards compatibility is desired.

As answered by @PaulWessel in #1373 (comment), it's likely that GMT won't change the behavior.

@weiji14
Copy link
Member Author

weiji14 commented Aug 10, 2021

However, if someone saves the GeoDataFrame into a OGR/GMT file, and passes the file to PyGMT, data points will be plotted as lines. Am I understanding it right?

The problem here is that an OGR/GMT file with a header line containing @GPOINT won't be plotted as point symbols (in GMT or PyGMT) but as lines. It may well be considered a 'bug', or a 'feature' if backwards compatibility is desired.

As answered by @PaulWessel in #1373 (comment), it's likely that GMT won't change the behavior.

That's ok. So the question is, do we prefer PyGMT to:

  1. Plot Point/MultiPoint as lines for both geopandas.GeoDataFrame and OGR/GMT file (current behaviour in PyGMT v0.4.1)
  2. Plot Point/MultiPoint as squares/cubes for geopandas.GeoDataFrame and as lines for OGR/GMT file (behaviour after merging Plot square or cube by default for geopandas Point/MultiPoint types #1405)
  3. Plot Point/MultiPoint as squares/cubes for both geopandas.GeoDataFrame and OGR/GMT files (something that needs more work after Plot square or cube by default for geopandas Point/MultiPoint types #1405).

@PaulWessel
Copy link
Member

Well, we can discuss this. There is a difference between a plain data file with coordinates and a shapefile or OGR/GMT file that says it has points. I could entertain suggestions to have plot and plot3d give errors if the -S is not selected when geometry is points.

@seisman
Copy link
Member

seisman commented Aug 10, 2021

So the question is, do we prefer PyGMT to:

  1. Plot Point/MultiPoint as lines for both geopandas.GeoDataFrame and OGR/GMT file (current behaviour in PyGMT v0.4.1)
  2. Plot Point/MultiPoint as squares/cubes for geopandas.GeoDataFrame and as lines for OGR/GMT file (behaviour after merging Plot square or cube by default for geopandas Point/MultiPoint types #1405)
  3. Plot Point/MultiPoint as squares/cubes for both geopandas.GeoDataFrame and OGR/GMT files (something that needs more work after Plot square or cube by default for geopandas Point/MultiPoint types #1405).

Option 2 definitely will cause some confusion. For option 3, the problem is, how can PyGMT determines if a text file is a plain text file or an OGR/GMT file. If it's an OGR/GMT file, how can PyGMT know its geometry? As I understand it, PyGMT has to read the file to detect its format and geometry, which is very inefficient if a file is large.

@yohaimagen
Copy link
Contributor

how can PyGMT determines if a text file is a plain text file or an OGR/GMT file. If it's an OGR/GMT file,

the major difference between a plain text file and a GMT file is a GMT file contains a header.

how can PyGMT know its geometry? As I understand it, PyGMT has to read the file to detect its format and geometry, which is very inefficient if a file is large.

Only the header is needed to understand the geometry, so only a few lines are needed.

The issue is how to deal with the other acceptable file formats.

@weiji14
Copy link
Member Author

weiji14 commented Aug 11, 2021

how can PyGMT determines if a text file is a plain text file or an OGR/GMT file. If it's an OGR/GMT file,

the major difference between a plain text file and a GMT file is a GMT file contains a header.

how can PyGMT know its geometry? As I understand it, PyGMT has to read the file to detect its format and geometry, which is very inefficient if a file is large.

Only the header is needed to understand the geometry, so only a few lines are needed.

The issue is how to deal with the other acceptable file formats.

Ok, so for Option 3, how about we detect if a Point type OGR/GMT file is passed in using the following conditions:

If both these conditions are met, we then set the default style -S as a square or cube for plot and plot3d respectively in PyGMT. Otherwise, it will just fallback to the GMT default style of using lines.

@seisman
Copy link
Member

seisman commented Aug 11, 2021

so for Option 3, how about we detect if a Point type OGR/GMT file is passed in using the following conditions:

If both these conditions are met, we then set the default style -S as a square or cube for plot and plot3d respectively in PyGMT. Otherwise, it will just fallback to the GMT default style of using lines.

Sounds a good compromise to me.

@yohaimagen
Copy link
Contributor

O.K, should I add it at #1405 or on a different pull request?

@weiji14
Copy link
Member Author

weiji14 commented Aug 11, 2021

O.K, should I add it at #1405 or on a different pull request?

Please put it in a separate Pull Request. Oh and btw @yohaimagen, since you belong to the pygmt-contributors team now, it would be easier if you can create new Pull Requests which are branched off of https://github.com/GenericMappingTools/pygmt instead of your own fork at https://github.com/yohaimagen/pygmt. Just makes it easier for the PyGMT team to checkout your work (and prevent the dvc workflow failure mentioned at #1405 (comment)).

@weiji14
Copy link
Member Author

weiji14 commented Aug 17, 2021

Reopening as we need to handle OGR/GMT files too, being done at #1438, but see also upstream GMT issue at GenericMappingTools/gmt#5629.

@weiji14 weiji14 reopened this Aug 17, 2021
@weiji14 weiji14 removed the question Further information is requested label Aug 25, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature wanted scipy-sprint Things to work on during the SciPy sprint!
Projects
None yet
4 participants