Geocoding 2.0 #304

Merged
merged 79 commits into from
Jan 29, 2021
Changes from 1 commit
Commits
173d618
Put parents into regions dataframe and geodataframe
IKupriyanov-HORIS Sep 11, 2020
c8f8d1b
Implementing parents in regions_builder.
IKupriyanov-HORIS Sep 14, 2020
1fccca9
Merge branch 'master' into geoservices-1.1.0
IKupriyanov-HORIS Sep 14, 2020
6fef609
WIP: parents
IKupriyanov-HORIS Sep 15, 2020
9b5ae98
Fix tests, sync protocol with server
IKupriyanov-HORIS Sep 15, 2020
38f1db7
Fix more tests
IKupriyanov-HORIS Sep 16, 2020
54b4457
Fix more tests
IKupriyanov-HORIS Sep 25, 2020
299b6aa
New server IP in demo
IKupriyanov-HORIS Sep 25, 2020
701f78c
Merge branch 'master' into geoservices-1.1.0
IKupriyanov-HORIS Oct 5, 2020
a37a3f5
WIP: answers in protocol
IKupriyanov-HORIS Oct 14, 2020
f6ca8a3
WIP: moving to Answers instead of flat list of GeocodedFeature
IKupriyanov-HORIS Oct 14, 2020
52e7c1f
WIP: link between request and response
IKupriyanov-HORIS Oct 15, 2020
be18cc5
Getting rid of a query string from Answer and GeocodedFeature
IKupriyanov-HORIS Oct 17, 2020
e8219a8
Remove query from Answer and GeocodedFeature
IKupriyanov-HORIS Oct 19, 2020
326550b
Fix assertion condition
IKupriyanov-HORIS Oct 20, 2020
f6fc7c8
Remove chunked requests, better docs for new API
IKupriyanov-HORIS Oct 20, 2020
cbc1745
WIP: scope
IKupriyanov-HORIS Oct 21, 2020
02aa0ba
Merge branch 'master' into geoservices-1.1.0
IKupriyanov-HORIS Oct 21, 2020
cca501d
Update example with counties
IKupriyanov-HORIS Oct 22, 2020
91b027a
Update example with counties
IKupriyanov-HORIS Oct 22, 2020
21ec386
Basic support for scope in the new geocoding API
IKupriyanov-HORIS Oct 22, 2020
6f99bbe
Merge branch 'master' into geoservices-1.1.0
IKupriyanov-HORIS Oct 23, 2020
60968ca
Default scope value to None
IKupriyanov-HORIS Oct 26, 2020
f44d979
Unit tests for new geocoding API
IKupriyanov-HORIS Oct 28, 2020
9ed17b0
Merge branch 'master' into geoservices-1.1.0
IKupriyanov-HORIS Oct 28, 2020
539ef98
Merge branch 'master' into geoservices-1.1.0
IKupriyanov-HORIS Oct 28, 2020
4345821
Request validation, tests
IKupriyanov-HORIS Oct 28, 2020
648eef7
Experiment with map and nbviewer
IKupriyanov-HORIS Oct 29, 2020
d492167
Experiment with map and nbviewer and stamen tiles
IKupriyanov-HORIS Oct 29, 2020
bc846fa
Better parents/scope support, more tests
IKupriyanov-HORIS Oct 30, 2020
dcfd17b
where(..., near=..., within=...)
IKupriyanov-HORIS Nov 2, 2020
41e0ce1
Better error handling
IKupriyanov-HORIS Nov 3, 2020
26804b4
Tests for not available features in new API
IKupriyanov-HORIS Nov 4, 2020
fbd7c50
Better error handling for where function (missing key, scope and othe…
IKupriyanov-HORIS Nov 5, 2020
c444036
map_join with multikeys
IKupriyanov-HORIS Nov 14, 2020
0e213d9
Merge branch 'master' into geoservices-1.1.0
IKupriyanov-HORIS Nov 14, 2020
f7f5df8
Single key data_join_on and map_join_on.
IKupriyanov-HORIS Nov 15, 2020
a31ca74
Multi key data_join_on and map_join_on.
IKupriyanov-HORIS Nov 16, 2020
1f69608
Merge branch 'master' into geoservices-1.1.0
IKupriyanov-HORIS Nov 23, 2020
08f250f
map_join with dups, but livemap fails with null values in data
IKupriyanov-HORIS Nov 24, 2020
71060c2
Merge branch 'master' into geoservices-1.1.0
IKupriyanov-HORIS Nov 25, 2020
6c5f57b
Handle null values in pie/bar
IKupriyanov-HORIS Nov 26, 2020
cec1bdc
LP-62
IKupriyanov-HORIS Nov 27, 2020
c5375fb
Better error message on using scope with parents
IKupriyanov-HORIS Dec 4, 2020
5a2569d
Better error message for invalid scope type, countries request suppor…
IKupriyanov-HORIS Dec 7, 2020
4926246
More tests for us-48
IKupriyanov-HORIS Dec 8, 2020
c7c72ba
Merge branch 'master' into geoservices-1.1.0
IKupriyanov-HORIS Dec 8, 2020
dc5a21f
Merge branch 'master' into geoservices-1.1.0
IKupriyanov-HORIS Dec 14, 2020
b0ba74f
WIP: changing public API
IKupriyanov-HORIS Dec 15, 2020
536ed25
Merge branch 'master' into geoservices-1.1.0
IKupriyanov-HORIS Dec 18, 2020
fe750f7
Support Geocoder as geom_xxx(map=...) parameter
IKupriyanov-HORIS Dec 21, 2020
c317a16
Merge branch 'master' into geoservices-1.1.0
IKupriyanov-HORIS Dec 21, 2020
7531be1
Code cleanup, more tests
IKupriyanov-HORIS Dec 22, 2020
425fef3
Copy not matched dups from map on map_join (fix for last cell in geop…
IKupriyanov-HORIS Dec 23, 2020
ed5f0ea
Use single entry parent for all names
IKupriyanov-HORIS Dec 23, 2020
371525a
Fix empty result for select all kind of request
IKupriyanov-HORIS Dec 24, 2020
481b90d
where -> scope, near -> closest_to, single entry scope, remove auto-c…
IKupriyanov-HORIS Jan 11, 2021
4f3e114
Merge branch 'master' into geoservices-1.1.0
IKupriyanov-HORIS Jan 11, 2021
5002f32
Merge branch 'master' into geoservices-1.1.0
IKupriyanov-HORIS Jan 11, 2021
d316cbe
Remove MapRegion from 'scope' type error message
IKupriyanov-HORIS Jan 13, 2021
bfe2d0b
scope can now be used with county and state. scope and country will c…
IKupriyanov-HORIS Jan 14, 2021
f1a3014
Merge branch 'master' into geoservices-1.1.0
IKupriyanov-HORIS Jan 18, 2021
e747a7e
Handle any Iterable types in parents functions (counties/states/count…
IKupriyanov-HORIS Jan 20, 2021
3915dba
Enable gzip for geocoding responses
IKupriyanov-HORIS Jan 21, 2021
d8b1fac
Merge branch 'master' into geoservices-1.1.0
IKupriyanov-HORIS Jan 21, 2021
93c7aaa
Update docs
IKupriyanov-HORIS Jan 21, 2021
f1e98cd
Update docs
IKupriyanov-HORIS Jan 21, 2021
4b4a9ba
Keep original query name for ambiguous result with allow_ambiguous flag
IKupriyanov-HORIS Jan 22, 2021
ffaaab8
Fix error message
IKupriyanov-HORIS Jan 22, 2021
50156ed
Remove method to_data_frame and interface CanToDataFrame
IKupriyanov-HORIS Jan 25, 2021
8131764
WIP: update geocoding.md
IKupriyanov-HORIS Jan 26, 2021
1917b39
WIP: update geocoding.md
IKupriyanov-HORIS Jan 26, 2021
a908666
Revert changes to builder.ipynb
IKupriyanov-HORIS Jan 26, 2021
ff331eb
WIP: update geocoding.md
IKupriyanov-HORIS Jan 26, 2021
b6fa9b4
Use geocoding level instead of 'request' for a column name, remove da…
IKupriyanov-HORIS Jan 28, 2021
851fa40
Merge branch 'master' into geoservices-1.1.0
IKupriyanov-HORIS Jan 28, 2021
83002eb
Support Geocoder in ggplot(data=...)
IKupriyanov-HORIS Jan 29, 2021
99bd3c3
Switch to the new geocoding server
IKupriyanov-HORIS Jan 29, 2021
a203c3b
Merge branch 'master' into geoservices-1.1.0
IKupriyanov-HORIS Jan 29, 2021
WIP: update geocoding.md
IKupriyanov-HORIS committed Jan 26, 2021
commit 813176493dc651543412013a8dbd7d6f841a657a
367 changes: 330 additions & 37 deletions docs/geocoding.md
@@ -11,60 +11,69 @@ Geocoding is the process of converting names of places into geographic coordinat
The *Lets-Plot* geocoding API allows a user to execute single and batch geocoding queries and to handle possible
name ambiguity.

Relatively simple geocoding queries are executed using the `regions_xxx()` functions family. For example:
The core class is `Geocoder`. A family of functions constructs `Geocoder` objects: `geocode_cities()`, `geocode_counties()`, `geocode_states()`, `geocode_countries()`, and `geocode()`. For example:
```python
from lets_plot.geo_data import *
regions_country(['usa', 'canada'])
countries = geocode_countries(['usa', 'canada'])
```
returns the `Regions` object containing internal IDs for Canada and the US:
Note that the actual geocoding does not happen here; it starts when any of the `get_xxx()` functions is called. This document uses the `get_geocodes()` function, which returns a `DataFrame` with metadata and is useful for testing.
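
The deferred execution described above resembles a lazy builder: configuration methods only record settings, and the request runs when a `get_xxx()` method is called. The sketch below illustrates this pattern with a hypothetical `LazyGeocoder` class whose `_request()` method counts calls instead of contacting a server; it is an illustration of the idea, not the *Lets-Plot* implementation.

```python
class LazyGeocoder:
    """Hypothetical lazy builder: records settings and defers the request."""

    def __init__(self, names):
        self.names = list(names)
        self.scope_value = None
        self.requests_sent = 0  # counts simulated server round trips

    def scope(self, value):
        # Only records the setting; nothing is sent to a server here.
        self.scope_value = value
        return self  # returning self allows method chaining

    def _request(self):
        # Stand-in for the network call to a geocoding server.
        self.requests_sent += 1
        return [{"request": name, "scope": self.scope_value} for name in self.names]

    def get_geocodes(self):
        # The (simulated) request runs only when results are needed.
        return self._request()


cities = LazyGeocoder(["boston", "warwick"]).scope("us")
print(cities.requests_sent)  # 0: configuring the query sent nothing
rows = cities.get_geocodes()
print(cities.requests_sent)  # 1
```

Because every configuration call returns the same object, queries can be assembled step by step and reused before any request is paid for.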

Let's geocode the countries:
```python
countries.get_geocodes()
```
returns the `DataFrame` object containing internal IDs for Canada and the US:
```
request id found name
0 usa 297677 United States of America
1 canada 2856251 Canada
|id |request |found name
----------------------------------
0 |297677 |usa |United States
1 |2856251 |canada |Canada
```
More complex geocoding queries can be created with the help of the `regions_builder()` function that
returns the `RegionsBuilder` object and allows chaining its various methods in order to specify
More complex geocoding queries can be created with the help of the `Geocoder` object by chaining its various methods in order to specify
how to handle geocoding ambiguities.

For example:
```python
regions_builder(request='warwick', level='city') \
geocode_cities('warwick') \
.allow_ambiguous() \
.build()
.get_geocodes()
```
This sample returns the `Regions` object containing IDs of all cities matching "warwick":
```
request id found name
0 warwick 785807 Warwick
1 warwick 363189 Warwick
2 warwick 352173 Warwick
3 warwick 15994531 Warwick
4 warwick 368499 Warwick
5 warwick 239553 Warwick
6 warwick 352897 Warwick
7 warwick 3679247 Warwick
8 warwick 8144841 Warwick
9 warwick 382429 West Warwick
10 warwick 7042961 Warwick Township
11 warwick 6098747 Warwick Township
12 warwick 15994533 Sainte-Élizabeth-de-Warwick
This sample returns the `DataFrame` object containing IDs of all cities matching "warwick":
```

|id |request |found name
----------------------------------------------------
0 |239553 |warwick |Warwick
1 |352173 |warwick |Warwick
2 |352897 |warwick |Warwick
3 |363189 |warwick |Warwick
4 |368499 |warwick |Warwick
5 |785807 |warwick |Warwick
6 |3679247 |warwick |Warwick
7 |8144841 |warwick |Warwick
8 |15994531 |warwick |Warwick
9 |382429 |warwick |West Warwick
10 |6098747 |warwick |Warwick Township
11 |7042961 |warwick |Warwick Township
12 |18489127 |warwick |Warwick Mountain
13 |15994533 |warwick |Sainte-Élizabeth-de-Warwick
```
```python
boston_us = regions(request='boston', within='us')
regions_builder(request='warwick', level='city') \
.where('warwick', near=boston_us) \
.build()
boston_us = geocode_cities('boston').scope('us')
geocode_cities('warwick') \
.where('warwick', closest_to=boston_us) \
.get_geocodes()
```
This example returns the `Regions` object containing the ID of one particular "warwick" near Boston (US):
This example returns the `DataFrame` object containing the ID of one particular "warwick" closest to Boston (US):
```
request id found name
0 warwick 785807 Warwick
|id |request |found name
------------------------------
0 |785807 |warwick |Warwick
```
Once the `Regions` object is available, it can be passed to any *Lets-Plot* geom
Once the `Geocoder` object is available, it can be passed to any *Lets-Plot* geom
supporting the `map` parameter.

If necessary, the `Regions` object can be transformed into a regular pandas `DataFrame` using `to_data_frame()` method
or to a geopandas `GeoDataFrame` using one of `centroids()`, `boundaries()`, or `limits()` methods.
If necessary, the `Geocoder` object can be transformed to a geopandas `GeoDataFrame` using one of `get_centroids()`, `get_boundaries()`, or `get_limits()` methods.

All coordinates are in the EPSG:4326 coordinate reference system (CRS).

@@ -96,4 +105,288 @@ Examples:
<img src="https://raw.githubusercontent.com/JetBrains/lets-plot/master/docs/examples/images/logo_kaggle.svg" width="20" height="20">
</a>
<br>
<img src="https://raw.githubusercontent.com/JetBrains/lets-plot/master/docs/examples/images/map_airports.png" alt="Couldn't load map_airports.png" width="547" height="311">

## Reference

#### Levels
Geocoding supports 4 administrative levels:
- city
- county
- state
- country


The `geocode()` function with `level=None` tries to detect the level automatically: it enumerates all levels from country to city and selects the best-matching level (the one whose result has no ambiguous or unknown names). For example:
```python
geocode(names=['florida', 'tx']).get_geocodes()
```

```
|id |request |found name
------------------------------
0 |324101 |florida |Florida
1 |229381 |tx |Texas
```
While useful, automatic level detection is slower and is not recommended for large data sets.
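
The automatic detection described above can be pictured as a loop over levels that stops at the first level where every name has exactly one match. The sketch below runs against a tiny, made-up in-memory index (`INDEX`) instead of the geocoding server, and `detect_level` is a hypothetical helper; it only illustrates the selection rule and why trying several levels costs extra requests.

```python
LEVELS = ["country", "state", "county", "city"]  # order tried by the sketch

# Made-up toy index: level -> lowercase name -> matching place names.
INDEX = {
    "country": {"usa": ["United States"]},
    "state": {"florida": ["Florida"], "tx": ["Texas"]},
    "county": {},
    "city": {"florida": ["Florida", "Florida"]},  # ambiguous at this level
}


def detect_level(names):
    """Return (level, matches) for the first level where each name has one match."""
    for level in LEVELS:
        # Against a real service each iteration would be a separate request,
        # which is why automatic detection is slower.
        matches = {n: INDEX[level].get(n.lower(), []) for n in names}
        if all(len(found) == 1 for found in matches.values()):
            return level, {n: found[0] for n, found in matches.items()}
    raise ValueError(f"No level resolves all of {names} unambiguously")


level, found = detect_level(["florida", "tx"])
print(level, found)  # state {'florida': 'Florida', 'tx': 'Texas'}
```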


The functions `geocode_cities()`, `geocode_counties()`, `geocode_states()`, `geocode_countries()`, and `geocode(level=xxx)` search for names only at the given level and return an error otherwise.
```python
geocode_states(['florida', 'tx']).get_geocodes()
```



#### Parents
The `Geocoder` class provides functions for setting parents at a given administrative level: `counties()`, `states()`, and `countries()`. These functions accept a single value or multiple values, each either a string or a `Geocoder`. The number of values must match the number of names in the `Geocoder`, so that names and parents form a table in which every name is associated by index with its corresponding parent. Parents are included in the resulting `DataFrame`, which makes it possible to join data and geometry via `map_join`.

```python
geocode_cities(['warwick', 'worcester'])\
.counties(['Worth County', 'worcester county'])\
.states(['georgia', 'massachusetts'])\
.get_geocodes()
```
```
|id | request |found name |county |state
--------------------------------------------------------------
0 |239553 | warwick |Warwick |Worth County |georgia
1 |3688419 | worcester |Worcester |worcester county |massachusetts
```
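
Conceptually, names and parents form a table in which each parent list is aligned with the names by position. The sketch below shows that alignment and the length check with plain Python lists; `build_request_table` is a hypothetical helper, not part of `lets_plot.geo_data`, and the library's own validation and error messages may differ.

```python
def build_request_table(names, **parents):
    """Align each parent list with `names` by position (hypothetical helper).

    A parent may be a single string (for a single name) or a list with one
    entry per name; None entries mark a missing parent for that name.
    """
    rows = [{"request": name} for name in names]
    for level, values in parents.items():
        if isinstance(values, str):
            values = [values]  # a single value counts as a one-element list
        values = list(values)
        if len(values) != len(names):
            raise ValueError(
                f"{level}: expected {len(names)} values, got {len(values)}"
            )
        for row, value in zip(rows, values):
            row[level] = value  # associate the parent with the name by index
    return rows


table = build_request_table(
    ["warwick", "worcester"],
    county=["Worth County", "worcester county"],
    state=["georgia", "massachusetts"],
)
for row in table:
    print(row)
# {'request': 'warwick', 'county': 'Worth County', 'state': 'georgia'}
# {'request': 'worcester', 'county': 'worcester county', 'state': 'massachusetts'}
```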

Parents can contain `None` values, e.g., when countries have different administrative divisions:
```python
geocode_cities(['warwick', 'worcester'])\
.states(['Georgia', None])\
.countries(['USA', 'United Kingdom'])\
.get_geocodes()
```
```

|id |request |found name |state |country
--------------------------------------------------------------
0 |239553 |warwick |Warwick |Georgia |USA
1 |3750683 |worcester |Worcester |None |United Kingdom
```

A parent can also be a `Geocoder` object. This allows resolving ambiguity in the parent itself:
```python

s = geocode_states(['vermont', 'georgia']).scope('usa')
geocode_cities(['worcester', 'warwick']).states(s).get_geocodes()
```
```
|id |request |found name |state
-------------------------------------------
0 |17796275 |worcester |Worcester |vermont
1 |239553 |warwick |Warwick |georgia
```

##### Scope
`scope()` is a special kind of parent. It accepts a string or a single-entry `Geocoder` object and is not associated with any administrative level: it acts as the parent of all other parents (or of the names, if no other parents are set). `scope()` can't be used together with `countries()`, because countries have no parents. A typical use case is when all names belong to the same parent: instead of building a list of the required length and passing it as a parent, just call `scope()` with a single value.

```python
geocode_counties(['Dakota County', 'Nevada County']).states(['NE', 'AR']).scope('USA').get_geocodes()
```
```
|id |request |found name |state
------------------------------------------------
0 |2850895 |Dakota County |Dakota County |NE
1 |3653651 |Nevada County |Nevada County |AR
```
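
One way to picture `scope()` is as a single outermost parent shared by every name, sitting above any per-name parents. The sketch below encodes the two rules stated above (a single value only, and no combination with `countries()`) in a hypothetical `apply_scope` helper. It models only string scopes, not `Geocoder` scopes, and illustrates the described behaviour rather than the library code.

```python
def apply_scope(rows, scope, has_countries):
    """Attach one scope value to every request row (hypothetical helper)."""
    if scope is None:
        return rows  # no scope set: rows are unchanged
    if has_countries:
        # Countries have no parents, so a scope cannot sit above them.
        raise ValueError("scope() can't be combined with countries()")
    if not isinstance(scope, str):
        # Only string scopes are modelled here; lists are rejected.
        raise TypeError("scope must be a single value")
    # The same value applies to all names, so no list of matching length is needed.
    return [dict(row, scope=scope) for row in rows]


rows = [
    {"request": "Dakota County", "state": "NE"},
    {"request": "Nevada County", "state": "AR"},
]
scoped = apply_scope(rows, "USA", has_countries=False)
for row in scoped:
    print(row)
# {'request': 'Dakota County', 'state': 'NE', 'scope': 'USA'}
# {'request': 'Nevada County', 'state': 'AR', 'scope': 'USA'}
```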

Parents can be modified between searches:

```python
florida = geocode_states('florida')

display(florida.countries('usa').get_geocodes())
display(florida.countries('uruguay').get_geocodes())
display(florida.countries(None).get_geocodes())
```

```
id |request |found name |country
------------------------------------
324101 |florida |Florida |usa

id |request |found name |country
------------------------------------
3270329|florida |Florida |uruguay

id |request |found name
---------------------------
324101 |florida |Florida
```
##### Fetch all

To fetch all objects within a parent, don't set the `names` parameter.

```python
geocode_counties().states('massachusetts').get_geocodes()
```

```
|id |request |found name |state
-------------------------------------------------------------
0 |2363239 |Hampden County |Hampden County |massachusetts
1 |122643 |Berkshire County |Berkshire County |massachusetts
2 |180869 |Essex County |Essex County |massachusetts
3 |3677609 |Hampshire County |Hampshire County |massachusetts
4 |3677611 |Worcester County |Worcester County |massachusetts
...
```

##### US-48 (CONUS)
Geocoding supports a special name, `us-48`, also known as CONUS (the contiguous United States). It can be used either as a name or as a parent.
```python
geocode_states('us-48').get_geocodes()
```
```
|id |request |found name
---------------------------------------
0 |121519 |Vermont |Vermont
1 |122631 |Massachusetts |Massachusetts
2 |122641 |New York |New York
3 |127025 |Maine |Maine
4 |134427 |New Hampshire |New Hampshire
...
```

#### Ambiguity
Geocoding often finds multiple objects for a name, or finds nothing at all. In these cases an error is generated:
```python
geocode_cities(['warwick', 'worcester']).get_geocodes()
```
```
Multiple objects (14) were found for warwick:

- Warwick (United States, Georgia, Worth County)
- Warwick (United States, New York, Orange County)
- Warwick (United Kingdom, England, West Midlands, Warwickshire)
- Warwick (United States, North Dakota, Benson County)
- Warwick (United States, Oklahoma, Lincoln County)
- Warwick (United States, Rhode Island, Kent County)
- Warwick (United States, Massachusetts, Franklin County)
- Warwick (Canada, Ontario, Southwestern Ontario, Lambton County)
- Warwick (Canada, Québec, Centre-du-Québec, Arthabaska)
- West Warwick (United States, Rhode Island, Kent County)

Multiple objects (4) were found for worcester:
- Worcester (United States, Massachusetts, Worcester County)
- Worcester (United Kingdom, England, West Midlands, Worcestershire)
- Worcester (United States, Vermont, Washington County)
- Worcester Township (United States, Pennsylvania, Montgomery County)
```

The ambiguity can be resolved in different ways.

##### `allow_ambiguous()`

The best way is to find the object you are looking for and specify its parents. The `allow_ambiguous()` function turns the error into a successful result that can be rendered on a map or verified manually in some other way.

```python
geocode_cities(['warwick', 'worcester']).allow_ambiguous().get_geocodes()
```
```
|id |request |found name
------------------------------
0 |239553 |warwick |Warwick
1 |352173 |warwick |Warwick
2 |352897 |warwick |Warwick
3 |363189 |warwick |Warwick
4 |368499 |warwick |Warwick
...
```

##### `drop_not_found()`
The `drop_not_found()` function removes unknown names from the result.
```python
geocode_cities(['paris', 'foo']).drop_not_found().get_geocodes()
```

```
|id |request |found name
-----------------------------
0 |14889 |paris |Paris
```

##### `drop_not_matched()`
If a request contains both unknown and ambiguous names, the `drop_not_matched()` function can be used to remove all of them from the result.
```python
geocode_cities(['paris', 'worcester', 'foo']).drop_not_matched().get_geocodes()
```
```
|id |request |found name
-----------------------------
0 |14889 |paris |Paris
```
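
The options above differ only in which kinds of names they tolerate. The sketch below models a raw result as a mapping from each requested name to its list of matches and applies each option's rule. `resolve` and its flags are hypothetical, the error strings are placeholders rather than the library's messages, and the assumption that `allow_ambiguous()` keeps every match while still rejecting unknown names is inferred from the Warwick example.

```python
def resolve(matches, allow_ambiguous=False, drop_not_found=False, drop_not_matched=False):
    """Apply the filtering rules described above to raw matches (hypothetical)."""
    rows, errors = [], []
    for name, found in matches.items():
        if not found:
            if drop_not_found or drop_not_matched:
                continue  # drop unknown names
            errors.append(f"not found: {name}")
        elif len(found) > 1:
            if drop_not_matched:
                continue  # drop ambiguous names
            if allow_ambiguous:
                rows.extend((name, place) for place in found)  # keep every candidate
            else:
                errors.append(f"ambiguous ({len(found)}): {name}")
        else:
            rows.append((name, found[0]))
    if errors:
        raise ValueError("; ".join(errors))
    return rows


raw = {
    "paris": ["Paris"],
    "worcester": ["Worcester", "Worcester", "Worcester", "Worcester Township"],
    "foo": [],
}
result = resolve(raw, drop_not_matched=True)
print(result)  # [('paris', 'Paris')]
```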

##### `where()`
To resolve ambiguity, geocoding provides the `where()` function, which configures names individually.
To configure a name, call `where(...)` with the place name and all parent names given for it. Parents can't be changed via a `where()` call. If the name and parents don't match the ones in the request, an error is generated. This is important for cases like this one, where the same name occurs with different parents:
```python
geocode_counties(['Washington', 'Washington']).states(['oregon', 'utah']).get_geocodes()
```
```
|id |request |found name |state
-------------------------------------------------
0 |3674267 |Washington |Washington County |oregon
1 |3488745 |Washington |Washington County |utah
```
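
Because the same name can appear several times with different parents, a `where()` call has to identify its target by the name together with all of that name's parents. The sketch below models that lookup with tuple keys; `index_requests` and `find_target` are hypothetical helpers used only to illustrate why an incomplete set of parents produces an error.

```python
def index_requests(rows, parent_levels):
    """Key each request row by (name, parent values) (hypothetical helper)."""
    return {
        (row["request"],) + tuple(row.get(level) for level in parent_levels): i
        for i, row in enumerate(rows)
    }


def find_target(index, name, parent_levels, **parents):
    """Locate the row a where() call refers to, or raise an error."""
    key = (name,) + tuple(parents.get(level) for level in parent_levels)
    if key not in index:
        raise ValueError(f"{name} with parents {parents} is not in the request")
    return index[key]


rows = [
    {"request": "Washington", "state": "oregon"},
    {"request": "Washington", "state": "utah"},
]
index = index_requests(rows, ["state"])
print(find_target(index, "Washington", ["state"], state="utah"))  # 1
```

Calling `find_target(index, "Washington", ["state"])` without a state raises an error, mirroring the rule that the name alone is not enough when parents are set.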

With the `closest_to` parameter, geocoding keeps only the object closest to the given location. The parameter can be a single-entry `Geocoder`:
```python
boston = geocode_cities('boston')
geocode_cities('worcester').where('worcester', closest_to=boston).get_geocodes()
```

```
|id |request |found name
---------------------------------
0 |3688419 |worcester |Worcester
```
Alternatively, the parameter can be a `shapely.geometry.Point`:
```python
import shapely.geometry

geocode_cities('worcester').where('worcester', closest_to=shapely.geometry.Point(-71.088, 42.311)).get_geocodes()
```
```
|id |request |found name
---------------------------------
0 |3688419 |worcester |Worcester
```
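
Selecting the candidate closest to a reference point can be illustrated with a great-circle (haversine) distance. In the sketch below, the candidate coordinates are rough, hand-entered approximations for three of the Worcesters listed earlier, and the haversine metric is an assumption; the geocoding server may measure proximity differently.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance between two (lon, lat) points in kilometres."""
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def closest(candidates, lon, lat):
    """Return the (label, lon, lat) candidate nearest to (lon, lat)."""
    return min(candidates, key=lambda c: haversine_km(c[1], c[2], lon, lat))


# Approximate coordinates, entered by hand for illustration only.
worcesters = [
    ("Worcester, Massachusetts", -71.80, 42.26),
    ("Worcester, England", -2.22, 52.19),
    ("Worcester, Vermont", -72.55, 44.37),
]
print(closest(worcesters, -71.088, 42.311)[0])  # Worcester, Massachusetts
```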

With the `scope` parameter, a `shapely.geometry.Polygon` can be used to limit the search area (coordinates must be in the WGS84 coordinate system). Note that only the bounding box of the polygon is used:
```python
import shapely.geometry

geocode_cities('worcester')\
.where('worcester', scope=shapely.geometry.box(-72.00, 42.00, -71.00, 43.00))\
.get_geocodes()
```
```
|id |request |found name
---------------------------------
0 |3688419 |worcester |Worcester
```
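
Because only the bounding box is used, any polygon reduces to its minimum and maximum longitude and latitude (the same values shapely exposes through a geometry's `bounds` property). The sketch below computes that box from a list of vertices in plain Python and tests which approximate candidate coordinates fall inside it; `bbox` and `inside` are hypothetical helpers, and the coordinates are hand-entered approximations.

```python
def bbox(vertices):
    """Return (min_lon, min_lat, max_lon, max_lat) for (lon, lat) vertices."""
    lons = [lon for lon, _ in vertices]
    lats = [lat for _, lat in vertices]
    return min(lons), min(lats), max(lons), max(lats)


def inside(box, lon, lat):
    """True if the point lies within the box (edges included)."""
    min_lon, min_lat, max_lon, max_lat = box
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


# An irregular search polygon; only its bounding box matters.
polygon = [(-71.0, 42.0), (-72.0, 42.5), (-71.5, 43.0)]
search_box = bbox(polygon)
print(search_box)  # (-72.0, 42.0, -71.0, 43.0)

# Approximate candidate coordinates, for illustration only.
candidates = {
    "Worcester, Massachusetts": (-71.80, 42.26),
    "Worcester, Vermont": (-72.55, 44.37),
}
matches = [name for name, (lon, lat) in candidates.items() if inside(search_box, lon, lat)]
print(matches)  # ['Worcester, Massachusetts']
```

Taking `min` and `max` also makes the box independent of the order in which corner coordinates are given.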

`scope` can also be a single-entry `Geocoder` object or a string:
```python
massachusetts = geocode_states('massachusetts')
geocode_cities('worcester').where('worcester', scope=massachusetts).get_geocodes()
```

`scope` does not change the parents in the resulting `DataFrame`:
```python
worcester_county = geocode_counties('Worcester County').states('massachusetts').countries('usa')

geocode_cities(['worcester', 'worcester'])\
.countries(['USA', 'United Kingdom'])\
.where('worcester', country='USA', scope=worcester_county)\
.get_geocodes()
```

```
|id |request |found name |country
-------------------------------------------------
0 |3688419 |worcester |Worcester |USA
1 |3750683 |worcester |Worcester |United Kingdom
```
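
Keeping the parents in the result is what makes it possible to match user data to geocoded rows when a name repeats: joining on the name alone would be ambiguous for the two Worcesters above. The sketch below joins a small data table to rows shaped like that output on the composite key (name, country). It uses plain dictionaries, `join_on` is a hypothetical helper, and the `value` column holds placeholder numbers; it is not how `map_join` is implemented.

```python
geocoded = [
    {"id": "3688419", "request": "worcester", "country": "USA"},
    {"id": "3750683", "request": "worcester", "country": "United Kingdom"},
]
# Placeholder values for illustration only.
user_data = [
    {"city": "worcester", "country": "United Kingdom", "value": 2},
    {"city": "worcester", "country": "USA", "value": 1},
]


def join_on(data, geo, data_keys, geo_keys):
    """Attach the geocoded id to each data row via a composite key."""
    lookup = {tuple(g[k] for k in geo_keys): g["id"] for g in geo}
    # Rows without a match get id=None instead of being dropped.
    return [dict(row, id=lookup.get(tuple(row[k] for k in data_keys))) for row in data]


joined = join_on(user_data, geocoded, ["city", "country"], ["request", "country"])
for row in joined:
    print(row)
# {'city': 'worcester', 'country': 'United Kingdom', 'value': 2, 'id': '3750683'}
# {'city': 'worcester', 'country': 'USA', 'value': 1, 'id': '3688419'}
```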

## `map_join`
WIP