To download OSM data in the desired format, follow these steps:
- Go to the Overpass Turbo website: https://overpass-turbo.eu/
- In the top left corner, click the "Wizard" button.
- Enter a search query to filter the data based on your project's needs. For example, you can search for specific types of points of interest like "amenity=cafe" to retrieve all cafes in the current map view.
- Click "Build and Run Query" to execute the query.
- Once the query is completed, the map will show the matching features. You can pan and zoom to adjust the area of interest.
- Click the "Export" button in the top right corner.
- Choose the desired format (GeoJSON or KML) and click "download."
Downloaded all data for cafes in New York City Area in GeoJSON format. Used the following query:
Then we imported the data using the following command:
ogr2ogr -f "PostgreSQL" PG:"dbname=gis_analysis user=postgres host=/var/run/postgresql port=5432" "/home/ubuntu/export.geojson" -nln public.geo_features -lco GEOMETRY_NAME=geom -lco FID=gid -lco PRECISION=NO -nlt PROMOTE_TO_MULTI -a_srs EPSG:4326
We retreived cafe which are marked as coffee_shop as cuisine. We used the following query for the same:
SELECT gid, name, ST_AsText(geom) AS location
FROM public.geo_features
WHERE cuisine = 'coffee_shop' LIMIT 10;
Below is the output:
Below is the query that we used for gid 1 and gid 2
SELECT
ST_Distance(
ST_Transform(a.geom, 3857),
ST_Transform(b.geom, 3857)
) AS distance_meters
FROM
public.geo_features AS a,
public.geo_features AS b
WHERE
a.gid = 1 AND
b.gid = 2;
Below is the output:
This query creates a temporary table called buffer_areas with a single row containing the union of the 500-meter buffer areas around cafes that are in buildings. Then, it calculates the area of the resulting polygon in square meters.
WITH buffer_areas AS (
SELECT ST_Union(ST_Buffer(ST_Transform(geom, 3857), 500)) AS geom
FROM public.geo_features
WHERE amenity = 'cafe' AND building = 'yes'
)
SELECT ST_Area(ST_Transform(geom, 3857)) AS area_sq_meters
FROM buffer_areas;
Following is the output:
Here is the analysis of Goal 1:
Here is the analysis for Goal 2:
Here is the analysis for Goal 3:
Sorting and limiting query results can help us retrieve only the most relevant data or present the results in a more user-friendly manner. You can use the ORDER BY clause to sort the results and the LIMIT clause to limit the number of rows returned.
Sorting and Limit on Goal 1:
SELECT gid, name, ST_AsText(geom) AS location
FROM public.geo_features
WHERE cuisine = 'coffee_shop' LIMIT 10;
Sorting and Limit on Goal 2:
SELECT gid, name, ST_AsText(geom) AS location,
ST_Distance(ST_Transform(geom, 3857), ST_Transform(ST_SetSRID(ST_Point(-74.0014, 40.7289), 4326), 3857)) AS distance_meters
FROM public.geo_features
WHERE amenity = 'cafe'
ORDER BY distance_meters ASC
LIMIT 5;
Sorting and Limit on Goal 3:
WITH buffer_areas AS (
SELECT gid, ST_Union(ST_Buffer(ST_Transform(geom, 3857), 500)) AS geom
FROM public.geo_features
WHERE amenity = 'cafe' AND building = 'yes'
GROUP BY gid
),
buffer_areas_with_area AS (
SELECT gid, geom, ST_Area(ST_Transform(geom, 3857)) AS area_sq_meters
FROM buffer_areas
)
SELECT gid, area_sq_meters
FROM buffer_areas_with_area
ORDER BY area_sq_meters DESC
LIMIT 3;
Query optimization can be done using indexes, simplifying calculations, or reducing the number of rows scanned. Here are some optimization techniques applied to the queries we used before:
Optimization of Goal 1:
To optimize this query, we can create an index on the cuisine column, which will speed up the selection process:
CREATE INDEX idx_cuisine ON public.geo_features (cuisine);
The query then remains the same:
By creating an index on the cuisine column, the database will be able to quickly filter the rows that match the condition, improving the query's performance. Keep in mind that creating indexes may increase storage space and impact data modification performance, so you should find the right balance between query performance and storage/modification costs.
Optimization of Goal 2:
We can create a spatial index on the geom column to speed up spatial operations:
CREATE INDEX idx_geo_features_geom ON public.geo_features USING gist(geom);
The query remains the same:
SELECT gid, name, ST_AsText(geom) AS location,
ST_Distance(ST_Transform(geom, 3857), ST_Transform(ST_SetSRID(ST_Point(-74.0014, 40.7289), 4326), 3857)) AS distance_meters
FROM public.geo_features
WHERE amenity = 'cafe'
ORDER BY distance_meters ASC
LIMIT 5;
Optimization of Goal 3:
We can use the previously created spatial index on the geom column to speed up the spatial operations in this query:
WITH buffer_areas AS (
SELECT gid, ST_Union(ST_Buffer(ST_Transform(geom, 3857), 500)) AS geom
FROM public.geo_features
WHERE amenity = 'cafe' AND building = 'yes'
GROUP BY gid
),
buffer_areas_with_area AS (
SELECT gid, geom, ST_Area(ST_Transform(geom, 3857)) AS area_sq_meters
FROM buffer_areas
)
SELECT gid, area_sq_meters
FROM buffer_areas_with_area
ORDER BY area_sq_meters DESC
LIMIT 3;
These optimizations should improve query performance by utilizing indexes for filtering and spatial operations. Note that creating indexes can increase storage space and impact the performance of data modifications (insert, update, delete), so it's essential to find the right balance between query performance and storage/modification costs.
Optimization of Goal 1:
To N-optimize this query, we can create an index on the cuisine column, which will help filter rows more efficiently as the database grows:
CREATE INDEX idx_cuisine ON public.geo_features (cuisine);
Then, run the original query:
SELECT gid, name, ST_AsText(geom) AS location
FROM public.geo_features
WHERE cuisine = 'coffee_shop'
LIMIT 10;
By creating an index on the cuisine column, the database will be able to filter the rows with 'coffee_shop' value faster, even when the number of rows in the database increases significantly.
N-Optimization of Goal 2:
In this query, we created a spatial index on the geom column in a previous step:
CREATE INDEX idx_geo_features_geom ON public.geo_features USING gist(geom);
However, the ST_Distance function may not be efficient with a large number of rows. To improve the performance, we can use the ST_DWithin function to filter only the cafes within a certain distance of the specific point, and then calculate the exact distance:
SELECT gid, name, ST_AsText(geom) AS location,
ST_Distance(ST_Transform(geom, 3857), ST_Transform(ST_SetSRID(ST_Point(-74.0014, 40.7289), 4326), 3857)) AS distance_meters
FROM public.geo_features
WHERE amenity = 'cafe' AND ST_DWithin(ST_Transform(geom, 3857), ST_Transform(ST_SetSRID(ST_Point(-74.0014, 40.7289), 4326), 3857), 5000)
ORDER BY distance_meters ASC
LIMIT 5;
This query will first filter cafes within 5000 meters of the specific point and then sort them by distance.
N-Optimization of Goal 3:
As we already created a spatial index on the geom column, it will help in spatial operations for this query:
CREATE INDEX idx_geo_features_geom ON public.geo_features USING gist(geom);
Additional Query: Cafes with cuisine = 'coffee_shop'
We created an index on the cuisine column to optimize this query:
CREATE INDEX idx_cuisine ON public.geo_features (cuisine);
These optimizations will help the queries run efficiently even when the number of rows in the database grows significantly.
Here are the github links for all the 3 group members:
Tarun Dagar
Ankush Gurhani
Dhruv Chamaria
Here are the links to the Presentataion by each student:
Tarun Dagar
Anksuh Gurhani
Dhruv Chamaria