Skip to content

Ankush-PR/DBMS

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

46 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

DBMS

Step 0: Install Postgis and Enable it on Postgres

Created a new PostgreSQL database and enable the PostGIS extension:

Screenshot 2023-05-10 at 10 47 59 PM

Step 1: Get the Data from an online source.

To download OSM data in the desired format, follow these steps:

  1. Go to the Overpass Turbo website: https://overpass-turbo.eu/
  2. In the top left corner, click the "Wizard" button.
  3. Enter a search query to filter the data based on your project's needs. For example, you can search for specific types of points of interest like "amenity=cafe" to retrieve all cafes in the current map view.
  4. Click "Build and Run Query" to execute the query.
  5. Once the query is completed, the map will show the matching features. You can pan and zoom to adjust the area of interest.
  6. Click the "Export" button in the top right corner.
  7. Choose the desired format (GeoJSON or KML) and click "download."

Downloaded all data for cafes in New York City Area in GeoJSON format. Used the following query:

Screenshot 2023-05-10 at 9 40 02 PM

Step 2: Get the Data from an online source.

Then we imported the data using the following command:

ogr2ogr -f "PostgreSQL" PG:"dbname=gis_analysis user=postgres host=/var/run/postgresql port=5432" "/home/ubuntu/export.geojson" -nln public.geo_features -lco GEOMETRY_NAME=geom -lco FID=gid -lco PRECISION=NO -nlt PROMOTE_TO_MULTI -a_srs EPSG:4326

Goals

Goal 1: Retrieve Locations of specific features

We retreived cafe which are marked as coffee_shop as cuisine. We used the following query for the same:

SELECT gid, name, ST_AsText(geom) AS location
FROM public.geo_features
WHERE cuisine = 'coffee_shop' LIMIT 10;

Below is the output:

Screenshot 2023-05-10 at 9 44 43 PM

Goal 2: Retrieve Locations of specific features

To calculate the distance between two points, we used the ST_Distance function in PostGIS. First, we need to pick two points based on gid value and then use the geom column for the distance calculation.

Below is the query that we used for gid 1 and gid 2

SELECT
ST_Distance(
ST_Transform(a.geom, 3857),
ST_Transform(b.geom, 3857)
) AS distance_meters
FROM
public.geo_features AS a,
public.geo_features AS b
WHERE
a.gid = 1 AND
b.gid = 2;

Below is the output:

Screenshot 2023-05-10 at 10 05 41 PM

Goal 3: Calculate Areas of Interest (specific to each group)

Based on the data that we have let's create a buffer of 500 meters around cafes that are located in buildings and then calculate the area of the resulting polygons.

This query creates a temporary table called buffer_areas with a single row containing the union of the 500-meter buffer areas around cafes that are in buildings. Then, it calculates the area of the resulting polygon in square meters.

WITH buffer_areas AS (
SELECT ST_Union(ST_Buffer(ST_Transform(geom, 3857), 500)) AS geom
FROM public.geo_features
WHERE amenity = 'cafe' AND building = 'yes'
)
SELECT ST_Area(ST_Transform(geom, 3857)) AS area_sq_meters
FROM buffer_areas;

Following is the output:

Screenshot 2023-05-10 at 10 10 16 PM

Goal 4: Analyze the Queries

Analyzing queries can help you understand the performance and efficiency of your queries. In PostgreSQL, you can use the EXPLAIN or EXPLAIN ANALYZE command to get insights about your queries. The EXPLAIN command provides the query execution plan, while EXPLAIN ANALYZE also executes the query and provides additional details about actual execution times and rows returned.

Here is the analysis of Goal 1:

Screenshot 2023-05-10 at 10 12 18 PM

Here is the analysis for Goal 2:

Screenshot 2023-05-10 at 10 13 10 PM

Here is the analysis for Goal 3:

Screenshot 2023-05-10 at 10 14 01 PM

Goal 5: Sorting and Limit Executions

Sorting and limiting query results can help us retrieve only the most relevant data or present the results in a more user-friendly manner. You can use the ORDER BY clause to sort the results and the LIMIT clause to limit the number of rows returned.

Sorting and Limit on Goal 1:

SELECT gid, name, ST_AsText(geom) AS location
FROM public.geo_features
WHERE cuisine = 'coffee_shop' LIMIT 10;
Screenshot 2023-05-10 at 10 16 19 PM

Sorting and Limit on Goal 2:

SELECT gid, name, ST_AsText(geom) AS location,
ST_Distance(ST_Transform(geom, 3857), ST_Transform(ST_SetSRID(ST_Point(-74.0014, 40.7289), 4326), 3857)) AS distance_meters
FROM public.geo_features
WHERE amenity = 'cafe'
ORDER BY distance_meters ASC
LIMIT 5;

Screenshot 2023-05-10 at 10 17 50 PM

Sorting and Limit on Goal 3:

WITH buffer_areas AS (
SELECT gid, ST_Union(ST_Buffer(ST_Transform(geom, 3857), 500)) AS geom
FROM public.geo_features
WHERE amenity = 'cafe' AND building = 'yes'
GROUP BY gid
),
buffer_areas_with_area AS (
SELECT gid, geom, ST_Area(ST_Transform(geom, 3857)) AS area_sq_meters
FROM buffer_areas
)
SELECT gid, area_sq_meters
FROM buffer_areas_with_area
ORDER BY area_sq_meters DESC
LIMIT 3;

Screenshot 2023-05-10 at 10 19 47 PM

Goal 6: Optimize the queries to speed up execution time (10 marks)

Query optimization can be done using indexes, simplifying calculations, or reducing the number of rows scanned. Here are some optimization techniques applied to the queries we used before:

Optimization of Goal 1:

To optimize this query, we can create an index on the cuisine column, which will speed up the selection process:

CREATE INDEX idx_cuisine ON public.geo_features (cuisine);

The query then remains the same:

Screenshot 2023-05-10 at 10 22 03 PM

Screenshot 2023-05-10 at 10 23 08 PM

By creating an index on the cuisine column, the database will be able to quickly filter the rows that match the condition, improving the query's performance. Keep in mind that creating indexes may increase storage space and impact data modification performance, so you should find the right balance between query performance and storage/modification costs.
Optimization of Goal 2:

We can create a spatial index on the geom column to speed up spatial operations:

CREATE INDEX idx_geo_features_geom ON public.geo_features USING gist(geom);

The query remains the same:

SELECT gid, name, ST_AsText(geom) AS location,
ST_Distance(ST_Transform(geom, 3857), ST_Transform(ST_SetSRID(ST_Point(-74.0014, 40.7289), 4326), 3857)) AS distance_meters
FROM public.geo_features
WHERE amenity = 'cafe'
ORDER BY distance_meters ASC
LIMIT 5;
Screenshot 2023-05-10 at 10 25 36 PM

Screenshot 2023-05-10 at 10 26 16 PM

Optimization of Goal 3:

We can use the previously created spatial index on the geom column to speed up the spatial operations in this query:
WITH buffer_areas AS (
SELECT gid, ST_Union(ST_Buffer(ST_Transform(geom, 3857), 500)) AS geom
FROM public.geo_features
WHERE amenity = 'cafe' AND building = 'yes'
GROUP BY gid
),
buffer_areas_with_area AS (
SELECT gid, geom, ST_Area(ST_Transform(geom, 3857)) AS area_sq_meters
FROM buffer_areas
)
SELECT gid, area_sq_meters
FROM buffer_areas_with_area
ORDER BY area_sq_meters DESC
LIMIT 3;

These optimizations should improve query performance by utilizing indexes for filtering and spatial operations. Note that creating indexes can increase storage space and impact the performance of data modifications (insert, update, delete), so it's essential to find the right balance between query performance and storage/modification costs.

Screenshot 2023-05-10 at 10 29 48 PM

Goal 7: N-Optimization of Queries

N-optimization refers to optimizing queries in a way that they run efficiently even when the number of rows in the database increases significantly. This can be achieved by using indexes, limiting the amount of data processed, and selecting only the necessary columns. Below are the N-optimizations for the queries we used.
Optimization of Goal 1:
To N-optimize this query, we can create an index on the cuisine column, which will help filter rows more efficiently as the database grows:
CREATE INDEX idx_cuisine ON public.geo_features (cuisine);
Then, run the original query:
SELECT gid, name, ST_AsText(geom) AS location
FROM public.geo_features
WHERE cuisine = 'coffee_shop'
LIMIT 10;
Screenshot 2023-05-10 at 10 32 40 PM
By creating an index on the cuisine column, the database will be able to filter the rows with 'coffee_shop' value faster, even when the number of rows in the database increases significantly.
N-Optimization of Goal 2:
In this query, we created a spatial index on the geom column in a previous step:
CREATE INDEX idx_geo_features_geom ON public.geo_features USING gist(geom);
However, the ST_Distance function may not be efficient with a large number of rows. To improve the performance, we can use the ST_DWithin function to filter only the cafes within a certain distance of the specific point, and then calculate the exact distance:

SELECT gid, name, ST_AsText(geom) AS location,
ST_Distance(ST_Transform(geom, 3857), ST_Transform(ST_SetSRID(ST_Point(-74.0014, 40.7289), 4326), 3857)) AS distance_meters
FROM public.geo_features
WHERE amenity = 'cafe' AND ST_DWithin(ST_Transform(geom, 3857), ST_Transform(ST_SetSRID(ST_Point(-74.0014, 40.7289), 4326), 3857), 5000)
ORDER BY distance_meters ASC
LIMIT 5;
Screenshot 2023-05-10 at 10 35 39 PM
This query will first filter cafes within 5000 meters of the specific point and then sort them by distance.
N-Optimization of Goal 3:
As we already created a spatial index on the geom column, it will help in spatial operations for this query:

CREATE INDEX idx_geo_features_geom ON public.geo_features USING gist(geom);

Additional Query: Cafes with cuisine = 'coffee_shop'

We created an index on the cuisine column to optimize this query:

CREATE INDEX idx_cuisine ON public.geo_features (cuisine);

These optimizations will help the queries run efficiently even when the number of rows in the database grows significantly.
Screenshot 2023-05-10 at 10 37 29 PM

Goal 8: Presentation and Posting to Individual GitHub

Here are the github links for all the 3 group members:

Tarun Dagar
Ankush Gurhani
Dhruv Chamaria

Here are the links to the Presentataion by each student:

Tarun Dagar
Anksuh Gurhani
Dhruv Chamaria

Goal 9: Code functionality, documentation and proper output provided

This document already contains all the details.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published