Spark GDB

In the wake of the unpredictable future of User Defined Types (UDT) in Spark, this is a hasty, minimalist re-implementation of the spark-gdb project, such that the content of a File GeoDatabase can be mapped to a read-only Spark DataFrame. It is minimalist in that it only supports features with simple geometries (for now :-) and with no M or Z values.

In the previous implementation, a GeometryType was defined using the UDT framework. In this implementation, points are stored in a field with two sub-fields, x and y. Polyline and polygon shapes are stored as two sub-fields, parts and coords: parts is an array of integers whose values are the number of points in each part, and coords is an array of doubles whose values are a flattened sequence of x,y pairs. An earlier revision stored polylines and polygons as strings in the Esri JSON format; that is not the most efficient representation, but it made interoperability with the ArcGIS API for Python fairly seamless.
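
To illustrate the layout, here is a hypothetical sketch (not part of the library) of splitting the parts and coords sub-fields back into per-part point lists:

def explode_shape(parts, coords):
    """parts: point counts per part; coords: flat [x0, y0, x1, y1, ...]."""
    points = list(zip(coords[0::2], coords[1::2]))  # pair up the x,y values
    shapes, i = [], 0
    for n in parts:  # slice off each part's run of points
        shapes.append(points[i:i + n])
        i += n
    return shapes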

Notes:

  • This implementation does not support compressed file geodatabases.
  • It is HIGHLY recommended to create a fully compacted feature class before using this implementation.
  • The best way to create a compacted feature class is to copy the edited feature class to a new feature class.
  • Date fields are read as timestamps in the UTC timezone.

Changes

  • Sep 10, 2021: Version 0.41 introduces a breaking change in the FileGDB object.

Build the project using Maven:

mvn clean install

Usage

This implementation is best demonstrated with PySpark DataFrames, in conjunction with the ArcGIS API for Python.

Create the Python 3 conda environments; the first two commands rebuild a py36 environment with the PySpark prerequisites, and the remaining commands create a separate arcgis environment with the ArcGIS API for Python:

conda remove --yes --all --name py36
conda create --yes -n py36 -c conda-forge python=3.6 openjdk=8 findspark py4j

conda create --name arcgis python=3.6
conda activate arcgis
conda install -c esri arcgis
conda install matplotlib

Assuming that the environment variable SPARK_HOME points to the location of a Spark installation, start a Jupyter notebook that is backed by PySpark:

export PATH=${SPARK_HOME}/bin:${PATH}
export SPARK_LOCAL_IP=$(hostname)
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS='notebook'
export GDB_MIN=2.11 # Spark 2.3
# export GDB_MIN=2.12 # Spark 2.4
export GDB_VER=0.18
pyspark \
 --master local[*] \
 --num-executors 1 \
 --driver-memory 16G \
 --executor-memory 16G \
 --packages com.esri:webmercator_${GDB_MIN}:1.4,com.esri:filegdb_${GDB_MIN}:${GDB_VER}

Check out the Broadcast and Countries example notebooks.
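
For instance, a PySpark session started as above can read a feature class directly. This is a sketch that assumes the data source registers as com.esri.gdb and accepts path and name options, mirroring the Scala reader below; verify the exact format name and option keys against the library source:

df = (spark.read
      .format("com.esri.gdb")  # assumed data source name; check the source
      .option("path", "World.gdb")
      .option("name", "Countries")
      .load())
df.printSchema()
df.show(5)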

Here is yet another example in Scala:

import com.esri.gdb._
import org.apache.spark.sql.SparkSession

val path = "World.gdb"
val name = "Countries"

val spark = SparkSession.builder().getOrCreate()
try {
  spark
    .read
    .gdb(path, name)
    .createTempView(name)

  spark
    .sql(s"select CNTRY_NAME,SQKM from $name where SQKM < 10000.0 ORDER BY SQKM DESC LIMIT 10")
    .collect()
    .foreach(println)
} finally {
  spark.stop()
}

TODO

  • Write test cases. Come on Mansour, you know better!!
  • Save geometry as a struct(type,xmin,ymin,xmax,ymax,parts,coords) (see the schema sketch after this list).
  • Add option to skip reading the geometry.
  • Add option to return geometry envelope only.
  • Add option to return timestamp field as millis long.
  • Read geometry as WKB.
  • Add geometry extent as subfields to Shape.
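
Here is a sketch of the proposed geometry struct from the TODO list above, written as a PySpark schema; the field names follow the TODO item and are assumptions, not final:

from pyspark.sql.types import (StructType, StructField, IntegerType,
                               DoubleType, ArrayType)

shape_type = StructType([
    StructField("type", IntegerType(), nullable=False),   # geometry type tag
    StructField("xmin", DoubleType(), nullable=False),    # envelope
    StructField("ymin", DoubleType(), nullable=False),
    StructField("xmax", DoubleType(), nullable=False),
    StructField("ymax", DoubleType(), nullable=False),
    StructField("parts", ArrayType(IntegerType()), nullable=True),
    StructField("coords", ArrayType(DoubleType()), nullable=True),
])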

Notes To Self

  • Install JDK-1.8
  • Add %JAVA_HOME%\bin and %JAVA_HOME%\jre\bin to the PATH
  • keytool -import -alias cacerts -keystore cacerts -file C:\Windows\System32\documentdbemulatorcert.cer
