Apache Spark as a Service with Apache Livy Client

Wittline/livyc

livyc

Apache Livy Client

Install library

pip install livyc

Import library

from livyc import livyc

Setting the Livy configuration

data_livy = {
    "livy_server_url": "localhost",
    "port": "8998",
    "jars": ["org.postgresql:postgresql:42.3.1"]
}
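For orientation, these settings correspond to pieces of a request against Livy's REST API: the host and port form the base URL, and the Maven coordinate has to reach Spark when the session is created. The sketch below is an assumption about how such a mapping could look, not a description of livyc's internals. `build_session_request` is a hypothetical helper, and forwarding the coordinate through `spark.jars.packages` is only one plausible choice.

```python
import json


def build_session_request(data_livy, kind="pyspark"):
    """Hypothetical helper (not part of livyc): turn a livyc-style config
    dict into the URL and JSON body of a Livy `POST /sessions` call."""
    base_url = f"http://{data_livy['livy_server_url']}:{data_livy['port']}"
    body = {"kind": kind}
    packages = data_livy.get("jars", [])
    if packages:
        # Assumption: Maven coordinates are resolved via spark.jars.packages.
        body["conf"] = {"spark.jars.packages": ",".join(packages)}
    return f"{base_url}/sessions", body


data_livy = {
    "livy_server_url": "localhost",
    "port": "8998",
    "jars": ["org.postgresql:postgresql:42.3.1"],
}

url, body = build_session_request(data_livy)
print(url)
print(json.dumps(body, indent=2))
```

Keeping the request construction in a pure function like this makes it easy to inspect what would be sent before any network call happens.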

Let's try launching a PySpark script on the Apache Livy server

params = {"host": "localhost", "port":"5432", "database": "db", "table":"staging", "user": "postgres", "password": "pg12345"}
pyspark_script = """

    from pyspark.sql.functions import udf, col, explode
    from pyspark.sql.types import StructType, StructField, IntegerType, StringType, ArrayType
    from pyspark.sql import Row
    from pyspark.sql import SparkSession


    df = spark.read.format("jdbc") \
        .option("url", "jdbc:postgresql://{host}:{port}/{database}") \
        .option("driver", "org.postgresql.Driver") \
        .option("dbtable", "{table}") \
        .option("user", "{user}") \
        .option("password", "{password}") \
        .load()
        
    n_rows = df.count()

    spark.stop()
"""
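The script is an ordinary Python string template: the `{host}`-style placeholders are filled in later with `str.format`. Here is a small sketch of that substitution on a shortened stand-in template (`template`, `rendered`, and the `options` line are illustrative only). One thing to watch for: any literal braces inside a script must be doubled (`{{` and `}}`) to survive formatting.

```python
params = {"host": "localhost", "port": "5432", "database": "db"}

# Shortened stand-in for the script: one placeholder line and one line
# with literal braces, which are escaped by doubling them.
template = """
url = "jdbc:postgresql://{host}:{port}/{database}"
options = {{"fetchsize": "1000"}}
"""

rendered = template.format(**params)
print(rendered)
```

A placeholder with no matching key in `params` raises `KeyError`, so a typo in a parameter name surfaces before anything is sent to the server.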

Creating a livyc object

lvy = livyc.LivyC(data_livy)

Creating a new session on the Apache Livy server

session = lvy.create_session()
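A newly created Livy session typically starts in the `starting` state and only accepts statements once it reports `idle`. Whether and how `create_session` waits for this is not shown here. The sketch below illustrates the general polling pattern with a hypothetical `wait_until_idle` helper. It takes any callable that returns the current state; against a real server, that callable could wrap a `GET /sessions/{id}/state` request.

```python
import time


def wait_until_idle(get_state, timeout=60.0, interval=1.0, sleep=time.sleep):
    """Poll get_state() until it returns "idle".

    Raises RuntimeError on a terminal failure state and TimeoutError if the
    session is still not idle once `timeout` seconds have elapsed.
    """
    failed = {"error", "dead", "killed", "shutting_down"}
    deadline = time.monotonic() + timeout
    while True:
        state = get_state()
        if state == "idle":
            return state
        if state in failed:
            raise RuntimeError(f"Livy session ended in state {state!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"session still {state!r} after {timeout}s")
        sleep(interval)


# Usage with a fake state source standing in for the REST call.
states = iter(["starting", "starting", "idle"])
print(wait_until_idle(lambda: next(states), interval=0.0))
```

Passing the state source in as a callable keeps the loop independent of any particular HTTP client.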

Send and execute the script on the Apache Livy server

lvy.run_script(session, pyspark_script.format(**params))
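At the REST level, running code in a Livy session means posting a JSON body with a `code` field to the session's `statements` endpoint. The sketch below shows how such a request could be assembled; `build_statement_request` is a hypothetical helper, and the numeric session id is an assumption about how the session is identified. Because the script above is indented inside its triple-quoted string, the sketch removes the common indentation with `textwrap.dedent` first. Whether livyc performs this step itself is not stated here.

```python
import textwrap


def build_statement_request(base_url, session_id, script):
    """Hypothetical helper: URL and JSON body for submitting `script`
    as a statement to an existing Livy session."""
    url = f"{base_url}/sessions/{session_id}/statements"
    # Strip the shared leading indentation so the remote interpreter
    # does not see unexpectedly indented top-level code.
    code = textwrap.dedent(script).strip() + "\n"
    return url, {"code": code}


script = """
    n = 1 + 1
    print(n)
"""

url, body = build_statement_request("http://localhost:8998", 0, script)
print(url)
print(body["code"])
```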

Accessing the variable "n_rows" available in the session

lvy.read_variable(session, "n_rows")
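Values come back from Livy as part of a statement result: a JSON object whose `output` field carries a `status` and, on success, a `data` mapping keyed by MIME type. How `read_variable` decodes this is not documented here. The following sketch uses a hypothetical `extract_text_result` helper and a hand-written sample response (the value is made up) to show one way of pulling out the `text/plain` payload.

```python
def extract_text_result(statement):
    """Return the text/plain payload of a finished Livy statement.

    Hypothetical helper: raises RuntimeError if the statement has not
    finished or if it finished with an error.
    """
    if statement.get("state") != "available":
        raise RuntimeError(f"statement not finished: {statement.get('state')!r}")
    output = statement["output"]
    if output["status"] != "ok":
        raise RuntimeError(f"{output.get('ename')}: {output.get('evalue')}")
    return output["data"]["text/plain"]


# Hand-written sample shaped like a Livy statement response.
sample = {
    "id": 1,
    "state": "available",
    "output": {
        "status": "ok",
        "execution_count": 1,
        "data": {"text/plain": "42"},
    },
}

n_rows = int(extract_text_result(sample))
print(n_rows)
```

Since the payload arrives as text, it has to be converted (here with `int`) before it can be used as a number on the client side.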

Contributing and Feedback

Any ideas or feedback about this repository? Help me improve it.

Authors

License

This project is licensed under the terms of the MIT License.