Setup Greenplum and Spark
This page describes how to set up Greenplum and Spark in Docker containers.
Pre-requisites:
- Docker Compose
- Greenplum Spark connector
- PostgreSQL JDBC driver (only needed if you want to write data from Spark into Greenplum)
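Before starting, it can help to confirm that the Docker tooling is installed. The following is a minimal sketch, not part of the project: the function names `have_cmd` and `check_prereqs` are hypothetical, and it assumes Compose is available either as the standalone `docker-compose` binary or as the `docker compose` plugin. It does not check for the connector or JDBC driver jars, because their file names depend on the version you downloaded.

```shell
# Return success if the named command is available on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Report which of the Docker tools this page relies on are missing.
# (Hypothetical helper; not shipped with the project.)
check_prereqs() {
  status=0
  if ! have_cmd docker; then
    echo "missing: docker" >&2
    status=1
  fi
  # Compose ships either as the standalone `docker-compose` binary
  # or as the `docker compose` plugin; accept either one.
  if ! have_cmd docker-compose && ! docker compose version >/dev/null 2>&1; then
    echo "missing: docker compose" >&2
    status=1
  fi
  return "$status"
}
```

Calling `check_prereqs` prints one line per missing tool and returns a non-zero status if anything is absent.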
Using docker-compose
To create a standalone Greenplum cluster, run the following command in the project's root directory. It builds a Docker image with the Pivotal Greenplum binaries and downloads existing images such as the Spark master and worker. The first run may take some time while the Docker images download.
$ ./runDocker.sh -t usecase1 -c up
Creating network "usecase1_default" with the default driver
Creating sparkmaster ... done
Creating gpdbsne ... done
Creating sparkworker ... done
...
...
The Spark UI will be available at http://localhost:8081, with one worker listed.
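Because the containers can take a moment to start, the UI may not answer immediately. As a rough sketch, a small polling helper can wait for it. The function name `wait_for_url` and its default retry count and delay are assumptions made for illustration, and it relies on `curl` being installed on the host.

```shell
# Poll a URL until it answers or the attempts run out.
# Usage: wait_for_url URL [ATTEMPTS] [DELAY_SECONDS]
# (Hypothetical helper; defaults are arbitrary.)
wait_for_url() {
  url=$1
  attempts=${2:-30}
  delay=${3:-2}
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f: treat HTTP errors as failure; -s: silent; -o /dev/null: discard body
    if curl -fs -o /dev/null "$url"; then
      echo "up: $url"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "timed out waiting for $url" >&2
  return 1
}
```

For example, `wait_for_url http://localhost:8081` would block until the address given above responds, or give up after the default number of attempts.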
Setup Greenplum with sample tables
Follow the steps in the section “Create database and table”.
Connect to Spark master instance
- Connect to the Spark master Docker container
$ docker exec -it sparkmaster /bin/bash
Connect to Greenplum instance
- Connect to the GPDB Docker container
$ docker exec -it gpdbsne /bin/bash
root@master:/usr/spark-2.1.0#
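The `docker exec` commands on this page only work while the target containers are running. One way to check, sketched here under assumptions, is to compare the names reported by `docker ps` against the container names shown in the startup output (`sparkmaster`, `sparkworker`, `gpdbsne`). The helper name `missing_containers` is hypothetical.

```shell
# Print each expected container name that is absent from the list on stdin.
# stdin: one running container name per line.
# args:  the container names that should be present.
# (Hypothetical helper; not shipped with the project.)
missing_containers() {
  running=$(cat)
  for name in "$@"; do
    # -x: match the whole line; -F: treat the name as a fixed string
    printf '%s\n' "$running" | grep -qxF "$name" || printf '%s\n' "$name"
  done
}
```

A possible invocation is `docker ps --format '{{.Names}}' | missing_containers sparkmaster sparkworker gpdbsne`, which prints nothing when all three containers are up and otherwise lists the ones that are not running.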