Set up Greenplum and Spark

This page describes how to set up the Greenplum and Spark Docker containers.

Pre-requisites:

Using docker-compose

To create a standalone Greenplum cluster, run the following command in the root directory. It builds a Docker image with the Pivotal Greenplum binaries and downloads existing images such as the Spark master and worker. The first run may take some time while the Docker images download.

$ ./runDocker.sh -t usecase1 -c up

Creating network "usecase1_default" with the default driver
Creating sparkmaster ... done
Creating gpdbsne ... done
Creating sparkworker ... done
…
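To confirm that the three containers named in the output above are actually running, you can compare `docker ps` against the expected names. This is a minimal sketch, assuming bash and the container names `sparkmaster`, `sparkworker` and `gpdbsne` from the output; the function name `check_containers` is hypothetical and not part of `runDocker.sh`.

```shell
#!/usr/bin/env bash
# check_containers (hypothetical helper): reads running container names,
# one per line, on stdin and prints any expected container that is missing.
# Returns 0 when all expected containers are listed, 1 otherwise.
check_containers() {
  local names c status=0
  names=$(cat)
  for c in sparkmaster sparkworker gpdbsne; do
    # -x matches the whole line, so "sparkmaster2" does not count as "sparkmaster".
    if ! printf '%s\n' "$names" | grep -qx "$c"; then
      echo "missing: $c"
      status=1
    fi
  done
  return "$status"
}

# Usage (after ./runDocker.sh -t usecase1 -c up):
#   docker ps --format '{{.Names}}' | check_containers && echo "all containers running"
```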

Once the containers are up, the Spark UI is available at http://localhost:8081 and should list one worker.
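Because the containers can take a while to start, a script may want to wait until the Spark UI port accepts connections before opening it. The sketch below assumes bash (it relies on bash's `/dev/tcp` redirection rather than `curl` or `nc`) and the port 8081 given above; `wait_for_port` is a hypothetical helper name. It only confirms that something is listening on the port, not that the worker has registered.

```shell
#!/usr/bin/env bash
# wait_for_port HOST PORT [TRIES] (hypothetical helper): tries a TCP connect
# to HOST:PORT up to TRIES times (default 30), one second apart.
# Returns 0 as soon as a connection succeeds, 1 if it never does.
wait_for_port() {
  local host=$1 port=$2 tries=${3:-30} i
  for ((i = 1; i <= tries; i++)); do
    # Opening /dev/tcp/HOST/PORT in a subshell attempts a TCP connection;
    # the descriptor is closed when the subshell exits.
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      return 0
    fi
    # Skip the sleep after the final failed attempt.
    if ((i < tries)); then
      sleep 1
    fi
  done
  return 1
}

# Usage:
#   wait_for_port localhost 8081 60 && echo "Spark UI is up at http://localhost:8081"
```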

Set up Greenplum with sample tables

Follow the steps in the section “Create database and table”.

Connect to the Spark master instance

  1. Connect to the Spark master Docker container:
$ docker exec -it sparkmaster /bin/bash
  root@master:/usr/spark-2.1.0#

Connect to the Greenplum instance

  1. Connect to the GPDB Docker container:
$ docker exec -it gpdbsne /bin/bash
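If you connect to these containers often, a small wrapper around `docker exec -it` saves typing and always uses the absolute `/bin/bash` path. This is a minimal sketch, assuming bash; `connect_container` is a hypothetical name, and the `DOCKER` override exists only so the command can be printed (for example with `DOCKER=echo`) instead of executed.

```shell
#!/usr/bin/env bash
# connect_container NAME [COMMAND...] (hypothetical helper): runs COMMAND
# inside container NAME via `docker exec -it`. With no COMMAND it opens an
# interactive /bin/bash shell. Set DOCKER=echo to print the command instead.
connect_container() {
  local name=$1
  shift
  # Default to an interactive bash shell (absolute path, not bin/bash).
  if [ "$#" -eq 0 ]; then
    set -- /bin/bash
  fi
  "${DOCKER:-docker}" exec -it "$name" "$@"
}

# Usage:
#   connect_container sparkmaster   # same as: docker exec -it sparkmaster /bin/bash
#   connect_container gpdbsne       # same as: docker exec -it gpdbsne /bin/bash
```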