Overview

Pivotal Greenplum

Pivotal Greenplum Database is an advanced, fully featured, open-source data warehouse. It provides fast analytics on petabyte-scale data volumes. Geared toward big data analytics, Greenplum Database is powered by the world’s most advanced cost-based query optimizer, which delivers high analytical query performance on large data volumes.

Pivotal Greenplum-Spark Connector

The Pivotal Greenplum-Spark Connector provides high-speed, parallel data transfer between Greenplum Database and Apache Spark clusters to support:

  • Interactive data analysis
  • In-memory analytics processing
  • Batch ETL
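
As a rough illustration of what such a transfer can look like from the Spark side, the sketch below loads a Greenplum table into a Spark DataFrame through the connector. It is a minimal sketch under assumptions, not the connector's definitive usage: the data source name `greenplum` and the option keys (`url`, `dbschema`, `dbtable`, `user`, `password`, `partitionColumn`) are taken from memory of the connector documentation and may differ between connector versions, so check them against the docs for the release you use. The helper names, host, database, and table are hypothetical.

```python
def greenplum_read_options(host, port, database, schema, table,
                           user, password, partition_column):
    """Build the option map for a connector read.

    The option keys are assumptions based on the connector documentation;
    verify them against the connector version you have installed.
    """
    return {
        # The connector reaches the Greenplum master over a PostgreSQL JDBC URL.
        "url": f"jdbc:postgresql://{host}:{port}/{database}",
        "dbschema": schema,
        "dbtable": table,
        "user": user,
        "password": password,
        # Column used to split the table into Spark partitions so that
        # the data can be transferred in parallel.
        "partitionColumn": partition_column,
    }


def read_greenplum_table(spark, options, source="greenplum"):
    """Load a Greenplum table as a DataFrame using an existing SparkSession.

    `source` is the data source name registered by the connector JAR
    (assumed here to be "greenplum").
    """
    return spark.read.format(source).options(**options).load()


# Hypothetical connection details, for illustration only.
opts = greenplum_read_options(
    host="gpmaster", port=5432, database="sales",
    schema="public", table="orders",
    user="gpadmin", password="secret",
    partition_column="order_id",
)
```

With a live `SparkSession` started with the connector JAR on its classpath, `df = read_greenplum_table(spark, opts)` would return a DataFrame that can then be used for interactive analysis, in-memory processing, or a batch ETL step.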

Apache Spark

Apache Spark is a fast, general-purpose cluster computing system for big data. It provides high-level APIs in Scala, Java, Python, and R, and an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools, including Spark SQL for SQL and DataFrames, MLlib for machine learning, GraphX for graph processing, and Spark Streaming for stream processing.

References:

  • [Introduction](https://gitpitch.com/kongyew/greenplum-spark-connector)
  • [Greenplum-Spark connector docs](http://greenplum-spark.docs.pivotal.io/latest/index.html)