Description

Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs. It also supports a rich set of higher-level tools including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming.

Version    Module name
3.2.0      spark/3.2.0

...

Package    Version
openjdk    11.0.2
hadoop     3.3.1

Some users might also want to load Apache Hive, R, and one of the many Python modules. conda/spark, which provides both R and Python, is recommended and is loaded by default in the Jupyter Notebook Open OnDemand application.

...

Code Block
language: bash
SPARK_MASTER=$(grep "Starting Spark master" "${SPARK_LOG_DIR}/master.err" | cut -d " " -f 9)
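
The extraction above relies on the master URL being the ninth space-separated field of the "Starting Spark master" log line. The following is a minimal sketch of that step run against a sample log line, with a check that the result looks like a spark:// URL. The sample line, the host name node001, the port, and the helper extract_master are assumptions made for illustration; the actual log contents on the cluster may differ.

```bash
# Hypothetical sample of the master log line; host name and port are made up.
sample_log=$(mktemp)
echo "21/10/20 10:00:00 INFO Master: Starting Spark master at spark://node001:7077" > "${sample_log}"

# Same extraction as above: the URL is the 9th space-separated field.
# extract_master is an illustrative helper, not part of Spark.
extract_master() {
    grep "Starting Spark master" "$1" | cut -d " " -f 9
}

SPARK_MASTER=$(extract_master "${sample_log}")

# Guard against an empty or malformed value before passing it to --master.
case "${SPARK_MASTER}" in
    spark://*) echo "Master URL: ${SPARK_MASTER}" ;;
    *) echo "Could not find a spark:// URL in ${sample_log}" >&2 ;;
esac

rm -f "${sample_log}"
```

If the check fails, the master may not have finished starting yet, or the log format may not match the field position assumed here.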


Connect to the master using the Spark interactive shell in:

Scala
Code Block
language: bash
spark-shell --master "${SPARK_MASTER}"
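
As a quick, non-interactive check that the shell reaches the master, a single Scala statement can be piped to spark-shell. This is a sketch under assumptions: the fallback URL spark://node001:7077 is a made-up placeholder used only when SPARK_MASTER is unset, and the guard simply skips the launch when spark-shell is not on the PATH.

```bash
# Use the value extracted earlier; the fallback is a made-up placeholder.
SPARK_MASTER="${SPARK_MASTER:-spark://node001:7077}"

if command -v spark-shell >/dev/null 2>&1; then
    # spark-shell reads Scala statements from standard input;
    # sc.master returns the master URL the shell is connected to.
    echo 'println(sc.master)' | spark-shell --master "${SPARK_MASTER}"
else
    echo "spark-shell not found; load the spark module first" >&2
fi
```

The URL reported by sc.master should match the value of SPARK_MASTER.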

...