Apache Spark Programming (Spark 105): 3-day Instructor-Led Public Class (Warsaw)

Overview

A three-day, on-site, instructor-led course

Location: Warsaw, Poland (full address to be announced)

This three-day course is for data engineers, analysts, architects, software engineers, IT operations staff, and technical managers who want a thorough, hands-on overview of Apache Spark. The course covers the core Spark APIs, the fundamental mechanisms and basic internals of the framework, SQL and other high-level data access tools, and Spark's streaming and machine learning APIs. Each topic combines slides and lecture content with hands-on use of Spark in a web-based notebook environment. Inspired by tools like IPython/Jupyter, the notebooks let attendees write jobs, data analysis queries, and visualizations against their own Spark cluster, accessed through a web browser. All class code runs as-is on open-source Apache Spark or any commercial Spark distribution.

After taking this class, participants will be able to:

  • Describe Spark’s fundamental mechanics.
  • Use the core Spark APIs to operate on data.
  • Articulate and implement typical use cases for Spark.
  • Build data pipelines with Spark SQL and DataFrames.
  • Analyze Spark jobs using the Spark UIs and logs.
  • Create streaming and machine learning jobs.

Modules

  • Spark Overview
  • RDD Fundamentals
  • Spark SQL and DataFrames
  • Spark Job Execution
  • Cluster Architectures for Spark
  • Intro to Spark Streaming
  • Machine Learning Basics

Cost: $2,500 per person

Requirements:

All participants will need a laptop with an up-to-date version of Chrome or Firefox (Internet Explorer and Safari are not supported).

About Databricks:

Databricks’ mission is to accelerate innovation for its customers by unifying Data Science, Engineering and Business. Databricks provides a Unified Analytics Platform powered by Apache Spark for data science teams to collaborate with data engineering and lines of business to build data products. Users achieve faster time-to-value with Databricks by creating analytic workflows that go from ETL and interactive exploration to production. The company also makes it easier for its users to focus on their data by providing a fully managed, scalable, and secure cloud infrastructure that reduces operational complexity and total cost of ownership. Databricks, venture-backed by Andreessen Horowitz, NEA and Battery Ventures, among others, has a global customer base that includes Salesforce, Viacom, Shell and HP.

For more information, visit www.databricks.com.

Apache, Apache Spark and Spark are trademarks of the Apache Software Foundation.