Apache Spark Programming (Spark 105): 3-day Instructor-Led Public Class (Warsaw)


A three-day, on-site, instructor-led course

Location: Warsaw, Poland (full address to be announced)

This three-day course is for data engineers, analysts, architects, software engineers, IT operations staff, and technical managers interested in a thorough, hands-on overview of Apache Spark. The course covers the core APIs for using Spark, fundamental mechanisms and basic internals of the framework, SQL and other high-level data access tools, as well as Spark’s streaming capabilities and machine learning APIs. Each topic includes slide and lecture content along with hands-on use of Spark through an elegant web-based notebook environment. Inspired by tools like IPython/Jupyter, the notebooks allow attendees to code jobs, data analysis queries, and visualizations on their own Spark cluster, accessed through a web browser. All class code is directly usable with pure open-source Spark or any commercial Spark distribution.

After taking this class, a participant will be able to:

  • Describe Spark’s fundamental mechanics.
  • Use the core Spark APIs to operate on data.
  • Articulate and implement typical use cases for Spark.
  • Build data pipelines with SparkSQL and DataFrames.
  • Analyze Spark jobs using the UIs and logs.
  • Create Streaming and Machine Learning jobs.


Topics covered:

  • Spark Overview
  • RDD Fundamentals
  • SparkSQL and DataFrames
  • Spark Job Execution
  • Cluster Architectures for Spark
  • Intro to Spark Streaming
  • Machine Learning Basics

Cost: $2500 per person


All participants will need a laptop with an up-to-date version of Chrome or Firefox (Internet Explorer and Safari are not supported).

About Databricks:

Databricks’ mission is to accelerate innovation for its customers by unifying Data Science, Engineering and Business. Databricks provides a Unified Analytics Platform powered by Apache Spark for data science teams to collaborate with data engineering and lines of business to build data products. Users achieve faster time-to-value with Databricks by creating analytic workflows that go from ETL and interactive exploration to production. The company also makes it easier for its users to focus on their data by providing a fully managed, scalable, and secure cloud infrastructure that reduces operational complexity and total cost of ownership. Databricks, venture-backed by Andreessen Horowitz, NEA and Battery Ventures, among others, has a global customer base that includes Salesforce, Viacom, Shell and HP.

For more information, visit www.databricks.com.

Apache, Apache Spark and Spark are trademarks of the Apache Software Foundation.

Meet the trainers


A software architect with a background in big data processing and machine learning. Experienced in designing, developing and deploying a variety of solutions, from streaming machine learning to isolated software sandboxes. Amadeusz conducts training sessions on Apache Cassandra and the Apache Spark libraries. He holds a BSc in Computer Science, as well as an Apache Spark Developer certificate.


Experienced in developing web applications using Scala for the back-end and AngularJS with TypeScript for the front-end. He is an enthusiast of clean and well-tested code. Marcin is an Amazon Web Services Associate-level Certified Solutions Architect and is working toward an Engineer’s degree in Computer Science at the Warsaw University of Technology. His thesis deals with sequential pattern mining using Spark.
