Course objectives

Participants will acquire general knowledge of NoSQL databases: their functionality, applications and limitations. The training focuses on the Apache Cassandra database.

In particular, participants will:

  • Become familiar with the theoretical background of distributed database systems and its implementation in Cassandra
  • Learn how data is stored and how physical access to data is provided
  • Become familiar with the mechanisms that ensure high availability and performance, and the trade-offs they involve
  • Become familiar with data modeling and with building scalable Cassandra-based applications

Course parameters

3 × 8 hours of lectures and workshops, with an emphasis on workshops.
During the workshops, in addition to simple exercises, participants will implement a complete Cassandra-based application.

  1. Introduction to NoSQL databases

    • Introduction to NoSQL databases
    • CAP theorem
    • The basic parameters of NoSQL databases
    • NoSQL vs. RDBMS
    • Cassandra applications and business cases
  2. Data modeling

    • Clusters, keyspaces (databases), tables, rows, columns
    • Native data model
    • CQL data model
    • Partitioning and clustering keys
    • Mapping CQL to native data model
    • Data types
    • CQL commands
    • Building data models with CQL commands: distributed transactions, graph databases, event sourcing
    • Best practices
  3. Architecture and internals

    • The basic elements: node, data center, cluster, commit log, SSTable
    • Data distribution and replication
    • Partitioning
    • Data integrity
    • Lightweight transactions
    • Read, write, delete
  4. Administration

    • Toolbox: nodetool, cqlsh, ccm, OpsCenter
    • Add/remove/replace a node
    • Balancing a cluster
    • Configuration
    • Configuring replication
    • Maintenance
  5. Optimization

    • Data modeling and performance
    • Latency analysis
    • Optimizing I/O
    • JVM and memory
    • Compaction


