
Cassandra

  • Writer: Rohan Roy
  • Sep 3
  • 2 min read

Apache Cassandra is a distributed NoSQL database built for large-scale, real-time applications that need high availability, horizontal scalability, and fast writes.


Key Highlights


1. Core Characteristics

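  • Architecture: Uses a peer-to-peer (masterless) model in which every node can serve reads and writes, so there is no single point of failure.

  • Data Partitioning & Replication: Distributes rows across nodes by consistent hashing of their partition key and stores copies on several nodes (the replication factor). Tunable consistency levels let each read or write specify how many replicas must respond.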

  • Data Storage & Processing:

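    • Write Path: Data is first appended to a commit log for durability, then written to an in-memory memtable; when a memtable fills up, it is flushed to an immutable SSTable on disk.

    • Read Path: Merges data from the memtable and from any SSTables that may hold the requested partition; a bloom filter on each SSTable lets Cassandra skip files that definitely do not contain it.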
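The write and read paths above can be sketched as a toy, single-process log-structured store. This is a minimal illustration under simplifying assumptions, not Cassandra's implementation: the class names (`BloomFilter`, `SSTable`, `TinyLSMStore`) and the two-entry flush threshold are hypothetical, everything stays in memory, and a read returns the value from the newest source instead of reconciling cells by timestamp as Cassandra does.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: false positives are possible, false negatives are not."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, key):
        # Derive num_hashes bit positions from salted SHA-256 digests of the key.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        # False means "definitely absent"; True means "maybe present".
        return all(self.bits[pos] for pos in self._positions(key))


class SSTable:
    """Sorted snapshot of a flushed memtable (never modified) plus its Bloom filter."""

    def __init__(self, items):
        self.data = dict(sorted(items.items()))
        self.bloom = BloomFilter()
        for key in self.data:
            self.bloom.add(key)


class TinyLSMStore:
    """Single-node, in-memory sketch of commit log -> memtable -> SSTable."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []    # append-only record used for crash recovery
        self.memtable = {}      # latest writes, held in memory
        self.sstables = []      # flushed tables, oldest first
        self.memtable_limit = memtable_limit  # hypothetical flush threshold

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. append to the commit log
        self.memtable[key] = value            # 2. apply to the memtable
        if len(self.memtable) >= self.memtable_limit:
            self.flush()                      # 3. flush once the memtable is "full"

    def flush(self):
        self.sstables.append(SSTable(self.memtable))
        self.memtable = {}
        self.commit_log.clear()  # flushed entries no longer need to be replayed

    def read(self, key):
        if key in self.memtable:                # newest data first
            return self.memtable[key]
        for table in reversed(self.sstables):   # then newest SSTable to oldest
            if not table.bloom.might_contain(key):
                continue                        # definitely absent: skip this table
            if key in table.data:
                return table.data[key]
        return None


store = TinyLSMStore(memtable_limit=2)
store.write("sensor-1", "20.5")
store.write("sensor-2", "19.0")   # memtable reaches the limit: flushed to an SSTable
store.write("sensor-1", "21.0")   # the newer value lives in the memtable
print(store.read("sensor-1"))     # 21.0
print(store.read("sensor-2"))     # 19.0
```

Writes only ever append or update memory, which is why this design favors write throughput; reads pay the cost of consulting several sources, and the cheap Bloom filter check keeps that cost down by ruling out SSTables that cannot hold the key.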


2. Use Cases

Best suited for scenarios involving:

  • IoT and sensor data

  • Time-series data

  • Web activity tracking

  • Real-time analytics


3. Pros & Cons

  • Pros:

    • High scalability: capacity grows by adding nodes to the cluster

    • High write throughput, since writes are sequential log appends plus in-memory updates

    • Strong availability and fault tolerance

  • Cons:

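    • Query-driven data modeling: tables must be designed around access patterns, often with denormalized data

    • Eventual consistency may not suit applications that need strict, transactional guarantees

    • Significant operational overhead: cluster setup, repairs, and compaction tuning need ongoing attention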
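The scalability advantage rests on consistent hashing: when a node joins, it takes over only the token range next to its own token, so most data stays where it is. The sketch below is a simplified illustration, not Cassandra's code. It assumes one token per node (real clusters usually give each node many virtual-node tokens via the `num_tokens` setting), uses SHA-256 as a stand-in for Cassandra's default Murmur3 partitioner, and places replicas by walking the ring clockwise, roughly as SimpleStrategy does. `token` and `TokenRing` are hypothetical names.

```python
import bisect
import hashlib


def token(value):
    """Hash a string onto a 64-bit token ring (SHA-256 as a stand-in partitioner)."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class TokenRing:
    """Consistent-hashing ring with one token per node and no virtual nodes."""

    def __init__(self, nodes, replication_factor=3):
        self.rf = replication_factor
        self.ring = sorted((token(node), node) for node in nodes)

    def add_node(self, node):
        bisect.insort(self.ring, (token(node), node))

    def replicas(self, partition_key):
        """Return the RF nodes that store this key: the first node whose token
        is >= the key's token (wrapping around), then the next nodes clockwise."""
        tokens = [t for t, _ in self.ring]
        start = bisect.bisect_left(tokens, token(partition_key)) % len(self.ring)
        count = min(self.rf, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]


ring = TokenRing(["node-1", "node-2", "node-3", "node-4"], replication_factor=3)
keys = [f"sensor-{i}" for i in range(1000)]
before = {k: ring.replicas(k)[0] for k in keys}  # primary replica per key

ring.add_node("node-5")
after = {k: ring.replicas(k)[0] for k in keys}
moved = [k for k in keys if before[k] != after[k]]

print(f"{len(moved)} of {len(keys)} keys changed primary replica")
print(all(after[k] == "node-5" for k in moved))  # True: only the new node gains keys
```

With a single token per node, ownership can be quite uneven, which is the problem virtual nodes address: spreading many tokens per node evens out both the steady-state load and the data streamed when a node joins.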


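4. Background & Design


  • Origins & Design Philosophy

    Cassandra was developed at Facebook by Avinash Lakshman and Prashant Malik to power Inbox Search. Its design draws on Amazon's Dynamo for data distribution and replication, and on Google's Bigtable for its data model and storage engine.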

  • AP (Availability & Partition Tolerance) Focus

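    In CAP terms, Cassandra is usually classified as an AP system: during a network partition it favors availability over strict consistency, and with low consistency levels such as ONE, replicas converge only eventually. Tunable consistency narrows the gap: if the number of replicas a read waits for (R) plus the number a write waits for (W) exceeds the replication factor (RF), every read reaches at least one replica that acknowledged the latest successful write.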

  • Scalability & Fault Tolerance Mechanics

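    Several mechanisms keep the cluster resilient: a gossip protocol through which nodes exchange membership and state information, seed nodes that new nodes contact to join the gossip, hinted handoff (a coordinator stores writes meant for an unavailable replica and replays them when it recovers), and the Φ Accrual Failure Detector, which reports a continuous suspicion level rather than a binary up/down verdict.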
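The tunable consistency mentioned above reduces to simple arithmetic: reads and writes are guaranteed to overlap on at least one replica when the replicas a read waits for (R) plus those a write waits for (W) exceed the replication factor (RF). The helpers below are hypothetical and assume a single data center; `ONE`, `QUORUM`, and `ALL` are real Cassandra consistency level names, with QUORUM meaning a majority of replicas, floor(RF / 2) + 1.

```python
def quorum(replication_factor):
    """Replicas needed for QUORUM: a strict majority, floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def overlapping(read_replicas, write_replicas, replication_factor):
    """True when every read set must intersect every write set (R + W > RF),
    so a read contacts at least one replica holding the latest acknowledged write."""
    return read_replicas + write_replicas > replication_factor


RF = 3
levels = {"ONE": 1, "QUORUM": quorum(RF), "ALL": RF}

for read_level, r in levels.items():
    for write_level, w in levels.items():
        verdict = "overlap" if overlapping(r, w, RF) else "may read stale data"
        print(f"read {read_level:<6} write {write_level:<6} -> {verdict}")
```

With RF = 3, QUORUM reads plus QUORUM writes (2 + 2 > 3) overlap while still tolerating one unavailable replica, which is why that pairing is a common middle ground between ONE (fast, possibly stale) and ALL (consistent, but unavailable if any replica is down).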
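The Φ accrual failure detector can be sketched in a few lines. Instead of declaring a node dead after a fixed timeout, it computes φ = -log10(P), where P is the estimated probability that a healthy node would stay silent this long; φ = 3 means such a silence would occur about once in a thousand times. This sketch assumes exponentially distributed heartbeat intervals (the original paper models them with a normal distribution); the class name and the threshold of 8 are illustrative assumptions, and Cassandra exposes a comparable knob as `phi_convict_threshold`.

```python
import math
from collections import deque


class PhiAccrualDetector:
    """Accrual failure detector assuming exponentially distributed heartbeat
    intervals: phi = -log10(P(no heartbeat for `elapsed` seconds))."""

    def __init__(self, window_size=100):
        self.intervals = deque(maxlen=window_size)  # recent inter-arrival times
        self.last_heartbeat = None

    def heartbeat(self, now):
        if self.last_heartbeat is not None:
            self.intervals.append(now - self.last_heartbeat)
        self.last_heartbeat = now

    def phi(self, now):
        if not self.intervals:
            return 0.0  # no history yet: no suspicion
        mean = sum(self.intervals) / len(self.intervals)
        elapsed = now - self.last_heartbeat
        # Exponential model: P(X > t) = exp(-t / mean), so -log10(P) = t / (mean * ln 10).
        return elapsed / (mean * math.log(10))


detector = PhiAccrualDetector()
for t in range(10):              # heartbeats arrive every 1.0 s
    detector.heartbeat(float(t))

THRESHOLD = 8.0                  # assumed conviction threshold
for now in (9.5, 12.0, 30.0):
    phi = detector.phi(now)
    status = "suspect" if phi > THRESHOLD else "alive"
    print(f"t={now:>4}: phi={phi:.2f} -> {status}")
```

Because φ grows smoothly with the silence, each component can pick its own threshold: a lower value reacts faster to real failures at the cost of more false suspicions on a slow or congested network.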


Summary Table

| Category | Summary |
|---|---|
| What | Distributed NoSQL database with a peer-to-peer architecture |
| Strengths | Highly scalable, fault-tolerant, excels at heavy write workloads |
| Core Design | Commit log → memtable → SSTable write flow; bloom filters to speed up reads |
| Use Cases | IoT, time-series, logs, web tracking, real-time analytics |
| Trade-offs | Eventual consistency, data modeling complexity, maintenance overhead |



©2025 Rohan Roy
