Cassandra

Rohan Roy
Sep 3, 2025

Apache Cassandra is a distributed NoSQL database engineered for large-scale, real-time applications that need high availability, horizontal scalability, and fast writes.

Key Highlights

1. Core Characteristics

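Architecture: Operates on a peer-to-peer model with no master node. Every node plays the same role and any node can coordinate a request, so there is no single point of failure.
Data Partitioning & Replication: Uses consistent hashing on the partition key to place each row on a ring of nodes, then copies it to additional nodes according to the replication factor. Consistency is tunable per request (for example ONE, QUORUM, or ALL), so each read or write can trade latency against the number of replicas that must respond.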
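
To make the ring idea concrete, here is a minimal sketch of consistent hashing with replication. It is an illustration under stated assumptions, not Cassandra's implementation: the names `token_for` and `HashRing` are hypothetical, MD5 stands in for the hash function (Cassandra's default partitioner is Murmur3-based), each node owns a single token (real clusters typically use many virtual nodes per machine), and replicas are simply the next nodes clockwise.

```python
import bisect
import hashlib


def token_for(key: str) -> int:
    """Map a key to a ring position. MD5 is used only for illustration."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Toy ring: one token per node, replicas are the next nodes clockwise."""

    def __init__(self, nodes, replication_factor=3):
        self.replication_factor = replication_factor
        # Sort nodes by their token so the ring can be searched with bisect.
        self._ring = sorted((token_for(node), node) for node in nodes)
        self._tokens = [token for token, _ in self._ring]

    def replicas(self, partition_key: str):
        """Return the nodes that would hold copies of this partition key."""
        # First node whose token is >= the key's token; wraps past the end.
        start = bisect.bisect_left(self._tokens, token_for(partition_key))
        count = min(self.replication_factor, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]
```

Calling `HashRing(["node-a", "node-b", "node-c", "node-d"]).replicas("sensor-42")` returns three distinct node names, and the same key always maps to the same nodes. Adding a node only moves the keys that fall between its token and its predecessor's token, which is what makes scaling out cheap.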
Data Storage & Processing:
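- Write Path: Each write is appended to a commit log for durability and applied to an in-memory memtable. When the memtable fills, it is flushed to disk as an immutable SSTable.
- Read Path: Checks the memtable and the on-disk SSTables and merges the results. Bloom filters let a read skip SSTables that cannot contain the requested partition.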
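
The write and read paths above can be sketched as a toy storage engine. This is a simplified model under stated assumptions, not Cassandra's code: `BloomFilter` and `ToyStorageEngine` are hypothetical names, everything lives in memory, the memtable flushes after a fixed number of keys, and the commit log is cleared on flush to mimic the fact that flushed data no longer needs replaying.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size_bits=256, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # an int used as a bit array

    def _positions(self, key):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class ToyStorageEngine:
    """Commit log -> memtable -> SSTable, with Bloom filters on the read path."""

    def __init__(self, memtable_limit=2):
        self.memtable_limit = memtable_limit  # assumed flush threshold (in keys)
        self.commit_log = []  # append-only record of writes not yet flushed
        self.memtable = {}    # most recent writes, held in memory
        self.sstables = []    # (bloom_filter, data) pairs, oldest first

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. durability first
        self.memtable[key] = value            # 2. then the in-memory table
        if len(self.memtable) >= self.memtable_limit:
            self.flush()                      # 3. spill to an SSTable when full

    def flush(self):
        if not self.memtable:
            return
        data = dict(sorted(self.memtable.items()))  # SSTables are sorted by key
        bloom = BloomFilter()
        for key in data:
            bloom.add(key)
        self.sstables.append((bloom, data))  # never modified after this point
        self.memtable = {}
        self.commit_log.clear()

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        # Newest SSTable first, skipping any whose filter rules the key out.
        for bloom, data in reversed(self.sstables):
            if bloom.might_contain(key) and key in data:
                return data[key]
        return None
```

Because writes only append to the log and update memory, they avoid random disk I/O, which is the main reason this design favors write throughput. Reads pay for that by potentially consulting several SSTables, which is where the Bloom filters help.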

2. Use Cases

Best suited for scenarios involving:

- IoT and sensor data
- Time-series data
- Web activity tracking
- Real-time analytics

3. Pros & Cons

Pros:
- High scalability (easily adds new nodes)
- Exceptional write performance
- Strong availability and fault tolerance
Cons:
- Complex data modeling required for efficiency
- Eventual consistency by default, which may not suit applications that need strict consistency on every read
- Operational overhead in setup and maintenance

Origins & Design Philosophy
Cassandra was created at Facebook by Avinash Lakshman and Prashant Malik to power Inbox Search, drawing inspiration from Amazon’s Dynamo and Google’s Bigtable.
AP (Availability & Partition Tolerance) Focus
Classified as an AP system under the CAP theorem, Cassandra prioritizes availability and partition tolerance. Consistency is tunable per operation, but typical configurations favor eventual consistency over strict consistency.
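
A common rule of thumb for tunable consistency is that a read is guaranteed to overlap the latest acknowledged write when the number of replicas read plus the number of replicas written exceeds the replication factor. The short sketch below works through that arithmetic; the function names are hypothetical helpers, not part of any Cassandra driver.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas, e.g. 2 of 3 or 3 of 5."""
    return replication_factor // 2 + 1


def reads_see_latest_write(read_replicas: int, write_replicas: int,
                           replication_factor: int) -> bool:
    """True when every read set must share at least one replica with every write set."""
    return read_replicas + write_replicas > replication_factor


rf = 3
# QUORUM writes + QUORUM reads: 2 + 2 > 3, so the sets always overlap.
strong = reads_see_latest_write(quorum(rf), quorum(rf), rf)
# ONE write + ONE read: 1 + 1 <= 3, so a read may hit a stale replica.
eventual = reads_see_latest_write(1, 1, rf)
```

Here `strong` is `True` and `eventual` is `False`. The second combination is faster and stays available with more nodes down, which is the trade-off the AP classification describes.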
Scalability & Fault Tolerance Mechanics
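Several mechanisms keep the cluster resilient. A gossip protocol spreads node state between peers, seed nodes give new nodes a contact point for joining the cluster, hinted handoff stores writes destined for an unreachable replica and replays them when it returns, and the Φ Accrual Failure Detector rates how likely a node is to be down instead of making a binary up/down call.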
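
The accrual idea can be sketched in a few lines. This is a simplified model, assuming heartbeat gaps follow an exponential distribution, so that the suspicion level is Φ = -log10(P(gap > elapsed)) = elapsed / (mean_gap · ln 10). The class name, the window size, and the default threshold of 8 (modeled on Cassandra's `phi_convict_threshold` setting) are assumptions for illustration.

```python
import math
from collections import deque


class PhiAccrualDetector:
    """Toy accrual failure detector for a single peer."""

    def __init__(self, window=1000, threshold=8.0):
        self.intervals = deque(maxlen=window)  # recent gaps between heartbeats
        self.last_heartbeat = None
        self.threshold = threshold

    def heartbeat(self, now: float) -> None:
        """Record a heartbeat that arrived at time `now` (seconds)."""
        if self.last_heartbeat is not None:
            self.intervals.append(now - self.last_heartbeat)
        self.last_heartbeat = now

    def phi(self, now: float) -> float:
        """Suspicion level: grows the longer the peer stays silent."""
        if not self.intervals:
            return 0.0  # not enough history to judge
        mean_gap = sum(self.intervals) / len(self.intervals)
        if mean_gap <= 0:
            return 0.0
        elapsed = now - self.last_heartbeat
        # -log10(exp(-elapsed / mean_gap)) simplifies to this expression.
        return elapsed / (mean_gap * math.log(10))

    def is_suspected(self, now: float) -> bool:
        return self.phi(now) > self.threshold
```

With heartbeats arriving about once per second, one missed beat barely moves Φ, while a long silence pushes it past the threshold. Because Φ is scaled by the observed mean gap, the detector adapts to slow or jittery links instead of relying on a fixed timeout.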

Summary Table

Category	Summary
What	Distributed NoSQL database with peer-to-peer architecture
Strengths	Highly scalable, fault-tolerant, excels at heavy write operations
Core Design	Commit log → Memtable → SSTable flow, bloom filters for optimized reads
Use Cases	IoT, time-series, logs, web tracking, real-time analytics
Trade-offs	Eventual consistency, data modeling complexity, maintenance overhead