Cassandra
- Rohan Roy

- Sep 3
Apache Cassandra is a distributed NoSQL database designed for large-scale, real-time applications that need high availability, horizontal scalability, and strong write performance.
Key Highlights
1. Core Characteristics
Architecture: Uses a peer-to-peer (masterless) model in which any node can serve reads and writes, so there is no single point of failure.
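Data Partitioning & Replication: Uses consistent hashing of each row's partition key to place data on a token ring, replicates every partition to a configurable number of nodes (the replication factor), and offers tunable consistency levels (e.g., ONE, QUORUM, ALL) chosen per read or write.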
Data Storage & Processing:
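Write Path: Each write is appended to the commit log for durability and applied to an in-memory memtable; when the memtable fills, it is flushed to an immutable SSTable on disk.
Read Path: Reads combine data from the memtable and the relevant SSTables; a per-SSTable Bloom filter lets Cassandra skip SSTables that definitely do not contain the requested partition.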
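The write and read paths above follow the log-structured merge (LSM) pattern. The Python sketch below is a toy, single-node model of that flow, not Cassandra's implementation: the class names (`BloomFilter`, `SSTable`, `ToyStorageEngine`) and the two-entry memtable limit are hypothetical, "disk" is simulated with in-memory dictionaries, and reads return the value from the newest source that holds the key instead of merging cells by timestamp as Cassandra does.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may give false positives, never false negatives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        # Derive num_hashes bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class SSTable:
    """Immutable, key-sorted table; kept in memory here to stand in for a disk file."""

    def __init__(self, rows):
        self.rows = dict(sorted(rows.items()))
        self.bloom = BloomFilter()
        for key in self.rows:
            self.bloom.add(key)


class ToyStorageEngine:
    def __init__(self, memtable_limit=2):
        self.commit_log = []  # append-only durability log (a file in a real system)
        self.memtable = {}    # latest in-memory values
        self.sstables = []    # flushed tables, oldest first
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. record the write for crash recovery
        self.memtable[key] = value             # 2. apply it in memory
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # 3. Persist the memtable as a new immutable SSTable and start a fresh one.
        self.sstables.append(SSTable(self.memtable))
        self.memtable = {}
        # Log entries covered by the flush are no longer needed for recovery.
        self.commit_log.clear()

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest SSTable first
            if not table.bloom.might_contain(key):
                continue  # key is definitely absent: skip this SSTable
            if key in table.rows:
                return table.rows[key]
        return None


engine = ToyStorageEngine(memtable_limit=2)
engine.write("sensor-1", 20.5)
engine.write("sensor-2", 19.0)  # memtable reaches the limit and is flushed
engine.write("sensor-1", 21.0)  # newer value stays in the memtable
print(engine.read("sensor-1"))  # 21.0
print(engine.read("sensor-2"))  # 19.0
print(engine.read("sensor-9"))  # None
```

Because SSTables are never modified after a flush, an update such as the second write to `sensor-1` simply lands in a newer structure, and the read path must consult sources from newest to oldest.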
2. Use Cases
Best suited for scenarios involving:
IoT and sensor data
Time-series data
Web activity tracking
Real-time analytics
3. Pros & Cons
Pros:
High horizontal scalability (capacity grows by adding nodes to the cluster)
Exceptional write performance
Strong availability and fault tolerance
Cons:
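Complex data modeling: tables must be designed around specific queries, since Cassandra does not support joins
Eventual consistency (the default) may not suit applications that need strong consistency
Operational overhead in setup and ongoing maintenance (e.g., repairs and compaction tuning)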
Origins & Design Philosophy
Cassandra was created at Facebook by Avinash Lakshman and Prashant Malik to power Inbox Search. Its design draws on Amazon’s Dynamo (peer-to-peer distribution and replication) and Google’s Bigtable (column-family data model and memtable/SSTable storage).
AP (Availability & Partition Tolerance) Focus
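In CAP-theorem terms, Cassandra is generally classified as an AP system: it prioritizes availability and partition tolerance and, by default, offers eventual rather than strict consistency. Because the consistency level is tunable per request, however, applications can trade some availability for stronger guarantees. For example, when reads and writes both use QUORUM, the number of replicas read plus the number written exceeds the replication factor (R + W > RF), so every read overlaps at least one replica that acknowledged the latest write.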
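Tunable consistency rests on two pieces: consistent hashing decides which replicas own a partition, and the consistency level decides how many of them must answer. The Python sketch below illustrates both under simplifying assumptions: `token`, `TokenRing`, `quorum`, and `read_sees_latest_write` are hypothetical helpers, SHA-256 stands in for Cassandra's default Murmur3 partitioner, each node holds a single token (no virtual nodes), and replicas are placed on the next nodes clockwise, similar to Cassandra's SimpleStrategy.

```python
import bisect
import hashlib


def token(key):
    # Hypothetical partitioner: SHA-256 digest as an integer token.
    return int.from_bytes(hashlib.sha256(key.encode()).digest(), "big")


class TokenRing:
    def __init__(self, nodes, replication_factor):
        if not 1 <= replication_factor <= len(nodes):
            raise ValueError("replication factor must be between 1 and the node count")
        self.rf = replication_factor
        # One token per node; real clusters usually assign many virtual nodes each.
        self.ring = sorted((token(node), node) for node in nodes)
        self.tokens = [t for t, _ in self.ring]

    def replicas_for(self, partition_key):
        """First node at or after the key's token is the primary; the rest follow clockwise."""
        start = bisect.bisect_left(self.tokens, token(partition_key)) % len(self.ring)
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(self.rf)]


def quorum(rf):
    """Majority of replicas: floor(RF / 2) + 1."""
    return rf // 2 + 1


def read_sees_latest_write(read_acks, write_acks, rf):
    """True when every read set must overlap every write set (R + W > RF)."""
    return read_acks + write_acks > rf


ring = TokenRing(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
print(ring.replicas_for("sensor-42"))  # three distinct nodes, chosen by hash
rf = 3
print(quorum(rf))  # 2
print(read_sees_latest_write(quorum(rf), quorum(rf), rf))  # True: QUORUM/QUORUM
print(read_sees_latest_write(1, 1, rf))  # False: ONE/ONE
```

With RF = 3, QUORUM needs 2 replicas, so a QUORUM read and a QUORUM write always share at least one replica; at ONE/ONE they may not, which is where eventual consistency shows through.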
Scalability & Fault Tolerance Mechanics
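Several mechanisms make the cluster resilient: a gossip protocol through which nodes periodically exchange state about themselves and their peers; seed nodes, which help new nodes join the gossip network; hinted handoff, in which a coordinator stores writes meant for an unavailable replica and replays them once it recovers; and the Φ Accrual Failure Detector, which assigns each peer a continuous suspicion level instead of a binary up/down verdict.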
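The Φ accrual detector replaces a fixed timeout with a suspicion level that rises the longer a peer stays silent, relative to how often it normally sends heartbeats. The sketch below tracks a single peer; the class and method names are hypothetical, and it assumes exponentially distributed heartbeat inter-arrival times (the original Φ accrual paper models them with a normal distribution), which gives φ = t / (mean · ln 10). The threshold of 8 mirrors the default of Cassandra's `phi_convict_threshold` setting; everything else is illustrative.

```python
import math
from collections import deque


class PhiAccrualDetector:
    """Sketch of a phi accrual failure detector for one peer.

    Assumes exponential inter-arrival times, so
    P(still silent after t) = exp(-t / mean) and phi = t / (mean * ln 10).
    """

    def __init__(self, window_size=1000, threshold=8.0):
        self.intervals = deque(maxlen=window_size)  # recent gaps between heartbeats
        self.last_heartbeat = None
        self.threshold = threshold

    def heartbeat(self, now):
        if self.last_heartbeat is not None:
            self.intervals.append(now - self.last_heartbeat)
        self.last_heartbeat = now

    def phi(self, now):
        if not self.intervals:
            return 0.0  # not enough history to judge yet
        # Guard against a zero mean if heartbeats ever share a timestamp.
        mean = max(sum(self.intervals) / len(self.intervals), 1e-9)
        elapsed = now - self.last_heartbeat
        return elapsed / (mean * math.log(10))

    def is_suspected(self, now):
        return self.phi(now) > self.threshold


detector = PhiAccrualDetector()
for t in range(11):  # heartbeats arrive once per second, t = 0..10
    detector.heartbeat(float(t))
print(round(detector.phi(11.0), 3))  # 0.434: one second of silence is normal
print(detector.is_suspected(11.0))   # False
print(detector.is_suspected(30.0))   # True: 20 s of silence gives phi ≈ 8.7
```

Raising the threshold makes the detector slower to convict a node but less prone to false alarms on a jittery network; lowering it does the opposite.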
Summary Table
| Category | Summary |
| --- | --- |
| What | Distributed NoSQL database with peer-to-peer architecture |
| Strengths | Highly scalable, fault-tolerant, excels at heavy write operations |
| Core Design | Commit log → Memtable → SSTable flow, Bloom filters for optimized reads |
| Use Cases | IoT, time-series, logs, web tracking, real-time analytics |
| Trade-offs | Eventual consistency, data modeling complexity, maintenance overhead |