NOSQL Cassandra Introduction

I-Hsien Huang
Sep 4, 2018 · 2 min read

This article introduces NOSQL and its pros and cons, and then takes a closer look at the NOSQL system Cassandra.

Topic I : NOSQL

What is NOSQL?
Some people might think that it means “not SQL”. However, it actually stands for “Not Only SQL”. There are six important attributes of NOSQL.
1. Schema-less
2. Shared nothing architecture
3. Elasticity: Capacity and load handling are easy to expand, with no downtime.
4. Sharding
5. Asynchronous replication
6. BASE instead of ACID
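
To make the first attributes more concrete, here is a minimal Python sketch of schema-less records spread over independent shards. The names `shard_for`, `records`, and `get` are made up for this illustration, and using MD5 modulo the shard count is an assumption chosen for simplicity, not how any particular NOSQL product does it.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index with a stable hash (not Python's salted hash())."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

# Schema-less: records in the same collection may carry different fields.
records = {
    "user:1": {"name": "Alice", "email": "alice@example.com"},
    "user:2": {"name": "Bob", "followers": 42},
    "user:3": {"name": "Carol", "tags": ["nosql", "cassandra"]},
}

NUM_SHARDS = 3
# Shared nothing: each shard owns its own independent store.
shards = [dict() for _ in range(NUM_SHARDS)]
for key, value in records.items():
    shards[shard_for(key, NUM_SHARDS)][key] = value

def get(key: str):
    """Look up a key by going directly to the shard that owns it."""
    return shards[shard_for(key, NUM_SHARDS)].get(key)
```

A call such as `get("user:2")` only touches one shard, which is why this layout can grow by adding machines.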

ACID:

ACID (Atomicity, Consistency, Isolation, Durability) is a set of properties of database transactions intended to guarantee validity even in the event of errors, power failures, etc.
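
As a small illustration of atomicity and consistency, the sketch below uses Python's built-in `sqlite3` module. The `accounts` table and the `transfer` and `balances` helpers are assumptions invented for this example. A transfer that would make a balance negative fails, and the whole transaction is rolled back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money between two accounts as a single transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed, so the earlier credit was undone too.
        return False

def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

Calling `transfer(conn, "alice", "bob", 500)` credits bob first and then fails on the debit, yet afterwards both balances are unchanged.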

CAP Theorem:

As we all know, a distributed system cannot fulfill all three of the following attributes at the same time. And whenever we call a system a “Distributed System”, it has at least the P attribute.

  • Consistency: Every read sees the most recent write. A traditional relational database management system (RDBMS) achieves this through transactions.
  • Availability: The service is “always on” and answers every client request within a limited time.
  • Partition Tolerance: The cluster continues to function even if there is a “partition” (communication break) between two nodes (both nodes are up, but can’t communicate). To achieve this, we normally give up consistency.

BASE:

“Basically Available, Soft state, Eventually consistent”
It is a kind of extension of CAP. The core idea is that even if we cannot achieve strong consistency, we must at least achieve eventual consistency.

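  • Basically Available: The system keeps answering requests even when some nodes have failed.
  • Soft state: Replicas may be out of sync for some period of time, because updates are applied asynchronously.
  • Eventually consistent: The data might not be consistent at a given moment, but it will become consistent in the end.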
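
The idea can be sketched with a toy in-memory cluster in Python. `Replica`, `Cluster`, and the explicit `replicate()` step are hypothetical names for this illustration. A real system replicates in the background over the network, so treat this as a simplified model rather than an actual implementation.

```python
from collections import deque

class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}

class Cluster:
    """Toy cluster: a write lands on one replica and reaches the others later."""

    def __init__(self, names):
        self.replicas = {n: Replica(n) for n in names}
        self.pending = deque()  # replication messages not yet delivered

    def write(self, coordinator, key, value):
        # Acknowledge as soon as one replica has the value (basically available).
        self.replicas[coordinator].data[key] = value
        for name in self.replicas:
            if name != coordinator:
                self.pending.append((name, key, value))

    def read(self, replica, key):
        return self.replicas[replica].data.get(key)

    def replicate(self):
        """Deliver all queued updates, as a background process would."""
        while self.pending:
            name, key, value = self.pending.popleft()
            self.replicas[name].data[key] = value

cluster = Cluster(["a", "b", "c"])
cluster.write("a", "status", "online")
stale = cluster.read("b", "status")   # replica b has not seen the write yet
cluster.replicate()
fresh = cluster.read("b", "status")   # after replication all replicas agree
```

Between the write and `replicate()`, replica `b` is in a soft state and returns old data. Once the queue drains, every replica returns the same value.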

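NoSQL DB Drawbacks:

  1. Hard to migrate: Each NOSQL system has its own API, unlike RDBMSs, which all share the same API, “SQL”.
  2. No ACID: NOSQL systems usually give up the full ACID transactions that are an essential part of an RDBMS.
  3. No Join: Related data typically has to be combined in the application or stored together in advance.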

Topic II : Cassandra

  • Key-Value Storage System: High performance.
  • Distributed network service (core point): Cassandra is not just a DB, but a distributed network service made up of many data nodes. The nodes are organized in a peer-to-peer (P2P) ring structure.
  • High Scalability: Adding a new machine to the cluster expands its capacity, and we do not need to restart the whole cluster.
  • Multiple Data Centers: Data can be replicated across data centers, which limits the impact when one data center crashes.
  • Range Search: Because it is a key-value storage system, the client can search a range of data by specifying a range of keys.
  • Focus on AP: In CAP terms, Cassandra favors availability and partition tolerance over strong consistency.
  • CQL (Cassandra Query Language): An easier way to work with the data for users who are familiar with SQL.
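
The ring structure and the scalability point can be illustrated with a small consistent-hashing sketch in Python. This is a simplified model under stated assumptions: the `Ring` class and `token` function are invented here, each node gets a single MD5-based token, and replication is left out. Cassandra's own partitioner and token assignment are more elaborate.

```python
import bisect
import hashlib

def token(value: str) -> int:
    """Stable hash used to place both nodes and keys on the ring (MD5 is an assumption)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

class Ring:
    def __init__(self, nodes=()):
        self._tokens = []   # sorted node tokens
        self._owners = {}   # token -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        t = token(node)
        bisect.insort(self._tokens, t)
        self._owners[t] = node

    def node_for(self, key: str) -> str:
        """Walk clockwise from the key's token to the first node token at or after it."""
        i = bisect.bisect_left(self._tokens, token(key))
        return self._owners[self._tokens[i % len(self._tokens)]]

keys = [f"user:{i}" for i in range(1000)]
ring = Ring(["node-a", "node-b", "node-c"])
before = {k: ring.node_for(k) for k in keys}

# Adding a node does not restart or reshuffle the rest of the ring.
ring.add_node("node-d")
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

Every key in `moved` now belongs to `node-d`, and all other keys stay where they were. This is why a node can join without rebuilding the whole cluster.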

Cassandra Application:

Method I : Combine Hadoop and Cassandra. Put the data into Hadoop, use a Map/Reduce job to process it, and then insert the results into Cassandra. An example is my project “Twitter Search Engineer”.
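
The shape of such a pipeline can be sketched in plain Python. This is only an assumption about what a search-style job might look like, not the actual project code. The sample tweets, `map_phase`, `shuffle`, and `reduce_phase` are invented for this example, and a plain dict stands in for the Cassandra table.

```python
from collections import defaultdict

tweets = {
    1: "cassandra is a nosql database",
    2: "hadoop runs map reduce jobs",
    3: "cassandra works with hadoop",
}

def map_phase(tweet_id, text):
    """Emit one (term, tweet_id) pair per distinct word in a tweet."""
    for term in set(text.lower().split()):
        yield term, tweet_id

def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for term, tweet_id in pairs:
        groups[term].append(tweet_id)
    return groups

def reduce_phase(term, tweet_ids):
    """Collapse all postings for a term into one sorted row."""
    return term, sorted(tweet_ids)

pairs = [p for tid, text in tweets.items() for p in map_phase(tid, text)]
# A plain dict stands in for the Cassandra table: row key -> column value.
index_table = dict(reduce_phase(t, ids) for t, ids in shuffle(pairs).items())

def search(term):
    """Look up the tweets that contain a term with a single key read."""
    return index_table.get(term.lower(), [])
```

The heavy work happens once in the batch job, so a query such as `search("cassandra")` becomes a single lookup by key, which is the access pattern a key-value store handles well.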

