Introduction to Apache Cassandra

Apache Cassandra is an open source, distributed, wide-column NoSQL database management system designed to handle massive amounts of data across many commodity servers, offering high availability with no single point of failure. It is written in Java and developed under the Apache Software Foundation.

Avinash Lakshman and Prashant Malik first developed Cassandra at Facebook to power the Facebook Inbox Search feature. Facebook released Cassandra as an open source project on Google Code in July 2008. In March 2009 it became an Apache Incubator project, and in February 2010 it graduated to a top-level Apache project. Its technical characteristics have since made Cassandra widely known.

Apache Cassandra is used to manage very large amounts of structured data spread across the world. It offers a highly available service with no single point of failure. Below are a few points to note about Apache Cassandra:

It is scalable, fault-tolerant, reliable, and consistent.
It is a column-oriented database.
Its distributed design is based on Amazon’s Dynamo, and its data model is based on Google’s Bigtable.
It was created at Facebook and differs significantly from relational database management systems.

Cassandra uses a Dynamo-style replication model with no single point of failure, but adds a more powerful “column family” data model. Cassandra is used by many well-known companies, including Facebook, Twitter, Cisco, Rackspace, eBay, and Netflix.

Cassandra is designed to handle large, data-intensive workloads spread across many nodes with no single point of failure. It is a peer-to-peer distributed system, and data is distributed across all the nodes in the cluster.

All nodes in a Cassandra cluster play the same role. Each node is independent yet interconnected with the other nodes. Every node can accept read and write requests, regardless of where the data is actually located in the cluster. When a node goes down, read and write requests can be served by other nodes in the cluster.
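
The peer-to-peer behavior described above can be illustrated with a small toy model in Python. This is only a sketch under stated assumptions, not Cassandra's implementation: the Cluster class, its method names, the replication factor of 3, and the CRC32-based replica choice are all hypothetical and exist only for illustration. Real Cassandra also involves consistency levels, hinted handoff, and repair, which are left out here.

```python
import zlib


class Cluster:
    """Toy peer-to-peer cluster: every node can coordinate any request."""

    def __init__(self, node_names, replication_factor=3):
        self.names = list(node_names)
        self.rf = replication_factor              # assumed value, for illustration
        self.storage = {name: {} for name in self.names}
        self.down = set()                         # names of nodes that are offline

    def replicas_for(self, key):
        """Pick rf consecutive nodes, starting from a hash of the key."""
        start = zlib.crc32(key.encode("utf-8")) % len(self.names)
        return [self.names[(start + i) % len(self.names)] for i in range(self.rf)]

    def _check_coordinator(self, coordinator):
        if coordinator in self.down:
            raise ConnectionError(f"{coordinator} is down; contact another node")

    def write(self, coordinator, key, value):
        """Write through any live node; returns how many replicas stored the value."""
        self._check_coordinator(coordinator)
        written = 0
        for replica in self.replicas_for(key):
            if replica not in self.down:
                self.storage[replica][key] = value
                written += 1
        if written == 0:
            raise RuntimeError("no live replica available for this key")
        return written

    def read(self, coordinator, key):
        """Read through any live node; the first live replica holding the key answers."""
        self._check_coordinator(coordinator)
        for replica in self.replicas_for(key):
            if replica not in self.down and key in self.storage[replica]:
                return self.storage[replica][key]
        raise KeyError(key)


if __name__ == "__main__":
    cluster = Cluster(["N1", "N2", "N3", "N4", "N5"])
    cluster.write("N1", "user:42", "Alice")
    # Take one of the replicas offline, then read through a node that is still up.
    cluster.down.add(cluster.replicas_for("user:42")[0])
    live_node = next(n for n in cluster.names if n not in cluster.down)
    print(cluster.read(live_node, "user:42"))
```

In this sketch the client may contact any live node, and a read still succeeds after one replica fails because the value was copied to the other replicas at write time.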

Features of Cassandra:

Cassandra has become popular because of its technical features. Here are a few of them:

Easy data distribution –
It lets you place data wherever you need it by distributing it across multiple data centers.
Example:
Suppose there are 5 nodes: N1, N2, N3, N4, and N5. Using a partitioning algorithm, we determine a token range for each node and distribute data accordingly. Each node owns a distinct token range, and data whose token falls within that range is stored on that node.
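
The five-node example can be sketched in Python as follows. This is a simplified illustration, not Cassandra's partitioner: the 32-bit token space, the MD5-based token_for function, and the evenly spaced tokens are assumptions made to keep the example short (Cassandra's default Murmur3Partitioner uses a signed 64-bit token range). All function names here are hypothetical.

```python
import bisect
import hashlib

NODES = ["N1", "N2", "N3", "N4", "N5"]
RING_SIZE = 2**32  # assumed token space for this sketch


def token_for(key):
    """Hash a partition key to a token in [0, RING_SIZE)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def build_ring(nodes):
    """Assign evenly spaced tokens; each node owns the range that ends at its token."""
    step = RING_SIZE // len(nodes)
    return [((i + 1) * step - 1, node) for i, node in enumerate(nodes)]


def owner_of(key, ring):
    """Return the node whose token range contains the key's token."""
    tokens = [token for token, _ in ring]
    idx = bisect.bisect_left(tokens, token_for(key))
    # Tokens past the last node's token wrap around to the first node.
    return ring[idx % len(ring)][1]


if __name__ == "__main__":
    ring = build_ring(NODES)
    start = 0
    for token, node in ring:
        print(f"{node} owns tokens {start}..{token}")
        start = token + 1
    for key in ["user:1", "user:2", "order:17"]:
        print(key, "->", owner_of(key, ring))
```

Because the owner is derived from a hash of the key, any node can compute where a given piece of data lives without consulting a central directory.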

Flexible data storage –
Cassandra can handle all data formats, including structured, semi-structured, and unstructured data. It can dynamically accommodate changes to your data structures as your requirements change.

Elastic scalability –
Cassandra is highly scalable: you can add more hardware to serve more customers and store more data as requirements grow.

Fast writes –
Cassandra was designed to run on inexpensive commodity hardware. It performs very fast writes and can store hundreds of terabytes of data without sacrificing read efficiency.

Always-on architecture –
Cassandra has no single point of failure, so it is continuously available for business-critical applications that cannot afford downtime.

Fast linear-scale performance –
Cassandra scales linearly: throughput increases as you increase the number of nodes in the cluster, while response times stay low.

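Transaction support –
Cassandra provides some of the ACID (Atomicity, Consistency, Isolation, Durability) properties, though not full relational-style transactions: writes are atomic and isolated at the partition level and are durable, while consistency is tunable per operation.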