Cassandra in Detail
Nov 16, 2015 Janaki Mahapatra, Cassandra
What is Cassandra?

- Massively linearly scalable NoSQL database
- Fully distributed, with no single point of failure
- Free and open source, with deep developer support
- Highly performant, with near-linear horizontal scaling in proper use cases
- No single point of failure, due to horizontal scaling
- Horizontal scaling: add commodity hardware to a cluster
- Vertical scaling: add RAM and CPUs to a specialized high-performance box
- Always-on architecture: Cassandra's masterless “ring” architecture gives your application's end users always-on access to their data, even in the event of a rack, machine, or entire data center failure.
- Native Multi-Data Center Replication: Cross data center (in multiple geographies) and multi-cloud availability zone support for writes/reads.
- Fast Linear-Scale Performance: Enables millisecond response times with linear scalability (double your throughput with two nodes, quadruple it with four, and so on) to deliver the response times your customers have come to expect.
- Flexible Data Model: The Apache Cassandra data model allows new entities or attributes to be added over time, so you are not restricted to a rigid data model that can't evolve with the needs of the business application, such as the addition of a new complicated data structure that may be unique to your environment, or a new column in a column family.
- Transparent Fault Detection and Recovery: Nodes that fail can easily be restored or replaced.
- Tunable Data Consistency: Support for strong or eventual data consistency across a widely distributed cluster.
- OpsCenter Monitoring/Management Tool: A graphical management and monitoring tool for Cassandra that provides a view of the system from a centralized dashboard.
- Runs on Commodity Hardware: Apache Cassandra is built to run on commodity hardware and is unparalleled in value. Don't waste another dime on disaster recovery, high-end hardware, or revenue loss due to downtime. Focus your resources on building a great application, not on maintaining an expensive backend.
- Mitigate Risks of Downtime: Apache Cassandra’s architecture is built with no single point of failure. If a node (rack, machine, or entire data center) goes down, another is available to take its place and serve read/write requests without interruption.
- Improved Customer Experience: Apache Cassandra’s high availability and superior performance give businesses and their mission-critical applications the ability to provide customers with a superior user experience.
- Faster Time to Market: DataStax goes beyond standard open-source deployments by providing resources that make it easier to deliver Apache Cassandra in a single data center, or across multiple data centers, and clouds.
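The "Tunable Data Consistency" item above comes down to simple arithmetic on the replication factor. The following is a minimal sketch in Python, not Cassandra code: the function names `quorum` and `is_strongly_consistent` are made up for illustration. It relies on two well-known rules: a QUORUM is `floor(RF / 2) + 1` replicas, and a read is guaranteed to see the latest write when the number of replicas read plus the number of replicas written exceeds the replication factor.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that form a QUORUM: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def is_strongly_consistent(read_replicas: int, write_replicas: int,
                           replication_factor: int) -> bool:
    """A read overlaps the latest write on at least one replica when R + W > RF."""
    return read_replicas + write_replicas > replication_factor


if __name__ == "__main__":
    rf = 3  # illustrative replication factor, chosen for this example
    q = quorum(rf)
    print("RF =", rf, "QUORUM =", q)
    print("QUORUM read + QUORUM write is strong:", is_strongly_consistent(q, q, rf))
    print("ONE read + ONE write is strong:", is_strongly_consistent(1, 1, rf))
```

With a replication factor of 3, reading and writing at QUORUM touches two replicas each, so every read overlaps the latest write. Reading and writing at ONE trades that guarantee for lower latency, which is the eventual-consistency end of the dial.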
- Every node is identical
- Peer-to-peer design: nodes use the gossip protocol to keep the list of nodes in sync
- No special host to coordinate activities
- No single point of failure
- Easier to operate and maintain because all nodes are the same.
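To give a feel for how gossip keeps the node list in sync without a coordinator, here is a toy simulation in Python. It is a deliberately simplified sketch, not Cassandra's implementation: real gossip also exchanges heartbeat and version state, and the function name `gossip_until_converged` and the single seed node are assumptions made for this example. Each node starts out knowing only itself and the seed, and in every round it swaps its membership view with one randomly chosen peer.

```python
import random


def gossip_until_converged(node_names, seed_name, max_rounds=10, rng=None):
    """Run push-pull gossip rounds until every node knows every other node.

    Returns (rounds_taken, views); rounds_taken is None if max_rounds was
    reached before all views converged.
    """
    rng = rng or random.Random(0)  # fixed seed so the run is repeatable
    # Each node initially knows only itself and the seed node.
    views = {name: {name, seed_name} for name in node_names}
    everyone = set(node_names)

    for round_number in range(1, max_rounds + 1):
        for name in node_names:
            peers = sorted(views[name] - {name})
            if not peers:
                continue  # this node knows nobody else yet
            peer = rng.choice(peers)
            # Push-pull exchange: both sides end up with the union of their views.
            merged = views[name] | views[peer]
            views[name] = set(merged)
            views[peer] = set(merged)
        if all(view == everyone for view in views.values()):
            return round_number, views
    return None, views


if __name__ == "__main__":
    rounds, views = gossip_until_converged(
        ["node1", "node2", "node3", "node4"], seed_name="node1")
    print("converged after rounds:", rounds)
    for name, view in sorted(views.items()):
        print(name, "knows", sorted(view))
```

No node plays a special role beyond being a first point of contact, which is why losing any single node does not stop membership information from spreading.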
- It was designed from the ground up to take full advantage of multi-processor/multi-core machines and to run across dozens of these machines housed in multiple data centers
- It scales consistently and seamlessly to hundreds of terabytes
- Shows exceptional performance under heavy load
- Consistently shows very fast throughput for writes per second on a basic commodity workstation
- Big data (billions of records)
- Very high velocity random reads and writes
- Flexible Sparse / wide column requirements
- No need for multiple secondary indexes
- Low latency
- E-commerce inventory cache use cases
- Time series / Events Use Cases
- Feed based Activities / Use Cases
- A traditional RDBMS excels when you need ACID-compliant transactions with rollback (e.g., a bank transfer)
- Secondary Indexes
- Relational Data
- Transactional (Rollback, Commit)
- Primary and Financial Records
- Stringent Security and Authorization Needs on Data
- Dynamic Queries on Columns
- Searching Column Data
- Low Latency
- Playlists and collections, e.g., Spotify
- Personalization and recommendation engines, e.g., eBay
- Messaging, e.g., Instagram
- Fraud detection, e.g., Barracuda
- Sensor data, e.g., Zonar
- Netflix
- Intuit
- eBay
- Expedia
- NASA
- Cisco
- sudo yum install -y python-pip;
- sudo pip install cql PyYAML;
- sudo yum install ant -y;
- sudo yum install git -y;
- cd /home
- sudo git clone https://github.com/pcmanus/ccm.git
- cd ccm
- sudo ./setup.py install;
- cd ..
- sudo ifconfig lo:2 127.0.0.2
- sudo ifconfig lo:3 127.0.0.3
- sudo ifconfig lo:4 127.0.0.4
- cd /etc/sysconfig/network-scripts/
- cp ifcfg-lo ifcfg-lo:0
- cp ifcfg-lo ifcfg-lo:1
- cp ifcfg-lo ifcfg-lo:2
- cp ifcfg-lo ifcfg-lo:3
- ccm status
- ccm node1 start
- ccm node1 stop
- ccm node1 nodetool status
- ccm node2 nodetool status
- ccm node1 cqlsh
- CREATE KEYSPACE MySampleDB WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 2 };
- USE MySampleDB;
- Now create some tables and experiment with them
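The keyspace above uses `SimpleStrategy` with a replication factor of 2. As a rough illustration of what that means for data placement, the Python sketch below models a toy token ring: a row's partition key is hashed to a token, the first node whose token is greater than or equal to it holds the first replica, and further replicas go to the next nodes clockwise. Several things here are assumptions for the sake of the example: the three-node cluster with one evenly spaced token per node, the 32-bit token space, the helper names, and the use of MD5 as a stand-in for Cassandra's actual partitioner hash.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 32  # toy token space, far smaller than Cassandra's real one


def token_for(key: str) -> int:
    """Hash a partition key onto the toy ring (MD5 is only a stand-in)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def build_ring(node_names):
    """Assign one evenly spaced token to each node; returns sorted (token, node) pairs."""
    step = RING_SIZE // len(node_names)
    return sorted((index * step, name) for index, name in enumerate(node_names))


def replicas_for(key: str, ring, replication_factor: int):
    """SimpleStrategy-style placement: owning node first, then clockwise neighbours."""
    tokens = [token for token, _ in ring]
    # First node whose token is >= the key's token, wrapping around the ring.
    start = bisect.bisect_left(tokens, token_for(key)) % len(ring)
    count = min(replication_factor, len(ring))
    return [ring[(start + offset) % len(ring)][1] for offset in range(count)]


if __name__ == "__main__":
    ring = build_ring(["node1", "node2", "node3"])  # hypothetical 3-node cluster
    for key in ["user:1", "user:2", "user:3"]:
        print(key, "->", replicas_for(key, ring, replication_factor=2))
```

With a replication factor of 2, every row lives on two adjacent nodes of the ring, so any single node can be stopped (for example with `ccm node1 stop`) while a copy of each row remains on another node.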