In the last two posts of this series, we learned about the Cassandra architecture and the Cassandra read/write process.

If you missed the earlier posts in this series, you can read them at the links below:

1: Introduction to Cassandra

2: Understanding Cassandra Read/Write Mechanism

In this post, we will install Cassandra on Red Hat Enterprise Linux (RHEL) 6.

Before jumping into the lab and starting the installation, I want to touch on a few concepts that will help in understanding the installation process.

Bootstrapping

Bootstrapping is the process by which a newly joining node gets the data it needs from its neighbors in the ring, so that it joins the ring already holding that data. Typically, a bootstrapping node starts without any state or token: it first learns the ring structure by gossiping with the seed nodes, and then chooses the token(s) it will bootstrap with.

During bootstrap, the joining node also receives writes for the token range it will own once bootstrap completes. These additional writes ensure that the node does not miss any new data written between the moment streaming is requested and the moment the node comes online.
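On a running cluster, a node that is still bootstrapping shows up in nodetool status with the state Joining (UJ) and switches to Normal (UN) once streaming finishes. As a minimal sketch, the hypothetical Bash helper joining_nodes below filters nodetool status output read from standard input and prints the address of every node still in UJ. It assumes the default column layout, with the two-letter status/state code first and the address second.

```shell
# Hypothetical helper: print the address of every node that nodetool
# reports as Up/Joining ("UJ"), i.e. a node that is still bootstrapping.
# Reads `nodetool status` output on standard input.
joining_nodes() {
  awk '$1 == "UJ" { print $2 }'
}

# Usage on a live cluster (assumes CASSANDRA_HOME is set):
#   "$CASSANDRA_HOME/bin/nodetool" status | joining_nodes
```

Because it reads standard input, you can try it on saved output before pointing it at a live cluster.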

Seed Nodes

The seed node designation has no purpose other than bootstrapping the gossip process for new nodes joining the cluster. Seed nodes are not a single point of failure, nor do they have any other special purpose in cluster operations beyond the bootstrapping of nodes.

During cluster formation, nodes discover each other and join. However, a node does not join just any node that speaks the gossip protocol. That would be risky: it could end up joining stale, partitioned replicas, a different cluster, or even a malicious node. Instead, a cluster is defined by a few initial nodes at well-known addresses, which act as a trusted reference point for any new node that wants to join. The seed nodes can later go away, and the cluster will keep running.
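The seed list itself lives in cassandra.yaml, as a comma-separated string in the seeds parameter of seed_provider. The Bash sketch below uses a hypothetical helper, set_seeds, that rewrites that line in place with GNU sed -i while keeping its indentation. It assumes the stock file layout with a single "- seeds:" line; the address in the usage line is the lab's seed node, cassdb01.

```shell
# Hypothetical helper: rewrite the seeds list in a cassandra.yaml file.
# $1 = path to cassandra.yaml, $2 = comma-separated seed addresses.
set_seeds() {
  local yaml=$1 seeds=$2
  # Replace everything after "- seeds:" with the quoted seed list,
  # keeping the original leading whitespace (captured in \1).
  sed -i "s|^\([[:space:]]*- seeds:\).*|\1 \"${seeds}\"|" "$yaml"
}

# Usage (same seed list on every node):
#   set_seeds /opt/apache-cassandra/conf/cassandra.yaml "192.168.109.70"
```

Using the same seed list on every node keeps the whole cluster pointed at the same reference nodes.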

Installation Prerequisites

In my lab, I am setting up a 4-node Cassandra cluster on RHEL 6, and I performed the following steps on each node.

1: Set a static IP address on all 4 nodes and make sure the nodes can reach each other by hostname and IP address.

2: Set the hostname in the /etc/sysconfig/network file.
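On RHEL 6 the persistent hostname is the HOSTNAME= line in /etc/sysconfig/network. The hypothetical Bash helper below replaces that line if it exists and appends it otherwise; the file path is a parameter so you can try it on a copy first.

```shell
# Hypothetical helper: set HOSTNAME=<name> in a RHEL 6 style
# /etc/sysconfig/network file, replacing an existing HOSTNAME line
# or appending one if none exists.
# $1 = file, $2 = hostname
set_sysconfig_hostname() {
  local file=$1 name=$2
  if grep -q '^HOSTNAME=' "$file"; then
    sed -i "s/^HOSTNAME=.*/HOSTNAME=${name}/" "$file"
  else
    echo "HOSTNAME=${name}" >> "$file"
  fi
}

# Usage (as root):
#   set_sysconfig_hostname /etc/sysconfig/network cassdb01
#   hostname cassdb01   # apply the name to the running system as well
```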

3: Update the /etc/hosts file

Add entries for all of your Cassandra nodes to the /etc/hosts file:

192.168.109.70    cassdb01    #SEED
192.168.109.71    cassdb02    #Worker
192.168.109.72    cassdb03    #Worker
192.168.109.73    cassdb04    #Worker
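Since the same entries go on all four nodes, an idempotent helper avoids duplicate lines when the step is re-run. The Bash sketch below (add_hosts_entries is a hypothetical name) appends each IP/hostname pair only if the hostname is not already present; note that grep -w also counts a match inside a comment.

```shell
# Hypothetical helper: append "IP<TAB>hostname" lines to a hosts file,
# skipping any hostname that is already present.
# $1 = hosts file, followed by IP/hostname pairs.
add_hosts_entries() {
  local file=$1
  shift
  while [ "$#" -ge 2 ]; do
    local ip=$1 name=$2
    shift 2
    # -w matches the hostname as a whole word (cassdb01 does not match cassdb010)
    if ! grep -qw "$name" "$file"; then
      printf '%s\t%s\n' "$ip" "$name" >> "$file"
    fi
  done
}

# Usage (as root), then check that every node answers:
#   add_hosts_entries /etc/hosts 192.168.109.70 cassdb01 192.168.109.71 cassdb02 \
#     192.168.109.72 cassdb03 192.168.109.73 cassdb04
#   for h in cassdb01 cassdb02 cassdb03 cassdb04; do ping -c 1 "$h" >/dev/null && echo "$h ok"; done
```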

4: Open Firewall Ports

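If you are using iptables, open the ports Cassandra listens on: 7000 for inter-node communication and 9042 for CQL clients (the startup log below shows Cassandra listening for CQL clients on port 9042). Port 9160 is needed only if you enable the Thrift RPC server, which Cassandra 3.9 does not start by default. For each port, you can use an iptables command similar to this one; -I inserts the rule at the top of the INPUT chain, so it takes effect before the catch-all REJECT rule in the default RHEL 6 ruleset:

# iptables -I INPUT -p tcp --dport 7000 -j ACCEPT

Then save the rules so that they survive a reboot:

# service iptables save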
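To open several ports consistently, the hypothetical helper below prints one iptables rule per port instead of applying it, so you can review the rules first. The port list in the usage lines is an assumption based on the ports discussed in this step: 7000 for inter-node traffic, 9042 for CQL clients, and 9160 only if you enable Thrift.

```shell
# Hypothetical helper: print (but do not run) one iptables rule per port.
# -I INPUT puts each rule at the top of the chain, ahead of the default
# RHEL 6 catch-all REJECT rule that -A would append after.
cassandra_iptables_rules() {
  local port
  for port in "$@"; do
    echo "iptables -I INPUT -p tcp --dport ${port} -j ACCEPT"
  done
}

# Review the rules, then apply and persist them as root:
#   cassandra_iptables_rules 7000 9042
#   cassandra_iptables_rules 7000 9042 | sh
#   service iptables save
```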

5: Create a cassandra user with sudo permissions.

You can use the script below, which creates a user with sudo permissions on the server.

# wget https://raw.githubusercontent.com/zubayr/create_user_script/master/create_user_script.sh

# chmod 755 create_user_script.sh

# sh create_user_script.sh -s cassandra

After running the script, make sure the cassandra user and group have been created on the server:

[root@cassdb01 ~]# cat /etc/passwd | grep cassandra
cassandra:x:501:501::/home/cassandra:/bin/bash

[root@cassdb01 ~]# cat /etc/group | grep cassandra
cassandra:x:501:
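The same check can be scripted. The hypothetical Bash helper below succeeds only if both a user and a group with the given name exist; it uses getent, which also consults configured name services such as LDAP rather than only the local files.

```shell
# Hypothetical helper: succeed only if a user AND a group with the
# given name both exist (getent checks /etc/passwd, /etc/group and
# any other configured name services).
user_and_group_exist() {
  getent passwd "$1" >/dev/null && getent group "$1" >/dev/null
}

# Usage:
#   user_and_group_exist cassandra && id cassandra
#   sudo -l -U cassandra    # as root: list the sudo rules for the user
```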

6: Install Java on the server

Make sure you install Oracle Java (JDK or JRE) version 7 or later and set JAVA_HOME. You can install Java with the RPM-based installer or from the tar file.

In my lab, I installed Java using the RPM package jdk-8u111-linux-x64.rpm.

Note: If OpenJDK is installed on your system, remove it before installing Oracle Java.

Note: Cassandra 3.0 and later require Java 8u40 or later, so use Java 8 for the Cassandra 3.9 release installed in this post.

Verify that JAVA_HOME is set correctly and that the java -version command prints the installed version:

[root@cassdb01 ~]# cat .bash_profile | grep JAVA_HOME
JAVA_HOME=/usr/java/jdk1.8.0_111
PATH=$PATH:$HOME/bin/:$JAVA_HOME/bin:$CASSANDRA_HOME/bin
export PATH JAVA_HOME CASSANDRA_HOME

[root@cassdb01 ~]# java -version
java version "1.8.0_111"
Java(TM) SE Runtime Environment (build 1.8.0_111-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.111-b14, mixed mode)
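To check the Java 8u40 requirement in a script, the hypothetical helper below parses the java -version banner from standard input and succeeds only for Java 8 update 40 or later. It assumes the 1.8.0_NNN version format used by Java 8 and deliberately rejects other major versions, matching the JDK 8 used in this lab.

```shell
# Hypothetical helper: succeed only if a `java -version` banner on stdin
# reports Java 8 with update 40 or later (e.g. 1.8.0_111).
java8_ok_for_cassandra() {
  local ver major update
  # Take the first quoted version string, e.g. 1.8.0_111
  ver=$(sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  # Java 8 reports itself as 1.8.x, so the major version is field 2
  major=$(echo "$ver" | cut -d. -f2)
  # Update number after the underscore (empty if there is none)
  update=$(echo "$ver" | sed -n 's/.*_\([0-9][0-9]*\).*/\1/p')
  [ "${major:-0}" = "8" ] && [ "${update:-0}" -ge 40 ]
}

# Usage (java -version writes to stderr):
#   java -version 2>&1 | java8_ok_for_cassandra && echo "Java is suitable"
```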

7: Install and configure Cassandra

The latest Cassandra version can be downloaded from the Cassandra home page.

Download the apache-cassandra tar.gz file and extract it into a directory of your choice. I used /opt as the destination directory.

[root@cassdb01 opt]# tar -zxvf apache-cassandra-3.9-bin.tar.gz

[root@cassdb01 opt]# ln -s /opt/apache-cassandra-3.9 /opt/apache-cassandra

[root@cassdb01 opt]# chown cassandra:cassandra -R /opt/apache-cassandra

[root@cassdb01 opt]# chown cassandra:cassandra -R /opt/apache-cassandra-3.9
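The extract, symlink and ownership steps can be wrapped into one function. The sketch below (install_cassandra_tarball is a hypothetical name) reads the top-level directory name from the archive, so it is not tied to version 3.9. ln -sfn replaces an existing symlink instead of creating a link inside it, which turns a later upgrade into re-pointing the link, and chown -h changes the owner of the link itself.

```shell
# Hypothetical helper: unpack a Cassandra tarball, point a
# version-independent symlink at it, and hand both to an owner.
# $1 = tarball, $2 = destination directory, $3 = owner (user:group)
install_cassandra_tarball() {
  local tarball=$1 dest=$2 owner=$3 top
  # Top-level directory inside the archive, e.g. apache-cassandra-3.9
  top=$(tar -tzf "$tarball" | head -n 1 | cut -d/ -f1)
  tar -xzf "$tarball" -C "$dest"
  # -n treats an existing link as a file, so it is replaced, not followed
  ln -sfn "$dest/$top" "$dest/apache-cassandra"
  chown -R "$owner" "$dest/$top"
  chown -h "$owner" "$dest/apache-cassandra"   # the symlink itself
}

# Usage (as root):
#   install_cassandra_tarball apache-cassandra-3.9-bin.tar.gz /opt cassandra:cassandra
```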

Next, create the directories in which Cassandra will store its data, commit log and logs, and make the cassandra user their owner. The -p option also creates the parent directory /var/lib/cassandra if it does not exist yet.

[root@cassdb01 ~]# mkdir -p /var/lib/cassandra/data

[root@cassdb01 ~]# mkdir -p /var/log/cassandra

[root@cassdb01 ~]# mkdir -p /var/lib/cassandra/commitlog

[root@cassdb01 ~]# chown -R cassandra:cassandra /var/lib/cassandra/data

[root@cassdb01 ~]# chown -R cassandra:cassandra /var/log/cassandra/

[root@cassdb01 ~]# chown -R cassandra:cassandra /var/lib/cassandra/commitlog
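The directory setup can also be done in a loop. The hypothetical helper below creates the same three directories with mkdir -p and changes their owner. The path prefix parameter is an assumption added so the function can be tried under a scratch directory; pass an empty prefix to create the real paths as root.

```shell
# Hypothetical helper: create Cassandra's data, commit log and log
# directories and give them to the given owner.
# $1 = path prefix ("" for the real paths), $2 = owner (user:group)
make_cassandra_dirs() {
  local prefix=$1 owner=$2 dir
  for dir in /var/lib/cassandra/data /var/lib/cassandra/commitlog /var/log/cassandra; do
    mkdir -p "${prefix}${dir}"          # -p also creates missing parents
    chown -R "$owner" "${prefix}${dir}"
  done
}

# Usage (as root):
#   make_cassandra_dirs "" cassandra:cassandra
```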

Now start Cassandra with the command below. The -f option runs Cassandra in the foreground, and -R allows it to run as the root user:

[root@cassdb01 lib]# $CASSANDRA_HOME/bin/cassandra -f -R

On startup, you should see messages like the following in the console, which indicate that Cassandra started without issues:

[code]

INFO 11:31:15 Starting listening for CQL clients on localhost/127.0.0.1:9042 (unencrypted)...
INFO 11:31:15 Not starting RPC server as requested. Use JMX (StorageService->startRPCServer()) or nodetool (enablethrift) to start it
INFO 11:31:24 Scheduling approximate time-check task with a precision of 10 milliseconds
INFO 11:31:25 Created default superuser role 'cassandra'

[/code]
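When Cassandra runs in the background instead of with -f, a script can wait for the CQL listener message above before continuing. The hypothetical helper wait_for_cql below polls a log file once per second and gives up after a timeout. The log path in the usage lines is an assumption: with the tarball defaults the log is usually written to $CASSANDRA_HOME/logs/system.log, while the init script in the next step points to /var/log/cassandra/system.log.

```shell
# Hypothetical helper: wait until a Cassandra log file contains the
# "Starting listening for CQL clients" message, or time out.
# $1 = log file, $2 = timeout in seconds (default 120)
# Returns 0 once the message appears, 1 on timeout.
wait_for_cql() {
  local log=$1 timeout=${2:-120} waited=0
  until grep -q 'Starting listening for CQL clients' "$log" 2>/dev/null; do
    [ "$waited" -ge "$timeout" ] && return 1
    sleep 1
    waited=$((waited + 1))
  done
}

# Usage:
#   "$CASSANDRA_HOME/bin/cassandra" -R    # without -f, starts in the background
#   wait_for_cql "$CASSANDRA_HOME/logs/system.log" 120 && echo "Cassandra is up"
```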

If you want to run Cassandra as a service, you can use this script from GitHub.

Change the values of the following variables to match your environment:

[code]

CASS_HOME=/opt/apache-cassandra
CASS_BIN=$CASS_HOME/bin/cassandra
CASS_LOG=/var/log/cassandra/system.log
CASS_USER="root"
CASS_PID=/var/run/cassandra.pid

[/code]

Save the file in the /etc/init.d directory. I saved mine with the name cassandra.

Run the following commands to add Cassandra as a service:

[root@cassdb01 init.d]# chmod +x /etc/init.d/cassandra

[root@cassdb01 init.d]# chkconfig --add cassandra

[root@cassdb01 init.d]# chkconfig cassandra on
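chkconfig --add only registers a SysV script that carries a "# chkconfig:" comment line (runlevels, start priority, stop priority) or an LSB "### BEGIN INIT INFO" block; otherwise it reports an error. If you adapt a script from elsewhere, the hypothetical helper below checks for either header before you register it.

```shell
# Hypothetical helper: succeed if an init script has a chkconfig header
# ("# chkconfig: <runlevels> <start> <stop>") or an LSB INIT INFO block.
# $1 = path to the init script
has_chkconfig_header() {
  grep -Eq '^#[[:space:]]*chkconfig:[[:space:]]+[-0-9]+[[:space:]]+[0-9]+[[:space:]]+[0-9]+' "$1" \
    || grep -q '^### BEGIN INIT INFO' "$1"
}

# Usage:
#   has_chkconfig_header /etc/init.d/cassandra && chkconfig --add cassandra
```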

Start the Cassandra service (service cassandra start) and verify that it started properly by checking its status and the system.log file:

[root@cassdb01 init.d]# service cassandra status
Cassandra is running.

The last line of my system.log reads:

INFO  12:45:50 Node localhost/127.0.0.1 state jump to NORMAL

nodetool reports the node as Up and Normal (UN):

[root@cassdb01 init.d]# $CASSANDRA_HOME/bin/nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load        Tokens  Owns (effective)  Host ID                               Rack
UN  127.0.0.1  168.62 KiB  256     100.0%            14ba62c6-59e4-404b-a6a6-30c9503ef3a4  rack1
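Once more nodes join, the same check applies to every row. The hypothetical Bash helper below reads nodetool status output from standard input and succeeds only if there is at least one node row and every node row is UN. It assumes the row format shown above, where each node line starts with its two-letter status/state code.

```shell
# Hypothetical helper: succeed only if every node row in `nodetool status`
# output (stdin) is Up/Normal. Node rows start with a two-letter code:
# U/D (status) followed by N/L/J/M (state); header lines are skipped.
all_nodes_un() {
  awk '$1 ~ /^[UD][NLJM]$/ { total++; if ($1 == "UN") up++ }
       END { exit !(total > 0 && up == total) }'
}

# Usage:
#   "$CASSANDRA_HOME/bin/nodetool" status | all_nodes_un && echo "all nodes UN"
```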

With this, the Cassandra installation is complete.

In the next post of this series, we will make a few configuration changes and configure the other nodes so that they can join the Cassandra cluster. Stay tuned!

I hope you found this post informative. Feel free to share it on social media if you think it is worth sharing. Be sociable 🙂

Posted in: Bigdata.
Last Modified: August 11, 2017

4 thoughts on “Learning Apache Cassandra-Part-3-Installing Cassandra on RHEL6

  1. Nisar Ahmad

    Hi Alex! Nice series of posts. Apache Cassandra is totally new concept for me…
    Thanks for knowledge sharing…

  2. Pingback: Learning Apache Cassandra-Part-4-Adding Node To Cassandra Cluster | Virtual Reality


  4. Pingback: Learning Apache Cassandra-Part-5-Getting Familiar With Nodetool – Virtual Reality
