How to Install Apache Cassandra on Ubuntu 20.04

Apache Cassandra is a distributed NoSQL database designed to store and manage large volumes of data, with that data replicated across multiple nodes. Thousands of companies use it to store and retrieve thousands of terabytes of data. If you need a database management system that scales out and stays highly available, Apache Cassandra is a strong choice.

In this tutorial, we will go through the installation of Apache Cassandra on Ubuntu 20.04.

Prerequisites

  • An Ubuntu 20.04 VPS (we’ll be using our SSD 2 VPS plan)
  • Access to the root user account (or access to an admin account with root privileges)

Step 1: Log in to the Server & Update the Server OS Packages

First, log in to your Ubuntu 20.04 server via SSH as the root user:

ssh root@IP_Address -p Port_number

You will need to replace ‘IP_Address’ and ‘Port_number’ with your server’s respective IP address and SSH port number. Additionally, replace ‘root’ with the username of the admin account if necessary.

Before starting, you have to make sure that all Ubuntu OS packages installed on the server are up to date. You can do this by running the following commands:

apt-get update -y
apt-get upgrade -y

Step 2: Install Java

Cassandra 3.11, the release series installed in this tutorial, requires Java 8 on your system. You can install it using the following command:

apt-get install openjdk-8-jdk -y

Once the installation is completed, verify the installed version of Java with the following command:

java -version

You should get the following output:

openjdk version "1.8.0_275"
OpenJDK Runtime Environment (build 1.8.0_275-8u275-b01-0ubuntu1~20.04-b01)
OpenJDK 64-Bit Server VM (build 25.275-b01, mixed mode)
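
If you want to script this check instead of reading the output by eye, a minimal sketch follows. The function name is_java8 is our own invention, not part of Java or Cassandra, and it relies on the fact that Java 8 reports its version as "1.8.x" on the first line of the banner.

```shell
# Hypothetical helper: succeeds when the version banner on stdin reports Java 8.
# Java 8 identifies itself as version "1.8.x"; later releases print "11.x", "17.x", and so on.
is_java8() {
    # Only the first line of the `java -version` banner carries the version string.
    head -n 1 | grep -q 'version "1\.8\.'
}
```

Because java -version writes its banner to standard error, you would call the helper on the server as: java -version 2>&1 | is_java8 || echo "Java 8 is not the default JVM"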

Step 3: Install Apache Cassandra

Apache Cassandra is not available in the default Ubuntu repositories, so you will need to add the Cassandra repository to your system.

First, install the required dependencies with the following command:

apt-get install apt-transport-https gnupg2 -y

Next, download and add the GPG key with the following command:

wget -q -O - https://www.apache.org/dist/cassandra/KEYS | apt-key add -

Next, add the Cassandra repository to APT with the following command:

sh -c 'echo "deb http://www.apache.org/dist/cassandra/debian 311x main" > /etc/apt/sources.list.d/cassandra.list'
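
The 311x component in that line selects the Cassandra 3.11.x release series. If you expect to switch series later, the entry can be generated from a parameter, as in this rough sketch. The function name write_cassandra_repo is hypothetical; the repository URL is the one used in the command above.

```shell
# Hypothetical helper: write the APT source entry for a given release series.
# $1 = series name (311x in this tutorial), $2 = destination file.
write_cassandra_repo() {
    series=$1
    dest=$2
    printf 'deb http://www.apache.org/dist/cassandra/debian %s main\n' "$series" > "$dest"
}
```

On the server you would run it as root, for example: write_cassandra_repo 311x /etc/apt/sources.list.d/cassandra.list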

Next, update the repository cache and install Apache Cassandra with the following commands:

apt-get update -y
apt-get install cassandra -y

Once Cassandra has been installed, verify the status of the service with the following command:

systemctl status cassandra

You should get the following output:

● cassandra.service - LSB: distributed storage system for structured data
     Loaded: loaded (/etc/init.d/cassandra; generated)
     Active: active (running) since Mon 2020-12-21 05:15:39 UTC; 4s ago
       Docs: man:systemd-sysv-generator(8)
      Tasks: 29 (limit: 2353)
     Memory: 1.1G
     CGroup: /system.slice/cassandra.service
             └─12029 java -Xloggc:/var/log/cassandra/gc.log -ea -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -XX:+HeapDumpOnOutOfMemor>

Dec 21 05:15:39 ubuntu2004 systemd[1]: Starting LSB: distributed storage system for structured data...
Dec 21 05:15:39 ubuntu2004 systemd[1]: Started LSB: distributed storage system for structured data.
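
Note that the service can show as active a few seconds after start while the JVM is still initializing, so commands that talk to the node may fail at first. One way to handle that in a script is a small retry loop. This is a generic sketch; wait_for is a name we made up, not a Cassandra or systemd tool.

```shell
# Hypothetical helper: retry a command until it succeeds or the attempts run out.
# $1 = maximum attempts, $2 = delay in seconds, remaining arguments = command to run.
wait_for() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        "$@" && return 0
        # Pause between attempts, but not after the final one.
        [ "$i" -lt "$attempts" ] && sleep "$delay"
        i=$((i + 1))
    done
    return 1
}
```

Assuming nodetool exits with a non-zero status while the node is unreachable, you could wait for Cassandra like this: wait_for 30 5 nodetool status > /dev/null 2>&1 && echo "Cassandra is up"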

You can also verify that the Cassandra node is up using the nodetool command-line utility:

nodetool status

You should get the following output:

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens       Owns (effective)  Host ID                               Rack
UN  127.0.0.1  70.71 KiB  256          100.0%            ba73f88d-9d00-49b7-aa50-baedb4ee0558  rack1
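
The UN at the start of the node row means Up and Normal, as the legend lines above it explain. For monitoring scripts, that check can be automated along these lines. The function name all_nodes_up_normal is hypothetical, and the sketch assumes the row layout shown in the output above.

```shell
# Hypothetical helper: read `nodetool status` output on stdin and succeed only if
# at least one node is listed and every listed node is in state UN (Up/Normal).
all_nodes_up_normal() {
    awk '
        # Node rows start with a two-letter code: U or D, then N, L, J or M.
        /^[UD][NLJM] / { nodes++; if ($1 != "UN") bad++ }
        END { exit (nodes > 0 && bad == 0) ? 0 : 1 }
    '
}
```

Typical use on the server would be: nodetool status | all_nodes_up_normal || echo "at least one node is not Up/Normal"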

Step 4: Configure Apache Cassandra

By default, Apache Cassandra is configured to listen on localhost. You don’t need to change the configuration if your client and database are on the same host.
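
To see what the node is currently bound to, you can pull the relevant keys out of cassandra.yaml. In that file, listen_address is used for node-to-node traffic, while rpc_address and native_transport_port are what clients connect to. The helper name below is our own; treat it as a sketch.

```shell
# Hypothetical helper: print the network-related settings from a cassandra.yaml.
# $1 = path to the file (on this install: /etc/cassandra/cassandra.yaml).
show_listen_settings() {
    # Anchoring at line start skips commented-out or indented occurrences.
    grep -E '^(listen_address|rpc_address|native_transport_port):' "$1"
}
```

For example: show_listen_settings /etc/cassandra/cassandra.yaml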

Cassandra also provides a cqlsh command-line tool to interact with Cassandra. You can launch it with the following command:

cqlsh

You should get the following output:

Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.11.9 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
cqlsh>

Step 5: Change Cassandra Cluster Name

By default, the Cassandra cluster name is set to “Test Cluster”. You can change it using the cqlsh utility:

cqlsh

Once connected, run the following command to change the cluster name:

cqlsh> UPDATE system.local SET cluster_name = 'MY Cluster' WHERE KEY = 'local';

Next, exit from the cqlsh shell with the following command:

cqlsh> exit

Next, you will also need to set the new cluster name in the cassandra.yaml file:

nano /etc/cassandra/cassandra.yaml

Find the cluster_name line and change it to match the name you set in cqlsh:

cluster_name: 'MY Cluster'
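
If you would rather script this edit than open nano, something like the following sketch can rewrite the line in place. The function name set_cluster_name is hypothetical, and the sketch assumes cluster_name appears once at the start of a line, as it does in the stock configuration file.

```shell
# Hypothetical helper: rewrite the cluster_name line of a cassandra.yaml.
# $1 = path to the file, $2 = new name. The name must not contain ' | & or \
# characters, which would clash with the YAML quoting or the sed expression.
set_cluster_name() {
    file=$1
    name=$2
    tmp=$(mktemp) || return 1
    # Replace the whole line; every other line passes through unchanged.
    sed "s|^cluster_name:.*|cluster_name: '$name'|" "$file" > "$tmp" &&
        cat "$tmp" > "$file"   # writing back with cat keeps the file's owner and mode
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

For example: set_cluster_name /etc/cassandra/cassandra.yaml "MY Cluster"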

Save and close the file, then flush the system keyspace to disk so the updated cluster name is persisted before the restart:

nodetool flush system

Next, restart the Cassandra service to apply the changes:

systemctl restart cassandra

Next, verify your new cluster name with the following command:

cqlsh

You should see your new cluster name in the following output:

Connected to MY Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.11.9 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
cqlsh>
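
The cluster name can also be checked without an interactive session. The sketch below assumes that nodetool describecluster prints a line of the form "Name: <cluster name>", and the function name is our own.

```shell
# Hypothetical helper: extract the cluster name from `nodetool describecluster`
# output on stdin, assuming it contains a line such as "    Name: MY Cluster".
cluster_name_from_describe() {
    sed -n 's/^[[:space:]]*Name: //p'
}
```

For example: [ "$(nodetool describecluster | cluster_name_from_describe)" = "MY Cluster" ] && echo "cluster name updated"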

Congratulations! You have successfully installed Apache Cassandra on your Ubuntu 20.04 VPS.

Of course, you don’t have to do any of this if you use one of our Linux VPS Hosting services, in which case you can simply ask our expert Linux admins to set this up for you. They are available 24×7 and will take care of your request immediately.
