Hello, and welcome to capacity planning and scaling,
add and remove Cassandra.
During this lesson, we'll start with a quick Cassandra refresher,
and then move on to the Cassandra ring expansion process,
how to add new Cassandra nodes, and how to remove Cassandra nodes.
In this refresher, we'll talk about
the Cassandra server registration, and node configuration.
With Apigee Edge, most components register their server references,
or UUIDs, inside their corresponding pods.
Cassandra is the exception to this rule.
All Cassandra servers, along with references to their specific keyspaces,
are registered in each pod.
Gateway, Central, and Analytics pods,
all contain references to Cassandra.
In the example on the left,
you can see a partial output of a management API call to describe the Gateway pod.
The JSON response contains the pod name,
the region, and the server UUID.
But most importantly, it contains all keyspaces associated with that server and pod.
What has just been described is the logical server registration,
which is performed during the Edge installation process.
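As a rough illustration of that kind of call, a request along the following lines could be made against the management server; the host, port, credentials, and region shown here are placeholders rather than values taken from this lesson.

  # Hypothetical example: describe the servers registered in the Gateway pod
  curl -u adminEmail:password \
    "http://MS_HOST:8080/v1/servers?pod=gateway&region=dc-1"
  # The JSON response lists each server's UUID, region, and pod, and,
  # for Cassandra servers, the associated keyspaces.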
Additionally, Cassandra uses local configuration files to describe its topology.
If nodes are added or removed,
these files must be updated.
These files are cassandra-topology.yaml and cassandra.yaml.
Both files are located in /opt/apigee/apigee-cassandra/conf.
During any update to topology,
changes to these files are managed by setup.sh.
Manual changes should never be required.
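As a quick sketch, and assuming the paths just mentioned, you could verify that the files are present without editing them:

  # Read-only check; setup.sh manages any changes to these files
  ls -l /opt/apigee/apigee-cassandra/conf/cassandra.yaml \
        /opt/apigee/apigee-cassandra/conf/cassandra-topology.yaml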
Now that we've covered Cassandra server registration and configuration,
we can move onto the Cassandra ring expansion process.
One of the key reasons for choosing Cassandra as the underlying storage technology of Edge,
was its embrace of horizontal scalability.
Normally, when expanding the Cassandra ring,
the ring is either doubled or increased in size by
an increment of the replication factor in use.
With Edge, the replication factor is configured by default to three.
Expanding a regular three-node Cassandra ring
means that we're either going to double the number of nodes to six,
or we'll grow the ring in increments of three,
to six, nine, twelve, et cetera.
In the central diagram, we can see
a three node Cassandra ring being expanded to six nodes.
The original three nodes are labeled CS1 through CS3.
The new nodes are labeled CSA through CSC. New nodes are placed in between existing nodes,
with the goal of splitting token ranges and
sharing data ownership between the existing and newly added nodes.
During ring expansion, the following steps are
performed: Reconfigure the existing Cassandra nodes.
Install the Cassandra software on the new nodes using the Apigee setup utility.
Rebuild the new nodes using the existing nodes.
Reconfigure the management server with the new topology.
And finally, free up disk space on the existing Cassandra nodes.
Let's look at the process of adding Cassandra nodes in a little more detail.
The first step is to reconfigure the existing Cassandra nodes.
To do this, we update the response file that we used during
our original installation to include the new Cassandra nodes.
In the example, we can see that we're adding three additional nodes,
IP10 through IP12.
When we come to the CASS_HOSTS variable,
we can see that the new IP information
is placed between the original Cassandra hosts, IP1 through IP3.
To update the existing Cassandra nodes,
simply run the apigee setup utility specifying c as the profile,
which means just install Cassandra,
and providing it with the path to the updated configuration file.
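As a minimal sketch, assuming illustrative IP addresses and a configuration file at /tmp/updatedConfigFile, the added response file lines and the setup command might look like this; the rack and data center suffixes in CASS_HOSTS are examples, not values from this lesson.

  # Illustrative additions to the response file (IP values are examples)
  IP10=192.168.1.10
  IP11=192.168.1.11
  IP12=192.168.1.12
  # New hosts interleaved between the original hosts IP1 through IP3
  CASS_HOSTS="$IP1:1,1 $IP10:1,1 $IP2:1,1 $IP11:1,1 $IP3:1,1 $IP12:1,1"

  # Run on each existing Cassandra node: the c profile installs Cassandra only
  /opt/apigee/apigee-setup/bin/setup.sh -p c -f /tmp/updatedConfigFile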
To install the new Cassandra nodes,
we follow almost the same process as we used during the original installation.
We download the bootstrap script,
if we didn't keep a copy of the one used during our original installation.
Next, we run the bootstrap script,
either by making the file executable with chmod,
or, as in this case, by prepending bash to our command.
Once the bootstrap script has run,
we install apigee-setup using the apigee-service utility.
And finally, we can run setup, passing in c,
Cassandra only, as the profile, and again
providing the path to the new installation configuration file.
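Put together, and with the bootstrap file name, credentials, and config path treated as placeholders for your own installation, the sequence on a new node might look something like this:

  # Download the bootstrap script (version-specific file name is a placeholder)
  curl https://software.apigee.com/bootstrap_4.xx.xx.sh -o /tmp/bootstrap.sh
  # Run the bootstrap script by prepending bash
  sudo bash /tmp/bootstrap.sh apigeeuser=uName apigeepassword=pWord
  # Install apigee-setup with the apigee-service utility
  sudo /opt/apigee/apigee-service/bin/apigee-service apigee-setup install
  # Install Cassandra only, using the updated configuration file
  sudo /opt/apigee/apigee-setup/bin/setup.sh -p c -f /tmp/updatedConfigFile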
It's worth noting, that in this lesson we are only focusing on adding Cassandra nodes.
Typically during a ring expansion,
you would also be extending the Zookeeper ensemble at the same time,
since we generally see these components co-located.
Next, we rebuild the data on the new Cassandra nodes.
To do this, the region name specified in the configuration file
is used as the source region for streaming data during the rebuild.
This step should be performed on each of the new Cassandra nodes.
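As a sketch, and assuming the region in the configuration file is dc-1, the rebuild on each new node would be along these lines:

  # Run on each new Cassandra node; dc-1 is an example source region
  /opt/apigee/apigee-cassandra/bin/nodetool -h localhost rebuild dc-1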
Once the new nodes have been rebuilt,
the management server needs to be reconfigured to be aware of the new topology.
Again, this is done using the Apigee setup utility.
We call the setup.sh script,
passing in the profile ms,
and using the updated configuration file
with the new Cassandra topology.
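Assuming the same illustrative config path as before, that step might look like this on the management server:

  # Apply the new Cassandra topology to the management server (ms profile)
  /opt/apigee/apigee-setup/bin/setup.sh -p ms -f /tmp/updatedConfigFile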
In the final step,
we run nodetool with the cleanup option.
This is done to remove unwanted data,
following the addition of a new node or nodes to the cluster.
When new nodes are added,
existing nodes will lose parts of the partition range they are responsible for.
Cassandra does not automatically clean this data up,
which has an unwanted effect during rebalancing,
with Cassandra including the old data as part of the load on the node.
During the cleanup process,
there will be an increase in disk usage,
proportional to the size of the largest SSTable,
as well as the accompanying disk I/O.
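As a sketch, the cleanup is run on each of the original Cassandra nodes:

  # Discards data for token ranges this node no longer owns
  /opt/apigee/apigee-cassandra/bin/nodetool -h localhost cleanup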
The addition of the new nodes is now complete.
Next, we come to removing Cassandra nodes.
First, we update the configuration file used during our original installation.
This time, we are removing the nodes marked IP10 through 12.
The new configuration file can be seen on the right.
We then use this configuration file to update
the Cassandra topology on the management server by running the apigee setup command,
passing the profile MS for the management server,
and providing the path to the updated configuration file.
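As a minimal sketch, assuming the same placeholder config path, the trimmed CASS_HOSTS line and the management server update might look like this:

  # CASS_HOSTS with IP10 through IP12 removed (values are illustrative)
  CASS_HOSTS="$IP1:1,1 $IP2:1,1 $IP3:1,1"
  # Run on the management server: ms profile with the updated config file
  /opt/apigee/apigee-setup/bin/setup.sh -p ms -f /tmp/updatedConfigFile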
Now that the management server has been updated,
we perform a rolling restart of
all Edge components, with the exception of the Cassandra and Zookeeper nodes.
The restart forces components to reread the latest pod wiring information,
and to remove references to the removed Cassandra nodes.
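For illustration, a single component restart via the apigee-service utility looks like the following; the component name shown is just one example, and the order in which you restart components should follow your installation's documented sequence.

  # Repeat for each Edge component except Cassandra and Zookeeper,
  # one node at a time (edge-management-server is one example component)
  /opt/apigee/apigee-service/bin/apigee-service edge-management-server restart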
Next, we decommission the Cassandra nodes being removed.
This process is performed one node at a time.
The decommission argument is given to nodetool,
which deactivates the node by streaming its data to another node.
The data is streamed to the next node in the ring.
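As a sketch, the command is run on the node that is being removed:

  # Deactivates this node and streams its data to the remaining nodes
  /opt/apigee/apigee-cassandra/bin/nodetool -h localhost decommission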
And finally, we update the local configuration file on the remaining Cassandra nodes.
To do this, we use the Apigee setup utility, passing in the profile c for
Cassandra only, and providing the path to the updated configuration file.
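Assuming the same placeholder config path, the command on each remaining Cassandra node would be along these lines:

  # Apply the updated topology on each remaining Cassandra node
  /opt/apigee/apigee-setup/bin/setup.sh -p c -f /tmp/updatedConfigFile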
Optionally, we can now completely decommission
the node by uninstalling the Cassandra software,
using the instructions found by following the link shown.
This concludes capacity planning and scaling add and remove Cassandra.
For more information you can visit docs.apigee.com,
and to get involved in the community,
please go to community.apigee.com. Thank you.