In this video, we will talk about a topic that I consider one of the most important in the entire lesson: Cassandra's architecture. Cassandra is a great database: it is scalable, distributed, fault-tolerant, and it's trusted by many big companies that need to store a lot of data and process lots of requests. The foundation of Cassandra is a technique called consistent hashing. It's used by many other distributed systems, like DynamoDB. If you want to learn how to use Cassandra, DynamoDB, or other similar distributed systems efficiently, you need to understand how consistent hashing works.

But before we dive into that, let's quickly recap another concept: the hash table. A hash table allows us to store a value with a key and retrieve the value associated with that key very efficiently. A hash table is implemented on top of an array. When we want to write a data item into our hash table, the first thing to decide is where to store it. To do this, we apply a hash function to the key and then take a modulo operation to get the position in the array. The number four here is the size of the array. That's how we get the position of an item in the array. All items that hash to the same position in the array are stored in a linked list at that position. Retrieving an item from a hash table is just as simple: we again apply the hash function to the key, get the position in the array, and read our value.

Now, what if we want to have a distributed hash table? A naive approach would be to replace the in-memory array with a set of hosts, but this creates a problem. What if we want to add another host, or scale up our system in some other way? In this case, it wouldn't work, because with a different number of hosts we would get a different position in our array of hosts for almost every key, and our distributed hash table would break. It's not an easy problem to solve, but fortunately there is a better solution, and this better solution is called consistent hashing.

Here is how it works. If we have multiple nodes in our system, they are organized in a virtual circle, and this circle has a hash range. It's usually from zero to a really big number, but in our case it can be from zero to 1,024 just for simplicity. If we want to write an item into our system, we again apply a hash function to the key. This gives us the position of the value on the hash range, and then we simply go clockwise to find which host should store this data item. Just as with a hash map, we go through the same steps to read the value associated with a given key.

Where this becomes really interesting is when we want to add a host to our system. If we add a node four, it is now responsible for a part of the range that originally belonged to node one. All we need to do to properly maintain data location in our system is to copy the data that node four is now responsible for from node one. It is as simple as that. Compared to the naive distributed hash table, it's really easy to add a new host to the system.

This approach has one drawback: for every point on the circle, there is only one node that is responsible for that position. If we lose any node, we will lose part of our data. The solution to this problem is quite straightforward. All we need to do is replicate data to other hosts. For example, the data that node one is responsible for can be replicated to node three, the data that node three is responsible for can be replicated to node two, and so on.
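To make the ring concrete, here is a minimal Python sketch of consistent hashing. The node names, the `ring_hash` helper, and the 0 to 1,024 range are illustrative choices for this recap, not Cassandra's actual implementation.

```python
import hashlib
from bisect import bisect_right

# Naive distributed hash table: position = hash(key) % number_of_hosts.
# Adding a host changes the modulus, so almost every key maps to a new host.
# Consistent hashing avoids that by fixing the range and placing nodes on it.

RING_SIZE = 1024  # small hash range for illustration; real systems use a much larger one

def ring_hash(key: str) -> int:
    """Map a key (or node name) to a position on the ring, 0..RING_SIZE-1."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest, "big") % RING_SIZE

class ConsistentHashRing:
    def __init__(self) -> None:
        self.nodes: dict[int, str] = {}   # ring position -> node name
        self.positions: list[int] = []    # sorted ring positions

    def add_node(self, name: str) -> None:
        # Placing a node only affects keys between it and its predecessor,
        # which is why adding node4 only requires copying data from node1.
        self.nodes[ring_hash(name)] = name
        self.positions = sorted(self.nodes)

    def find_node(self, key: str) -> str:
        # Hash the key onto the ring, then walk clockwise to the first node.
        pos = ring_hash(key)
        idx = bisect_right(self.positions, pos) % len(self.positions)
        return self.nodes[self.positions[idx]]

ring = ConsistentHashRing()
for name in ("node1", "node2", "node3"):
    ring.add_node(name)

print(ring.find_node("user:42"))   # the node that stores this key
ring.add_node("node4")             # only keys in node4's new range move
print(ring.find_node("user:42"))
```

The key property: when a node is added or removed, only the keys in one segment of the ring change owner, instead of nearly all keys as with the modulo approach.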
Now, with this approach, when we want to write data into our system, we first find the position on the circle, write the data to the corresponding host, and then replicate it to another host to keep a second copy (a small sketch of this replicated write path follows after the recap). If we lose one of our machines, it's not a big deal, because we still have a copy, and this is exactly what allows Cassandra to be fault-tolerant.

So, let's quickly sum up what we've learned in this relatively short video. First of all, we've learned that Cassandra is based on consistent hashing, and this is the basis of Cassandra's scalability and fault tolerance. Consistent hashing is very similar to a hash map, with the difference that it is suitable for distributed systems: it can easily be scaled up or down. As you may notice, it is a completely masterless architecture. Every single host in our system has exactly the same role, so we don't need separate masters that can fail. It allows us to easily add and remove hosts from our cluster, and it also allows us to achieve practically infinite scalability. We can easily build huge clusters that store petabytes of data with this architecture.
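Continuing the ring sketch from above, the replicated write path can be illustrated by picking the owner of a key plus the next nodes clockwise as replicas. The `find_replicas` helper and the replication factor of 2 are hypothetical illustrations, not Cassandra's real API; in Cassandra the replication factor is configured per keyspace.

```python
def find_replicas(ring: ConsistentHashRing, key: str,
                  replication_factor: int = 2) -> list[str]:
    """Return the owning node plus the next nodes clockwise as replicas."""
    pos = ring_hash(key)
    start = bisect_right(ring.positions, pos) % len(ring.positions)
    count = min(replication_factor, len(ring.positions))
    return [ring.nodes[ring.positions[(start + i) % len(ring.positions)]]
            for i in range(count)]

# A write goes to every node in this list; losing one of them still
# leaves a copy elsewhere on the ring, which is what gives fault tolerance.
print(find_replicas(ring, "user:42", replication_factor=2))
```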