0:00

So the key idea is that instead of scaling up the number of ports per

router, we're going to scale out the topology through what's called a multi

stage switch network. This is a network with many switches.

For example, a simple two by two switch, two input, two output.

Can be either configured like this, or it can be configured.

Like that. So there are only two possible

configurations for this small two by two switch.

And if we connect many of these kind of switches which could be two by two or

larger, two by three, three by two, four by four then we form a network of

connectivity determined by the configuration inside each switch.

If this runs through multiple stages, we call that a multi-stage switch network.

It turns out that you can view large network using only small switches but

many of them. And arranged in a smart way across

multiple stages. And that's what we mean by skating out a

topology, to enable the skating of a network.

Even though, we do not scale up the number of ports for each network

[INAUDIBLE]. So here's a simple example to illustrate

that. We got on the left, the exact kind of

tree that we just drew. You've got eight servers as leaf nodes in

circles, and you've got switches in rectangles starting from small switches,

two by two, going to four by four, going to eight by eight.

Okay? So suppose, let's say that we don't want

switches larger than two by two. So this layer of switches do okay in the

tree, but this and this layer is not. So, one possibility is to replace each of

these four by four switches with two, two by two switches.

So, one four by four somehow is equivalent now to two, two by two.

For example, this is a two by two this is another two by two.

I put the two together, and now the lower layer switch will actual go talk to each

of these switches. And that collectively function as this

four by four switch. By symmetries identical on the right-hand

side. And then, the uppermost layer for this

one example is an eight by eight switch, and we show that it can be done

equivalently as four two by two switches. Okay?

Each of these is a two by two switch, and then each of them is connected one link

to the left half and the other to the right half.

Collectively, these four act as an eight by eight switch.

And by the way, each one is, indeed, two by two.

Even though you only see, two links here. Because two links need to come in.

And then they also need to go out, right? So this is, just an artifact, a visual

artifact created by folding these switches.

We'll look example of this again. But these, each of these is a two by two

switch. So now, if you look at this,

instead of having, what, all together, seven switches but some are big switches.

And we now have a collection of twelve switches.

So almost doubled in this small example, the number of switches.

But each switch if only two by two now. Instead of scaling up the port count per

switch we scale out the topology of connectivity pattern.

We will look at a special case called the Cloche network of such connectivity

pattern. Invented by Cloche in 1953, but in

general there are so many choices in their study in the field of

interconnection network which has been studied in different application context.

Starting with 1950's and 60's in actually circuit switch networks, even in

telephone network PSTN, people realized by then that hey you cannot build a large

enough mechanical or electronic switch. So you need to scale out apology with

multistage network. Then we'll study in parallel computation

in the 60's, 70's to 80's. Okay with many mainframe computers

connected together into a supercomputer then the scale got shrunken to a much

smaller one. we're looking at systems on a chip and

multi-core processor in the 90's and the last decade and then in recent years

revived again in the context of data center networking.

Connecting servers within a large data center.

All these are different cases of interconnection network, and the common

theme is that connectivity pattern is a resource that you have to carefully

design. And there are many different metrics to

compare different kinds of inter connection

networks. We'll mention two.

One is called by section bandwidth, and the other is called blocking, or non

blocking property. Now, the idea of a bisection bandwidth

has to go back to a simple graph oretic operation called a cut.

So suppose you are given a graph G, which is really a set of nodes V, a set of

lengths E. A cut is to say that I'm going to divide

this node set into two parts V1 and V2 which equals to the original set of V the

set of nodes in V1. Okay?

5:40

And a bisection is basically a, a binary cut where the two subsets are of equal

number of nodes. For example, here would be a network with

a bisection, this way. Okay, so I divide the total set of nodes

into V1 and V2. Now what about the links?

Some links are entirely inside your V1 or V2 connecting only nodes in all in V1,

all in V2, then those links are just fine.

What about the links that actually connects across V1 and V2 like these

three links? Well, we can count them, if it's an

unweighted graph. Or if it is weighted, we can add up their

weights like two, three, five. All right, then we add them up and this

is called the capacity of that cut, okay? The capacity of a cut, in this case.

Ten, in general, it is some real number representing the sum of the links

connecting acoss to, to size. In the case of a bi-section, two equal

parts, then we can say it's the capacity of that

bisection. Now there many ways to cut the given

graph into two parts or two equal parts. For example, this will be another cut.

Okay? Now these two nodes belongs to one side

and these belong to another set, side. So you can look at all the possible ways

to cut or bisect and then find out the minimum one.

Okay? The one with the smallest capacity,

that's called the minimum cut. And the resulting capacity, in the case

of a bisection to equal parts, is called bisection bandwidth.

It's the worst case connectivity, between two equal parts of a network.

7:30

For example this cut, let's say these two links cut ways one

and four, then this weight of cut gives you a capacity of eight,

this weight of cut gives a capacity of ten.

So, we just saw eight is smaller than ten.

So, the minimum cut is this minimum bisection in this case.

And a bisection bandwith is eight units. Now, there's also a famous, theorem

connecting the mean cut with the amount of flow you can squeeze from one node to

another. Let's say you want to squeeze a lot of

flows from the source to this destination through this graph.

Okay. For example I want to squeeze say 30,

well 30 is impossible because there are only 16 plus, 13 29 units that's

available to come out of the source. So let's say alright I want to squeeze 29

then, alright. And split 16 out of 16 go out this way,

13 out of 13 go out this way. So both links capacity at 40 utilized

Alright? Then what? The 13 arrives here, it can go out.

Okay, either part of this along this link with capacity 4 or along this length of

capacity 14 can all go along this length, for example.

Now the problem is 16 units arrive here and there's only one way to go out, that

way is the link with capacity only 12 which is smaller than 16 and so you

cannot squeeze all 16 units out of this node.

So 29 is not a flow that you can squeeze through from the source to the

destination. So now the questions is, what is the

maximum amount of flow that you can squeeze from source to destination for

this given topology and weights or capacity on the links.

Well, that is the famous maximum flow problem and there are several very famous

efficient algorithm to solve that. Now this is not a graph theory course of

whom will go through all of them. But I want to highlight one very elegant

theorem, assess the following. Well if you want to find out the maximum

flow through the network between two points,

it's the same as finding the minimum cut where the source lies on one side of the

cut, and destination on the other. So, source belongs to B1, the subside of

nodes, and destination belongs to B2, another subside of nodes.

It turns out that this shaded, configuration is the cut with the

smallest capacity. That is the min cut, okay?

You can convince this yourself, by cutting in different other ways.

In this min cut, we see that going from the V1 side to the V2 side, there are

three links of capacity 12, 7 and 1. So you add up together that is 23 units.

So the mean cut capacity for this graph, were source and destination lies on two

different sides of the cut is 23 units. So is it indeed possible to actually push

23 units from source to destination? Yes and here's one way to do it.

You put 11 on this link, 12 on this link. How do we get to this split into 11 and

12 this way. well that can be discovered by those

graph oretic algorithms that we're skipping.

But now, once you get to 11 here that's fine.

11 can all go on this way. 12 comes here.

11 will go here and one will go this way. So you all together got one plus 11, 12

units that fully utilizing those link. Okay.

This link's not used and the 11 arrives here.

You can split into 4 and 7. Now, you see why we took one unit off

this way. Because if all 12 unit gets to here, then

you won't be able to put them all to these two links.

So, it's not a surprise. In fact, a small example, you can eyeball

that indeed, you should put one out there.

And then the one together with 11 will saturate this link and the 11 will be

able to saturate these two links. And then once the seminar arrives here

with 12 as of to 19, but it can be supported out of this link unit of 20

capacity. So now finally, this destination gets two

things coming in one with 19, one with 4's all 23 units are received as they

were transmitted out of the source. So indeed as you can verify yourself,

there is no way to increase the max, the flow from source to destination.

So the maximum flow is 23 units, which is the same as the minimum cut size.

So that's a long detour of a fun and important classic result.

But in data center design when you scale out the topology, you want a large

bisection bandwidth okay. A large potential bottleneck to the

performance of communication within the data center.

The second criteria we use is called non blocking.

It originated from the circuit switching word for the telephone network.

There's something called non-blocking and something called rearrangably

non-blocking. Non-blocking says that, for whatever

traffic pattern arrives in the multi-stage network, which I'm going to

draw momentarily, as long a this input port and that output ports themselves

available, you should be able to find a way to pass

it through this multi-stage switched network.

So the connectivity pattern should be able to support that.

And rearrangeably non-blocking is a weaker condition, that says,

well, you may not be able to do that. But if I allow you to rearrange some of

the existing connectivity pattern. For example, instead of going this way,

say well, you should go this way. And this opens up a way for this input

code to this output. As long as you can rearrange existing

configuration and then be able to achieve non-blocking then that's also good okay.

So this is a weaker condition. Now what kind of switch network will have

blocking or rearrange for the non-blocking property?

Well only those that have rich enough connectivity patterns in them.

And indeed, in 1953, Clark invented a particular connectivity pattern and we're

going to see the kind of non-blocking conditions on the parameters for such

networks.