0:00

Hi, welcome back. We will continue.

In this section, we will continue talking about distributed query processing.

So, to give you an idea of how the general process works,

let's look at this diagram.

First, again, we have a query received in,

for example, SQL query language with declarative query language.

What you do is that the first step is called decomposition.

We're going to see what decomposition means in a few but what it does,

basically, is similar to what we do in a centralized database.

You take a query, you check for the sanity of the query,

the syntax of the query.

All these kind of issues that can come with

a query that a human can write or like a user can write.

It checks for that. That's a query decomposition.

And then out of that, you create like an algebraic query under distributed relations.

After you do that decomposition and normalization,

you do something called the data localization.

So, you localize these kind of algebraic queries into fragments.

Instead of the global schema,

you kind of localize it into the fragments and where they're stored.

And then you generate fragment queries,

and then you do a global optimization.

All of this is happening.

Nothing is optimal, nothing executed yet,

this is just the optimization and all this is done at the control site.

After you do the global optimization,

you have optimized fragments with operations communication,

and then you do local optimizations on site.

You send out these kind of plans to the local site to do local optimization sites,

and you have an optimized local query.

All of this is done also in the local sites.

So as you can see, you go through a lot of optimization steps till

you reach the final optimized plan,

and then final executed plan across local sites.

So, decomposition again is checking the sanity of the queries, as we mentioned,

and we generate from the declarative language,

generate like an algebraic form of the query,

then localization, and then optimization.

Optimization will take a look of how can you do global optimization.

But first, let's see how decomposition works.

So, in the composition again, you do normalization,

you eliminate redundancies, you do algebraic rewriting.

And this is the same exact steps that we apply in a centralized database.

2:40

So, the idea of normalization is that you

convert from a general language to a standard form,

which is relational algebra.

And we do always transform it into

relational algebra because relation algebra is the operational language.

It actually tells how the sequence of operations are running on top of database.

Again, this is done in the decomposition,

this is done on a global schema,

not necessarily on the fragment schema.

So, when you have a query like this,

you generate, you do the sanity checks,

you do everything on the query,

and then you generate this as an algebraic form of the query,

the one down there of the same query.

So you join R with S,

and then you do selection on top in conjunctive normal form like this.

So, what you do is that, you check whether this is correct or no,

you check whether there are any redundancies and you remove it.

So if you look at here, for example,

if R does not have an attribute A,

this is an issue.

In conditions also like this,

like A equal one,

and A larger than five,

this will always return false,

so this is not a correct,

so you eliminate the redundancy.

Here also, there is redundancy,

so you just eliminate this redundancy.

So, this is all in decomposition step.

Also, like common sub-expressions,

we apply selection on R here, the same selection on R,

why don't we just apply the condition on

R just once and push the conditions?

So, all these kind of algebraic rewriting is done in the decomposition staff.

After that, you go to localization which is

a very important step and now, it's different.

It does not exist in a centralized database,

it's only in distributed database.

And here is that you want to replace the global schema, the relations,

with the fragments because the data,

like you have different fragments of the relations and distributed across sites.

So what do you do, the very simple strategy to do is that you start with a query,

the algebraic form that you have,

you replace that relations by fragments.

And to do that, you replace them by using the union operation,

because if you need hold that fragments together,

it will get you back the basic relation.

Then you push the union up,

and push selection and prediction down, that's a third step.

After that, you simplify, eliminate unnecessary operations.

Example, so fragments, the fragments referred

to it with R and then the condition that represent this fragment.

If we have a query that,

this is the selection over R here,

what if we have two fragments for R,

for this query, R1 and R2,

R1 has the condition E less than 10,

and R2 has the condition E larger than or equal 10?

In that case, what you do is that you put the union between R1 and R2.

This is how we replace R,

and you create the union.

On top of the union, you do selection.

So, that's the first step, replace R with its fragments.

After that, you push the union up,

and then you push the selection down.

And in that case, the union up is pushed here,

and the selection, the selection should be here.

So, what we did is that we push the union up,

push the selection down on the fragments.

So if you look, is there anything wrong with this expression?

Yes. I mean, here, you have selection E equal three,

while the fragment actually has only E larger than or equal 10.

So, this means that all the side

needs to be eliminated which will eliminate the union as well.

And the final results of the fragmentation is going to be selection on R1,

E less than equal 10.

That will be the final result of the localization.

So, these very simple steps,

you can just follow them,

the localization algorithm follows them with any kind of query written in algebraic form.