Previously, we pointed out that algorithms are one of the four pillars of computational thinking. In this part of the course, we've looked at some basic algorithms, as well as some slightly more complicated approaches to algorithmic thinking. We also talked about analyzing the complexity of those algorithms based on the size of the input. So, let's finish up this part of the course by describing and analyzing the algorithms for our two case studies. Let's start with Erin from Penn's LGBT Center. Remember that Erin was trying to figure out how to schedule meetings with the 27 groups that the center supports. Each group requests some period of time when they are available to meet. Then, Erin tries to meet with as many groups as possible. Using computational thinking, Erin was able to use decomposition to break this problem into smaller subproblems. One problem was determining which group to meet with; the other was removing any conflicting meeting requests. She used pattern recognition to realize that there's a repeated process of deciding which group to schedule by looking at all the requests and choosing the best one. For now, she'll choose the request with the earliest starting time. But, we're going to come back to that in just a second. She used data representation and abstraction to represent each meeting request as two numbers, the start and the end time. Finally, she could use computational thinking to develop an algorithm for solving this problem, which involves repeatedly choosing a request to schedule and removing any conflicting requests until she's done. It turns out that this is an example of a greedy algorithm, which we just saw in the previous lesson. Erin chooses a meeting to schedule based on the request that has the earliest starting time. Then, she removes any requests that overlap with that one. She continues scheduling by selecting the remaining request that has the earliest starting time, and so on.
But unfortunately, that won't give us the optimal schedule. Let's just look at seven groups in this example. Let's say that these are their start and end times. If we did this using brute force, we'd consider whether we do or do not schedule each request. That would lead to two to the seventh, or 128, combinations. Then, we'd see which of the 128 have no overlapping requests, and from those choose the one where we meet the most groups. We can see here that group A wants to meet at 8:00 AM, and that's earliest. So, we'll pick that. That means we have to eliminate the ones that overlap with it, which are B, E, and F. Then, at 1:00 PM, we'd pick C and eliminate D and G, and then we'd be done. But we only managed to schedule two meetings. It turns out that we were making the wrong local, or greedy, decision. It would be better to schedule the request that has the earliest finishing time; that way, we leave ourselves as many options as possible for scheduling after that meeting ends. So, now let's apply a greedy algorithm and choose the one that ends earliest. First, we'll choose F because that's the one that ends earliest. We can figure that out by looking for the request with the minimum ending time. We talked about looking for the minimum value earlier in this part of the course. This is a greedy selection because we're doing the thing that gives us the most remaining options. You might say, "But now, we can't meet E or A." Which is true, but if we had selected one of those, we wouldn't be able to choose B, for instance. So, choosing F is best for us for now. So, we won't be able to meet with E or A because they overlap with F. But now, we'll schedule B because it ends earliest of the ones that are remaining. We won't be able to meet with C, but now we'll schedule D. Then finally, we'll choose G. Now, we're able to schedule four meetings, and it turns out that this is optimal.
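The difference between the two greedy choices can be sketched in a short program. The specific start and end times below are hypothetical (the transcript doesn't list them), chosen to be consistent with the narration; only the selection rule changes between the two runs.

```python
# Hypothetical start/end times (24-hour decimals), made up to match the
# story: A starts earliest, F ends earliest, and the two greedy rules
# give schedules of different sizes.
requests = {
    "A": (8.0, 12.0),
    "B": (10.0, 14.0),
    "C": (13.0, 16.5),
    "D": (14.0, 16.0),
    "E": (9.0, 11.0),
    "F": (8.5, 9.5),
    "G": (16.0, 17.0),
}

def greedy_schedule(requests, priority):
    """Repeatedly schedule the remaining request with the smallest
    priority value, then drop any requests that overlap it."""
    remaining = dict(requests)
    scheduled = []
    while remaining:
        # Greedy selection: best remaining request by the chosen rule.
        name = min(remaining, key=lambda g: priority(remaining[g]))
        start, end = remaining.pop(name)
        scheduled.append(name)
        # Remove any request that overlaps the one just scheduled.
        remaining = {g: (s, e) for g, (s, e) in remaining.items()
                     if e <= start or s >= end}
    return scheduled

print(greedy_schedule(requests, priority=lambda r: r[0]))  # earliest start: ['A', 'C']
print(greedy_schedule(requests, priority=lambda r: r[1]))  # earliest finish: ['F', 'B', 'D', 'G']
```

Choosing by earliest start only manages two meetings, while choosing by earliest finish schedules four, just as described above.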
Fortunately, we didn't have to consider all 128 combinations in order to come up with this solution. Remember, given the collection of requests, Erin wants to schedule as many meetings as possible at times that do not overlap. The algorithm that she uses finds the unscheduled request with the earliest ending time, schedules it, removes other requests that overlap with it, and then repeats until there are no remaining requests. So, now let's look at the flowchart. It's certainly longer than ones we've looked at before, but you'll recognize lots of pieces. We have this outer loop, which keeps going as long as the collection of meeting requests is not empty. Inside that outer loop are two inner loops. The first one is looking for the meeting request with the earliest ending time. It's just like we saw earlier in this part of the course with finding the minimum value in a collection. We look at each request in the collection, and if its ending time is earlier than the earliest we've seen so far, we update that as the earliest request. When we're done looking through the collection, we add that earliest request to the collection of those that have already been scheduled. Now, we have this second inner loop that is removing any overlapping requests. Just like the previous step, we look at each request in the collection. If its starting time is before the ending time of the one we just scheduled, then we know it overlaps, so we remove it. When we're done removing overlapping requests, we go back and see if the collection of requests is empty. If so, then we're done. So, what's the complexity of this greedy algorithm? Last time, we saw a greedy algorithm for scheduling when I should recharge my laptop battery, and that was linear. But, not all greedy algorithms are linear. In this case, there are some linear components. For instance, finding the request with the earliest finishing time is linear because we just look through the collection once.
Finding requests that overlap with the one we just scheduled is also linear. But, how many times do we do those things? We don't just do those once each. Rather, we do those linear things a linear number of times, because we have this outer loop. This keeps repeating as long as there are more requests to schedule. So, if we doubled the number of things, that is, the number of requests, we don't just do twice as many steps. Each pass through the inner loops would take twice as many steps, but we would also repeat the whole thing twice as many times. So overall, if we doubled the number of items, we do four times as many steps, because we do twice as many steps, twice as many times. This is known as quadratic or n squared complexity. Here, we see the curves for linear and quadratic complexity. Remember, for linear, the number of operations is linear with respect to the input size. So, here we see this curve for y equals x. But if it's quadratic, the number of operations is based on the square of the input size. So, this curve here is y equals x squared. For a linear algorithm, if we have twice as many inputs, we have twice as many operations. But if an algorithm has quadratic complexity and we have twice as many inputs, then we have four times as many operations. Let's finish with Cindy from the Working Dog Center. Remember that Cindy was trying to predict whether a dog would be suitable for disaster search and rescue based on whether dogs with similar test results did well in search and rescue as well. Like Erin, Cindy can solve this problem using computational thinking. She could use decomposition to break this problem into three smaller problems. First, determining the similarity between the new dog and the other dogs. Second, finding the three most similar dogs. Third, doing a vote based on whether those three dogs were suitable for search and rescue.
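Returning to the scheduling algorithm for a moment, the quadratic growth can be sanity-checked by counting steps. This is just a sketch under one assumption: the worst case, where no requests overlap, so each pass through the outer loop removes only the single request it schedules.

```python
def count_steps(n):
    """Count the comparisons the nested scheduling loops make in the
    worst case, where no requests overlap, so each outer-loop pass
    removes just the one request it schedules."""
    steps = 0
    remaining = n
    while remaining > 0:          # outer loop: once per scheduled request
        steps += remaining        # inner loop 1: find the earliest ending time
        steps += remaining - 1    # inner loop 2: check the rest for overlaps
        remaining -= 1            # only the scheduled request is removed
    return steps

print(count_steps(10))   # 100
print(count_steps(20))   # 400: doubling the input quadruples the steps
```

The count comes out to exactly n squared here, which matches the twice-as-many-steps, twice-as-many-times argument above.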
Using pattern recognition, Cindy discovered that there is a repeated process of calculating the similarity for each dog in the same manner, by using the results of the three different types of tests. In solving this problem, Cindy can represent dogs using their individual test results and whether they're suitable for disaster search and rescue. Then, she could develop an algorithm that is based on the three subproblems she identified during decomposition. First, she can calculate how similar the new dog is to each of the other dogs. Then, she can find the three highest values of the similarity scores. Last, she can just count how many of those dogs were suitable for disaster search and rescue. Now, let's look at the flowchart. We've shown it here in three parts to reflect the three main parts of the algorithm. First, we need to calculate the similarity for each dog. So, we have the loop where we see if there are more dogs. If there are, we calculate its similarity by seeing how many of the three tests have the same results. When we're done with that for all the dogs, we move on to the next part of the algorithm. Now, we're going to find the three most similar dogs. We have this outer loop, which is going to execute three times in our algorithm. Inside that, we're just looking for the maximum value, which we've seen in various forms a few times now. Once we've found that maximum value, we know that's the most similar dog. So, we add it to the collection of similar dogs and remove it from the collection of all dogs so that we don't double count it. Now, we've found the three most similar dogs, and we're almost done. In this last part, we have a loop where we look at each of the most similar dogs. If the dog was suitable for search and rescue, then we increment this counter, or this vote. When we've looked at all the similar dogs, we see if the vote is greater than or equal to two, meaning the majority of similar dogs were suitable for disaster search and rescue.
In that case, we predict that this new dog will be too. Otherwise, we predict that it won't be. This is actually a popular machine-learning algorithm known as K-Nearest Neighbors. For some value of k, which is usually odd, like three in this case, we find the most similar, or nearest, elements for which we already know some piece of info and use that to predict the info for some new element. So, what's the complexity of this algorithm? This algorithm consists of three parts: calculating the similarity of each dog, finding the three most similar dogs, and conducting the vote. The first part is definitely linear. We do the calculation once for each dog. If there are twice as many dogs, it takes twice as many steps. So, that's linear. The second part is also linear. In finding the most similar dog, we look at each dog in the list once. If we had twice as many dogs, we'd need to do twice as many comparisons. Now, you might say, "Wait, isn't this quadratic like the scheduling algorithm, since we have the loop within a loop?" But, it doesn't matter that we go through the list three or five or a hundred times, because that number is independent of the number of dogs. So, it's linear overall. If we doubled the number of dogs, we only double the number of steps. Conducting the vote is also linear. In the worst case, if we were doing a vote over all the dogs, then we'd have to look at each dog's value, and that's linear. Since all parts of the prediction algorithm are linear, the algorithm as a whole is also linear. Because again, if, for instance, we doubled the number of inputs, we'd have to do each part over twice as many items. So overall, we just have twice as many total operations, which means it's linear. When I was a graduate student, I taught an introduction to computer science class, and our textbook defined computer science as the study of algorithms. But, you don't have to be a computer scientist to think algorithmically or to use algorithms.
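The three-part prediction can be sketched as a small program. The dogs' test results and labels below are made up for illustration; similarity is the number of matching test results, and the search for the k most similar dogs mirrors the flowchart's repeated maximum-finding with removal.

```python
# Hypothetical data: each dog's pass/fail results on the three tests,
# plus whether it turned out to be suitable for search and rescue.
known_dogs = [
    ((True,  True,  True),  True),
    ((True,  True,  False), True),
    ((False, True,  False), False),
    ((False, False, False), False),
    ((True,  False, True),  True),
]

def predict(new_results, dogs, k=3):
    # Part 1: similarity = how many of the three test results match.
    scored = [(sum(a == b for a, b in zip(new_results, results)), suitable)
              for results, suitable in dogs]
    # Part 2: find the k most similar dogs, one maximum at a time,
    # removing each so it isn't double counted.
    nearest = []
    for _ in range(k):
        best = max(scored, key=lambda pair: pair[0])
        nearest.append(best)
        scored.remove(best)
    # Part 3: majority vote over the k nearest dogs' labels.
    votes = sum(suitable for _, suitable in nearest)
    return votes > k // 2

print(predict((True, True, True), known_dogs))     # True
print(predict((False, False, False), known_dogs))  # False
```

Part 1 and the vote each scan the list once, and Part 2 scans it k times, so the whole prediction stays linear in the number of dogs, as discussed above.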
As we've seen in this part of the course, there are lots of common algorithms and patterns among them. There are some basic algorithmic approaches that can be used in solving lots of different problems. Once we've used computational thinking and developed an algorithm, the next step is to express the algorithm to a computer. In order to do that, we need to know the language that the computer understands. But before we can do that, we first need to know what the computer is capable of doing. That's what we'll see in the next part of the course.