Welcome to this module on containers and Google Kubernetes Engine. We've already discussed Compute Engine, which is GCP's Infrastructure as a Service offering, with access to servers, file systems, and networking, and App Engine, which is GCP's PaaS offering. Now I'm going to introduce you to containers and Kubernetes Engine, which is a hybrid that conceptually sits between the two, with benefits from both. I'll describe why you would want to use containers and how to manage them in Kubernetes Engine.

Let's begin by remembering that Infrastructure as a Service allows you to share compute resources with other developers by virtualizing the hardware using virtual machines. Each developer can deploy their own operating system, access the hardware, and build their applications in a self-contained environment with access to RAM, file systems, networking interfaces, and so on. But that flexibility comes with a cost. The smallest unit of compute is an app with its VM. The guest OS may be large, even gigabytes in size, and can take minutes to boot, but you have your tools of choice on a configurable system. So you can install your favorite runtime, web server, database, or middleware, configure the underlying system resources such as disk space, disk I/O, or networking, and build as you like. However, as demand for your application increases, you have to copy an entire VM and boot the guest OS for each instance of your app, which can be slow and costly.

With App Engine, you get access to programming services. So all you do is write your code in self-contained workloads that use these services and include any dependent libraries. As demand for your app increases, the platform scales your app seamlessly and independently by workload and infrastructure. This scales rapidly, but you give up the ability to fine-tune the underlying architecture to save cost.

That's where containers come in. The idea of a container is to give you the independent scalability of workloads and an abstraction layer of the OS and hardware. What you get is an invisible box around your code and its dependencies, with limited access to your own partition of the file system and hardware. It only requires a few system calls to create, and it starts as quickly as a process. All you need on each host is an OS kernel that supports containers and a container runtime. In essence, you're virtualizing the OS; it scales like PaaS but gives you nearly the same flexibility as IaaS. With this abstraction, your code is ultra-portable, and you can treat the OS and hardware as a black box. So you can go from development, to staging, to production, or from your laptop to the cloud, without changing or rebuilding anything.

If you want to scale, for example, a web server, you can do so in seconds and deploy dozens or hundreds of them, depending on the size of your workload, on a single host. Now, that's a simple example of scaling one container running a whole application on a single host. You'll likely want to build your application using lots of containers, each performing its own function, like microservices. If you build them like this and connect them with network connections, you can make them modular, easy to deploy, and able to scale independently across a group of hosts. The hosts can scale up or down and start and stop containers on demand as demand for your application changes, or as hosts fail. A tool that helps you do this well is Kubernetes.
Kubernetes makes it easy to orchestrate many containers on many hosts, scale them as microservices, and deploy rollouts and rollbacks. First, I'll show you how to build and run containers. I'll use an open-source tool called Docker, which defines a format for bundling your application, its dependencies, and machine-specific settings into a container. You could use a different tool, like Google Container Builder; it's up to you.

Here's an example of some code you may have written. It's a Python app. It says "Hello world," or, if you hit this endpoint, it gives you the version. So, how do you get this app into Kubernetes? You have to think about your version of Python, what dependencies you have on Flask, how to use your requirements.txt file, how to install Python, and so on. So you use a Dockerfile to specify how your code gets packaged into a container. For example, if you're used to using Ubuntu with all your tools, you start there. You can install Python the same way you would in your dev environment. You can take the requirements file that you know from Python and use tools inside Docker or Container Builder to install your dependencies the way you want. Eventually, it produces an app and specifies how to run it. Then you use the docker build command to build the container. This builds the container and stores it locally as a runnable image. You can save and upload the image into a container registry service, and share or download it from there. Then you use the docker run command to run the image. I'll sketch what these pieces might look like at the end of this segment.

As it turns out, packaging applications is only about five percent of the issue; the rest has to do with application configuration, service discovery, managing updates, and monitoring. These are the components of a reliable, scalable, distributed system. Now I'll show you where Kubernetes comes in. Kubernetes is an open-source orchestrator that abstracts containers at a higher level so you can better manage and scale your applications. At the highest level, Kubernetes is a set of APIs that you can use to deploy containers on a set of nodes called a cluster. The system is divided into a set of master components that run as the control plane and a set of nodes that run containers. In Kubernetes, a node represents a computing instance, like a machine. In Google Cloud, nodes are virtual machines running in Compute Engine. You describe a set of applications and how they should interact with each other, and Kubernetes figures out how to make that happen.

Now that you've built a container, you'll want to deploy it into a cluster. Kubernetes can be configured with many options and add-ons, but it can be time-consuming to bootstrap from the ground up. Instead, you can bootstrap Kubernetes using Kubernetes Engine, or GKE. GKE is hosted Kubernetes by Google. GKE clusters can be customized; they support different machine types, numbers of nodes, and network settings. To start up Kubernetes on a cluster in GKE, all you do is run this command. At this point, you should have a cluster called k1 configured and ready to go. You can check its status in the admin console. Then you deploy containers on nodes using a wrapper around one or more containers called a pod. A pod is the smallest unit in Kubernetes that you create or deploy. A pod represents a running process on your cluster, as either a component of your application or an entire app.
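Before moving on, here's a minimal sketch of the artifacts I just described. The file names, image name, project ID, ports, and zone are illustrative assumptions, not the exact values from the slides. A Dockerfile for a small Python/Flask app might look something like this:

    # Start from a base image you're comfortable with
    FROM ubuntu:18.04
    # Install Python and pip the same way you would on your dev machine
    RUN apt-get update && apt-get install -y python3 python3-pip
    # Install the app's dependencies from requirements.txt (which lists Flask)
    COPY requirements.txt /app/requirements.txt
    RUN pip3 install -r /app/requirements.txt
    # Copy in the app and declare how to run it
    COPY app.py /app/app.py
    WORKDIR /app
    ENTRYPOINT ["python3", "app.py"]

You would then build the image, push it to a registry such as Container Registry, test it locally, and create a GKE cluster with commands along these lines (my-project, hello-app, port 5000, and us-central1-a are placeholders):

    # Build the container and store it locally as a runnable image
    docker build -t gcr.io/my-project/hello-app:1.0 .
    # Upload the image to a container registry so it can be shared or downloaded
    docker push gcr.io/my-project/hello-app:1.0
    # Run the image locally, assuming the Flask app listens on port 5000
    docker run -d -p 5000:5000 gcr.io/my-project/hello-app:1.0
    # Start up a GKE cluster called k1
    gcloud container clusters create k1 --zone us-central1-a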
Generally, you only have one container per pod, but if you have multiple containers with a hard dependency, you can package them into a single pod and share networking and storage between them. The pod provides a unique network IP and a set of ports for your containers, along with options that govern how the containers should run. Containers inside a pod can communicate with one another using localhost and ports that remain fixed as they're started and stopped on different nodes.

One way to run a container in a pod in Kubernetes is to use the kubectl run command. We'll learn a better way later in this module, but this gets you started quickly. This starts a deployment with a container running in a pod; in this case, the container is an image of the nginx server. A deployment represents a group of replicas of the same pod and keeps your pods running even when the nodes they run on fail. A deployment could represent a component of an application or an entire app; in this case, it's the nginx web server. To see the running nginx pods, run the command kubectl get pods.

By default, pods in a deployment are only accessible inside your GKE cluster. To make them publicly available, you can connect a load balancer to your deployment by running the kubectl expose command. Kubernetes creates a service with a fixed IP address for your pods, and a controller says, "I need to attach an external load balancer with a public IP address to that service so others outside the cluster can access it." In GKE, the load balancer is created as a network load balancer. Any client that hits that IP address will be routed to a pod behind the service; in this case, there's only one, your simple nginx pod. A service is an abstraction that defines a logical set of pods and a policy for accessing them. As deployments create and destroy pods, each pod gets its own IP address, but those addresses don't remain stable over time. A service groups a set of pods and provides a stable endpoint, or fixed IP address, for them. For example, if you create two sets of pods called frontend and backend and put them behind their own services, the backend pods may change, but the frontend pods are not aware of this; they simply refer to the backend service. You can run the kubectl get services command to get the public IP address and hit the nginx container remotely.

To scale a deployment, run the kubectl scale command. In this case, three pods are created in your deployment, placed behind the service, and sharing one fixed IP address. You could also use autoscaling with all kinds of parameters; for example, you can autoscale the deployment to between 10 and 15 pods when CPU utilization reaches 80 percent.

So far, I've shown you how to run imperative commands like expose and scale. This works well to learn and test Kubernetes step by step, but the real strength of Kubernetes comes when you work in a declarative way. Instead of issuing commands, you provide a configuration file that tells Kubernetes what you want your desired state to look like, and Kubernetes figures out how to get there. Let me show you how to scale your deployment using an existing deployment configuration file. To get the file, you can run a kubectl get deployments command with the -o yaml option, and you'll get a deployment configuration like the one sketched at the end of this segment. In this case, it declares that you want three replicas of your nginx pod. It defines a selector field so your deployment knows how to group specific pods as replicas, and you add a label to the pod template so those pods get selected.
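For reference, here's a sketch of the imperative commands from this segment, followed by the kind of deployment configuration you'd get back. The deployment name nginx, the image tag, and port 80 are assumptions for illustration, not the exact values from the slides:

    # Create a deployment running the nginx image
    # (older kubectl versions create a deployment with "kubectl run";
    #  newer versions create a bare pod, so use "kubectl create deployment" instead)
    kubectl run nginx --image=nginx:1.15.7

    # See the running nginx pods
    kubectl get pods

    # Expose the deployment behind an external load balancer with a public IP
    kubectl expose deployment nginx --port 80 --type LoadBalancer

    # Get the service's external IP so you can hit nginx remotely
    kubectl get services

    # Scale the deployment to three replicas
    kubectl scale deployment nginx --replicas 3

    # Autoscale between 10 and 15 replicas at 80 percent CPU utilization
    kubectl autoscale deployment nginx --min 10 --max 15 --cpu-percent 80

    # Dump the deployment's configuration as a YAML file you can edit
    kubectl get deployment nginx -o yaml > nginx-deployment.yaml

And the resulting deployment configuration might look roughly like this, declaring three replicas and using a label selector to group the pods:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx
      labels:
        app: nginx
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: nginx          # the deployment manages pods carrying this label
      template:
        metadata:
          labels:
            app: nginx        # label added to the pod template so pods get selected
        spec:
          containers:
          - name: nginx
            image: nginx:1.15.7
            ports:
            - containerPort: 80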
To run five replicas instead of three, all you do is update the configuration file and run the kubectl apply command to use the new config. Now look at your replicas to see their updated state, then use the kubectl get pods command to watch the pods come online. In this case, all five are ready and running. Then check the deployment to make sure the proper number of replicas are running, using either kubectl get deployments or kubectl describe deployments. In this case, all five pod replicas are available. And you can still hit your endpoint like before, using kubectl get services to get the external IP of the service and hitting the public IP from a client. At this point, you have five copies of your nginx pod running in GKE, and you have a single service that's proxying the traffic to all five pods. This allows you to share the load and scale your service in Kubernetes.

The last question is: what happens when you want to roll out a new version of your app? You want to update your container to get new code in front of users, but it would be risky to roll out all of those changes at once. So you use kubectl rollout, or you change your deployment configuration file and apply the changes using kubectl apply. New pods will then be created according to your update strategy. Here's an example configuration that will create new pods one by one, waiting for each new pod to be available before destroying one of the old pods.
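A sketch of what that might look like, continuing with the same hypothetical nginx deployment; the rollingUpdate settings below replace pods one at a time, starting each new pod before an old one is taken down:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx
    spec:
      replicas: 5
      strategy:
        type: RollingUpdate
        rollingUpdate:
          maxSurge: 1         # start at most one extra pod during the update
          maxUnavailable: 0   # never remove an old pod before its replacement is ready
      selector:
        matchLabels:
          app: nginx
      template:
        metadata:
          labels:
            app: nginx
        spec:
          containers:
          - name: nginx
            image: nginx:1.16.0   # the new version you want to roll out
            ports:
            - containerPort: 80

You would then apply the change and watch it progress with commands like kubectl apply -f nginx-deployment.yaml and kubectl rollout status deployment nginx.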