There are a number of ways we can work with data in MongoDB. As we've just seen, one way is through Compass. In addition to simply viewing a collection, we can also filter and perform a number of other operations. Here, our filter is for the movie The Big Short. Note that just two documents from this collection match this filter. In fact, this is evidence of some dirty data, because one of these is a duplicate. So we can delete the dupe. We can also update an individual document by editing its fields. Here, I'll update this doc with the runtime. From a Google search, I happen to know that it's 130 minutes.
So Compass provides a convenient means of exploring data, and it's visual. That helps immensely in investigating many types of questions you might have about a data set, especially as you first begin to work with it. But the bulk of the work will be in manipulating and analyzing data. For that, we want to use a combination of the MongoDB aggregation framework and query language. In this chapter, I'll provide you with an introduction to both as we work to reshape and clean up the movies data set and integrate it with other data sources, which will set us up to build an application and analytics infrastructure. We'll do all of this from Python, using the MongoDB driver, PyMongo. The same principles would apply across Java, Node, C#, R, or any of the other programming languages MongoDB supports.
First, however, it's valuable to discuss how all the pieces work together. And I wouldn't be doing my job as an instructor if I didn't throw at least one architectural diagram your way. As you know, your MongoDB free tier cluster is running in Atlas. Atlas is a convenient platform for running MongoDB, which is why we're using it for this class. You could just as easily run MongoDB from your local computer, but Atlas requires very little setup on your part and all the administration is managed for you.
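To make the link between Compass and PyMongo a little more concrete, here is a rough sketch of how the three Compass operations described earlier (filtering for The Big Short, deleting the duplicate, and setting the runtime to 130 minutes) might look in Python. The field names `title` and `runtime`, and the helper function names, are assumptions made for illustration; the real movies documents may be shaped differently. The calls themselves (`find`, `delete_one`, `update_one`) are standard PyMongo collection methods.

```python
# Assumption: movie documents carry a "title" field and a "runtime" field.
BIG_SHORT_FILTER = {"title": "The Big Short"}


def find_big_short(movies):
    """Return every document matching the title filter, as a list."""
    return list(movies.find(BIG_SHORT_FILTER))


def delete_duplicate(movies, duplicate_id):
    """Delete one document by _id and return how many were removed."""
    result = movies.delete_one({"_id": duplicate_id})
    return result.deleted_count


def set_runtime(movies, doc_id, minutes):
    """Set the runtime field on one document and return how many were modified."""
    result = movies.update_one({"_id": doc_id}, {"$set": {"runtime": minutes}})
    return result.modified_count
```

Here `movies` stands for a PyMongo collection object, obtained from a client with whatever database and collection names your own cluster uses. Matching on `_id` for the delete and the update is a deliberate choice in this sketch: the title alone matches both copies, so it cannot single out the duplicate.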
Depending on where you are in the world, your cluster is probably running on servers physically located in an Amazon AWS data center. This means it's in the cloud. I've drawn your MongoDB cluster as three servers because that's what it is. It's a replica set, meaning that three servers are working together to remain in sync, each maintaining a redundant copy of your data. One member of your replica set is always primary, meaning that it's the one you are communicating with when writing data, and usually when reading data. If the primary stops functioning or loses its internet connection, another member of the replica set will step in to serve as primary. Since it holds a complete copy of your data, you probably won't even notice if this happens. By default, both Compass and PyMongo are designed to direct requests to whichever node in a replica set is primary, even if the primary changes. For a course like this, it's less important, but in production environments this type of high availability is essential. The free tier option in Atlas supports this because a three-member replica set is the default minimum deployment for all clusters on this platform.
Now, you downloaded Compass to your local computer, so it's running there and connecting to your Atlas cluster through an internet connection. We will also be developing scripts and applications written in Python that will use your free tier cluster. In order to connect to MongoDB and make use of the aggregation framework and the MongoDB query language, we need to use PyMongo. PyMongo is available on PyPI, and you can install it using pip or easy_install. PyMongo is just a Python library, like many of the others you are constantly using, such as NumPy, SciPy, or pandas. We'll also be making extensive use of Python Jupyter notebooks in this class, to enable you to work through exercises right in your web browser. These will also make use of PyMongo.
But that will all be handled behind the scenes in the Coursera platform. And that about wraps it up for the big moving parts of the scripts and applications we'll be using in this class.
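As a rough illustration of how a Python script reaches a three-member replica set, the sketch below builds a standard `mongodb://` connection string that lists all three members and then hands it to PyMongo. The host names, user name, and replica set name are placeholders, and the function names are hypothetical; in practice Atlas gives you the exact connection string to use, so treat this only as a picture of its general shape.

```python
from urllib.parse import quote_plus


def build_replica_set_uri(user, password, hosts, replica_set):
    """Build a standard mongodb:// URI that lists every replica set member."""
    # Credentials are percent-escaped so characters like "@" do not break the URI.
    credentials = f"{quote_plus(user)}:{quote_plus(password)}"
    seed_list = ",".join(hosts)
    # readPreference=primary is already the driver default; it is spelled out
    # here only to make the routing behavior visible.
    options = f"replicaSet={replica_set}&readPreference=primary&tls=true"
    return f"mongodb://{credentials}@{seed_list}/?{options}"


def connect(uri):
    """Create a client; PyMongo locates the current primary on its own."""
    # Imported here so the URI helper works even before `pip install pymongo`.
    from pymongo import MongoClient
    return MongoClient(uri)
```

Because every member is listed in the URI, the driver can still find the replica set if one server is unreachable, and it keeps sending requests to whichever member is currently primary.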