Mean of a dataset - Statistics of Datasets | Coursera

Mean of a dataset

Video placeholder

Loading...

Imperial College London

Mathematics for Machine Learning: PCA

Imperial College London

4 (3,062 ratings)

|

88K Students Enrolled

Course 3 of 3 in the Mathematics for Machine Learning Specialization

Enroll for Free

This intermediate-level course introduces the mathematical foundations to derive Principal Component Analysis (PCA), a fundamental dimensionality reduction technique. We'll cover some basic statistics of data sets, such as mean values and variances, we'll compute distances and angles between vectors using inner products and derive orthogonal projections of data onto lower-dimensional subspaces. Using all these tools, we'll then derive PCA as a method that minimizes the average squared reconstruction error between data points and their reconstruction. At the end of this course, you'll be familiar with important mathematical concepts and you can implement PCA all by yourself. If you’re struggling, you'll find a set of jupyter notebooks that will allow you to explore properties of the techniques and walk you through what you need to do to get on track. If you are already an expert, this course may refresh some of your knowledge. The lectures, examples and exercises require: 1. Some ability of abstract thinking 2. Good background in linear algebra (e.g., matrix and vector algebra, linear independence, basis) 3. Basic background in multivariate calculus (e.g., partial derivatives, basic optimization) 4. Basic knowledge in python programming and numpy Disclaimer: This course is substantially more abstract and requires more programming than the other two courses of the specialization. However, this type of abstract thinking, algebraic manipulation and programming is necessary if you want to understand and develop machine learning algorithms.

Skills You'll Learn

Dimensionality Reduction, Python Programming, Linear Algebra

Reviews

4 (3,062 ratings)

5 stars
51.24%
4 stars
22.37%
3 stars
12.63%
2 stars
6.66%
1 star
7.08%

NS

Jun 18, 2020

Relatively tougher than previous two courses in the specialization. I'd suggest giving more time and being patient in pursuit of completing this course and understanding the concepts involved.

JV

Apr 30, 2018

This course was definitely a bit more complex, not so much in assignments but in the core concepts handled, than the others in the specialisation. Overall, it was fun to do this course!

From the lesson

Statistics of Datasets

Principal Component Analysis (PCA) is one of the most important dimensionality reduction algorithms in machine learning. In this course, we lay the mathematical foundations to derive and understand PCA from a geometric point of view. In this module, we learn how to summarize datasets (e.g., images) using basic statistics, such as the mean and the variance. We also look at properties of the mean and the variance when we shift or scale the original data set. We will provide mathematical intuition as well as the skills to derive the results. We will also implement our results in code (jupyter notebooks), which will allow us to practice our mathematical understand to compute averages of image data sets. Therefore, some python/numpy background will be necessary to get through this course. Note: If you have taken the other two courses of this specialization, this one will be harder (mostly because of the programming assignments). However, if you make it through the first week of this course, you will make it through the full course with high probability.

Welcome to module 10:41

Mean of a dataset4:00

Taught By

Marc Peter Deisenroth
Lecturer in Statistical Machine Learning

Try the Course for Free

Explore our Catalog

Join for free and get personalized recommendations, updates and offers.