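About this course
4.7
405 ratings
88 reviews
Specialization

Course 2 of 7

100% online

Start instantly and learn at your own schedule.
Flexible deadlines

Reset deadlines in accordance with your schedule.
Advanced level

Approx. 47 hours to complete

Suggested: 6-10 hours/week...
Languages

English

Subtitles: English

Skills you will gain

Data Analysis, Feature Extraction, Feature Engineering, XGBoost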

Syllabus - What you will learn from this course

Week 1
6 hours to complete

Introduction & Recap

This week we will introduce you to competitive data science. You will learn about competition mechanics, the difference between competitions and real-life data science, and the hardware and software that people usually use in competitions. We will also briefly recap the major ML models frequently used in competitions....
8 videos (46 min total), 7 readings, 6 quizzes
Videos: 8
Meet your lecturers (2 min)
Course overview (7 min)
Competition Mechanics (6 min)
Kaggle Overview [screencast] (7 min)
Real World Application vs Competitions (5 min)
Recap of main ML algorithms (9 min)
Software/Hardware Requirements (5 min)
Readings: 7
Welcome! (10 min)
Week 1 overview (10 min)
Disclaimer (10 min)
Explanation for quiz questions (10 min)
Additional Materials and Links (10 min)
Explanation for quiz questions (10 min)
Additional Material and Links (10 min)
Quizzes: 5 practice exercises
Practice Quiz (8 min)
Recap (8 min)
Recap (12 min)
Software/Hardware (6 min)
Graded Soft/Hard Quiz (8 min)
2 hours to complete

Feature Preprocessing and Generation with Respect to Models

In this module we will summarize approaches to working with features: preprocessing, generation and extraction. We will see that the choice of machine learning model affects both the preprocessing we apply to the features and our approach to generating new ones. We will also discuss feature extraction from text with Bag of Words and Word2vec, and feature extraction from images with Convolutional Neural Networks....
7 videos (73 min total), 4 readings, 4 quizzes
Videos: 7
Overview (6 min)
Numeric features (13 min)
Categorical and ordinal features (10 min)
Datetime and coordinates (8 min)
Handling missing values (10 min)
Bag of words (10 min)
Word2vec, CNN (13 min)
Readings: 4
Explanation for quiz questions (10 min)
Additional Material and Links (10 min)
Explanation for quiz questions (10 min)
Additional Material and Links (10 min)
Quizzes: 4 practice exercises
Feature preprocessing and generation with respect to models (8 min)
Feature preprocessing and generation with respect to models (8 min)
Feature extraction from text and images (8 min)
Feature extraction from text and images (8 min)
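
As a rough illustration of the Bag of Words idea named in this module, the sketch below turns each document into a vector of word counts over a shared vocabulary. It is not taken from the course materials; the function name bag_of_words and the whitespace tokenization are assumptions made for the example.

```python
from collections import Counter


def bag_of_words(documents):
    """Return (vocabulary, count_matrix) for a list of text documents.

    Hypothetical helper: tokenization is a plain lowercase whitespace split,
    which is an assumption made to keep the example small.
    """
    tokenized = [doc.lower().split() for doc in documents]
    # Shared, alphabetically sorted vocabulary across all documents.
    vocabulary = sorted({token for doc in tokenized for token in doc})
    matrix = []
    for doc in tokenized:
        counts = Counter(doc)
        # One count per vocabulary word; absent words count as 0.
        matrix.append([counts[word] for word in vocabulary])
    return vocabulary, matrix


vocab, matrix = bag_of_words(["the cat sat", "the cat saw the dog"])
print(vocab)
print(matrix)
```

Each row of the matrix can then be passed to a model as a fixed-length numeric feature vector, whatever the length of the original text.
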
29 minutes to complete

Final Project Description

This is just a reminder that it is better to start the final project soon! The final project is in fact a competition; in this module you can find information about it....
1 video (4 min total), 2 readings
Videos: 1
Readings: 2
Final project (10 min)
Final project advice #1 (10 min)
Week 2
2 hours to complete

Exploratory Data Analysis

We will start this week with Exploratory Data Analysis (EDA). It is a very broad and exciting topic and an essential component of the solving process. Besides regular videos you will find a walkthrough of the EDA process for the Springleaf competition data and an example of prolific EDA for the NumerAI competition with extraordinary findings....
8 videos (80 min total), 2 readings, 1 quiz
Videos: 8
Building intuition about the data (6 min)
Exploring anonymized data (15 min)
Visualizations (11 min)
Dataset cleaning and other things to check (7 min)
Springleaf competition EDA I (8 min)
Springleaf competition EDA II (16 min)
Numerai competition EDA (6 min)
Readings: 2
Week 2 overview (10 min)
Additional material and links (10 min)
Quizzes: 1 practice exercise
Exploratory data analysis (12 min)
2 hours to complete

Validation

In this module we will discuss various validation strategies. We will see that the strategy we choose depends on the competition setup and that a correct validation scheme is one of the building blocks of any winning solution. ...
4 videos (51 min total), 3 readings, 2 quizzes
Videos: 4
Validation strategies (7 min)
Data splitting strategies (14 min)
Problems occurring during validation (20 min)
Readings: 3
Validation strategies (10 min)
Comments on quiz (10 min)
Additional material and links (10 min)
Quizzes: 2 practice exercises
Validation (8 min)
Validation (8 min)
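
To make the point about the competition setup concrete, here is a minimal sketch of two common ways to split data for validation: K-fold splitting when rows are interchangeable, and a time-based split when the test set lies in the future. It is an illustration under stated assumptions, not the course's code; the helper names kfold_indices and time_based_split are hypothetical.

```python
def kfold_indices(n_samples, n_folds):
    """Return a list of (train_indices, validation_indices), one pair per fold.

    Hypothetical helper: rows are assigned to folds round-robin by index,
    which assumes the row order carries no information.
    """
    folds = []
    for fold in range(n_folds):
        validation = [i for i in range(n_samples) if i % n_folds == fold]
        train = [i for i in range(n_samples) if i % n_folds != fold]
        folds.append((train, validation))
    return folds


def time_based_split(timestamps, cutoff):
    """Train on rows strictly before the cutoff, validate on the rest.

    Hypothetical helper mimicking a setup where the test data comes later
    in time than the training data.
    """
    train = [i for i, t in enumerate(timestamps) if t < cutoff]
    validation = [i for i, t in enumerate(timestamps) if t >= cutoff]
    return train, validation


folds = kfold_indices(6, 3)
time_train, time_validation = time_based_split([1, 2, 3, 4, 5, 6], cutoff=5)
print(folds[0])
print(time_train, time_validation)
```

The design choice is to make the validation split resemble the train/test split of the competition: a random or K-fold scheme tends to give over-optimistic scores when the real test set is a later time period.
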
5 hours to complete

Data Leakages

Finally, in this module we will cover something unique to data science competitions. We will see examples of how it is sometimes possible to reach a top position in a competition with very little machine learning, just by exploiting a data leak. ...
3 videos (26 min total), 3 readings, 3 quizzes
Videos: 3
Leaderboard probing and examples of rare data leaks (9 min)
Expedia challenge (9 min)
Readings: 3
Comments on quiz (10 min)
Additional material and links (10 min)
Final project advice #2 (10 min)
Quizzes: 1 practice exercise
Data leakages (8 min)
Week 3
3 hours to complete

Metrics Optimization

This week we will first study another component of competitions: the evaluation metrics. We will recap the most prominent ones and then see how we can efficiently optimize the metric given in a competition....
8 videos (83 min total), 3 readings, 2 quizzes
Videos: 8
Motivation (8 min)
Regression metrics review I (14 min)
Regression metrics review II (8 min)
Classification metrics review (20 min)
General approaches for metrics optimization (6 min)
Regression metrics optimization (10 min)
Classification metrics optimization I (7 min)
Classification metrics optimization II (6 min)
Readings: 3
Week 3 overview (10 min)
Comments on quiz (10 min)
Additional material and links (10 min)
Quizzes: 2 practice exercises
Metrics (12 min)
Metrics (12 min)
4 hours to complete

Advanced Feature Engineering I

In this module we will study a very powerful technique for feature generation. It has a lot of names, but here we call it "mean encodings". We will see the intuition behind them and how to construct, regularize, and extend them. ...
3 videos (27 min total), 2 readings, 2 quizzes
Videos: 3
Regularization (7 min)
Extensions and generalizations (10 min)
Readings: 2
Comments on quiz (10 min)
Final project advice #3 (10 min)
Quizzes: 1 practice exercise
Mean encodings (8 min)
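
A minimal sketch of one way a mean encoding could be built and regularized, assuming a binary or numeric target and a single categorical column: each row's category is replaced by the mean target of that category, computed only on the other folds so that a row never sees its own label. This is an illustration, not the course's implementation; the function name kfold_mean_encode, the round-robin fold assignment, and the fallback to the global mean are assumptions.

```python
from collections import defaultdict


def kfold_mean_encode(categories, targets, n_folds=2):
    """Out-of-fold mean encoding of one categorical column.

    Hypothetical helper. Rows are assigned to folds round-robin by index;
    a category unseen outside the row's fold falls back to the global mean.
    """
    n = len(categories)
    global_mean = sum(targets) / n
    encoded = [global_mean] * n
    for fold in range(n_folds):
        sums = defaultdict(float)
        counts = defaultdict(int)
        # Accumulate target statistics from every row outside this fold.
        for i in range(n):
            if i % n_folds != fold:
                sums[categories[i]] += targets[i]
                counts[categories[i]] += 1
        # Encode the rows inside this fold using those statistics only.
        for i in range(n):
            if i % n_folds == fold and counts[categories[i]] > 0:
                encoded[i] = sums[categories[i]] / counts[categories[i]]
    return encoded


cities = ["a", "a", "b", "a", "b", "b"]
clicked = [1, 0, 1, 1, 0, 0]
encoded = kfold_mean_encode(cities, clicked, n_folds=2)
print(encoded)
```

Computing the means out of fold is the regularization step here: encoding a row with statistics that include its own target would leak the label into the feature and inflate validation scores.
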
Week 4
3 hours to complete

Hyperparameter Optimization

In this module we will talk about the hyperparameter optimization process. We will also have a special video with practical tips and tricks, recorded by four instructors....
6 videos (86 min total), 4 readings, 2 quizzes
Videos: 6
Hyperparameter tuning II (12 min)
Hyperparameter tuning III (13 min)
Practical guide (16 min)
KazAnova's competition pipeline, part 1 (18 min)
KazAnova's competition pipeline, part 2 (17 min)
Readings: 4
Week 4 overview (10 min)
Comments on quiz (10 min)
Additional material and links (10 min)
Additional materials and links (10 min)
Quizzes: 2 practice exercises
Practice quiz (6 min)
Graded quiz (8 min)
4 hours to complete

Advanced Feature Engineering II

In this module we will learn about a few more advanced feature engineering techniques....
4 videos (22 min total), 2 readings, 2 quizzes
Videos: 4
Matrix factorizations (6 min)
Feature Interactions (5 min)
t-SNE (5 min)
Readings: 2
Comments on quiz (10 min)
Additional Materials and Links (10 min)
Quizzes: 1 practice exercise
Graded Advanced Features II Quiz (12 min)
10 hours to complete

Ensembling

Nowadays it is hard to find a competition won by a single model! Every winning solution incorporates ensembles of models. In this module we will talk about the main ensembling techniques in general and, of course, about how best to ensemble models in practice. ...
8 videos (92 min total), 4 readings, 4 quizzes
Videos: 8
Bagging (9 min)
Boosting (16 min)
Stacking (16 min)
StackNet (14 min)
Ensembling Tips and Tricks (14 min)
CatBoost 1 (7 min)
CatBoost 2 (7 min)
Readings: 4
Validation schemes for 2-nd level models (10 min)
Comments on quiz (10 min)
Additional materials and links (10 min)
Final project advice #4 (10 min)
Quizzes: 2 practice exercises
Ensembling (8 min)
Ensembling (12 min)
Career direction

33%

started a new career after completing these courses
Career benefit

83%

got a tangible career benefit from this course

Top reviews

By MS, Mar 29th 2018

Top Kagglers gently introduce one to Data Science Competitions. One will have a great chance to learn various tips and tricks and apply them in practice throughout the course. Highly recommended!

By MM, Nov 10th 2017

This course is fantastic. It's chock full of practical information that is presented clearly and concisely. I would like to thank the team for sharing their knowledge so generously.

Instructors

Dmitry Ulyanov

Visiting lecturer
HSE Faculty of Computer Science

Alexander Guschin

Visiting lecturer at HSE, Lecturer at MIPT
HSE Faculty of Computer Science

Mikhail Trofimov

Visiting lecturer
HSE Faculty of Computer Science

Dmitry Altukhov

Visiting lecturer
HSE Faculty of Computer Science

Marios Michailidis

Research Data Scientist
H2O.ai

About National Research University Higher School of Economics

National Research University - Higher School of Economics (HSE) is one of the top research universities in Russia. Established in 1992 to promote new research and teaching in economics and related disciplines, it now offers programs at all levels of university education across an extraordinary range of fields of study including business, sociology, cultural studies, philosophy, political science, international relations, law, Asian studies, media and communications, IT, mathematics, engineering, and more. Learn more on www.hse.ru...

About the Advanced Machine Learning Specialization

This specialization gives an introduction to deep learning, reinforcement learning, natural language understanding, computer vision and Bayesian methods. Top Kaggle machine learning practitioners and CERN scientists will share their experience of solving real-world problems and help you to fill the gaps between theory and practice. Upon completion of 7 courses you will be able to apply modern machine learning methods in enterprise and understand the caveats of real-world data and settings....

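Frequently Asked Questions

  • Once you enroll for a Certificate, you will have access to all videos, quizzes, and programming assignments (if applicable). Peer review assignments can only be submitted and reviewed once your session has begun. If you choose to explore the course without purchasing, you may not be able to access certain assignments.

  • When you enroll in the course, you get access to all of the courses in the Specialization, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page, where you can print it or add it to your LinkedIn profile. If you only want to read and view the course content, you can audit the course for free.

More questions? Visit the Learner Help Center.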