Loading...

1.2 Hadoop Framework

Course video 3 of 29

In this module, we will learn about the MapReduce paradigm, and how it can be used to write distributed programs that analyze data represented as key-value pairs. A MapReduce program is defined via user-specified map and reduce functions, and we will learn how to write such programs in the Apache Hadoop and Spark projects. TheMapReduce paradigm can be used to express a wide range of parallel algorithms. One example that we will study is computation of the TermFrequency – Inverse Document Frequency (TF-IDF) statistic used in document mining; this algorithm uses a fixed (non-iterative) number of map and reduce operations. Another MapReduce example that we will study is parallelization of the PageRank algorithm. This algorithm is an example of iterative MapReduce computations, and is also the focus of the mini-project associated with this module.

关于 Coursera

课程、专项课程和在线学位均由全世界一流大学和教育机构的顶尖授课教师教授。

Community
Join a community of 40 million learners from around the world
Certificate
Earn a skill-based course certificate to apply your knowledge
Career
Gain confidence in your skills and further your career