Important note: The second assignment in this course covers the topic of Graph Analysis in the Cloud. You will use Elastic MapReduce and the Pig language to perform graph analysis over a moderately large dataset of about 600 GB. To complete this assignment, you will need to use Amazon Web Services (AWS). Amazon has generously offered up to $50 in free AWS credit to each learner in this course so that you can complete the assignment. Details on how to receive this credit are available in the course welcome message and in the assignment itself. Please note that Amazon, the University of Washington, and Coursera cannot reimburse you for any charges incurred after you exhaust your credit.
Learner Career Outcomes
67%
60%
33%
Offered by

University of Washington
Founded in 1861, the University of Washington is one of the oldest state-supported institutions of higher education on the West Coast and is one of the preeminent research universities in the world.
Syllabus: What you will learn from this course
Visualization
Statistical inferences drawn from large, heterogeneous, and noisy datasets are useless if you can't communicate them to your colleagues, your customers, your management, and other stakeholders. Learn the fundamental concepts behind information visualization, an increasingly critical field of research and an increasingly important skill set for data scientists. This module is taught by Cecilia Aragon, a faculty member in the Department of Human Centered Design and Engineering.
Privacy and Ethics
Big Data has become closely linked to issues of privacy and ethics: as the limits on what we *can* do with data continue to evaporate, the question of what we *should* do with data becomes paramount. Using case studies as motivation, you will learn the core principles of codes of conduct for data science and statistical analysis. You will also learn the limits of current theory on protecting privacy while still permitting useful statistical analysis.
Reproducibility and Cloud Computing
Science is facing a credibility crisis due to unreliable reproducibility, and, paradoxically, the problem seems to be getting worse as research becomes increasingly computational. But reproducibility is not just for academics: data scientists who cannot share, explain, and defend their methods for others to build on are dangerous. In this module, you will explore the importance of reproducible research and how cloud computing is offering new mechanisms for sharing code, data, environments, and even costs, mechanisms that are critical for practical reproducibility.
Reviews
Top reviews from COMMUNICATING DATA SCIENCE RESULTS
Great and useful first week about visualization, although I wish it had covered more material. The ethics and cloud computing parts felt somewhat incomplete, but useful as well.
The information for the last assignment is split between the Forums and the task description. This is very easy to fix, and not doing so shows passivity on the part of the organizers.
Too few people participated, and peer review took a long time. But the course content is good.
About the Data Science at Scale Specialization
Learn scalable data management, evaluate big data technologies, and design effective visualizations.

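Frequently Asked Questions
When will I have access to the lectures and assignments?
What will I get if I subscribe to this Specialization?
Is financial aid available?
Will I earn university credit for completing the course?
More questions? Visit the Learner Help Center.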