Working with Big Data

Offered by
Coursera Project Network
In this Guided Project, you will:

Process a large dataset from NOAA showing hourly precipitation rates over a ten-year period for the state of Wisconsin

  • 2 hours
  • Intermediate
  • No download needed
  • Split-screen video
  • English
  • Desktop only

By the end of this project, you will have set up an environment for Big Data development using Visual Studio Code, MongoDB, and Apache Spark. You will then use the environment to process a large dataset from NOAA showing hourly precipitation rates over a ten-year period for the state of Wisconsin. MongoDB is a widely used NoSQL database well suited to very large datasets, or Big Data; it is also highly scalable and adaptable. Apache Spark is used for efficient in-memory processing of Big Data.

Skills you will develop

  • PySpark Queries
  • MongoDB
  • Python Programming
  • Big Data
  • PySpark

Learn step-by-step

In a video that plays in a split-screen alongside your workspace, your instructor will guide you through each step:

  1. Set up Apache Spark and MongoDB Environment.

  2. Create a Python PySpark program to read CSV data.

  3. Use Spark SQL to query in-memory data.

  4. Configure Apache Spark to connect to MongoDB.

  5. Persist data using Spark and MongoDB.
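Steps 2 through 5 can be sketched as a single PySpark program. This is a minimal illustration, not the project's actual code: the file name (`precipitation.csv`), the MongoDB database and collection names (`noaa.precip`), the column names (`STATION`, `DATE`, `HPCP`), and the connector version are all assumptions chosen for the example.

```python
def mongo_uri(host="localhost", port=27017, db="noaa", coll="precip"):
    """Build the MongoDB connection URI the Spark connector expects.
    All defaults here are illustrative assumptions."""
    return f"mongodb://{host}:{port}/{db}.{coll}"

def main():
    # Requires pyspark and the MongoDB Spark Connector on the classpath.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("NOAAPrecipitation")
             # Connector coordinates are an assumption; match them to
             # your installed Spark/Scala version.
             .config("spark.jars.packages",
                     "org.mongodb.spark:mongo-spark-connector_2.12:3.0.1")
             .config("spark.mongodb.output.uri", mongo_uri())
             .getOrCreate())

    # Step 2: read the CSV data with a header row and inferred types.
    df = spark.read.csv("precipitation.csv", header=True, inferSchema=True)

    # Step 3: register the DataFrame and query it with Spark SQL.
    df.createOrReplaceTempView("precip")
    rainy_hours = spark.sql(
        "SELECT STATION, DATE, HPCP FROM precip WHERE HPCP > 0")

    # Steps 4-5: persist the query result to MongoDB via the connector.
    rainy_hours.write.format("mongo").mode("append").save()
    spark.stop()

# Submit with spark-submit rather than running directly, e.g.:
#   spark-submit precip_job.py
```

The `spark.jars.packages` setting lets Spark download the MongoDB connector at submit time, so no manual JAR installation is needed in the cloud workspace.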

How Guided Projects work

Your workspace is a cloud desktop right in your browser; no download is required.

In a split-screen video, your instructor provides step-by-step guidance.

Frequently asked questions


Have more questions? Visit the Learner Help Center.