Name: Getting and Cleaning Data
Brand: Johns Hopkins University
Availability: OnlineOnly
Rating: 4.5 (1348 reviews)

Learner Reviews & Feedback for Getting and Cleaning Data by Johns Hopkins University

4.5 stars (8,048 ratings)

About the Course

Before you can work with data you have to get some. This course will cover the basic ways that data can be obtained. The course will cover obtaining data from the web, from APIs, from databases and from colleagues in various formats. It will also cover the basics of data cleaning and how to make data “tidy”. Tidy data dramatically speed downstream data analysis tasks. The course will also cover the components of a complete data set including raw data, processing instructions, codebooks, and processed data. The course will cover the basics needed for collecting, cleaning, and sharing data....

Top reviews

May 2, 2020

This course provides an introduction to some important concepts and tools for a very important aspect of data science: cleaning and organizing data before any analysis. A must for any data scientist.

Feb 1, 2016

Easy, mostly instructive course. The assignments and quizzes are quite good, and illustrate the lessons very well.

See the videos for general presentation, but use the energy on the exercises.

51 - 75 of 1,307 Reviews for Getting and Cleaning Data

By Paul Y

May 13, 2022

Once again, the projects are way beyond the skill level or tasking in the previous lessons.

The lectures are ALL PowerPoint; the instructor does not open R.

The lectures are based on R as it was 10-15 years ago, when it was 3.0. Nobody at Hopkins can update the lectures, or do an errata sheet?

I can't believe my company is spending money for this.

By Mohamed A

Mar 23, 2021

It was difficult to understand and there were not enough exercises. In addition, the questions are very hard to answer and need a lot of digging to find the correct way to answer. Personally, I am not happy with this course and do not intend to continue the DS program because the materials are old and do not reflect the exam questions.

By Christian B

Nov 8, 2019

No idea what they want for the project, and the discussion forum is clogged with people asking for peer reviews. The previous courses at least provided you with an understanding of what the final product should be; in this case it's "make tidy data", but with no idea of how that data should look.

By Greg Y

Apr 28, 2022

This course is terrible. Outdated videos (recorded in 2013!), impossible quiz questions, installing huge numbers of packages into R with no real support or feedback. Terrible waste of time.

By Najib B

Mar 25, 2021

Online course design at its worst.

By Ramalakshmanan S P

Feb 23, 2016

Thanks for this wonderful session on Getting and Cleaning Data. I would like to convey my sincere thanks to Professors Roger D. Peng, Brian D. Caffo and Jeff Leek and my fellow learners for their excellent help in completing the project to generate the tidy dataset. I would like to name Mr. Luis Sandino for his help and effort in putting together a help guide for this assignment. I followed it and got the assignment completed. The step-by-step procedure helped me and other fellow learners to complete the assignment on time.

Though this course is over, we still have doubts about the dimensions of the tidy dataset, whether it is 180 by 68 or 180 by 88, as the total number of "mean" variables considered varies. I request that mentors or TAs help us arrive at the correct dimensions and understand the reason behind them.

This course has shown the need for support from TAs and mentors. Their help and support were very valuable in understanding the subject.

Thanks to Coursera, my Professors, mentors and TAs of this course for their insight, guidance, support and effort.

Wishing Coursera and Professors all the best and Success.

The SWIRL component for learning the subject is the best, and I wish for SWIRL support for all the heavy courses. Special thanks to those who made the SWIRL course material possible for The Data Scientist's Toolbox.

With Best Wishes,

S. Ramalakshmanan

By Carlos C

Sep 10, 2020

Excellent course to build upon the knowledge from the "R Programming" course. Learning to use functions from the Tidyverse packages is an essential tool if you want to learn Data Science in R. In my opinion, most of the time these are stronger and easier to use compared to Pandas, Numpy, etc., from Python. Despite the bad reviews at the top with lots of upvotes, I do think this was a great course overall. People tend to complain and don't assume responsibility to work and find solutions if they don't understand something. My humble advice is that, if you wish to immerse in the Data Science field, you should accustom yourself to researching a lot, going to other forums like StackOverflow if an error appears, etc. Thanks to Jeff Leek, Roger Peng and the others from Johns Hopkins University!

By Pouria E T

Dec 20, 2016

Thank you for giving me the opportunity to learn. This material (or this class) would have been super difficult if it had been taught through the same traditional channels, based on my academic experience. Yet the material was presented in such an amazing way that I wasn't overwhelmed by the difficulty of the subjects; rather, I became more focused on learning more and being challenged. Thank you for letting me get 3 free online certificates. It means a lot to me and it has given me hope through this difficult time. I feel accomplished. It's a great feeling, and it is the best and the only gift that I have received and would probably receive this holiday.

By Alfonso R R

Dec 8, 2016

I learned so much about R with this course. Thanks Johns Hopkins. Thanks Coursera.

The course final project was so challenging that it made me research R tools I did not know existed, such as generating MD files from RMD markdown notebooks so I could mix live code with text. That's how I produced my CodeBook.md. Then I learned that there are a bunch of libraries for pretty-printing tables. I discovered even more about dplyr. And I also learned how to return multiple objects from a function.

You can really write papers with all these tools in R, and by gaining expertise in knitr and pandoc.

Thank you Jeff and team for putting together such a quality course.

By David B

Feb 27, 2017

Before taking "Getting and Cleaning Data", I had no prior R programming experience aside from completing the R programming course in the data science specialization on Coursera. I found this course to be challenging and that it covered quite a bit of ground in terms of the "getting data" more so than the cleaning data. After completing this course, I feel like I learned quite a bit more R programming and the basic knowledge for obtaining data from a variety of sources/formats and cleaning it up to make it look nice and tidy. Overall, I rate this course very positively!

By Óscar V

Dec 1, 2015

The course is great and useful. In my personal experience, this course was as important as the R Programming course, since in this course one gets the essence of R and the hardest part of the process when dealing with real cases. I could see that the videos have been improved in speed and audibility; when I took it, they were difficult to hear and far too fast.

I want to contribute as a beta tester, and will try to follow the whole course at my own pace, giving feedback in gratitude for the opportunity you gave me to learn for free.

By Joe D

Apr 15, 2019

Forums! Use the forums. Read them before you start the week's lectures because they often include pinned topics that correct minor errors like broken links and outdated commands, as well as interesting and thoughtful supplementary material. Overall this was a very enjoyable course; Dr. Leek's lectures are straightforward and full of useful examples. I learned about just some of the power of "The Tidyverse" through this course and I'm very grateful for that.

By Chetan T

Oct 19, 2018

The journey through the entire course was quite exceptional for me. It was great to hone the skills of programming and especially in this digital world where data is key for every analysis, inference, prediction and what not! When everyone looks at neat and tidy data that one can rely on, it is extremely important to understand and know the finer nuances of what it takes to get a nice and efficient dataset and that is the essence of this course.

By Whitchurch R

Jan 29, 2020

This was an awesome course.

I really liked the final project.

Especially creating a Codebook as well as tidying up the data.

I feel I went too much in-depth into creating the codebook as well as the readme file. But in hindsight it was totally worth it.

My advice to future learners: push yourselves to the limit when doing the final project. You will definitely learn much, much more by putting 110% into these hard projects.

By Alexis C

Aug 11, 2017

Did not like this class when I was taking it, but now (just completed course 7) I realize how very important this class is. "Messy data" used to sound like a buzz phrase to me that people used when they could not generate valuable insights from the data made available. Now I realize that the base R functions and packages highlighted in this class are extremely useful when you need to clean up data in a reproducible way.

By José A R N

Oct 20, 2016

My name is Jose Antonio from Brazil. I am looking for a new Data Scientist career.

Please, take a look at my LinkedIn profile: https://www.linkedin.com/in/joseantonio11

I took this course to gain new knowledge about Data Science and to better understand the technology and its practical applications.

The course was excellent and the classes well taught by teachers.

Congratulations to Coursera team and Instructors.

By Yusuf E

Dec 14, 2017

The level of difficulty of this course is on par with R Programming. For the first time in the specialization you will find yourself scouring the forum for tips and suggestions on how to proceed when you get stuck in the quizzes. Fortunately, the mentors are really helpful when it comes to answering questions or clearing obscurities. I really liked this course, in fact much more than R Programming.

By Antonios D

Nov 14, 2016

This course is a great job! There is a lot of information in here and a great amount of knowledge. I would like to say that, from my point of view, the current lessons should be updated with more varied data set examples that give students the opportunity to learn different ways to manipulate data. There are some standard ways, so it would be great if you expanded on this.

By Chris B

Nov 22, 2016

It is sometimes daunting and difficult, but now I do understand so much more about downloading files from remote sites and getting them ready for analysis. What I should have done is look at the final project first so as to get a better understanding of what the project entailed. I also should have done more work replicating the code used in the lessons so as to appreciate how it worked.

By Debayan D

Jul 25, 2017

The Course Project was daunting at first, but I reviewed my notes over and over again, tried reading from the site where the raw data was made available, and constructed images of what the TIDY data should look like. This is a very important course in this specialization. The course has given me an abstract sense of what to expect and what to do while cleaning data.

By Li G

Jan 12, 2017

Very helpful and pragmatic.

This course gives a general idea of how to get and clean data in R, and specifically taught me how to use "dplyr" and "tidyr".

The assignment is very helpful, too. It forced me to use the knowledge I learned in this course, though it might be a little bit hard for a beginner. Nevertheless, you can still achieve a 100% score!

By Zhiming

Sep 15, 2017

I am very happy to have gone through this subject, not because of the certification but because I learned the steps to import and clean data. Although this subject is no rocket science, a lot of the data available on the web will require the knowledge that I learned in this subject to enhance the integrity of the data that anyone can download from the web.

By Anthony S

Nov 2, 2016

Learned a lot! I have now dedicated more time to becoming a data analyst, and eventually a data scientist. The materials used in the videos were helpful and current (for me at least, 30 years young). I have started doing more learning on the kaggle platform as well as doing some hands-on Hadoop related training. Thanks to the professors!

By Carlos A M S

Oct 19, 2017

This course is fantastic! Through it, it was possible to apply the concepts of BigData concretely using the tool proposed for the course. Due to various difficulties I had to leave. But I'm coming back with all my might. Congratulations to all the teachers, who spare no effort to pass on knowledge in a substantial way.

By Rodney J

Jun 5, 2017

This is a terrific course on obtaining data from various sources and then cleaning the raw data obtained to form useful tidy data sets. The course material learned is reinforced using a very interesting peer-reviewed project based on accelerometer and gyroscopic data collected from typical human activity.