Analyze Text Data with Yellowbrick

Rating: 4.4 (67 ratings)
Offered by Coursera Project Network
3,829 already enrolled

In this guided project, you will:

Use visual diagnostic tools from Yellowbrick to steer your machine learning workflow

Vectorize text data using TF-IDF

Cluster documents using embedding techniques and appropriate metrics
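
As a rough illustration of the TF-IDF objective above, the following minimal sketch computes term frequency times inverse document frequency by hand. The `tfidf` helper and the three-sentence corpus are hypothetical, written only for this example; the project itself relies on scikit-learn, whose vectorizer applies different smoothing and normalization defaults, so its numbers will not match this sketch.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return (vocabulary, vectors) using the plain tf * log(N / df) weighting."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(tokenized)
    # Document frequency: the number of documents that contain each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        length = max(len(toks), 1)  # guard against empty documents
        vectors.append([
            (counts[term] / length) * math.log(n_docs / df[term])
            for term in vocab
        ])
    return vocab, vectors


# Hypothetical toy corpus, not the corpus used in the project.
corpus = ["the cat sat", "the dog sat", "the bird flew"]
vocab, vectors = tfidf(corpus)
for doc, vec in zip(corpus, vectors):
    print(doc, [round(w, 3) for w in vec])
```

A term that occurs in every document ("the" in this corpus) gets a weight of zero, which is the property that makes TF-IDF useful for comparing documents by their distinctive words.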

2 hours
Intermediate level
No download needed
Split-screen video
English
Desktop only

Welcome to this project-based course on Analyzing Text Data with Yellowbrick. Tasks such as assessing document similarity, topic modelling, and other text mining endeavors are predicated on the notion of "closeness" or "similarity" between documents. In this course, we define various distance metrics (e.g. Euclidean, Hamming, Cosine, and Manhattan) and examine their merits and shortcomings as they relate to document similarity. We will apply these metrics to documents within a specific corpus and visualize the results. By the end of this course, you will be able to confidently use visual diagnostic tools from Yellowbrick to steer your machine learning workflow, vectorize text data using TF-IDF, and cluster documents using embedding techniques and appropriate metrics.

This course runs on Rhyme, Coursera's hands-on project platform. On Rhyme, you work through projects hands-on in your browser. You get instant access to pre-configured cloud desktops containing all of the software and data you need for the project, so you can focus on learning. For this project, the cloud desktop comes with Python, Jupyter, Yellowbrick, and scikit-learn pre-installed.

Notes:
- You will be able to access the cloud desktop 5 times. However, you will be able to access the instruction videos as many times as you want.
- This course works best for learners who are based in the North America region. We're currently working on providing the same experience in other regions.
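
To make the notion of "closeness" in the description above more concrete, here is a minimal sketch of several of the distance metrics this project covers, written in plain Python. The function names and the toy count vectors are assumptions made for this example; the formulas follow the common definitions (with a 0/0 term in the Canberra sum treated as zero), and the project itself may compute them through library calls instead.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))


def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def bray_curtis(u, v):
    return sum(abs(a - b) for a, b in zip(u, v)) / sum(abs(a + b) for a, b in zip(u, v))


def canberra(u, v):
    total = 0.0
    for a, b in zip(u, v):
        denom = abs(a) + abs(b)
        if denom:  # skip terms where both components are zero
            total += abs(a - b) / denom
    return total


# Hypothetical term-count vectors: `doubled` is `short` repeated twice.
short = [1.0, 2.0, 0.0]
doubled = [2.0, 4.0, 0.0]

for metric in (euclidean, manhattan, cosine_distance, bray_curtis, canberra):
    print(metric.__name__, round(metric(short, doubled), 4))
```

The two vectors point in the same direction, so their cosine distance is zero up to floating-point rounding, while the Euclidean and Manhattan distances are not. That sensitivity to document length is one of the trade-offs to weigh when choosing a metric for text.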

Skills you will develop

Data Science, Natural Language Processing, Machine Learning, Python Programming, Data Visualization (DataViz)

Learn step by step

In a video that plays in a split screen alongside your workspace, your instructor will walk you through these steps:

  1. Introduction and Loading the Corpus

  2. Vectorizing the Documents

  3. Clustering Similar Documents with Squared Euclidean Distance and Euclidean Distance

  4. Manhattan (aka “Taxicab” or “City Block”) Distance

  5. Bray-Curtis Dissimilarity and Canberra Distance

  6. Cosine Distance

  7. What Metrics Not to Use

  8. Omitting Class Labels - Using KMeans Clustering
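
For the final step in the list above, clustering without class labels, the following is a minimal sketch of the k-means idea (Lloyd's algorithm) in plain Python. The `kmeans` helper, the choice to seed centroids with the first `k` points, and the toy two-dimensional vectors are all assumptions made for illustration; the project environment ships scikit-learn, whose implementation uses more careful initialization.

```python
def squared_euclidean(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(points, k, n_iter=10):
    """Lloyd's algorithm, seeded with the first k points for determinism."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_euclidean(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical 2-D document vectors forming two loose groups.
points = [[1.0, 0.1], [0.2, 1.0], [0.9, 0.0], [0.1, 0.9]]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

Because no labels are supplied, the cluster indices are arbitrary; only the grouping of points matters.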

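How guided projects work

Your workspace is a cloud desktop right in your browser; no download is required

In a split-screen video, your instructor guides you step by step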
