Quantitative Text Analysis and Textual Similarity in R

Offered by
Coursera Project Network
In this Guided Project, you will:

  • Tokenize the dataset and convert the data into a document-feature matrix
  • Calculate cosine similarity across documents and plot the output

1 hour
Beginner level
No download needed
Split-screen video
English
Desktop only

By the end of this project, you will understand the concept of document similarity in textual analysis and how to work with it in R. You will know how to load and pre-process a set of text documents by converting it into a corpus and then into a document-feature matrix. You will also know how to calculate the cosine similarity between documents, and how to explore and plot the results.

Skills you will develop

  • Cosine Similarity
  • Text Analysis
  • Document Similarity
  • Data Visualization (DataViz)
  • Text Corpus

Learn step by step

In a video that plays in a split screen alongside your workspace, your instructor will walk you through each of these steps:

  1. Load textual data into R, turn it into a corpus object, and understand the concept of document similarity in textual analysis

  2. Extract metadata from the text document filenames and subset the data frame to exclude unwanted data

  3. Tokenize and clean the dataset, then convert it into a document-feature matrix

  4. Calculate cosine similarity across documents and plot the output
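
Steps 3 and 4 can be sketched in base R roughly as follows. This is a minimal illustration under stated assumptions, not the code used in the project: the three toy documents, the `tokenize` helper, and all variable names are made up here, and the project itself may rely on a dedicated text-analysis package rather than base R.

```r
# Toy corpus: three short, made-up documents (illustrative only).
docs <- c(
  doc1 = "The cat sat on the mat.",
  doc2 = "The cat lay on the mat.",
  doc3 = "Stock markets fell sharply today."
)

# Hypothetical helper: lowercase the text, replace every run of
# non-letter characters with a space, and split on spaces.
tokenize <- function(text) {
  cleaned <- gsub("[^a-z]+", " ", tolower(text))
  tokens <- strsplit(trimws(cleaned), " ", fixed = TRUE)[[1]]
  tokens[nzchar(tokens)]
}
tokens <- lapply(docs, tokenize)

# Document-feature matrix: one row per document, one column per
# distinct token, each cell holding that token's count in the document.
vocab <- sort(unique(unlist(tokens)))
dfm <- t(vapply(
  tokens,
  function(tok) as.numeric(table(factor(tok, levels = vocab))),
  numeric(length(vocab))
))
colnames(dfm) <- vocab

# Cosine similarity: dot product of two count vectors divided by the
# product of their Euclidean norms, computed for every document pair.
norms <- sqrt(rowSums(dfm^2))
cos_sim <- (dfm %*% t(dfm)) / (norms %o% norms)

# Plot how similar each document is to doc1.
barplot(cos_sim["doc1", ], ylim = c(0, 1),
        ylab = "Cosine similarity to doc1")
```

In this toy example, `doc1` and `doc2` share most of their tokens and score close to 1, while `doc3` shares none with `doc1` and scores 0. Raw counts are used for simplicity; weighting schemes or stop-word removal would change the scores.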

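How Guided Projects work

Your workspace is a cloud desktop right in your browser; no download is required.

In a split-screen video, your instructor guides you step by step.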
