Hi, in this module, I want to introduce you to some new sources of data that are getting a lot of attention and are becoming increasingly important in contemporary social science research. The first is online interactions. Interactions on the Internet have become an important source of data for social research. The data on connections and interactions are particularly rich. If you think about a social media site such as Twitter, you have information on who people follow and whose tweets they forward to other people. If you're thinking about online dating sites, then you have information on interactions in the form of views of people's profiles, efforts to contact them, and so forth. These data are also voluminous. On a social media site like Twitter or a dating website, you may be talking about millions of transactions or interactions over the course of a few months, accumulating very rapidly. The main limitation of such online interaction data, however, is that their relationship to the real world is not clear. As anyone who has spent any time at all on the Internet will tell you, the way people present themselves and interact with others online may be very different from the way they present themselves and interact with people in everyday life. Think of the comments section of a major newspaper, and how different those commenters might be if you met them in real life.

Let me give you some examples of how data from online interactions are being used. One is that data from social media sites and similar platforms are being used in studies of networks. People are looking at data from Twitter and other sites to understand the structure of social networks, at least on the Internet, and to examine how these networks evolve and change over time. What are the national boundaries of social networks? What are the regional concentrations of social networks within countries?
What are the social boundaries between different groups on the Internet? Another area that's been getting a lot of attention is the study of relationships. Some researchers use data from online dating sites to examine, for example, what people look for in a partner, how they present themselves, and, if two people meet each other on a website, what factors influence the chances that their relationship progresses or ends. A big area of online interaction analysis is consumption behavior. Many of the major shopping sites take advantage of the fact that they can carry out experiments in which they change the appearance of their website, change prices, and so forth, to see what sorts of factors affect the chances that a consumer will make a purchase. They can change the colors of a website or the format of the advertisements, and then see whether those changes actually affect people's consumption behavior. Finally, there have been some very interesting recent studies of censorship. Certain governments routinely censor posts on social media sites. Researchers at various universities have collected these posts and analyzed which ones are most likely to be censored, thereby gaining insight into the priorities governments have when they decide what sorts of discussion to allow online and what sorts to prevent.

Another growing area, actually an older area within social science but one that has received new attention and had new life breathed into it by advances in technology, is content analysis. Advances in technology now make it much easier to conduct quantitative content analysis. I briefly introduced qualitative content analysis previously, but there is a comparable approach, generally known as quantitative content analysis.
In quantitative content analysis, researchers look at texts and other materials, which may include newspapers, books, and other publications, as well as visual and audio content. The mass digitization of such content, together with increases in computing power that allow us to sweep through it very quickly, now makes it possible to analyze very large corpora of such material. Web content can also be analyzed. Common approaches in this area include examining trends, categories, and other patterns in the usage and content of specific terms in various media, whether online or in print. Let me give you some examples. One of the big areas in content analysis these days is what is referred to as sentiment analysis. This is especially important for studying online traffic, and it has a lot of commercial applications. Brands, companies, and so forth are very interested in understanding the tone of the discussion when their products or brands are discussed online. Political candidates, similarly, are interested in looking at online traffic to figure out whether the discussion about them is positive or negative. Another area is representation. Many studies look at how particular groups are represented, especially in television and movies. For example, people look at the representation of people of color, including African-Americans, Asian-Americans, and other groups, in American TV shows and movies, and these studies generally show that such groups are underrepresented. This is a basic form of content analysis: looking at the representation of particular groups, particular ideas, and so forth in various forms of media. Finally, there's the overall question of emphasis in an online discussion or some form of media. In the news and on TV, for example, there's a lot of interest in broadly characterizing the basic points, the main foci, of the discussion.
When people are talking about medical care, for example, what are the main themes of the discussion? This is distinct from the sentiment analysis discussed earlier, which asks whether the tone is positive or negative rather than what the discussion focuses on.

Another growing source of data for social science research is administrative and business records. These are distinct from the administrative and archival data I talked about earlier. Routine interactions between individuals and government agencies and businesses now generate enormous amounts of data on a daily basis; I'll give you some examples in just a moment. The resulting records differ from the administrative microdata discussed earlier in that they are the byproduct of other interactions. The administrative and archival microdata we talked about earlier were produced deliberately for record-keeping purposes by bureaucracies seeking to monitor or control populations. Here, by contrast, the material is simply generated as a byproduct of various kinds of interactions between individuals and government agencies and businesses. We generally refer to these data as digital traces, that is, the traces left behind as we interact in an online or electronic world. When we link these data to more detailed data on individuals, whether from administrative or archival databases or from surveys, they become especially useful. Let me give you some examples. First, visits to websites. Every day, most of us probably visit many websites on our phones, our computers, and so forth, moving from one website to another. Obviously, there is a lot of commercial activity that involves harvesting data on our patterns of visiting websites and then, ideally from the point of view of marketers, linking that to other information about potential consumers to help market products to them.
Again, all of this comes from what we call digital traces, the tiny records left behind when we go from one website to another. Mobile phone usage is another example. Whenever your phone is on, you leave a trail behind: when you move from one cell tower to another, make a call, or send data, you generate records that remain on the computers and servers of various service providers. Those data can be analyzed, and in some cases already are being analyzed, for all sorts of purposes. For example, people can study highway traffic by looking at mobile phones moving along the highway and switching from one tower to another. Another example is the utilization of government services. When people interact with a bureaucracy, whether in healthcare, education, or elsewhere, that again leaves behind various traces that can be collected, analyzed, and linked to other data. And finally, of course, there are commercial transactions. When you make a purchase, especially with a credit card or a debit card, or when you use a membership card, those data can all be linked together, and perhaps linked to other data about you, to construct a profile of your purchasing habits. Such data might obviously be very useful to marketers, but they are sometimes also used in academic research. Overall, there are a lot of exciting new developments in the world of data for social science research, many of them related to the rise of our online world. Over the next few decades, there will be many opportunities for you to make use of these data in conducting social science research.