One of the later steps in data management is evaluating whether you might want to create secondary variables. Secondary variables are variables that combine information from two or more primary variables. We can create them by applying a mathematical or logical operation to two or more variables. In this case, we want to know the number of cigarettes smoked per month. We know the number of days smoked each month from our new variable, USFREQMO, and S3AQ3C1 gives us an estimate of the quantity smoked per day. If we want to estimate the total number of cigarettes that participants smoked per month, it makes sense to multiply these two variables and get a product that represents the number of cigarettes per day times the number of days smoked per month. Let's call this new variable NUMCIGMO_EST, which stands for the estimate of the number of cigarettes smoked per month. So the syntax would be NUMCIGMO_EST = USFREQMO*S3AQ3C1;. If we add this line of code to the program and ask SAS to show us the frequency distribution, we see that NUMCIGMO_EST is a quantitative variable that ranges from 1 to 2,940, with 9 missing observations.
So how can we check to make sure that this new secondary variable was created as we intended? For this, we're going to learn a new SAS procedure. The procedure, or proc, is called PRINT, and it allows us to view data for each observation individually, for whichever variables we'd like to see. So after the data step and after the PROC SORT statement, in the section of the program where we ask SAS for specific results or output, we're going to add the following syntax: PROC PRINT; VAR, indicating variable, followed by a list of the variables that you would like to examine for each observation or individual. We want to see whether the new variable, number of cigarettes smoked per month, is indeed the product of frequency per month and number of cigarettes per day.
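As a rough sketch of the two steps described so far, the following program creates the product variable and then prints it next to its two source variables. The data set name smoke_demo, the ID variable, and the four rows of values are made up for illustration; they are not taken from the course data, and a real program would read the full data set rather than DATALINES.

```sas
/* Toy data: hypothetical values, not actual survey records */
DATA smoke_demo;
    INPUT ID USFREQMO S3AQ3C1;
    /* Secondary variable: days smoked per month times cigarettes per day */
    NUMCIGMO_EST = USFREQMO * S3AQ3C1;
    DATALINES;
1 30 20
2 4 3
3 1 1
4 . 10
;
RUN;

PROC SORT DATA=smoke_demo;
    BY ID;
RUN;

/* One row per observation, limited to the variables named in VAR */
PROC PRINT DATA=smoke_demo;
    VAR USFREQMO S3AQ3C1 NUMCIGMO_EST;
RUN;

/* Frequency distribution of the new secondary variable */
PROC FREQ DATA=smoke_demo;
    TABLES NUMCIGMO_EST;
RUN;
```

In this sketch, the fourth row has a missing value for USFREQMO, so its product is missing as well, which is one way missing observations can show up in the new variable.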
We include the following variables: USFREQMO, S3AQ3C1, and the new secondary variable NUMCIGMO_EST. As always, end the statement with a semicolon, and after saving and running the program, first check the log for errors, and then view the results. For the print procedure, the output looks a bit different. The rows represent individual observations, similar to the data set itself, and the columns show the values for the specific variables. Looking over these values for a handful of individuals, we can see that our new secondary variable does indeed show the number of days smoked per month times the number of cigarettes smoked per day. Remember that whenever you're conducting data management, it's important to find a way to check for errors at each step of the process.
>> So what if you want to combine more than two variables? A good example of this would be creating a single secondary variable to characterize ethnicity from a number of separate primary variables available in the Add Health data.
>> In Add Health, race and ethnicity are measured by a series of questions coded 1 if yes and 0 if no. Since adolescents in this sample could have indicated more than one race or ethnicity, we could decide to characterize those adolescents who indicate multiple racial or ethnic groups separately from those who would be characterized with a single ethnicity. To accomplish this, the first thing to do is sum the variables to get a new variable, which we'll call NUMETHNIC, that indicates the number of race or ethnicity variables that were endorsed. The syntax would include the new variable's name, =SUM(OF, and then each of the variables listed with only a space between them, followed by a closing parenthesis and a semicolon. Next, we add logic statements that create a single secondary variable that characterizes each adolescent's ethnicity.
So IF NUMETHNIC GE 2 THEN the final ethnicity variable, which is called ETHNICITY, is equal to 1, indicating multiple racial or ethnic groups endorsed. ELSE IF H1GI4=1 THEN ETHNICITY=2; which means only Hispanic or Latino ethnicity was endorsed. ELSE IF H1GI6A=1 THEN ETHNICITY=3; only Black or African American ethnicity was endorsed. ELSE IF H1GI6B=1 THEN ETHNICITY=4; only American Indian or Native American. ELSE IF H1GI6C=1 THEN ETHNICITY=5; Asian or Pacific Islander only. ELSE IF H1GI6D=1 THEN ETHNICITY=6; for White ethnicity only.
When we add the syntax for data management of ethnicity, request a print as well as frequency tables for the primary and secondary ethnicity variables, and save and run the program, here's what we get. We have our print output, which shows the values of the primary and secondary ethnicity variables for each observation, in this case each adolescent. At the end of the results, we have frequency tables for each of these variables. This program is available as text below the screen. We suggest you copy this text, paste it into a SAS program window, and select Run. Examine the output and consider how this kind of data management might be useful in answering your question with your data set. This example should help you make the needed connections between data management decisions and syntax.
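The ethnicity steps narrated above might be put together along the following lines. This is only a sketch on made-up data: the data set name ethnic_demo, the ID variable, and the seven rows are hypothetical, and the sketch assumes the five primary variables contain only 0 and 1 (any other codes in the real data would need to be handled before summing).

```sas
/* Toy data: hypothetical 0/1 responses, not actual Add Health records */
DATA ethnic_demo;
    INPUT ID H1GI4 H1GI6A H1GI6B H1GI6C H1GI6D;

    /* Number of race or ethnicity questions answered yes */
    NUMETHNIC = SUM(OF H1GI4 H1GI6A H1GI6B H1GI6C H1GI6D);

    /* Collapse the five primary variables into one secondary variable */
    IF NUMETHNIC GE 2 THEN ETHNICITY = 1;      /* multiple groups endorsed */
    ELSE IF H1GI4 = 1 THEN ETHNICITY = 2;      /* Hispanic or Latino only */
    ELSE IF H1GI6A = 1 THEN ETHNICITY = 3;     /* Black or African American only */
    ELSE IF H1GI6B = 1 THEN ETHNICITY = 4;     /* American Indian or Native American only */
    ELSE IF H1GI6C = 1 THEN ETHNICITY = 5;     /* Asian or Pacific Islander only */
    ELSE IF H1GI6D = 1 THEN ETHNICITY = 6;     /* White only */
    /* If nothing was endorsed, no branch applies and ETHNICITY stays missing */

    DATALINES;
1 1 1 0 0 0
2 1 0 0 0 0
3 0 1 0 0 0
4 0 0 1 0 0
5 0 0 0 1 0
6 0 0 0 0 1
7 0 0 0 0 0
;
RUN;

/* Check the result row by row against the primary variables */
PROC PRINT DATA=ethnic_demo;
    VAR H1GI4 H1GI6A H1GI6B H1GI6C H1GI6D NUMETHNIC ETHNICITY;
RUN;

/* Frequency tables for the primary and secondary ethnicity variables */
PROC FREQ DATA=ethnic_demo;
    TABLES H1GI4 H1GI6A H1GI6B H1GI6C H1GI6D NUMETHNIC ETHNICITY;
RUN;
```

Because the NUMETHNIC GE 2 test comes first, each later ELSE IF branch is reached only when a single group was endorsed, which is why the order of the statements matters.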