Technical Tip
Here is a Technical Tip from Dr. Ron Cody, one of our busiest presenters. Ron was a Professor of Biostatistics at the Robert Wood Johnson Medical School in New Jersey for 26 years and is now a private consultant and writer. He has been a SAS user since the late 1970s and is the author of several books with SAS Press. In Burlington, Ron will teach two pre-conference training classes and present a paper.
A Novel Way to Count Values
by Ron Cody
Let's look at a common task. You have a data set containing the responses to five questions (variables Q1-Q5). You would like to count the number of Y's and N's (ignoring case) for each subject in the questionnaire. A traditional approach to this problem is to create an array of the five variables, initialize two counters to zero (one to count the Y's and one to count the N's) and use a DO loop to cycle through the five questions, incrementing the appropriate counter depending on whether the response is a Y or an N. The program below demonstrates this approach:
*Data set Questionnaire containing Y and N responses to 5 questions;
data Questionnaire;
input Subj (Q1-Q5)(: $1.);
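array Q[5];                /* Q[1]-Q[5] refer to Q1-Q5 */
Num_Y = 0;                 /* reset both counters for each subject */
Num_N = 0;
do i = 1 to 5;
   if upcase(Q[i]) eq 'Y' then Num_Y + 1;
   else if upcase(Q[i]) eq 'N' then Num_N + 1;
end;
drop i;                    /* the loop index is not needed in the output */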
datalines;
1 y y n n y
2 N N n Y
3 N n n n n
4 y Y n N y
5 y y y y y
;
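Here is a listing of the resulting data set. Note that the data line for subject 2 contains only four responses, so list input reads Q5 from the start of the next line (the 3), and subject 3 never appears as its own observation: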
Subj | Q1 | Q2 | Q3 | Q4 | Q5 | Num_Y | Num_N |
1 | y | y | n | n | y | 3 | 2 |
2 | n | n | n | y | 3 | 1 | 3 |
4 | y | y | n | n | y | 3 | 2 |
5 | y | y | y | y | y | 5 | 0 |
This is a simple, straightforward program. However, using the COUNTC and CATS functions, you can greatly simplify the solution as shown next:
*Solution using the COUNTC and CATS functions;
data Questionnaire;
input Subj (Q1-Q5)(: $1.);
Num_Y = countc(cats(of Q1-Q5),'Y','i');
Num_N = countc(cats(of Q1-Q5),'N','i');
datalines;
1 y y n n y
2 N N n Y
3 N n n n n
4 y Y n N y
5 y y y y y
;
The CATS function (I like to pronounce it Cat-S) concatenates its arguments after stripping the leading and trailing blanks from each one (the "S" stands for strip). You now have a single string of Y's and N's. The first argument to the COUNTC function is the string you want to examine. The second argument is the character (or list of characters) you want to count. The third (optional) argument is a modifier; the 'i' modifier tells COUNTC to ignore case. The result is a very quick and easy way to count the number of Y's and N's in the five questions.
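To see what the two functions are doing, it can help to look at the intermediate string for a single subject. The short step below is only an illustrative sketch: the variable String and the hard-coded responses (the same as subject 4 in the data above) are there for demonstration and are not part of the original program.

```sas
*Sketch: display the string that CATS builds for one subject;
data _null_;
   length Q1-Q5 $ 1;
   Q1 = 'y'; Q2 = 'Y'; Q3 = 'n'; Q4 = 'N'; Q5 = 'y';
   String = cats(of Q1-Q5);            /* String is yYnNy */
   Num_Y  = countc(String,'Y','i');    /* counts y and Y */
   Num_N  = countc(String,'N','i');    /* counts n and N */
   put String= Num_Y= Num_N=;          /* writes the values to the log */
run;
```

The log line should read String=yYnNy Num_Y=3 Num_N=2, which matches the counts the DO loop approach gives for the same five responses.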
Using the CATS function allows you to combine responses from multiple variables and convert them to a single string.
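Because the second argument of COUNTC can list more than one character, the same single string can also answer a slightly different question, such as how many of the five questions received either a Y or an N. The following is a minimal sketch under assumed inputs: the names Responses and Num_YN, the blank response, and the stray 'x' response are all hypothetical and are not in the original tip.

```sas
*Sketch: count responses that are either Y or N, ignoring case;
data _null_;
   length Q1-Q5 $ 1;
   Q1 = 'y'; Q2 = 'N'; Q3 = 'x'; Q4 = ' '; Q5 = 'n';
   Responses = cats(of Q1-Q5);              /* the blank Q4 is stripped: yNxn */
   Num_YN    = countc(Responses,'YN','i');  /* counts y, N and n but not x */
   put Responses= Num_YN=;
run;
```

Here the log should show Responses=yNxn Num_YN=3. Whether a combined count like this is useful depends on the questionnaire; it is shown only to illustrate how the single string can be reused.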
This tip is one of many that can be found in my latest book, Cody's Collection of Popular Programming Tasks and How to Tackle Them.