News from NorthEast SAS Users Group

September 2010

News and Technical Information

From the NorthEast SAS Users Group

In This Issue

Conference highlights

Calling all volunteers

Fantastic presentations

Technical Tip

Get the latest info

From our Conference Chair

The NESUG team has been rather quiet in public lately as we've been busy preparing for this year's conference. There's still lots to do but things are coming together nicely. We have a great line-up of formal and informal presentations and discussions, posters, workshops and social activities planned for Sunday through Wednesday, November 14 - 17. There should be something for every SAS user, whether you're new to the world of SAS or have been using it since the beginning of SAS, whether this is your first or your twenty-third NESUG.

I know that around this time of year, I get a bit of a back-to-school feeling and I'm especially hungry to learn something new. NESUG has satisfied that need for me for many years. While I may be too busy in Baltimore to pick up new SAS techniques myself this year, I hope that you will be able to participate in NESUG and take away new techniques to apply to your work.

Read on to learn about the very busy days of NESUG content we're planning. Check www.nesug.org for updates and schedule details as they become available. A detailed schedule of which presentations are happening at what time should be available early in October.

And remember that there will be some time for you to enjoy Baltimore outside of the conference hotel. The Harborplace area is great. There's some tourist information on the NESUG website, or you can go to the Visit Baltimore website to learn more. Also, I know that many of our attendees know the Baltimore Harbor better than I do. Do you have restaurant or entertainment recommendations in the area for your fellow NESUGers? If so, please share them on our Facebook page.

See you in Baltimore!

Lisa Eckler
NESUG 2010 Conference Chair

Conference highlights

We'll start out on Sunday with some optional, extra-fee events: three half-day workshops plus an afternoon walking tour to Mount Vernon.

From then on, the presentations and activities are open to all registered attendees at no extra cost:

There's a Warm-Up Session on Sunday afternoon which is a great start for those who haven't been to NESUG before or haven't been for a while. This session will help you get oriented to what's happening during the conference and how to get the most benefit out of your next three days.
The Solutions Center will open on Sunday afternoon and will also be open on Monday, Tuesday, and Wednesday morning.
The Opening Session will include dinner and a keynote address by Dave Thomas of New Marketing Labs, "Social Media: What SAS Pros Need to Know."
Breakout presentations, posters and workshops, Code Clinic, Collaboration Cafe and access to SAS folks in the Solutions Center will happen on Monday and Tuesday and until noon on Wednesday.
During the lunch break on Monday, join us for "Spotlight on SAS" to hear what's new and exciting at SAS.
We'll have sign-up sheets so that you can connect with like-minded attendees to have dinner on Monday at area restaurants if you're looking for company. (Dinner will be at your own expense.)
The NESUG Game Show will be back -- by popular demand -- on Monday evening.
On Tuesday evening, the NESUG Coffeehouse will provide an opportunity to relax with your fellow attendees and reflect on what you've learned so far.
At the Closing Session, held at noon on Wednesday, you can hear about plans for NESUG 2011.

Remember:

You can take advantage of an Early Registration Discount for NESUG 2010 until September 25. Look for more information or register now.
Rooms at the Renaissance Harborplace will be available at a discounted rate if you reserve by October 14. Click here to reserve.

Calling all volunteers

NESUG is an all-volunteer organization so we always need your help.
We particularly need people to help with various operations tasks. If you are arriving early or are already in Baltimore, come over on Saturday and help us with the "Bagging Party." This is when we take all of our registration materials and stuff them into the tote bags that you get when you register. It's not rocket science but it's fun. Join the crowd, meet fellow attendees, and help us out.

Click here for more volunteer info and to sign up.

Fantastic presentations

Here are just a few of the great presentations we have scheduled:

To FREQ, Perchance to MEANS
Christopher J. Bost, MDRC

Put Off by Writing Tedious PUT Statements? Automate It! Let SAS® Do It for You
Dan Bretheim, Towers Watson

SAS Newbies! You Need to Learn SAS Graph! A Step by Step Guide to Stunning Bars, Charts and Pies
Jonathan Bartlett and George Obsekov, American College of Radiology

Introduction to Statistical Graphics Procedures
Selvaratnam Sridharma, Census Bureau

SAS® Code and Macros: How They Interact
Bruce F. Gilsen, Federal Reserve Board

Compact Approaches to Working with and Reporting on Missing Data
John H. King, Ouachita Clinical Data Services, Inc.
Mike S. Zdeb, U@Albany School of Public Health

How to Think through the SAS® DATA Step
Ian Whitlock, Whitlock Consulting

Click here for a full list of the nearly 200 presentations and workshops.

Technical Tip
Using Hash Tables to Flag Duplicates

This issue's technical tip comes from Ed Heaton. Ed is a Senior Consultant with our Newsletter advertiser, Data and Analytic Solutions.

We often need to think about duplicate keys in our data. This example will show you the most common way to identify duplicate keys and then another much more efficient way using hash tables.

Consider the National Drug Code (NDC). This is a composite key consisting of the Labeler Code, the Product Code, and the Package Code. Let's go through our dataset and flag all rows that have a duplicate key so we can then decide which we want to keep and what we want to do with the others. Or maybe we just want to find the median of some of the values in the duplicated rows and set an imputation flag. We have lots of options once we know which rows are duplicates.
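To make this concrete, here is a tiny made-up dataset that the code in this tip can be run against. The three key variables use the names that appear in the tip's code. The code values and the Price variable are invented for illustration only.

```sas
/* Hypothetical sample data: the codes and the Price variable are invented. */
Data DataWithPossibleDups ;
    Input LblCode :$5. ProdCode :$4. PkgCode :$2. Price ;
    Datalines ;
00001 0101 01 10.50
00002 0202 01 7.25
00001 0101 01 11.00
00003 0303 02 4.00
00002 0202 02 7.50
;
Run ;
```

In this sample, the first and third rows share the same three-part key. The second and fifth rows differ in PkgCode, so they are not duplicates of each other.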

Typically, we code something like the following to flag duplicate keys.

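Proc sort data=DataWithPossibleDups out=_data_ ;
    By LblCode ProdCode PkgCode ;
Run ;

Data DataWithFlaggedDups ;
 /* _last_ refers to the dataset that Proc sort just created. */
    Set _last_ ;
    By LblCode ProdCode PkgCode ;
 /* A key is unique only if its row is both the first and the last
    row of its BY group. */
    If not ( first.PkgCode and last.PkgCode ) then flg_dupNDC = 1 ;
Run ;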

While this code is short and easy to understand, there are two problems:

The sort runs in O(n log n) time. That is, the run time grows faster than the number of rows we are sorting: doubling the rows more than doubles the run time.
The order of our output dataset is different from the order of our input dataset. If we have to restore the order of our output dataset, we have another expensive sort. If the input dataset does not have the variables needed to restore the order, we will have to run the data through a DATA step first to create a sorting variable.
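The second problem can be sketched as follows. This is only an illustration of the overhead described above, not part of Ed's tip: the _seq variable and the WithSeq and SortedFlagged dataset names are made up here.

```sas
/* Sketch only: _seq, WithSeq and SortedFlagged are hypothetical names. */
Data WithSeq ;
    Set DataWithPossibleDups ;
    _seq = _n_ ;              /* remember each row's original position */
Run ;

Proc sort data=WithSeq ;
    By LblCode ProdCode PkgCode ;
Run ;

Data SortedFlagged ;
    Set WithSeq ;
    By LblCode ProdCode PkgCode ;
    If not ( first.PkgCode and last.PkgCode ) then flg_dupNDC = 1 ;
Run ;

/* Second sort: put the rows back in their original order. */
Proc sort data=SortedFlagged out=DataWithFlaggedDups ;
    By _seq ;
Run ;
```

That is one extra DATA step and one extra sort just to get back to where we started. The helper variable _seq stays in the output until it is dropped.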

Here's another way. We can use hash tables. These are direct-lookup tables in memory that have methods to work with the tables. We will use the add() and check() methods. Yes, the code is longer and more complicated, but it makes just two passes through the data, so the run time grows linearly with the number of rows, that is, O(n), provided the keys fit in memory. And we have not changed the order of our data.

Data DataWithFlaggedDups ;
 /* In order to set the flg_dupNDC flag, we need two hash
    tables: one to check for duplicates and one to store the
    duplicate keys we find. */
    If 0 then set DataWithPossibleDups ;
    Declare hash distinctKeys() ;
    distinctKeys.defineKey( "LblCode" , "ProdCode" , "PkgCode" ) ;
    distinctKeys.defineDone() ;
    Declare hash dupKeys() ;
    dupKeys.defineKey( "LblCode" , "ProdCode" , "PkgCode" ) ;
    dupKeys.defineDone() ;
 /* Now we need to load these two hash tables. */
    Do i = 1 to nObs ;
     /* Walk through our data file using the point= option of the
        SET statement. */
        Set DataWithPossibleDups nObs=nObs point=i ;
     /* Load the key in the distinctKeys hash table and, if the key was
        already there (nonzero return code), load it in the dupKeys
        hash table. */
        If ( distinctKeys.add() ne 0 ) then rc = dupKeys.add() ;
        Drop rc ;
    End ;
    Do i = 1 to nObs ;
     /* Walk through our data file again, using the point= option of
        the SET statement. */
        Set DataWithPossibleDups nObs=nObs point=i ;
     /* Set flg_dupNDC to 1 if the key is in the dupKeys hash object
        (check() returns 0 when the key is found). Since we have never
        returned to the top of the DATA step, we will need to
        explicitly set flg_dupNDC to missing if the key is not in the
        dupKeys hash table. */
        If dupKeys.check() = 0
            then flg_dupNDC = 1 ;
            Else flg_dupNDC = . ;
        Output ;
    End ;
 /* Don't forget to force a stop or you will be in an infinite loop! */
    Stop ;
Run ;
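
The step above leans on the return codes of add() and check(). Here is a minimal sketch of that convention, separate from the tip's code and using a made-up one-variable key: both methods return 0 on success and a nonzero value otherwise.

```sas
/* Sketch only: the hash h and the variable key are hypothetical. */
Data _null_ ;
    Length key $ 8 ;
    Declare hash h() ;
    h.defineKey( "key" ) ;
    h.defineDone() ;
    key = "A" ;
    rc1 = h.add() ;     /* 0: the key was not in the table, so it is added */
    rc2 = h.add() ;     /* nonzero: the key is already in the table */
    rc3 = h.check() ;   /* 0: the key is found */
    key = "B" ;
    rc4 = h.check() ;   /* nonzero: the key is not found */
    Put rc1= rc2= rc3= rc4= ;
Run ;
```

Because each return code is assigned to a variable, a failed add() does not write an error message to the log. That is why the tip's code uses rc = dupKeys.add() for keys that occur three or more times.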

Now that we have flagged the duplicate keys, we can apply other business rules to determine which rows we want to keep.
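
As one possible next step, and only as an illustration, the flagged rows could be split off for review while the rows with unique keys pass straight through. The UniqueRows and NeedsReview dataset names are made up here.

```sas
/* Sketch only: UniqueRows and NeedsReview are hypothetical names. */
Data UniqueRows NeedsReview ;
    Set DataWithFlaggedDups ;
    If flg_dupNDC = 1 then output NeedsReview ;
    Else output UniqueRows ;
Run ;
```

Both output datasets keep the original row order, since the hash approach never sorted the data.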

Get the latest information

Remember to visit the NESUG website and check the NESUG Blog and Facebook page for the latest conference information.

NESUG Blog

NESUG Facebook page

Thanks to Our Newsletter Advertisers

Kforce Clinical Research provides customized, scalable outsourcing solutions. We offer flexible clinical services for all phases and therapeutic areas of site and study management, data services, pharmacovigilance and regulatory affairs. An intelligent partner, we support our clients as they focus on their core competency, the science of creating life-changing medicines.

Visit the Kforce website

Visit the DAS website

NorthEast SAS Users Group