Parameter January 2009

Newsletter of the Chicago Chapter of the American Statistical Association

Volume 53 Number 5

January 2009

In this issue

January Luncheon

Chicago Chapter ASA Workshop Announcement

Did you know?

Interesting Article...

A Letter from the Editor - Family Dinner Conversation

Editor

Happy New Year!

The CCASA wishes you a successful 2009!

January Luncheon

Luncheon Announcement
Noon to 1:30PM
TUESDAY January 20th, 2009
The East Bank Club
500 N Kingsbury, Chicago 60610

Please join us for another exciting talk in the CCASA's 2008-2009 Luncheon program!

Our January speakers are Max Berniker and Ian Stevenson, who specialize in physical medicine and rehabilitation at Northwestern University. Their talk is entitled "People are very good at statistics - when they do not think they are actually doing statistics."

Abstract:
The world is complex and variable, and our perception of it is noisy. In a recent study we have formalized motor adaptation as the process of optimally inferring changes in the world and our bodies given our observed motor errors. In the first part of our talk, we will present results that demonstrate our approach. This model makes predictions that are consistent with a wide range of experimental data from numerous research groups. What's more, this approach offers a principled explanation for motor adaptation and generalization as the result of an inference strategy for the nervous system.

In the second half of our talk, we will briefly discuss how Bayesian statistics can be used to understand how the brain "works". Neurons communicate with each other using spikes of electrochemical activity (essentially point processes). Recently, we have been using Generalized Linear Models with regularization to understand what causes a neuron to spike and how neurons interact with one another. Inferential statistics is becoming an increasingly important tool in understanding these complex, high-dimensional systems.

The February luncheon will be held on February 17, 2009, and the speaker will be Krishnan Saranthan from United Airlines. Dr. Saranthan will present a talk on the use of statistical modeling in airline operations.

Plans for our future luncheons will be included in our upcoming announcements and in the Parameter. Lunch is $30 for CCASA members, $35 for nonmembers. Nonmembers, join the chapter for a year for only $15 and get the discount plus all the other benefits of membership!
As usual, the Lucille Derrick Fund will purchase a limited number of tickets for students who wish to attend. If you are a student and would like to take advantage of this offer, please register online below, and contact Lou Fogg, expressing your interest.

Click here to register online

For any questions or concerns, please contact:
Lou Fogg, VP for Luncheons
Phone: 312-942-6239 or E-mail: louis_fogg@rush.edu

Please Spread the Word!

Chicago Chapter ASA Workshop Announcement

Short Course on Longitudinal Data Analysis
Presenter:
Don Hedeker, University of Illinois-Chicago
Sponsored by the Chicago Chapter of the American Statistical Association.

Date: Friday, March 20th, 2009

Location: The University of Chicago's Booth School of Business

The Gleacher Center
450 North Cityfront Plaza Drive
Chicago, Illinois 60611-4316

Time: 8:30am-5pm

Course summary
The course will provide an introduction to longitudinal data analysis using mixed-effects regression models. It draws on material from the book Longitudinal Data Analysis (Wiley, 2006) and uses the new SuperMix statistical software program (a 15-day trial edition of SuperMix is available at www.ssicentral.com/supermix/index.html). The focus will be on applying these models, with each application illustrated in SuperMix.

In particular, the basic mixed-effects regression model for continuous outcomes will be introduced and described, including use of polynomials for expressing change across time, the multilevel representation of the mixed model, treatment of time-invariant and time-varying covariates, and modeling of the variance-covariance structure of the repeated measures.
It will be shown how these models can allow for missing data across time in terms of the outcome variable, thus permitting analysis of subjects who have incomplete data across time.
Finally, because categorical outcomes are common in many research areas, description and application of mixed-effects logistic regression models will also be covered.
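
For readers new to these models, here is a minimal sketch of the kind of model the summary describes. The notation is assumed for illustration only and is not taken from the course materials or the book: y_ij is the outcome for subject i at occasion j, t_ij is time, x_i is a hypothetical time-invariant covariate, and the upsilon terms are subject-specific random effects.

```latex
% Illustrative notation only (assumed, not from the course materials).
\begin{align*}
% Level 1 (within subject): a linear trend across time
y_{ij} &= b_{0i} + b_{1i}\, t_{ij} + \varepsilon_{ij},
  & \varepsilon_{ij} &\sim N(0, \sigma^2) \\
% Level 2 (between subjects): intercept and slope vary by subject
b_{0i} &= \beta_0 + \beta_2\, x_i + \upsilon_{0i} \\
b_{1i} &= \beta_1 + \beta_3\, x_i + \upsilon_{1i},
  & (\upsilon_{0i}, \upsilon_{1i})^\top &\sim N(\mathbf{0}, \Sigma_\upsilon) \\
% Combined (single-equation) form, obtained by substituting level 2 into level 1
y_{ij} &= \beta_0 + \beta_1 t_{ij} + \beta_2 x_i + \beta_3 x_i t_{ij}
          + \upsilon_{0i} + \upsilon_{1i} t_{ij} + \varepsilon_{ij} \\
% Mixed-effects logistic analogue for a binary outcome
\log \frac{P(y_{ij} = 1)}{1 - P(y_{ij} = 1)}
  &= \beta_0 + \beta_1 t_{ij} + \beta_2 x_i + \beta_3 x_i t_{ij}
     + \upsilon_{0i} + \upsilon_{1i} t_{ij}
\end{align*}
```

In this sketch the index j runs only over the occasions actually observed for subject i, which is one way to see why subjects with incomplete data can still be included. A polynomial trend would add terms such as t_ij squared at level 1.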

Attendees are encouraged to download the trial software onto their laptops prior to the course, and to bring their laptops with them to the course.

Registration information to follow.

Did you know?

As I'm sure many of you know, the use of R as the statistical tool of choice has grown dramatically in the past couple of years. The New York Times had an interesting piece in the business section last Wednesday touting the power of R. The article is below for your perusal.

Data Analysts Captivated by R's Power

By Ashlee Vance
Published: January 6, 2009
New York Times

To some people R is just the 18th letter of the alphabet. To others, it's the rating on racy movies, a measure of an attic's insulation or what pirates in movies say.

R is also the name of a popular programming language used by a growing number of data analysts inside corporations and academia. It is becoming their lingua franca partly because data mining has entered a golden age, whether being used to set ad prices, find new drugs more quickly or fine-tune financial models. Companies as diverse as Google, Pfizer, Merck, Bank of America, the InterContinental Hotels Group and Shell use it.

But R has also quickly found a following because statisticians, engineers and scientists without computer programming skills find it easy to use.

"R is really important to the point that it's hard to overvalue it," said Daryl Pregibon, a research scientist at Google, which uses the software widely. "It allows statisticians to do very intricate and complicated analyses without knowing the blood and guts of computing systems."

It is also free. R is an open-source program, and its popularity reflects a shift in the type of software used inside corporations. Open-source software is free for anyone to use and modify. I.B.M., Hewlett-Packard and Dell make billions of dollars a year selling servers that run the open-source Linux operating system, which competes with Windows from Microsoft. Most Web sites are displayed using an open-source application called Apache, and companies increasingly rely on the open-source MySQL database to store their critical information. Many people view the end results of all this technology via the Firefox Web browser, also open-source software.

R is similar to other programming languages, like C, Java and Perl, in that it helps people perform a wide variety of computing tasks by giving them access to various commands. For statisticians, however, R is particularly useful because it contains a number of built-in mechanisms for organizing data, running calculations on the information and creating graphical representations of data sets.

Some people familiar with R describe it as a supercharged version of Microsoft's Excel spreadsheet software that can help illuminate data trends more clearly than is possible by entering information into rows and columns.

What makes R so useful - and helps explain its quick acceptance - is that statisticians, engineers and scientists can improve the software's code or write variations for specific tasks. Packages written for R add advanced algorithms, colored and textured graphs and mining techniques to dig deeper into databases.

Close to 1,600 different packages reside on just one of the many Web sites devoted to R, and the number of packages has grown exponentially. One package, called BiodiversityR, offers a graphical interface aimed at making calculations of environmental trends easier.

Another package, called Emu, analyzes speech patterns, while GenABEL is used to study the human genome.

The financial services community has demonstrated a particular affinity for R; dozens of packages exist for derivatives analysis alone.

"The great beauty of R is that you can modify it to do all sorts of things," said Hal Varian, chief economist at Google. "And you have a lot of prepackaged stuff that's already available, so you're standing on the shoulders of giants."

R first appeared in 1996, when the statistics professors Ross Ihaka and Robert Gentleman of the University of Auckland in New Zealand released the code as a free software package.

According to them, the notion of devising something like R sprang up during a hallway conversation. They both wanted technology better suited for their statistics students, who needed to analyze data and produce graphical models of the information. Most comparable software had been designed by computer scientists and proved hard to use.

Lacking deep computer science training, the professors considered their coding efforts more of an academic game than anything else. Nonetheless, starting in about 1991, they worked on R full time. "We were pretty much inseparable for five or six years," Mr. Gentleman said. "One person would do the typing and one person would do the thinking."

Some statisticians who took an early look at the software considered it rough around the edges. But despite its shortcomings, R immediately gained a following with people who saw the possibilities in customizing the free software.

John M. Chambers, a former Bell Labs researcher who is now a consulting professor of statistics at Stanford University, was an early champion. At Bell Labs, Mr. Chambers had helped develop S, another statistics software project, which was meant to give researchers of all stripes an accessible data analysis tool. It was, however, not an open-source project.

The software failed to generate broad interest and ultimately the rights to S ended up in the hands of Tibco Software. Now R is surpassing what Mr. Chambers had imagined possible with S.

"The diversity and excitement around what all of these people are doing is great," Mr. Chambers said.

While it is difficult to calculate exactly how many people use R, those most familiar with the software estimate that close to 250,000 people work with it regularly. The popularity of R at universities could threaten SAS Institute, the privately held business software company that specializes in data analysis software. SAS, with more than $2 billion in annual revenue, has been the preferred tool of scholars and corporate managers.

"R has really become the second language for people coming out of grad school now, and there's an amazing amount of code being written for it," said Max Kuhn, associate director of nonclinical statistics at Pfizer. "You can look on the SAS message boards and see there is a proportional downturn in traffic."

SAS says it has noticed R's rising popularity at universities, despite educational discounts on its own software, but it dismisses the technology as being of interest to a limited set of people working on very hard tasks.

"I think it addresses a niche market for high-end data analysts that want free, readily available code," said Anne H. Milley, director of technology product marketing at SAS. She adds, "We have customers who build engines for aircraft. I am happy they are not using freeware when I get on a jet."

But while SAS plays down R's corporate appeal, companies like Google and Pfizer say they use the software for just about anything they can. Google, for example, taps R for help understanding trends in ad pricing and for illuminating patterns in the search data it collects. Pfizer has created customized packages for R to let its scientists manipulate their own data during nonclinical drug studies rather than send the information off to a statistician.

The co-creators of R express satisfaction that such companies profit from the fruits of their labor and that of hundreds of volunteers.

Mr. Ihaka continues to teach statistics at the University of Auckland and wants to create more advanced software. Mr. Gentleman is applying R-based software, called Bioconductor, in work he is doing on computational biology at the Fred Hutchinson Cancer Research Center in Seattle.

"R is a real demonstration of the power of collaboration, and I don't think you could construct something like this any other way," Mr. Ihaka said. "We could have chosen to be commercial, and we would have sold five copies of the software."

Interesting Article...

Here's another piece highlighting the importance of improving our math and science education to keep our country competitive. Thomas Friedman wrote in the New York Times Opinion Page on Sunday expressing his views on the upcoming $1 trillion economic stimulus package. It's an interesting read and once again emphasizes the critical need for quantitative professionals.

An excerpt:

"You see, even before the current financial crisis, we were already in a deep competitive hole - a long period in which too many people were making money from money, or money from flipping houses or hamburgers, and too few people were making money by making new stuff, with hard-earned science, math, biology and engineering skills.

The financial crisis just made the hole deeper, which is why our stimulus needs to be both big and smart, both financially and educationally stimulating. It needs to be able to produce not only more shovel-ready jobs and shovel-ready workers, but more Google-ready jobs and Windows-ready and knowledge-ready workers."

Click here to read the article in its entirety.

A Letter from the Editor - Family Dinner Conversation

I recently posted this on my blog and wanted to share it with you as well. Please share your comments. I am interested to hear your feedback!

Just a few nights ago, during our family dinner, the topic of the eighth-grade social order arose. It seems my 13-year-old twins, Jay and Becky, have a pretty clear understanding of where they (and all of their classmates) stand in the pecking order.

Becky said she resides somewhere in the middle - Abercrombie jeans and Ugg boots are clear plusses, but being in the advanced math group lowers her overall score. Jay said he's on "the lower end" and Becky did nothing to dispute this or buttress her twin.

Jay has been known as a "math geek" since second grade. He is a terrible dresser, combs his hair once a month (whether it needs it or not), plays competitive chess and piano for the jazz band, is two grades ahead in math (where he is the top student, definitely a social blunder), and programs his calculator for fun. Apparently all that's keeping him from plummeting to the very bottom of the social heap are decent soccer skills and some talent in track.

Fortunately, I was armed with information from a timely Wall Street Journal piece called Doing the Math to Find Good Jobs, published the very day of our family discussion. The Journal reported that the best job in the U.S. is (drum roll, please) mathematician! In even more good news, two closely related fields came in second and third - actuary and statistician. These standings are based in part on favorable working conditions - an indoor environment free of toxic fumes, with no heavy lifting required. The quantitative sciences also score high in terms of pay, low stress levels (really?) and a good work-life balance.

I was able to reassure my "math geek" son that though it may seem like he's on the bottom social rung of eighth grade, with hard work and a little luck, his skills and talents will give him a quick elevator ride to the top of the job stratum as an adult. As Bill Gates once said: "Be nice to nerds. Chances are you'll end up working for one." He should know.

Let this reassure you as well, my analytical friends, and revel in your career choice!

My best wishes to you and yours for a healthy and prosperous 2009.

Linda

Editor

Editor: Linda Burtch (312) 629-2400

PARAMETER, newsletter of the Chicago Chapter of the American Statistical Association, is published 10 times a year as a service to its members. To submit material for publication, contact the Editor, Linda Burtch, email: lburtch@smithhanley.com

PARAMETER provides a job listing service by publishing Positions Available and Positions Wanted, the latter being free to Chapter members. Companies may list positions for $75. Contact the Editor for more information.

For additional information about Chicago Chapter ASA, please visit us on the web at www.ChicagoASA.org. Also, visit the national ASA web site at www.amstat.org.

Email change of address to: suzanne.niemi@walgreens.com

email: newsletter@chicagoasa.org

web: http://www.chicagoasa.org