News from PDSS Inc.
"Leading the Future in Product Development" 
October 2010- Vol 3, Issue 10
Greetings!  
This month, Skip (President of PDSS Inc.) discovers a valuable lesson for practitioners of Critical Parameter Management (CPM) in, of all places, his son's high school Advanced Placement (AP) statistics textbook! Read on for the whole story--and the lesson!
-Carol
Re-Expression of Data--Getting it Straight! 
My son is a senior in high school this year and is taking 6 Advanced Placement (AP) courses. A heavy load for a senior, but the subjects are so good he found them hard to pass up. Much to my delight, one of the courses is AP Statistics! Those of you who know me personally know I am mildly passionate about the topic of applied statistics. His other math course is AP Calculus. I told him I would spend as much time as he wanted for any and all forms of help and discussion on his statistics and that his mother, an electrical engineer, would be handling the dy/dx stuff.

 

With that division of labor settled, I scurried off with his textbook and read it cover-to-cover in one evening (Stats: Modeling the World 2nd Ed. (AP Edition), by Bock, Velleman & DeVeaux, Pearson / Addison Wesley, 2007). I did a lot of speed reading, but looked over every single page in the book. As I made my way through, I was reminded of all the great and familiar topics in a basic statistics course - ripping good stuff indeed! But when I came to Chapter 10, I was stopped dead in my tracks. The chapter was titled "The Re-expression of Data - Getting Things Straight!" With surprise and delight, I carefully read every word. Then I ran off to my bookcase and pulled out three different university-level statistics texts (...yes I have three!). Not one of them had this material on the re-expression of data to "linearize" curved sample data so it could be easily modeled and explained by Y=mx+b. I am fully capable of ordering up a dandy Box-Cox Transform, but that is just letting Minitab have all the fun behind the screen! One should really know exactly what's going on back there.

 

In easy-to-understand terms, this high school text explained all about creating scatter plots of X input vs. Y sample output data, conducting linear regression, and fitting linear models. It showed how the Coefficient of Determination (R²) tells you how much of the variation in Y the linear model actually accounts for, and how to examine the residuals around the best-fit line for their desirable property of random scatter consistent with a normal distribution. It just pulled all this great stuff together and set the stage to answer the question of what to do with sample data that is "bent" (curved).

 

Re-expressing sample data (Y variables) to make curved scatter plots reasonably straight, and to turn residuals with undesirable "smiley" patterns into random, normally distributed scatter, is not an earth-shaking discovery. What was amazing about this chapter, in a stats book for 17-year-old Facebooking, texting "mathematicians", was its thorough coverage and integration of pretty much everything you would ever want to know about properly fitting curved data to linear models, delivered with a methodical precision and clarity that even Homer Simpson could understand!

 

I was pleased that kids in our U.S. high schools are being introduced to concepts that none of my 3 college-level statistics books contained. The following is an outline of this chapter, which also happens to be pertinent to all of us who practice Critical Parameter Management (CPM).

  1. The paradigm of how we express standard measures is put under the microscope. In the U.S. we rate fuel efficiency in miles per gallon (MPG), while in Europe they use liters per 100 kilometers. Apart from a unit conversion, these two metrics are reciprocals of one another. If you plot MPG vs. Vehicle Weight you get a curved set of data and a distinct bend in the residuals. Now you have to know how to properly regress a non-linear model to explain this relationship. The text shows how the paradigms we use to express our concepts can make the data either curved or straight depending on which variable we place in the numerator and which is in the denominator. Curved data and straight-line data can sometimes be just a consequence of how we express something such as fuel efficiency in relation to a car's weight. My son now says "Hey Dad, units matter!" It makes me smile like those doggone residuals!
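To see the "units matter" point in numbers, here is a minimal Python sketch. Everything in it is an assumption made up for illustration: the car weights, the straight-line relationship between weight and fuel consumption (expressed in liters per 100 km), and the helper name fit_line. None of it comes from the textbook. It fits a least-squares line to the same invented cars expressed both ways.

```python
# Synthetic illustration only: the weights and the linear consumption
# relationship below are invented, not data from the textbook.

def fit_line(xs, ys):
    """Least-squares line; returns (slope, intercept, r_squared, residuals)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot, residuals

weights_kg = [900, 1100, 1300, 1500, 1700, 1900, 2100, 2300]

# Assumption: consumption rises in a straight line with weight.
liters_per_100km = [2.0 + 0.005 * w for w in weights_kg]

# The same cars expressed the U.S. way (235.215 converts L/100 km to U.S. MPG).
mpg = [235.215 / c for c in liters_per_100km]

_, _, r2_consumption, _ = fit_line(weights_kg, liters_per_100km)
_, _, r2_mpg, mpg_residuals = fit_line(weights_kg, mpg)

print(f"L/100 km vs weight: R^2 = {r2_consumption:.4f}")
print(f"MPG vs weight:      R^2 = {r2_mpg:.4f}")
print("MPG residuals:", [round(r, 2) for r in mpg_residuals])
```

With these made-up numbers the consumption form is straight by construction, while the MPG form of the very same cars bends: its residuals come out positive at both ends and negative in the middle, which is the "smile".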

 

  2. Set goals! 17-year-olds, who drink orange juice right out of the bottle, are setting goals!

Goal 1: Make the distribution of the dependent variable, Y, more symmetric.

Goal 2: Make the spread of several groups of sample data more alike.

Goal 3: Make the form of a scatter plot of sample data more nearly linear.

Goal 4: Make the scatter of a scatter plot of sample data spread out evenly rather than following a fan shape.
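Goal 2 is easy to try out. The sketch below uses three invented groups of measurements (hypothetical numbers, chosen so that the spread grows with the level of the group) and compares the standard deviation of each group before and after a log re-expression. The helper names spreads and spread_ratio are mine, not the textbook's.

```python
import math
import statistics

# Invented measurements: three groups whose spread grows with their level.
groups = {
    "low": [8.0, 10.0, 12.5],
    "mid": [80.0, 100.0, 125.0],
    "high": [800.0, 1000.0, 1250.0],
}

def spreads(data, transform=lambda y: y):
    """Sample standard deviation of each group after applying transform."""
    return {
        name: statistics.stdev([transform(y) for y in ys])
        for name, ys in data.items()
    }

def spread_ratio(spread_by_group):
    """Largest group spread divided by the smallest (1.0 means identical)."""
    return max(spread_by_group.values()) / min(spread_by_group.values())

raw_spread = spreads(groups)
log_spread = spreads(groups, math.log10)

print("raw spreads:  ", raw_spread)
print("log10 spreads:", log_spread)
print("ratio raw:  ", spread_ratio(raw_spread))
print("ratio log10:", spread_ratio(log_spread))
```

Because each invented group varies by the same percentage around its own level, the raw spreads differ by a factor of 100 while the log spreads are essentially identical. Real data will rarely be this tidy.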

 

  3. The Ladder of Powers is a step-by-step progression of re-expressions (transforms) we can apply to every point in a curved data set, trying one rung after another to make the data more and more linear. If we over-do it, we can step back a rung; we stop once we recognize that we have straightened the data and randomized the residuals to a point of reasonable normality. Then all we are left to do is tell everyone who cares what the transform is that works this magic on our curved data. Some people think it is evil if you change data and don't tell them. The rungs of the Ladder of Powers are:

Rung 1: Take the Square of your curved data points

Rung 2: Take the Square Root of your curved data points

Rung 3: Take the log10 or ln (natural log) of your curved data points (it makes no difference which, since the two differ only by a constant factor)

Rung 4: Take -1/Square Root of your curved data points (the - sign preserves the direction)

Rung 5: Take -1/Y (Reciprocal) of your curved data points (the - sign preserves the direction)

Rung 6: Take -1/Square of your curved data points (the - sign preserves the direction)

Each step is followed with nice words of advice for what numerical conditions best suit this type of transform. Our texting teens are actually learning there are conditions and constraints in how one uses this unique form of numerical magic. Now my son's telling me what I can and cannot do with MY data. Sheesh.
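Walking the ladder can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are synthetic (an exact exponential curve I made up), the "raw (no change)" entry is added as a baseline and is not one of the six rungs listed above, and the names LADDER, r_squared and climb_ladder are hypothetical. The sketch re-expresses Y with each rung and reports R² of a straight-line fit against X.

```python
import math

def r_squared(xs, ys):
    """R-squared of the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return (sxy * sxy) / (sxx * syy)

# Each rung maps a positive Y value to its re-expressed value.
# The minus signs keep larger Y values larger after re-expression.
LADDER = [
    ("square", lambda y: y ** 2),
    ("raw (no change)", lambda y: y),          # baseline, not a listed rung
    ("square root", math.sqrt),
    ("log10", math.log10),
    ("-1/sqrt", lambda y: -1 / math.sqrt(y)),
    ("-1/y", lambda y: -1 / y),
    ("-1/y^2", lambda y: -1 / y ** 2),
]

def climb_ladder(xs, ys):
    """Return (rung name, R-squared) for every rung, in ladder order."""
    return [(name, r_squared(xs, [f(y) for y in ys])) for name, f in LADDER]

# Synthetic curved data: an exact exponential, so the log rung should win.
xs = list(range(1, 11))
ys = [2.0 * math.exp(0.3 * x) for x in xs]

results = climb_ladder(xs, ys)
for name, r2 in results:
    print(f"{name:16s} R^2 = {r2:.4f}")

best_name, best_r2 = max(results, key=lambda item: item[1])
print("straightest rung:", best_name)
```

Ranking rungs by R² is only a first screen. In practice you would also plot the residuals for the leading candidates before settling on a transform.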

 

  4. To top it all off, students get stern warnings on what to expect and what can go wrong and screw up all this fun:

a)   Don't expect your model to be perfect.

b)   Don't choose a model based upon R² alone!

c)   Beware of multiple modes and watch out for scatter plots that turn around!

d)   This magic doesn't work on things that have inflections, minimums or maximums as curves change direction!

e)   Watch out for negative data values!

f)    Watch out for data values far from 1!

g)   Don't stray far from the Ladder of Powers. 

I could not have given my son this advice, but his stats teacher is very cool and he's sure I would like him. I want to show his teacher my fine set of catapults!
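Two of the warnings above, the ones about negative values and about values far from 1, are easy to check numerically. The sketch below is my own illustration, not the textbook's: the function names and both data sets are hypothetical. It guards a log re-expression against non-positive data, then measures how closely the logged values track the raw ones. A correlation very near 1 means the log is acting almost like a straight-line rescaling, so it cannot change the shape of the data much.

```python
import math

def checked_log10(values):
    """Log re-expression that reports every non-positive value up front."""
    bad = [v for v in values if v <= 0]
    if bad:
        raise ValueError(f"log needs positive data; offending values: {bad}")
    return [math.log10(v) for v in values]

def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

# Invented data: one set spanning 1 to 1000, one bunched far from 1.
wide_range = [1.0, 10.0, 100.0, 1000.0]
far_from_one = [1000.0 + i for i in range(11)]

r_wide = correlation(wide_range, checked_log10(wide_range))
r_far = correlation(far_from_one, checked_log10(far_from_one))

print(f"raw vs log10, wide range:   r = {r_wide:.4f}")
print(f"raw vs log10, far from one: r = {r_far:.6f}")

# Negative data: the guard raises before any value is transformed.
try:
    checked_log10([3.0, -1.0, 5.0])
except ValueError as err:
    print("refused:", err)
```

For the wide-range set the log reshapes the data noticeably. For the set bunched around 1000 it barely does anything, which is one way to read the "far from 1" warning.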

Now, getting back to CPM - these simple re-expression techniques will help you understand and explain your most dominant Critical Adjustment Parameters (CAPs) that you need to move your mean onto your desired targets. As my son would say "Dad, I'm glad we had this little talk."  

Hey, I'm getting smarter every day!

Is there a topic you'd like us to write about? Have a question? We appreciate your feedback and suggestions! Simply "reply-to" this email. Thank you!
 
Sincerely,
Carol Biesemeyer
Business Manager and Newsletter Editor
Product Development Systems & Solutions Inc.
About PDSS Inc.
Product Development Systems & Solutions (PDSS) Inc.  is a professional services firm dedicated to assisting companies that design and manufacture complex products.  We help our clients accelerate their organic growth and achieve sustainable competitive advantage through functional excellence in product development and product line management.
 
Copyright 2010, PDSS Inc.