Mastering Work Intake: From Chaos to Predictable Delivery

Chapter 8 shatters myths! Once upon a time, in a land far, far away, I believed a few of these myths. Okay, it was only about five years ago that I was disabused of the last of them. I believe many of my misconceptions are rooted in the collision between the vocabulary of classical statistics and the words Shewhart used for Process Behavior Charts (PBC).

The first myth I had to unlearn was that the sigma Shewhart uses for the Upper and Lower Natural Process Limits is the same thing as the global standard deviation (take that, STDEV function in Excel). It is NOT. One reason the two are easy to confuse is that the standard deviation is also represented by the Greek letter sigma. When I asked Alexa for the definition of sigma, the answer reminded me that you should make clear how you are using the term. Many everyday users of statistics make assumptions and never make clear how they are using it.
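To make the distinction concrete, here is a minimal Python sketch comparing the two sigmas. It assumes the standard XmR convention from Wheeler's work: sigma is estimated as the average moving range divided by the bias-correction constant 1.128 (d2 for ranges of two consecutive points), which is where the familiar 2.66 multiplier (3 / 1.128) comes from. The data set and function names are hypothetical illustrations, not taken from the book.

```python
from statistics import mean, stdev

D2 = 1.128  # bias-correction constant for moving ranges of two points

def sigma_from_moving_ranges(values):
    """Within-process sigma used by an XmR chart: average moving range / d2."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    return mean(moving_ranges) / D2

def natural_process_limits(values):
    """Return (lower limit, central line, upper limit): average +/- 3 * sigma(mR)."""
    center = mean(values)
    spread = 3 * sigma_from_moving_ranges(values)
    return center - spread, center, center + spread

# Hypothetical cycle times (days) with a shift halfway through.
cycle_times = [3, 4, 3, 5, 4, 3, 4, 10, 11, 10, 12, 11, 10, 11]

# statistics.stdev is the sample standard deviation, like Excel's STDEV.
print("global standard deviation:", round(stdev(cycle_times), 2))
print("sigma from moving ranges: ", round(sigma_from_moving_ranges(cycle_times), 2))
print("natural process limits:   ", natural_process_limits(cycle_times))
```

The shift inflates the global standard deviation because it measures dispersion around one overall mean, while the moving-range estimate mostly reflects point-to-point variation. The two sigmas diverge most exactly when the data is not homogeneous, which is when the difference matters.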

The second myth is that extreme observations must be removed from the sample. I went back to a productivity study I did a bit over ten years ago and looked at the observations I had removed. Reviewing the context, one of the items was not comparable and should not have been included in the study. The extreme observation, however, was in reality a signal, and I had blithely removed it. There are several options for handling “bad” data. The only time I now consider removal is when the observation does not come from the process being studied and is therefore not comparable. Would we remove the change in labor force participation caused by COVID-19, or recognize it as a signal? In the case of the labor force participation rate, after the COVID recovery it was time to reset the baseline rather than exclude the data. To quote Vacanti, “First, assuming you have enough data for the calculation of solid limits (see Chapter 5), then any value that is extreme enough to affect the average central line on your PBC would be a fairly clear indication of signal from context alone–whether limits were calculated or not.”
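Rather than deleting extreme points, a chart can flag them for investigation. The Python sketch below assumes a hypothetical baseline period of daily throughput, computes XmR limits from that baseline only (average ± 2.66 × average moving range), and reports which points fall outside. The names and numbers are illustrative assumptions; the point is that the extreme value stays in the data and gets investigated in context.

```python
from statistics import mean

def natural_process_limits(baseline):
    """XmR limits from a baseline: average +/- 2.66 * average moving range."""
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    center = mean(baseline)
    spread = 2.66 * mean(moving_ranges)
    return center - spread, center + spread

def flag_signals(values, baseline):
    """Return indices of values outside the baseline's natural process limits.

    Nothing is removed: flagged points are kept so their context
    (is this even the same process?) can be investigated.
    """
    lower, upper = natural_process_limits(baseline)
    return [i for i, v in enumerate(values) if v < lower or v > upper]

# Hypothetical daily throughput; the first ten days form the baseline.
baseline = [5, 6, 4, 5, 7, 5, 6, 5, 4, 6]
series = baseline + [5, 19, 6]

for i in flag_signals(series, baseline):
    print(f"day {i}: value {series[i]} is a signal; investigate before excluding")
```

If the investigation shows the process itself has changed, as with labor force participation after COVID, the remedy in this sketch would be to pick a new baseline period and recompute the limits, not to delete the points.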

The third myth is the one I unlearned the longest time ago, when I became suspicious of averages. After reading the first Actionable Agile book, I embraced medians rather than averages (also known as the mean) on scatter plots. Drawing conclusions from averages and standard deviations in the classical way assumes that the data is both homogeneous (independent and identically distributed) and normally distributed. Real life is rarely that well organized. XmR charts make NONE of those assumptions; PBCs make “no assumption about the underlying distribution of your data.” The assumption of normality is rarely correct for software-centric efforts. As an example, I grabbed the first data set I found on my laptop and plotted it as a histogram and as an X-Y scatterplot.

This is not a normal distribution.

In this case, making inferences based on the median or the average would probably not produce huge disparities. Choose the descriptive statistics that best match the distribution of the data. Make sure you understand the assumptions those statistics make, or you will be making decisions based on bad statistical logic.
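One quick way to check whether the mean and median will tell different stories is to compute both, along with a simple skewness indicator. The Python sketch below uses Pearson's second skewness coefficient, 3 × (mean − median) / standard deviation, as a rough check; the cycle-time values are hypothetical, not the data set plotted above.

```python
from statistics import mean, median, stdev

def describe(values):
    """Summarize a sample and flag obvious skew before choosing a statistic."""
    avg, mid, sd = mean(values), median(values), stdev(values)
    # Pearson's second skewness coefficient: 3 * (mean - median) / stdev.
    skew = 3 * (avg - mid) / sd
    return {"mean": avg, "median": mid, "stdev": sd, "pearson_skew": skew}

# Hypothetical cycle times (days) with a long right tail.
cycle_times = [1, 2, 2, 3, 3, 3, 4, 5, 6, 8, 13, 21]
summary = describe(cycle_times)
for name, value in summary.items():
    print(f"{name:>12}: {value:.2f}")
```

A mean well above the median (a positive coefficient) indicates a right tail; the mean gets pulled toward the tail, so the median is usually the more representative "typical" value for data shaped like this.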

The final myth I had to unlearn was that drawing a line, curve, or other probability distribution through data is useful for predicting behavior; in practice, it rarely is. Vacanti notes that this is known as the Quetelet Fallacy. More on this fallacy in a later blog entry. Quoting Vacanti (who is paraphrasing Wheeler), “Your data was not produced by a probability model. Your data was produced by a process.”

This might be the most consequential chapter in the book, and it is certainly worth a second or third read.

Buy a copy and get reading – Actionable Agile Metrics Volume II, Advanced Topics in Predictability.  

Week 1: Re-read Logistics and Preface https://bit.ly/4adgxsC

Week 2: Wilt The Stilt and Definition of Variation https://bit.ly/4aldwGN

Week 3: Variation and Predictability https://bit.ly/3tAVWhq

Week 4: Process Behavior Charts Part 1 https://bit.ly/3Huainr

Week 5: Process Behavior Charts Part 2 https://bit.ly/424O5Wc

Week 6: How Much Data? https://bit.ly/47GVP24

Week 7: Detecting Signals https://bit.ly/3SjwfdO