Log-Log Plotting: Making Curves into Lines
The learning curve equation Y = T1 × X^b is nonlinear in its natural form. Plotting raw hours (Y-axis) against unit number (X-axis) produces a curve that is difficult to assess visually. The solution is a logarithmic transformation of both axes.
Taking the natural log of both sides: ln(Y) = ln(T1) + b × ln(X). This is the equation of a straight line where ln(T1) is the y-intercept and b is the slope. On log-log paper, a perfect learning curve is a straight line with a negative slope.
Given data: T1 = 10,000 hours, 85% learning rate (b = –0.2345)
| Unit (X) | Hours (Y) | ln(X) | ln(Y) |
|---|---|---|---|
| 1 | 10,000 | 0.000 | 9.210 |
| 5 | 6,857 | 1.609 | 8.833 |
| 10 | 5,828 | 2.303 | 8.671 |
| 25 | 4,701 | 3.219 | 8.456 |
| 50 | 3,996 | 3.912 | 8.293 |
| 100 | 3,397 | 4.605 | 8.131 |
The ln(Y) values decrease linearly with ln(X). The slope is –0.2345, which is exactly b. The y-intercept is 9.210, which is ln(10,000) = ln(T1). On a log-log plot, these six points fall on a perfectly straight line.
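The table above can be reproduced directly from the unit-curve formula. A minimal sketch (plain Python, no libraries) that regenerates the values and confirms the transformed points are collinear:

```python
import math

# 85% learning rate: each doubling of quantity multiplies hours by 0.85,
# so the exponent is b = ln(0.85) / ln(2) ≈ -0.2345.
T1 = 10_000
b = math.log(0.85) / math.log(2)

for x in (1, 5, 10, 25, 50, 100):
    y = T1 * x ** b                      # unit curve: Y = T1 * X^b
    print(f"{x:>3}  {y:>6,.0f}  {math.log(x):.3f}  {math.log(y):.3f}")

# Collinearity check: the slope between the first and last transformed
# points equals b exactly, since the transformed curve is a straight line.
slope = (math.log(T1 * 100 ** b) - math.log(T1)) / (math.log(100) - math.log(1))
print(round(slope, 4))
```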
Linear Regression on Transformed Data
Once the data is log-transformed, ordinary least squares (OLS) regression finds the best-fit line through the points. The regression output gives you two parameters:
| Parameter | Regression Output | Learning Curve Meaning | How to Convert |
|---|---|---|---|
| Intercept (a) | ln(T1) | Theoretical first unit hours | T1 = e^a |
| Slope (b) | Learning exponent | Rate of improvement | Learning rate = 2^b |
Given: 30 completed units with actual touch labor hours. After log-transforming and running OLS regression:
- Intercept (a) = 9.151 → T1 = e^9.151 ≈ 9,420 hours
- Slope (b) = –0.2107 → Learning rate = 2^–0.2107 = 86.4%
- R² = 0.91
- Standard error of b = 0.018
Interpretation: The data supports an 86.4% unit learning curve with T1 of 9,420 hours. The R² of 0.91 indicates a strong fit. Applying ±2 standard errors to the slope gives a 95% confidence interval for the learning rate of approximately 84.3% to 88.6%.
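The fit-and-convert step can be sketched in a few lines. The 30 "actuals" below are hypothetical, generated noise-free from T1 = 9,420 hours at an 86.4% rate, so the hand-rolled OLS should recover those parameters exactly; real data would also produce the R² and standard errors discussed above.

```python
import math

# Hypothetical dataset: 30 units generated from a known 86.4% unit curve.
true_T1, true_rate = 9_420, 0.864
true_b = math.log(true_rate) / math.log(2)
units = range(1, 31)
lnx = [math.log(x) for x in units]
lny = [math.log(true_T1 * x ** true_b) for x in units]

# Ordinary least squares on the transformed data: slope = cov(x, y) / var(x).
n = len(lnx)
mx, my = sum(lnx) / n, sum(lny) / n
b = sum((x - mx) * (y - my) for x, y in zip(lnx, lny)) / sum(
    (x - mx) ** 2 for x in lnx)
a = my - b * mx

T1 = math.exp(a)   # intercept -> theoretical first-unit hours
rate = 2 ** b      # slope -> learning rate
print(round(T1), f"{rate:.1%}")
```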
⚠️ Regression on Logs ≠ Regression on Raw Data
OLS regression on log-transformed data minimizes the sum of squared errors in log-space, not in hours-space. This means the regression gives proportionally equal weight to early and late units. If you need to minimize absolute hour errors (e.g., for budgeting), you may need weighted regression or nonlinear least squares. For most learning curve applications, log-space regression is appropriate and standard.
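A quick way to see the weighting difference (illustrative numbers, using the 85% curve from the earlier table): a fixed residual in log-space is a fixed percentage of hours, so it represents far more absolute hours on early units than on late ones.

```python
import math

T1 = 10_000
b = math.log(0.85) / math.log(2)

for x in (1, 100):
    y = T1 * x ** b
    # A +0.05 residual in ln-hours is ~5.1% of y wherever it occurs...
    hours_err = math.exp(math.log(y) + 0.05) - y
    # ...so the same log-space miss costs very different absolute hours.
    print(x, round(y), round(hours_err))
```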
R² Interpretation and Residual Analysis
R² tells you what fraction of the variance in ln(Y) is explained by the linear model. It does not tell you whether the model is correct. A structured pattern in the residuals — even with high R² — indicates the model is missing something.
| R² Range | Interpretation | Action |
|---|---|---|
| 0.90–1.00 | Excellent fit; data follows learning curve closely | Check residuals for patterns; if clean, use the model with confidence |
| 0.80–0.90 | Good fit; some scatter but trend is clear | Investigate scatter sources; widen confidence intervals for forecasts |
| 0.70–0.80 | Moderate fit; significant unexplained variation | Look for disruptions, rate changes, data quality issues; consider segmented model |
| Below 0.70 | Poor fit; simple learning curve does not describe this data | Do not use the model without addressing the cause; consider multi-variable or segmented approaches |
Always plot the residuals (actual minus predicted in log-space) against unit number. If the residuals show a U-shape, S-shape, or step function, the single-line model is inadequate. Common patterns and their causes:
| Residual Pattern | Likely Cause | Solution |
|---|---|---|
| U-shaped | Learning rate changed over the production run | Fit a segmented curve with different rates for early and late production |
| Step change | Configuration change, rate change, or disruption at a specific unit | Identify the break unit; fit separate curves before and after |
| Increasing scatter | Data quality degrades for recent units (incomplete data) or product mix changes | Verify data completeness; normalize for configuration |
| Cyclical | Seasonal workforce changes, lot-boundary effects | Add a lot or seasonal variable to the model |
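The residual check can be sketched as follows. The data here is hypothetical: an 85% curve whose hours jump 20% at unit 16 (a configuration change), which a single-line fit smears into a step pattern in the residuals.

```python
import math

# Hypothetical actuals: 85% curve with a 20% step up at unit 16.
b_true = math.log(0.85) / math.log(2)
units = list(range(1, 31))
hours = [10_000 * x ** b_true * (1.20 if x >= 16 else 1.0) for x in units]

# Single straight-line OLS fit in log-space.
lnx = [math.log(x) for x in units]
lny = [math.log(y) for y in hours]
n = len(units)
mx, my = sum(lnx) / n, sum(lny) / n
slope = sum((x - mx) * (y - my) for x, y in zip(lnx, lny)) / sum(
    (x - mx) ** 2 for x in lnx)
intercept = my - slope * mx

# Residuals (actual minus predicted, in log-space) flip sign at the break:
# negative before unit 16, positive after.
resid = [y - (intercept + slope * x) for x, y in zip(lnx, lny)]
before = sum(resid[:15]) / 15
after = sum(resid[15:]) / 15
print(f"mean residual before unit 16: {before:+.3f}, after: {after:+.3f}")
```

In practice you would scan candidate break units and test the best split formally, for example with the Chow test discussed in the next section.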
Confidence Intervals for Forecasts
A point estimate without a confidence interval is incomplete. Every learning curve forecast should include a range that reflects uncertainty in both T1 and the learning rate.
From regression: T1 = 9,420 hours (SE = 380), b = –0.2107 (SE = 0.018), n = 30 units
Forecast for unit 100:
| Scenario | T1 | Learning Rate | Unit 100 Hours |
|---|---|---|---|
| Point estimate | 9,420 | 86.4% | 3,570 |
| 95% upper bound | 10,180 | 88.6% | 4,550 |
| 95% lower bound | 8,660 | 84.3% | 2,780 |
The 95% confidence range for unit 100 is 2,780 to 4,550 hours, roughly −22% to +27% around the point estimate. This range narrows as you add more data points and widens as you forecast further from the data range.
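A sketch of the interval arithmetic, using the regression outputs as given. Pairing the 2-standard-error extremes of T1 and b is a conservative shortcut; it ignores the correlation between the two estimates, which a full prediction interval would account for.

```python
import math

T1, se_T1 = 9_420, 380        # from the regression output above
b, se_b = -0.2107, 0.018
x = 100

def unit_hours(t1, exponent):
    return t1 * x ** exponent

point = unit_hours(T1, b)
high = unit_hours(T1 + 2 * se_T1, b + 2 * se_b)  # slower learning: more hours
low = unit_hours(T1 - 2 * se_T1, b - 2 * se_b)   # faster learning: fewer hours
print(f"point {point:,.0f}, range {low:,.0f} to {high:,.0f}")
```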
When the Data Does Not Fit a Straight Line
Sometimes the log-log plot is clearly not a straight line. This does not mean learning curves are wrong — it means the simple single-slope model is insufficient. Common situations and their remedies:
| Situation | What You See | Model Approach |
|---|---|---|
| Breakpoint (rate change) | Slope changes at a specific unit number | Piecewise regression: fit two lines with a common breakpoint. Use the Chow test or Bayesian information criterion (BIC) to determine if the break is statistically significant. |
| Plateau (learning limit) | Curve flattens, hours stop decreasing | Stanford-B model: Y = T1 × (X + B)^b, where B represents equivalent prior experience. Or set a floor and truncate the curve. |
| Multiple populations | Two distinct clusters of points | Separate the data by configuration, production line, or work content. Fit independent curves to each population. |
| Initial instability | First 5–10 units scatter widely, then settle | Exclude the early units from regression (they include startup effects outside the learning pattern). Report T1 from the regression, not from unit 1 actuals. |
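For the plateau case, the Stanford-B adjustment can be sketched as follows (parameters are illustrative: the 85% curve from earlier with a hypothetical B = 4 units of equivalent prior experience). The offset pulls the early units down the curve and flattens the front end; fitting B itself typically requires nonlinear least squares or a grid search.

```python
import math

T1 = 10_000
b = math.log(0.85) / math.log(2)
B = 4   # hypothetical equivalent prior experience, in units

def plain(x):
    return T1 * x ** b               # standard unit curve

def stanford_b(x):
    return T1 * (x + B) ** b         # Stanford-B: Y = T1 * (X + B)^b

for x in (1, 2, 5, 20):
    print(x, round(plain(x)), round(stanford_b(x)))
```

Note that with B > 0 the first actual unit comes in below T1: unit 1 behaves as if it were unit 1 + B on the underlying curve.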
⚠️ Do Not Force-Fit a Bad Model
If the data clearly shows a breakpoint or plateau, forcing a single straight line through all points produces a “compromise” slope that is wrong for both the early and late portions. The resulting forecast will underestimate early units and overestimate late units (or vice versa). Always let the data tell you the model — do not impose a model on the data.
🎯 The Bottom Line
Log-log plotting transforms the learning curve into a straight line for visual assessment and linear regression. OLS on log-transformed data gives you T1 and the learning rate directly. R² above 0.85 indicates a good fit, but always check residuals for patterns. Confidence intervals must accompany every forecast — a point estimate without a range is an invitation for misuse. When the data does not fit a straight line, use breakpoint analysis, the Stanford-B model, or data segmentation rather than forcing a bad fit. Next: Rate Adjustment & Disruption Modeling — what happens to the curve when production stops, surges, or loses its workforce.