
Log-Log Plotting: Making Curves into Lines

The learning curve equation Y = T1 × X^b is nonlinear in its natural form. Plotting raw hours (Y-axis) against unit number (X-axis) produces a curve that is difficult to assess visually. The solution is a logarithmic transformation of both axes.

Taking the natural log of both sides: ln(Y) = ln(T1) + b × ln(X). This is the equation of a straight line where ln(T1) is the y-intercept and b is the slope. On log-log paper, a perfect learning curve is a straight line with a negative slope.

📊 Log-Log Transform Example Step by Step

Given data: T1 = 10,000 hours, 85% learning rate (b = –0.2345)

| Unit (X) | Hours (Y) | ln(X) | ln(Y) |
|---|---|---|---|
| 1 | 10,000 | 0.000 | 9.210 |
| 5 | 6,857 | 1.609 | 8.833 |
| 10 | 5,828 | 2.303 | 8.670 |
| 25 | 4,701 | 3.219 | 8.456 |
| 50 | 3,996 | 3.912 | 8.293 |
| 100 | 3,397 | 4.605 | 8.131 |

The ln(Y) values decrease linearly with ln(X). The slope is –0.2345, which is exactly b. The y-intercept is 9.210, which is ln(10,000) = ln(T1). On a log-log plot, these six points fall on a perfectly straight line.
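The table above can be regenerated in a few lines. This is a minimal sketch using the example's parameters (T1 = 10,000 hours, 85% learning rate); nothing here is program-specific:

```python
import math

# Example parameters from the text: T1 = 10,000 hours, 85% learning rate
T1, rate = 10_000, 0.85
b = math.log(rate) / math.log(2)   # learning exponent, approx. -0.2345

print(f"{'X':>4} {'Y':>7} {'ln X':>6} {'ln Y':>6}")
for x in (1, 5, 10, 25, 50, 100):
    y = T1 * x ** b                # Y = T1 * X^b
    print(f"{x:>4} {y:>7,.0f} {math.log(x):>6.3f} {math.log(y):>6.3f}")
```

Plotting ln Y against ln X for these points gives a straight line with slope b and intercept ln(T1).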

Linear Regression on Transformed Data

Once the data is log-transformed, ordinary least squares (OLS) regression finds the best-fit line through the points. The regression output gives you two parameters:

| Parameter | Regression Output | Learning Curve Meaning | How to Convert |
|---|---|---|---|
| Intercept (a) | ln(T1) | Theoretical first-unit hours | T1 = e^a |
| Slope (b) | Learning exponent | Rate of improvement | Learning rate = 2^b |
📊 Regression Example from Production Data

Given: 30 completed units with actual touch labor hours. After log-transforming and running OLS regression:

  • Intercept (a) = 9.151 → T1 = e^9.151 ≈ 9,420 hours
  • Slope (b) = –0.2107 → Learning rate = 2^(–0.2107) = 86.4%
  • R² = 0.91
  • Standard error of b = 0.018

Interpretation: The data supports an 86.4% unit learning curve with T1 of 9,420 hours. The R² of 0.91 indicates a strong fit. Applying ±2 standard errors to the slope, the 95% confidence interval for the learning rate is approximately 84.3% to 88.6%.
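The conversion from regression output back to curve parameters can be sketched end to end. This is a stdlib-only sketch on synthetic data; the true T1, rate, and noise level below are assumptions for illustration, not the 30-unit program above:

```python
import math
import random

# Generate synthetic (unit, hours) data from an assumed curve plus log-normal noise
random.seed(42)
true_T1, true_rate = 9_400, 0.864
true_b = math.log(true_rate) / math.log(2)
data = [(x, true_T1 * x ** true_b * math.exp(random.gauss(0, 0.03)))
        for x in range(1, 31)]

# OLS on (ln X, ln Y), written out with the closed-form slope/intercept formulas
lx = [math.log(x) for x, _ in data]
ly = [math.log(y) for _, y in data]
n = len(data)
mx, my = sum(lx) / n, sum(ly) / n
slope = (sum((a - mx) * (c - my) for a, c in zip(lx, ly))
         / sum((a - mx) ** 2 for a in lx))
intercept = my - slope * mx

T1_hat = math.exp(intercept)   # T1 = e^a
rate_hat = 2 ** slope          # learning rate = 2^b
print(f"T1 ~ {T1_hat:,.0f} h, learning rate ~ {rate_hat:.1%}")
```

The recovered parameters land close to the assumed ones; with real data you would also compute R² and the standard error of the slope before trusting the fit.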

⚠️ Regression on Logs ≠ Regression on Raw Data

OLS regression on log-transformed data minimizes the sum of squared errors in log-space, not in hours-space. This means the regression gives proportionally equal weight to early and late units. If you need to minimize absolute hour errors (e.g., for budgeting), you may need weighted regression or nonlinear least squares. For most learning curve applications, log-space regression is appropriate and standard.
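To see what proportional weighting means in practice, compare the same 5% overrun at unit 1 and at unit 100 (a small sketch reusing the earlier example's parameters):

```python
import math

T1, b = 10_000, -0.2345
for x in (1, 100):
    predicted = T1 * x ** b
    actual = predicted * 1.05                           # a 5% overrun at this unit
    log_resid = math.log(actual) - math.log(predicted)  # identical for both units
    hour_resid = actual - predicted                     # ~3x larger for unit 1
    print(f"unit {x:>3}: log residual {log_resid:.4f}, hour residual {hour_resid:,.0f}")
```

Both misses contribute equally to log-space OLS, but the unit-1 miss costs roughly three times as many hours, which is why budget-focused fits may need weighting.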

R² Interpretation and Residual Analysis

R² tells you what fraction of the variance in ln(Y) is explained by the linear model. It does not tell you whether the model is correct. A structured pattern in the residuals — even with high R² — indicates the model is missing something.

| R² Range | Interpretation | Action |
|---|---|---|
| 0.90–1.00 | Excellent fit; data follows learning curve closely | Check residuals for patterns; if clean, use the model with confidence |
| 0.80–0.90 | Good fit; some scatter but trend is clear | Investigate scatter sources; widen confidence intervals for forecasts |
| 0.70–0.80 | Moderate fit; significant unexplained variation | Look for disruptions, rate changes, data quality issues; consider segmented model |
| Below 0.70 | Poor fit; simple learning curve does not describe this data | Do not use the model without addressing the cause; consider multi-variable or segmented approaches |

Always plot the residuals (actual minus predicted in log-space) against unit number. If the residuals show a U-shape, S-shape, or step function, the single-line model is inadequate. Common patterns and their causes:

| Residual Pattern | Likely Cause | Solution |
|---|---|---|
| U-shaped | Learning rate changed over the production run | Fit a segmented curve with different rates for early and late production |
| Step change | Configuration change, rate change, or disruption at a specific unit | Identify the break unit; fit separate curves before and after |
| Increasing scatter | Data quality degrades for recent units (incomplete data) or product mix changes | Verify data completeness; normalize for configuration |
| Cyclical | Seasonal workforce changes, lot-boundary effects | Add a lot or seasonal variable to the model |
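A residual check can be sketched in a few lines. The fitted parameters and the "actual" hours below are illustrative assumptions, not real program data:

```python
import math

# Assumed fit from a log-space regression, plus illustrative actuals
fitted_T1, fitted_b = 9_420, -0.2107
actuals = {1: 9_800, 5: 6_500, 10: 5_400, 20: 4_600, 30: 4_100}

for unit, hours in actuals.items():
    predicted = fitted_T1 * unit ** fitted_b
    residual = math.log(hours) - math.log(predicted)  # actual minus predicted, log-space
    flag = "" if abs(residual) < 0.05 else "  <- investigate"
    print(f"unit {unit:>3}: residual {residual:+.3f}{flag}")
```

In this illustrative data the residuals drift steadily negative with unit number, the kind of structured pattern that points to a rate change rather than random scatter.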

Confidence Intervals for Forecasts

A point estimate without a confidence interval is incomplete. Every learning curve forecast should include a range that reflects uncertainty in both T1 and the learning rate.

📊 Building a Confidence Interval

From regression: T1 = 9,420 hours (SE = 380), b = –0.2107 (SE = 0.018), n = 30 units

Forecast for unit 100:

| Scenario | T1 | Learning Rate | Unit 100 Hours |
|---|---|---|---|
| Point estimate | 9,420 | 86.4% | 3,570 |
| 95% upper bound | 10,180 | 88.6% | 4,550 |
| 95% lower bound | 8,660 | 84.3% | 2,780 |

The 95% confidence range for unit 100 is roughly 2,780 to 4,550 hours, a spread of about −22% to +28% around the point estimate. Because both parameters are set to their extremes at once, this scenario range is somewhat conservative. The range narrows as you add more data points and widens as you forecast further from the data range.
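The scenario bounds above can be computed directly from the regression outputs (a sketch using the example's values; ±2 standard errors approximates a 95% interval):

```python
# Regression outputs from the example: point estimates and standard errors
T1, se_T1 = 9_420, 380
b, se_b = -0.2107, 0.018
unit = 100

def forecast(t1, slope):
    """Unit hours from Y = T1 * X^b for the chosen unit."""
    return t1 * unit ** slope

point = forecast(T1, b)
low = forecast(T1 - 2 * se_T1, b - 2 * se_b)   # more negative b: faster learning, fewer hours
high = forecast(T1 + 2 * se_T1, b + 2 * se_b)  # shallower b: slower learning, more hours
print(f"unit {unit}: {low:,.0f} / {point:,.0f} / {high:,.0f} hours")
```

A fuller treatment would use the t-distribution with n − 2 degrees of freedom and account for the correlation between the intercept and slope estimates.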

When the Data Does Not Fit a Straight Line

Sometimes the log-log plot is clearly not a straight line. This does not mean learning curves are wrong — it means the simple single-slope model is insufficient. Common situations and their remedies:

| Situation | What You See | Model Approach |
|---|---|---|
| Breakpoint (rate change) | Slope changes at a specific unit number | Piecewise regression: fit two lines with a common breakpoint. Use the Chow test or Bayesian information criterion (BIC) to determine if the break is statistically significant. |
| Plateau (learning limit) | Curve flattens, hours stop decreasing | Stanford-B model: Y = T1 × (X + B)^b, where B represents equivalent prior experience. Or set a floor and truncate the curve. |
| Multiple populations | Two distinct clusters of points | Separate the data by configuration, production line, or work content. Fit independent curves to each population. |
| Initial instability | First 5–10 units scatter widely, then settle | Exclude the early units from regression (they include startup effects outside the learning pattern). Report T1 from the regression, not from unit 1 actuals. |
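As one example, the Stanford-B adjustment from the table can be evaluated directly. The T1, b, and prior-experience factor B below are illustrative assumptions:

```python
# Stanford-B: Y = T1 * (X + B)^b, where B is equivalent prior experience.
# With prior=0 this reduces to the basic unit learning curve.
T1, b, B = 10_000, -0.2345, 4.0

def unit_hours(x, prior=0.0):
    return T1 * (x + prior) ** b

for x in (1, 10, 100):
    print(f"unit {x:>3}: basic {unit_hours(x):>7,.0f} h, "
          f"Stanford-B {unit_hours(x, B):>7,.0f} h")
```

With B > 0 the early units start further down the curve, so the front end is flatter, while the two models converge for later units.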

⚠️ Do Not Force-Fit a Bad Model

If the data clearly shows a breakpoint or plateau, forcing a single straight line through all points produces a “compromise” slope that is wrong for both the early and late portions. The resulting forecast will underestimate early units and overestimate late units (or vice versa). Always let the data tell you the model — do not impose a model on the data.

🎯 The Bottom Line

Log-log plotting transforms the learning curve into a straight line for visual assessment and linear regression. OLS on log-transformed data gives you T1 and the learning rate directly. R² above 0.85 indicates a good fit, but always check residuals for patterns. Confidence intervals must accompany every forecast — a point estimate without a range is an invitation for misuse. When the data does not fit a straight line, use breakpoint analysis, the Stanford-B model, or data segmentation rather than forcing a bad fit. Next: Rate Adjustment & Disruption Modeling — what happens to the curve when production stops, surges, or loses its workforce.
