Log-Log Plotting: Making Curves into Lines
The learning curve equation Y = T1 × X^b is nonlinear in its natural form. Plotting raw hours (Y-axis) against unit number (X-axis) produces a curve that is difficult to assess visually. The solution is a logarithmic transformation of both axes.
Taking the natural log of both sides: ln(Y) = ln(T1) + b × ln(X). This is the equation of a straight line where ln(T1) is the y-intercept and b is the slope. On log-log paper, a perfect learning curve is a straight line with a negative slope.
Given data: T1 = 10,000 hours, 85% learning rate (b = –0.2345)
| Unit (X) | Hours (Y) | ln(X) | ln(Y) |
|---|---|---|---|
| 1 | 10,000 | 0.000 | 9.210 |
| 5 | 6,857 | 1.609 | 8.833 |
| 10 | 5,828 | 2.303 | 8.671 |
| 25 | 4,701 | 3.219 | 8.456 |
| 50 | 3,996 | 3.912 | 8.293 |
| 100 | 3,397 | 4.605 | 8.131 |
The ln(Y) values decrease linearly with ln(X). The slope is –0.2345, which is exactly b. The y-intercept is 9.210, which is ln(10,000) = ln(T1). On a log-log plot, these six points fall on a perfectly straight line.
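The table above can be reproduced directly from the unit-curve formula. A minimal sketch (plain Python, no libraries) that regenerates the values and confirms the transformed points are collinear:

```python
import math

# 85% learning rate: each doubling of quantity multiplies hours by 0.85,
# so the exponent is b = ln(0.85) / ln(2) ≈ -0.2345.
T1 = 10_000
b = math.log(0.85) / math.log(2)

for x in (1, 5, 10, 25, 50, 100):
    y = T1 * x ** b                      # unit curve: Y = T1 * X^b
    print(f"{x:>3}  {y:>6,.0f}  {math.log(x):.3f}  {math.log(y):.3f}")

# Collinearity check: the slope between the first and last transformed
# points equals b exactly, since the transformed curve is a straight line.
slope = (math.log(T1 * 100 ** b) - math.log(T1)) / (math.log(100) - math.log(1))
print(round(slope, 4))
```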
Linear Regression on Transformed Data
Once the data is log-transformed, ordinary least squares (OLS) regression finds the best-fit line through the points. The regression output gives you two parameters:
| Parameter | Regression Output | Learning Curve Meaning | How to Convert |
|---|---|---|---|
| Intercept (a) | ln(T1) | Theoretical first unit hours | T1 = e^a |
| Slope (b) | Learning exponent | Rate of improvement | Learning rate = 2^b |
Given: 30 completed units with actual touch labor hours. After log-transforming and running OLS regression:
- Intercept (a) = 9.151 → T1 = e^9.151 ≈ 9,420 hours
- Slope (b) = –0.2107 → Learning rate = 2^–0.2107 = 86.4%
- R² = 0.91
- Standard error of b = 0.018
Interpretation: The data supports an 86.4% unit learning curve with T1 of 9,420 hours. The R² of 0.91 indicates a strong fit. Applying ±2 standard errors to the slope gives a 95% confidence interval for the learning rate of approximately 84.3% to 88.6%.
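The fit-and-convert step can be sketched in a few lines. The 30 "actuals" below are hypothetical, generated noise-free from T1 = 9,420 hours at an 86.4% rate, so the hand-rolled OLS should recover those parameters exactly; real data would also produce the R² and standard errors discussed above.

```python
import math

# Hypothetical dataset: 30 units generated from a known 86.4% unit curve.
true_T1, true_rate = 9_420, 0.864
true_b = math.log(true_rate) / math.log(2)
units = range(1, 31)
lnx = [math.log(x) for x in units]
lny = [math.log(true_T1 * x ** true_b) for x in units]

# Ordinary least squares on the transformed data: slope = cov(x, y) / var(x).
n = len(lnx)
mx, my = sum(lnx) / n, sum(lny) / n
b = sum((x - mx) * (y - my) for x, y in zip(lnx, lny)) / sum(
    (x - mx) ** 2 for x in lnx)
a = my - b * mx

T1 = math.exp(a)   # intercept -> theoretical first-unit hours
rate = 2 ** b      # slope -> learning rate
print(round(T1), f"{rate:.1%}")
```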
⚠️ Regression on Logs ≠ Regression on Raw Data
OLS regression on log-transformed data minimizes the sum of squared errors in log-space, not in hours-space. This means the regression gives proportionally equal weight to early and late units. If you need to minimize absolute hour errors (e.g., for budgeting), you may need weighted regression or nonlinear least squares. For most learning curve applications, log-space regression is appropriate and standard.
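A quick way to see the weighting difference (illustrative numbers, using the 85% curve from the earlier table): a fixed residual in log-space is a fixed percentage of hours, so it represents far more absolute hours on early units than on late ones.

```python
import math

T1 = 10_000
b = math.log(0.85) / math.log(2)

for x in (1, 100):
    y = T1 * x ** b
    # A +0.05 residual in ln-hours is ~5.1% of y wherever it occurs...
    hours_err = math.exp(math.log(y) + 0.05) - y
    # ...so the same log-space miss costs very different absolute hours.
    print(x, round(y), round(hours_err))
```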
R² Interpretation and Residual Analysis
R² tells you what fraction of the variance in ln(Y) is explained by the linear model. It does not tell you whether the model is correct. A structured pattern in the residuals — even with high R² — indicates the model is missing something.
| R² Range | Interpretation | Action |
|---|---|---|
| 0.90–1.00 | Excellent fit; data follows learning curve closely | Check residuals for patterns; if clean, use the model with confidence |
| 0.80–0.90 | Good fit; some scatter but trend is clear | Investigate scatter sources; widen confidence intervals for forecasts |
| 0.70–0.80 | Moderate fit; significant unexplained variation | Look for disruptions, rate changes, data quality issues; consider segmented model |
| Below 0.70 | Poor fit; simple learning curve does not describe this data | Do not use the model without addressing the cause; consider multi-variable or segmented approaches |
Always plot the residuals (actual minus predicted in log-space) against unit number. If the residuals show a U-shape, S-shape, or step function, the single-line model is inadequate. Common patterns and their causes:
| Residual Pattern | Likely Cause | Solution |
|---|---|---|
| U-shaped | Learning rate changed over the production run | Fit a segmented curve with different rates for early and late production |
| Step change | Configuration change, rate change, or disruption at a specific unit | Identify the break unit; fit separate curves before and after |
| Increasing scatter | Data quality degrades for recent units (incomplete data) or product mix changes | Verify data completeness; normalize for configuration |
| Cyclical | Seasonal workforce changes, lot-boundary effects | Add a lot or seasonal variable to the model |
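The residual check can be sketched as follows. The data here is hypothetical: an 85% curve whose hours jump 20% at unit 16 (a configuration change), which a single-line fit smears into a step pattern in the residuals.

```python
import math

# Hypothetical actuals: 85% curve with a 20% step up at unit 16.
b_true = math.log(0.85) / math.log(2)
units = list(range(1, 31))
hours = [10_000 * x ** b_true * (1.20 if x >= 16 else 1.0) for x in units]

# Single straight-line OLS fit in log-space.
lnx = [math.log(x) for x in units]
lny = [math.log(y) for y in hours]
n = len(units)
mx, my = sum(lnx) / n, sum(lny) / n
slope = sum((x - mx) * (y - my) for x, y in zip(lnx, lny)) / sum(
    (x - mx) ** 2 for x in lnx)
intercept = my - slope * mx

# Residuals (actual minus predicted, in log-space) flip sign at the break:
# negative before unit 16, positive after.
resid = [y - (intercept + slope * x) for x, y in zip(lnx, lny)]
before = sum(resid[:15]) / 15
after = sum(resid[15:]) / 15
print(f"mean residual before unit 16: {before:+.3f}, after: {after:+.3f}")
```

In practice you would scan candidate break units and test the best split formally, for example with the Chow test discussed in the next section.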
Confidence Intervals for Forecasts
A point estimate without a confidence interval is incomplete. Every learning curve forecast should include a range that reflects uncertainty in both T1 and the learning rate.
From regression: T1 = 9,420 hours (SE = 380), b = –0.2107 (SE = 0.018), n = 30 units
Forecast for unit 100:
| Scenario | T1 | Learning Rate | Unit 100 Hours |
|---|---|---|---|
| Point estimate | 9,420 | 86.4% | 3,570 |
| 95% upper bound | 10,180 | 88.6% | 4,550 |
| 95% lower bound | 8,660 | 84.3% | 2,780 |
The 95% confidence range for unit 100 is 2,780 to 4,550 hours, roughly −22% to +27% around the point estimate. This range narrows as you add more data points and widens as you forecast further from the data range.
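A sketch of the interval arithmetic, using the regression outputs as given. Pairing the 2-standard-error extremes of T1 and b is a conservative shortcut; it ignores the correlation between the two estimates, which a full prediction interval would account for.

```python
import math

T1, se_T1 = 9_420, 380        # from the regression output above
b, se_b = -0.2107, 0.018
x = 100

def unit_hours(t1, exponent):
    return t1 * x ** exponent

point = unit_hours(T1, b)
high = unit_hours(T1 + 2 * se_T1, b + 2 * se_b)  # slower learning: more hours
low = unit_hours(T1 - 2 * se_T1, b - 2 * se_b)   # faster learning: fewer hours
print(f"point {point:,.0f}, range {low:,.0f} to {high:,.0f}")
```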
When the Data Does Not Fit a Straight Line
Sometimes the log-log plot is clearly not a straight line. This does not mean learning curves are wrong — it means the simple single-slope model is insufficient. Common situations and their remedies:
| Situation | What You See | Model Approach |
|---|---|---|
| Breakpoint (rate change) | Slope changes at a specific unit number | Piecewise regression: fit two lines with a common breakpoint. Use the Chow test or Bayesian information criterion (BIC) to determine if the break is statistically significant. |
| Plateau (learning limit) | Curve flattens, hours stop decreasing | Stanford-B model: Y = T1 × (X + B)^b, where B represents equivalent prior experience. Or set a floor and truncate the curve. |
| Multiple populations | Two distinct clusters of points | Separate the data by configuration, production line, or work content. Fit independent curves to each population. |
| Initial instability | First 5–10 units scatter widely, then settle | Exclude the early units from regression (they include startup effects outside the learning pattern). Report T1 from the regression, not from unit 1 actuals. |
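For the plateau case, the Stanford-B adjustment can be sketched as follows (parameters are illustrative: the 85% curve from earlier with a hypothetical B = 4 units of equivalent prior experience). The offset pulls the early units down the curve and flattens the front end; fitting B itself typically requires nonlinear least squares or a grid search.

```python
import math

T1 = 10_000
b = math.log(0.85) / math.log(2)
B = 4   # hypothetical equivalent prior experience, in units

def plain(x):
    return T1 * x ** b               # standard unit curve

def stanford_b(x):
    return T1 * (x + B) ** b         # Stanford-B: Y = T1 * (X + B)^b

for x in (1, 2, 5, 20):
    print(x, round(plain(x)), round(stanford_b(x)))
```

Note that with B > 0 the first actual unit comes in below T1: unit 1 behaves as if it were unit 1 + B on the underlying curve.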
⚠️ Do Not Force-Fit a Bad Model
If the data clearly shows a breakpoint or plateau, forcing a single straight line through all points produces a “compromise” slope that is wrong for both the early and late portions. The resulting forecast will underestimate early units and overestimate late units (or vice versa). Always let the data tell you the model — do not impose a model on the data.
🎯 The Bottom Line
Log-log plotting transforms the learning curve into a straight line for visual assessment and linear regression. OLS on log-transformed data gives you T1 and the learning rate directly. R² above 0.85 indicates a good fit, but always check residuals for patterns. Confidence intervals must accompany every forecast — a point estimate without a range is an invitation for misuse. When the data does not fit a straight line, use breakpoint analysis, the Stanford-B model, or data segmentation rather than forcing a bad fit. Next: Rate Adjustment & Disruption Modeling — what happens to the curve when production stops, surges, or loses its workforce.