Statistician Requires N=47 Loaves for Crumb Porosity P-Value; Dataset Currently Yields Inconclusive 0.0847 Significance
Callum Dresdner, former clinical trial biostatistician, has baked 31 loaves toward a predetermined sample size of 47, at which point he expects to confirm — or formally fail to confirm — that 82 percent hydration produces measurably more open crumb than 80 percent. His current p-value is 0.0847. His daughter is now a co-investigator.

The crumb structure analysis for Trial 31 was logged at 6:14 a.m. on a Tuesday, sixteen minutes after the loaf reached ambient temperature and approximately forty seconds after Callum Dresdner photographed all four cross-sections under the 5600K ring light he uses for every observation in the dataset. The ImageJ porosity read came back at 37.2 percent open cell area — two points above the dataset mean, one point below the upper control limit, and nowhere near enough to move the p-value.
"We're at 0.0847," Dresdner said, pulling up his R console. "The null hypothesis is still viable."
The null hypothesis, for context, is that 82 percent hydration produces the same crumb porosity as 80 percent in a country-style miche baked at 480°F with 18 minutes of covered steam. The alternative hypothesis — which Dresdner has held since February on the basis of what he calls "strong prior literature from Hamelman and intuitive Bayesian priors" — is that the two-point hydration difference produces a measurable, reproducible effect on alveolar distribution. An 82 percent dough does extend the gluten network further during the final oven spring, the additional free water accelerating CO₂ mobility through the crumb matrix in the first eight minutes of bake. Whether that registers as statistical significance is, apparently, a separate question.
He needs a p-value below 0.05 to say so. He has baked 31 loaves.
His power analysis, conducted in November using a pilot dataset of seven loaves, specified N=47 for 80 percent power at an estimated effect size of Cohen's d = 0.6, assuming a two-tailed independent samples t-test with α = 0.05. The pilot estimate has since proved optimistic. Effect size in the full dataset has drifted to d = 0.41, which requires — Dresdner recalculated this on a Sunday while his 82 percent hydration loaf was in its overnight retard — N=72 for equivalent power.
He has not updated the target. He is aware of the issue.
Dresdner spent eleven years as a biostatistician designing Phase II oncology trials, which is to say he spent eleven years in an environment where an underpowered study was not a personal failing but a protocol deviation with regulatory consequences. The habit did not transfer cleanly. His starter, a five-year-old stiff levain maintained at 65 percent hydration, was named Periwinkle by his daughter before he relabeled the jar P-001 in the lab notebook. Temperature holds at 73°F ± 1.2°F via an Inkbird IBS-TH2 Pro sensor mounted at jar height. Bulk fermentation terminates at 30 percent volume increase confirmed by aliquot jar, not by feel.
"Feel is not a reproducible endpoint," he said.
The independent variable — hydration, 80 versus 82 percent — is varied by randomized block across baking sessions to control for seasonal ambient humidity, which Dresdner records every Saturday morning in a file he calls the Environmental Covariates Log. This file has 19 weeks of entries. It has never been used in a formal analysis. He describes it as insurance.
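A randomized block schedule of this kind can be sketched in a few lines. The version below assumes each baking session is one block containing one loaf at each hydration level, baked in random order. The article does not say how many loaves Dresdner bakes per session, so that layout, the function name, and the seed are illustrative assumptions rather than his protocol.

```python
import random

HYDRATIONS = (80, 82)  # percent hydration, the two conditions in the study


def randomized_block_schedule(n_sessions, seed=None):
    """Return (session, slot, hydration) tuples, one block per session.

    Assumption: every session bakes exactly one loaf per hydration level,
    so session-level conditions such as humidity hit both levels equally.
    """
    rng = random.Random(seed)
    schedule = []
    for session in range(1, n_sessions + 1):
        order = list(HYDRATIONS)
        rng.shuffle(order)  # randomize bake order within the block
        for slot, hydration in enumerate(order, start=1):
            schedule.append((session, slot, hydration))
    return schedule


for row in randomized_block_schedule(3, seed=31):
    print(row)
```

Fixing the seed makes the schedule reproducible, which is presumably the point.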
The dependent variable is quantified via ImageJ thresholding on a standardized cross-section photograph. Each loaf yields four measurements, averaged. Intra-rater reliability (Dresdner scored the same images twice, two weeks apart) is r = 0.91, which he considers acceptable but not ideal. He has discussed blinding the analysis, which would require a second person to photograph the cross-sections while he left the room. His wife Simone has declined twice.
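The self-agreement figure is a Pearson correlation between two scoring passes over the same images. A minimal sketch of that calculation follows. The readings are hypothetical placeholders, not values from Dresdner's dataset, and the function name is an assumption.

```python
import math


def pearson_r(first_pass, second_pass):
    """Pearson correlation between two equal-length lists of scores."""
    if len(first_pass) != len(second_pass) or len(first_pass) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_a = sum(first_pass) / len(first_pass)
    mean_b = sum(second_pass) / len(second_pass)
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(first_pass, second_pass))
    var_a = sum((a - mean_a) ** 2 for a in first_pass)
    var_b = sum((b - mean_b) ** 2 for b in second_pass)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined when one pass has no variation")
    return cov / math.sqrt(var_a * var_b)


# Hypothetical porosity readings (percent open cell area), NOT Dresdner's data.
week_0 = [31.0, 34.5, 36.2, 33.8, 37.1]
week_2 = [31.6, 33.9, 36.8, 34.4, 36.5]
print(round(pearson_r(week_0, week_2), 2))
```

Pearson's r measures linear association only. It would not flag a rater who scored every image a consistent two points high on the second pass.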
The 31-loaf dataset shows a mean porosity of 35.4 percent for the 82 percent condition and 33.1 percent for the 80 percent condition, a raw difference of 2.3 percentage points that looks meaningful on a bar chart and is, statistically speaking, not there. The 95 percent confidence interval for the difference runs from −0.4 to 5.0. Dresdner printed this interval and taped it to the cabinet above his proofing setup.
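The reported figures can be checked against each other with plain arithmetic. A symmetric interval should be centered on the point estimate, and a 95 percent interval that contains zero corresponds to p > 0.05 for the matching two-sided test. The sketch below uses only the numbers quoted above. The variable names are illustrative, and the 0.05 tolerance allows for the figures having been rounded to one decimal place.

```python
import math

mean_82 = 35.4  # percent open cell area, 82 percent hydration (as reported)
mean_80 = 33.1  # percent open cell area, 80 percent hydration (as reported)
ci_low, ci_high = -0.4, 5.0  # reported 95 percent interval for the difference

difference = mean_82 - mean_80
midpoint = (ci_low + ci_high) / 2
half_width = (ci_high - ci_low) / 2

# A symmetric interval should sit on top of the point estimate.
centered = math.isclose(difference, midpoint, abs_tol=0.05)

# Zero inside the 95 percent interval means the two-sided test
# at alpha = 0.05 does not reject the null.
contains_zero = ci_low <= 0.0 <= ci_high

print(f"difference = {difference:.1f}, midpoint = {midpoint:.1f}, "
      f"half-width = {half_width:.1f}")
print(f"centered on estimate: {centered}; contains zero: {contains_zero}")
```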
"The interval doesn't exclude a real effect," he said. "It just doesn't confirm one."
He has sixteen loaves to go, assuming the effect size does not continue to drift, which he has modeled in three scenarios. Scenarios A and B resolve into significance by loaf 47. Scenario C requires renegotiating the alpha threshold, which he has not ruled out but describes as a decision that would need to be very carefully documented. Scenario C is, by his own admission, the most likely.
His daughter Rowan, who is nine and who named P-001 before the relabeling, asked last week whether she could be the cross-section photographer. Dresdner said he would need to assess her inter-rater reliability first. He sent her off with five already-analyzed images and a printed scoring rubric. She returned the sheets forty minutes later. Her correlation with the reference scores was r = 0.94.
He promoted her to co-investigator at dinner. She asked if that meant she got to name the loaves.
The loaves are identified by trial number, he told her.
She said she was calling Trial 32 Gerald.
He said he would note that in the metadata.
AI-generated satirical fiction. Not real news.