
The Baker's Bulletin


Statistician Requires N=847 Loaves for Hydration P-Value; Current Dataset Yields P=0.0847, 'Still Inconclusive'

Applied biostatistician Marcus Vreeland has baked 412 loaves across a 14-month pre-registered hydration RCT and reports that the crumb porosity data, while 'directionally consistent with H1,' does not yet meet the pre-registered α=0.05 threshold required to conclude anything. He expects to reach N=847 by August.

At 6:14am on a Tuesday in February, Marcus Vreeland removed Trial Loaf #412 from a pre-heated 5.5-quart Le Creuset at 500°F and placed it on a wire cooling rack beside a printed copy of his trial registration, a micrometer caliper, and a laminated sheet containing the power analysis he'd run in October 2024, which established that distinguishing a 0.2-percentage-point hydration difference at 80% power would require a minimum sample of 847 loaves.

The loaf was extraordinary. The crust knocked hollow. The crumb (open, irregular, shatteringly glutinous) would have drawn a comment thread on any of the major bread forums. Vreeland photographed the cross-section at the standardized 65-minute post-bake mark, uploaded it to the scoring spreadsheet, and received a porosity rating of 7.3. He updated the master dataset and ran the t-test.

"The p-value came in at 0.0847," he said, with the careful neutrality of a man who has been through this before. "That's directionally consistent with H1. The effect is real. I just can't publish the effect."

Vreeland spent eleven years as a senior biostatistician at a contract research organization before leaving what he describes as "the fraudulent p-hacking culture of pharmaceutical trials" to pursue what he calls "genuinely clean science." He began the Hydration Efficacy Study, formally pre-registered in a Google Doc titled HES-v3.2_FINAL_FINAL, in January 2025. The study tests a single experimental manipulation: whether 82.7% hydration produces statistically superior open crumb development versus an 82.5% control, as measured by the porosity score.

The starter is named Null Hypothesis. He feeds it 1:1:1 at 8am and 8pm, maintains ambient temperature at 73.4°F using an Inkbird IBS-TH2 Pro with a secondary probe zip-tied to the jar, and logs every feed in a conditionally formatted Google Sheet that highlights deviations greater than 0.3°F in amber. The levain builds at 78.1°F. Bulk fermentation runs 4.5 hours at 76°F.

The process has not changed since trial commencement. "If I change the process mid-trial, I invalidate the dataset," he said. "That's a protocol violation. You'd have to restart from enrollment."

His wife Priya, who holds a master's degree in public health and initially found the trial "conceptually interesting," estimates she has eaten between 380 and 400 of the study loaves. "The bread is genuinely excellent," she said. "That's the part that's hard to explain to people. He's not failing. Every loaf is excellent. He just can't tell you which one is *statistically* excellent."

The power analysis has been revised three times. The original target of 312 loaves was derived from a 14-loaf pilot that Vreeland now characterizes as "severely underpowered and honestly embarrassing." The second version raised the target to 611 after he concluded his pilot effect-size estimate was optimistic. The current figure of 847 emerged from a call with a former CRO colleague who works on clinical endpoint validation and who agreed that for a continuous porosity variable with this variance profile, you really do need to go bigger. "He understood immediately why 612 would be insufficient," Vreeland said.

As of this writing, the dataset contains 412 observations. The confidence interval is wide. Vreeland knows precisely how wide: he has it memorized in both 95% and 90% variants, because he briefly considered lowering the threshold and then pre-registered a justification document for why he would not. He has also pre-registered a follow-on study, HES-Extension-v1.0, which will examine whether a two-minute reduction in autolyse duration interacts with the hydration effect; it is contingent on the primary trial achieving significance. Preliminary power calculations suggest 600 loaves minimum. He has not ruled out 900.

Null Hypothesis peaked at 5:47am and began its slow recession toward baseline. Vreeland was awake. He logged it. He cross-referenced the humidity reading (62.4% RH) against the 90-day moving average, noted a non-significant deviation, and initialed the observation log. Trial Loaf #413 went into bulk fermentation at 7am.

"The effect is there," he said, examining the Loaf #412 cross-section under a USB microscope at 40× magnification and adjusting the focal ring with the deliberate patience of a man studying evidence he is confident will eventually vindicate him. "I can see it. I just can't prove it yet." He paused. "Which is exactly what you'd expect at N=412."


AI-generated satirical fiction. Not real news.


© 2026 winkl

*winkl intentionally contains content that may be completely and utterly ridiculous.