The Methodology
How Direct Index Club estimates correlation and calculates fee savings — and the research behind it.
1. The Compounding Cost of Fees
Expense ratios are deducted daily from a fund's NAV — silently, invisibly. They don't appear on your brokerage statement as a line item. You just end up with less.
The compounding effect makes this painful over time. Every dollar paid in fees in year 1 is a dollar that can no longer compound. By year 30 (at 7% growth), that dollar would have become $7.61. You're not just losing the fee — you're losing everything it would have grown into.
Assumes a 7% annual return before fees. Growth after fees over n years is modelled as (1.07 − expense_ratio)^n; fee drag is the gap between that and 1.07^n.
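As a rough illustration of the model stated above, the following sketch compares growth with and without an expense ratio. The function names, the 0.50% expense ratio, and the $100,000 starting balance are hypothetical values chosen for the example, not figures from this page.

```python
def value_after_fees(initial, expense_ratio, years, gross_return=0.07):
    """Ending value when the expense ratio is subtracted from the annual return.

    Implements initial * (1 + gross_return - expense_ratio) ** years,
    i.e. the (1.07 - expense_ratio)^n model with a 7% default gross return.
    """
    return initial * (1 + gross_return - expense_ratio) ** years


def fee_drag(initial, expense_ratio, years, gross_return=0.07):
    """Difference between the no-fee ending value and the after-fee ending value."""
    no_fee_value = initial * (1 + gross_return) ** years
    return no_fee_value - value_after_fees(initial, expense_ratio, years, gross_return)


if __name__ == "__main__":
    # Hypothetical inputs: $100,000 held for 30 years in a fund charging 0.50%.
    initial, expense_ratio, years = 100_000, 0.005, 30
    print(f"Value with no fees: {value_after_fees(initial, 0.0, years):,.0f}")
    print(f"Value after fees:   {value_after_fees(initial, expense_ratio, years):,.0f}")
    print(f"Fee drag:           {fee_drag(initial, expense_ratio, years):,.0f}")
```

Setting the expense ratio to zero and the starting amount to $1 reproduces the $7.61 figure quoted above for a dollar compounding at 7% for 30 years.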
2. How Many Stocks Do You Actually Need?
The common assumption is that ETFs provide indispensable diversification you can't replicate. The research tells a more nuanced story.
The first major study to quantify diversification found that idiosyncratic (company-specific) risk drops sharply after just 10–15 randomly selected stocks. By 20 stocks, most unsystematic risk has been eliminated, and the marginal benefit of each additional stock diminishes rapidly.
Another study examined the variability of returns across different portfolio sizes. It found that a randomly constructed portfolio of 32 stocks captures 95% of the variance reduction achievable with the full NYSE. This became the origin of the "30 stocks = diversified" rule of thumb.
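The diminishing benefit described above can be seen in a stylised textbook model. This sketch is not the methodology of either study: it assumes an equal-weighted portfolio in which every stock has the same variance and every pair of stocks has the same correlation. The default values (30% volatility, 0.25 correlation) are hypothetical.

```python
def portfolio_variance(n, stock_variance=0.09, avg_correlation=0.25):
    """Variance of an equal-weighted portfolio of n stocks.

    Assumes identical variance for every stock and one common pairwise
    correlation. The first term is the systematic floor that no amount of
    diversification removes; the second is idiosyncratic and shrinks as 1/n.
    """
    systematic = avg_correlation * stock_variance
    idiosyncratic = (1 - avg_correlation) * stock_variance / n
    return systematic + idiosyncratic


def diversifiable_risk_removed(n, stock_variance=0.09, avg_correlation=0.25):
    """Share of the diversifiable variance eliminated by holding n stocks."""
    single_stock = portfolio_variance(1, stock_variance, avg_correlation)
    floor = avg_correlation * stock_variance  # limit as n grows without bound
    current = portfolio_variance(n, stock_variance, avg_correlation)
    return (single_stock - current) / (single_stock - floor)


if __name__ == "__main__":
    for n in (1, 5, 10, 15, 20, 32, 100, 500):
        share = diversifiable_risk_removed(n)
        print(f"{n:>3} stocks: {share:.1%} of diversifiable variance removed")
```

In this simplified model the share removed works out to 1 − 1/n, so moving from 20 to 500 stocks adds only a few percentage points. That matches the shape of the empirical findings, though not necessarily their exact numbers.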
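Here's the key insight: index funds themselves do not always hold every constituent. The Vanguard and iShares S&P 500 funds generally replicate the index in full, but for broader indexes with thousands of securities, providers commonly use optimised sampling: picking a subset that represents the key sector, size, and style characteristics of the index. They do it because a well-chosen subset tracks the index closely.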
3. The Weight Concentration Effect
In market-cap weighted indexes, weight is heavily concentrated in the top holdings.
This concentration means the bottom half of an ETF's holdings has minimal impact on returns. Owning the top 10–20 stocks, cap-weighted, gives you the lion's share of the ETF's return characteristics — at a fraction of the complexity.
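To make the concentration idea concrete, here is a minimal sketch that measures how much of a fund's weight the top k holdings cover and rescales those holdings into a cap-weighted basket. The tickers and weights are invented for illustration and do not describe any real ETF.

```python
def top_k_coverage(weights, k):
    """Fraction of total fund weight held in the k largest positions."""
    ordered = sorted(weights.values(), reverse=True)
    return sum(ordered[:k]) / sum(ordered)


def basket_weights(weights, k):
    """Keep the k largest holdings and rescale their weights to sum to 1.

    This preserves the relative (cap-weighted) proportions of the top holdings.
    """
    top = sorted(weights.items(), key=lambda item: item[1], reverse=True)[:k]
    total = sum(weight for _, weight in top)
    return {ticker: weight / total for ticker, weight in top}


if __name__ == "__main__":
    # Hypothetical ETF holdings (ticker -> index weight).
    holdings = {
        "AAA": 0.30, "BBB": 0.20, "CCC": 0.15, "DDD": 0.10,
        "EEE": 0.10, "FFF": 0.08, "GGG": 0.07,
    }
    print(f"Top 3 coverage: {top_k_coverage(holdings, 3):.0%}")
    for ticker, weight in basket_weights(holdings, 3).items():
        print(f"{ticker}: {weight:.1%}")
```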
4. How We Estimate R²
R² (coefficient of determination) measures how closely your basket tracks the ETF. An R² of 0.97 means 97% of the ETF's return variance is explained by your basket.
This is a conservative heuristic. The 0.50 baseline reflects that any portfolio of the right sector will have significant correlation to the ETF. The coverage multiplier captures the incremental tracking improvement from including higher-weight holdings. Real-world R² values are typically higher than our estimates — our tool is intentionally conservative.
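The heuristic gives an estimate. For comparison, R² as defined in this section can also be measured directly from historical returns. The sketch below does not reproduce the tool's heuristic. It relies on the standard result that, for a simple linear regression of ETF returns on basket returns, R² equals the squared Pearson correlation. The return series shown are made up.

```python
def r_squared(basket_returns, etf_returns):
    """R² of a simple linear regression of ETF returns on basket returns.

    For a one-variable regression this equals the squared Pearson correlation
    between the two return series.
    """
    n = len(etf_returns)
    if n != len(basket_returns) or n < 2:
        raise ValueError("Need two return series of equal length (at least 2 points).")

    mean_b = sum(basket_returns) / n
    mean_e = sum(etf_returns) / n
    covariance = sum((b - mean_b) * (e - mean_e)
                     for b, e in zip(basket_returns, etf_returns))
    var_b = sum((b - mean_b) ** 2 for b in basket_returns)
    var_e = sum((e - mean_e) ** 2 for e in etf_returns)
    if var_b == 0 or var_e == 0:
        raise ValueError("R² is undefined when a series has zero variance.")

    return covariance ** 2 / (var_b * var_e)


if __name__ == "__main__":
    # Hypothetical periodic returns for a basket and the ETF it tries to track.
    basket = [0.012, -0.008, 0.021, 0.004, -0.015, 0.009]
    etf = [0.010, -0.006, 0.018, 0.005, -0.012, 0.007]
    print(f"R² = {r_squared(basket, etf):.3f}")
```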
5. Practical Considerations
ETFs rebalance automatically. Your basket drifts. Review quarterly and trim/add positions to maintain target weights.
You control when you realise gains and losses — useful for tax-loss harvesting. More transactions to track, but more flexibility.
With fractional shares, you can start small. But fee savings only become meaningful at around $25K and above, because the absolute dollar amounts saved are low at smaller portfolio sizes.
A 15-stock basket has more idiosyncratic risk than a 500-stock fund. You are accepting slightly more single-stock risk in exchange for zero fees.
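The quarterly review mentioned above amounts to comparing each position's current value with its target share of the basket. This is a minimal sketch under simplifying assumptions: it ignores taxes, trading costs, and minimum trade sizes, and the tickers, values, and target weights are hypothetical.

```python
def rebalance_trades(position_values, target_weights):
    """Dollar amount to buy (positive) or sell (negative) per position.

    position_values: current market value of each holding.
    target_weights: desired weights, expected to sum to 1.
    """
    if abs(sum(target_weights.values()) - 1.0) > 1e-9:
        raise ValueError("Target weights must sum to 1.")

    total_value = sum(position_values.values())
    return {
        ticker: weight * total_value - position_values.get(ticker, 0.0)
        for ticker, weight in target_weights.items()
    }


if __name__ == "__main__":
    # Hypothetical basket that has drifted away from its targets.
    current = {"AAA": 6_000.0, "BBB": 2_500.0, "CCC": 1_500.0}
    targets = {"AAA": 0.50, "BBB": 0.30, "CCC": 0.20}
    for ticker, amount in rebalance_trades(current, targets).items():
        action = "buy" if amount > 0 else "sell"
        print(f"{ticker}: {action} {abs(amount):,.2f}")
```

Because the buys and sells are computed against the same total value, they net to zero, so no new cash is needed to restore the targets.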
The full paper is on Substack
Deep dive into the compounding maths, the academic research, real-world examples with the defence and cybersecurity sectors, and how to build your first basket step by step.
Read on Substack →
For educational purposes only. Not financial advice. Past performance does not guarantee future results.