What is direct indexing?

Direct indexing is the practice of owning the individual stocks that make up an index or ETF, rather than buying the ETF itself. This eliminates the fund's expense ratio (annual management fee), which can save thousands of dollars over time through compounding.

How many stocks do I need to replicate an ETF?

Research suggests that 15–20 stocks representing the top holdings of a sector ETF achieve an R² above 0.95 against the full fund. For broad-market ETFs like SPY, 50 stocks achieve an R² of about 0.97–0.98.

How much money do I save by direct indexing?

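It depends on the ETF's expense ratio. On a $100K investment, replicating SPY (0.09%/yr) saves about $90 in the first year, which is modest. Replicating ITA, a defence ETF (0.40%/yr), saves about $400 in the first year; at an assumed 7% annual return, that gap compounds to roughly $81,000 over 30 years.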

Is direct indexing free?

Yes. Direct Index Club is completely free to use: no account, no signup, no fees. Most major brokers also offer commission-free stock trading, so building the basket itself typically costs nothing in commissions.

What ETFs can I replicate?

Direct Index Club currently supports SPY (S&P 500), QQQ (Nasdaq-100), ITA (US Defence), XLV (Healthcare), XLE (Energy), and HACK (Cybersecurity), with more being added.


The Methodology

How Direct Index Club estimates correlation and calculates fee savings — and the research behind it.

Read the full analysis on Substack

Deep dive: the compounding cost of fees, the research on portfolio replication, and how to build your first basket.


1. The Compounding Cost of Fees

Expense ratios are deducted daily from a fund's NAV — silently, invisibly. They don't appear on your brokerage statement as a line item. You just end up with less.

The compounding effect makes this painful over time. Every dollar paid in fees in year 1 is a dollar that can no longer compound. By year 30 (at 7% growth), that dollar would have become $7.61. You're not just losing the fee — you're losing everything it would have grown into.

$100,000 invested at 7% annual return — value after N years

Expense Ratio	Example	10yr	20yr	30yr	Lost vs DIY (30yr)
0.00%	DIY Basket	$196,715	$386,968	$761,226	—
0.03%	VOO	$196,164	$384,804	$754,849	-$6,377
0.09%	SPY	$195,067	$380,510	$742,249	-$18,976
0.10%	XLV / XLE	$194,884	$379,799	$740,169	-$21,056
0.20%	QQQ	$193,069	$372,756	$719,677	-$41,549
0.40%	ITA (Defence)	$189,484	$359,041	$680,325	-$80,901
0.60%	HACK (Cyber)	$185,959	$345,806	$643,056	-$118,169
0.75%	ARKK	$183,354	$336,185	$616,408	-$144,818

Assumes a 7% annual return before fees. Each value is $100,000 × (1.07 − expense_ratio)^n, i.e. the fee is subtracted from the gross return every year.
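The table can be reproduced in a few lines. This is a sketch of the model stated above (fee subtracted from a 7% gross return each year), not the site's own code; the function names are our own.

```python
def value_after(years: int, expense_ratio: float,
                principal: float = 100_000.0, gross_return: float = 0.07) -> float:
    """Value under the table's model: the fee is subtracted from the gross
    return each year, i.e. principal * (1 + gross_return - expense_ratio) ** years."""
    return principal * (1 + gross_return - expense_ratio) ** years


def lost_vs_diy(years: int, expense_ratio: float) -> float:
    """Dollar gap between a zero-fee basket and a fund charging expense_ratio."""
    return value_after(years, 0.0) - value_after(years, expense_ratio)


if __name__ == "__main__":
    # Expense ratios as decimals: 0.40% -> 0.0040
    for label, er in [("DIY Basket", 0.0), ("SPY", 0.0009), ("ITA", 0.0040)]:
        print(f"{label:<10}  30yr: ${value_after(30, er):>9,.0f}"
              f"  lost vs DIY: ${lost_vs_diy(30, er):>7,.0f}")
```

Changing `gross_return` shows how much the dollar gap depends on the 7% assumption.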

2. How Many Stocks Do You Actually Need?

The common assumption is that ETFs provide indispensable diversification you can't replicate. The research tells a more nuanced story.

Evans & Archer (1968)

The first major study to quantify diversification. Found that idiosyncratic (company-specific) risk drops sharply as a portfolio grows to 10–15 randomly selected stocks. By 20 stocks, most unsystematic risk has been eliminated, and the marginal benefit of each additional stock diminishes rapidly.

Fisher & Lorie (1970)

Studied how the variability of returns changes with portfolio size. Found that a randomly constructed portfolio of 32 stocks captures 95% of the variance reduction achievable with the full NYSE. This result is often cited as the origin of the "30 stocks = diversified" rule of thumb.
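A textbook idealisation (not the method either study used) shows why the benefit fades so quickly. Assume every stock has the same variance and every pair the same covariance; the variance of an equal-weighted n-stock portfolio is then avg_var/n + (1 − 1/n) × avg_cov. The function names and inputs below are illustrative.

```python
def equal_weight_variance(n: int, avg_var: float, avg_cov: float) -> float:
    """Variance of an equal-weighted n-stock portfolio, assuming every stock has
    variance avg_var and every pair of stocks has covariance avg_cov."""
    return avg_var / n + (1 - 1 / n) * avg_cov


def share_of_reduction(n: int, avg_var: float, avg_cov: float) -> float:
    """Fraction of the maximum variance reduction (the n -> infinity limit, where
    only avg_cov remains) captured by n stocks. Requires avg_var > avg_cov."""
    single = equal_weight_variance(1, avg_var, avg_cov)
    return (single - equal_weight_variance(n, avg_var, avg_cov)) / (single - avg_cov)


if __name__ == "__main__":
    # Illustrative inputs: 35% single-stock volatility, 0.03 average covariance
    for n in (1, 5, 10, 20, 32, 100):
        print(f"{n:>3} stocks: {share_of_reduction(n, 0.35 ** 2, 0.03):.1%}")
```

Algebraically the captured share simplifies to 1 − 1/n whatever the inputs: 10 stocks capture 90% and 20 stocks 95% of the possible variance reduction. Real stocks differ in volatility and correlation, which is one reason the empirical studies report somewhat different figures.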

Optimised Sampling (modern ETF practice)

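Here's the key insight: large index managers such as Vanguard and iShares don't always buy every stock in the index they track. For very broad benchmarks, such as the total-market index behind Vanguard's Total Stock Market fund, they use optimised (or representative) sampling: holding a subset chosen to match the index's key sector, size, and style characteristics. (Their flagship S&P 500 funds, by contrast, generally hold every constituent.) Sampling works because those characteristics drive most of an index's return, while the many small, hard-to-trade positions add cost but little tracking accuracy.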
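Real optimised sampling relies on proprietary risk models and optimisers. As a much simpler stand-in, the sketch below does stratified sampling by sector: it keeps the largest names in each sector and scales them up so every sector keeps its index weight. The function name, tickers, and weights are hypothetical.

```python
from collections import defaultdict


def stratified_sample(holdings, per_sector):
    """holdings: iterable of (ticker, sector, index_weight).

    Keeps the `per_sector` largest names in each sector, then scales them up so
    each sector keeps its full index weight. Returns weights that sum to 1."""
    by_sector = defaultdict(list)
    for ticker, sector, weight in holdings:
        by_sector[sector].append((ticker, weight))

    basket = {}
    for names in by_sector.values():
        sector_weight = sum(w for _, w in names)
        chosen = sorted(names, key=lambda tw: tw[1], reverse=True)[:per_sector]
        chosen_weight = sum(w for _, w in chosen)
        for ticker, w in chosen:
            # Scale within the sector so the sector's total weight is preserved
            basket[ticker] = w / chosen_weight * sector_weight

    total = sum(basket.values())
    return {ticker: w / total for ticker, w in basket.items()}


if __name__ == "__main__":
    # Hypothetical tickers and weights, for illustration only
    index = [
        ("AAA", "Tech", 0.30), ("BBB", "Tech", 0.10), ("CCC", "Tech", 0.05),
        ("DDD", "Health", 0.20), ("EEE", "Health", 0.15), ("FFF", "Energy", 0.20),
    ]
    print(stratified_sample(index, per_sector=1))
```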

3. The Weight Concentration Effect

In market-cap-weighted indexes, weight is heavily concentrated in the top holdings. Approximate share of total fund weight:

Holdings	SPY	ITA
Top 10	~35%	~80%
Top 15	~45%	~90%+
Bottom half	~10%	~5%

This concentration means the bottom half of an ETF's holdings has minimal impact on returns. Owning the top 10–20 stocks, cap-weighted, gives you the lion's share of the ETF's return characteristics — at a fraction of the complexity.
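Coverage figures like these can be computed from an ETF's published holdings. A minimal sketch, assuming you have the holding weights as a list of numbers; the function names and the sample weights are hypothetical.

```python
def top_n_coverage(weights, n):
    """Share of total fund weight held in the n largest positions."""
    ordered = sorted(weights, reverse=True)
    return sum(ordered[:n]) / sum(ordered)


def bottom_half_share(weights):
    """Share of total fund weight held in the smallest half of positions."""
    ordered = sorted(weights, reverse=True)
    half = len(ordered) // 2
    return sum(ordered[len(ordered) - half:]) / sum(ordered)


if __name__ == "__main__":
    # Hypothetical 40-holding fund whose weights shrink by 20% at each rank
    weights = [0.8 ** rank for rank in range(40)]
    print(f"Top 10: {top_n_coverage(weights, 10):.0%}")
    print(f"Top 15: {top_n_coverage(weights, 15):.0%}")
    print(f"Bottom half: {bottom_half_share(weights):.0%}")
```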

4. How We Estimate R²

R² (coefficient of determination) measures how closely your basket tracks the ETF. An R² of 0.97 means 97% of the ETF's return variance is explained by your basket.
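When actual return histories for a basket and its ETF are available, R² can be measured directly instead of estimated. For a regression of ETF returns on basket returns with an intercept, R² equals the squared Pearson correlation of the two series. A minimal sketch, assuming two equal-length lists of periodic (e.g. daily) returns; the function name and sample data are our own.

```python
def r_squared(basket_returns, etf_returns):
    """R² of an ordinary least-squares regression of ETF returns on basket
    returns (with intercept): the squared Pearson correlation of the series."""
    if len(basket_returns) != len(etf_returns) or len(etf_returns) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    n = len(etf_returns)
    mean_b = sum(basket_returns) / n
    mean_e = sum(etf_returns) / n
    cov = sum((b - mean_b) * (e - mean_e) for b, e in zip(basket_returns, etf_returns))
    var_b = sum((b - mean_b) ** 2 for b in basket_returns)
    var_e = sum((e - mean_e) ** 2 for e in etf_returns)
    return cov * cov / (var_b * var_e)


if __name__ == "__main__":
    # Hypothetical daily returns, for illustration only
    basket = [0.012, -0.008, 0.004, 0.010, -0.015, 0.007]
    etf = [0.011, -0.007, 0.005, 0.009, -0.013, 0.006]
    print(f"R² = {r_squared(basket, etf):.3f}")
```

Measuring over a few years of daily returns and comparing the result with the heuristic estimate is one way to check how conservative the estimate is for a given basket.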

// R² estimation formula
coverage = selectedWeight / totalHoldingsWeight
R² = min(0.50 + coverage × 0.49, 0.99)

This is a conservative heuristic. The 0.50 baseline reflects that any portfolio of the right sector will have significant correlation to the ETF. The coverage multiplier captures the incremental tracking improvement from including higher-weight holdings. Real-world R² values are typically higher than our estimates — our tool is intentionally conservative.
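The heuristic translates directly into Python. A sketch only; the function and argument names mirror the pseudocode but are our own.

```python
def estimate_r_squared(selected_weight: float, total_holdings_weight: float) -> float:
    """Heuristic R² estimate: 0.50 baseline plus up to 0.49 for weight coverage,
    capped at 0.99. Both weights must use the same units (e.g. both percent)."""
    if total_holdings_weight <= 0:
        raise ValueError("total_holdings_weight must be positive")
    coverage = selected_weight / total_holdings_weight
    return min(0.50 + coverage * 0.49, 0.99)


if __name__ == "__main__":
    # Approximate top-10 coverage figures quoted in section 3
    print(f"SPY top 10: {estimate_r_squared(35, 100):.2f}")
    print(f"ITA top 10: {estimate_r_squared(80, 100):.2f}")
```

With the approximate top-10 weights from section 3, this gives about 0.67 for SPY and 0.89 for ITA. Because coverage cannot normally exceed 1, the 0.99 cap only binds if the selected weight is larger than the total.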

5. Practical Considerations

Rebalancing

An ETF follows index changes (additions, deletions, weight caps) automatically; your basket does not, so it drifts from its targets over time. Review quarterly and trim or add positions to maintain target weights.

Tax efficiency

You control when you realise gains and losses — useful for tax-loss harvesting. More transactions to track, but more flexibility.

Minimum size

With fractional shares, you can start small. But the savings only become meaningful at around $25K and up: on $5K, even a 0.40% expense ratio costs just $20 a year.

Concentration risk

A 15-stock basket has more idiosyncratic risk than a 500-stock fund. You are accepting slightly more single-stock risk in exchange for zero fees.
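The quarterly review described under Rebalancing comes down to a small calculation. A minimal sketch, assuming you already have target weights (for example, the ETF's top-holding weights rescaled to sum to 1) and the current market value of each position; it ignores fractional-share limits, trading costs, and taxes, and the function name and tickers are hypothetical.

```python
def rebalance_trades(current_values, target_weights, cash=0.0):
    """Dollar amount to buy (+) or sell (-) per ticker so holdings match
    target_weights. current_values: {ticker: market value}; target_weights:
    {ticker: weight}, summing to 1. New cash is invested as well; tickers
    missing from target_weights are sold in full."""
    total = sum(current_values.values()) + cash
    tickers = set(current_values) | set(target_weights)
    return {
        t: target_weights.get(t, 0.0) * total - current_values.get(t, 0.0)
        for t in sorted(tickers)
    }


if __name__ == "__main__":
    # Hypothetical basket: AAA has run up, CCC has left the index
    current = {"AAA": 6_000.0, "BBB": 3_000.0, "CCC": 1_000.0}
    target = {"AAA": 0.5, "BBB": 0.5}
    for ticker, amount in rebalance_trades(current, target).items():
        if abs(amount) < 0.01:
            continue  # already on target
        print(f"{ticker}: {'buy' if amount > 0 else 'sell'} ${abs(amount):,.2f}")
```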


For educational purposes only. Not financial advice. Past performance does not guarantee future results.