Rethinking yfinance's default MultiIndex format

Table of Contents
MultiIndex
DataFrame structure in Pandas, and why I prefer this approach over the default column-based structure that yfinance provides.When I first started working with financial data in Python, I quickly discovered that the default DataFrame structure from yfinance wasn’t ideal for the kind of analysis I wanted to do.
The solution?
A simple reshaping trick that makes working with multiple stock tickers dramatically easier.
Read on to learn more.
This tutorial is part 2 in my series on getting started with fintech and market analysis with Python:
- How to download market data with yfinance and Python
- Rethinking yfinance’s default MultiIndex format (this tutorial)
- How to plot candlestick charts with Python and mplfinance
- How to compute Simple Moving Averages (SMAs) for trading with Python and Pandas
- Finding consecutive integer groups in arrays with Python and NumPy
- Computing slope of series with Pandas and SciPy
- Market stage detection with Python and Pandas
- Implementing TradingView’s Stochastic RSI indicator in Python
- Introduction to position sizing
- Risk/Reward analysis and position sizing with Python
Configuring your development environment #
Before we dive in, let’s set up our Python environment with the packages we’ll need:
$ pip install numpy pandas yfinance
The packages we’re installing include:
- numpy: The fundamental package for scientific computing in Python
- pandas: Provides high-performance, easy-to-use data structures and data analysis tools
- yfinance: A library that lets us download market data from Yahoo Finance
With our development environment configured, we can move on to writing some actual code.
Configuring our date range and ticker #
To start, let’s import our required Python packages and configure a few key variables:
# import the necessary packages
from datetime import timedelta
from datetime import datetime
import yfinance as yf
import pandas as pd
# set the start and end dates for our market data
end_date = datetime(year=2025, month=3, day=1)
start_date = end_date - timedelta(days=365)
# set the name of the ticker we want to download market data for
ticker = ["NVDA", "AAPL"]
The start_date
and end_date
parameters define the timeframe for our historical data request, spanning one year of trading data.
We’re using two popular tech stock tickers (NVIDIA and Apple) for this example (but you could use any valid ticker symbols).
Understanding yfinance’s default multi-index structure #
Let’s start by downloading some data using yfinance and examining its default structure:
# download daily market data
df_orig = yf.download(
tickers=ticker,
start=start_date,
end=end_date,
interval="1d",
auto_adjust=True,
progress=False
)
df_orig
Which should give us a DataFrame that looks like the following:
Price | Close | High | Low | Open | Volume | |||||
---|---|---|---|---|---|---|---|---|---|---|
Ticker | AAPL | NVDA | AAPL | NVDA | AAPL | NVDA | AAPL | NVDA | AAPL | NVDA |
Date | ||||||||||
2024-03-01 | 178.815674 | 82.248108 | 179.681580 | 82.269104 | 176.546390 | 79.405174 | 178.706190 | 79.969964 | 73488000 | 479135000 |
2024-03-04 | 174.277115 | 85.205002 | 176.068644 | 87.662079 | 172.973259 | 83.687574 | 175.322168 | 84.098415 | 81510100 | 615616000 |
2024-03-05 | 169.320496 | 85.935760 | 171.231471 | 86.068720 | 168.822845 | 83.389600 | 169.957487 | 85.241988 | 95132400 | 520639000 |
2024-03-06 | 168.325180 | 88.670860 | 170.435227 | 89.694526 | 167.887245 | 87.001411 | 170.256065 | 87.993089 | 68587700 | 582520000 |
2024-03-07 | 168.205750 | 92.638550 | 169.927615 | 92.736517 | 167.698152 | 89.572556 | 168.355038 | 90.128373 | 71765100 | 608119000 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2025-02-24 | 247.100006 | 130.268021 | 248.860001 | 138.577254 | 244.419998 | 130.068042 | 244.929993 | 136.547442 | 51326400 | 251381100 |
2025-02-25 | 247.039993 | 126.618355 | 250.000000 | 130.188026 | 244.910004 | 124.428561 | 248.000000 | 129.968045 | 48013300 | 271428700 |
2025-02-26 | 240.360001 | 131.267929 | 244.979996 | 133.717701 | 239.130005 | 128.478192 | 244.330002 | 129.978054 | 44433600 | 322553800 |
2025-02-27 | 237.300003 | 120.138954 | 242.460007 | 134.997581 | 237.059998 | 119.998968 | 239.410004 | 134.987587 | 41153600 | 443175800 |
2025-02-28 | 241.839996 | 124.908508 | 242.089996 | 125.078491 | 230.199997 | 116.389295 | 236.949997 | 118.009141 | 56833400 | 389091100 |
Let’s examine the parameters I’m using with yf.download()
:
- A list of ticker symbols passed to
tickers
- Daily data frequency with
interval="1d"
- Price adjustments for splits and dividends via
auto_adjust=True
- Quiet mode with
progress=False
to avoid cluttering the output
The structure of this DataFrame is what I want to focus on — let’s examine the row index first:
# show the original index structure
df_orig.index
This gives us:
DatetimeIndex(['2024-03-01', '2024-03-04', '2024-03-05', '2024-03-06',
'2024-03-07', '2024-03-08', '2024-03-11', '2024-03-12',
'2024-03-13', '2024-03-14',
...
'2025-02-14', '2025-02-18', '2025-02-19', '2025-02-20',
'2025-02-21', '2025-02-24', '2025-02-25', '2025-02-26',
'2025-02-27', '2025-02-28'],
dtype='datetime64[ns]', name='Date', length=250, freq=None)
As we can see, the row index is a simple DatetimeIndex
with one row per trading day.
Now, let’s look at the columns:
# show the original columns
df_orig.columns
The output reveals the column structure which is considerably more complex than the row structure:
MultiIndex([( 'Close', 'AAPL'),
( 'Close', 'NVDA'),
( 'High', 'AAPL'),
( 'High', 'NVDA'),
( 'Low', 'AAPL'),
( 'Low', 'NVDA'),
( 'Open', 'AAPL'),
( 'Open', 'NVDA'),
('Volume', 'AAPL'),
('Volume', 'NVDA')],
names=['Price', 'Ticker'])
Here we see that the columns are organized as a MultiIndex
with two levels:
- The first level represents the price type (Open, High, Low, Close, Volume)
- The second level represents the ticker symbol (AAPL, NVDA)
We can access all closing prices for both stocks using:
df_orig["Close"]
Note how both the closing prices for APPL and NVDA are included in the output:
Ticker | AAPL | NVDA |
---|---|---|
Date | ||
2024-03-01 | 178.815674 | 82.248108 |
2024-03-04 | 174.277115 | 85.205002 |
2024-03-05 | 169.320496 | 85.935760 |
2024-03-06 | 168.325180 | 88.670860 |
2024-03-07 | 168.205750 | 92.638550 |
... | ... | ... |
2025-02-24 | 247.100006 | 130.268021 |
2025-02-25 | 247.039993 | 126.618355 |
2025-02-26 | 240.360001 | 131.267929 |
2025-02-27 | 237.300003 | 120.138954 |
2025-02-28 | 241.839996 | 124.908508 |
Alternatively, we can grab just the closing prices for Apple:
df_orig["Close"]["AAPL"]
Which gives us us a Series rather than a DataFrame:
Date
2024-03-01 178.815674
2024-03-04 174.277115
2024-03-05 169.320496
2024-03-06 168.325180
2024-03-07 168.205750
...
2025-02-24 247.100006
2025-02-25 247.039993
2025-02-26 240.360001
2025-02-27 237.300003
2025-02-28 241.839996
Name: AAPL, Length: 250, dtype: float64
Pros and cons of yfinance’s default multi-index structure #
While the default yfinance structure works fine for a few tickers, it has some serious limitations:
- The number of columns in the DataFrame increases by 5× as a function of the number of tickers
- With just 2 tickers, we already have 10 columns (5 price types × 2 tickers)
- If you’re tracking 100 stocks, you’d end up with 500 columns, making the DataFrame unwieldy
- Visual debugging becomes nearly impossible with that many columns
- Computing technical indicators (like moving averages) across multiple stocks becomes more complex than it needs to be
That said, the column-based MultiIndex does have certain advantages. It’s excellent for grabbing all closing prices across tickers in a single operation, making it easier to compute cross-sectional metrics like correlations between stocks or sector performance on a given day.
If your analysis primarily focuses on comparing multiple stocks at specific time points, this structure can be quite convenient.
However, for the types of analysis many traders want to do (including myself) — calculating technical indicators, identifying patterns for individual stocks, or running backtests — the column-based approach quickly becomes unwieldy.
Most technical indicators operate on time series data for a single instrument, not across instruments, making the row-based structure I’m about to show you far more practical.
Updating yfinance’s multi-index structure to use row indexes #
I prefer a row-based MultiIndex approach for several reasons:
- It limits the number of columns to just 5 (OHLCV) regardless of how many tickers you track
- The row index now has two levels: ticker and date
- This approach makes it incredibly easy to use Pandas’
groupby
operations for per-ticker calculations - Technical indicators and visualizations become much simpler to implement
Here’s how to transform the default column-based structure to a row-based structure:
# clone the original dataframe
df = df_orig.copy()
# restructure the default multi-index yfinance dataframe by converting from
# wide to long format, renaming the indices, ensuring the columns are provided
# in OHLCV order, reordering the index such that date is first and symbol is
# second, and finally sorting the index itself
df = df.stack(level="Ticker", future_stack=True)
df.index.names = ["Date", "Symbol"]
df = df[["Open", "High", "Low", "Close", "Volume"]]
df = df.swaplevel(0, 1)
df = df.sort_index()
df
Note how the DataFrame structure has changed:
Price | Open | High | Low | Close | Volume | |
---|---|---|---|---|---|---|
Symbol | Date | |||||
AAPL | 2024-03-01 | 178.706190 | 179.681580 | 176.546390 | 178.815674 | 73488000 |
2024-03-04 | 175.322168 | 176.068644 | 172.973259 | 174.277115 | 81510100 | |
2024-03-05 | 169.957487 | 171.231471 | 168.822845 | 169.320496 | 95132400 | |
2024-03-06 | 170.256065 | 170.435227 | 167.887245 | 168.325180 | 68587700 | |
2024-03-07 | 168.355038 | 169.927615 | 167.698152 | 168.205750 | 71765100 | |
... | ... | ... | ... | ... | ... | ... |
NVDA | 2025-02-24 | 136.547442 | 138.577254 | 130.068042 | 130.268021 | 251381100 |
2025-02-25 | 129.968045 | 130.188026 | 124.428561 | 126.618355 | 271428700 | |
2025-02-26 | 129.978054 | 133.717701 | 128.478192 | 131.267929 | 322553800 | |
2025-02-27 | 134.987587 | 134.997581 | 119.998968 | 120.138954 | 443175800 | |
2025-02-28 | 118.009141 | 125.078491 | 116.389295 | 124.908508 | 389091100 |
Here’s what each line is doing:
df.stack(level="Ticker", future_stack=True)
converts from wide to long format, pivoting the ticker column from the column axis to the indexdf.index.names = ["Date", "Symbol"]
renames the index levels for claritydf = df[["Open", "High", "Low", "Close", "Volume"]]
ensures the columns are in the standard OHLCV orderdf.swaplevel(0, 1)
reorders the index so thatSymbol
comes first, thenDate
df.sort_index()
sorts the index for efficient access
Let’s examine our new index structure:
# show the updated index structure
df.index
Which gives us:
MultiIndex([('AAPL', '2024-03-01'),
('AAPL', '2024-03-04'),
('AAPL', '2024-03-05'),
('AAPL', '2024-03-06'),
('AAPL', '2024-03-07'),
('AAPL', '2024-03-08'),
('AAPL', '2024-03-11'),
('AAPL', '2024-03-12'),
('AAPL', '2024-03-13'),
('AAPL', '2024-03-14'),
...
('NVDA', '2025-02-14'),
('NVDA', '2025-02-18'),
('NVDA', '2025-02-19'),
('NVDA', '2025-02-20'),
('NVDA', '2025-02-21'),
('NVDA', '2025-02-24'),
('NVDA', '2025-02-25'),
('NVDA', '2025-02-26'),
('NVDA', '2025-02-27'),
('NVDA', '2025-02-28')],
names=['Symbol', 'Date'], length=500)
Now, our rows are indexed first by Symbol
, then by Date
, creating a hierarchical structure.
The columns are now much simpler:
# as well as the updated columns
df.columns
Index(['Open', 'High', 'Low', 'Close', 'Volume'], dtype='object', name='Price')
Just the five standard OHLCV columns, regardless of how many tickers we’re tracking.
Accessing individual tickers using the new row indexes #
If we want to access just the data for a single ticker, we can use the xs
(cross-section) method:
# display *just* the subset of AAPL market data
df.xs("AAPL")
Which will give us the OHLCV values for AAPL:
Price | Open | High | Low | Close | Volume |
---|---|---|---|---|---|
Date | |||||
2024-03-01 | 178.706190 | 179.681580 | 176.546390 | 178.815674 | 73488000 |
2024-03-04 | 175.322168 | 176.068644 | 172.973259 | 174.277115 | 81510100 |
2024-03-05 | 169.957487 | 171.231471 | 168.822845 | 169.320496 | 95132400 |
2024-03-06 | 170.256065 | 170.435227 | 167.887245 | 168.325180 | 68587700 |
2024-03-07 | 168.355038 | 169.927615 | 167.698152 | 168.205750 | 71765100 |
... | ... | ... | ... | ... | ... |
2025-02-24 | 244.929993 | 248.860001 | 244.419998 | 247.100006 | 51326400 |
2025-02-25 | 248.000000 | 250.000000 | 244.910004 | 247.039993 | 48013300 |
2025-02-26 | 244.330002 | 244.979996 | 239.130005 | 240.360001 | 44433600 |
2025-02-27 | 239.410004 | 242.460007 | 237.059998 | 237.300003 | 41153600 |
2025-02-28 | 236.949997 | 242.089996 | 230.199997 | 241.839996 | 56833400 |
Performing data analysis operations on our new multi-index DataFrame #
The real power of this structure becomes apparent when we start doing analysis.
Let’s compute a 50-day moving average on the closing prices:
# notice how easy it is to analyze OHLCV data for each of the symbols with
# this structure by (1) first grouping on the symbol, followed by (2) applying
# our analysis method via 'transform'
df["50MA"] = df.groupby(level="Symbol")["Close"].transform(
lambda x: x.rolling(window=50).mean()
)
df
Which will add a new 50MA
column to our DataFrame:
Price | Open | High | Low | Close | Volume | 50MA | |
---|---|---|---|---|---|---|---|
Symbol | Date | ||||||
AAPL | 2024-03-01 | 178.706190 | 179.681580 | 176.546390 | 178.815674 | 73488000 | NaN |
2024-03-04 | 175.322168 | 176.068644 | 172.973259 | 174.277115 | 81510100 | NaN | |
2024-03-05 | 169.957487 | 171.231471 | 168.822845 | 169.320496 | 95132400 | NaN | |
2024-03-06 | 170.256065 | 170.435227 | 167.887245 | 168.325180 | 68587700 | NaN | |
2024-03-07 | 168.355038 | 169.927615 | 167.698152 | 168.205750 | 71765100 | NaN | |
... | ... | ... | ... | ... | ... | ... | ... |
NVDA | 2025-02-24 | 136.547442 | 138.577254 | 130.068042 | 130.268021 | 251381100 | 134.377040 |
2025-02-25 | 129.968045 | 130.188026 | 124.428561 | 126.618355 | 271428700 | 134.208256 | |
2025-02-26 | 129.978054 | 133.717701 | 128.478192 | 131.267929 | 322553800 | 134.047671 | |
2025-02-27 | 134.987587 | 134.997581 | 119.998968 | 120.138954 | 443175800 | 133.703902 | |
2025-02-28 | 118.009141 | 125.078491 | 116.389295 | 124.908508 | 389091100 | 133.517319 |
This is where the row-based MultiIndex really shines:
- We group the DataFrame by
Symbol
, creating separate groups for each ticker - We select just the
Close
price column for all tickers - We apply the
transform
method to compute a 50-day moving average for each group - The result is stored in a new
50MA
column that preserves the original index structure
Let’s verify the calculation for AAPL:
# let's investigate the computed 50MA for AAPL
df.xs("AAPL")["50MA"]
Date
2024-03-01 NaN
2024-03-04 NaN
2024-03-05 NaN
2024-03-06 NaN
2024-03-07 NaN
...
2025-02-24 240.485810
2025-02-25 240.476652
2025-02-26 240.359467
2025-02-27 240.151713
2025-02-28 240.031363
Name: 50MA, Length: 250, dtype: float64
The first 49 days show NaN values because a 50-day moving average needs at least 50 days of data. Once we have enough data points, the moving average starts calculating properly.
We can confirm that our groupby approach gives the same result as calculating the moving average directly on AAPL’s closing prices:
# now, let's *manually* compute the 50MA for AAPL (i.e., on a series rather
# than a multi-index dataframe)
df.xs("AAPL")["Close"].rolling(window=50).mean()
Date
2024-03-01 NaN
2024-03-04 NaN
2024-03-05 NaN
2024-03-06 NaN
2024-03-07 NaN
...
2025-02-24 240.485810
2025-02-25 240.476652
2025-02-26 240.359467
2025-02-27 240.151713
2025-02-28 240.031363
Name: Close, Length: 250, dtype: float64
And formally verify they’re identical:
# verify our calculations match
pd.testing.assert_series_equal(
df.xs("AAPL")["50MA"],
df.xs("AAPL")["Close"].rolling(window=50).mean(),
check_names=False
)
No error means the values are identical, so our calculation indeed correct.
Exercises #
Before we wrap up, try these exercises to reinforce what you’ve just learned:
DataFrame scaling test: Download data for 5 different tickers and observe how the DataFrame structure changes and scales with both the default column-based
MultiIndex
and our row-based approach. Note the differences in column count and overall DataFrame shape.Technical indicator calculation: Add a 20-day moving average to the DataFrame using the same approach I detailed earlier in this article. Create a visualization comparing both the 20-day and 50-day moving averages against the closing price.
Signal generation: Create a new column that indicates when the 20-day MA crosses above the 50-day MA. Can you also add a column to identify when the 20-day MA crosses below the 50-day MA. Moving average crossovers are typically used as common buy/sell signals.
RSI implementation (Advanced): Calculate the relative strength index (RSI) for each ticker using the row-based
MultiIndex
structure. You may need to query Google or your favorite LLM to help you learn more about implementing RSI. Test your implementation with different lookback periods (14 days is standard, but try 9 and 21 as well).
Remember, the best way to master these DataFrame manipulations is through hands-on practice—try implementing each example yourself to develop a deeper understanding of how MultiIndex structures can enhance your market analysis workflows.
Final thoughts #
Restructuring yfinance’s default DataFrame to use a row-based MultiIndex
provides significant advantages when working with multiple stock tickers:
- The number of columns stays constant regardless of how many tickers you track
- Group-by operations make it easy to compute technical indicators per ticker
- The structure scales gracefully as you add more tickers to your analysis
- It’s much easier to visually inspect and debug your data
This approach has been my go-to method for organizing stock market data in Python, and I hope it helps streamline your quantitative analysis workflows too.
In the next tutorial, we’ll look at how to visualize this data using candlestick charts with mplfinance
, a stock data plotting library built on matplotlib
.