Rethinking yfinance's default MultiIndex format

Table of Contents

In this tutorial, I’ll show you how to organize stock market data using a row-based MultiIndex DataFrame structure in Pandas, and why I prefer this approach over the default column-based structure that yfinance provides.

When I first started working with financial data in Python, I quickly discovered that the default DataFrame structure from yfinance wasn’t ideal for the kind of analysis I wanted to do.

The solution?

A simple reshaping trick that makes working with multiple stock tickers dramatically easier.

Read on to learn more.

This tutorial is part 2 in my series on getting started with fintech and market analysis with Python:

How to download market data with yfinance and Python
Rethinking yfinance’s default MultiIndex format (this tutorial)
How to plot candlestick charts with Python and mplfinance
How to compute Simple Moving Averages (SMAs) for trading with Python and Pandas
Finding consecutive integer groups in arrays with Python and NumPy
Computing slope of series with Pandas and SciPy
Market stage detection with Python and Pandas
Implementing TradingView’s Stochastic RSI indicator in Python
Introduction to position sizing
Risk/Reward analysis and position sizing with Python

Configuring your development environment #

Before we dive in, let’s set up our Python environment with the packages we’ll need:

$ pip install numpy pandas yfinance

The packages we’re installing include:

numpy: The fundamental package for scientific computing in Python
pandas: Provides high-performance, easy-to-use data structures and data analysis tools
yfinance: A library that lets us download market data from Yahoo Finance

With our development environment configured, we can move on to writing some actual code.

Configuring our date range and ticker #

To start, let’s import our required Python packages and configure a few key variables:

# import the necessary packages
from datetime import timedelta
from datetime import datetime
import yfinance as yf
import pandas as pd

# set the start and end dates for our market data
end_date = datetime(year=2025, month=3, day=1)
start_date = end_date - timedelta(days=365)

# set the name of the ticker we want to download market data for
ticker = ["NVDA", "AAPL"]

The start_date and end_date parameters define the timeframe for our historical data request, spanning one year of trading data.

We’re using two popular tech stock tickers (NVIDIA and Apple) for this example (but you could use any valid ticker symbols).

Understanding yfinance’s default multi-index structure #

Let’s start by downloading some data using yfinance and examining its default structure:

# download daily market data
df_orig = yf.download(
    tickers=ticker,
    start=start_date,
    end=end_date,
    interval="1d",
    auto_adjust=True,
    progress=False
)
df_orig

Which should give us a DataFrame that looks like the following:

Price	Close		High		Low		Open		Volume
Ticker	AAPL	NVDA	AAPL	NVDA	AAPL	NVDA	AAPL	NVDA	AAPL	NVDA
Date
2024-03-01	178.815674	82.248108	179.681580	82.269104	176.546390	79.405174	178.706190	79.969964	73488000	479135000
2024-03-04	174.277115	85.205002	176.068644	87.662079	172.973259	83.687574	175.322168	84.098415	81510100	615616000
2024-03-05	169.320496	85.935760	171.231471	86.068720	168.822845	83.389600	169.957487	85.241988	95132400	520639000
2024-03-06	168.325180	88.670860	170.435227	89.694526	167.887245	87.001411	170.256065	87.993089	68587700	582520000
2024-03-07	168.205750	92.638550	169.927615	92.736517	167.698152	89.572556	168.355038	90.128373	71765100	608119000
...	...	...	...	...	...	...	...	...	...	...
2025-02-24	247.100006	130.268021	248.860001	138.577254	244.419998	130.068042	244.929993	136.547442	51326400	251381100
2025-02-25	247.039993	126.618355	250.000000	130.188026	244.910004	124.428561	248.000000	129.968045	48013300	271428700
2025-02-26	240.360001	131.267929	244.979996	133.717701	239.130005	128.478192	244.330002	129.978054	44433600	322553800
2025-02-27	237.300003	120.138954	242.460007	134.997581	237.059998	119.998968	239.410004	134.987587	41153600	443175800
2025-02-28	241.839996	124.908508	242.089996	125.078491	230.199997	116.389295	236.949997	118.009141	56833400	389091100

Let’s examine the parameters I’m using with yf.download():

A list of ticker symbols passed to tickers
Daily data frequency with interval="1d"
Price adjustments for splits and dividends via auto_adjust=True
Quiet mode with progress=False to avoid cluttering the output

For more details on downloading market data with yfinance, check out the first tutorial in this series.

The structure of this DataFrame is what I want to focus on — let’s examine the row index first:

# show the original index structure
df_orig.index

This gives us:

DatetimeIndex(['2024-03-01', '2024-03-04', '2024-03-05', '2024-03-06',
               '2024-03-07', '2024-03-08', '2024-03-11', '2024-03-12',
               '2024-03-13', '2024-03-14',
               ...
               '2025-02-14', '2025-02-18', '2025-02-19', '2025-02-20',
               '2025-02-21', '2025-02-24', '2025-02-25', '2025-02-26',
               '2025-02-27', '2025-02-28'],
              dtype='datetime64[ns]', name='Date', length=250, freq=None)

As we can see, the row index is a simple DatetimeIndex with one row per trading day.

Now, let’s look at the columns:

# show the original columns
df_orig.columns

The output reveals the column structure which is considerably more complex than the row structure:

MultiIndex([( 'Close', 'AAPL'),
            ( 'Close', 'NVDA'),
            (  'High', 'AAPL'),
            (  'High', 'NVDA'),
            (   'Low', 'AAPL'),
            (   'Low', 'NVDA'),
            (  'Open', 'AAPL'),
            (  'Open', 'NVDA'),
            ('Volume', 'AAPL'),
            ('Volume', 'NVDA')],
           names=['Price', 'Ticker'])

Here we see that the columns are organized as a MultiIndex with two levels:

The first level represents the price type (Open, High, Low, Close, Volume)
The second level represents the ticker symbol (AAPL, NVDA)

We can access all closing prices for both stocks using:

df_orig["Close"]

Note how both the closing prices for APPL and NVDA are included in the output:

Ticker	AAPL	NVDA
Date
2024-03-01	178.815674	82.248108
2024-03-04	174.277115	85.205002
2024-03-05	169.320496	85.935760
2024-03-06	168.325180	88.670860
2024-03-07	168.205750	92.638550
...	...	...
2025-02-24	247.100006	130.268021
2025-02-25	247.039993	126.618355
2025-02-26	240.360001	131.267929
2025-02-27	237.300003	120.138954
2025-02-28	241.839996	124.908508

Alternatively, we can grab just the closing prices for Apple:

df_orig["Close"]["AAPL"]

Which gives us us a Series rather than a DataFrame:

Date
2024-03-01    178.815674
2024-03-04    174.277115
2024-03-05    169.320496
2024-03-06    168.325180
2024-03-07    168.205750
                 ...    
2025-02-24    247.100006
2025-02-25    247.039993
2025-02-26    240.360001
2025-02-27    237.300003
2025-02-28    241.839996
Name: AAPL, Length: 250, dtype: float64

Pros and cons of yfinance’s default multi-index structure #

While the default yfinance structure works fine for a few tickers, it has some serious limitations:

The number of columns in the DataFrame increases by 5× as a function of the number of tickers
With just 2 tickers, we already have 10 columns (5 price types × 2 tickers)
If you’re tracking 100 stocks, you’d end up with 500 columns, making the DataFrame unwieldy
Visual debugging becomes nearly impossible with that many columns
Computing technical indicators (like moving averages) across multiple stocks becomes more complex than it needs to be

That said, the column-based MultiIndex does have certain advantages. It’s excellent for grabbing all closing prices across tickers in a single operation, making it easier to compute cross-sectional metrics like correlations between stocks or sector performance on a given day.

If your analysis primarily focuses on comparing multiple stocks at specific time points, this structure can be quite convenient.

However, for the types of analysis many traders want to do (including myself) — calculating technical indicators, identifying patterns for individual stocks, or running backtests — the column-based approach quickly becomes unwieldy.

Most technical indicators operate on time series data for a single instrument, not across instruments, making the row-based structure I’m about to show you far more practical.

Updating yfinance’s multi-index structure to use row indexes #

I prefer a row-based MultiIndex approach for several reasons:

It limits the number of columns to just 5 (OHLCV) regardless of how many tickers you track
The row index now has two levels: ticker and date
This approach makes it incredibly easy to use Pandas’ groupby operations for per-ticker calculations
Technical indicators and visualizations become much simpler to implement

Here’s how to transform the default column-based structure to a row-based structure:

# clone the original dataframe
df = df_orig.copy()

# restructure the default multi-index yfinance dataframe by converting from
# wide to long format, renaming the indices, ensuring the columns are provided
# in OHLCV order, reordering the index such that date is first and symbol is
# second, and finally sorting the index itself
df = df.stack(level="Ticker", future_stack=True)
df.index.names = ["Date", "Symbol"]
df = df[["Open", "High", "Low", "Close", "Volume"]]
df = df.swaplevel(0, 1)
df = df.sort_index()
df

Note how the DataFrame structure has changed:

	Price	Open	High	Low	Close	Volume
Symbol	Date
AAPL	2024-03-01	178.706190	179.681580	176.546390	178.815674	73488000
	2024-03-04	175.322168	176.068644	172.973259	174.277115	81510100
	2024-03-05	169.957487	171.231471	168.822845	169.320496	95132400
	2024-03-06	170.256065	170.435227	167.887245	168.325180	68587700
	2024-03-07	168.355038	169.927615	167.698152	168.205750	71765100
...	...	...	...	...	...	...
NVDA	2025-02-24	136.547442	138.577254	130.068042	130.268021	251381100
	2025-02-25	129.968045	130.188026	124.428561	126.618355	271428700
	2025-02-26	129.978054	133.717701	128.478192	131.267929	322553800
	2025-02-27	134.987587	134.997581	119.998968	120.138954	443175800
	2025-02-28	118.009141	125.078491	116.389295	124.908508	389091100

Here’s what each line is doing:

df.stack(level="Ticker", future_stack=True) converts from wide to long format, pivoting the ticker column from the column axis to the index
df.index.names = ["Date", "Symbol"] renames the index levels for clarity
df = df[["Open", "High", "Low", "Close", "Volume"]] ensures the columns are in the standard OHLCV order
df.swaplevel(0, 1) reorders the index so that Symbol comes first, then Date
df.sort_index() sorts the index for efficient access

Let’s examine our new index structure:

# show the updated index structure
df.index

Which gives us:

MultiIndex([('AAPL', '2024-03-01'),
            ('AAPL', '2024-03-04'),
            ('AAPL', '2024-03-05'),
            ('AAPL', '2024-03-06'),
            ('AAPL', '2024-03-07'),
            ('AAPL', '2024-03-08'),
            ('AAPL', '2024-03-11'),
            ('AAPL', '2024-03-12'),
            ('AAPL', '2024-03-13'),
            ('AAPL', '2024-03-14'),
            ...
            ('NVDA', '2025-02-14'),
            ('NVDA', '2025-02-18'),
            ('NVDA', '2025-02-19'),
            ('NVDA', '2025-02-20'),
            ('NVDA', '2025-02-21'),
            ('NVDA', '2025-02-24'),
            ('NVDA', '2025-02-25'),
            ('NVDA', '2025-02-26'),
            ('NVDA', '2025-02-27'),
            ('NVDA', '2025-02-28')],
           names=['Symbol', 'Date'], length=500)

Now, our rows are indexed first by Symbol, then by Date, creating a hierarchical structure.

The columns are now much simpler:

# as well as the updated columns
df.columns

Index(['Open', 'High', 'Low', 'Close', 'Volume'], dtype='object', name='Price')

Just the five standard OHLCV columns, regardless of how many tickers we’re tracking.

Accessing individual tickers using the new row indexes #

If we want to access just the data for a single ticker, we can use the xs (cross-section) method:

# display *just* the subset of AAPL market data
df.xs("AAPL")

Which will give us the OHLCV values for AAPL:

Price	Open	High	Low	Close	Volume
Date
2024-03-01	178.706190	179.681580	176.546390	178.815674	73488000
2024-03-04	175.322168	176.068644	172.973259	174.277115	81510100
2024-03-05	169.957487	171.231471	168.822845	169.320496	95132400
2024-03-06	170.256065	170.435227	167.887245	168.325180	68587700
2024-03-07	168.355038	169.927615	167.698152	168.205750	71765100
...	...	...	...	...	...
2025-02-24	244.929993	248.860001	244.419998	247.100006	51326400
2025-02-25	248.000000	250.000000	244.910004	247.039993	48013300
2025-02-26	244.330002	244.979996	239.130005	240.360001	44433600
2025-02-27	239.410004	242.460007	237.059998	237.300003	41153600
2025-02-28	236.949997	242.089996	230.199997	241.839996	56833400

Performing data analysis operations on our new multi-index DataFrame #

The real power of this structure becomes apparent when we start doing analysis.

Let’s compute a 50-day moving average on the closing prices:

# notice how easy it is to analyze OHLCV data for each of the symbols with
# this structure by (1) first grouping on the symbol, followed by (2) applying
# our analysis method via 'transform'
df["50MA"] = df.groupby(level="Symbol")["Close"].transform(
    lambda x: x.rolling(window=50).mean()
)
df

Which will add a new 50MA column to our DataFrame:

	Price	Open	High	Low	Close	Volume	50MA
Symbol	Date
AAPL	2024-03-01	178.706190	179.681580	176.546390	178.815674	73488000	NaN
	2024-03-04	175.322168	176.068644	172.973259	174.277115	81510100	NaN
	2024-03-05	169.957487	171.231471	168.822845	169.320496	95132400	NaN
	2024-03-06	170.256065	170.435227	167.887245	168.325180	68587700	NaN
	2024-03-07	168.355038	169.927615	167.698152	168.205750	71765100	NaN
...	...	...	...	...	...	...	...
NVDA	2025-02-24	136.547442	138.577254	130.068042	130.268021	251381100	134.377040
	2025-02-25	129.968045	130.188026	124.428561	126.618355	271428700	134.208256
	2025-02-26	129.978054	133.717701	128.478192	131.267929	322553800	134.047671
	2025-02-27	134.987587	134.997581	119.998968	120.138954	443175800	133.703902
	2025-02-28	118.009141	125.078491	116.389295	124.908508	389091100	133.517319

This is where the row-based MultiIndex really shines:

We group the DataFrame by Symbol, creating separate groups for each ticker
We select just the Close price column for all tickers
We apply the transform method to compute a 50-day moving average for each group
The result is stored in a new 50MA column that preserves the original index structure

Let’s verify the calculation for AAPL:

# let's investigate the computed 50MA for AAPL
df.xs("AAPL")["50MA"]

Date
2024-03-01           NaN
2024-03-04           NaN
2024-03-05           NaN
2024-03-06           NaN
2024-03-07           NaN
                 ...    
2025-02-24    240.485810
2025-02-25    240.476652
2025-02-26    240.359467
2025-02-27    240.151713
2025-02-28    240.031363
Name: 50MA, Length: 250, dtype: float64

The first 49 days show NaN values because a 50-day moving average needs at least 50 days of data. Once we have enough data points, the moving average starts calculating properly.

We can confirm that our groupby approach gives the same result as calculating the moving average directly on AAPL’s closing prices:

# now, let's *manually* compute the 50MA for AAPL (i.e., on a series rather
# than a multi-index dataframe)
df.xs("AAPL")["Close"].rolling(window=50).mean()

Date
2024-03-01           NaN
2024-03-04           NaN
2024-03-05           NaN
2024-03-06           NaN
2024-03-07           NaN
                 ...    
2025-02-24    240.485810
2025-02-25    240.476652
2025-02-26    240.359467
2025-02-27    240.151713
2025-02-28    240.031363
Name: Close, Length: 250, dtype: float64

And formally verify they’re identical:

# verify our calculations match
pd.testing.assert_series_equal(
    df.xs("AAPL")["50MA"],
    df.xs("AAPL")["Close"].rolling(window=50).mean(),
    check_names=False
)

No error means the values are identical, so our calculation indeed correct.

Exercises #

Before we wrap up, try these exercises to reinforce what you’ve just learned:

DataFrame scaling test: Download data for 5 different tickers and observe how the DataFrame structure changes and scales with both the default column-based MultiIndex and our row-based approach. Note the differences in column count and overall DataFrame shape.
Technical indicator calculation: Add a 20-day moving average to the DataFrame using the same approach I detailed earlier in this article. Create a visualization comparing both the 20-day and 50-day moving averages against the closing price.
Signal generation: Create a new column that indicates when the 20-day MA crosses above the 50-day MA. Can you also add a column to identify when the 20-day MA crosses below the 50-day MA. Moving average crossovers are typically used as common buy/sell signals.
RSI implementation (Advanced): Calculate the relative strength index (RSI) for each ticker using the row-based MultiIndex structure. You may need to query Google or your favorite LLM to help you learn more about implementing RSI. Test your implementation with different lookback periods (14 days is standard, but try 9 and 21 as well).

Remember, the best way to master these DataFrame manipulations is through hands-on practice—try implementing each example yourself to develop a deeper understanding of how MultiIndex structures can enhance your market analysis workflows.

Final thoughts #

Restructuring yfinance’s default DataFrame to use a row-based MultiIndex provides significant advantages when working with multiple stock tickers:

The number of columns stays constant regardless of how many tickers you track
Group-by operations make it easy to compute technical indicators per ticker
The structure scales gracefully as you add more tickers to your analysis
It’s much easier to visually inspect and debug your data

This approach has been my go-to method for organizing stock market data in Python, and I hope it helps streamline your quantitative analysis workflows too.

In the next tutorial, we’ll look at how to visualize this data using candlestick charts with mplfinance, a stock data plotting library built on matplotlib.

👉 Click here to download the source code to this tutorial