Skip to main content

Rethinking yfinance's default MultiIndex format

·12 mins
In this tutorial, I’ll show you how to organize stock market data using a row-based MultiIndex DataFrame structure in Pandas, and why I prefer this approach over the default column-based structure that yfinance provides.

When I first started working with financial data in Python, I quickly discovered that the default DataFrame structure from yfinance wasn’t ideal for the kind of analysis I wanted to do.

The solution?

A simple reshaping trick that makes working with multiple stock tickers dramatically easier.

Read on to learn more.


This tutorial is part 2 in my series on getting started with fintech and market analysis with Python:

  1. How to download market data with yfinance and Python
  2. Rethinking yfinance’s default MultiIndex format (this tutorial)
  3. How to plot candlestick charts with Python and mplfinance
  4. How to compute Simple Moving Averages (SMAs) for trading with Python and Pandas
  5. Finding consecutive integer groups in arrays with Python and NumPy
  6. Computing slope of series with Pandas and SciPy
  7. Market stage detection with Python and Pandas
  8. Implementing TradingView’s Stochastic RSI indicator in Python
  9. Introduction to position sizing
  10. Risk/Reward analysis and position sizing with Python

Configuring your development environment #

Before we dive in, let’s set up our Python environment with the packages we’ll need:

$ pip install numpy pandas yfinance

The packages we’re installing include:

  • numpy: The fundamental package for scientific computing in Python
  • pandas: Provides high-performance, easy-to-use data structures and data analysis tools
  • yfinance: A library that lets us download market data from Yahoo Finance

With our development environment configured, we can move on to writing some actual code.

Configuring our date range and ticker #

To start, let’s import our required Python packages and configure a few key variables:

# import the necessary packages
from datetime import timedelta
from datetime import datetime
import yfinance as yf
import pandas as pd

# set the start and end dates for our market data
end_date = datetime(year=2025, month=3, day=1)
start_date = end_date - timedelta(days=365)

# set the name of the ticker we want to download market data for
ticker = ["NVDA", "AAPL"]

The start_date and end_date parameters define the timeframe for our historical data request, spanning one year of trading data.

We’re using two popular tech stock tickers (NVIDIA and Apple) for this example (but you could use any valid ticker symbols).

Understanding yfinance’s default multi-index structure #

Let’s start by downloading some data using yfinance and examining its default structure:

# download daily market data
df_orig = yf.download(
    tickers=ticker,
    start=start_date,
    end=end_date,
    interval="1d",
    auto_adjust=True,
    progress=False
)
df_orig

Which should give us a DataFrame that looks like the following:

PriceCloseHighLowOpenVolume
TickerAAPLNVDAAAPLNVDAAAPLNVDAAAPLNVDAAAPLNVDA
Date
2024-03-01178.81567482.248108179.68158082.269104176.54639079.405174178.70619079.96996473488000479135000
2024-03-04174.27711585.205002176.06864487.662079172.97325983.687574175.32216884.09841581510100615616000
2024-03-05169.32049685.935760171.23147186.068720168.82284583.389600169.95748785.24198895132400520639000
2024-03-06168.32518088.670860170.43522789.694526167.88724587.001411170.25606587.99308968587700582520000
2024-03-07168.20575092.638550169.92761592.736517167.69815289.572556168.35503890.12837371765100608119000
.................................
2025-02-24247.100006130.268021248.860001138.577254244.419998130.068042244.929993136.54744251326400251381100
2025-02-25247.039993126.618355250.000000130.188026244.910004124.428561248.000000129.96804548013300271428700
2025-02-26240.360001131.267929244.979996133.717701239.130005128.478192244.330002129.97805444433600322553800
2025-02-27237.300003120.138954242.460007134.997581237.059998119.998968239.410004134.98758741153600443175800
2025-02-28241.839996124.908508242.089996125.078491230.199997116.389295236.949997118.00914156833400389091100

Let’s examine the parameters I’m using with yf.download():

  • A list of ticker symbols passed to tickers
  • Daily data frequency with interval="1d"
  • Price adjustments for splits and dividends via auto_adjust=True
  • Quiet mode with progress=False to avoid cluttering the output
For more details on downloading market data with yfinance, check out the first tutorial in this series.

The structure of this DataFrame is what I want to focus on — let’s examine the row index first:

# show the original index structure
df_orig.index

This gives us:

DatetimeIndex(['2024-03-01', '2024-03-04', '2024-03-05', '2024-03-06',
               '2024-03-07', '2024-03-08', '2024-03-11', '2024-03-12',
               '2024-03-13', '2024-03-14',
               ...
               '2025-02-14', '2025-02-18', '2025-02-19', '2025-02-20',
               '2025-02-21', '2025-02-24', '2025-02-25', '2025-02-26',
               '2025-02-27', '2025-02-28'],
              dtype='datetime64[ns]', name='Date', length=250, freq=None)

As we can see, the row index is a simple DatetimeIndex with one row per trading day.

Now, let’s look at the columns:

# show the original columns
df_orig.columns

The output reveals the column structure which is considerably more complex than the row structure:

MultiIndex([( 'Close', 'AAPL'),
            ( 'Close', 'NVDA'),
            (  'High', 'AAPL'),
            (  'High', 'NVDA'),
            (   'Low', 'AAPL'),
            (   'Low', 'NVDA'),
            (  'Open', 'AAPL'),
            (  'Open', 'NVDA'),
            ('Volume', 'AAPL'),
            ('Volume', 'NVDA')],
           names=['Price', 'Ticker'])

Here we see that the columns are organized as a MultiIndex with two levels:

  • The first level represents the price type (Open, High, Low, Close, Volume)
  • The second level represents the ticker symbol (AAPL, NVDA)

We can access all closing prices for both stocks using:

df_orig["Close"]

Note how both the closing prices for APPL and NVDA are included in the output:

TickerAAPLNVDA
Date
2024-03-01178.81567482.248108
2024-03-04174.27711585.205002
2024-03-05169.32049685.935760
2024-03-06168.32518088.670860
2024-03-07168.20575092.638550
.........
2025-02-24247.100006130.268021
2025-02-25247.039993126.618355
2025-02-26240.360001131.267929
2025-02-27237.300003120.138954
2025-02-28241.839996124.908508

Alternatively, we can grab just the closing prices for Apple:

df_orig["Close"]["AAPL"]

Which gives us us a Series rather than a DataFrame:

Date
2024-03-01    178.815674
2024-03-04    174.277115
2024-03-05    169.320496
2024-03-06    168.325180
2024-03-07    168.205750
                 ...    
2025-02-24    247.100006
2025-02-25    247.039993
2025-02-26    240.360001
2025-02-27    237.300003
2025-02-28    241.839996
Name: AAPL, Length: 250, dtype: float64

Pros and cons of yfinance’s default multi-index structure #

While the default yfinance structure works fine for a few tickers, it has some serious limitations:

  • The number of columns in the DataFrame increases by 5× as a function of the number of tickers
  • With just 2 tickers, we already have 10 columns (5 price types × 2 tickers)
  • If you’re tracking 100 stocks, you’d end up with 500 columns, making the DataFrame unwieldy
  • Visual debugging becomes nearly impossible with that many columns
  • Computing technical indicators (like moving averages) across multiple stocks becomes more complex than it needs to be

That said, the column-based MultiIndex does have certain advantages. It’s excellent for grabbing all closing prices across tickers in a single operation, making it easier to compute cross-sectional metrics like correlations between stocks or sector performance on a given day.

If your analysis primarily focuses on comparing multiple stocks at specific time points, this structure can be quite convenient.

However, for the types of analysis many traders want to do (including myself) — calculating technical indicators, identifying patterns for individual stocks, or running backtests — the column-based approach quickly becomes unwieldy.

Most technical indicators operate on time series data for a single instrument, not across instruments, making the row-based structure I’m about to show you far more practical.

Updating yfinance’s multi-index structure to use row indexes #

I prefer a row-based MultiIndex approach for several reasons:

  • It limits the number of columns to just 5 (OHLCV) regardless of how many tickers you track
  • The row index now has two levels: ticker and date
  • This approach makes it incredibly easy to use Pandas’ groupby operations for per-ticker calculations
  • Technical indicators and visualizations become much simpler to implement

Here’s how to transform the default column-based structure to a row-based structure:

# clone the original dataframe
df = df_orig.copy()

# restructure the default multi-index yfinance dataframe by converting from
# wide to long format, renaming the indices, ensuring the columns are provided
# in OHLCV order, reordering the index such that date is first and symbol is
# second, and finally sorting the index itself
df = df.stack(level="Ticker", future_stack=True)
df.index.names = ["Date", "Symbol"]
df = df[["Open", "High", "Low", "Close", "Volume"]]
df = df.swaplevel(0, 1)
df = df.sort_index()
df

Note how the DataFrame structure has changed:

PriceOpenHighLowCloseVolume
SymbolDate
AAPL2024-03-01178.706190179.681580176.546390178.81567473488000
2024-03-04175.322168176.068644172.973259174.27711581510100
2024-03-05169.957487171.231471168.822845169.32049695132400
2024-03-06170.256065170.435227167.887245168.32518068587700
2024-03-07168.355038169.927615167.698152168.20575071765100
.....................
NVDA2025-02-24136.547442138.577254130.068042130.268021251381100
2025-02-25129.968045130.188026124.428561126.618355271428700
2025-02-26129.978054133.717701128.478192131.267929322553800
2025-02-27134.987587134.997581119.998968120.138954443175800
2025-02-28118.009141125.078491116.389295124.908508389091100

Here’s what each line is doing:

  1. df.stack(level="Ticker", future_stack=True) converts from wide to long format, pivoting the ticker column from the column axis to the index
  2. df.index.names = ["Date", "Symbol"] renames the index levels for clarity
  3. df = df[["Open", "High", "Low", "Close", "Volume"]] ensures the columns are in the standard OHLCV order
  4. df.swaplevel(0, 1) reorders the index so that Symbol comes first, then Date
  5. df.sort_index() sorts the index for efficient access

Let’s examine our new index structure:

# show the updated index structure
df.index

Which gives us:

MultiIndex([('AAPL', '2024-03-01'),
            ('AAPL', '2024-03-04'),
            ('AAPL', '2024-03-05'),
            ('AAPL', '2024-03-06'),
            ('AAPL', '2024-03-07'),
            ('AAPL', '2024-03-08'),
            ('AAPL', '2024-03-11'),
            ('AAPL', '2024-03-12'),
            ('AAPL', '2024-03-13'),
            ('AAPL', '2024-03-14'),
            ...
            ('NVDA', '2025-02-14'),
            ('NVDA', '2025-02-18'),
            ('NVDA', '2025-02-19'),
            ('NVDA', '2025-02-20'),
            ('NVDA', '2025-02-21'),
            ('NVDA', '2025-02-24'),
            ('NVDA', '2025-02-25'),
            ('NVDA', '2025-02-26'),
            ('NVDA', '2025-02-27'),
            ('NVDA', '2025-02-28')],
           names=['Symbol', 'Date'], length=500)

Now, our rows are indexed first by Symbol, then by Date, creating a hierarchical structure.

The columns are now much simpler:

# as well as the updated columns
df.columns
Index(['Open', 'High', 'Low', 'Close', 'Volume'], dtype='object', name='Price')

Just the five standard OHLCV columns, regardless of how many tickers we’re tracking.

Accessing individual tickers using the new row indexes #

If we want to access just the data for a single ticker, we can use the xs (cross-section) method:

# display *just* the subset of AAPL market data
df.xs("AAPL")

Which will give us the OHLCV values for AAPL:

PriceOpenHighLowCloseVolume
Date
2024-03-01178.706190179.681580176.546390178.81567473488000
2024-03-04175.322168176.068644172.973259174.27711581510100
2024-03-05169.957487171.231471168.822845169.32049695132400
2024-03-06170.256065170.435227167.887245168.32518068587700
2024-03-07168.355038169.927615167.698152168.20575071765100
..................
2025-02-24244.929993248.860001244.419998247.10000651326400
2025-02-25248.000000250.000000244.910004247.03999348013300
2025-02-26244.330002244.979996239.130005240.36000144433600
2025-02-27239.410004242.460007237.059998237.30000341153600
2025-02-28236.949997242.089996230.199997241.83999656833400

Performing data analysis operations on our new multi-index DataFrame #

The real power of this structure becomes apparent when we start doing analysis.

Let’s compute a 50-day moving average on the closing prices:

# notice how easy it is to analyze OHLCV data for each of the symbols with
# this structure by (1) first grouping on the symbol, followed by (2) applying
# our analysis method via 'transform'
df["50MA"] = df.groupby(level="Symbol")["Close"].transform(
    lambda x: x.rolling(window=50).mean()
)
df

Which will add a new 50MA column to our DataFrame:

PriceOpenHighLowCloseVolume50MA
SymbolDate
AAPL2024-03-01178.706190179.681580176.546390178.81567473488000NaN
2024-03-04175.322168176.068644172.973259174.27711581510100NaN
2024-03-05169.957487171.231471168.822845169.32049695132400NaN
2024-03-06170.256065170.435227167.887245168.32518068587700NaN
2024-03-07168.355038169.927615167.698152168.20575071765100NaN
........................
NVDA2025-02-24136.547442138.577254130.068042130.268021251381100134.377040
2025-02-25129.968045130.188026124.428561126.618355271428700134.208256
2025-02-26129.978054133.717701128.478192131.267929322553800134.047671
2025-02-27134.987587134.997581119.998968120.138954443175800133.703902
2025-02-28118.009141125.078491116.389295124.908508389091100133.517319

This is where the row-based MultiIndex really shines:

  1. We group the DataFrame by Symbol, creating separate groups for each ticker
  2. We select just the Close price column for all tickers
  3. We apply the transform method to compute a 50-day moving average for each group
  4. The result is stored in a new 50MA column that preserves the original index structure

Let’s verify the calculation for AAPL:

# let's investigate the computed 50MA for AAPL
df.xs("AAPL")["50MA"]
Date
2024-03-01           NaN
2024-03-04           NaN
2024-03-05           NaN
2024-03-06           NaN
2024-03-07           NaN
                 ...    
2025-02-24    240.485810
2025-02-25    240.476652
2025-02-26    240.359467
2025-02-27    240.151713
2025-02-28    240.031363
Name: 50MA, Length: 250, dtype: float64

The first 49 days show NaN values because a 50-day moving average needs at least 50 days of data. Once we have enough data points, the moving average starts calculating properly.

We can confirm that our groupby approach gives the same result as calculating the moving average directly on AAPL’s closing prices:

# now, let's *manually* compute the 50MA for AAPL (i.e., on a series rather
# than a multi-index dataframe)
df.xs("AAPL")["Close"].rolling(window=50).mean()
Date
2024-03-01           NaN
2024-03-04           NaN
2024-03-05           NaN
2024-03-06           NaN
2024-03-07           NaN
                 ...    
2025-02-24    240.485810
2025-02-25    240.476652
2025-02-26    240.359467
2025-02-27    240.151713
2025-02-28    240.031363
Name: Close, Length: 250, dtype: float64

And formally verify they’re identical:

# verify our calculations match
pd.testing.assert_series_equal(
    df.xs("AAPL")["50MA"],
    df.xs("AAPL")["Close"].rolling(window=50).mean(),
    check_names=False
)

No error means the values are identical, so our calculation indeed correct.

Exercises #

Before we wrap up, try these exercises to reinforce what you’ve just learned:

  1. DataFrame scaling test: Download data for 5 different tickers and observe how the DataFrame structure changes and scales with both the default column-based MultiIndex and our row-based approach. Note the differences in column count and overall DataFrame shape.

  2. Technical indicator calculation: Add a 20-day moving average to the DataFrame using the same approach I detailed earlier in this article. Create a visualization comparing both the 20-day and 50-day moving averages against the closing price.

  3. Signal generation: Create a new column that indicates when the 20-day MA crosses above the 50-day MA. Can you also add a column to identify when the 20-day MA crosses below the 50-day MA. Moving average crossovers are typically used as common buy/sell signals.

  4. RSI implementation (Advanced): Calculate the relative strength index (RSI) for each ticker using the row-based MultiIndex structure. You may need to query Google or your favorite LLM to help you learn more about implementing RSI. Test your implementation with different lookback periods (14 days is standard, but try 9 and 21 as well).

Remember, the best way to master these DataFrame manipulations is through hands-on practice—try implementing each example yourself to develop a deeper understanding of how MultiIndex structures can enhance your market analysis workflows.

Final thoughts #

Restructuring yfinance’s default DataFrame to use a row-based MultiIndex provides significant advantages when working with multiple stock tickers:

  • The number of columns stays constant regardless of how many tickers you track
  • Group-by operations make it easy to compute technical indicators per ticker
  • The structure scales gracefully as you add more tickers to your analysis
  • It’s much easier to visually inspect and debug your data

This approach has been my go-to method for organizing stock market data in Python, and I hope it helps streamline your quantitative analysis workflows too.

In the next tutorial, we’ll look at how to visualize this data using candlestick charts with mplfinance, a stock data plotting library built on matplotlib.

👉 Click here to download the source code to this tutorial