How to download market data with yfinance and Python

In this tutorial, you’ll learn how to download market data using Python and the yfinance API, the “hello, world” of fintech data access.

There are many market data APIs out there, ranging from dirt-cheap to “do I really need both kidneys?” expensive.

However, when you’re just getting started, nearly everyone begins with yfinance.

It’s free, easy to use, and for basic financial analysis, it’s all you need.

Think of yfinance as your gateway drug to the world of financial data.

Start here, get your bearings, and when you’re ready for the harder stuff, you’ll have a solid foundation to build on.


This tutorial is part 1 in a larger series on getting started with fintech and market analysis with Python:

  1. How to download market data with yfinance and Python (this tutorial)
  2. Rethinking yfinance’s default MultiIndex format
  3. How to plot candlestick charts with Python and mplfinance
  4. How to compute Simple Moving Averages (SMAs) for trading with Python and Pandas
  5. Finding consecutive integer groups in arrays with Python and NumPy
  6. Computing slope of series with Pandas and SciPy
  7. Market stage detection with Python and Pandas
  8. Implementing TradingView’s Stochastic RSI indicator in Python
  9. Introduction to position sizing
  10. Risk/Reward analysis and position sizing with Python

Configuring your development environment #

Before we dive in, let’s set up our Python environment with the packages we’ll need:

$ pip install numpy pandas yfinance matplotlib seaborn

These libraries give us everything we need to fetch, process, and visualize market data:

  • numpy: The foundation of numerical computing in Python
  • pandas: Data manipulation and analysis (crucial for working with time series data)
  • yfinance: Our market data API
  • matplotlib and seaborn: Visualization libraries to create charts and plots

Downloading market data for a single ticker #

Let’s start by importing the necessary packages:

# import the necessary packages
from datetime import datetime
import matplotlib.pyplot as plt
import seaborn as sns
import yfinance as yf

Along with defining the time period to fetch historical market data:

# set the start and end dates for our market data request
end_date = datetime(year=2025, month=3, day=1)
start_date = datetime(year=2023, month=1, day=1)

We’re grabbing about two years of data—enough to give us a decent view of how a stock’s price has evolved over time, without overwhelming our analysis.
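The dates above are hard-coded. If you would rather compute a window relative to some end date, a minimal sketch might look like the following. The helper name `date_window` and the 365-day approximation of a year are my own assumptions for illustration and are not part of yfinance.

```python
from datetime import datetime, timedelta


def date_window(end, days):
    """Return (start, end) covering `days` calendar days before `end`.

    Hypothetical helper for illustration; not part of yfinance.
    """
    return end - timedelta(days=days), end


# roughly two years, approximating a year as 365 days
window_start, window_end = date_window(datetime(2025, 3, 1), days=2 * 365)
print(window_start.date(), "to", window_end.date())
```

The resulting pair can be passed as the start and end of a request in place of the fixed dates.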

Let’s download data for a single ticker, NVIDIA (NVDA):

# set the name of the ticker we want to download market data for
ticker = "NVDA"

# download market data for a single ticker
df_single = yf.download(
    tickers=ticker,
    start=start_date,
    end=end_date,
    interval="1d",
    group_by="ticker",
    auto_adjust=True,
    progress=False
)
df_single

The output DataFrame should look something like this:

Ticker            NVDA
Price             Open        High         Low       Close     Volume
Date
2023-01-03   14.838839   14.983721   14.084457   14.303278  401277000
2023-01-04   14.555073   14.840838   14.229340   14.736923  431324000
2023-01-05   14.479135   14.552076   14.136416   14.253321  389168000
2023-01-06   14.462150   14.997712   14.022511   14.846835  405044000
2023-01-09   15.271488   16.042855   15.128604   15.615206  504231000
...                ...         ...         ...         ...        ...
2025-02-24  136.547442  138.577254  130.068042  130.268021  251381100
2025-02-25  129.968045  130.188026  124.428561  126.618355  271428700
2025-02-26  129.978054  133.717701  128.478192  131.267929  322553800
2025-02-27  134.987587  134.997581  119.998968  120.138954  443175800
2025-02-28  118.009141  125.078491  116.389295  124.908508  389091100

The resulting DataFrame has the following columns:

  • Open: Price at market open
  • High: Highest price during the trading session
  • Low: Lowest price during the trading session
  • Close: Price at market close
  • Volume: Number of shares traded

These are the standard columns you’ll see when working with market/trading data, and are often abbreviated “OHLCV” for short.
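One property worth knowing about OHLCV data is that, within each row, High should be at least as large as Open and Close, and Low should be no larger than either. Here is a quick sketch of that check using the first five NVDA rows from the output above (Volume is omitted for brevity). Treat it as an illustrative sanity check rather than a formal validation routine.

```python
import pandas as pd

# first five NVDA rows copied from the output above (Volume omitted)
rows = pd.DataFrame(
    {
        "Open": [14.838839, 14.555073, 14.479135, 14.462150, 15.271488],
        "High": [14.983721, 14.840838, 14.552076, 14.997712, 16.042855],
        "Low": [14.084457, 14.229340, 14.136416, 14.022511, 15.128604],
        "Close": [14.303278, 14.736923, 14.253321, 14.846835, 15.615206],
    },
    index=pd.to_datetime(
        ["2023-01-03", "2023-01-04", "2023-01-05", "2023-01-06", "2023-01-09"]
    ),
)

# High should bound Open/Close from above, Low from below
high_ok = rows["High"] >= rows[["Open", "Close"]].max(axis=1)
low_ok = rows["Low"] <= rows[["Open", "Close"]].min(axis=1)
is_consistent = bool((high_ok & low_ok).all())
print(is_consistent)
```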

The parameters we pass to yf.download control the data we get back, including how it is organized in the output DataFrame:

  • tickers: Symbol(s) to download
  • start: Start date of the history request
  • end: End date of the request
  • interval: Timeframe for each row of data (daily in this case)
  • group_by: How to organize the columns in the DataFrame. Setting it to ticker places the ticker name on the top level with the OHLCV values beneath it, while column places the OHLCV values on the top level with the individual tickers beneath them
  • auto_adjust: Automatically adjusts prices for splits and dividends
  • progress: Whether to display a progress bar (we disabled it)
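To make the group_by description a bit more concrete, here is a sketch of the two column layouts, built by hand with Pandas rather than downloaded. The level names for the ticker layout mirror the output shown in this tutorial; the names and ordering for the column layout are my assumption about what yfinance returns, so verify them against a real download.

```python
import pandas as pd

symbols = ["NVDA", "META"]
fields = ["Open", "High", "Low", "Close", "Volume"]

# group_by="ticker": ticker on the top level, OHLCV fields underneath
by_ticker = pd.MultiIndex.from_product(
    [symbols, fields], names=["Ticker", "Price"]
)

# group_by="column": OHLCV fields on the top level, tickers underneath
# (assumed layout, built by hand for illustration)
by_column = pd.MultiIndex.from_product(
    [fields, symbols], names=["Price", "Ticker"]
)

print(by_ticker[:3].tolist())
print(by_column[:3].tolist())
```

Both layouts hold the same ten columns; only the nesting order differs.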

By setting the interval parameter, we can control what timeframe of data we want to download:

  • 1m: One minute data
  • 2m: Two minute data
  • 5m: Five minute data
  • 15m: Fifteen minute data
  • 30m: Thirty minute data
  • 60m: Sixty minute data (1h is also accepted)
  • 90m: Ninety minute data
  • 1d: Daily data (what we’re using)
  • 1wk: Weekly data
  • 1mo: Monthly data

Try changing the interval parameter to see how the DataFrame’s index changes—smaller intervals give you more granular data but cover shorter time periods.

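Note that intraday data is only available for recent history: requests using the intraday intervals (i.e., 1m, 2m, 5m, 15m, 30m, 60m, 90m, and 1h) cannot extend past the last 60 days, and Yahoo may impose tighter limits on the finest intervals such as 1m.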
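One way to stay inside that window is to clamp the start date before making an intraday request. The sketch below assumes the single 60-day limit described above; the names `clamp_start`, `INTRADAY_INTERVALS`, and `MAX_INTRADAY_LOOKBACK` are hypothetical and not part of yfinance, and the real limits may vary by interval.

```python
from datetime import datetime, timedelta

# intervals this tutorial lists as intraday
INTRADAY_INTERVALS = {"1m", "2m", "5m", "15m", "30m", "60m", "90m", "1h"}

# lookback limit stated in the tutorial; an assumption that may differ per interval
MAX_INTRADAY_LOOKBACK = timedelta(days=60)


def clamp_start(interval, start, now):
    """Move `start` forward if it is too old for an intraday interval.

    Hypothetical helper for illustration; not part of yfinance.
    """
    if interval not in INTRADAY_INTERVALS:
        return start
    return max(start, now - MAX_INTRADAY_LOOKBACK)


now = datetime(2025, 3, 1)
print(clamp_start("1d", datetime(2023, 1, 1), now))   # daily: left unchanged
print(clamp_start("15m", datetime(2023, 1, 1), now))  # intraday: pulled forward
```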

Now, let’s examine the column structure of our DataFrame:

# show the column structure
df_single.columns

Which will give us the following:

MultiIndex([('NVDA',   'Open'),
            ('NVDA',   'High'),
            ('NVDA',    'Low'),
            ('NVDA',  'Close'),
            ('NVDA', 'Volume')],
           names=['Ticker', 'Price'])

If you’ve worked with Pandas before, you might be surprised to see that our DataFrame uses a MultiIndex for its columns, with two levels:

  1. Ticker (level 0 in Pandas): The ticker symbol (i.e., NVDA)
  2. Price (level 1 in Pandas): The Open, High, Low, Close, and Volume columns, respectively

As we’ll see in a second, this MultiIndex column structure allows us to easily organize data in the DataFrame when working with multiple symbols.
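If you want to poke at the levels yourself, here is a small sketch that uses a synthetic stand-in for df_single (same column layout, made-up values, no download required). It also shows one possible way to drop the Ticker level when only a single symbol is present. The name `df_demo` is mine, not something yfinance produces.

```python
import numpy as np
import pandas as pd

# synthetic stand-in for df_single: same column layout, made-up values
columns = pd.MultiIndex.from_product(
    [["NVDA"], ["Open", "High", "Low", "Close", "Volume"]],
    names=["Ticker", "Price"],
)
index = pd.date_range("2023-01-03", periods=3, freq="B", name="Date")
df_demo = pd.DataFrame(
    np.arange(15, dtype=float).reshape(3, 5), index=index, columns=columns
)

# Pandas numbers the levels from 0: level 0 is Ticker, level 1 is Price
print(df_demo.columns.nlevels)
print(list(df_demo.columns.names))
print(df_demo.columns.get_level_values("Ticker").unique().tolist())

# with a single ticker, the Ticker level can be dropped for flat columns
df_flat = df_demo.droplevel("Ticker", axis=1)
print(df_flat.columns.tolist())
```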

Downloading market data for multiple tickers #

Let’s expand our analysis to include multiple stocks:

# define the list of tickers we want to fetch market data for
tickers = ["NVDA", "META", "AAPL"]

# download market data for multiple tickers
df_multi = yf.download(
    tickers=tickers,
    start=start_date,
    end=end_date,
    interval="1d",
    group_by="ticker",
    auto_adjust=True,
    progress=False
)
df_multi

Notice how the DataFrame column structure has changed:

Ticker            NVDA                                                       META                                                       AAPL
Price             Open        High         Low       Close     Volume        Open        High         Low       Close     Volume        Open        High         Low       Close     Volume
Date
2023-01-03   14.838839   14.983721   14.084457   14.303278  401277000  122.352620  125.889114  121.814674  124.265312   35528500  128.782649  129.395518  122.742873  123.632530  112117500
2023-01-04   14.555073   14.840838   14.229340   14.736923  431324000  126.895264  128.558915  125.371087  126.885307   32397100  125.431615  127.181276  123.642420  124.907707   89113600
2023-01-05   14.479135   14.552076   14.136416   14.253321  389168000  125.650025  128.030937  124.066079  126.456947   25447100  125.668857  126.301500  123.326101  123.583107   80962700
2023-01-06   14.462150   14.997712   14.022511   14.846835  405044000  128.479231  129.834056  125.560380  129.525238   27584500  124.561732  128.792531  123.454601  128.130234   87754700
2023-01-09   15.271488   16.042855   15.128604   15.615206  504231000  130.660897  132.444079  128.788046  128.977325   26649100  128.970489  131.876702  128.397153  128.654160   70790800
...                ...         ...         ...         ...        ...         ...         ...         ...         ...        ...         ...         ...         ...         ...        ...
2025-02-24  136.547442  138.577254  130.068042  130.268021  251381100  686.280029  687.270020  662.450012  668.130005   15677000  244.929993  248.860001  244.419998  247.100006   51326400
2025-02-25  129.968045  130.188026  124.428561  126.618355  271428700  665.969971  668.000000  641.859985  657.500000   20579700  248.000000  250.000000  244.910004  247.039993   48013300
2025-02-26  129.978054  133.717701  128.478192  131.267929  322553800  659.650024  683.010010  658.000000  673.700012   14488700  244.330002  244.979996  239.130005  240.360001   44433600
2025-02-27  134.987587  134.997581  119.998968  120.138954  443175800  682.450012  688.650024  657.570007  658.239990   12500000  239.410004  242.460007  237.059998  237.300003   41153600
2025-02-28  118.009141  125.078491  116.389295  124.908508  389091100  658.039978  669.630005  642.599976  668.200012   17534200  236.949997  242.089996  230.199997  241.839996   56833400

Our DataFrame now has 15 columns instead of 5 because we have 3 tickers with 5 data columns each (one for each of the OHLCV values, respectively).

To make this more clear, let’s examine the column structure:

# show the column structure for multiple tickers
df_multi.columns

Which will give us:

MultiIndex([('NVDA',   'Open'),
            ('NVDA',   'High'),
            ('NVDA',    'Low'),
            ('NVDA',  'Close'),
            ('NVDA', 'Volume'),
            ('META',   'Open'),
            ('META',   'High'),
            ('META',    'Low'),
            ('META',  'Close'),
            ('META', 'Volume'),
            ('AAPL',   'Open'),
            ('AAPL',   'High'),
            ('AAPL',    'Low'),
            ('AAPL',  'Close'),
            ('AAPL', 'Volume')],
           names=['Ticker', 'Price'])

Does the MultiIndex column structure make more sense now?

Effectively, this structure allows us to organize our data hierarchically by ticker and price type.

For example, we can grab all AAPL data with:

# access all columns for AAPL
df_multi["AAPL"]

Price             Open        High         Low       Close     Volume
Date
2023-01-03  128.782649  129.395518  122.742873  123.632530  112117500
2023-01-04  125.431615  127.181276  123.642420  124.907707   89113600
2023-01-05  125.668857  126.301500  123.326101  123.583107   80962700
2023-01-06  124.561732  128.792531  123.454601  128.130234   87754700
2023-01-09  128.970489  131.876702  128.397153  128.654160   70790800
...                ...         ...         ...         ...        ...
2025-02-24  244.929993  248.860001  244.419998  247.100006   51326400
2025-02-25  248.000000  250.000000  244.910004  247.039993   48013300
2025-02-26  244.330002  244.979996  239.130005  240.360001   44433600
2025-02-27  239.410004  242.460007  237.059998  237.300003   41153600
2025-02-28  236.949997  242.089996  230.199997  241.839996   56833400

Or, we can just grab the closing prices:

# access just the closing prices for AAPL
df_multi["AAPL"]["Close"]

Date
2023-01-03    123.632530
2023-01-04    124.907707
2023-01-05    123.583107
2023-01-06    128.130234
2023-01-09    128.654160
                 ...    
2025-02-24    247.100006
2025-02-25    247.039993
2025-02-26    240.360001
2025-02-27    237.300003
2025-02-28    241.839996
Name: Close, Length: 541, dtype: float64

The same is true for the Open, High, Low, and Volume columns as well.

This hierarchical indexing makes it easy to work with data from multiple tickers in a single DataFrame, while still maintaining clear organization.
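Chained lookups like the ones above are not the only way to slice a MultiIndex. The sketch below uses a synthetic stand-in for df_multi (same column layout, made-up values) to show two other standard Pandas options: indexing with a (ticker, field) tuple, and taking a cross-section with xs() to pull one field for every ticker at once. The name `df_demo` is mine and the numbers are placeholders, not market data.

```python
import numpy as np
import pandas as pd

# synthetic stand-in for df_multi: same column layout, made-up values
columns = pd.MultiIndex.from_product(
    [["NVDA", "META", "AAPL"], ["Open", "High", "Low", "Close", "Volume"]],
    names=["Ticker", "Price"],
)
index = pd.date_range("2023-01-03", periods=4, freq="B", name="Date")
df_demo = pd.DataFrame(
    np.arange(60, dtype=float).reshape(4, 15), index=index, columns=columns
)

# a (ticker, field) tuple selects one column in a single lookup
aapl_close = df_demo[("AAPL", "Close")]

# xs() takes a cross-section: the Close column for every ticker at once
closes = df_demo.xs("Close", axis=1, level="Price")
print(closes.shape)
```

A frame like `closes`, with one column per ticker, is often a convenient shape for comparing symbols side by side.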

Plotting closing prices #

Now that we have our data, let’s visualize the closing prices for all three tickers:

# initialize a new figure
plt.figure(figsize=(14, 7))
sns.set(style="whitegrid")

# loop over the tickers
for ticker in tickers:
    # plot the closing price for each
    sns.lineplot(
        data=df_multi[ticker]["Close"],
        label=ticker,
        linewidth=2
    )

# set the plot title
plt.title(
    f"Stock Closing Prices ("
    f"{start_date.strftime('%Y-%m-%d')} "
    f"to {end_date.strftime('%Y-%m-%d')})"
)

# set the plot labels
plt.xlabel("Date")
plt.ylabel("Closing Price ($)")

# finish constructing the plot
plt.tight_layout()
plt.show()

Which will produce a plot that should look like this:

[Figure: Closing prices]

Note that we have three separate line plots (one for each ticker), allowing us to visualize the closing prices over the past ~2 years.

The most important line in the code above is:

data=df_multi[ticker]["Close"]

This is where we’re leveraging our MultiIndex structure.

For each ticker in our list, we’re extracting just the closing prices and passing them to the lineplot function.

The hierarchical indexing makes this selection clean and intuitive.

Other stock market data APIs #

While yfinance is a great starting point, as you advance in your market analysis journey, you might want to explore other data sources. Here are a few APIs I’ve personally used and recommend.

Polygon.io #

Polygon offers institutional-grade financial data with impressive historical coverage and reliability. Their API is well-documented and provides access to stocks, options, forex, and crypto data. They offer tiered pricing plans, including a generous free tier that’s perfect for getting started.

What I love about Polygon is their data quality and consistency—critical factors when you’re building trading algorithms that need to make decisions based on clean, accurate information.

Financial Modeling Prep API #

The Financial Modeling Prep API provides a comprehensive suite of financial data beyond just market prices. You can access company fundamentals, financial statements, economic indicators, and more. Their pricing is reasonable, making it accessible for individual traders and small teams.

Financial Modeling Prep is particularly useful when you need to combine price action with fundamental analysis—perfect for those longer-term investment strategies.

EOD Historical Data API #

EODHD (End of Day Historical Data) API offers a solid mix of price data, fundamentals, and alternative datasets. Their global market coverage is impressive, with data for stocks, ETFs, mutual funds, and more from over 60 exchanges worldwide.

I’ve found their API to be reliable and straightforward to use, with a pricing structure that scales well as your needs grow.

Exercises #

Before we wrap up, try these exercises to reinforce what you’ve just learned:

  1. Interval exploration: Download AAPL stock data using different intervals (try 1d, 1wk, and 1mo). Compare how the resulting DataFrames differ in size and time coverage.

  2. Date range manipulation: Experiment with different start and end dates. Try fetching data for:

    • The last 6 months
    • A specific calendar year (like 2024)
    • A period spanning a major market event (such as the 2008 housing market crash, or during the first six months of COVID)
  3. Group by experiment: Download data for multiple tickers (at least 3) and try changing the group_by parameter:

    • Set group_by="ticker" (what we used above)
    • Set group_by="column" and observe how the DataFrame structure changes
    • Which organization method do you find more intuitive?
  4. Advanced challenge: Download data for the NASDAQ-100 index (^NDX), the S&P 500 (^GSPC), and three stocks of your choosing. Create a visualization comparing their performance over the last year, with prices normalized to the same starting value.
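For the normalization step in the advanced challenge, one common approach is to divide each series by its first value and rescale so every line starts at 100. Here is a minimal sketch on made-up closing prices; the symbols AAA and BBB are hypothetical placeholders, and fetching the real data is left to you.

```python
import pandas as pd

# made-up closing prices for two hypothetical symbols
closes = pd.DataFrame(
    {"AAA": [50.0, 55.0, 60.0], "BBB": [200.0, 190.0, 210.0]},
    index=pd.date_range("2024-01-02", periods=3, freq="B", name="Date"),
)

# divide each column by its first value, then rescale to start at 100
normalized = closes / closes.iloc[0] * 100
print(normalized.round(1))
```

After this step, a value of 120 means the series is 20% above where it started, regardless of the original price level.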

Remember, the best way to learn is by doing—experiment with different parameters and see how they affect the data you receive.

Final thoughts #

Congratulations, you’ve now learned the basics of fetching market data with yfinance!

The MultiIndex structure used by yfinance provides a powerful way to organize financial time series data, especially when working with multiple securities (although it does have its limitations, as we’ll see in the next article in the series).

In the next tutorial, we’ll dive deeper into advanced DataFrame structures for stock market data, building on what we’ve learned here.
