How to download market data with yfinance and Python


In this tutorial, you’ll learn how to download market data using Python and the yfinance API, the “hello, world” of fintech data access.

There are many market data APIs out there, ranging from dirt-cheap to “do I really need both kidneys?” expensive.

However, when you’re just getting started, nearly everyone begins with yfinance.

It’s free, easy to use, and for basic financial analysis, it’s all you need.

Think of yfinance as your gateway drug to the world of financial data.

Start here, get your bearings, and when you’re ready for the harder stuff, you’ll have a solid foundation to build on.

This tutorial is part 1 in a larger series on getting started with fintech and market analysis with Python:

How to download market data with yfinance and Python (this tutorial)
Rethinking yfinance’s default MultiIndex format
How to plot candlestick charts with Python and mplfinance
How to compute Simple Moving Averages (SMAs) for trading with Python and Pandas
Finding consecutive integer groups in arrays with Python and NumPy
Computing slope of series with Pandas and SciPy
Market stage detection with Python and Pandas
Implementing TradingView’s Stochastic RSI indicator in Python
Introduction to position sizing
Risk/Reward analysis and position sizing with Python

Configuring your development environment #

Before we dive in, let’s set up our Python environment with the packages we’ll need:

$ pip install numpy pandas yfinance matplotlib seaborn

These libraries give us everything we need to fetch, process, and visualize market data:

numpy: The foundation of numerical computing in Python
pandas: Data manipulation and analysis (crucial for working with time series data)
yfinance: Our market data API
matplotlib and seaborn: Visualization libraries to create charts and plots

Downloading market data for a single ticker #

Let’s start by importing the necessary packages:

# import the necessary packages
from datetime import datetime
import matplotlib.pyplot as plt
import seaborn as sns
import yfinance as yf

Next, we define the time period for our historical market data request:

# set the start and end dates for our market data request
end_date = datetime(year=2025, month=3, day=1)
start_date = datetime(year=2023, month=1, day=1)

We’re grabbing about two years of data—enough to give us a decent view of how a stock’s price has evolved over time, without overwhelming our analysis.
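As a quick sanity check on the request window, the sketch below measures the span in calendar days and counts the weekdays inside it with NumPy. It only counts Monday through Friday and knows nothing about exchange holidays, so it gives an upper bound on the number of daily rows to expect, not an exact figure.

```python
from datetime import datetime

import numpy as np

# same request window as above
start_date = datetime(year=2023, month=1, day=1)
end_date = datetime(year=2025, month=3, day=1)

# span of the request in calendar days
calendar_days = (end_date - start_date).days

# weekdays (Mon-Fri) in [start, end); exchange holidays are NOT excluded
weekdays = int(
    np.busday_count(
        start_date.strftime("%Y-%m-%d"),
        end_date.strftime("%Y-%m-%d"),
    )
)

print(calendar_days, weekdays)  # 790 565
```

The daily download in this tutorial returns 541 rows, fewer than the 565 weekdays counted here, because the exchange was closed on some of those weekdays.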

Let’s download data for a single ticker, NVIDIA (NVDA):

# set the name of the ticker we want to download market data for
ticker = "NVDA"

# download market data for a single ticker
df_single = yf.download(
    tickers=ticker,
    start=start_date,
    end=end_date,
    interval="1d",
    group_by="ticker",
    auto_adjust=True,
    progress=False
)
df_single

The output DataFrame should look something like this:

Ticker	NVDA
Price	Open	High	Low	Close	Volume
Date
2023-01-03	14.838839	14.983721	14.084457	14.303278	401277000
2023-01-04	14.555073	14.840838	14.229340	14.736923	431324000
2023-01-05	14.479135	14.552076	14.136416	14.253321	389168000
2023-01-06	14.462150	14.997712	14.022511	14.846835	405044000
2023-01-09	15.271488	16.042855	15.128604	15.615206	504231000
...	...	...	...	...	...
2025-02-24	136.547442	138.577254	130.068042	130.268021	251381100
2025-02-25	129.968045	130.188026	124.428561	126.618355	271428700
2025-02-26	129.978054	133.717701	128.478192	131.267929	322553800
2025-02-27	134.987587	134.997581	119.998968	120.138954	443175800
2025-02-28	118.009141	125.078491	116.389295	124.908508	389091100

The resulting DataFrame has the following columns:

Open: Price at market open
High: Highest price during the trading session
Low: Lowest price during the trading session
Close: Price at market close
Volume: Number of shares traded

These are the standard columns you’ll see when working with market/trading data, and they are often abbreviated as “OHLCV.”

The parameters we pass to yf.download control the data we get back, including how it is organized in the output DataFrame:

tickers: Symbol(s) to download
start: Start date of the history request
end: End date of the request (exclusive, so the last row returned is the final trading day before this date)
interval: Timeframe for each row of data (daily in this case)
group_by: How to organize the columns in the DataFrame. Use ticker to make the ticker name the top-level column, or column to make the OHLCV fields the top-level columns with the individual tickers beneath them
auto_adjust: Automatically adjusts prices for splits and dividends
progress: Whether to display a progress bar (we disabled it)
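To make the auto_adjust parameter more concrete, here is a rough sketch of the general idea behind back-adjusting prices: scale each row's OHLC values by the ratio of the adjusted close to the raw close. The numbers and the adjust_ohlc helper are made up for illustration, and this is not claimed to be yfinance's exact implementation.

```python
import pandas as pd


def adjust_ohlc(df: pd.DataFrame) -> pd.DataFrame:
    """Scale OHLC by Adj Close / Close (hypothetical helper, illustration only)."""
    ratio = df["Adj Close"] / df["Close"]
    out = pd.DataFrame(index=df.index)
    for col in ["Open", "High", "Low", "Close"]:
        out[col] = df[col] * ratio
    # volume is carried over unchanged in this simplified sketch
    out["Volume"] = df["Volume"]
    return out


# made-up, unadjusted bars with a separate adjusted-close column
raw = pd.DataFrame(
    {
        "Open": [99.0, 101.0],
        "High": [101.0, 103.0],
        "Low": [98.0, 100.0],
        "Close": [100.0, 102.0],
        "Adj Close": [98.0, 102.0],
        "Volume": [1_000, 1_200],
    },
    index=pd.to_datetime(["2024-01-02", "2024-01-03"]),
)

adjusted = adjust_ohlc(raw)
print(adjusted)
```

In this toy example, the first row is scaled by 0.98 (so its close becomes 98.0), while the second row, whose adjusted and raw closes match, is left as-is.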

By setting the interval parameter, we can control what timeframe of data we want to download:

1m: One minute data
2m: Two minute data
5m: Five minute data
15m: Fifteen minute data
30m: Thirty minute data
60m: Sixty minute data
90m: Ninety minute data
1d: Daily data (what we’re using)
1wk: Weekly data
1mo: Monthly data

Try changing the interval parameter to see how the DataFrame’s index changes—smaller intervals give you more granular data but cover shorter time periods.
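To get a feel for how the interval relates to the number of rows, the sketch below aggregates made-up daily bars into weekly bars with pandas. It uses synthetic data instead of a real download, and the weekly bars returned for interval="1wk" may be labeled or bucketed differently, so treat it as an illustration of the relationship, not a replacement for the API.

```python
import numpy as np
import pandas as pd

# ten synthetic business days starting Monday 2023-01-02 (made-up prices)
idx = pd.bdate_range("2023-01-02", periods=10, name="Date")
close = np.arange(10.0, 20.0)
daily = pd.DataFrame(
    {
        "Open": close - 0.5,
        "High": close + 1.0,
        "Low": close - 1.0,
        "Close": close,
        "Volume": 100,
    },
    index=idx,
)

# aggregate daily bars into weeks ending on Friday
weekly = daily.resample("W-FRI").agg(
    {
        "Open": "first",   # first open of the week
        "High": "max",     # highest high of the week
        "Low": "min",      # lowest low of the week
        "Close": "last",   # last close of the week
        "Volume": "sum",   # total volume for the week
    }
)

print(len(daily), len(weekly))  # 10 2
print(weekly)
```

Ten daily rows collapse into two weekly rows: coarser intervals mean fewer rows covering the same span of time.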

Note that the intraday intervals (i.e., 1m, 2m, 5m, 15m, 30m, 60m, and 90m, along with 1h, which is also accepted) cannot reach back further than the last 60 days.
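A small guard can catch requests that fall outside this window before they are sent. The helper below is hypothetical (it is not part of yfinance), and it hard-codes the 60-day figure quoted above; the actual limits are set by the data provider and may differ by interval or change over time.

```python
from datetime import datetime, timedelta

# intraday intervals named in the text, plus the 60-day window it quotes
INTRADAY_INTERVALS = {"1m", "2m", "5m", "15m", "30m", "60m", "90m", "1h"}
MAX_INTRADAY_LOOKBACK = timedelta(days=60)


def fits_intraday_window(interval, start, now=None):
    """Return True if `start` is allowed for `interval` under the 60-day rule."""
    now = now or datetime.now()
    if interval not in INTRADAY_INTERVALS:
        # daily and longer intervals are not restricted by this check
        return True
    return now - start <= MAX_INTRADAY_LOOKBACK


# a fixed "now" keeps the example reproducible
now = datetime(2025, 3, 1)
print(fits_intraday_window("5m", datetime(2025, 2, 1), now=now))  # True
print(fits_intraday_window("5m", datetime(2023, 1, 1), now=now))  # False
print(fits_intraday_window("1d", datetime(2023, 1, 1), now=now))  # True
```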

Now, let’s examine the column structure of our DataFrame:

# show the column structure
df_single.columns

Which will give us the following:

MultiIndex([('NVDA',   'Open'),
            ('NVDA',   'High'),
            ('NVDA',    'Low'),
            ('NVDA',  'Close'),
            ('NVDA', 'Volume')],
           names=['Ticker', 'Price'])

If you’ve worked with Pandas before, you might be surprised to see that our DataFrame has a MultiIndex for its columns, with two levels:

Level 1: The ticker symbol (i.e., NVDA)
Level 2: The Open, High, Low, Close and Volume columns, respectively

As we’ll see in a second, this MultiIndex column structure allows us to easily organize data in the DataFrame when working with multiple symbols.
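If the extra level feels like overhead for a single ticker, pandas can inspect it and drop it. The sketch below rebuilds the same column layout on a tiny made-up DataFrame (no download involved) so the calls can be tried in isolation; df_demo and its values are placeholders.

```python
import pandas as pd

# mimic the two-level column layout shown above with made-up values
columns = pd.MultiIndex.from_product(
    [["NVDA"], ["Open", "High", "Low", "Close", "Volume"]],
    names=["Ticker", "Price"],
)
df_demo = pd.DataFrame(
    [[14.8, 15.0, 14.1, 14.3, 401_277_000]],
    index=pd.to_datetime(["2023-01-03"]),
    columns=columns,
)

# inspect the column levels
print(df_demo.columns.nlevels)  # 2
print(list(df_demo.columns.get_level_values("Ticker").unique()))  # ['NVDA']

# drop the ticker level to get flat OHLCV column names
flat = df_demo.droplevel("Ticker", axis=1)
print(list(flat.columns))  # ['Open', 'High', 'Low', 'Close', 'Volume']
```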

Downloading market data for multiple tickers #

Let’s expand our analysis to include multiple stocks:

# define the list of tickers we want to fetch market data for
tickers = ["NVDA", "META", "AAPL"]

# download market data for multiple tickers
df_multi = yf.download(
    tickers=tickers,
    start=start_date,
    end=end_date,
    interval="1d",
    group_by="ticker",
    auto_adjust=True,
    progress=False
)
df_multi

Notice how the DataFrame column structure has changed:

Ticker	NVDA					META					AAPL
Price	Open	High	Low	Close	Volume	Open	High	Low	Close	Volume	Open	High	Low	Close	Volume
Date
2023-01-03	14.838839	14.983721	14.084457	14.303278	401277000	122.352620	125.889114	121.814674	124.265312	35528500	128.782649	129.395518	122.742873	123.632530	112117500
2023-01-04	14.555073	14.840838	14.229340	14.736923	431324000	126.895264	128.558915	125.371087	126.885307	32397100	125.431615	127.181276	123.642420	124.907707	89113600
2023-01-05	14.479135	14.552076	14.136416	14.253321	389168000	125.650025	128.030937	124.066079	126.456947	25447100	125.668857	126.301500	123.326101	123.583107	80962700
2023-01-06	14.462150	14.997712	14.022511	14.846835	405044000	128.479231	129.834056	125.560380	129.525238	27584500	124.561732	128.792531	123.454601	128.130234	87754700
2023-01-09	15.271488	16.042855	15.128604	15.615206	504231000	130.660897	132.444079	128.788046	128.977325	26649100	128.970489	131.876702	128.397153	128.654160	70790800
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
2025-02-24	136.547442	138.577254	130.068042	130.268021	251381100	686.280029	687.270020	662.450012	668.130005	15677000	244.929993	248.860001	244.419998	247.100006	51326400
2025-02-25	129.968045	130.188026	124.428561	126.618355	271428700	665.969971	668.000000	641.859985	657.500000	20579700	248.000000	250.000000	244.910004	247.039993	48013300
2025-02-26	129.978054	133.717701	128.478192	131.267929	322553800	659.650024	683.010010	658.000000	673.700012	14488700	244.330002	244.979996	239.130005	240.360001	44433600
2025-02-27	134.987587	134.997581	119.998968	120.138954	443175800	682.450012	688.650024	657.570007	658.239990	12500000	239.410004	242.460007	237.059998	237.300003	41153600
2025-02-28	118.009141	125.078491	116.389295	124.908508	389091100	658.039978	669.630005	642.599976	668.200012	17534200	236.949997	242.089996	230.199997	241.839996	56833400

Our DataFrame now has 15 columns instead of 5 because we have 3 tickers with 5 data columns each (one for each of the OHLCV values, respectively).

To make this clearer, let’s examine the column structure:

# show the column structure for multiple tickers
df_multi.columns

Which will give us:

MultiIndex([('NVDA',   'Open'),
            ('NVDA',   'High'),
            ('NVDA',    'Low'),
            ('NVDA',  'Close'),
            ('NVDA', 'Volume'),
            ('META',   'Open'),
            ('META',   'High'),
            ('META',    'Low'),
            ('META',  'Close'),
            ('META', 'Volume'),
            ('AAPL',   'Open'),
            ('AAPL',   'High'),
            ('AAPL',    'Low'),
            ('AAPL',  'Close'),
            ('AAPL', 'Volume')],
           names=['Ticker', 'Price'])

Does the MultiIndex column structure make more sense now?

Effectively, this structure allows us to organize our data hierarchically by ticker and price type.

For example, we can grab all AAPL data with:

# access all columns for AAPL
df_multi["AAPL"]

Price	Open	High	Low	Close	Volume
Date
2023-01-03	128.782649	129.395518	122.742873	123.632530	112117500
2023-01-04	125.431615	127.181276	123.642420	124.907707	89113600
2023-01-05	125.668857	126.301500	123.326101	123.583107	80962700
2023-01-06	124.561732	128.792531	123.454601	128.130234	87754700
2023-01-09	128.970489	131.876702	128.397153	128.654160	70790800
...	...	...	...	...	...
2025-02-24	244.929993	248.860001	244.419998	247.100006	51326400
2025-02-25	248.000000	250.000000	244.910004	247.039993	48013300
2025-02-26	244.330002	244.979996	239.130005	240.360001	44433600
2025-02-27	239.410004	242.460007	237.059998	237.300003	41153600
2025-02-28	236.949997	242.089996	230.199997	241.839996	56833400

Or, we can grab just the closing prices:

# access just the closing prices for AAPL
df_multi["AAPL"]["Close"]

Date
2023-01-03    123.632530
2023-01-04    124.907707
2023-01-05    123.583107
2023-01-06    128.130234
2023-01-09    128.654160
                 ...    
2025-02-24    247.100006
2025-02-25    247.039993
2025-02-26    240.360001
2025-02-27    237.300003
2025-02-28    241.839996
Name: Close, Length: 541, dtype: float64

The same is true for the Open, High, Low, and Volume columns as well.

This hierarchical indexing makes it easy to work with data from multiple tickers in a single DataFrame, while still maintaining clear organization.
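Two related selection idioms are worth knowing. The sketch below uses a small made-up frame with the same (Ticker, Price) column layout, so it runs without a download: a tuple key selects one column in a single lookup, and DataFrame.xs pulls one price field for every ticker at once.

```python
import numpy as np
import pandas as pd

tickers = ["NVDA", "META", "AAPL"]
fields = ["Open", "High", "Low", "Close", "Volume"]

# made-up values laid out like the multi-ticker download above
columns = pd.MultiIndex.from_product([tickers, fields], names=["Ticker", "Price"])
index = pd.bdate_range("2023-01-03", periods=3, name="Date")
df_demo = pd.DataFrame(
    np.arange(45, dtype=float).reshape(3, 15),
    index=index,
    columns=columns,
)

# a tuple key selects the same values as the chained df["AAPL"]["Close"] lookup
aapl_close = df_demo[("AAPL", "Close")]
same_values = bool((aapl_close == df_demo["AAPL"]["Close"]).all())
print(same_values)  # True

# cross-section: the Close column for every ticker, one column per ticker
closes = df_demo.xs("Close", axis=1, level="Price")
print(closes)
```

The closes frame has one column per ticker, which is a convenient shape for comparisons across symbols.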

Plotting closing prices #

Now that we have our data, let’s visualize the closing prices for all three tickers:

# initialize a new figure
plt.figure(figsize=(14, 7))
sns.set(style="whitegrid")

# loop over the tickers
for ticker in tickers:
    # plot the closing price for each
    sns.lineplot(
        data=df_multi[ticker]["Close"],
        label=ticker,
        linewidth=2
    )

# set the plot title
plt.title(
    f"Stock Closing Prices ("
    f"{start_date.strftime('%Y-%m-%d')} "
    f"to {end_date.strftime('%Y-%m-%d')})"
)

# set the plot labels
plt.xlabel("Date")
plt.ylabel("Closing Price ($)")

# finish constructing the plot
plt.tight_layout()
plt.show()

This produces a single figure with three separate line plots (one for each ticker), allowing us to visualize the closing prices over the past ~2 years.

The most important line in the code above is:

data=df_multi[ticker]["Close"]

This is where we’re leveraging our MultiIndex structure.

For each ticker in our list, we’re extracting just the closing prices and passing them to the lineplot function.

The hierarchical indexing makes this selection clean and intuitive.
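Because the three tickers trade at different price levels, it can help to rebase each series to a common starting value before comparing them (the advanced exercise at the end of this tutorial asks for this). The sketch below does so on made-up closing prices; the symbols and numbers are placeholders, not real quotes.

```python
import pandas as pd

# made-up closing prices, one column per (placeholder) symbol
closes = pd.DataFrame(
    {"AAA": [50.0, 55.0, 60.0], "BBB": [200.0, 190.0, 210.0]},
    index=pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04"]),
)

# rebase each column so that its first value equals 100
normalized = closes / closes.iloc[0] * 100
print(normalized.round(2))
```

After rebasing, every series starts at 100, so a value of 120 reads directly as a 20% gain since the first date.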

Other stock market data APIs #

While yfinance is a great starting point, as you advance in your market analysis journey, you might want to explore other data sources. Here are a few APIs I’ve personally used and recommend.

Polygon.io #

Polygon offers institutional-grade financial data with impressive historical coverage and reliability. Their API is well-documented and provides access to stocks, options, forex, and crypto data. They offer tiered pricing plans, including a generous free tier that’s perfect for getting started.

What I love about Polygon is their data quality and consistency—critical factors when you’re building trading algorithms that need to make decisions based on clean, accurate information.

Financial Modeling Prep API #

The Financial Modeling Prep API provides a comprehensive suite of financial data beyond just market prices. You can access company fundamentals, financial statements, economic indicators, and more. Their pricing is reasonable, making it accessible for individual traders and small teams.

Financial Modeling Prep is particularly useful when you need to combine price action with fundamental analysis—perfect for those longer-term investment strategies.

EOD Historical Data API #

EODHD (End of Day Historical Data) API offers a solid mix of price data, fundamentals, and alternative datasets. Their global market coverage is impressive, with data for stocks, ETFs, mutual funds, and more from over 60 exchanges worldwide.

I’ve found their API to be reliable and straightforward to use, with a pricing structure that scales well as your needs grow.

Exercises #

Before we wrap up, try these exercises to reinforce what you’ve just learned:

Interval exploration: Download AAPL stock data using different intervals (try 1d, 1wk, and 1mo). Compare how the resulting DataFrames differ in size and time coverage.
Date range manipulation: Experiment with different start and end dates. Try fetching data for:
- The last 6 months
- A specific calendar year (like 2024)
- A period spanning a major market event (such as the 2008 housing market crash, or during the first six months of COVID)
Group by experiment: Download data for multiple tickers (at least 3) and try changing the group_by parameter:
- Set group_by="ticker" (what we used above)
- Set group_by="column" and observe how the DataFrame structure changes
- Which organization method do you find more intuitive?
Advanced challenge: Download data for the NASDAQ-100 index (^NDX), the S&P 500 (^GSPC), and three stocks of your choosing. Create a visualization comparing their performance over the last year, with prices normalized to the same starting value.
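For the group-by exercise, the two layouts can be previewed without a download. The sketch below builds a made-up frame in the ticker-first layout and reorders its column levels with pandas; the result resembles the layout described for group_by="column", though the exact column order yfinance returns may differ.

```python
import numpy as np
import pandas as pd

tickers = ["NVDA", "META", "AAPL"]
fields = ["Open", "Close"]

# ticker-first layout, as with group_by="ticker" (values are made up)
by_ticker = pd.DataFrame(
    np.arange(12, dtype=float).reshape(2, 6),
    index=pd.bdate_range("2023-01-03", periods=2, name="Date"),
    columns=pd.MultiIndex.from_product([tickers, fields], names=["Ticker", "Price"]),
)

# swap the two column levels so the price field comes first, then sort
by_column = by_ticker.swaplevel(axis=1).sort_index(axis=1)

print(list(by_ticker.columns)[:2])  # [('NVDA', 'Open'), ('NVDA', 'Close')]
print(list(by_column.columns)[:3])  # [('Close', 'AAPL'), ('Close', 'META'), ('Close', 'NVDA')]

# with the field on top, one key returns that field for every ticker
print(by_column["Close"])
```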

Remember, the best way to learn is by doing—experiment with different parameters and see how they affect the data you receive.

Final thoughts #

Congratulations, you’ve now learned the basics of fetching market data with yfinance!

The MultiIndex structure used by yfinance provides a powerful way to organize financial time series data, especially when working with multiple securities (although it does have its limitations, as we’ll see in the next article in the series).

In the next tutorial, we’ll dive deeper into advanced DataFrame structures for stock market data, building on what we’ve learned here.
