How to download market data with yfinance and Python

There are many market data APIs out there, ranging from dirt-cheap to “do I really need both kidneys?” expensive.
However, when you’re just getting started, nearly everyone begins with yfinance.
It’s free, easy to use, and for basic financial analysis, it’s all you need.
Think of yfinance as your gateway drug to the world of financial data.
Start here, get your bearings, and when you’re ready for the harder stuff, you’ll have a solid foundation to build on.
This tutorial is part 1 in a larger series on getting started with fintech and market analysis with Python:
- How to download market data with yfinance and Python (this tutorial)
- Rethinking yfinance’s default MultiIndex format
- How to plot candlestick charts with Python and mplfinance
- How to compute Simple Moving Averages (SMAs) for trading with Python and Pandas
- Finding consecutive integer groups in arrays with Python and NumPy
- Computing slope of series with Pandas and SciPy
- Market stage detection with Python and Pandas
- Implementing TradingView’s Stochastic RSI indicator in Python
- Introduction to position sizing
- Risk/Reward analysis and position sizing with Python
Configuring your development environment #
Before we dive in, let’s set up our Python environment with the packages we’ll need:
$ pip install numpy pandas yfinance matplotlib seaborn
These libraries give us everything we need to fetch, process, and visualize market data:
- numpy: The foundation of numerical computing in Python
- pandas: Data manipulation and analysis (crucial for working with time series data)
- yfinance: Our market data API
- matplotlib and seaborn: Visualization libraries to create charts and plots
Downloading market data for a single ticker #
Let’s start by importing the necessary packages:
# import the necessary packages
from datetime import datetime
import matplotlib.pyplot as plt
import seaborn as sns
import yfinance as yf
Next, we define the time period for our historical market data request:
# set the start and end dates for our market data request
end_date = datetime(year=2025, month=3, day=1)
start_date = datetime(year=2023, month=1, day=1)
We’re grabbing about two years of data—enough to give us a decent view of how a stock’s price has evolved over time, without overwhelming our analysis.
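If you would rather compute the window from a lookback length than hard-code both dates, the standard library is enough. The sketch below is an illustration only: the helper name lookback_window is hypothetical, and the 182-day figure is just a rough stand-in for six months. Keep in mind that yfinance documents the end date as exclusive, so the end date itself is not part of the returned rows.

```python
from datetime import datetime, timedelta


def lookback_window(end, days):
    """Return (start, end) spanning `days` calendar days that finish at `end`."""
    if days <= 0:
        raise ValueError("days must be positive")
    return end - timedelta(days=days), end


# roughly six months ending on the same date used in this tutorial
six_mo_start, six_mo_end = lookback_window(datetime(2025, 3, 1), days=182)
print(six_mo_start.date(), "->", six_mo_end.date())
```

The two values can be passed as start and end in place of the fixed dates above.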
Let’s download data for a single ticker, NVIDIA (NVDA):
# set the name of the ticker we want to download market data for
ticker = "NVDA"
# download market data for a single ticker
df_single = yf.download(
    tickers=ticker,
    start=start_date,
    end=end_date,
    interval="1d",
    group_by="ticker",
    auto_adjust=True,
    progress=False
)
df_single
The output DataFrame should look something like this:
Ticker | NVDA | ||||
---|---|---|---|---|---|
Price | Open | High | Low | Close | Volume |
Date | |||||
2023-01-03 | 14.838839 | 14.983721 | 14.084457 | 14.303278 | 401277000 |
2023-01-04 | 14.555073 | 14.840838 | 14.229340 | 14.736923 | 431324000 |
2023-01-05 | 14.479135 | 14.552076 | 14.136416 | 14.253321 | 389168000 |
2023-01-06 | 14.462150 | 14.997712 | 14.022511 | 14.846835 | 405044000 |
2023-01-09 | 15.271488 | 16.042855 | 15.128604 | 15.615206 | 504231000 |
... | ... | ... | ... | ... | ... |
2025-02-24 | 136.547442 | 138.577254 | 130.068042 | 130.268021 | 251381100 |
2025-02-25 | 129.968045 | 130.188026 | 124.428561 | 126.618355 | 271428700 |
2025-02-26 | 129.978054 | 133.717701 | 128.478192 | 131.267929 | 322553800 |
2025-02-27 | 134.987587 | 134.997581 | 119.998968 | 120.138954 | 443175800 |
2025-02-28 | 118.009141 | 125.078491 | 116.389295 | 124.908508 | 389091100 |
The resulting DataFrame has the following columns:
- Open: Price at market open
- High: Highest price during the trading session
- Low: Lowest price during the trading session
- Close: Price at market close
- Volume: Number of shares traded
These are the standard columns you’ll see when working with market/trading data, and are often abbreviated “OHLCV” for short.
The parameters we pass to yf.download control the data we get back, including how it is organized in the output DataFrame:
- tickers: Symbol(s) to download
- start: Start date of the history request
- end: End date of the request
- interval: Timeframe for each row of data (daily in this case)
- group_by: How to organize the columns in the DataFrame (either ticker, which will place the ticker name as the top-level column, or column, which will place the OHLCV values as the top-level columns, followed by the individual tickers)
- auto_adjust: Automatically adjusts prices for splits and dividends
- progress: Whether to display a progress bar (we disabled it)
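To make the two group_by layouts concrete without a network call, here is a small sketch that builds a stand-in DataFrame by hand. The numbers are made up, the variable names are hypothetical, and the swaplevel call is only my way of illustrating the difference between the layouts; it is not a claim about what yfinance does internally.

```python
import numpy as np
import pandas as pd

# made-up stand-in for a download with group_by="ticker"
demo_tickers = ["NVDA", "META"]
fields = ["Open", "High", "Low", "Close", "Volume"]
columns = pd.MultiIndex.from_product(
    [demo_tickers, fields], names=["Ticker", "Price"]
)
index = pd.date_range("2023-01-03", periods=3, freq="B", name="Date")
by_ticker = pd.DataFrame(
    np.arange(len(index) * len(columns), dtype=float).reshape(len(index), -1),
    index=index,
    columns=columns,
)

# swap the two column levels to mimic the group_by="column" layout,
# where the OHLCV field comes first and the ticker second
by_column = by_ticker.swaplevel(axis=1).sort_index(axis=1)

print(by_ticker.columns.names)
print(by_column.columns.names)
```

With the ticker on top, by_ticker["NVDA"] returns every field for one symbol; with the field on top, by_column["Close"] returns one field for every symbol.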
By setting the interval parameter, we can control what timeframe of data we want to download:
- 1m: One minute data
- 2m: Two minute data
- 5m: Five minute data
- 15m: Fifteen minute data
- 30m: Thirty minute data
- 60m: Sixty minute data
- 90m: Ninety minute data
- 1d: Daily data (what we’re using)
- 1wk: Weekly data
- 1mo: Monthly data
Try changing the interval parameter to see how the DataFrame’s index changes: smaller intervals give you more granular data but cover shorter time periods.
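To get a feel for how the timeframe changes the shape of the data, here is an offline sketch that aggregates made-up daily bars into weekly bars with pandas. This is an assumption-laden illustration, not how yfinance produces its 1wk data: those bars come from Yahoo, and their labels and boundaries may differ from the Friday-ending weeks used here.

```python
import numpy as np
import pandas as pd

# ten made-up business days of OHLCV data (two full trading weeks)
index = pd.date_range("2023-01-02", periods=10, freq="B", name="Date")
close = np.linspace(100.0, 109.0, num=10)
daily = pd.DataFrame(
    {
        "Open": close - 0.5,
        "High": close + 1.0,
        "Low": close - 1.0,
        "Close": close,
        "Volume": np.full(10, 1_000),
    },
    index=index,
)

# collapse each week (ending Friday) into a single bar: first open,
# highest high, lowest low, last close, and total volume
weekly = daily.resample("W-FRI").agg(
    {"Open": "first", "High": "max", "Low": "min", "Close": "last", "Volume": "sum"}
)

print(len(daily), "daily rows ->", len(weekly), "weekly rows")
```

The same date range yields far fewer rows at the coarser timeframe, which is the trade-off to keep in mind when picking an interval.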
Note: intraday intervals (1m, 2m, 5m, 15m, 30m, 60m, 90m, and 1h) cannot extend past the last 60 days.
Now, let’s examine the column structure of our DataFrame:
# show the column structure
df_single.columns
Which will give us the following:
MultiIndex([('NVDA', 'Open'),
            ('NVDA', 'High'),
            ('NVDA', 'Low'),
            ('NVDA', 'Close'),
            ('NVDA', 'Volume')],
           names=['Ticker', 'Price'])
If you’ve worked with Pandas before, you might be surprised to see that our DataFrame has a MultiIndex for columns, including two levels:
- Level 1: The ticker symbol (i.e., NVDA)
- Level 2: The Open, High, Low, Close, and Volume columns, respectively
As we’ll see in a second, this MultiIndex column structure allows us to easily organize data in the DataFrame when working with multiple symbols.
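When there is only one ticker, the extra column level can feel like overhead. One option is to drop it with the pandas droplevel method. The sketch below uses a tiny hand-built frame with made-up numbers rather than a real download, and the names df_demo and flat are hypothetical.

```python
import pandas as pd

# made-up stand-in for the single-ticker frame (two rows only)
columns = pd.MultiIndex.from_product(
    [["NVDA"], ["Open", "High", "Low", "Close", "Volume"]],
    names=["Ticker", "Price"],
)
df_demo = pd.DataFrame(
    [[14.8, 15.0, 14.1, 14.3, 401], [14.6, 14.8, 14.2, 14.7, 431]],
    index=pd.to_datetime(["2023-01-03", "2023-01-04"]).rename("Date"),
    columns=columns,
)

# confirm there are two column levels, then drop the ticker level
print(df_demo.columns.nlevels)
flat = df_demo.droplevel("Ticker", axis=1)
print(list(flat.columns))
```

Selecting df_demo["NVDA"] gives the same flat set of OHLCV columns, so either approach works for a single symbol.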
Downloading market data for multiple tickers #
Let’s expand our analysis to include multiple stocks:
# define the list of tickers we want to fetch market data for
tickers = ["NVDA", "META", "AAPL"]
# download market data for multiple tickers
df_multi = yf.download(
    tickers=tickers,
    start=start_date,
    end=end_date,
    interval="1d",
    group_by="ticker",
    auto_adjust=True,
    progress=False
)
df_multi
Notice how the DataFrame column structure has changed:
Ticker | NVDA | META | AAPL | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Price | Open | High | Low | Close | Volume | Open | High | Low | Close | Volume | Open | High | Low | Close | Volume |
Date | |||||||||||||||
2023-01-03 | 14.838839 | 14.983721 | 14.084457 | 14.303278 | 401277000 | 122.352620 | 125.889114 | 121.814674 | 124.265312 | 35528500 | 128.782649 | 129.395518 | 122.742873 | 123.632530 | 112117500 |
2023-01-04 | 14.555073 | 14.840838 | 14.229340 | 14.736923 | 431324000 | 126.895264 | 128.558915 | 125.371087 | 126.885307 | 32397100 | 125.431615 | 127.181276 | 123.642420 | 124.907707 | 89113600 |
2023-01-05 | 14.479135 | 14.552076 | 14.136416 | 14.253321 | 389168000 | 125.650025 | 128.030937 | 124.066079 | 126.456947 | 25447100 | 125.668857 | 126.301500 | 123.326101 | 123.583107 | 80962700 |
2023-01-06 | 14.462150 | 14.997712 | 14.022511 | 14.846835 | 405044000 | 128.479231 | 129.834056 | 125.560380 | 129.525238 | 27584500 | 124.561732 | 128.792531 | 123.454601 | 128.130234 | 87754700 |
2023-01-09 | 15.271488 | 16.042855 | 15.128604 | 15.615206 | 504231000 | 130.660897 | 132.444079 | 128.788046 | 128.977325 | 26649100 | 128.970489 | 131.876702 | 128.397153 | 128.654160 | 70790800 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2025-02-24 | 136.547442 | 138.577254 | 130.068042 | 130.268021 | 251381100 | 686.280029 | 687.270020 | 662.450012 | 668.130005 | 15677000 | 244.929993 | 248.860001 | 244.419998 | 247.100006 | 51326400 |
2025-02-25 | 129.968045 | 130.188026 | 124.428561 | 126.618355 | 271428700 | 665.969971 | 668.000000 | 641.859985 | 657.500000 | 20579700 | 248.000000 | 250.000000 | 244.910004 | 247.039993 | 48013300 |
2025-02-26 | 129.978054 | 133.717701 | 128.478192 | 131.267929 | 322553800 | 659.650024 | 683.010010 | 658.000000 | 673.700012 | 14488700 | 244.330002 | 244.979996 | 239.130005 | 240.360001 | 44433600 |
2025-02-27 | 134.987587 | 134.997581 | 119.998968 | 120.138954 | 443175800 | 682.450012 | 688.650024 | 657.570007 | 658.239990 | 12500000 | 239.410004 | 242.460007 | 237.059998 | 237.300003 | 41153600 |
2025-02-28 | 118.009141 | 125.078491 | 116.389295 | 124.908508 | 389091100 | 658.039978 | 669.630005 | 642.599976 | 668.200012 | 17534200 | 236.949997 | 242.089996 | 230.199997 | 241.839996 | 56833400 |
Our DataFrame now has 15 columns instead of 5 because we have 3 tickers with 5 data columns each (one for each of the OHLCV values, respectively).
To make this clearer, let’s examine the column structure:
# show the column structure for multiple tickers
df_multi.columns
Which will give us:
MultiIndex([('NVDA', 'Open'),
            ('NVDA', 'High'),
            ('NVDA', 'Low'),
            ('NVDA', 'Close'),
            ('NVDA', 'Volume'),
            ('META', 'Open'),
            ('META', 'High'),
            ('META', 'Low'),
            ('META', 'Close'),
            ('META', 'Volume'),
            ('AAPL', 'Open'),
            ('AAPL', 'High'),
            ('AAPL', 'Low'),
            ('AAPL', 'Close'),
            ('AAPL', 'Volume')],
           names=['Ticker', 'Price'])
Does the MultiIndex column structure make more sense now?
Effectively, this structure allows us to organize our data hierarchically by ticker and price type.
For example, we can grab all AAPL data with:
# access all columns for AAPL
df_multi["AAPL"]
Price | Open | High | Low | Close | Volume |
---|---|---|---|---|---|
Date | |||||
2023-01-03 | 128.782649 | 129.395518 | 122.742873 | 123.632530 | 112117500 |
2023-01-04 | 125.431615 | 127.181276 | 123.642420 | 124.907707 | 89113600 |
2023-01-05 | 125.668857 | 126.301500 | 123.326101 | 123.583107 | 80962700 |
2023-01-06 | 124.561732 | 128.792531 | 123.454601 | 128.130234 | 87754700 |
2023-01-09 | 128.970489 | 131.876702 | 128.397153 | 128.654160 | 70790800 |
... | ... | ... | ... | ... | ... |
2025-02-24 | 244.929993 | 248.860001 | 244.419998 | 247.100006 | 51326400 |
2025-02-25 | 248.000000 | 250.000000 | 244.910004 | 247.039993 | 48013300 |
2025-02-26 | 244.330002 | 244.979996 | 239.130005 | 240.360001 | 44433600 |
2025-02-27 | 239.410004 | 242.460007 | 237.059998 | 237.300003 | 41153600 |
2025-02-28 | 236.949997 | 242.089996 | 230.199997 | 241.839996 | 56833400 |
Or, we can grab just the closing prices:
# access just the closing prices for AAPL
df_multi["AAPL"]["Close"]
Date
2023-01-03 123.632530
2023-01-04 124.907707
2023-01-05 123.583107
2023-01-06 128.130234
2023-01-09 128.654160
...
2025-02-24 247.100006
2025-02-25 247.039993
2025-02-26 240.360001
2025-02-27 237.300003
2025-02-28 241.839996
Name: Close, Length: 541, dtype: float64
The same is true for the Open, High, Low, and Volume columns as well.
This hierarchical indexing makes it easy to work with data from multiple tickers in a single DataFrame, while still maintaining clear organization.
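The hierarchy can also be sliced the other way: one field across every ticker. Here is a sketch using the pandas xs method on a hand-built frame that only mimics the (Ticker, Price) layout; the values are made up and df_demo is a hypothetical name, so treat it as an illustration rather than output from a real download.

```python
import numpy as np
import pandas as pd

# made-up frame with the same (Ticker, Price) column layout
columns = pd.MultiIndex.from_product(
    [["NVDA", "META", "AAPL"], ["Open", "High", "Low", "Close", "Volume"]],
    names=["Ticker", "Price"],
)
index = pd.date_range("2023-01-03", periods=4, freq="B", name="Date")
values = np.arange(len(index) * len(columns), dtype=float).reshape(len(index), -1)
df_demo = pd.DataFrame(values, index=index, columns=columns)

# cross-section on the "Price" level: one Close column per ticker
closes = df_demo.xs("Close", axis=1, level="Price")

# a single column can also be addressed with a (ticker, field) tuple
aapl_close = df_demo[("AAPL", "Close")]

print(list(closes.columns))
```

The tuple form reaches the same column as the chained df_demo["AAPL"]["Close"] lookup in a single step.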
Plotting closing prices #
Now that we have our data, let’s visualize the closing prices for all three tickers:
# initialize a new figure
plt.figure(figsize=(14, 7))
sns.set(style="whitegrid")
# loop over the tickers
for ticker in tickers:
    # plot the closing price for each
    sns.lineplot(
        data=df_multi[ticker]["Close"],
        label=ticker,
        linewidth=2
    )
# set the plot title
plt.title(
    f"Stock Closing Prices ("
    f"{start_date.strftime('%Y-%m-%d')} "
    f"to {end_date.strftime('%Y-%m-%d')})"
)
# set the plot labels
plt.xlabel("Date")
plt.ylabel("Closing Price ($)")
# finish constructing the plot
plt.tight_layout()
plt.show()
Which will produce a plot that should look like this:
Note that we have three separate line plots (one for each ticker), allowing us to visualize the closing prices over the past ~2 years.
The most important line in the code above is:
data=df_multi[ticker]["Close"]
This is where we’re leveraging our MultiIndex structure.
For each ticker in our list, we’re extracting just the closing prices and passing them to the lineplot function.
The hierarchical indexing makes this selection clean and intuitive.
Other stock market data APIs #
While yfinance is a great starting point, as you advance in your market analysis journey, you might want to explore other data sources. Here are a few APIs I’ve personally used and recommend.
Polygon.io #
Polygon offers institutional-grade financial data with impressive historical coverage and reliability. Their API is well-documented and provides access to stocks, options, forex, and crypto data. They offer tiered pricing plans, including a generous free tier that’s perfect for getting started.
What I love about Polygon is their data quality and consistency—critical factors when you’re building trading algorithms that need to make decisions based on clean, accurate information.
Financial Modeling Prep API #
The Financial Modeling Prep API provides a comprehensive suite of financial data beyond just market prices. You can access company fundamentals, financial statements, economic indicators, and more. Their pricing is reasonable, making it accessible for individual traders and small teams.
Financial Modeling Prep is particularly useful when you need to combine price action with fundamental analysis—perfect for those longer-term investment strategies.
EOD Historical Data API #
EODHD (End of Day Historical Data) API offers a solid mix of price data, fundamentals, and alternative datasets. Their global market coverage is impressive, with data for stocks, ETFs, mutual funds, and more from over 60 exchanges worldwide.
I’ve found their API to be reliable and straightforward to use, with a pricing structure that scales well as your needs grow.
Exercises #
Before we wrap up, try these exercises to reinforce what you’ve just learned:
- Interval exploration: Download AAPL stock data using different intervals (try 1d, 1wk, and 1mo). Compare how the resulting DataFrames differ in size and time coverage.
- Date range manipulation: Experiment with different start and end dates. Try fetching data for:
  - The last 6 months
  - A specific calendar year (like 2024)
  - A period spanning a major market event (such as the 2008 housing market crash, or during the first six months of COVID)
- Group by experiment: Download data for multiple tickers (at least 3) and try changing the group_by parameter:
  - Set group_by="ticker" (what we used above)
  - Set group_by="column" and observe how the DataFrame structure changes
  - Which organization method do you find more intuitive?
- Advanced challenge: Download data for the NASDAQ-100 index (^NDX), the S&P 500 (^GSPC), and three stocks of your choosing. Create a visualization comparing their performance over the last year, with prices normalized to the same starting value.
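As a hint for the advanced challenge, normalizing means dividing each price series by its own first value so that every line starts from the same point. The sketch below uses made-up closing prices for two hypothetical symbols (AAA and BBB) and a base of 100, which is one common choice rather than the only one.

```python
import pandas as pd

# made-up closing prices for two hypothetical symbols
closes = pd.DataFrame(
    {"AAA": [50.0, 55.0, 60.0], "BBB": [200.0, 190.0, 210.0]},
    index=pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04"]),
)

# divide every column by its first value so each series starts at 100
normalized = closes / closes.iloc[0] * 100

print(normalized.round(2))
```

After this step, a value of 110 reads as "up 10% since the first day", regardless of the original share price.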
Remember, the best way to learn is by doing—experiment with different parameters and see how they affect the data you receive.
Final thoughts #
Congratulations, you’ve now learned the basics of fetching market data with yfinance!
The MultiIndex structure used by yfinance provides a powerful way to organize financial time series data, especially when working with multiple securities (although, it does have its limitations, as we’ll see in the next article in the series).
In the next tutorial, we’ll dive deeper into advanced DataFrame structures for stock market data, building on what we’ve learned here.