Watch Now This tutorial has a related video course created by the Real Python team. Watch it together with the written tutorial to deepen your understanding: Caching in Python With lru_cache
There are many ways to achieve fast and responsive applications. Caching is one approach that, when used correctly, makes things much faster while decreasing the load on computing resources. Python’s functools
module comes with the @lru_cache
decorator, which gives you the ability to cache the result of your functions using the Least Recently Used (LRU) strategy. This is a simple yet powerful technique that you can use to leverage the power of caching in your code.
In this tutorial, you'll learn:

- How to cache your functions' results with the @lru_cache decorator
- How to extend the @lru_cache decorator and make it expire after a specific time

By the end of this tutorial, you'll have a deeper understanding of how caching works and how to take advantage of it in Python.
Caching and Its Uses

Caching is an optimization technique that you can use in your applications to keep recent or often-used data in memory locations that are faster or computationally cheaper to access than their source.
Imagine you’re building a newsreader application that fetches the latest news from different sources. As the user navigates through the list, your application downloads the articles and displays them on the screen.
What would happen if the user decided to move repeatedly back and forth between a couple of news articles? Unless you were caching the data, your application would have to fetch the same content every time! That would make your user’s system sluggish and put extra pressure on the server hosting the articles.
A better approach would be to store the content locally after fetching each article. Then, the next time the user decided to open an article, your application could open the content from a locally stored copy instead of going back to the source. In computer science, this technique is called caching.
Implementing a Cache Using a Python Dictionary

You can implement a caching solution in Python using a dictionary.
Staying with the newsreader example, instead of going directly to the server every time you need to download an article, you can check whether you have the content in your cache and go back to the server only if you don’t. You can use the article’s URL as the key and its content as the value.
Here’s an example of how this caching technique might look:
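A minimal sketch of this idea, using the requests library and a placeholder article URL, might look like the following (the exact line numbers referenced below may differ slightly):

```python
import requests

# The cache maps each article's URL to its downloaded content.
cache = dict()


def get_article_from_server(url):
    print("Fetching article from server...")
    response = requests.get(url)
    return response.text


def get_article(url):
    # Only hit the server if the article isn't already cached.
    if url not in cache:
        cache[url] = get_article_from_server(url)
    return cache[url]


get_article("https://realpython.com/sorting-algorithms-python/")
get_article("https://realpython.com/sorting-algorithms-python/")
```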
Save this code to a caching.py
file, install the requests
library, then run the script:
Notice how you get the string "Fetching article from server..."
printed a single time despite calling get_article()
twice, in lines 17 and 18. This happens because, after accessing the article for the first time, you put its URL and content in the cache
dictionary. The second time, the code doesn’t need to fetch the item from the server again.
There’s one big problem with this cache implementation: the content of the dictionary will grow indefinitely! As the user downloads more articles, the application will keep storing them in memory, eventually causing the application to crash.
To work around this issue, you need a strategy to decide which articles should stay in memory and which should be removed. These caching strategies are algorithms that focus on managing the cached information and choosing which items to discard to make room for new ones.
There are several different strategies that you can use to evict items from the cache and keep it from growing past its maximum size. Here are five of the most popular ones, with an explanation of when each is most useful:
Strategy | Eviction policy | Use case
First-In/First-Out (FIFO) | Evicts the oldest of the entries | Newer entries are most likely to be reused
Last-In/First-Out (LIFO) | Evicts the latest of the entries | Older entries are most likely to be reused
Least Recently Used (LRU) | Evicts the least recently used entry | Recently used entries are most likely to be reused
Most Recently Used (MRU) | Evicts the most recently used entry | Least recently used entries are most likely to be reused
Least Frequently Used (LFU) | Evicts the least often accessed entry | Entries with a lot of hits are more likely to be reused

In the sections below, you'll take a closer look at the LRU strategy and how to implement it using the @lru_cache decorator from Python's functools module.
A cache implemented using the LRU strategy organizes its items in order of use. Every time you access an entry, the LRU algorithm will move it to the top of the cache. This way, the algorithm can quickly identify the entry that’s gone unused the longest by looking at the bottom of the list.
The following figure shows a hypothetical cache representation after your user requests an article from the network:
Notice how the cache stores the article in the most recent slot before serving it to the user. The following figure shows what happens when the user requests a second article:
The second article takes the most recent slot, pushing the first article down the list.
The LRU strategy assumes that the more recently an object has been used, the more likely it will be needed in the future, so it tries to keep that object in the cache for the longest time.
Peeking Behind the Scenes of the LRU Cache

One way to implement an LRU cache in Python is to use a combination of a doubly linked list and a hash map. The head element of the doubly linked list would point to the most recently used entry, and the tail would point to the least recently used entry.
The figure below shows the potential structure of the LRU cache implementation behind the scenes:
Using the hash map, you can ensure access to every item in the cache by mapping each entry to the specific location in the doubly linked list.
This strategy is very fast. Accessing the least recently used item and updating the cache are operations with a runtime of O(1).
Note: For a deeper understanding of Big O notation, together with several practical examples in Python, check out Big O Notation and Algorithm Analysis with Python Examples.
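To make this concrete, here's a small, illustrative sketch built on collections.OrderedDict, which itself pairs a hash map with a doubly linked list internally. This is only an illustration of the strategy, not what @lru_cache does under the hood:

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # Keys are kept in order of use.

    def get(self, key):
        if key not in self.entries:
            return None
        # Mark the entry as most recently used by moving it to the end.
        self.entries.move_to_end(key)
        return self.entries[key]

    def put(self, key, value):
        if key in self.entries:
            self.entries.move_to_end(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            # Evict the least recently used entry, which sits at the front.
            self.entries.popitem(last=False)
```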
Since version 3.2, Python has included the @lru_cache
decorator for implementing the LRU strategy. You can use this decorator to wrap functions and cache their results up to a maximum number of entries.
Just like the caching solution you implemented earlier, @lru_cache
uses a dictionary behind the scenes. It caches the function’s result under a key that consists of the call to the function, including the supplied arguments. This is important because it means that these arguments have to be hashable for the decorator to work.
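For instance, passing a list to a hypothetical @lru_cache-decorated function raises a TypeError, while a tuple works fine:

```python
from functools import lru_cache


@lru_cache
def count_items(items):
    return len(items)


count_items((1, 2, 3))  # Works: tuples are hashable.
count_items([1, 2, 3])  # Raises TypeError: unhashable type: 'list'
```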
Imagine you want to determine all the different ways you can reach a specific stair in a staircase by hopping one, two, or three stairs at a time. How many paths are there to the fourth stair? Here are all the different combinations:
You could frame a solution to this problem by stating that, to reach your current stair, you can jump from one stair, two stairs, or three stairs below. Adding up the number of jump combinations you can use to get to each of those points should give you the total number of possible ways to reach your current position.
For example, the number of combinations for reaching the fourth stair will equal the total number of different ways you can reach the third, second, and first stair:
As shown in the picture, there are seven different ways to reach the fourth stair. Notice how the solution for a given stair builds upon the answers to smaller subproblems. In this case, to determine the different paths to the fourth stair, you can add up the four ways of reaching the third stair, the two ways of reaching the second stair, and the one way of reaching the first stair.
This approach is called recursion. If you want to learn more, then check out Recursion in Python: An Introduction for an introduction to the topic.
Here’s a function that implements this recursion:
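A sketch of this recursive function might look like this; note that line numbers mentioned later, such as line 33, may not match this shorter version exactly:

```python
def steps_to(stair):
    if stair == 1:
        # Only one way to reach the first stair: a single one-stair hop.
        return 1
    elif stair == 2:
        # Two ways: two one-stair hops, or one two-stair hop.
        return 2
    elif stair == 3:
        # Four ways: 1+1+1, 1+2, 2+1, or a single three-stair hop.
        return 4
    else:
        # You can arrive at the current stair from one, two, or three
        # stairs below, so add up the ways of reaching those positions.
        return (
            steps_to(stair - 3)
            + steps_to(stair - 2)
            + steps_to(stair - 1)
        )


print(steps_to(4))
```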
Save this code into a file named stairs.py
and run it with the following command:
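Assuming the sketch above, running the script prints the number of combinations for the fourth stair:

```console
$ python stairs.py
7
```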
Great! The code works for 4
stairs, but how about counting the combinations for a higher stair in the staircase? Change the stair number in line 33 to 30
and rerun the script:
Wow, more than 53 million combinations! That’s a lot of hops!
Timing Your Code

When finding the solution for the thirtieth stair, the script took quite a bit of time to finish. To get a baseline, you can measure how long it takes for the code to run.
To accomplish this, you can use Python’s timeit
module. Add the following lines after line 33:
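Assuming the stairs.py module from the sketch above, those timing lines might look like this:

```python
# Added after the print() call at the bottom of stairs.py.
# These lines rely on the timeit import shown below.
setup_code = "from stairs import steps_to"
stmt = "steps_to(30)"
times = timeit.repeat(setup=setup_code, stmt=stmt, repeat=10, number=1)
print(f"Minimum execution time: {min(times)}")
```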
You also need to import the timeit
module at the top of the code:
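That is:

```python
import timeit
```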
Here's a line-by-line explanation of these additions:

- The setup code imports steps_to() so that timeit.repeat() knows how to call it.
- The statement calls the function with a stair number of 30. This is the statement that will be executed and timed.
- The final line calls timeit.repeat() with the setup code and the statement. This will call the function 10 times, returning the number of seconds each execution took.

Note: A common misconception is that you should find the average time of each run of the function instead of selecting the shortest time.
Time measurements are noisy because the system runs other processes concurrently. The shortest time is always the least noisy, making it the best representation of the function’s runtime.
Now run the script again:
The number of seconds you’ll see depends on your specific hardware. On my system, the script took forty seconds, which is quite slow for just thirty stairs!
Note: You can learn more about the timeit
module in the official Python documentation.
A solution that takes this long is a problem, but you can improve it using memoization.
Using Memoization to Improve the Solution

This recursive implementation solves the problem by breaking it into smaller steps that build upon each other. The following figure shows a tree in which every node represents a specific call to steps_to()
:
Notice how you need to call steps_to()
with the same argument multiple times. For example, steps_to(5)
is computed two times, steps_to(4)
is computed four times, steps_to(3)
seven times, and steps_to(2)
six times. Calling the same function more than once adds computation cycles that aren’t necessary—the result will always be the same.
To fix this problem, you can use a technique called memoization. This approach ensures that a function doesn’t run for the same inputs more than once by storing its result in memory and then referencing it later when necessary. This scenario sounds like the perfect opportunity to use Python’s @lru_cache
decorator!
Note: For more information on memoization and using @lru_cache
to implement it, check out Memoization in Python.
With just two changes, you can considerably improve the algorithm's runtime:

- Import the @lru_cache decorator from the functools module.
- Use @lru_cache to decorate steps_to().

Here's what the top of the script will look like with the two updates:
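Sketched out, the top of stairs.py would then start roughly like this, with the body of steps_to() unchanged:

```python
from functools import lru_cache


@lru_cache
def steps_to(stair):
    # The body of the function stays the same as before.
    ...
```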
Running the updated script produces the following result:
Caching the result of the function takes the runtime from 40 seconds down to 0.0008 milliseconds! That’s a fantastic improvement!
Note: In Python 3.8 and above, you can use the @lru_cache
decorator without parentheses if you’re not specifying any parameters. In previous versions, you may need to include the parentheses: @lru_cache()
.
Remember, behind the scenes, the @lru_cache
decorator stores the result of steps_to()
for each different input. Every time the code calls the function with the same parameters, instead of computing an answer all over again, it returns the correct result directly from memory. This explains the massive improvement in performance when using @lru_cache
.
With the @lru_cache
decorator in place, you store every call and answer in memory to access later if requested again. But how many calls can you save before running out of memory?
Python’s @lru_cache
decorator offers a maxsize
attribute that defines the maximum number of entries before the cache starts evicting old items. By default, maxsize
is set to 128
. If you set maxsize
to None
, then the cache will grow indefinitely, and no entries will ever be evicted. This could become a problem if you're storing a large number of different calls in memory.
Here’s an example of @lru_cache
using the maxsize
attribute:
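For example, assuming the same steps_to() function as before:

```python
from functools import lru_cache


@lru_cache(maxsize=16)
def steps_to(stair):
    # The body of the function stays the same as before.
    ...
```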
In this case, you’re limiting the cache to a maximum of 16
entries. When a new call comes in, the decorator’s implementation will evict the least recently used of the existing 16
entries to make room for the new item.
To see what happens with this new addition to the code, you can use cache_info()
, provided by the @lru_cache
decorator, to inspect the number of hits and misses and the current size of the cache. For clarity, remove the code that times the runtime of the function. Here’s how the final script looks after all the modifications:
If you call the script again, then you’ll see the following result:
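With a script along those lines, the output should look something like this:

```console
$ python stairs.py
53798080
CacheInfo(hits=52, misses=30, maxsize=16, currsize=16)
```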
You can use the information returned by cache_info()
to understand how the cache is performing and fine-tune it to find the appropriate balance between speed and storage.
Here's a breakdown of the properties provided by cache_info():

- hits=52 is the number of calls that @lru_cache returned directly from memory because they existed in the cache.
- misses=30 is the number of calls that didn't come from memory and had to be computed. Since you're trying to find the number of ways to reach the thirtieth stair, it makes sense that each of these calls missed the cache the first time it was made.
- maxsize=16 is the size of the cache as you defined it with the maxsize attribute of the decorator.
- currsize=16 is the current size of the cache. In this case, it shows that your cache is full.
If you need to remove all the entries from the cache, then you can use cache_clear()
provided by @lru_cache
.
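For example, continuing with the steps_to() function from the script above:

```python
steps_to.cache_clear()
print(steps_to.cache_info())  # CacheInfo(hits=0, misses=0, maxsize=16, currsize=0)
```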
Imagine you want to develop a script that monitors Real Python and prints the number of characters in any article whose title contains the word python.
Real Python provides an Atom feed, so you can use the feedparser
library to parse the feed and the requests
library to load the contents of the article as you did before.
Here’s an implementation of the monitor script:
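A sketch of such a monitor script, assuming Real Python's Atom feed at https://realpython.com/atom.xml and a helper named get_article_from_server(), might look like this:

```python
import feedparser
import requests
import ssl
import time

# Work around feedparser's issue with content served over HTTPS
# (see the note below).
if hasattr(ssl, "_create_unverified_context"):
    ssl._create_default_https_context = ssl._create_unverified_context


def get_article_from_server(url):
    print("Fetching article from server...")
    response = requests.get(url)
    return response.text


def monitor(url):
    while True:
        print("\nChecking feed...")
        feed = feedparser.parse(url)

        # Look at the first 5 entries in the feed.
        for entry in feed.entries[:5]:
            if "python" in entry.title.lower():
                # Print the title along with the length of the article.
                print(entry.title, len(get_article_from_server(entry.link)))

        time.sleep(5)


monitor("https://realpython.com/atom.xml")
```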
Save this script to a file called monitor.py
, install the feedparser
and requests
libraries, and run the script. It will run continuously until you stop it by pressing Ctrl+C in your terminal window:
Here's a step-by-step explanation of the code:

- The first few lines work around an issue that occurs when feedparser tries to access content served over HTTPS. See the note below for more information.
- monitor() will loop indefinitely.
- Using feedparser, the code loads and parses the feed from Real Python.
- The loop goes through the first 5 entries on the list.
- If the word python is part of the title, then the code prints it along with the length of the article.
- The code then sleeps for 5 seconds before continuing.
- The last line kicks off the script by calling monitor().

Every time the script loads an article, the message "Fetching article from server..." is printed to the console. If you let the script run long enough, then you'll see how this message shows up repeatedly, even when loading the same link.
Note: For more information about the issue with feedparser
accessing content served over HTTPS, check out issue 84 on the feedparser
repository. PEP 476 describes how Python started enabling certificate verification by default for stdlib
HTTP clients, which is the underlying cause of this error.
This is an excellent opportunity to cache the article’s contents and avoid hitting the network every five seconds. You could use the @lru_cache
decorator, but what happens if the article’s content is updated?
The first time you access the article, the decorator will store its content and return the same data every time after. If the post is updated, then the monitor script will never realize it because it will be pulling the old copy stored in the cache. To solve this problem, you can set your cache entries to expire.
Evicting Cache Entries Based on Both Time and Space

The @lru_cache
decorator evicts existing entries only when there’s no more space to store new listings. With sufficient space, entries in the cache will live forever and never get refreshed.
This presents a problem for your monitoring script because you’ll never fetch updates published for previously cached articles. To get around this problem, you can update the cache implementation so it expires after a specific time.
You can implement this idea into a new decorator that extends @lru_cache
. If the caller tries to access an item that’s past its lifetime, then the cache won’t return its content, forcing the caller to fetch the article from the network.
Note: For more information about Python decorators, check out Primer on Python Decorators and Python Decorators 101.
Here’s a possible implementation of this new decorator:
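A sketch along those lines might look like this:

```python
from datetime import datetime, timedelta
from functools import lru_cache, wraps


def timed_lru_cache(seconds: int, maxsize: int = 128):
    def wrapper_cache(func):
        # Reuse the regular LRU cache under the hood.
        func = lru_cache(maxsize=maxsize)(func)
        func.lifetime = timedelta(seconds=seconds)
        func.expiration = datetime.utcnow() + func.lifetime

        @wraps(func)
        def wrapped_func(*args, **kwargs):
            if datetime.utcnow() >= func.expiration:
                # The whole cache has expired: clear it and reset the clock.
                func.cache_clear()
                func.expiration = datetime.utcnow() + func.lifetime
            return func(*args, **kwargs)

        return wrapped_func

    return wrapper_cache
```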
Here's a breakdown of this implementation:

- The @timed_lru_cache decorator will support the lifetime of the entries in the cache (in seconds) and the maximum size of the cache.
- The code wraps the decorated function with the lru_cache decorator. This allows you to use the cache functionality already provided by lru_cache.

Notice how, when an entry expires, this decorator clears the entire cache associated with the function. The lifetime applies to the cache as a whole, not to individual articles. A more sophisticated implementation of this strategy would evict entries based on their individual lifetimes.
Caching Articles With the New Decorator

You can now use your new @timed_lru_cache
decorator with the monitor
script to prevent fetching the content of an article every time you access it.
Putting the code together in a single script for simplicity, you end up with the following:
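A condensed sketch of the combined script (exact line numbers, such as the line 30 mentioned below, may differ) might look like this:

```python
import feedparser
import requests
import ssl
import time

from datetime import datetime, timedelta
from functools import lru_cache, wraps

# Work around feedparser's issue with content served over HTTPS.
if hasattr(ssl, "_create_unverified_context"):
    ssl._create_default_https_context = ssl._create_unverified_context


def timed_lru_cache(seconds: int, maxsize: int = 128):
    def wrapper_cache(func):
        func = lru_cache(maxsize=maxsize)(func)
        func.lifetime = timedelta(seconds=seconds)
        func.expiration = datetime.utcnow() + func.lifetime

        @wraps(func)
        def wrapped_func(*args, **kwargs):
            if datetime.utcnow() >= func.expiration:
                func.cache_clear()
                func.expiration = datetime.utcnow() + func.lifetime
            return func(*args, **kwargs)

        return wrapped_func

    return wrapper_cache


@timed_lru_cache(seconds=10)
def get_article_from_server(url):
    print("Fetching article from server...")
    response = requests.get(url)
    return response.text


def monitor(url):
    while True:
        print("\nChecking feed...")
        feed = feedparser.parse(url)

        for entry in feed.entries[:5]:
            if "python" in entry.title.lower():
                print(entry.title, len(get_article_from_server(entry.link)))

        time.sleep(5)


monitor("https://realpython.com/atom.xml")
```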
Notice how line 30 decorates get_article_from_server()
with @timed_lru_cache
and specifies a validity of 10
seconds. Any attempt to access the same article from the server within 10
seconds of having fetched it will return the contents from the cache and never hit the network.
Run the script and take a look at the results:
Notice how the code prints the message "Fetching article from server..."
the first time it accesses the matching articles. After that, depending on your network speed and computing power, the script will retrieve the articles from the cache either one or two times before hitting the server again.
The script tries to access the articles every 5
seconds, and the cache expires every 10
seconds. These times are probably too short for a real application, so you can get a significant improvement by adjusting these configurations.
Caching is an essential optimization technique for improving the performance of any software system. Understanding how caching works is a fundamental step toward incorporating it effectively in your applications.
In this tutorial, you learned:
- How to use Python's @lru_cache decorator to cache your functions' results
- How to extend @lru_cache with a decorator that makes cache entries expire
- How to measure your code's runtime with the timeit module

The next step to implementing different caching strategies in your applications is looking at the cachetools module. This library provides several collections and decorators covering some of the most popular caching strategies that you can start using right away.