I have split this post into two parts – starting with how pages are cached in the first place, and then how Magento retrieves a page from the cache before falling back to dispatching a controller.
How pages are cached
The Magento enterprise page cache module observes several events that are fired during the course of dispatching your controller and sending a full page in the response. I’m going to dive into only a few of them – the main observers that actually cause your content to end up in the page cache, and then exactly how Magento gets it back out again.
The first thing we’re going to look at is how Magento caches pages, and that starts with the observation of the ‘controller_action_predispatch’ event – triggered before a request is dispatched (but after we have determined we will not be fulfilling the request from the cache). It’s here that the PageCache module will determine if the request that’s about to be dispatched is to be cached for subsequent requests. There are many things that will prevent a page from being cached and we can start by looking at these.
Examine the processPreDispatch method in Enterprise_PageCache_Model_Observer. This is the handler that runs when the controller_action_predispatch event is fired. Like all the handlers in this observer, it first checks that the page cache is enabled before doing anything else. It then examines the no cache cookie and sets a flag (memorize_disabled) in the session. Whether the request goes on to be cached depends on the following condition:
$this->_processor->canProcessRequest($request) && $this->_processor->getRequestProcessor($request)
The first of the two conditions above does some basic validation on the request. By this point $this->_processor will normally be an instance of ‘Enterprise_PageCache_Model_Processor’ (it is instantiated through the ‘enterprise_pagecache/processor’ model alias, so it can be overridden), and this is where we find the canProcessRequest method. First, we dip into the isAllowed method, where we check that a requestId was generated during the construction of this object. The requestId becomes important later in the process: it provides the key into your cache storage backend for the page content, so it must be reliably recreated from the request parameters. We then check that the request is not over https, that the no cache cookie (NO_CACHE) is not present, that the no_cache GET variable is not present and that full page caching is enabled. If any of these checks fail, the full page cache module does not process the request any further.
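The checks described above can be sketched as a single standalone function. This is only an illustration of the order of the checks, not the Enterprise source: the function name isRequestAllowed and its array parameters are hypothetical stand-ins for the request, cookie and configuration data that the real processor reads for itself.

```php
<?php
// Hypothetical sketch of the isAllowed-style checks; not the Enterprise code.
function isRequestAllowed(?string $requestId, array $server, array $cookies, array $get, bool $fpcEnabled): bool
{
    if (!$requestId) {
        return false; // no cache key could be built for this request
    }
    if (isset($server['HTTPS']) && $server['HTTPS'] === 'on') {
        return false; // requests over https are not cached
    }
    if (isset($cookies['NO_CACHE'])) {
        return false; // the no cache cookie is present
    }
    if (isset($get['no_cache'])) {
        return false; // the no_cache GET variable is present
    }
    return $fpcEnabled; // full page caching must be switched on
}

$plain  = isRequestAllowed('some_request_id', [], [], [], true);
$secure = isRequestAllowed('some_request_id', ['HTTPS' => 'on'], [], [], true);
$cookie = isRequestAllowed('some_request_id', [], ['NO_CACHE' => '1'], [], true);
```

Here only $plain passes; the other two requests are rejected before any cache lookup or storage is attempted.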
Assuming we pass, we move on to checking the configuration. We inspect the page depth parameter (the number of variations a page can have, based on the number of variables in the query string). If we’re over the limit, the page is not cached. Then we check the cache settings for pages that are not in the default currency: if multicurrency caching is switched off (system/page_cache/multicurrency = 0) and the CURRENCY cookie is present in the request, the page is not cached either.
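As a rough sketch of these two configuration checks, assuming the configured depth limit and the multicurrency setting have already been read: the function name passesConfigChecks and its parameters are hypothetical, and the real code reads these values from the store configuration and the request itself.

```php
<?php
// Hypothetical sketch of the page depth and multicurrency checks.
function passesConfigChecks(array $queryParams, int $maxDepth, bool $multicurrencyEnabled, bool $hasCurrencyCookie): bool
{
    if (count($queryParams) > $maxDepth) {
        return false; // too many query string variables for this page
    }
    if (!$multicurrencyEnabled && $hasCurrencyCookie) {
        return false; // non-default currency pages are not to be cached
    }
    return true;
}

$shallow = passesConfigChecks(['p' => '2'], 10, false, false);
$tooDeep = passesConfigChecks(['a' => '1', 'b' => '2', 'c' => '3'], 2, false, false);
$foreign = passesConfigChecks([], 10, false, true);
```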
The second of the two checks above is where we examine how the page is going to be cached. This is configured in ‘frontend/cache/requests’ and the defaults are found in app/code/core/Enterprise/PageCache/etc/config.xml. By default all cms, catalog/category and catalog/product pages have a processor configured.
getRequestProcessor is where this configuration is examined and we use the request data to attempt to construct a processor for the target page. Assuming one is configured, Magento is now ready to cache the page for the next request.
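To make the lookup more concrete, here is a minimal sketch of resolving a processor from a ‘frontend/cache/requests’ style tree by module, controller and action. The XML is a trimmed illustration rather than a copy of config.xml, the processor aliases in it are assumptions, and findProcessorAlias is a hypothetical helper, not a Magento method.

```php
<?php
// Illustrative configuration only; the alias values are assumptions.
$xml = <<<XML
<requests>
    <cms>enterprise_pagecache/processor_default</cms>
    <catalog>
        <category>
            <view>enterprise_pagecache/processor_category</view>
        </category>
        <product>
            <view>enterprise_pagecache/processor_product</view>
        </product>
    </catalog>
</requests>
XML;

// Walk module -> controller -> action and stop at the first leaf node found.
function findProcessorAlias(SimpleXMLElement $requests, string $module, string $controller, string $action): ?string
{
    $node = $requests;
    foreach ([$module, $controller, $action] as $segment) {
        if (!isset($node->{$segment})) {
            return null; // nothing configured for this request
        }
        $node = $node->{$segment};
        if (count($node->children()) === 0) {
            $alias = trim((string) $node);
            return $alias !== '' ? $alias : null;
        }
    }
    return null;
}

$requests = simplexml_load_string($xml);
$cmsAlias      = findProcessorAlias($requests, 'cms', 'index', 'index');
$productAlias  = findProcessorAlias($requests, 'catalog', 'product', 'view');
$checkoutAlias = findProcessorAlias($requests, 'checkout', 'cart', 'index');
```

In this sketch a whole module (cms) can be covered by a single entry, while catalog pages are matched down to the action, and a checkout request finds no processor and so is never cached.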
If at this point it looks as though we’re going to cache the page, the independent block cache is disabled and the first piece of setting up the cache is complete.
The next time we’re back in the PageCache code is once the request has been handled and we’re about to send the response. This is accomplished by hooking into the controller_front_send_response_before event and executing the ‘cacheResponse’ method on the observer. We very quickly end up back in the processor and in the processRequestResponse method.
The first two calls will be familiar. The same methods (canProcessRequest and getRequestProcessor) are called again – however the processor for this request already exists so the second method returns immediately. We delegate to the configured processor (configured above) at this point and call the ‘allowCache’ method with the Request object as a parameter. A number of further checks are done which could prevent the request from being cached – the presence of ‘___store’ or ‘___from_store’ in the query string (for the default processor) or the presence of the no cache flag in the session data.
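The extra checks made at this stage can be sketched roughly as follows. The function below is a hypothetical stand-in for the default processor’s allowCache method: it takes the query string parameters and the session flag as plain arguments, whereas the real method works from the Request object and the session.

```php
<?php
// Hypothetical sketch of the final "may this response be cached?" check.
function allowCacheSketch(array $query, bool $sessionNoCacheFlag): bool
{
    foreach (['___store', '___from_store'] as $param) {
        if (isset($query[$param])) {
            return false; // store switching requests are never cached
        }
    }
    return !$sessionNoCacheFlag; // the session may have vetoed caching
}

$normal      = allowCacheSketch(['p' => '2'], false);
$storeSwitch = allowCacheSketch(['___store' => 'french'], false);
$flagged     = allowCacheSketch([], true);
```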
Finally we’re actually ready to cache the response. Some filtering occurs to strip out blocks and replace them with placeholders (blocks can be cached independently from the page, to allow for independent cache rules, like the navigation menu which can persist across multiple pages). The placeholder configuration is stored in app/code/core/Enterprise/PageCache/etc/cache.xml.
The entry for the catalog/navigation block, for example, declares that the block is cached separately from any page that it appears on, with CATALOG_NAVIGATION as the placeholder. It specifies the container class (used during storing and extracting the block) and the lifetime in seconds for the cached block.
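As an illustration of the four pieces of information such an entry carries, the sketch below parses a hand-written placeholder definition with SimpleXML. The XML is a reconstruction for illustration, not a copy of cache.xml: the element names, the container class name and the lifetime value are all assumptions.

```php
<?php
// Reconstructed for illustration; element names and values are assumptions.
$xml = <<<XML
<config>
    <placeholders>
        <catalog_navigation>
            <block>catalog/navigation</block>
            <placeholder>CATALOG_NAVIGATION</placeholder>
            <container>Enterprise_PageCache_Model_Container_Catalognavigation</container>
            <cache_lifetime>86400</cache_lifetime>
        </catalog_navigation>
    </placeholders>
</config>
XML;

// Index the definitions by block type so a rendered block can be looked up.
$config = simplexml_load_string($xml);
$placeholders = [];
foreach ($config->placeholders->children() as $node) {
    $placeholders[(string) $node->block] = [
        'placeholder'    => (string) $node->placeholder,
        'container'      => (string) $node->container,
        'cache_lifetime' => (int) $node->cache_lifetime,
    ];
}
```

Indexing by block type mirrors what has to happen at render time: given a block such as catalog/navigation, find out whether it has a placeholder and which container class handles it.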
Blocks are marked up at the time they are rendered, and you can see them in the HTML source code between pairs of comments. Each comment consists of the placeholder value defined in cache.xml and a hash, which is the md5 sum of the result of calling the block’s getCacheKey method. For the navigation block, the placeholder value is CATALOG_NAVIGATION.
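A minimal sketch of that markup step might look like the following. The exact comment syntax is an assumption on my part (the real markers may carry more information), and wrapBlockHtml is a hypothetical helper; the cache key passed in stands for whatever the block’s getCacheKey method returns.

```php
<?php
// Hypothetical sketch: wrap rendered block HTML in placeholder comments.
function wrapBlockHtml(string $placeholder, string $cacheKey, string $html): string
{
    $tag = $placeholder . '_' . md5($cacheKey); // placeholder value plus hash
    return '<!--{' . $tag . '}-->' . $html . '<!--/{' . $tag . '}-->';
}

$wrapped = wrapBlockHtml('CATALOG_NAVIGATION', 'example_cache_key', '<ul><li>Shoes</li></ul>');
```

Because the hash is derived from the cache key, two renderings of the same block with the same key produce the same marker, which is what lets the block be stored once and reused.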
This functionality lets you cache blocks separately from the full page, with their own rules on how long they stay in the cache (independently of the page that they are on). This means that blocks can be cached across pages rather than within them, so blocks such as catalog/navigation need only be generated once and can then be served from the cache for every subsequent request, regardless of whether the page itself is being served from the cache.
How pages are served up from the cache
Rather unlike how pages end up in the cache (using observers to watch for specific points in the dispatch process for a request), the method by which Magento retrieves information from the cache is somewhat more hard coded. Before dispatching the front controller in Mage_Core_Model_App::run (one of the very first methods that gets called in fulfilling a request) a check is done to see if the current request can be served up from the pages in the cache. This happens inside Mage_Core_Model_Cache::processRequest where we immediately check to see if there is a request processor configured that we can check the request against. The request processors are configured inside app/etc/enterprise.xml – the default one is set to Enterprise_PageCache_Model_Processor (predictably the same class used to cache the results of a dispatched request).
This is quite important – if you have a reason to extend the logic in Enterprise_PageCache_Model_Processor (like adding data that forms part of the cache id for a page), you not only have to extend it and rewrite ‘enterprise_pagecache/processor’, but also make sure that it’s added to this list of request processors so that Magento has a chance of being able to retrieve your items back out of the cache again.
Magento then loops through all the configured request processors, and calls extractContent on each of them. The first thing that this function does is create the cache id based on the request parameters (this happens when the processor is constructed, when the _createRequestIds method is called – as we saw earlier when the content was being cached in the first place). We check to see if any design changes have been cached for the current page, and then check the configured cache storage to see if we can retrieve the full page content for this request. If we do not get a match at this point, the request is considered a cache miss – and we return to the run method and start the process of dispatching the request (and potentially populating the cache with the results).
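The hit or miss decision can be sketched with an in-memory array standing in for the cache storage. Everything here is simplified and hypothetical: buildCacheId and extractContentSketch are not Magento methods, and the real cache id is built from more of the request than just the host and URI.

```php
<?php
// Hypothetical: derive a stable cache id from the request parameters.
function buildCacheId(string $host, string $uri): string
{
    return 'REQUEST_' . md5($host . $uri);
}

// Return the stored page on a hit, or false on a miss so that the caller
// falls back to dispatching the request as normal.
function extractContentSketch(array $cacheStorage, string $host, string $uri)
{
    $id = buildCacheId($host, $uri);
    return $cacheStorage[$id] ?? false;
}

$storage = [
    buildCacheId('example.com', '/shoes.html') => '<html>cached shoes page</html>',
];

$hit  = extractContentSketch($storage, 'example.com', '/shoes.html');
$miss = extractContentSketch($storage, 'example.com', '/hats.html');
```

The important property is that the id is computed the same way when storing and when retrieving, which is why the requestId has to be reproducible from the request alone.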
Assuming that we have a cache hit, we decompress it (content will be gzipped whenever possible in the cache storage to minimise space), and the content is processed by _processContent to replace all of the placeholders with the separately cached blocks (i.e. the catalog/navigation block that we cached separately when putting the page into the cache in the first place).
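The two steps on a cache hit, decompressing and then filling in placeholders, can be sketched like this. It assumes the zlib extension is available and that the stored page was written with gzcompress; the single-comment placeholder format and the restorePage helper are assumptions for illustration, and blocks missing from the block cache are simply left empty here, whereas the real code has to render them.

```php
<?php
// Hypothetical sketch of restoring a page from its stored, compressed form.
function restorePage(string $stored, array $blockCache): string
{
    // Assumption: the entry was compressed with gzcompress() when stored.
    $html = gzuncompress($stored);

    // Swap each placeholder comment for the separately cached block HTML.
    return preg_replace_callback(
        '/<!--\{(\w+)\}-->/',
        function (array $match) use ($blockCache) {
            return $blockCache[$match[1]] ?? '';
        },
        $html
    );
}

$page   = gzcompress('<body><!--{CATALOG_NAVIGATION_abc123}--><p>Product</p></body>');
$blocks = ['CATALOG_NAVIGATION_abc123' => '<ul><li>Shoes</li></ul>'];
$result = restorePage($page, $blocks);
```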
When writing about how pages are served up from the cache I referred to the configured cache storage. Prior to Magento EE 1.11, the full page cache simply used what you configured in the <cache> section in app/etc/local.xml. From 1.11 this has changed and you must now also have a <full_page_cache> section in your local.xml configuration (it uses exactly the same options as the <cache> config).
The aim of this post is to give an overview of how page caching works in Magento, and I intend to follow it up with some more detailed investigations into how other areas of caching work. If you have a specific request for more information, please do leave a comment and it may form part of a future blog post.