Magento Enterprise PageCache

January 5, 2012 Uncategorized 9 comments

I have split this post into two parts – starting with how pages are cached in the first place, and then how Magento retrieves a page from the cache before falling back to dispatching a controller.

How pages are cached

The Magento enterprise page cache module observes several events that are fired during the course of dispatching your controller and sending a full page in the response. I’m going to dive into only a few of them – the main observers that actually cause your content to end up in the page cache, and then exactly how Magento gets it back out again.

The first thing we’re going to look at is how Magento caches pages, and that starts with the observation of the ‘controller_action_predispatch’ event – triggered before a request is dispatched (but after we have determined we will not be fulfilling the request from the cache). It’s here that the PageCache module will determine if the request that’s about to be dispatched is to be cached for subsequent requests. There are many things that will prevent a page from being cached and we can start by looking at these.

Examine the processPreDispatch method in Enterprise_PageCache_Model_Observer. This is the handler that is executed on the controller_action_predispatch event. All the handlers first check that the page cache is enabled before doing anything else. We then examine the no cache cookie and set a flag (memorize_disabled) in the session. The key decision is made by the following pair of conditions.

$this->_processor->canProcessRequest($request) && $this->_processor->getRequestProcessor($request)

The first of the above two conditions does some basic validation on the request. $this->_processor will normally be an instance of Enterprise_PageCache_Model_Processor by this point (it’s instantiated via the ‘enterprise_pagecache/processor’ model alias, so it can be overridden), and this is where we find the canProcessRequest method. First, we dip into the isAllowed method, where we check that we generated a requestId during the construction of this object. The requestId becomes important later on in the process: it provides the key into your cache storage backend for the page content, and it needs to be reliably recreated based on the request parameters. We then check that the request is not over https, that the no cache cookie (NO_CACHE) is not present, that the no_cache GET variable is not present, and that full page caching is enabled. If any of these checks fail, the full page cache module does not process the request any further.
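To make those checks concrete, here is a simplified sketch of the kind of logic isAllowed performs. This is a paraphrase of the description above rather than the verbatim core code, so treat the details as illustrative.

protected function _isAllowed(Zend_Controller_Request_Http $request)
{
    // paraphrased sketch - not the verbatim core code
    if (!$this->_requestId) {                  // no request id could be built
        return false;
    }
    if ($request->isSecure()) {                // never cache https requests
        return false;
    }
    if ($request->getCookie('NO_CACHE')) {     // explicit opt-out cookie
        return false;
    }
    if ($request->getQuery('no_cache')) {      // ?no_cache=1 in the URL
        return false;
    }
    return Mage::app()->useCache('full_page'); // is full page caching enabled?
}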

Assuming we pass, we move on to checking the configuration. We inspect the page depth parameter (the maximum number of variations a page can have, based on the number of variables in the query string); if we’re over the limit, the page is not cached. Then we check the cache settings for pages that are not in the default currency (system/page_cache/multicurrency set to 0, combined with the presence of the CURRENCY cookie in the request).

The second of the two checks above is where we examine how the page is going to be cached. This is configured in ‘frontend/cache/requests’ and the defaults are found in app/code/core/Enterprise/PageCache/etc/config.xml in the following section of XML. By default all cms, catalog/category and catalog/product pages have a processor configured.

<frontend>
    ...
    <cache>
        <requests>
            <cms>enterprise_pagecache/processor_default</cms>
            <catalog>
                <category>
                    <view>enterprise_pagecache/processor_category</view>
                </category>
                <product>
                    <view>enterprise_pagecache/processor_product</view>
                </product>
            </catalog>
        </requests>
    </cache>
</frontend>

getRequestProcessor is where this configuration is examined and where we use the request data to attempt to construct a processor for the target page. Assuming one is configured, Magento is now ready to cache the page for the next request.
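As an illustration, a third party module could opt its own pages into the full page cache by adding an entry under the same configuration path in its config.xml. The route name below is hypothetical – real entries map a frontend route (and optionally controller/action) to a processor model, as the core defaults above do.

<frontend>
    <cache>
        <requests>
            <!-- 'mymodule' is a hypothetical frontend route name; its pages
                 would be cached using the default processor -->
            <mymodule>enterprise_pagecache/processor_default</mymodule>
        </requests>
    </cache>
</frontend>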

If at this point it looks as though we’re going to cache the page, the independent block cache is disabled and the first piece of setting up the cache is complete.

The next time we’re back in the PageCache code is once the request has been handled and we’re about to send the response. This is accomplished by hooking into the controller_front_send_response_before event and executing the ‘cacheResponse’ method on the observer. We very quickly end up back in the processor and in the processRequestResponse method.

The first two calls will be familiar. The same methods (canProcessRequest and getRequestProcessor) are called again – however, the processor for this request already exists, so the second method returns immediately. At this point we delegate to the processor configured above and call its allowCache method with the request object as a parameter. A number of further checks are done here that could still prevent the request from being cached – the presence of ___store or ___from_store in the query string (for the default processor), or the presence of the no cache flag in the session data.
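The shape of those checks in the default processor is roughly as follows – again a paraphrase of the description above rather than the verbatim core code (in particular, the session flag accessor is an assumption).

public function allowCache(Zend_Controller_Request_Http $request)
{
    // a store switch is in progress, so the response is visitor specific
    foreach (array('___store', '___from_store') as $param) {
        if (null !== $request->getParam($param)) {
            return false;
        }
    }
    // the no cache flag memorised in the session earlier in the request
    if (Mage::getSingleton('core/session')->getNoCacheFlag()) {
        return false;
    }
    return true;
}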

Finally we’re actually ready to cache the response. Some filtering occurs to strip out blocks and replace them with placeholders (blocks can be cached independently from the page – to allow for independent cache rules – like the navigation menu, which can persist across multiple pages). The placeholder configuration is stored in app/code/core/Enterprise/PageCache/etc/cache.xml and is structured like the following (this is a real example for the catalog/navigation block).

<config>
    <placeholders>
        <catalog_navigation>
            <block>catalog/navigation</block>
            <name>catalog.topnav</name>
            <placeholder>CATALOG_NAVIGATION</placeholder>
            <container>Enterprise_PageCache_Model_Container_Catalognavigation</container>
            <cache_lifetime>86400</cache_lifetime>
        </catalog_navigation>
    </placeholders>
</config>

The above piece of XML declares the catalog/navigation block to be cached separately from any page that it appears on, with CATALOG_NAVIGATION as the placeholder. It specifies the container class (used when storing and extracting the block) and the lifetime, in seconds, for the cached block.
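If you want to cache one of your own blocks this way, you also need a container class. The sketch below follows the pattern used by the core containers – the class name, placeholder attributes and cache id prefix are hypothetical. _getCacheId and _renderBlock are the two jobs a container has: identify the cached fragment, and regenerate it on a miss.

class My_Module_Model_Container_Example extends Enterprise_PageCache_Model_Container_Abstract
{
    protected function _getCacheId()
    {
        // key the fragment on the cache_id attribute stored with the placeholder
        return 'MY_EXAMPLE_BLOCK_' . md5($this->_placeholder->getAttribute('cache_id'));
    }

    protected function _renderBlock()
    {
        // recreate the block from the attributes stored with the placeholder
        $blockClass = $this->_placeholder->getAttribute('block');
        $template   = $this->_placeholder->getAttribute('template');

        $block = new $blockClass;
        $block->setTemplate($template);
        return $block->toHtml();
    }
}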

Blocks are marked up at the time they are rendered, and you can see them in the HTML source between comment markers made up of the placeholder value defined in cache.xml and an md5 hash derived from the result of calling getCacheKey. They look something like the following (again, using CATALOG_NAVIGATION as an example).

<!--{CATALOG_NAVIGATION_a61df2dc9b9e5868f17a56461177d8c4}-->
<p>some html</p>
<!--/{CATALOG_NAVIGATION_a61df2dc9b9e5868f17a56461177d8c4}-->

This functionality gives you the ability to cache blocks separately from the full page, with their own rules on how long they stay in the cache. Because blocks are cached across pages rather than within them, a block such as catalog/navigation need only be generated once – and is then served from the cache on every subsequent request, regardless of whether the page itself is being served from the cache.

How pages are served up from the cache

Rather unlike how pages end up in the cache (using observers to watch for specific points in the dispatch process), the method by which Magento retrieves pages from the cache is somewhat more hard-coded. Before dispatching the front controller in Mage_Core_Model_App::run (one of the very first methods called in fulfilling a request), a check is done to see whether the current request can be served from the cache. This happens inside Mage_Core_Model_Cache::processRequest, where we immediately check whether there is a request processor configured that we can check the request against. The request processors are configured inside app/etc/enterprise.xml – the default one is set to Enterprise_PageCache_Model_Processor (predictably, the same class used to cache the results of a dispatched request).
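A simplified paraphrase of processRequest (not the verbatim core code) shows the shape of this early exit point:

public function processRequest()
{
    if (empty($this->_requestProcessors)) {
        return false;                       // nothing configured - always a miss
    }
    $content = false;
    foreach ($this->_requestProcessors as $processor) {
        $processor = $this->_getProcessor($processor);
        if ($processor) {
            $content = $processor->extractContent($content);
        }
    }
    if ($content) {
        Mage::app()->getResponse()->appendBody($content);
        return true;                        // cache hit - dispatch is skipped
    }
    return false;                           // cache miss - carry on to routing
}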

This is quite important – if you have a reason to extend the logic in Enterprise_PageCache_Model_Processor (such as adding data that forms part of the cache id for a page), you not only have to extend it and rewrite ‘enterprise_pagecache/processor’, but also make sure that your class is added to this list of request processors, so that Magento has a chance of retrieving your items back out of the cache again.
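The two pieces of configuration involved look roughly like this (the module and class names are hypothetical; the request_processors node mirrors the core default in app/etc/enterprise.xml):

<global>
    <models>
        <enterprise_pagecache>
            <rewrite>
                <processor>My_Module_Model_Processor</processor>
            </rewrite>
        </enterprise_pagecache>
    </models>
    <cache>
        <request_processors>
            <!-- make the extended processor visible to Mage_Core_Model_Cache -->
            <mymodule>My_Module_Model_Processor</mymodule>
        </request_processors>
    </cache>
</global>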

Magento then loops through all the configured request processors and calls extractContent on each of them. This function relies on the cache id built from the request parameters (it is actually created when the processor is constructed, in the _createRequestIds method – the same id we saw earlier when the content was cached in the first place). We check to see if any design changes have been cached for the current page, and then check the configured cache storage to see if we can retrieve the full page content for this request. If we do not get a match at this point, the request is considered a cache miss – we return to the run method and start the process of dispatching the request (and potentially populating the cache with the results).

Assuming that we have a cache hit, we decompress it (content is gzipped whenever possible in the cache storage to minimise space), and the content is processed by _processContent to replace all of the placeholders with the separately cached blocks (e.g. the catalog/navigation block that we cached separately when putting the page into the cache in the first place).
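Conceptually the placeholder replacement works like the sketch below – though in reality the logic lives in the container classes, which can also regenerate a block when its fragment has expired.

// conceptual sketch only: swap each marked-up span for the separately
// cached block html, falling back to the html captured with the page
$pattern = '/<!--\{(CATALOG_NAVIGATION_[a-f0-9]{32})\}-->(.*?)<!--\/\{\1\}-->/s';
$content = preg_replace_callback($pattern, function ($matches) use ($cache) {
    $blockHtml = $cache->load($matches[1]); // try the block cache first
    return $blockHtml !== false ? $blockHtml : $matches[2];
}, $content);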

When writing about how pages are served up from the cache I referred to the configured cache storage. Prior to Magento EE 1.11, the full page cache simply used what you configured in the <cache> section of app/etc/local.xml. From 1.11 this has changed, and you must now also have a <full_page_cache> section in your local.xml configuration (it uses exactly the same options as the <cache> config).
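For illustration, a local.xml for EE 1.11+ might contain something like the following – the values are examples only, and the <full_page_cache> node accepts the same options as <cache>:

<global>
    <full_page_cache>
        <backend>memcached</backend>
        <memcached>
            <servers>
                <server>
                    <host>127.0.0.1</host>
                    <port>11211</port>
                    <persistent>1</persistent>
                </server>
            </servers>
        </memcached>
    </full_page_cache>
</global>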

The aim of this post is to give an overview of how page caching works in Magento, and I intend to follow it up with some more detailed investigations into other areas of caching. If you have a specific request for more information, please do leave a comment and it may form part of a future blog post.

My OS X apps

November 19, 2011 Uncategorized No comments

I’ve had to reinstall OS X Lion in the past few days. Here is a list of apps that I really like (in no particular order).

Chrome / Firefox – Pretty much all that I use Safari for (sorry, Apple!)

Twitter – obvious reasons

Alfred – really great app launcher (and other stuff)

Caffeine – keep your mac awake for longer

Skype – Well, more out of necessity.

Dropbox – would have been lost without this!

Virtualbox / vagrant – dev tools.

Homebrew + git + vim – more dev tools…

GitX – Nice OS X native git gui.

1Password – Conundrum – what do you do when your password file is in Dropbox, and your Dropbox password is in 1Password? Use the iPhone app! Brilliant.

Moom – really cool window resizing and management for OS X

Growl – feels like part of OS X, and deserves a mention for v1.3 in the App Store.

iTerm2 – much nicer than the default Terminal.

I’ll leave it at that for now. Will update this if I remember anything else that I have missed!

xhtml as a hypermedia format

October 2, 2011 Uncategorized No comments

XHTML is a tempting hypermedia format to use in a web service. A lot of the time, the sort of data that you serve using an API is very similar (if not the same) to what your website displays on its own. XHTML already has all the functionality built in for describing forms, where to post them back to once filled in, how to link to images, documents or any other form of data across the whole web of information.

Despite all this good stuff that you get for free, XHTML – or rather the XHTML media type (application/xhtml+xml) – has its problems. And one of those problems is WebKit.

Mobile devices are incredibly important in modern web development, so designing for the capabilities of those devices is just part of a standard workflow. The problem, however, is that WebKit on these devices prefers application/xml and application/xhtml+xml over our friend, text/html. Take a look at the following Accept header sent by two desktop browsers (Firefox and Chrome):

Accept: text/html,application/xhtml+xml,application/xml;q=0.9, */*;q=0.8

Contrast this with what is sent from the iPhone (iOS 4.3.5):

Accept: application/xml,application/xhtml+xml,text/html;q=0.9, text/plain;q=0.8,image/png,*/*;q=0.5

And further, an Android phone (2.3.5):

Accept: text/xml, text/html, application/xhtml+xml, image/png, text/plain, */*;q=0.8

And suddenly we have a problem. If we use application/xhtml+xml then we have to account for iOS and Android devices preferring it over text/html.

There are two things we can do to counter this. The first is to put a filter in front of your application to detect the user agents of browsers that are known to prefer the xml/xhtml representation we want to reserve for our API clients. This is an application specific solution – you have to know which browsers you are filtering, and then rewrite the Accept header so that text/html is served up by your application. You also have to accept that those browsers will never be able to issue requests against your web service, as their requests will always be rewritten to text/html.
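A minimal sketch of that filter, assuming it runs before your framework performs content negotiation (the user agent patterns are illustrative only – you would maintain your own list):

// rewrite the Accept header for browsers known to prefer xml/xhtml
$ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
if (preg_match('/iPhone|iPad|Android/i', $ua)) {
    $_SERVER['HTTP_ACCEPT'] = 'text/html,*/*;q=0.8';
}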

The second solution is to use a vendor specific media type, and serve xhtml up on that. This means you lose a small part of the self describing nature of application/xhtml+xml – but it means you can be absolutely certain that no existing browser will prefer your API content over what it should be seeing. Vendor media types take the format application/vnd.{vendor}.{subtype} – for example, application/vnd.fdrop.xhtml+xml.
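A sketch of the negotiation for option two, using the example media type from above – only clients that explicitly ask for the vendor type get the API representation:

$accept = isset($_SERVER['HTTP_ACCEPT']) ? $_SERVER['HTTP_ACCEPT'] : '';
if (strpos($accept, 'application/vnd.fdrop.xhtml+xml') !== false) {
    header('Content-Type: application/vnd.fdrop.xhtml+xml');
    // ... render the API representation
} else {
    header('Content-Type: text/html');
    // ... render the regular website
}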

Opinions on which of these two options are preferred (or if you have heard of other solutions or have any suggestions on other ways of dealing with this) would be most welcome.

Speaking at PHPNW 2011

June 30, 2011 Uncategorized No comments

I’m delighted to have been accepted to speak on REST and HATEOAS at the PHPNW conference. This marks the start of my speaking ‘career’, something that I have wanted to kick start for some time. The subject matter is something I am very interested in, and it gives me a chance (and a focal point) to work on one of my pet projects that actually gets a little day to day usage, over on http://fdrop.it. The site has been needing a well designed (and easy to use) API to give it the edge over other file sharing websites; being accepted to speak at PHPNW on REST has given me the motivation I need to polish this area of the site, and the opportunity to explain REST with a real life, practical example that people can go away and play with.

The abstract is as follows:

REST and HATEOAS: A Case Study

A RESTful API is only truly RESTful if it uses hypermedia to tell us
about all the actions that can be performed on the current resource,
allowing us to traverse the API from a single entry point. This
session looks at REST and HATEOAS (Hypermedia As The Engine Of
Application State) to illustrate good service structure.

We’ll use the RESTful file sharing service fdrop.it to illustrate the
various examples of how this can be used. This session is recommended
for architects and senior developers alike and will give a good
grounding in writing excellent, self-explanatory RESTful services.

Looking forward to seeing you there!

PHPNW 2011

PHP on Azure

February 13, 2011 Uncategorized 2 comments

When I first started out playing with PHP a number of years ago, I did it on my home PC running Windows. I remember manually setting up Apache (I never tried with IIS), configuring PHP to work with it, and getting a MySQL database up and running. The whole process was time consuming and frustrating – without package management it’s not a particularly pleasant experience.

With Microsoft’s Azure cloud platform available on a trial [link], and with PHP Benelux running the PHP on Azure contest [link], I decided it would be a good opportunity to have another go at using the Windows environment as a hosting platform – though this time using the Web Platform Installer (WebPI) with IIS and SQL Server instead of my traditional LAMP stack.

WebPI promised that this would be a simple process. It certainly looks the part, and has a good range of PHP apps that can be set up out of the box. Unfortunately, as I was getting all the prerequisites installed in my Windows 7 VM, the PHP command line tools didn’t appear to have the Windows Azure SDK available as a dependency. Nor was it available when I searched for it within WebPI.

SQL Server did install ok though, so I followed the manual instructions [link] to get IIS set up with PHP, and get the SDK installed.

That was fine – up until the point that I was building my first test deployment to my local development ‘cloud’. It transpires that the package.php script (used to build an Azure package out of your PHP project ready for deployment) assumes that your username does not contain a space. Mine was ‘Ben Longden’.

So I created a new user account and tried again. None of the Azure SDK tools worked under my new user – they just claimed that ‘A fatal error occurred’ – but reinstalling them helped. The next issue was that my new user doesn’t have access to connect to the SQL Server installation (this is something that I have not yet resolved!).

So, not plain sailing so far! I found it quite frustrating all in all (compared to ‘sudo tasksel install lamp-server’), however my initial deployment to the Azure cloud went fine in the end, once the environment was set up.

Initial concerns are with how the development process will work. Up next will be setting up IIS as a development environment that I can work with, and automating the deployment to my local cloud.

Zend 5.3 certification

November 7, 2010 Uncategorized 1 comment

Hurrah! It’s probably about time I did this…

Zend Yellow Pages

:-)

Ryan Tomayko – How I explained REST to my wife

October 11, 2010 Uncategorized No comments

At the PHPNW Conference 2010 on Saturday I attended a talk by David Zuelke, who provided a link at the end of his talk to a blog post by Ryan Tomayko, titled ‘How I explained REST to my wife’.

It’s a brilliantly simple explanation of something that I have struggled to vocalise for a few years!

Here it is in all its glory (don’t be scared by the size of the page – it’s mostly comments!).

Ryan Tomayko – How I explained REST to my Wife

Building a Continuous Integration Server for PHP with Hudson

October 4, 2010 Uncategorized No comments

An article that I wrote for techPortal has been published.

http://techportal.ibuildings.com/2010/09/20/building-a-continuous-integration-server-for-php-with-hudson/

PHPNW TestFest 2010

August 12, 2010 Uncategorized No comments

I am very pleased to announce this year’s PHPNW TestFest over at MadLab in Manchester. The event will take place on Saturday 11th September, from 12pm until sometime around 5-6pm. Lunch will be provided (courtesy of our sponsor, Ibuildings), and I daresay we may wish to make a bit of an evening of it at a local bar…

More information on what the PHP TestFest is can be found over at http://wiki.php.net/qa/testfest-2010, and you can register your interest on Upcoming, at http://upcoming.yahoo.com/event/6621123.

All you will need on the day is to bring your laptop. The venue will provide all of the connectivity that we need.

Hope to see you there!

LOLCODE and PHP

June 12, 2010 Uncategorized No comments

As a proof of concept I’ve been working on a small LOLCODE (lolcode.com) interpreter thing… for PHP.

Yes, this is a basic implementation of LOLCODE (an interpreted language) written in PHP (an interpreted language).

At the moment there’s enough implemented to parse and run the following…

HAI. BTW THIS IS A COMMENT. VISIBLE "OH HAI, WORLD". KTHXBAI.

You can find the source code up on GitHub if you want to have a play (just extend \Lol\Token – see visible.php for an example).