
Writing RESTful clients

July 18, 2012 Uncategorized 3 comments

There are plenty of articles out there on how to build the perfect REST API, so I am not going to go into that. This article is all about what to think about when you’re building the perfect client for a REST API. What you’re being served is clearly important – but how you use that information is 50% of the problem – and it needs to be done correctly.

API designers can spend a lot of time ensuring that their responses to you are perfectly well formed and self-describing – but unless you pay attention to what those messages are telling you, all that information is simply being thrown away.

Content Negotiation

The very first thing your HTTP client will send to the server is the Request headers. Within that request, more often than not, you will send the Accept header and a list of media types that your client can support – and the order of preference within which you would like to see them. Chrome 20.0.1132.47 sends the following Accept header.

Accept:text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8

Obviously this is a web browser, so its preference is going to be for text/html or application/xhtml+xml. Failing that, we can also take application/xml. If the server still can’t handle the request – then we’ll take anything at all (*/*).

The web server will make a decision based on what your client has asked for. From that point on, one of three things can happen: 1) you get back one of the media formats that you asked for; 2) you receive a 406 (Not Acceptable) response; 3) you get back something you didn’t ask for at all (this is allowed as per RFC 2616 – the server SHOULD respond with a 406, but is not required to).

This means that unless you explicitly check the response when it comes back, you are merely assuming that the server is behaving the way you think it is.
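To make that concrete, here is a minimal sketch (using plain curl rather than any particular client library – the endpoint URL is made up) of sending an Accept header and then verifying both the status code and the Content-Type before trusting the body:

// Send an Accept header, then check what actually came back before parsing it.
$ch = curl_init('http://api.example.com/orders/42'); // hypothetical endpoint
curl_setopt_array($ch, array(
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_HTTPHEADER     => array('Accept: application/hal+xml, application/xml;q=0.9'),
));

$body        = curl_exec($ch);
$status      = curl_getinfo($ch, CURLINFO_HTTP_CODE);
$contentType = curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
curl_close($ch);

if ($status === 406) {
    // The server could not honour any of our preferences.
    throw new RuntimeException('Server cannot represent this resource in a format we support');
}

// The server is allowed to ignore the Accept header entirely, so check.
if (strpos((string) $contentType, 'application/hal+xml') !== 0
    && strpos((string) $contentType, 'application/xml') !== 0) {
    throw new RuntimeException("Unexpected media type: {$contentType}");
}

$resource = simplexml_load_string($body);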

The Accept header becomes very useful when you separate a normal response from an error response. In the situation where a server error is causing a problem, you are likely to receive a 5xx response code from the server with little or no meaningful body (text/plain). Where an application error has occurred you should also receive a 5xx (or a 4xx) code from the server – however, having declared that you are able to handle a more specific media type (application/vnd.error+xml for example), you are able to present more information to the user than you otherwise could.

Take the situation where you are consuming an API which serves up application/hal+xml and text/plain (hal for the resource representation, and a plain text for an error). The server is free to be able to add support for representing errors in application/vnd.error+xml without affecting any existing client – as long as the server continues to serve text/plain unless the new media type is deemed as acceptable to the client. Once the client is updated to add support for the new media type, it can take advantage of the additional information that can be provided.
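Building on the sketch above, negotiating the richer error format might look roughly like this on the client side – the <message> element is an assumption about the vnd.error document, not something to rely on:

// Decide how to present an error based on what the server said it was sending,
// falling back to plain text for servers that don't speak vnd.error yet.
if ($status >= 400) {
    if (strpos((string) $contentType, 'application/vnd.error+xml') === 0) {
        $error   = simplexml_load_string($body);
        $message = (string) $error->message; // assumes a <message> element exists
    } else {
        $message = trim($body); // text/plain – the body is the best we have
    }
    throw new RuntimeException("API error ({$status}): {$message}");
}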

Cache

A RESTful API will contain explicit directives that declare if, and for how long, the response body can be cached. This information is carried in the Cache-Control and Vary headers, or the Expires header. If the response to your request comes back from the server and declares itself cacheable (only Cache-Control: no-store excludes a response from user-agent caches), then you should store it locally and serve up the cached copy for as long as it’s valid.
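As a rough sketch of what honouring those directives means in practice (the cacheGet()/cachePut()/doRequest() helpers here are hypothetical stand-ins, and Vary handling is ignored for brevity):

// Work out how long a response may be reused, based on its headers.
// Returns 0 if the response should not be stored at all.
function cacheLifetime(array $headers)
{
    $cacheControl = isset($headers['Cache-Control']) ? $headers['Cache-Control'] : '';

    if (stripos($cacheControl, 'no-store') !== false) {
        return 0;
    }
    if (preg_match('/max-age=(\d+)/i', $cacheControl, $matches)) {
        return (int) $matches[1];
    }
    if (isset($headers['Expires'])) {
        return max(0, strtotime($headers['Expires']) - time());
    }
    return 0;
}

// Serve from the local cache while the stored entry is still valid.
$key = 'GET ' . $url;
if (($cached = cacheGet($key)) !== false) {
    return $cached;
}
$response = doRequest($url);                 // hypothetical HTTP call
$ttl = cacheLifetime($response['headers']);
if ($ttl > 0) {
    cachePut($key, $response['body'], $ttl); // hypothetical local store
}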

There are of course already libraries you can use in your projects that will handle caching for you. If you usually use PHP for this stuff (I do), then I highly recommend you take a look at Guzzle to handle this for you.

URLs

Yeah, I’m going to talk about it again. Hypertext as the engine of application state. Golden rule #1: never construct links in code (unless you’re following a URI Template) – follow them from the links given to you in the resource. This means that when these links change, you don’t have to worry about your app breaking. #2: use the defined link relations of the hypermedia type to navigate through the application (i.e. HTML’s ‘rel’ attribute on the link tag). That’s about it. Don’t assume anything that isn’t defined by the media type rules and the representation you receive, and you can’t go wrong.
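A small sketch of what that looks like in practice – the XML structure here is a HAL-style assumption, and fetchResource() is a hypothetical HTTP helper:

// Find the link with the wanted relation in the representation and follow it –
// never concatenate the URL yourself.
function followLink(SimpleXMLElement $resource, $rel)
{
    foreach ($resource->link as $link) {
        if ((string) $link['rel'] === $rel) {
            return fetchResource((string) $link['href']);
        }
    }
    throw new RuntimeException("Resource does not advertise a '{$rel}' link");
}

// e.g. move to the next page of a collection, wherever the server says it lives.
$nextPage = followLink($orders, 'next');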

PHP 5.4 web server

April 11, 2012 Uncategorized No comments

For me, one of the most exciting things that we have in PHP 5.4 is its built in web server. It makes jumping in at the deep end with new ideas much easier and avoids the tedious work of setting up Apache before you even have code to deploy. I have been playing around with the Silex framework in the last few weeks to build a prototype of a REST API that I have been thinking about. This is how I got going with Silex once I had PHP 5.4 installed.


wget http://silex.sensiolabs.org/get/silex.phar
php -S localhost:8080

Create an index.php in the current directory – there’s an example app on the Silex home page, and a minimal sketch below. That’s it. Go play. :)
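For reference, a minimal index.php along the lines of the example on the Silex home page (the route and greeting are just placeholders):

require_once __DIR__ . '/silex.phar';

$app = new Silex\Application();

// One example route – hit http://localhost:8080/hello/you to try it.
$app->get('/hello/{name}', function ($name) use ($app) {
    return 'Hello ' . $app->escape($name);
});

$app->run();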

Magento Enterprise PageCache

January 5, 2012 Uncategorized 9 comments

I have split this post into two parts – starting with how pages are cached in the first place, and then how Magento retrieves a page from the cache before falling back to dispatching a controller.

How pages are cached

The Magento enterprise page cache module observes several events that are fired during the course of dispatching your controller and sending a full page in the response. I’m going to dive into only a few of them – the main observers that actually cause your content to end up in the page cache, and then exactly how Magento gets it back out again.

The first thing we’re going to look at is how Magento caches pages, and that starts with the observation of the ‘controller_action_predispatch’ event – triggered before a request is dispatched (but after we have determined we will not be fulfilling the request from the cache). It’s here that the PageCache module will determine if the request that’s about to be dispatched is to be cached for subsequent requests. There are many things that will prevent a page from being cached and we can start by looking at these.

Examine the processPreDispatch method in Enterprise_PageCache_Model_Observer. This is the method that is invoked when the controller_action_predispatch event fires. All the handlers first check to ensure that the page cache is enabled before doing anything else. Then we examine the no-cache cookie and set a flag (memorize_disabled) in the session.

$this->_processor->canProcessRequest($request) && $this->_processor->getRequestProcessor($request)

The first of the above two conditions does some basic validation on the request. $this->_processor will normally be an instance of ‘Enterprise_PageCache_Model_Processor’ by this point (it’s instantiated via the ‘enterprise_pagecache/processor’ model alias here so it can be overridden), and here we will find the canProcessRequest method. First, we dip into the isAllowed method, where we check that we’ve generated a requestId during the construction of this object. The requestId will become important later on in the process as it provides the key into your cache storage backend for the page content, and needs to be reliably recreated based on the request parameters. We then check that the request is not over https, the no-cache cookie (NO_CACHE) is not present, the no_cache GET variable is not present and that full page caching is enabled. If any of these checks fail we do not process this request any further in the full page cache module.
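As an approximation (this is a paraphrase of the checks described above, not the literal core code), the logic amounts to something like:

// Paraphrased from canProcessRequest / isAllowed – not the literal implementation.
$allowed = $this->_requestId !== null              // a cache id could be built for this request
    && !$request->isSecure()                       // not over https
    && $request->getCookie('NO_CACHE') === null    // no-cache cookie absent
    && $request->getParam('no_cache') === null     // no_cache GET variable absent
    && Mage::app()->useCache('full_page');         // full page caching enabled

if (!$allowed) {
    return false; // the page cache module steps aside for this request
}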

Assuming we pass, we move onto checking the configuration. We inspect the page depth parameter (the number of variations a page can have based on the number of variables in the query string). If we’re over the limit, the page is not cached. Then we check the cache settings for pages that are not in the default currency (system/page_cache/multicurrency = 0 and the presence of the CURRENCY cookie in the request).

The second of the two checks above is where we examine how the page is going to be cached. This is configured in ‘frontend/cache/requests’ and the defaults are found in app/code/core/Enterprise/PageCache/etc/config.xml in the following section of XML. By default all cms, catalog/category and catalog/product pages have a processor configured.

<frontend>
    ...
    <cache>
        <requests>
            <cms>enterprise_pagecache/processor_default</cms>
            <catalog>
                <category>
                    <view>enterprise_pagecache/processor_category</view>
                </category>
                <product>
                    <view>enterprise_pagecache/processor_product</view>
                </product>
            </catalog>
        </requests>
    </cache>
</frontend>

getRequestProcessor is where this configuration is examined and we use the request data to attempt to construct a processor for the target page. Assuming one is configured, Magento is now ready to cache the page for the next request.

If at this point it looks as though we’re going to cache the page, the independent block cache is disabled and the first piece of setting up the cache is complete.

The next time we’re back in the PageCache code is once the request has been handled and we’re about to send the response. This is accomplished by hooking into the controller_front_send_response_before event and executing the ‘cacheResponse’ method on the observer. We very quickly end up back in the processor and in the processRequestResponse method.

The first two calls will be familiar. The same methods (canProcessRequest and getRequestProcessor) are called again – however the processor for this request already exists so the second method returns immediately. We delegate to the configured processor (configured above) at this point and call the ‘allowCache’ method with the Request object as a parameter. A number of further checks are done which could prevent the request from being cached – the presence of ‘___store’ or ‘___from_store’ in the query string (for the default processor) or the presence of the no cache flag in the session data.

Finally we’re actually ready to cache the response. Some filtering occurs to strip out blocks and replace them with placeholders (blocks can be cached independently from the page – to allow for independent cache rules – like the navigation menu, which can persist across multiple pages). The placeholder configuration is stored in app/code/core/Enterprise/PageCache/etc/cache.xml and is structured like the following (this is a real example for the catalog/navigation block).

<config>
    <placeholders>
        <catalog_navigation>
            <block>catalog/navigation</block>
            <name>catalog.topnav</name>
            <placeholder>CATALOG_NAVIGATION</placeholder>
            <container>Enterprise_PageCache_Model_Container_Catalognavigation</container>
            <cache_lifetime>86400</cache_lifetime>
        </catalog_navigation>
    </placeholders>
</config>

The above piece of XML declares the catalog/navigation block to be cached separately from any page that it appears on, with CATALOG_NAVIGATION as the placeholder. It specifies the container class (used during storing and extracting the block) and the lifetime in seconds for the cached block.

Blocks are marked up at the time they are rendered, and you can see them in the HTML source code between comment markers that consist of the placeholder value defined in cache.xml and a hash made from the md5 sum of the result of calling getCacheKey(). They look something like the following (again, using CATALOG_NAVIGATION as an example).

<!--{CATALOG_NAVIGATION_a61df2dc9b9e5868f17a56461177d8c4}-->
<p>some html</p>
<!--/{CATALOG_NAVIGATION_a61df2dc9b9e5868f17a56461177d8c4}-->

This functionality gives you the ability to cache blocks separately from the full page – with their own rules on how long they stay in the cache, independently of the page that they are on. This means that blocks can be cached across pages rather than within them – blocks such as catalog/navigation need only be generated once, and are served up from the cache for every request after that, regardless of whether the page itself is being served from the cache.

How pages are served up from the cache

Rather unlike how pages end up in the cache (using observers to watch for specific points in the dispatch process for a request), the method by which Magento retrieves information from the cache is somewhat more hard coded. Before dispatching the front controller in Mage_Core_Model_App::run (one of the very first methods that gets called in fulfilling a request) a check is done to see if the current request can be served up from the pages in the cache. This happens inside Mage_Core_Model_Cache::processRequest where we immediately check to see if there is a request processor configured that we can check the request against. The request processors are configured inside app/etc/enterprise.xml – the default one is set to Enterprise_PageCache_Model_Processor (predictably the same class used to cache the results of a dispatched request).

This is quite important – if you have a reason to extend the logic in Enterprise_PageCache_Model_Processor (like adding data that forms part of the cache id for a page), you not only have to extend it and rewrite ‘enterprise_pagecache/processor’, but also make sure that it’s added to this list of request processors so that Magento has a chance of being able to retrieve your items back out of the cache again.
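As a hedged illustration of those two pieces of configuration (node names based on the stock enterprise.xml layout – verify against your own version, and My_Module_Model_Processor is a made-up class name):

<!-- config.xml of your module: rewrite the enterprise_pagecache/processor alias -->
<global>
    <models>
        <enterprise_pagecache>
            <rewrite>
                <processor>My_Module_Model_Processor</processor>
            </rewrite>
        </enterprise_pagecache>
    </models>
    <!-- and register it as a request processor, as enterprise.xml does,
         so extractContent is called on it before dispatch -->
    <cache>
        <request_processors>
            <my_module>My_Module_Model_Processor</my_module>
        </request_processors>
    </cache>
</global>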

Magento then loops through all the configured request processors, and calls extractContent on each of them. The first thing that this function does is create the cache id based on the request parameters (this happens when the processor is constructed, when the _createRequestIds method is called – as we saw earlier when the content was being cached in the first place). We check to see if any design changes have been cached for the current page, and then check the configured cache storage to see if we can retrieve the full page content for this request. If we do not get a match at this point, the request is considered a cache miss – and we return to the run method and start the process of dispatching the request (and potentially populating the cache with the results).

Assuming that we have a cache hit, we decompress it (content will be gzipped whenever possible in the cache storage to minimise space), and the content is processed by _processContent to replace all of the placeholders with separate cached blocks (ie, the catalog/navigation block that we cached separately when putting the page into the cache in the first place).

When writing about how pages are served up from the cache I referred to the configured cache storage. Prior to Magento EE 1.11, the full page cache simply used what you configured in the <cache> section of app/etc/local.xml. From 1.11 this has changed and you must now also have a <full_page_cache> section in your local.xml configuration (it uses exactly the same options as the <cache> config).

The aim of this post is to give an overview of how page caching works in Magento, and I intend to follow it up with some more detailed investigations into how other areas of caching work. If you have a specific request for more information, please do leave a comment and it may form part of a future blog post.

PHP on Azure

February 13, 2011 Uncategorized 2 comments

When I first started out playing with PHP a number of years ago, I did it on my home PC running windows. I remember manually setting up apache (I never tried with IIS), configuring php to work with it and getting a mysql database up and running. The whole process was time consuming and frustrating – without package management it’s not a particularly pleasant experience.

With Microsoft’s Azure cloud platform available on a trial [link], and with PHP Benelux running the PHP on Azure contest [link], I decided it would be a good opportunity to have another go at using the Windows environment as a hosting platform – though this time using the Web Platform Installer (WebPI) with IIS and SQL Server instead of my traditional LAMP stack.

WebPI promised that this would be a simple process. It certainly looks the part, and has a good range of PHP apps that can be set up out of the box. Unfortunately, as I was getting all the prerequisites installed in my Windows 7 VM, the PHP command line tools didn’t appear to have the Windows Azure SDK available as a dependency. Nor was it available when I searched for it within WebPI.

SQL Server did install ok though, so I followed the manual instructions [link] to get IIS set up with PHP, and get the SDK installed.

That was fine – up until the point that I was building my first test deployment to my local development ‘cloud’. It transpires that the package.php script (used to build an Azure package out of your PHP project ready for deployment) assumes that your username does not contain a space. Mine was ‘Ben Longden’.

So I created a new user account and tried again. None of the Azure SDK tools worked under my new user and just claimed that ‘A fatal error occurred’, but reinstalling them helped. The next issue was that my new user doesn’t have access to connect to the SQL Server installation (this is something that I have not yet resolved!).

So not plain sailing so far! I found it quite frustrating all in all (compared to ‘sudo tasksel install lamp-server’), however my initial deployment to the Azure cloud went fine in the end, once the environment was set up.

Initial concerns are with how the development process will work. Up next, will be setting up IIS as a development environment that I can work with, and automating the deployment to my local cloud.

LOLCODE and PHP

June 12, 2010 Uncategorized No comments

As a proof of concept I’ve been working on a small LOLCODE (lolcode.com) interpreter thing… for PHP.

Yes, this is a basic implementation of LOLCODE (an interpreted language) written in PHP (an interpreted language).

At the moment there’s enough implemented to parse and run the following…

HAI. BTW THIS IS A COMMENT. VISIBLE "OH HAI, WORLD". KTHXBAI.

You can find the source code up on GitHub if you want to have a play (just extend \Lol\Token – see visible.php for an example).

ORM ORLY?

November 21, 2009 Uncategorized 2 comments

I’m regularly thinking about how to represent data from a relational database in OO PHP5 in a way that doesn’t make me walk away feeling like I’ve just created something that smells bad.

The key thing for me is that my code should not care that the data is coming from a relational database. This poses one or two issues.

  • In a relational database, data typically has other ‘meta’ data associated with it: IDs, timestamps and other snippets of information that are not strictly part of what my object is trying to represent.
  • If my object is not aware of how it should serialise itself, who is?

This of course isn’t a new problem. My language of choice (currently PHP5) has many ORM libraries available to it – Doctrine, Propel and more – most (or all) of which are loosely based around the Active Record design pattern.

My only problem with these frameworks is that they seem to do too much ‘magic’ for me (and I like to retain at least some control over what’s happening between my app and the database). Of course, it could simply be that I have not invested enough time in learning to use one of them to its maximum potential.

Despite that, the Active Record pattern appears to be quite a satisfactory way of creating that link between my application and how it stores its data. My core objects can exist how I want them to, and their ‘datastore’ can be represented by an active record object (whose attributes are exactly the same as the columns in the database) on the class itself. It gives me the degree of separation that I’ve been looking for.

However there’s still a piece missing. My core class still has to contain the relevant logic to load its associated active record. This suggests some static factory methods (I refer to my earlier comment on things that smell bad!), so I’m going to create a ‘Builder’ for my core object that’s concerned with marrying up the active record with the core object itself. Here’s how it looks (simplified) in a class diagram.

ActiveRecord class diagram
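In code, the shape of it is something like the following sketch (the class and method names are purely illustrative):

// Plain domain object – knows nothing about the database.
class Customer
{
    private $name;
    private $email;

    public function __construct($name, $email)
    {
        $this->name  = $name;
        $this->email = $email;
    }

    public function getName()  { return $this->name; }
    public function getEmail() { return $this->email; }
}

// Active record – its attributes mirror the table columns (id, timestamps, etc).
class CustomerRecord
{
    public $id;
    public $name;
    public $email;
    public $created_at;

    public function save()           { /* INSERT/UPDATE against the customers table */ }
    public static function find($id) { /* SELECT ... WHERE id = ? */ }
}

// Builder – the only piece that knows how to marry the two together.
class CustomerBuilder
{
    public function fromRecord(CustomerRecord $record)
    {
        return new Customer($record->name, $record->email);
    }

    public function toRecord(Customer $customer)
    {
        $record = new CustomerRecord();
        $record->name  = $customer->getName();
        $record->email = $customer->getEmail();
        return $record;
    }
}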

I’ve not closed the doors on Doctrine or Propel though, so it’ll be interesting to see just what an established ORM framework like Doctrine can do for me on top of this basic implementation.

Optimising PHP (a start…)

May 22, 2009 Uncategorized No comments

Scaling to thousands of users is often a problem for web applications – and in fact any system that has a sudden large influx of new users. In recent times we have had a situation where we had inherited a fairly simple web application that was massively under performing when a single user accessed a reasonably large dataset from a MySQL database.

Immediately we blamed the database – it must be the bottleneck – located on a different server and transferring a result in excess of 10,000 records. But when performing the query from a DB client there was virtually nothing in terms of a delay.

So what was the problem? The code was reasonably simple. An in house database abstraction layer was used to construct a representation of each row returned from the database and stored it in a big array. Common OO style promoted the use of a series of ‘setter’ methods to store the data in each constructed object.

class MyClass
{
        private $animal = null;
        private $mineral = null;
        private $vegetable = null;
 
        public function __construct()
        {
        }
 
        public function setAnimal($animal)
        {
                $this->animal = $animal;
        }
 
        public function setMineral($mineral)
        {
                $this->mineral = $mineral;
        }
 
        public function setVegetable($vegetable)
        {
                $this->vegetable = $vegetable;
        }
}

This block of code simulated how each of the objects were being built – using the data as returned from the database.

for($i=0; $i<10000; $i++) {
        $test = new MyClass();
        $test->setAnimal('lion');
        $test->setMineral('stone');
        $test->setVegetable('potato');
}

Surprisingly (or perhaps not…) this code takes around 4.5 seconds to run (obviously depends on hardware!). Quite an overhead when you’re targeting a 2-second page load time!

Looking at the code – the constructor is clearly redundant. It’s not serving any useful purpose. Removing it cut off around half a second of page load time (average).

Removing one of the setters saved another second or so… removing all the setters (just constructing a blank object 10,000 times) cuts it down to 1/10th of a second! Could method calls in PHP be that expensive?

We can create the same result by doing all of this in a constructor – consider the following.

class MyClass
{
        private $animal = null;
        private $mineral = null;
        private $vegetable = null;
 
        public function __construct($animal, $mineral, $vegetable)
        {
                $this->animal = $animal;
                $this->mineral = $mineral;
                $this->vegetable = $vegetable;
        }
}
 
for($i=0; $i<10000; $i++) {
        $test = new MyClass('lion', 'stone', 'potato');
}

This cuts things down to just under two seconds – definitely better – but not where we want to be. We know that the presence of the constructor (presumably it’s just the overhead of the method call itself) is costing us some time. What if the class was simply a data container and offered nothing in the way of accessors?

class MyClass
{
        public $animal = null;
        public $mineral = null;
        public $vegetable = null;
}
 
for($i=0; $i<10000; $i++) {
        $test = new MyClass();
        $test->animal = 'lion';
        $test->mineral = 'stone';
        $test->vegetable = 'potato';
}

3/10ths of a second. Ok – now we are getting somewhere! For 10k results – this is probably acceptable.

It appears that object creation and method calling are actually quite expensive operations within PHP – if we had to deal with 100k rows, the fastest way of doing it is to just use an array; the 10k test completes in less than 1/10th of a second.
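For comparison, the array version of the same 10k test – no class, no method calls, just data:

$rows = array();
for ($i = 0; $i < 10000; $i++) {
    $rows[] = array(
        'animal'    => 'lion',
        'mineral'   => 'stone',
        'vegetable' => 'potato',
    );
}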

A more accurate analysis of just how much object instantiation and method calls cost could be done using xdebug to profile how long each line of code actually takes to execute. But for my purposes, this worked out well.

My solution to the problem of VAT…

November 28, 2008 Uncategorized No comments

After lots of people spent a lot of time at work refactoring bad/old code after Alistair Darling announced that VAT would be reduced to 15% instead of 17.5% (and that people needed to ‘contact their software vendors to update the VAT field’… riiight), I thought I’d stub out my own design for handling Price and Tax (whether it’s VAT or not shouldn’t really matter). Here’s a start. It follows the idea that, where possible, a class should be immutable. Don’t modify an existing cost – create a new one and return that. Anyway – here’s the code.

class Price
{
	protected $value = 0;
 
	protected $currency = null;
 
	public function __construct($value, $currency = 'GBP')
	{
		$this->value = $value;
		$this->currency = $currency;
	}
 
	public function getValue()
	{
		return $this->value;
	}
 
	public function getCurrency()
	{
		return $this->currency;
	}
 
	public function __toString()
	{
		return "{$this->value}";
	}
}

That’s all you should need for the price. Think about the money in your pocket – it has two properties. It has a value, and it has a currency. I cannot change either without spending it (or converting it at a bureau de change), but then I would get a set of new coins in return. It has no concept of whether or not it is inclusive or exclusive of tax. That’s up to something else…

class TaxMan
{
	protected static $vat = array(
		'GBP' => 15.0
	);
 
	protected $currency = '';
 
	public function __construct($currency)
	{
		if(array_key_exists($currency, self::$vat)) {
			$this->currency = $currency;
		} else {
			throw new Exception('Cannot build a TaxMan for this currency');
		}
	}
 
	public function getVat(Price $price)
	{
		return new Price(($price->getValue() * (self::$vat[$this->currency]) / 100), $this->currency);
	}
}

So we can pass in a Price object and get a new value representing how much VAT should be added… Something like this.

$price = new Price(8.00);
$taxman = new TaxMan('GBP');
$vat = $taxman->getVat($price);
 
$total = new Price($price->getValue() + $vat->getValue(), $price->getCurrency());

This is not completely finished however (and I may come back to it to complete it), as it’s entirely possible to have multiple different tax rates within a currency (a currency can be used in multiple countries, and different products have varying levels of VAT associated with them). This will solve some of the problem – but there’s more work to be done.
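One possible direction (purely a sketch – the rate bands and figures are made up) is to key the rates on a product rate band as well as the currency:

class TaxMan
{
	// Hypothetical rate table: currency plus a per-product rate band.
	protected static $vat = array(
		'GBP' => array('standard' => 15.0, 'reduced' => 5.0, 'zero' => 0.0),
	);

	protected $currency = '';

	public function __construct($currency)
	{
		if (!array_key_exists($currency, self::$vat)) {
			throw new Exception('Cannot build a TaxMan for this currency');
		}
		$this->currency = $currency;
	}

	public function getVat(Price $price, $band = 'standard')
	{
		if (!isset(self::$vat[$this->currency][$band])) {
			throw new Exception("No '{$band}' VAT band for {$this->currency}");
		}
		return new Price($price->getValue() * self::$vat[$this->currency][$band] / 100, $this->currency);
	}
}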