Wednesday, 8th February 2012

The Blog of James Jackson // Tags // caching

Over the weekend I ran a script to test the time it takes my server to process the home page of nojacko.com. I wasn't testing the time the page took to load, but the time the server took to process it. The script ran every 20 minutes, loaded the homepage's HTML, and grabbed the benchmarking information from the footer (which looks like "Page built in X.XXXX with Y queries & Z caches"). It would first delete the site's cache and then load the site 10 times. This gave the benchmark data for the site with caching, because after the first load the site would have been cached. After this the script would load the site another 10 times, deleting the cache before each load, which gave the data for the site without cached data. I let this run for most of the weekend and here are my findings.
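A rough sketch of a test round like the one described above might look like this. It is written in Python purely for illustration and is not the script that was actually run. The `fetch` and `clear_cache` callables are assumptions standing in for "load the homepage" and "delete the site's cache"; only the footer format comes from the post.

```python
import re
from typing import Callable, Optional

# Footer format quoted in the post:
#   "Page built in X.XXXX with Y queries & Z caches"
# The ampersand may appear HTML-escaped in the page source.
FOOTER = re.compile(
    r"Page built in (?P<time>\d+\.\d+) with (?P<queries>\d+) queries "
    r"(?:&amp;|&) (?P<caches>\d+) caches"
)


def parse_benchmark(html: str) -> Optional[dict]:
    """Pull the build time, query count and cache count out of the footer."""
    match = FOOTER.search(html)
    if match is None:
        return None
    return {
        "time": float(match["time"]),
        "queries": int(match["queries"]),
        "caches": int(match["caches"]),
    }


def run_round(
    fetch: Callable[[], str],          # hypothetical: returns the homepage HTML
    clear_cache: Callable[[], None],   # hypothetical: deletes the site's cache
    loads: int = 10,
) -> dict:
    """One test round: `loads` cached loads, then `loads` uncached loads."""
    # Cached pass: clear once, then load repeatedly.
    # The first of these loads is the one that fills the cache.
    clear_cache()
    cached = [parse_benchmark(fetch()) for _ in range(loads)]

    # Uncached pass: clear the cache before every single load.
    uncached = []
    for _ in range(loads):
        clear_cache()
        uncached.append(parse_benchmark(fetch()))

    return {"cached": cached, "uncached": uncached}
```

Passing `fetch` and `clear_cache` in as functions keeps the sketch independent of how the page is requested or where the cache lives. Scheduling the round every 20 minutes would be left to something like cron.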

Something I have to point out is that this site is hosted on a shared server (by HostGator, who I highly recommend), so server-side optimising is left to HostGator. Other sites also share the server, so the load they create could have affected the results.

The first finding is easy to see. The site without caching takes 49% longer to load on average. This is something I guessed would happen, because with caching the site doesn't have to connect to the database and query it. The minimum and maximum loading times also show this. With caching off, the maximum and minimum loading times were 41% and 75% longer, respectively, than when caching was on. 41-75% is a very noticeable difference. To put it into context, if the site was maxing out 3 servers without caching, turning caching on would in theory allow the site to run on only 2 servers.
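The "3 servers down to 2" figure can be checked with a little arithmetic. The sketch below assumes the number of servers needed scales in direct proportion to the average time spent building each page, which is a simplification. The function name is made up for this example.

```python
def equivalent_servers(current_servers: float, percent_longer_uncached: float) -> float:
    """Servers' worth of work left once caching is on.

    Assumes load is proportional to page build time, so a page that takes
    49% longer without caching costs 1.49 times as much server time.
    """
    slowdown = 1 + percent_longer_uncached / 100
    return current_servers / slowdown


# 3 servers maxed out without caching, uncached pages 49% slower on average.
print(round(equivalent_servers(3, 49), 2))  # 2.01
```

So three fully loaded servers come out at just over two servers' worth of work with caching on, which is where "in theory" is doing some lifting.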

Average Times

The next thing I noticed was that when the database was being queried, the difference between the average and maximum loading times was much higher than when cached. It seems that the loading time is more variable when using a database. You can see from the scatter graph below that the cached times are more closely grouped between 0.01 and 0.1 seconds, whereas the non-cached times are more spread out between 0.02 and 0.5 seconds. Also, there were 11 instances of the non-cached site taking more than half a second and only 3 when the site was cached.

Scatter Graph

So the testing shows caching is worthwhile. In some cases you might not be able to cache, or you might have a fast enough database server that the difference isn't noticeable, but there is a good case for it. Increasing speed with caching can also save money and energy, as fewer servers are needed, and servers cost both money and electricity.

I intend to do some more testing in the future. The next set of tests I do will probably be with part or full page caching, where I'll cache html outputs and not just the results of database queries.


A few weeks back I had a training session with Positive Internet as part of my job at Jadu. The training was on Linux, focused on setting up and managing a server. The training was really interesting and I learnt a lot. However, the one thing that really stuck out for me was performance.

Modern websites are dynamic. They do so much behind the scenes to serve each and every page. Generally they do all this in a fraction of a second, and that's what we want: a page displayed, and fast. Great! It's all well and good when it's working and you get the page in under 1/10th of a second, but what if you get a spike in traffic from a site like Digg? In this scenario you might get 100s of people visiting your site at the same time. Your page that loaded in 1/10th of a second for one user could easily take 10 seconds or more to load for some users, and that's assuming you only get 100 visitors in the space of a second. What if you get 100 visitors every second? Then the server wouldn't be able to handle the requests and would probably crash, making your site totally unavailable. Just imagine how much you could lose from your site going down at such an opportune moment. It's not good.

Most web developers would know the above and have solutions to the problems. I did. One time TheWebDump.com was picking up traffic and my hosting company disabled the site. They disabled it because it was causing too much load on the shared server. They had every right to do so. So I had to figure out why it was causing load and then come up with a solution. The cause was that each page made the same queries over and over. Some of these were quite complex, which didn't help. The solution I found was to cache the output of a part of the page that was included in every page into a file, and just load it from there every time. Caching solved the problem for me.
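The idea of caching a shared page fragment into a file can be sketched roughly as follows. This is an illustration in Python, not the code used on TheWebDump.com. The function name, the `max_age` expiry and the `render` callable are all assumptions made for the example; the post only says the output was written to a file and loaded from there.

```python
import os
import tempfile
import time
from typing import Callable


def cached_fragment(path: str, max_age: float, render: Callable[[], str]) -> str:
    """Return the fragment stored at `path` if it is fresh enough,
    otherwise build it with `render()` and store it for next time."""
    try:
        # A cache file younger than max_age seconds is served as-is,
        # so the expensive queries behind render() are skipped.
        if time.time() - os.path.getmtime(path) < max_age:
            with open(path, encoding="utf-8") as f:
                return f.read()
    except OSError:
        pass  # no cache file yet (or unreadable): fall through and rebuild

    html = render()

    # Write to a temporary file and rename it into place, so a concurrent
    # request never reads a half-written cache file.
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(html)
    os.replace(tmp_path, path)
    return html
```

A page would then call something like `cached_fragment("cache/sidebar.html", 300, build_sidebar)`, where `build_sidebar` is whatever function runs the complex queries. An expiry time is only one way to keep the file fresh. Deleting the file whenever the underlying data changes would work too.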

The other option was the other type of cache: cash. I could have just moved to a dedicated server and got rid of my problems with cash. Throwing money at things isn't the way to fix problems. The site might have grown more and needed another server, and possibly another, and it would have cost too much. Caching is the answer. Granted, sometimes you cannot cache, or you already cache as much as you can, and cash is the only solution. If this is the case you should really be making enough money that spending cash to keep making money isn't going to cause problems.

Well, that was really all my exposure to caching up until this training. The guys teaching demonstrated, with Apache Bench and a WordPress install, how a server can easily crash with not really that many users. They then installed a program that can follow the files being used by a process. This was set to watch WordPress, and it was really amazing how many files WordPress was using. WordPress was accessing the database and including the same PHP files every time the homepage was visited. Here I found out something I kind of knew but had never thought about before. Every time a PHP file is included into another PHP file it has to be interpreted. It doesn't matter how simple the included script is, it still has to be interpreted and then executed. This may take hardly any time at all, but it really adds up. Some of the PHP files were just basic template or skin files that rarely ever change, and simply by making them static HTML files so much CPU time could have been saved.

I've not really touched on the reason I am posting this, other than to share some experiences and knowledge. Well, the reason is that I was also told about query caching. It's simple. Make a database query and store the results in a file. This is something I'd never thought of, but it really stuck in my head after this training. I wanted to try it out, and it just happened that I was getting back into my CMS. So one night I decided to test it.

What I did:
1. Created a cache class,
2. Modified my database class to use this,
3. Created a simple SELECT query that could be cached, and
4. Ran 100, 1,000 and 10,000 queries in 2 modes (not cached and cached) on my local WAMP server.
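The steps above can be sketched roughly like this. It is a Python illustration using an in-memory SQLite database, not the actual cache and database classes from the CMS. The class names, the one-JSON-file-per-query layout and the `posts` table are all assumptions made for the example.

```python
import hashlib
import json
import os
import sqlite3
import tempfile


class QueryCache:
    """File-based cache: one JSON file per query, named by a hash of the SQL."""

    def __init__(self, directory: str):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)

    def _path(self, sql: str, params: tuple) -> str:
        key = hashlib.sha256(json.dumps([sql, list(params)]).encode("utf-8")).hexdigest()
        return os.path.join(self.directory, key + ".json")

    def get(self, sql: str, params: tuple):
        try:
            with open(self._path(sql, params), encoding="utf-8") as f:
                return json.load(f)
        except FileNotFoundError:
            return None  # not cached yet

    def set(self, sql: str, params: tuple, rows: list) -> None:
        with open(self._path(sql, params), "w", encoding="utf-8") as f:
            json.dump(rows, f)


class Database:
    """Thin wrapper that checks the cache before touching the database."""

    def __init__(self, conn: sqlite3.Connection, cache: QueryCache = None):
        self.conn = conn
        self.cache = cache
        self.queries_run = 0  # how many SELECTs actually reached the database

    def select(self, sql: str, params: tuple = ()) -> list:
        if self.cache is not None:
            rows = self.cache.get(sql, params)
            if rows is not None:
                return rows
        self.queries_run += 1
        # Rows become lists so cached and uncached results have the same shape.
        rows = [list(row) for row in self.conn.execute(sql, params).fetchall()]
        if self.cache is not None:
            self.cache.set(sql, params, rows)
        return rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO posts (title) VALUES ('Caching')")

db = Database(conn, QueryCache(tempfile.mkdtemp()))
for _ in range(100):
    rows = db.select("SELECT id, title FROM posts")
print(db.queries_run)  # 1: only the first call reached the database
```

The sketch never invalidates anything, so cached results go stale as soon as the table changes. A real version would need to delete the relevant cache files on writes, or expire them after a set time.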

The results

Queries      100             1,000           10,000
SQLed        0.33330         3.19370         31.73474
Cached       0.03092         0.24735         2.58050
Cache was    10.77 x faster  12.91 x faster  12.29 x faster

 

I was happy to see it in action, and it really showed me the benefits of caching queries. There was a problem with the test, though: it wasn't a real-world example. I can't think of any time I'd need to run the same query 100-10,000 times in one script execution. So I did another test. This time I looked at the time it took to generate my homepage. The homepage at the time consisted of only 3 queries, but it's a realistic test. So I loaded the page with no cache, revisited the page and then deleted the cache. I did this 4 times and recorded the results.

 

Initial Load: 0.14047 0.15179 0.14206 0.15073 AVG: 0.1462625
Load from Cache: 0.03928 0.05372 0.05624 0.03871 AVG: 0.0469875

 

Note: Tests were run on a laptop running WAMP. It's not really a server machine but the results show good proof that caching can really make a difference.

You can see that loading from cache was much faster. On average it was about 3.11 times faster. So in theory, if the site was running at full capacity on 3 servers, I could bring in caching and completely remove 2 of them. If server costs scale with processing time, a page that builds 3 times faster needs roughly a third of the servers. Cutting 2/3rds of any budget is great, and all it took was about an hour of fiddling with some code. Even if caching doesn't cut costs it can help elsewhere, such as getting listed on search engines. Google News states on its site that it only lists websites that respond fast, so caching could really help you get more traffic. And with all this extra traffic you'll be glad your site is caching.
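The averages and the speed-up can be reproduced from the four recorded runs. The sketch below uses the timings listed above. The "3 servers" figure is the hypothetical from the post, and the server estimate assumes capacity scales directly with page build time.

```python
from statistics import mean

# Timings recorded in the post for the four runs of each mode.
initial_load = [0.14047, 0.15179, 0.14206, 0.15073]
from_cache = [0.03928, 0.05372, 0.05624, 0.03871]

avg_initial = mean(initial_load)
avg_cached = mean(from_cache)
speedup = avg_initial / avg_cached

print(f"average initial load: {avg_initial:.7f}")
print(f"average from cache:   {avg_cached:.7f}")
print(f"speed-up:             {speedup:.2f}x")

# Hypothetical: 3 servers at full capacity without caching.
servers_with_cache = 3 / speedup
print(f"servers' worth of load with caching: {servers_with_cache:.2f}")
```

With a speed-up a little over 3, the load of three full servers comes out at just under one server's worth, which is where the "remove 2 servers" estimate comes from.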

That's all. Hope someone finds this helpful. When I put the latest version of my CMS live, I'll run more tests on the live server and post the findings.


About Me

I am James Jackson, a web developer. I graduated in 2008 from the University of Leicester with a 2:1 in Computer Science (BSc). To find out more about me and my skills please visit JamesDavidJackson.com.

 
