
Zoompf's Web Performance Blog

The Challenge of Dynamically Generating Static Content

 Billy Hoffman on December 7, 2009. Category: Optimization

Time and time again I see people using PHP or some other application logic to try and hack around some issue they are facing. We saw this in our previous post, Questions to Ask Hosting Providers: Web Server Configuration, where people would use PHP to emulate mod_deflate or mod_expires. Andrew King, in his book Website Optimization, talks about wrapping developer comments in CSS or JavaScript files in <?php ?> tags and using the PHP interpreter to remove them. People use PHP to combine CSS or JavaScript resources together. And today I read an article from the always awesome Chris Coyier over at css-tricks.com about using PHP to emulate CSS variables.

Don’t get me wrong. I was actually bemoaning the lack of variables in CSS two days before Chris wrote his article. (Actually, what we really want is more like C/C++ macros but that’s another story). Anyone who has tried to implement CSS sprites, change margins or element sizes, or modify color values knows what a pain it is to go through a CSS file and type the same thing over and over.

Using PHP to solve this problem, or any of the other problems listed above, makes perfect sense at first. Because it makes things easy. Because you are all being lazy. You are using a runtime mechanism to try and simplify your life.

Stop Being Lazy!

Now, under normal circumstances programmers should be lazy! After all, your very job is to create something that does work for you! Unfortunately, in this case your laziness is harming the performance of your application. Using application logic to dynamically generate static content at runtime is a massively bad idea. Consider these four consequences:

  • You take an order of magnitude performance hit for invoking the application tier instead of just serving a flat static file from the file system.
  • Since the web server is not serving a static file, there will be no Last-Modified header sent by default. That means no conditional GETs and no 304 responses which means lots of bandwidth consumption.
  • PHP, like virtually all application tiers, produces a chunked response. This is because the web server has no idea what the content length will be, since the content is dynamically generated. Dynamically generated chunked responses will not send the Accept-Ranges header. This means no pausing, resuming, or error recovery. The entire resource must be re-downloaded.
  • Chunked encoding is not supported with HTTP/1.0, so any HTTP/1.0 device (like every caching proxy ever made) has to flip into “store and forward” mode where it downloads the entire response before passing it along.
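To make the second consequence concrete, here is a minimal Python sketch of the kind of check a web server can do for a flat file. The function names are hypothetical, and real servers also handle ETags and other validators. It only illustrates why a file with a modification time gets conditional GETs almost for free, while a script has to reimplement the logic itself.

```python
import os
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def file_mtime(path):
    """File modification time as an aware UTC datetime, truncated to seconds."""
    # HTTP dates only have one-second resolution, so drop the fraction.
    return datetime.fromtimestamp(int(os.stat(path).st_mtime), tz=timezone.utc)


def last_modified_header(path):
    """Value a server could send in the Last-Modified response header."""
    return format_datetime(file_mtime(path), usegmt=True)


def status_for_request(path, if_modified_since=None):
    """Return 304 if the client's cached copy is still current, else 200."""
    if if_modified_since is None:
        return 200  # plain GET, no validator supplied
    try:
        client_time = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable header: fall back to a full response
    if client_time.tzinfo is None:
        # Treat a date without zone information as UTC (an assumption).
        client_time = client_time.replace(tzinfo=timezone.utc)
    return 304 if file_mtime(path) <= client_time else 200
```

A dynamically generated response has no file behind it, so none of this happens unless the script computes and sends the header on its own.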

And as if all these downsides of invoking the application tier were not enough, we have my personal favorite: Web Security! As someone who professionally broke into computer systems for many years, when I see:

http://example.com/combine.php?files=a.js|b.js|c.js

I get very excited. Think about what a resource combiner script does. “Hey website, I’m going to give you a list of files on your hard drive, and I want you to read them off the disk, one at a time, and dump their raw contents into a response and send it to me!” Jackpot baby! This is what we call a Local File Inclusion vulnerability just waiting to happen. The developer has not so much created a resource combiner as they have provided me with a rudimentary remote file download service! I immediately do something like this:

http://example.com/combine.php?files=db.inc

In about 45 seconds I have downloaded the /etc/passwd file, your httpd.conf, your .htaccess, your raw MySQL database, your app config files filled with user names, passwords, and database connection strings, and each PHP file to retrieve all your source code. Or worse, I perform remote file inclusion, thereby injecting a PHP-Shell, which allows me to completely take over your website! (BTW: Roughly one in every 3 PHP resource combiner scripts I have seen contains these security vulnerabilities. Beware where you get your source code!)
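For illustration, here is a rough Python sketch of the validation a combiner would need before touching the disk, whether it runs at request time or in a build step. The function name, the allowed suffixes, and the idea of a single asset directory are all assumptions made for this example. It is not a complete defense.

```python
from pathlib import Path

# Assumption: only these file types are ever legitimate to combine.
ALLOWED_SUFFIXES = {".js", ".css"}


def resolve_requested_files(files_param, base_dir):
    """Map an 'a.js|b.js' style parameter to paths, rejecting anything unsafe."""
    base = Path(base_dir).resolve()
    resolved = []
    for name in files_param.split("|"):
        # resolve() collapses '..' segments and follows symlinks, so the
        # containment check below sees the real target of the request.
        candidate = (base / name).resolve()
        if candidate.suffix not in ALLOWED_SUFFIXES:
            raise ValueError(f"file type not allowed: {name!r}")
        if base not in candidate.parents:
            raise ValueError(f"path escapes the asset directory: {name!r}")
        if not candidate.is_file():
            raise ValueError(f"no such file: {name!r}")
        resolved.append(candidate)
    return resolved
```

With this check, requests for db.inc, ../../etc/passwd, or an absolute path are refused instead of being read off the disk and echoed back.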

The Fundamental Problem

The fundamental problem in all of these examples is that developers are getting lazy and using PHP code to do something at runtime that should have been done earlier.

Properly Generating Static Content

Great! So what is a web developer to do? Go back to the dark ages where you cannot leverage all that great application logic in the generation of your content? I want my CSS variables and I want them now! Notice I never said you cannot dynamically generate static content! I just said you should not dynamically generate static content at runtime! Want CSS variables? Want to use a PHP script to combine resources or minify or whatever? Go ahead and do it! Just do it ahead of time. You can run your PHP script from the command line, produce your CSS file, complete with all the correct CDN paths and color values, and upload that to your website. And this isn’t just for PHP. Use Perl, Python, Ruby, Java, or whatever. You can even do it in QBASIC!

'CSSGEN.BAS - kicking it old school
CDN$ = "http://zoompf.com/"
LOGO$ = "includes/logo.png"
PRINT ".logo {"
PRINT " background: url("; CDN$ + LOGO$ + ");"
PRINT "}"

And the output:

[Screenshot: QBasic printing the generated .logo CSS rule]

(That's right. I totally just used QBasic 1.1 from DOS 5.0 to automate publishing a web application on 64-bit Vista. Oh yeah!)

The moral of the story is never make the user pay for your laziness. Do not use the application tier of a website to dynamically generate static content at runtime. Instead do it at publishing time or even do it in a daily or hourly cron job. This approach allows you all the advantages of using application logic without drastically reducing the very web performance you were trying to improve in the first place!
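As a sketch of what such a publishing-time step might look like, here is the same idea in Python. The function name, the `$cdn` and `$logo` placeholder names, and the file names in the usage note are assumptions for illustration. The two values are taken from the QBasic example.

```python
from pathlib import Path
from string import Template

# Values taken from the QBasic example; the placeholder names are assumptions.
DEFAULT_VARIABLES = {
    "cdn": "http://zoompf.com/",
    "logo": "includes/logo.png",
}


def build_css(template_paths, output_path, variables=None):
    """Combine CSS templates, fill in $placeholders, write one static file."""
    if variables is None:
        variables = DEFAULT_VARIABLES
    parts = []
    for path in template_paths:
        source = Path(path).read_text(encoding="utf-8")
        # substitute() raises KeyError for an undefined placeholder, so a
        # typo fails the build instead of shipping broken CSS.
        parts.append(Template(source).substitute(variables))
    Path(output_path).write_text("\n".join(parts), encoding="utf-8")
    return Path(output_path)
```

A template line such as `background: url(${cdn}${logo});` comes out with the full CDN path filled in. Calling something like `build_css(["logo.css.tpl"], "site.css")` from a publishing script or cron job means the web server only ever serves the flat, pre-generated file.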

Comments

    January 31, 2010 at 2:55 pm

    I agree with your first drawback, invoking PHP instead of serving a static file introduces a performance hit… but is it not true that your other listed drawbacks can be solved by outputting the proper headers to the user? i.e. last-modified, long expire dates, set no cookies, etc.
    If you have a frequently edited site (css & js) you’ll have to run your compile script every time you make changes (at least on the production server). I’m not sure if that is worth a little PHP performance hit if you make it cacheable.

    January 31, 2010 at 5:21 pm

    Michael,

    I’m doing some measuring right now to show how much of a hit it is and will post again soon.

    I am a firm believer in process. It is the only way to make something easy and repeatable. You shouldn’t be editing your live files. You should have some kind of publishing process. This process, among other things, should automatically minify, crunch, and insert time-stamps for caching. In other words, this process should do all this stuff for you and it should be painless and easy. Once this process is in place, it does not matter how frequently you edit the site. The process takes care of it. Best of all, you avoid the performance hit.

    March 4, 2010 at 5:34 pm

    I agree with the downsides of this approach. I did stumble across a couple of solutions, which I’ve combined into something I believe is workable (although more testing is needed). Here is my current solution:

    1) Add content handler for CSS files to be processed as PHP (via HTAccess)
    2) In the CSS-PHP file I have created a basic caching mechanism which checks on file-modified times to see if the file needs to be regenerated/minified/combined.
    3) If not serve the pre-generated file and send the headers for no ETags and Far Futures Expires
    4) Increment a number appended to the end of the generated file in the Header include (HTAccess rewrites any request from style-###.css to style.css, in addition this is invoked via a PHP script called from my SVN deployment server)

    I am sure there will be some performance hits but I am hoping that due to minifying/combining/caching the file and sending the correct headers, I can mitigate most problems. Any thoughts on what everyone else is doing?

    -Adam

    Azza
    March 19, 2010 at 2:37 am

    I work on a website that has over 50 themes. Being able to add php in css is a huge benefit as it makes the maintenance and consistency of the themes so much better.

    I’d be interested to see performance results on the following.
    Static CSS VS PHP/CSS with caching and expires set in the header.

    Rob Anderson
    May 10, 2010 at 5:07 am

    Thanks for this article. It reinforces to me that just because you can do something, doesn’t mean you should. I was about to embark down this path, but I can now see the benefits of not being lazy!

    June 2, 2010 at 8:58 pm

    We have multiple servers (development, staging, live) for our set up. Our CMS contains about a hundred themes, and at least 18 modules that can be used on every page. A build process that combines every single possibility doesn’t seem that useful in my eyes.

    We take the following approach.

    1) Check to see if the files to be combined have been modified since they were last built.
    2) If not, just output the single combined file.
    3) Convert all CSS background images to absolute paths.
    4) Store the static file in a known cache location. The name is generated using some hashing mechanism that can be computed at step 2. This will serve the file according to Apache rules with a last-modified time, though I am considering eliminating the Last-Modified header as we also serve with a far future (1 year) expires header. I think the few bytes that are saved are worth it considering the very low chance that someone will re-visit the page in one year+ and STILL have the file in their cache considering the very low cache settings browsers default to.
    5) Once an hour run a cron job that checks to see if the static file had been modified. If so run a CSS/JS minifier on it.

    Yes, I recognize that we are wasting time checking the combination each time, but when you are rapidly updating content that seems like a minor trade off to me. I see the value in it, but see more value that can be delivered to users using other means (right now I’m adding in additional security hooks, though I am hoping to look in to a memcache solution after that).

    Yes, users who visit the page between the creation time and the cron job will get a bit of a delay, but in my experience running the minification step during the combining step adds too much of a delay for users.

    June 28, 2010 at 5:39 am

    Billy,

    I see this is included in the Zoompf performance test. How exactly do you check for this? I think I’ve seen some false positives.
