Friday, December 12, 2008

Rewriting your URLs to clean, permanent-looking links

There are several articles out there on using mod_rewrite and .htaccess to convert dynamic pages with query strings in their URLs into cleaner, more permanent-looking links. But even after reading the documentation on RewriteCond and RewriteRule, it took me a few hours of experimentation to get the desired effect. So I thought I would share my settings as examples for others trying to do something similar.


Using PHP and MySQL I have pages that are dynamically created from a database. The URLs for those pages look like:
http://budgetweb.com/budgetweb/details.php?id=2825&provider=CravisHostFive
I wanted to use mod_rewrite so that the links in my feed would look like:
http://budgetweb.com/service/2825/CravisHostFive/
Here's what I added to the .htaccess in the root of my web site:
RewriteEngine on
RewriteRule ^service/([0-9]+)/([^/]*)/$ /budgetweb/details.php?id=$1&provider=$2
I also wanted to handle typed-in links that lack the trailing slash. Here's what I added for that:
RewriteEngine on
RewriteRule ^service/([0-9]+)/([^/]*)$ /service/$1/$2/ [R]
RewriteRule ^service/([0-9]+)/([^/]*)/$ /budgetweb/details.php?id=$1&provider=$2
The first rule carries the [R] flag, which sends a redirect (a 302 by default) back to the browser, so that the link in the address bar is corrected as well. On the subsequent request back to the server, the path matches the second pattern.
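To make the two-step flow concrete, here is a rough Python sketch that mimics the two rules with the same regular expressions. It is only an illustration of the matching, not how Apache works internally, and the function name handle is made up for this example.

```python
import re

# Stand-ins for the two RewriteRule patterns above. In .htaccess context the
# pattern is matched against the path without its leading slash.
REDIRECT_RULE = re.compile(r"^service/([0-9]+)/([^/]*)$")   # no trailing slash
REWRITE_RULE = re.compile(r"^service/([0-9]+)/([^/]*)/$")   # with trailing slash


def handle(path):
    """Return (action, url) for a request path (hypothetical helper)."""
    m = REDIRECT_RULE.match(path)
    if m:
        # [R]: send the browser to the corrected URL
        return ("redirect", "/service/{}/{}/".format(m.group(1), m.group(2)))
    m = REWRITE_RULE.match(path)
    if m:
        # No flag: internal rewrite, the browser never sees this URL
        return ("rewrite", "/budgetweb/details.php?id={}&provider={}".format(
            m.group(1), m.group(2)))
    return ("pass", path)


print(handle("service/2825/CravisHostFive"))
print(handle("service/2825/CravisHostFive/"))
```

The first call hits the redirect pattern and the second hits the rewrite pattern, which is the round trip described above.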

Lastly, I wanted to redirect all the existing links to this new URL structure, so that all the previously indexed pages would get updated. This proved trickier than I first expected. The path that RewriteRule matches against does not include the query string, i.e. everything after the question mark (?) in the URL. So you need to use RewriteCond. Here's what I added:
RewriteCond %{QUERY_STRING} id=([0-9]+)\&provider=(.*)
RewriteRule ^budgetweb/details\.php$ /service/%1/%2/? [R=301]
This works great. The %1 and %2 in the RewriteRule are back-references to the parts of the RewriteCond regular expression surrounded by (). The other trick here is the trailing question mark in the RewriteRule substitution. Without it, the original query string is re-appended to the rewritten URL, resulting in an endless redirect loop. Lastly, [R=301] sends a 301 Permanent Redirect back to the browser.
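Here is a small Python sketch of the same idea, assuming the path and the query string arrive as two separate strings, which is how the rule and the condition see them. The function name redirect_old_link is invented for illustration.

```python
import re

PATH_RULE = re.compile(r"^budgetweb/details\.php$")        # RewriteRule pattern
QUERY_COND = re.compile(r"id=([0-9]+)&provider=(.*)")      # RewriteCond pattern


def redirect_old_link(path, query_string):
    """Return the 301 target for an old-style link, or None if no match."""
    if PATH_RULE.match(path) is None:
        return None
    # RewriteCond patterns are unanchored, so search() rather than match()
    cond = QUERY_COND.search(query_string)
    if cond is None:
        return None
    # %1 and %2 come from the RewriteCond groups. The trailing "?" in the
    # real rule is modelled by returning a URL with no query string at all.
    return "/service/{}/{}/".format(cond.group(1), cond.group(2))


print(redirect_old_link("budgetweb/details.php", "id=2825&provider=CravisHostFive"))
```

A request for details.php with no matching query string returns None here, just as the real rule is skipped when the RewriteCond fails.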

Now, one problem with combining all the rules above is that even with the trailing question mark I still get an endless loop. The details.php URL is redirected to the /service/ URL, which is then rewritten back to details.php, which matches the redirect rule again. I couldn't work out how to block this in the rewrite rules, so instead I just copied my PHP from details.php to service.php and ended up with these rules in my .htaccess:
RewriteEngine on
RewriteCond %{QUERY_STRING} id=([0-9]+)\&provider=(.*)
RewriteRule ^budgetweb/details\.php$ /service/%1/%2/? [R=301]
RewriteRule ^service/([0-9]+)/([^/]*)$ /service/$1/$2/ [R]
RewriteRule ^service/([0-9]+)/([^/]*)/$ /budgetweb/service.php?id=$1&provider=$2
The end result is:
  • http://budgetweb.com/budgetweb/details.php?id=2825&provider=CravisHostFive permanently redirects to http://budgetweb.com/service/2825/CravisHostFive/
  • http://budgetweb.com/service/2825/CravisHostFive temporarily redirects to http://budgetweb.com/service/2825/CravisHostFive/
  • http://budgetweb.com/service/2825/CravisHostFive/ rewrites to http://budgetweb.com/budgetweb/service.php?id=2825&provider=CravisHostFive
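To see why pointing the last rule at service.php breaks the loop, here is a simplified Python trace of the three rules. It assumes that an internally rewritten URL is passed through the rules again (which is what happens in .htaccess context) and treats a browser redirect as simply another pass. The function follow and its max_hops cut-off are made up for this sketch.

```python
import re


def follow(path, query, target_script, max_hops=10):
    """Trace a request through the three rules and return the steps taken.

    target_script is the PHP file the last rule rewrites to.
    """
    steps = []
    for _ in range(max_hops):
        cond = re.search(r"id=([0-9]+)&provider=(.*)", query)
        if cond and re.match(r"^budgetweb/details\.php$", path):
            # Rule 1: 301 redirect, query string dropped by the trailing "?"
            path, query = "service/{}/{}/".format(*cond.groups()), ""
            steps.append("301 /" + path)
            continue
        m = re.match(r"^service/([0-9]+)/([^/]*)$", path)
        if m:
            # Rule 2: temporary redirect that adds the trailing slash
            path = "service/{}/{}/".format(*m.groups())
            steps.append("302 /" + path)
            continue
        m = re.match(r"^service/([0-9]+)/([^/]*)/$", path)
        if m:
            # Rule 3: internal rewrite; the new URL goes through the rules again
            path = "budgetweb/" + target_script
            query = "id={}&provider={}".format(*m.groups())
            steps.append("rewrite /" + path + "?" + query)
            continue
        return steps  # no rule matched, so the request is served
    steps.append("gave up: loop")
    return steps


old = ("budgetweb/details.php", "id=2825&provider=CravisHostFive")
print(follow(*old, target_script="service.php"))
print(follow(*old, target_script="details.php")[-1])
```

With service.php as the target the trace stops after one redirect and one rewrite, because the rewritten path no longer matches the first rule. With details.php it keeps bouncing until the cut-off.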

Tuesday, December 09, 2008

Tracking subdomains in Google Analytics in one profile

Google Analytics lets you track subdomains in multiple profiles or a single profile. This article describes how to set it up in a single profile.

If you follow the first steps in the article then visitors to the following pages
  • http://www.budgetweb.com/
  • http://www.budgetweb.com/whatsnew.php
  • http://blog.budgetweb.com/
  • http://blog.budgetweb.com/1997/12/some-history-of-budgetwebcom.html
show up in Google Analytics as
  • /
  • /whatsnew.php
  • /1997/12/some-history-of-budgetwebcom.html
The important point here is that you can't distinguish between visitors to identically named pages on different subdomains. In the report above, Google Analytics shows /, which combines visitors to the home pages of both subdomains. The article therefore goes on to provide steps for prefixing the page names with the domain name. If you follow those steps, the above pages show up as
  • /www.budgetweb.com/
  • /www.budgetweb.com/whatsnew.php
  • /blog.budgetweb.com/
  • /blog.budgetweb.com/1997/12/some-history-of-budgetwebcom.html
I implemented that, and it worked fine. But for me there were a couple of drawbacks.
  1. The filter only applies to new traffic, so your content pages no longer match your historical tracking, and you may have to update your goals or other custom reports.
  2. Visits to www.budgetweb.com and budgetweb.com will result in separate page content showing up in the reports.
The latter means that where previously requests to www.budgetweb.com and budgetweb.com would be combined in the content report, they now show up as separate requests. So visits to the home pages show up in the reports as
  • /www.budgetweb.com/
  • /budgetweb.com/
This particular issue can be addressed with a search and replace filter as follows
  • Filter Type: Custom Filter / Search and Replace
  • Filter Field: Hostname
  • Search String: ^www\.budgetweb\.com
  • Replace String: budgetweb.com
  • Case Sensitive: No
Just remember to put this filter ahead of the one set up in the article mentioned above.
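As a rough illustration of why the order matters, here is a Python sketch of the search and replace step followed by the prefixing step. It assumes the article's filter builds the page name as a slash, then the hostname, then the request URI, which is what the prefixed pages listed earlier suggest. Both function names are invented.

```python
import re


def normalize_hostname(hostname):
    """Mimic the Search and Replace filter on the Hostname field."""
    # Case Sensitive: No  ->  re.IGNORECASE
    return re.sub(r"^www\.budgetweb\.com", "budgetweb.com", hostname,
                  flags=re.IGNORECASE)


def prefixed_page(hostname, request_uri):
    """Assumed behavior of the article's filter, applied after normalizing."""
    return "/" + normalize_hostname(hostname) + request_uri


for host in ("www.budgetweb.com", "budgetweb.com", "blog.budgetweb.com"):
    print(prefixed_page(host, "/"))
```

Because the hostname is normalized first, both spellings of the main domain end up under the same page name, while the blog subdomain is left alone.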

But there's another approach that addresses both of the issues above, at least for me. If you've got history on one domain, and you are interested in tracking a subdomain in that same Google Analytics web profile, then this should work for you too.
  • Filter Type: Custom Filter / Advanced
  • Field A->Extract A: Hostname / (blog).*
  • Field B->Extract B: Request URI / (.*)
  • Output To->Constructor: Request URI / /$A1$B1
  • Field A Required: Yes
  • Field B Required: No
  • Override Output Field: Yes
  • Case Sensitive: No
After implementing this approach, the four requests mentioned above show up in the content report as
  • /
  • /whatsnew.php
  • /blog/
  • /blog/1997/12/some-history-of-budgetwebcom.html
This means that the requests for www.budgetweb.com continue to match the history, and no changes need to be made in any goals or custom reports. And the new content shows up with a convenient short directory prefixed to the content pages.
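The advanced filter can be sketched in Python along the same lines. This is my reading of the filter settings above rather than Google's actual implementation, and apply_filter is a made-up name.

```python
import re


def apply_filter(hostname, request_uri):
    """Return the Request URI as the advanced filter would rewrite it."""
    # Field A -> Extract A: Hostname / (blog).*   (Case Sensitive: No)
    a = re.search(r"(blog).*", hostname, flags=re.IGNORECASE)
    if a is None:
        # Field A Required: Yes, so without a match the URI is left untouched
        return request_uri
    # Field B -> Extract B: Request URI / (.*)
    b = re.search(r"(.*)", request_uri)
    b1 = b.group(1) if b else ""
    # Output To -> Constructor: Request URI / /$A1$B1
    return "/" + a.group(1) + b1


print(apply_filter("www.budgetweb.com", "/whatsnew.php"))
print(apply_filter("blog.budgetweb.com", "/"))
```

Only hostnames containing blog are rewritten, which is why the existing www.budgetweb.com pages keep matching their history.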