Rewriting the DotNetNuke Url Rewriter Module
As part of my ongoing interest in making DotNetNuke websites more person and search engine friendly, I started hunting around in the space again to see what was available. While the latest DotNetNuke releases have a good ability to enter site rewrite rules (such as an automatic home.aspx redirection), the crucial problem for me is that the site doesn't generate simple Urls (instead it still outputs pagename/tabid/nn/default.aspx).

I had previously looked at the Inventua Http module, and while it is a good product, it didn't quite suit my needs. I guess it came down to not being able to get the source code either. It also didn't work with the Catalook module, which I have done some work with and still support on some sites.

Further searching brought me across Scott McCulloch's 'Friendly Urls' work on his site at http://www.ventrian.com/Resources/Projects/FriendlyUrls.aspx (it's even a friendly url!). This showed a bit more promise, because he was automatically rewriting the Urls based on the path of the page. But Scott left 'Human Readable Urls for Pages With Multiple Parameters' listed under 'items for discussion'.

Without going into the explicitly technical details, here's a quick primer. There are two facets to Url Rewriting in DotNetNuke: generating friendly Urls for the hyperlinks a page outputs, and interpreting those Urls when a request for one comes back in.

Generating the Urls is really the easy part, because theoretically you can generate the friendliest Urls in the world to show - it's only when they have to be re-interpreted that it becomes a problem! Having said that, this is how it works: The 'standard' FriendlyUrl provider does this by performing a simple reshuffling of the various parts of the querystring to achieve the necessary output. It's fast and works pretty well.

The trickier part is determining what /home/tabid/38/default.aspx means when someone clicks on a hyperlink with this address. To do this, DotNetNuke uses the 'UrlRewrite' HttpModule.
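To make the two facets concrete, here is a minimal sketch in Python. It is not the DotNetNuke code (which is .NET); the function names to_friendly_path and to_query_string are hypothetical, and it assumes every query string parameter simply becomes a key/value pair of path segments after the page name.

```python
from urllib.parse import parse_qsl, urlencode


def to_friendly_path(page_name, query):
    """Reshuffle a classic query string into a path-style Url (assumed layout)."""
    segments = [page_name]
    for key, value in parse_qsl(query.lstrip("?")):
        segments.extend([key, value])
    return "/" + "/".join(segments) + "/default.aspx"


def to_query_string(friendly_path):
    """Turn a path-style Url back into the query string default.aspx understands."""
    parts = [p for p in friendly_path.split("/") if p]
    # Drop the trailing page file name if present.
    if parts and parts[-1].lower() == "default.aspx":
        parts = parts[:-1]
    lowered = [p.lower() for p in parts]
    # Assumption: everything from the 'tabid' segment onwards is key/value pairs.
    if "tabid" not in lowered:
        return ""
    start = lowered.index("tabid")
    keys = parts[start::2]
    values = parts[start + 1::2]
    return "?" + urlencode(list(zip(keys, values)))
```

Under these assumptions, to_friendly_path("home", "?tabid=38&key1=value1") gives /home/tabid/38/key1/value1/default.aspx, and to_query_string("/home/tabid/38/default.aspx") gives ?tabid=38. The second direction is the one that has to run on every incoming request.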
Now, in the standard DotNetNuke build, the UrlRewrite module and the FriendlyUrl provider are in the same assembly for ease of packaging and coding, but they don't necessarily have to be like that. When an Http Request comes into the DotNetNuke code (i.e., you click on a hyperlink to load a new page) the DNN base throws the incoming Url to the UrlRewrite module, with an implicit request: 'turn this into something I can understand, will you?' You should remember that the DNN base is just a page called default.aspx, and it only knows how to interpret a classic style query string - i.e. ?tabid=38&key1=value1&key2=value2 etc. This is the sequence of events:

All this happens for each and every request made to a DNN portal, so performance and scalability are paramount in any changes made in this area. If you are so inclined, I suggest stepping through the code one day just to see how often it gets called, and how much work goes on in the background to provide friendly Urls. It's when you do this that you start to understand some of the issues surrounding a fully-dynamic website and fully-dynamic Urls.

Please note: the changes I am discussing here were changes I made to suit my own needs, and those needs are probably not aligned with many people in the DotNetNuke user group. So none of this is a criticism of the base code, or other people's work. It's just a discussion on getting it to work the way I wanted.

My requirements were the same as what Scott outlined in his article: better Urls for human and search engine purposes. However, most work I do with DNN modules tends to involve a lot of parameters in the query string, and so his open problem of what to do with multiple querystring parameters was the same as my own. Specifically, I am developing two new modules: a Tagging module for tagging DNN content, and a Directory module for storage of Organization Directories. One of the principal aims of these two modules was nice-looking Urls with high keyword ratios in the Url.
But I soon struck the same problem - what do you do with multiple parameters?

My Answer

My solution to this question was to re-arrange the order of the parameters, and use the key name of the first parameter as the page name in the Url. Confused? I bet! Here's what I mean:

While that may look a little strange and have you scratching your head, I can assure you there's method to my madness. The first site I have implemented this on is a List of Auctioneers in Australia. It has the implementation of my new Tagging module and my new Directory module. Here are two Urls in that site, in 'original' and rewritten form:

Note: the rewriting has nothing to do with the module code, it is all in the HttpModule.URLRewrite Assembly.

I consider this a 'first draft' because I'm still not totally happy with it. I actually want to implement the Tag miniformat, and to do that I need to have the tag url looking like this: /TagList/Tag/Auctioneer - with no parameters on the end. It's certainly possible given the code I've written, but I'll leave it for a bit further down the track.

That all works great for a new site such as the www.auctionlink.com.au site, as, in combination with my DotNetNuke Google Sitemap Provider, the googlebot has hoovered up all the friendly Urls and integrated them into the index nicely. So if you search for AuctionLink.com.au on Google, you should see the friendly Urls rather than the tabid/nn/default.aspx-style Urls.

But what about sites that have been in the index for a while? Indeed, this is the problem with the ifinity.com.au site - it is in the Google index with the standard, friendly Urls. I'd like to implement my version of the UrlRewrite module for this site as well, and get some friendlier Urls happening.
But Google and others have the old Urls already in the index, and, because the DNN framework will still respond to the older style Urls, I could even get search-engine penalised for duplicate content - which is when the search engines deem you to have two distinct Urls pointing to the same content. Even if that's not a problem, I'd still like to have the best Urls showing in the index.

Enter the 301 Redirect

Reading Matt Cutts' blog on Google about 301 redirects got me thinking: I could issue a 301 redirect for every request that came into the site for an older style Url. That way search engines which take notice of the Http standards should eventually update their indexes to include the new content.

If you don't know what a 301 redirect is, it's an Http status code of '301 - Moved Permanently'. It's basically saying: yes, I have the content for you, but now it's over here, so please update your references. Some agents (certain browsers) will actually call the new Url instead, whereas others will just ignore it and show the content if it's available. Google have stated (via Matt Cutts' blog) that they do read and obey 301, and you can use it to advise the Googlebot of new locations.

The code does this by detecting if a Url came into the site as an old style friendly Url - i.e. tabid/38/default.aspx. Then it works out what the 'friendly' url should be: home.aspx. If the incoming Url and the Friendly Url are different, then a 301 status is returned and the new friendly Url is given as the new location.

There are exceptions to this, and there is a switch in the web.config to disable it once there are few 'old style' Urls coming into the site (presumably once the search indexes are updated). There is also a section in the web.config for excluding certain pages from rewrites. For instance, in the auctionLINK site, I have implemented a search function that deliberately uses a non-friendly query string.
This page is excluded from 301 redirects because I don't want the query string to be friendly.

Here is the result, as shown by Fiddler, the free http monitoring tool distributed by Microsoft:

I have provided the source code for the HttpModule.UrlRewrite version in the Free Downloads page of this site. Steps to use it are:

This code is really in a BETA state. It is pretty fresh off the code production line and hasn't been totally tested in anger yet. If you do install it, please promise me that you will test it in a non-critical environment first, and that you will check every Url on your site to make sure that it works as you expect. I'm still developing it myself and may post updates if I think they are worthwhile.

This code is a branch of Scott McCulloch's work, so full credit to him for getting me started in the right direction. And his work is a branch off the original work for the DotNetNuke base by Charles Nurse, so full credit to him as well. I don't claim credit for much of it at all, but by downloading it you are subjecting yourself to the license of the DotNetNuke framework, so play nicely and attribute credit where credit is due.

How the Url Rewriting in DotNetNuke works
Generating Friendly Urls for Hyperlinks
How I set out to Improve it for my own purposes
What about the old code?
Implementing 301 Redirects in UrlRewrite
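As a rough illustration of the redirect decision described in the text (detect an old style Url, work out the friendly one, return a 301 if they differ), here is a minimal Python sketch. It is not the code from the download: the function name redirect_for, the tab_names lookup table, and the enabled and excluded parameters (standing in for the web.config switch and exclusion section) are all hypothetical.

```python
import re

# Assumed shape of an old style friendly Url: .../tabid/<number>/default.aspx
OLD_STYLE = re.compile(r"/tabid/(\d+)/default\.aspx$", re.IGNORECASE)


def redirect_for(path, tab_names, enabled=True, excluded=()):
    """Return (301, location) for an old style Url, or None to serve it as-is."""
    if not enabled:
        return None  # redirects switched off in configuration
    match = OLD_STYLE.search(path)
    if match is None:
        return None  # not an old style Url, nothing to do
    tab_id = int(match.group(1))
    if tab_id in excluded or tab_id not in tab_names:
        return None  # excluded page, or a tab we cannot name
    friendly = "/" + tab_names[tab_id] + ".aspx"
    if friendly.lower() == path.lower():
        return None  # already the friendly form
    return 301, friendly
```

With a table mapping tab 37 to Home, redirect_for("/Home/tabid/37/Default.aspx", {37: "Home"}) returns (301, "/Home.aspx"). The sketch returns only the path; the real response carries the full address in its Location header, as in the exchange below.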
GET /Home/tabid/37/Default.aspx HTTP/1.1
Accept: */*
Accept-Language: en-au
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727; InfoPath.1)
Host: www.auctionlink.com.au
Proxy-Connection: Keep-Alive
Pragma: no-cache
HTTP/1.1 301 Moved Permanently
Date: Thu, 30 Aug 2007 07:26:32 GMT
Server: Microsoft-IIS/6.0
X-AspNet-Version: 2.0.50727
Location: http://www.auctionlink.com.au/Home.aspx
Cache-Control: private
Content-Length: 0
Connection: close

Installing and Using the Code
The Disclaimer
Copyright Bruce Chapman 2007