Rewriting the DotNetNuke Url Rewriter Module - Again

Ref: http://www.ifinity.com.au/Blog/Technical_Blog/EntryID/24/

Redoing the DotNetNuke Friendly Url Provider for Human Reading and Search Engine Indexing

NOTE: Due to popular demand, there is now a Support Forum for support requests on this module. If you just want the thing to work ASAP, I can provide one-on-one support or install it for you.

What I did previously was improve upon Scott McCulloch's work by working out a way to handle parameters. All well and good, and it worked OK. But it just still didn't look quite right. What's more, I wanted to use the rel="tag" microformat on my tagging module, and I was stuck with the .aspx extension on everything, which the microformat doesn't recognise.

So after some back and forth on the Ventrian forum where I posted my update, I went away and had a good think about it. There were two things nagging at me:

  1. The performance would drop off drastically as the number of pages in a site grew, due to the iterative search for a match.
  2. The only way to get truly nice Urls is to ditch the .aspx page extension. When you think about it, it doesn't serve the user at all. It's really only there to make things easier for the web server. It had to go.

Doing number 1 was easy: just do a dictionary-based lookup on the page path. If it's found, great. If not, 404 coming right up.

It works like this (if you're not intimately familiar with DNN, you might want to skip to the next part). On the first request, a dictionary of path information is built up. This contains the path (example: mysite/mypage) and the actual DNN request Url (example: default.aspx?tabid=37). The dictionary is stashed away in the Cache.
So far so good. The incoming url is deconstructed into segments. Working backwards from the full url (mysite/mypage/mypath/myvalue) the url is tried against the dictionary for a 'hit'. If a page entry in the dictionary is found, great, the url is rewritten and processing continues.
If the dictionary doesn't contain the page, then the last segment is removed, so mysite/mypage/mypath/myvalue is trimmed to mysite/mypage/mypath. This trimming process is repeated until there are no segments left. If nothing was found after all that, the page dictionary is rebuilt one more time, just in case it's a new page, then the process is repeated. If there is still nothing, then no url rewriting will be done and a 404 will probably occur.

There is obviously a great deal more complexity in it than that, but that's the basic algorithm.
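The lookup-and-trim steps above can be sketched roughly as follows. This is an illustrative Python sketch, not the module's actual code: the names `find_page`, `rewrite` and `build_pages`, the plain dict standing in for the Cache, and the exact (case-sensitive) matching are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Optional, Tuple

Lookup = Dict[str, str]  # page path -> real DNN request Url


def find_page(path: str, pages: Lookup) -> Optional[Tuple[str, List[str]]]:
    """Try the full path first, then drop trailing segments until a page matches."""
    segments = [s for s in path.strip("/").split("/") if s]
    for end in range(len(segments), 0, -1):
        candidate = "/".join(segments[:end])
        if candidate in pages:
            # Segments after the match are left over as parameters.
            return pages[candidate], segments[end:]
    return None


def rewrite(path: str, cache: dict,
            build_pages: Callable[[], Lookup]) -> Optional[Tuple[str, List[str]]]:
    """Look the path up, rebuilding the cached dictionary once on a miss."""
    if "pages" not in cache:
        cache["pages"] = build_pages()  # first request: build and stash
    hit = find_page(path, cache["pages"])
    if hit is None:
        cache["pages"] = build_pages()  # just in case it's a new page
        hit = find_page(path, cache["pages"])
    return hit  # None means no rewrite, so a 404 is likely


pages = {"mysite/mypage": "default.aspx?tabid=37"}
print(find_page("mysite/mypage/mypath/myvalue", pages))
# ('default.aspx?tabid=37', ['mypath', 'myvalue'])
```

Each candidate costs one dictionary lookup, so the work depends on the number of segments in the Url rather than the number of pages in the site.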

So onto number 2. Some will look at you with fear in their eyes when you mention removing the .aspx extension from asp.net pages and declare that 'you cannot change the laws of physics'. But in reality there is nothing difficult about removing the .aspx extension from asp.net pages. There are really two (easy) ways.

  • Implement an ISAPI rewriting DLL. There are some commercial ones available, and some open-source ones. I assume these work OK, but they looked like way too much work for me.
  • Direct all calls to the website through asp.net by mapping a wildcard (*) to the aspnet_isapi.dll.

Given I was already rewriting Urls, the second of those options was the path of least resistance for me. Now I realise that people using DotNetNuke on a shared host aren't going to be able to do this, but if you've got your own server, it's not difficult. By mapping all requests to go through asp.net, all requests for items on the website (or virtual directory) end up going through the DNN Url Rewriter. Eek - better make sure they work then.

To fix this, I implemented a couple of regex filters to be placed in the web.config. These restrict what items will be passed along to IIS unfettered and what will be rewritten by the UrlRewrite code. And that was about it - because the way the dictionary lookup works, the .aspx is redundant anyway. Because all entries in the dictionary are without .aspx, it's not needed to find them again - only the relative path.
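As a rough illustration of how such filters decide what gets rewritten, here is a Python sketch using the two patterns from the sample web.config entry in the install steps. The function name `should_rewrite` and the case-insensitive matching are assumptions for the example, not details confirmed by the module.

```python
import re

# Patterns copied from the sample provider entry (doNotRewriteRegex and
# ignoreFileTypesRegex). Case-insensitive matching is an assumption here.
DO_NOT_REWRITE = re.compile(r"(\.axd)|(/DesktopModules/)", re.IGNORECASE)
IGNORE_FILE_TYPES = re.compile(
    r"(\.gif)|(\.css)|(\.js)|(\.jpg)|(\.html)|(\.htm)", re.IGNORECASE
)


def should_rewrite(request_path: str) -> bool:
    """Return False for requests that should go back to IIS untouched."""
    if IGNORE_FILE_TYPES.search(request_path):
        return False  # static file: skip the rewriter entirely
    if DO_NOT_REWRITE.search(request_path):
        return False  # handler or direct DesktopModules call
    return True


for path in ["/MyPage/Key1/Value1", "/Portals/0/logo.gif", "/DesktopModules/Blog/Edit.aspx"]:
    print(path, should_rewrite(path))
```

Note that the patterns are not anchored, so they match anywhere in the path. Tightening them (for example with a trailing `$`) is a choice to make when tuning the filters for a particular site.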

Once the messy .aspx extensions are eradicated, getting nicer Urls is easy-peasy. Instead of flipping parameters around front-to-back, I now just put them out in-line, as they should be. So, from the example given in the first posting on this topic, the results now are:

  • Original, standard DotNetNuke friendly Url: /MyPage/TabId/38/Key1/Value1/Key2/Value2/default.aspx
  • My rewritten Url (first version): /MyPage/Value1/Key2/Value2/Key1.aspx
  • My rewritten Url (no .aspx version): /MyPage/Key1/Value1/Key2/Value2

Of course, there are still times when the first-parm-last scenario might work well, so I've left it in as a configurable option. This is particularly the case if you can't remove the .aspx extensions (shared hosting, for example).
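The two parameter orderings can be sketched like this in Python. It is only an illustration of the ordering rule; `parameter_segments` is a made-up name and not part of the provider.

```python
from typing import List, Sequence, Tuple


def parameter_segments(params: Sequence[Tuple[str, str]],
                       handling: str = "ordered") -> List[str]:
    """Flatten key/value pairs into path segments in the chosen order."""
    pairs = list(params)
    if handling == "firstparmlast" and pairs:
        first_key, first_value = pairs[0]
        rest = [part for pair in pairs[1:] for part in pair]
        # The first value leads, and the first key moves to the very end.
        return [first_value, *rest, first_key]
    # "ordered": key1/value1/key2/value2, exactly as supplied.
    return [part for pair in pairs for part in pair]


params = [("Key1", "Value1"), ("Key2", "Value2")]
print("/MyPage/" + "/".join(parameter_segments(params)))
# /MyPage/Key1/Value1/Key2/Value2
print("/MyPage/" + "/".join(parameter_segments(params, "firstparmlast")))
# /MyPage/Value1/Key2/Value2/Key1
```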

Here are the original examples from the first post, redone:

What about the 301 Redirects?

The 301 redirect code in the original version was an important step forward for my DotNetNuke SEO efforts. DNN has a habit of outputting a variety of Urls for the same bit of content, so it's important to stay on top of this, both by generating a single Url per bit of content and by letting Google know you've been on the case, hunting down every last default.aspx?tabid=37 and home/tabid/37/default.aspx reference.

So the new code maintains the previous features, and adds a couple more - when a request comes in that doesn't end in a '/', it puts one on. If a request comes in for a page with .aspx, and you've turned .aspx off, then it returns a 301 status and gives the new, cleaner url. As well as this, if you've got a page you've deleted from your website, or it was only visible for a period of time, then requests for the no-longer-valid page will redirect to the home page of your portal with a 301.
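A minimal Python sketch of those redirect decisions, assuming page extensions are turned off. The function `redirect_target`, its parameters, and the set of deleted page paths are hypothetical stand-ins; the real provider works from DNN's own page data.

```python
from typing import Optional, Set


def redirect_target(path: str, deleted_pages: Set[str],
                    strip_extension: str = ".aspx",
                    home: str = "/") -> Optional[str]:
    """Return the Url to send a 301 to, or None if the request is already clean."""
    clean = path
    if clean.lower().endswith(strip_extension):
        clean = clean[: -len(strip_extension)]  # .aspx is turned off: drop it
    if not clean.endswith("/"):
        clean += "/"  # put the trailing slash on
    if clean.strip("/").lower() in deleted_pages:
        return home  # deleted or expired page: go to the portal home page
    return clean if clean != path else None


print(redirect_target("/Enquiries.aspx", set()))   # /Enquiries/
print(redirect_target("/Enquiries/", set()))       # None
print(redirect_target("/OldPage/", {"oldpage"}))   # /
```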

Full feature list of this Url Rewriter / Friendly Url Provider

Here's a list of all the things that this version can do:

  • Urls are generated as friendly Urls by all code which uses the standard NavigateUrl() call in DotNetNuke.
  • Choice of page extension - .aspx or any other extension (such as .page)
  • Choice of using page extensions - options are "always", "never" and "pageonly". "always" and "never" are self-explanatory; "pageonly" means only use an extension when the Url is for a page that contains no query parameters.
  • 301 redirects for 'unfriendly' page requests. This can be on or off.
  • 301 redirects to home page for deleted and expired pages
  • A regex filter can be implemented to restrict 301 redirects for matching Urls
  • Individual pages can be restricted from 301 redirects by placing in a delimited list
  • Choice of two ways of handling parameters/query strings - "ordered", in which the values are shown as consecutive levels in a path (key1/value1/key2/value2), and "firstparmlast", in which the first key is placed last (value1/key2/value2/key1).
  • Choice of detecting duplicate portal alias/page path combinations. It is possible to have portal1/test as a page, and a portal which has portal1/test as an alias, thus duplicating the same path on a single DNN install.

How to install the DNN Friendly Url Provider

Step 1: Download the code from the Free Downloads page.

Step 2: Back up your existing DotNetNuke.HttpModules.UrlRewrite.dll from your website/bin directory, and copy the new version into the /bin directory.

Step 3: Back up your existing web.config, and then replace the existing 'DNNFriendlyUrl' provider section with this one:

<add name="DNNFriendlyUrl"
     type="DotNetNuke.Services.Url.FriendlyUrl.DNNFriendlyUrlProvider, DotNetNuke.HttpModules.UrlRewrite"
     includePageName="true"
     regexMatch="[^\+a-zA-Z0-9 _-]"
     urlFormat="HumanFriendly"
     redirectUnfriendly="true"
     doNotRewriteRegex="(\.axd)|(/DesktopModules/)"
     doNotRedirect="DirectorySearchResults;"
     doNotRedirectRegex="[.]*(/logoff.aspx)"
     pageExtensionUsage="never"
     parameterHandling="ordered"
     ignoreFileTypesRegex="(\.gif)|(\.css)|(\.js)|(\.jpg)|(\.html)|(\.htm)"
     checkForDupUrls="true" />

Step 4: Change any options to suit how you'd like your DNN installation to work. The full list is below.

Step 5: Try it out!

If you haven't already done so, you might want to take a look at the DotNetNuke Google Sitemap Generator, available in the free downloads section.

Changing Config Entries for Different Options

Each entry shows the config attributes, followed by the Url they produce:

  • pageExtensionUsage="never" * -> /Enquiries/
  • pageExtensionUsage="always" -> /Enquiries.aspx
  • pageExtensionUsage="never" * parameterHandling="ordered" -> /TagList/Tag/Valuers/
  • pageExtensionUsage="always" parameterHandling="ordered" -> /TagList/Tag/Valuers.aspx
  • pageExtensionUsage="never" * parameterHandling="firstparmlast" -> /TagList/Valuers/Tag/
  • pageExtensionUsage="always" parameterHandling="firstparmlast" -> /TagList/Valuers/Tag.aspx
  • pageExtensionUsage="pageonly" * -> /Enquiries.aspx
  • pageExtensionUsage="always" pageExtension=".page" ** -> /Enquiries.page
  • pageExtensionUsage="pageonly" parameterHandling="ordered" -> /TagList/Tag/Valuers/
  • pageExtensionUsage="always" pageExtension=".page" ** parameterHandling="ordered" -> /TagList/Tag/Valuers.page
  • pageExtensionUsage="pageonly" * parameterHandling="firstparmlast" -> /TagList/Valuers/Tag/
  • pageExtensionUsage="always" pageExtension=".page" ** parameterHandling="firstparmlast" -> /TagList/Valuers/Tag.page
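The extension rule behind the examples above can be written down in a few lines. This Python sketch is inferred from those examples only; `apply_extension` and its arguments are illustrative names, and the trailing slash on extension-less Urls follows what the examples show.

```python
def apply_extension(path: str, has_parameters: bool,
                    usage: str = "never", extension: str = ".aspx") -> str:
    """Finish a friendly Url according to the pageExtensionUsage setting."""
    path = path.rstrip("/")
    if usage == "always" or (usage == "pageonly" and not has_parameters):
        return path + extension
    # "never", or "pageonly" on a Url that carries parameters.
    return path + "/"


print(apply_extension("/Enquiries", False, "never"))                       # /Enquiries/
print(apply_extension("/Enquiries", False, "pageonly"))                    # /Enquiries.aspx
print(apply_extension("/TagList/Tag/Valuers", True, "pageonly"))           # /TagList/Tag/Valuers/
print(apply_extension("/TagList/Valuers/Tag", True, "always", ".page"))    # /TagList/Valuers/Tag.page
```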



Complete list of web.config options

  • urlFormat (HumanFriendly, omitted) - 'HumanFriendly' when using tabid-less paths. If omitted, standard DNN-style friendly Urls are used.
  • doNotRewriteRegex - regex string for excluding incoming requests. If the incoming request matches the regex, no rewriting will be attempted. This is used to handle exceptions, such as direct calls to pages/controls in the DesktopModules path.
  • checkForDupUrls (true/false) - true means an exception will be logged if duplicate urls are found while building the tab dictionary. Duplicate urls are urls where the combination of portalAlias/tabpath from two different portals match. Preference is always given to the 'first' portal / tab path combo, so the second and subsequent combos will never resolve properly.
  • redirectUnfriendly (true/false) - whether to issue 301 redirect status codes if an incoming url does not match the friendly url for that page
  • doNotRedirect - semicolon delimited list of tab paths where a 301 redirect should never be issued. Use when a particular page should use ?key=val type parameters, or unwanted results are occurring. This is only used when redirectUnfriendly = true
  • doNotRedirectRegex - a regex which is evaluated against the incoming url; if it matches, no redirect is issued. In the example, the /logoff.aspx page will never 301 redirect. This is only used when redirectUnfriendly = true
  • parameterHandling (ordered, firstparmlast) - ordered means key/value pairs are kept in order and placed after the tab path, eg mypage/mykey1/myvalue1/mykey2/myvalue2. firstparmlast means take the key of the first parameter and place it last, at the end of the string, eg mypage/myvalue1/mykey2/myvalue2/mykey1. Default: firstparmlast
  • pageExtensionUsage (never, pageonly or always) - three options on whether to use page extensions (eg .aspx) for the pages. never = don't use page extensions; pageonly = only use page extensions when the page path does not include parameter key/value pairs (ie mysite/mypage.aspx but not mysite/mypage/mykey/myvalue); always = the extension is appended to all urls. Default: always. If using pageonly or never, the IIS setup must have wildcard mapping to the aspnet_isapi.dll (instructions below).
  • ignoreFileTypesRegex (regex string) - when mapping wildcard requests to aspnet_isapi.dll, the asp.net url rewriting function will be called for all types of IIS resources on the page. This means unnecessary file handling for jpeg, gif, css, axd and other file types. By entering a regex string, any match found will bypass the url rewriting code and be passed back to IIS.
  • pageExtension - allows the use of a page extension other than .aspx if required (eg .page, .content - whatever).
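To make the checkForDupUrls behaviour concrete, here is a small Python sketch of building a tab dictionary where the first portal alias / tab path combination wins and later clashes are recorded. The function name, the tuple layout, the tabid values and the lower-casing of keys are all assumptions for illustration.

```python
from typing import Dict, Iterable, List, Tuple


def build_tab_dictionary(
    entries: Iterable[Tuple[str, str, str]]
) -> Tuple[Dict[str, str], List[str]]:
    """entries are (portal_alias, tab_path, rewritten_url) tuples."""
    lookup: Dict[str, str] = {}
    duplicates: List[str] = []
    for alias, tab_path, target in entries:
        key = f"{alias.strip('/')}/{tab_path.strip('/')}".lower()
        if key in lookup:
            duplicates.append(key)  # would be logged as an exception
            continue                # the first combo keeps the path
        lookup[key] = target
    return lookup, duplicates


entries = [
    ("mysite", "portal1/test", "default.aspx?tabid=10"),   # page in portal 1
    ("mysite/portal1", "test", "default.aspx?tabid=20"),   # page in a child portal
]
lookup, duplicates = build_tab_dictionary(entries)
print(lookup)      # {'mysite/portal1/test': 'default.aspx?tabid=10'}
print(duplicates)  # ['mysite/portal1/test']
```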

Changing IIS Settings for either a custom page extension, or no page extension

Warning: This may destabilise your website. Always check this on a test version first, and understand what you are doing!

These instructions will vary between IIS 5.0, IIS 6.0 and the Workstation and Server versions of 2000/XP/2003/Vista.

Changing IIS Settings to map all requests through the aspnet isapi dll

  • Open the property page for the website / virtual directory.
  • Click the 'Configuration' button and select the 'Mappings' tab.
  • Click the 'Add' button.
  • Enter the details:
    • Executable = \framework\\aspnet_isapi.dll
    • For no page extension -> Extension = .*
    • For custom page extension -> Extension = .{your value}
    • Verbs = GET,POST,HEAD,DEBUG
    • Script Engine: checked
    • Check that file exists: unchecked
  • Click OK, OK, OK to close and apply changes.

Then thoroughly test all website functions. And you're done!

Update (22 Nov 2007)

Due to the problems with the 4.7 release and a namespace collision with the config part of the module, I've released a new version which uses its own namespace and should solve the problems. However, this requires new web.config entries. The changes are available in the free downloads section.

This new version requires different modifications to the web.config from those shown in this blog post. The options are the same, but the namespace has changed. Please see the example.web.config file included in the download to see what changes are required for the web.config.

There are also three new features in this new version, which are set using the following configuration attributes:

  • redirectWrongCase - allows 301 redirection of mixed case requests to lower case requests
  • forceLowerCase - when true, forces all generated Urls to be in all lower case
  • redirectToSubDomain - when implemented, all requests will be 301 redirected to the specified subdomain, regardless of what subdomain they were requested on. Can be left as an empty string to remove the subdomain completely.

Again, see the example.web.config for the different options for these features.
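As a hedged illustration of the case and subdomain redirects, the checks might look something like this in Python. Both function names and the `base_domain` argument are hypothetical; how the real provider works out the domain is not described here.

```python
from typing import Optional


def wrong_case_target(path: str) -> Optional[str]:
    """redirectWrongCase: a mixed case request is 301'd to its lower case form."""
    lowered = path.lower()
    return lowered if lowered != path else None


def subdomain_target(host: str, base_domain: str,
                     redirect_to_subdomain: str) -> Optional[str]:
    """redirectToSubDomain: an empty string removes the subdomain completely."""
    if redirect_to_subdomain:
        wanted = f"{redirect_to_subdomain}.{base_domain}"
    else:
        wanted = base_domain
    return wanted if host.lower() != wanted.lower() else None


print(wrong_case_target("/TagList/Tag/"))                         # /taglist/tag/
print(subdomain_target("example.com", "example.com", "www"))      # www.example.com
print(subdomain_target("blog.example.com", "example.com", ""))    # example.com
```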

Please report any problems using the comments section below (I might have to stretch to forums one day!)

Update (10 Jan 2008)

In response to the many requests to include a way of substituting spaces with a '-' or '_', I have included a 'replaceSpaceWith' option. This will replace any spaces in a tab path with the supplied character. It will also issue a 301 redirect for any request to the 'space removed' version. This means that existing pages found in search engines will redirect to the new version.
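A rough Python sketch of the idea, assuming the 'space removed' version is the tab path with its spaces simply stripped out. The function `space_redirects` is a made-up name, and the mapping it returns is only one way the old-to-new redirects could be represented.

```python
from typing import Dict, Iterable


def space_redirects(tab_paths: Iterable[str],
                    replace_space_with: str) -> Dict[str, str]:
    """Map each old 'space removed' path to its new replacement-character path."""
    redirects: Dict[str, str] = {}
    for tab_path in tab_paths:
        if " " in tab_path:
            old = tab_path.replace(" ", "")                   # previous form
            new = tab_path.replace(" ", replace_space_with)   # new form
            redirects[old] = new                              # 301 old -> new
    return redirects


print(space_redirects(["About Us", "Home"], "-"))
# {'AboutUs': 'About-Us'}
```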

I've also fixed (finally) the problems with getting an exception on the Host Settings page. Basically, in versions of DotNetNuke up to and including 4.5, the Friendly Url Provider shared a namespace with internal classes. I've explicitly implemented a cast so that my provider can interact with the Host Settings page. However, for DNN versions 4.6 and above, the FriendlyUrl provider was moved into the single DotNetNuke.HttpModules assembly. This solved the problem, so the later version of my Friendly Url provider just doesn't include the section for reading the FriendlyUrl rules.

The new files can be found on the Free Downloads page. You must install the correct version for your DNN install. If you are running 4.0 - 4.5, then you want the 4.5 version. If you are running 4.6 or later, then you want the 4.6 version. Trying to run the later version with the earlier DNN installs will just cause errors. There is no benefit in running the 4.6 code on an earlier install: it's just a compatibility change, and both versions are built from the same base source.

Copyright Bruce Chapman 2007

