Using Apache's mod_rewrite to redirect URLs

Example: Minimal_PDF

Consider the following Wikipedia-like scheme for deriving a page's URL from its title:

  1. Replace spaces with underscores in the page's <h1> title
  2. Append the resulting name (e.g., Minimal_PDF) to the site root (e.g., http://brendanzagaeski.appspot.com/Minimal_PDF)
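
As a minimal sketch of the two steps above: the helper name page_url and its root parameter are made up for illustration and are not part of this site's code. It does no percent-encoding, so it assumes titles that contain only URL-safe characters and spaces.

```python
def page_url(title, root="http://brendanzagaeski.appspot.com/"):
    """Build a Wikipedia-style URL from a page title (the <h1> text)."""
    # Step 1: replace spaces with underscores.
    name = title.strip().replace(" ", "_")
    # Step 2: append the resulting name to the site root.
    return root + name


if __name__ == "__main__":
    print(page_url("Minimal PDF"))
```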

Some URL redirection options

Apache's mod_rewrite
    Flexible and effective. Can send HTTP redirects. For the sake of search engines, take care when using 302 redirects or rel="canonical" tags instead of 301 redirects. If the server permits it, each user can customize redirections with a personal .htaccess file.
File system links between the permanent filenames (e.g., 0004.html) and the desired filenames (e.g., Minimal_PDF.html)
    For users without root permissions, ln -s might be the easiest way to do this, despite its inelegance.
Use a full copy of the file for each URL
    Wasteful. Also, working with long, underscore-format filenames is unwieldy.
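
For comparison, the file system link option can be sketched in a few lines. This is only an illustration of what ln -s does, written in Python so it can be run in a scratch directory. The function name link_page is hypothetical, and the filenames are the ones used as examples above.

```python
import os
import tempfile


def link_page(directory, permanent_name, desired_name):
    """Create desired_name in directory as a relative symlink to permanent_name."""
    link_path = os.path.join(directory, desired_name)
    # Same effect as running `ln -s permanent_name desired_name` inside directory.
    os.symlink(permanent_name, link_path)
    return link_path


if __name__ == "__main__":
    # Try it in a throwaway directory rather than a real document root.
    with tempfile.TemporaryDirectory() as docroot:
        with open(os.path.join(docroot, "0004.html"), "w") as f:
            f.write("<h1>Minimal PDF</h1>\n")
        link = link_page(docroot, "0004.html", "Minimal_PDF.html")
        print(link, "->", os.readlink(link))
```

Unlike a redirect, a link leaves both URLs serving the same content, so it does nothing to tell search engines which URL is canonical.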

Example .htaccess file using mod_rewrite

RewriteEngine On
Options FollowSymlinks

RewriteCond  %{THE_REQUEST}      ^[A-Z]{3,9}\ (/index\.html|/~user/index\.html|/~user/)\ HTTP/
RewriteRule  ^index\.html$       http://user.example.org/           [R=301,L]

RewriteRule  ^About$             http://user.example.org/0001.html  [R=301,NC,L]
RewriteRule  ^hexdump_examples$  http://user.example.org/0006.html  [R=301,NC,L]

RewriteCond  %{REQUEST_URI}      ^/~user/
RewriteRule  (.*)                http://user.example.org/$1         [R=301,NC,L]

Step-by-step explanation

The general idea is to redirect all the different URLs for each document to just one canonical URL. This lets search engines know that the search scores for the various URLs should be added together to make the score for the canonical URL.

  1. RewriteEngine On
    Options FollowSymlinks

    Set up mod_rewrite for the directory containing the .htaccess file. In this example RewriteBase is not needed because the substitution targets are absolute URLs (rather than relative paths).

  2. RewriteCond  %{THE_REQUEST}      ^[A-Z]{3,9}\ (/index\.html|/~user/index\.html|/~user/)\ HTTP/
    RewriteRule  ^index\.html$       http://user.example.org/           [R=301,L]

    Redirecting index.html to / is a special case because mod_dir is often configured to internally redirect / to /index.html, so a rule that matched every request for index.html would redirect / back to itself. The RewriteCond prevents this infinite loop: THE_REQUEST holds the request line exactly as the client sent it, so the rule fires only when the client explicitly asked for one of the listed paths. This particular RewriteCond also takes care of the duplication caused by availability of the page both within a user directory (example.org/~user/index.html and example.org/~user/) and at the root of a subdomain (user.example.org/index.html). A 301 permanent redirect is used to tell search engines that there are no future plans to make /index.html a distinct page from the root directory. The L (last) flag is used because no further rewrites need to be applied if this rule matches.

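  3. RewriteRule  ^About$             http://user.example.org/0001.html  [R=301,NC,L]

    As an example of the general case, the About page can be accessed using any of 4 different URLs:

      user.example.org/0001.html (the canonical URL)
      user.example.org/About
      example.org/~user/About
      example.org/~user/0001.html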

    Thanks to the way Apache handles RewriteRule, both /About requests can be handled by one rule. As with /index.html, a 301 redirect is appropriate as long as there are no future plans to use any of the redirected URLs for independent pages. The NC (no case) flag is used, meaning that about, ABouT, and so on will all redirect to 0001.html.

  4. RewriteCond  %{REQUEST_URI}      ^/~user/
    RewriteRule  (.*)                http://user.example.org/$1         [R=301,NC,L]

    A conditional rewrite rule takes care of the third and final redirection case for the About page (example.org/~user/0001.html). Any request that starts with the /~user/ directory is redirected to the appropriate page on the user.example.org subdomain. The REQUEST_URI includes the user directory (so it looks like /~user/0001.html), but the RewriteRule only considers the page (so the (.*) regex group contains something like 0001.html).
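
The four steps above can be summarized in a small model. This is a rough sketch for reasoning about the rules, not a reimplementation of mod_rewrite: the function redirect_target and its three parameters are invented for illustration, and it assumes the caller supplies the path relative to the .htaccess directory after mod_dir has already mapped a directory request to index.html.

```python
import re

CANONICAL = "http://user.example.org/"

# Pattern copied from the RewriteCond on THE_REQUEST in the example .htaccess.
THE_REQUEST_RE = re.compile(
    r"^[A-Z]{3,9}\ (/index\.html|/~user/index\.html|/~user/)\ HTTP/"
)

# Named pages from the example, keyed in lowercase to mimic the NC flag.
NAMED_PAGES = {"about": "0001.html", "hexdump_examples": "0006.html"}


def redirect_target(request_line, request_uri, local_path):
    """Return the 301 target for a request, or None if it is served as-is.

    request_line: what the client sent, e.g. "GET /~user/ HTTP/1.1"
    request_uri:  the URL path, e.g. "/~user/0001.html"
    local_path:   the path relative to the .htaccess directory
    """
    # Step 2: redirect index.html only when the client's own request line
    # named one of the listed paths; an internal mod_dir redirect from /
    # leaves the request line as "GET / HTTP/1.1", which does not match.
    if local_path == "index.html" and THE_REQUEST_RE.search(request_line):
        return CANONICAL
    # Step 3: named pages, matched case-insensitively.
    page = NAMED_PAGES.get(local_path.lower())
    if page is not None:
        return CANONICAL + page
    # Step 4: anything else under /~user/ moves to the subdomain.
    if request_uri.startswith("/~user/"):
        return CANONICAL + local_path
    return None


if __name__ == "__main__":
    requests = [
        ("GET / HTTP/1.1", "/index.html", "index.html"),
        ("GET /index.html HTTP/1.1", "/index.html", "index.html"),
        ("GET /~user/About HTTP/1.1", "/~user/About", "About"),
        ("GET /~user/0001.html HTTP/1.1", "/~user/0001.html", "0001.html"),
    ]
    for request in requests:
        print(request[0], "->", redirect_target(*request))
```

In this model a request for / on the subdomain falls through all three checks and is served directly, which is the loop-avoidance behavior described in step 2.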

Note that none of the redirection targets in this example .htaccess are themselves subject to redirection. This limits the number of redirections to 1, so a redirected request costs only a single extra round trip. One way to check the number of redirections is with cURL, using -I to fetch only the headers and -L to follow redirects. For example:

curl -IL example.org/~user/0001.html
HTTP/1.1 301 Moved Permanently
Date: Sun, 11 Sep 2011 18:29:11 GMT
Server: Apache
Location: http://user.example.org/0001.html
Content-Type: text/html; charset=iso-8859-1

HTTP/1.1 200 OK
Date: Sun, 11 Sep 2011 18:29:11 GMT
Server: Apache
Content-Type: text/html
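
To check the redirect count without reading the headers by eye, the cURL output can be parsed. The following is a minimal sketch that assumes the output of curl -IL has been saved as text in the format shown above. The function name redirect_hops is hypothetical.

```python
def redirect_hops(curl_output):
    """Return the Location target of each 3xx response in `curl -IL` output."""
    hops = []
    status = None
    for line in curl_output.splitlines():
        line = line.strip()
        if line.startswith("HTTP/"):
            # Status line, e.g. "HTTP/1.1 301 Moved Permanently".
            status = int(line.split()[1])
        elif line.lower().startswith("location:"):
            # Count a Location header only if it belongs to a 3xx response.
            if status is not None and 300 <= status < 400:
                hops.append(line.split(":", 1)[1].strip())
    return hops


if __name__ == "__main__":
    sample = """HTTP/1.1 301 Moved Permanently
Location: http://user.example.org/0001.html

HTTP/1.1 200 OK
Content-Type: text/html
"""
    hops = redirect_hops(sample)
    print(len(hops), "redirect(s):", hops)
```

A result with more than one entry would mean that some redirection target is itself being redirected.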

Links

mod_rewrite documentation

Updates

2013 Aug 10 Change wording because this site no longer uses .htaccess
2013 Jan 08 Comments link
2011 Sep 11 markup corrections, cURL example
2011 Apr 28 little rewordings
2010 Dec 15 markup changes
2010 Dec 4 some markup changes, a little more explanation, folder changed to directory
2010 Nov 22 Rewritten to incorporate some Search Engine Optimization (SEO) ideas, example .htaccess added
2010 Nov 13 Example URL changed to reflect accessibility via subdomain
2010 Sep 18