Clean URLs With .htaccess redirect – How to Apply Regex Rules

An .htaccess redirect is perhaps the best and fastest way to clean up a messy URL such as www.yoursite.com/products.php?key1=value1&key2=value2.
It essentially lets visitors use a more legible structure like www.yoursite.com/products/value1/value2, while Apache quietly rewrites each request to the messy pattern that the script expects.

However, there is no one-size-fits-all .htaccess method to do that. In fact, .htaccess uses regex rules to rewrite a request based on a given set of URL parameters.

For example, if you want to clean a URL with a ‘?page=somevalue’ parameter, the .htaccess file would look like this:

RewriteEngine On
RewriteRule ^([a-zA-Z0-9]+)$ index.php?page=$1
RewriteRule ^([a-zA-Z0-9]+)/$ index.php?page=$1

As there is no one common solution, you need to understand the regex rules governing this. Once you do, you can easily define your own set of redirects.

RewriteRule Regex explained

Taking the sample code from above:

Let’s look at the first RewriteRule line:

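1) The caret (^) symbol marks the start of the text that the pattern is matched against. In an .htaccess file, that text is the part of the URL path that comes after the directory where this particular .htaccess file is kept, so where matching begins depends on the file’s location.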

So if

a) the .htaccess file is kept at www.yoursite.com/abc/.htaccess, matching starts right after “www.yoursite.com/abc/”
b) the .htaccess file is placed in the root directory, matching starts right after the root URL, for example “www.yoursite.com/”
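
As a rough sketch of case (a), assume the file lives in the /abc/ directory. The RewriteBase line and the request path used in the comments are assumptions added for illustration, not part of the original rules.

```apache
# Hypothetical file: www.yoursite.com/abc/.htaccess
RewriteEngine On
# Assumed: the URL prefix that is put back in front of the relative target
RewriteBase /abc/

# A request for /abc/contact is tested as "contact",
# because the leading "abc/" is stripped before matching.
RewriteRule ^([a-zA-Z0-9]+)$ index.php?page=$1
# The request is then served by /abc/index.php?page=contact
```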

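2) The expression within ([]) is the actual pattern to be matched. The square brackets [] list the characters that are allowed, and the parentheses () capture whatever was matched so that it can be reused later.
So for example, ([a-zA-Z0-9]+) means “one or more characters, each of which is a lowercase letter, an uppercase letter or a digit from 0 to 9”.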

3) The plus sign (+) after the square brackets [] means “one or more of these characters”, so a value of any length is fine. If you leave out the + sign, the regex will allow exactly one character in total, so only single-character clean URLs would match.
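
To see the effect of the plus sign, the two variants can be put side by side. This is only an illustration, and the sample values in the comments (“a”, “about”, “page2”) are made up. In practice you would keep just one of the two rules.

```apache
RewriteEngine On

# Without the plus sign: exactly one character is allowed,
# so "a" or "7" match, but "about" does not.
RewriteRule ^([a-zA-Z0-9])$ index.php?page=$1

# With the plus sign: one or more characters are allowed,
# so "a", "about" and "page2" all match.
RewriteRule ^([a-zA-Z0-9]+)$ index.php?page=$1
```

Neither pattern matches a value such as about-us, because the hyphen is not listed inside the square brackets.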

4) The dollar sign ($) marks the end of the clean URL. Everything after the space that follows the $ sign is the substitution: the real URL that Apache should serve instead. So in the above example we are asking Apache to rewrite the clean URL to index.php?page=$1 (which occurs after the dollar sign).

5) $1 is a backreference: it stands for whatever the first set of parentheses in the pattern matched. In our example, since we had only one such set ( ([a-zA-Z0-9]+) ), we only needed to specify $1. If our URL had more than one set, for example section/([a-zA-Z0-9]+)/page/([0-9]+), then we would use $1 and $2, and likewise for each new set of parentheses.
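
A minimal sketch with two sets of parentheses, applied to the products URL from the introduction. It assumes the .htaccess file sits in the site root and that the real script is named products.php. Both are assumptions for illustration.

```apache
RewriteEngine On

# /products/value1/value2  ->  products.php?key1=value1&key2=value2
# $1 is the first set of parentheses, $2 is the second.
RewriteRule ^products/([a-zA-Z0-9]+)/([a-zA-Z0-9]+)$ products.php?key1=$1&key2=$2
```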

Now take a look at the second line:

The second line is an exact replica of the first line, except that its pattern ends with a trailing slash. This rewrites those URLs that users type with a trailing slash.
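
If you prefer a single rule, the two lines can probably be merged by making the slash optional. This is a sketch of one common alternative, not the only way to do it. The [L] flag is an addition that does not appear in the sample code above.

```apache
RewriteEngine On

# "/?" makes the trailing slash optional, so both "about" and "about/"
# are rewritten by this single rule.
# [L] tells Apache not to apply any later rules once this one matches.
RewriteRule ^([a-zA-Z0-9]+)/?$ index.php?page=$1 [L]
```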

Hope you find this useful. If you have any comments, please leave them below and I will try to answer.