Building a URL Shortener

With all the talk of URL shortening services, I decided to add a quick service into Snook.ca, which is run on CakePHP, to redirect a short URL to a post. Because my static content already has short URLs and all I have are posts, creating a short URL handler for it was very easy.

To give you some context, I route my posts through a specific structure:

/archives/:category
/archives/:category/:articlename

In this case, I have a couple of routes that send everything to my Posts controller and its bycat or view actions. These actions take the named parameters and pull out the appropriate content. Easy peasy.

The key thing here is that my articles have two identifiers: one is the slug, the other is the post ID. The process for the short service just takes a post ID and redirects it to its fully expanded URL.

Here's the Shortening Route:

Router::connect('/s/:id', 
   array('controller'=>'posts', 'action'=>'shorturl'), 
   array('id'=>'\\d+'));

As you can see, it looks for a specific pattern (it must be all numbers) and then passes that into my posts controller to my shorturl action.

function shorturl () {
   $id = $this->params['id'];
   $this->Post->unbindModel(array('hasMany'=>array('Comment')));
   $post = $this->Post->findById($id);
   $url = '/archives/' 
          . $post['Tag'][0]['safetag'] . '/' 
          . $post['Post']['slug'];
   $this->redirect($url);
}

I grab the named parameter. I unbind my Comment model to prevent the findById call in the next line from returning too much. Then I find my post which will return the associated tags (which are my "categories"). I build the URL and then redirect the user onwards.

I haven't exposed the short URL in any way, yet. For now, it's more to allow myself quick posts to Twitter without having to use another service and to see if people are retweeting the link.

And now it's really easy to find the first blog post (based on ID): http://snook.ca/s/1

Building your own

With a single model and a single action taking a single parameter, wiring up a URL shortener was very simple. How could you do it with a more complicated system?

Multiple Routes

Another easy way to extend this concept is to map a prefix to the view action of each model that needs to be shortened. You could have a Posts model on /p/ and a Comments model on /c/. The :id for each of them would simply point to the view page for each one. That offers a little more flexibility but not much.
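As a rough sketch, the two prefixes could be wired up like this in your routes file. I'm assuming posts and comments controllers that each have a view action reading $this->params['id'], the same way my shorturl action does; those names are placeholders, so adjust them to your app.

```php
// One short prefix per model; each :id must be all digits.
Router::connect('/p/:id',
   array('controller'=>'posts', 'action'=>'view'),
   array('id'=>'\\d+'));
Router::connect('/c/:id',
   array('controller'=>'comments', 'action'=>'view'),
   array('id'=>'\\d+'));
```

Every new type of content needs its own prefix and route, which is why this only stretches so far.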

Automatically Creating and Caching Short Links

In thinking this through, especially for an established site, you could have it automatically create a short link for any URL on your site the first time it is visited. First, create a new model called Short (or whatever you feel like it should be called). The Short model will consist of two fields: the primary key (id) and a character field to store the URL.

CREATE TABLE shorts (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  url VARCHAR(100) NOT NULL UNIQUE
);

Within your AppController, grab the current URL (available via $this->url). With the URL being a unique key as well as the ID, you'll only have a single ID for each URL.

If you want to find the short URL for an existing URL, just look it up in the database.

$this->Short->findByUrl($this->url);

If a URL is not found, you'll need to create a new record for it. It'd be advantageous for you to create a method on your model that'll do the find/not found/create process.
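To make the find/not found/create flow concrete, here's a framework-free sketch. The ShortStore class and its idFor method are names I made up for illustration, and a plain array stands in for the shorts table so the logic can run on its own. In a real CakePHP model you'd swap the array lookups for a findByUrl call and a save.

```php
<?php
// Hypothetical stand-in for the shorts table: maps url => id in memory.
class ShortStore {
    private $idsByUrl = array();
    private $nextId = 1; // mimics AUTO_INCREMENT

    // Return the existing ID for $url, or create a record and return its new ID.
    public function idFor($url) {
        if (isset($this->idsByUrl[$url])) {
            return $this->idsByUrl[$url];  // found
        }
        $id = $this->nextId++;             // not found: create
        $this->idsByUrl[$url] = $id;
        return $id;
    }
}

// Example URLs only; any path on your site would do.
$store  = new ShortStore();
$first  = $store->idFor('/archives/some-category/some-article');
$again  = $store->idFor('/archives/some-category/some-article');
$second = $store->idFor('/about');
```

Asking for the same URL twice gives back the same ID, which is the whole point of keeping the URL column unique.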

You can use your ID as your short form (as I did) which, given most sites, will be quite small. If you have 4000 unique URLs, you're using 4 characters. What if you wanted to optimize that even further?

You could convert that integer into a hexadecimal value. Anything under 4096 items will only take 3 characters. That's not bad. At 4096 and above, you're back to 4 characters.
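PHP already has built-ins for the hexadecimal case, so a quick check of those boundaries could look like this. The variable names are just for illustration.

```php
<?php
// dechex() and hexdec() are PHP built-ins for decimal <-> hexadecimal.
$short4000 = dechex(4000); // "fa0": 3 characters instead of 4
$short4095 = dechex(4095); // "fff": the largest 3-character value
$short4096 = dechex(4096); // "1000": back to 4 characters

// Going the other way recovers the original ID.
$original = hexdec($short4000); // 4000
```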

Creating a super-compressed URL

To squeeze it down even more, the trick is to create your own base system with a custom set of characters. This next bit of code isn't CakePHP-specific.

$codeset = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
$base = strlen($codeset); // 62
$n = 300;
$converted = "";

// Peel off the lowest digit on each pass and prepend its character.
while ($n > 0) {
  $converted = substr($codeset, ($n % $base), 1) . $converted;
  $n = (int) floor($n / $base); // keep $n an integer for the next modulo
}

echo $converted; // 4Q

It loops over the original number, converting it into the base that you want. In my particular example, it converts 300 into 4Q. You can count up to 3,843 before you need more than 2 characters (62 × 62 = 3,844 combinations), and up to 238,327 before you get past 3 characters. Precious bytes.
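If you want to see where the length changes, the same loop can be wrapped in a function and run at the boundaries. The encode_id name is my own invention here, and I've added a guard for zero, which the bare loop would turn into an empty string.

```php
<?php
// Hypothetical helper wrapping the conversion loop.
function encode_id($n, $codeset) {
    $base = strlen($codeset);
    if ($n === 0) {
        return $codeset[0]; // the loop below would yield "" for zero
    }
    $converted = "";
    while ($n > 0) {
        $converted = $codeset[$n % $base] . $converted;
        $n = (int) floor($n / $base);
    }
    return $converted;
}

$codeset = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
$a = encode_id(300, $codeset);    // "4Q"
$b = encode_id(3843, $codeset);   // "ZZ": last 2-character value
$c = encode_id(3844, $codeset);   // "100": first 3-character value
$d = encode_id(238327, $codeset); // "ZZZ": last 3-character value
$e = encode_id(238328, $codeset); // "1000": first 4-character value
```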

If you're setting up a route for this, you can use the following regex pattern:

Router::connect('/s/:id', 
   array('controller'=>'shorts', 'action'=>'retrieve'), 
   array('id'=>'[0-9a-zA-Z]+'));

Of course, feel free to customize the acceptable character list. Some people drop 0, O, 1 and l to avoid confusion. Just make sure the route pattern matches whatever set you settle on.
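For example, dropping those four look-alike characters leaves a 58-character set. This is only one possible choice, and the variable names are mine. The character class below is what you'd put in the route's id pattern in place of [0-9a-zA-Z]+.

```php
<?php
// 58-character set: the 62-character set minus 0, O, 1 and l.
$codeset = "23456789abcdefghijkmnopqrstuvwxyzABCDEFGHIJKLMNPQRSTUVWXYZ";
$base = strlen($codeset); // 58

// Character class matching exactly that set, for the route's :id pattern.
$pattern = '[2-9a-km-zA-NP-Z]+';

// Quick sanity checks against the pattern.
$isValid    = preg_match('/^' . $pattern . '$/', 'k3Zp') === 1; // true
$isRejected = preg_match('/^' . $pattern . '$/', 'l0O1') === 1; // false
```

Changing the set changes every encoded value, so pick it before you hand out any short links.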

Converting the Compressed Version back to Decimal

Going back is straightforward. In the retrieve method of the shorts controller that we set the route up for, we need to take our compressed ID and uncompress it into an integer we can search the database for.

$converted = $this->params['id'];
$c = 0;
// $codeset and $base are the same values used to build the short ID.
for ($i = strlen($converted); $i; $i--) {
   $c += strpos($codeset, substr($converted, (-1 * ( $i - strlen($converted) )), 1))
         * pow($base, $i-1);
}
$this->Short->id = $c; // 300 from our earlier example
$this->redirect( $this->Short->field('url') );

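Each character's index in the codeset is multiplied by the base raised to the power of that character's place, counting from the right and starting at zero, and the results are added up to get the ID. Then it grabs the URL field for that item. Finally, it redirects them off to their final destination.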
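The same decoding can also be written left to right, multiplying the running total by the base before adding each new digit. This is an alternative sketch rather than what I run on the site: decode_id is a made-up name, and I've had it return false when the code contains a character that isn't in the codeset, since strpos returns false in that case.

```php
<?php
// Hypothetical helper: decode a short code back to an integer, left to right.
// Returns false if the code contains a character outside $codeset.
function decode_id($code, $codeset) {
    $base = strlen($codeset);
    $n = 0;
    for ($i = 0; $i < strlen($code); $i++) {
        $digit = strpos($codeset, $code[$i]);
        if ($digit === false) {
            return false;
        }
        // Shift the digits read so far up one place, then add the new one.
        $n = $n * $base + $digit;
    }
    return $n;
}

$codeset = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
$id  = decode_id("4Q", $codeset);  // 300
$bad = decode_id("4-Q", $codeset); // false
```

Checking for false before hitting the database lets you send a bad short URL to a 404 instead of looking up the wrong record.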

Wrapping it up

This blog post twisted and turned but ended up in a great place. The principles of the shortening system could be applied to any system whether it's CakePHP or not. If you're a CakePHP fan, feel free to take this example and build it into a component or plugin.