Within the last month or so, some of the popular URL shortening services, like TinyURL and Bit.ly, have added an extra character to the hashes in the URLs they produce. (As an aside, URL shortening services work by taking a long URL, like http://blog.pilsch.com/past/2009/5/20/shout_out_for_todd_mays_gille_deleuze_an_introduction_/, and returning a URL like http://tinyurl.com/q89t3d that can be used on services like Twitter to save precious characters.) Anyway, with the proliferation of Twitter, these URL services have become very popular, and their hashes have started to increase in length (I'll explain why later). So, figuring it couldn't possibly be that hard, I wrote my own.
What Do URL Shorteners Do?
Essentially, a service like TinyURL functions like Searle's Chinese Room: you pass in input (either a hash or a long URL) and get back the correct, meaningful counterpart of that input. Pass in a URL and you get a hash; pass in a hash and you get the URL (technically you get routed to the URL, but I digress).
The real challenge of creating your own URL shortening service is in creating a hashing algorithm. As this blog post highlights, most major hashing algorithms produce output that is too long for use in URL shortening. Moreover, the fact that TinyURL outputs hashes that get longer as people use the service suggests that a hashing algorithm isn't being used at all (hash algorithms like SHA-1 or MD5 produce fixed-length output).
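To see the fixed-length point concretely, here is a small sketch using Ruby's standard Digest library. The example.com URLs are made up for illustration; the only thing being shown is that the digest length never changes with the input.

```ruby
require 'digest/md5'
require 'digest/sha1'

# Two made-up inputs of very different lengths.
short_url = "http://example.com/"
long_url  = "http://example.com/" + ("a" * 500)

# Hex digests come out the same length no matter how long the input is.
md5_lengths  = [short_url, long_url].map { |u| Digest::MD5.hexdigest(u).length }
sha1_lengths = [short_url, long_url].map { |u| Digest::SHA1.hexdigest(u).length }

puts md5_lengths.inspect   # [32, 32]
puts sha1_lengths.inspect  # [40, 40]
```

Either one is far longer than the six or so characters a shortened URL has to work with.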
This is in fact the case. What all these services are doing is returning a representation of an integer in base 62 (read more about base conversion here). Why base 62, though? Well, because that's the base you would need to cover all alphanumeric characters (0-9, a-z, & A-Z). While I've seen some examples of using hexadecimal as the hash, this limited character set doesn't allow for an industrial strength URL shortener (only providing a character range of 0-9 & a-f).
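A quick sketch of why the bigger alphabet matters, and of why the hashes grow over time. The helper name codes_of_length is mine, purely for illustration: it counts how many distinct codes fit in a given number of characters.

```ruby
# Digit alphabets: hexadecimal versus the 62 alphanumeric characters.
hex_digits    = ('0'..'9').to_a + ('a'..'f').to_a
base62_digits = ('0'..'9').to_a + ('a'..'z').to_a + ('A'..'Z').to_a

# Hypothetical helper: number of distinct codes of exactly `length` characters.
def codes_of_length(alphabet, length)
  alphabet.length ** length
end

puts codes_of_length(hex_digits, 6)     # 16777216
puts codes_of_length(base62_digits, 6)  # 56800235584
```

Once the underlying integer passes the largest value that fits in n characters, the code needs n + 1 characters, which would explain the extra character the big services have picked up.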
It turns out that producing a base 62 converter in Ruby was the most difficult part of the whole operation. Because I'm lazy, I created a static class that provides conversion methods to and from base 62, rather than creating a new Base62 object. Here's the code, though:
class Base62
  # Character ranges that make up the digit alphabet, in order of value.
  @@ranges = [
    ('0'..'9'),
    ('a'..'z'),
    ('A'..'Z')
  ]
  @@base = nil
  @@offsets = nil

  # Convert a non-negative integer to its string representation.
  def self.to_s(number)
    if @@base.nil?
      @@base = self.calculate_base
    end
    string = ""
    while number > (@@base - 1)
      place = number % @@base
      string = self.lookup(place) + string
      number = number / @@base
    end
    self.lookup(number) + string
  end

  # Convert a string representation back to an integer.
  def self.to_i(string)
    if @@base.nil?
      @@base = self.calculate_base
    end
    number = 0
    i = string.length - 1
    string.each_byte do |c|
      c = c.chr
      @@ranges.each_index do |j|
        range = @@ranges[j]
        if range.member? c
          # Use String#ord: c[0] only returns an integer on Ruby 1.8.
          number += (c.ord - range.first.ord + @@offsets[j]) * (@@base ** i)
          break
        end
      end
      i -= 1
    end
    number
  end

  # Find the character that represents a single digit value.
  def self.lookup(place)
    string = ""
    if @@base.nil?
      @@base = self.calculate_base
    end
    (0..(@@ranges.length-1)).each do |i|
      range_array = @@ranges[i].to_a
      start = 0 + @@offsets[i]
      stop = range_array.length - 1 + @@offsets[i]
      if (start..stop).member? place
        string = range_array[place - @@offsets[i]]
        break
      end
    end
    string
  end

  def self.next_integer(integer)
    integer + 1
  end

  # Compute the base (total number of characters) and record the
  # digit value at which each range starts.
  def self.calculate_base
    i = 0
    @@offsets = []
    @@ranges.each do |range|
      @@offsets << i
      i += range.to_a.length
    end
    i
  end
end
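For comparison, the same conversion can be sketched much more compactly with a flat array of digits. This is not the code Shorten uses: the method names base62_digits, base62_encode, and base62_decode are hypothetical, and the decoder assumes its input only contains characters from the alphabet.

```ruby
# Same alphabet order as the ranges above: 0-9, then a-z, then A-Z.
def base62_digits
  ('0'..'9').to_a + ('a'..'z').to_a + ('A'..'Z').to_a
end

# Repeatedly divide by the base, collecting remainders as digits.
def base62_encode(number)
  digits = base62_digits
  return digits[0] if number.zero?
  out = ""
  while number > 0
    number, remainder = number.divmod(digits.length)
    out = digits[remainder] + out
  end
  out
end

# Fold the characters back into an integer, most significant digit first.
def base62_decode(string)
  digits = base62_digits
  string.each_char.inject(0) do |total, char|
    total * digits.length + digits.index(char)
  end
end

puts base62_encode(61)  # Z
puts base62_encode(62)  # 10
puts base62_decode(base62_encode(123456789))  # 123456789
```

The range-based class above is easier to retarget at other character sets; this version is just easier to read.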
As a side note, you can produce different character classes by changing the @@ranges variable to include fewer or more characters. I've got a version that uses all the characters that are possible in a legal (and non-Unicode) URL, but I think I've had some problems with it:
@@ranges = [
  ('0'..'9'),
  ('a'..'z'),
  ('A'..'Z'),
  ('$'..'$'),
  ('-'..'.'),
  ('_'..'_'),
  ('('..')'),
  ('*'..'+'),
  ('!'..'!')
]
Similarly, I think the code above could be expanded to use Unicode as a base, providing a really, really short hash, like Tinyarrows does. The base for integer conversion would then be in the hundreds of thousands instead of just 62, providing a much broader set of possible URLs at any given length. Unfortunately, most Twitter clients (including TweetDeck, which I use) don't support Unicode characters (grr), so I abandoned this part of the project. The Unicode specification also makes the above approach difficult, thanks to the many "dead zones" in which non-printing characters get produced.
Introducing Shorten
So, after getting a working base 62 converter, it was really simple to build Shorten, my URL shortening software. I wrote it using Sinatra as a framework and Sequel for the (really minimal) database code.
Essentially, the software uses the "id" column of a SQL database to compute the hash for each URL; from there it is just a simple REST front end to that database (only supporting GET and POST, though).
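Here is a rough, in-memory sketch of that id-to-hash idea. It is not Shorten's actual code: the TinyStore class is hypothetical, an array stands in for the database table, and for brevity it swaps in Ruby's built-in base 36 conversion (Integer#to_s(36) and String#to_i(36)) in place of base 62.

```ruby
# In-memory stand-in for the database table: the position in the
# array plays the role of the auto-incrementing "id" column.
class TinyStore
  def initialize
    @urls = []
  end

  # Roughly what a POST would do: store the URL, return the code for its id.
  def shorten(url)
    @urls << url
    id = @urls.length  # ids start at 1, like an auto-increment column
    id.to_s(36)
  end

  # Roughly what a GET would do: turn the code back into an id and look it up.
  # Returns nil for codes that do not map to a stored URL.
  def expand(code)
    id = code.to_i(36)
    @urls[id - 1] if id > 0
  end
end

store = TinyStore.new
code = store.shorten("http://example.com/a/very/long/path")
puts code                # 1
puts store.expand(code)  # http://example.com/a/very/long/path
```

Because the code is derived from the row id, there is nothing to collide and nothing extra to store; the trade-off is that codes are sequential and therefore guessable.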
If you'd like to check it out, I'm running a copy at u.pilsch.com. You can get the source code here.
Installing Shorten
To install Shorten, you need a couple of things. First, you need to have a working copy of Ruby. Additionally, you need a couple of Ruby gems installed. From a command prompt, type:
gem install sinatra sequel
If you get anything about "permission denied" or some such, try typing "sudo" before the command. This command will install the Sinatra web development framework and Sequel, the database manager.
Once you have that installed, you can unzip the shorten.zip file you downloaded.
Edit the file "main.rb." You'll need to change some values:
require 'ostruct'
Shorten = OpenStruct.new(
  :base_url => "http://localhost:4567/",
  :service_name => "Lovely URL Shortener",
  :button_text => "Short ♥!"
)
Set :base_url, :service_name, and :button_text to the appropriate values (:base_url will be "http://localhost:4567/" if you are just running on a local computer).
After you've edited "main.rb," in a command prompt run:
ruby main.rb
to begin running a local copy of Shorten. That's all you need to get it deployed; it will create the database for you (as a SQLite file).
Deploying Shorten
Shorten is written using Sinatra, as I've mentioned, which is a web framework based on Rack (as is Ruby on Rails). As such, deploying it to a web host is fairly easy. You just have to write a config.ru file (see Rack documentation) and fight with your server's messed up Ruby implementation (see below).
Deploying Shorten on Dreamhost
Rather than send you off on the same wild goose chase I went through while deploying Shorten on a Dreamhost server, I'll just show you how I did it.
First, you have to read this article about installing your own Ruby Gems. This is necessary because some of the gems installed in the Dreamhost repositories are hopelessly out of date. You will need to install, in your own gem repository: rack, sequel, sqlite3, and sinatra.
Next, read this article, but be aware that the config.ru file he provides doesn't work (because the default gems on Dreamhost are so out of date). Read on and I'll show you how to write a working file, though.
If you followed the instructions for creating a local copy of a gem repository, above, your config.ru file for deploying Shorten on your server should look like this:
# Load each gem from the local gem repository by explicit path.
$LOAD_PATH.unshift('/home/username/.gems/gems/rack-0.9.1/lib')
require('/home/username/.gems/gems/rack-0.9.1/lib/rack.rb')
require('/home/username/.gems/gems/sinatra-0.9.1.1/lib/sinatra.rb')
$LOAD_PATH.unshift('/home/username/.gems/gems/sqlite3-ruby-1.2.4/lib')
require('/home/username/.gems/gems/sqlite3-ruby-1.2.4/lib/sqlite3.rb')
$LOAD_PATH.unshift('/home/username/.gems/gems/sequel-2.12.0/lib')
require('/home/username/.gems/gems/sequel-2.12.0/lib/sequel.rb')

Sinatra::Application.default_options.merge!(
  :views => File.join(File.dirname(__FILE__), 'views'),
  :run => false,
  :env => ENV['RACK_ENV'],
  :raise_errors => true
)

# Send all output to a log file next to config.ru.
log = File.new("sinatra.log", "a")
STDOUT.reopen(log)
STDERR.reopen(log)

require 'main'
run Sinatra::Application
Replace username with your Dreamhost username, of course. If it still isn't working, you may have different versions of the gems installed. Look in your ~/.gems/gems folder and check whether the gems being loaded (rack, sinatra, sqlite3, and sequel) are there under different version numbers. If so, change the directory names in your config.ru file to match, then type the following in a terminal to restart the application (make sure you are in the directory where you installed Shorten):
touch tmp/restart.txt
This file explicitly loads the gems, which sucks, but all the other tips and tricks people have suggested just don't work on my install. YMMV. Good luck.
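If the hardcoded version numbers bother you, one possible workaround is to look up whichever version is installed instead. This is only a sketch and I haven't tried it on Dreamhost: gem_lib_dir is a hypothetical helper, and it assumes gem directories are named name-version with a plain numeric version, as in the paths above.

```ruby
# Hypothetical helper: find the lib directory of the newest installed
# copy of a gem under gem_root, or nil if the gem is not there.
def gem_lib_dir(gem_root, name)
  candidates = Dir.glob(File.join(gem_root, "#{name}-[0-9]*"))
  newest = candidates.max_by do |dir|
    # Compare versions numerically, so that 0.10.2 sorts above 0.9.1.
    File.basename(dir)[/\d+(\.\d+)*\z/].to_s.split('.').map { |part| part.to_i }
  end
  newest && File.join(newest, 'lib')
end

# Assumed location of the local gem repository, as in the config.ru above.
gem_root = File.join(ENV.fetch('HOME', '/home/username'), '.gems', 'gems')

%w[rack sinatra sqlite3-ruby sequel].each do |name|
  lib = gem_lib_dir(gem_root, name)
  $LOAD_PATH.unshift(lib) if lib
end
```

With the load path set up this way, plain require 'rack', require 'sinatra', and so on should pick up the local copies, but treat that as something to test rather than a promise.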
Enjoy!