URL shorteners

URLs are an essential element of the Web, and are often found as data elements in file formats. Many of them (particularly those found in social-networking feeds) are not direct addresses of the resources being identified, because URL shorteners add a level of indirection. A shortener is a service that takes a URL supplied by a user and generates a shorter URL pointing to the service itself; when the short URL is requested, the service looks up the original URL and redirects the browser there. Shorteners are used for a number of purposes, including turning lengthy URLs into ones that will fit in limited space (e.g., a tweet or a line of an e-mail message), as well as link tracking by marketers. Because of the tracking functions built into some URL shorteners, many social networks and other sites automatically run any URLs posted by their users through a shortening service, even if the URLs are short to begin with, and even if they have already been run through another shortener. The result can be double, triple, or more levels of redirection.
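The chain of redirections described above can be recorded by following one hop at a time instead of letting an HTTP client follow redirects silently. The following Python is a minimal sketch, not a reference implementation: the function names `head` and `resolve_chain`, the hop limit, and the use of HEAD requests are all assumptions made for illustration, and some shorteners may respond only to GET or may redirect by means other than an HTTP `Location` header.

```python
import http.client
from urllib.parse import urljoin, urlsplit

# HTTP status codes that signal a redirect carrying a Location header.
REDIRECT_CODES = {301, 302, 303, 307, 308}

def head(url):
    """Send one HEAD request and return (status, Location header or None)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()

def resolve_chain(url, fetch=head, max_hops=10):
    """Return every URL visited, starting with the short URL itself.

    `fetch` is any callable mapping a URL to (status, location); it is a
    parameter so the logic can be exercised without network access.
    """
    chain = [url]
    for _ in range(max_hops):
        status, location = fetch(chain[-1])
        if status not in REDIRECT_CODES or not location:
            break  # reached a non-redirect response
        next_url = urljoin(chain[-1], location)  # Location may be relative
        if next_url in chain:
            break  # redirect loop; stop rather than spin
        chain.append(next_url)
    return chain

# Hypothetical usage (the host name is a placeholder, not a real shortener):
# for hop in resolve_chain("http://short.example/abc"):
#     print(hop)
```

Keeping the whole chain, rather than only the final destination, preserves each intermediate short URL, so the link can still be understood if one of the services in the middle later disappears.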

These services are highly problematic for archival preservation, since a shortened link works only as long as both the service hosting the redirection and the specific data record for that shortened URL survive. Either has a way of vanishing when the provider goes out of business, suffers a server crash, gets hacked, comes under a court order or governmental censorship regime forcing removal of links, or adopts a new, more buzzword-compliant business model in which the goal of assisting enterprise clients in leveraging branding synergies leaves no resources for continued maintenance of legacy data.

A project called urlte.am is archiving the data from such shorteners in text files compressed with XZ and distributed using BitTorrent.
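An XZ-compressed text file of this kind can be read line by line without decompressing it to disk first. The Python below is only a sketch: the layout of the dump files is not described here, so the one-mapping-per-line format, the `|` separator, the `#` comment convention, and the name `iter_mappings` are all assumptions that should be checked against the actual files.

```python
import lzma

def iter_mappings(path, sep="|"):
    """Yield (short_code, target_url) pairs from an XZ-compressed text file.

    ASSUMPTION: one mapping per line, with the short code and the target
    separated by `sep`. Lines without the separator are skipped.
    """
    # errors="replace" keeps a single badly encoded URL from aborting the read.
    with lzma.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            line = line.rstrip("\n")
            if not line or line.startswith("#"):
                continue  # blank line or (assumed) comment line
            code, found, target = line.partition(sep)
            if found:
                # Split on the first separator only; the target URL may
                # itself contain the separator character.
                yield code, target

# Hypothetical usage (the file name is a placeholder):
# for code, target in iter_mappings("dump.txt.xz"):
#     print(code, "->", target)
```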