Domain name

From Just Solve the File Format Problem
Revision as of 13:58, 23 February 2014 by Dan Tobias (Talk | contribs)

File Format
Name Domain name
Ontology
Released 1985

A domain name is the usual way of identifying hosts, servers, websites, and other things on the Internet. Domain names are translated to IP addresses, the base addressing system of the Internet, via the Domain Name System (DNS), but are designed to be more human-meaningful and stable than numeric IP addresses.

Domain names replaced the earlier ARPAnet hostname system, in which each network host had a separate, centrally assigned name. DNS instead uses a distributed system: domain names are registered under one of a number of top-level domains, each of which can operate an independent registry, and each domain can in turn assign an unlimited number of subdomains within it. Names use the dot (.) as a separator between levels and are read from the rightmost part leftward. For example, host.subdomain.example.net has net as its top-level domain, example as its second-level domain (registered under .net), subdomain as a third-level subdomain (assigned by whoever controls example.net, with no further registry/registrar action necessary), and host as the hostname beneath that subdomain.
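The right-to-left reading described above can be sketched in a few lines of Python. This is only an illustration; the function name domain_levels is made up for this example and is not part of any standard.

```python
def domain_levels(name):
    """Return the labels of a domain name, highest level (TLD) first."""
    # A fully qualified name may end with a dot for the root; ignore it.
    labels = name.rstrip(".").split(".")
    # Labels are written lowest level first, so reverse them.
    return labels[::-1]


levels = domain_levels("host.subdomain.example.net")
print(levels)  # ['net', 'example', 'subdomain', 'host']
```

The first element is the top-level domain, the second is the name registered under it, and everything after that is assigned by whoever controls the registered domain.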

Originally, generic domain names were grouped by category of entity (commercial, organization, government, etc.) into a small group of three-letter top-level domains, while two-letter top-level domains were reserved for country codes. One anomalous four-letter domain, .arpa, was originally used as a temporary place for ARPAnet hosts not yet assigned a permanent domain, but it survived as the location of in-addr.arpa, the domain used for reverse lookups from IP addresses to names. (The TLD .arpa was later retroactively redefined to stand for "Address and Routing Parameter Area" in RFC 3172.)
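As a rough illustration of how in-addr.arpa is used, the name queried for a reverse lookup is built by reversing the four parts of an IPv4 address and appending in-addr.arpa. The sketch below covers IPv4 only, and the function name reverse_lookup_name is hypothetical; Python's standard ipaddress module exposes the same result as the reverse_pointer attribute.

```python
import ipaddress


def reverse_lookup_name(ipv4):
    """Build the in-addr.arpa name used for a reverse lookup of an IPv4 address."""
    # IPv4Address validates the input and raises ValueError if it is malformed.
    address = ipaddress.IPv4Address(ipv4)
    octets = str(address).split(".")
    # The most specific part comes first, as in any other domain name.
    return ".".join(reversed(octets)) + ".in-addr.arpa"


print(reverse_lookup_name("192.0.2.1"))  # 1.2.0.192.in-addr.arpa
```

The reversal is needed because IP addresses put the most general part first, while domain names put it last.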

However, after the year 2000, a number of new top-level domains were added, some of which were longer than three letters (e.g., .museum), and as of 2013 a huge expansion is in progress that is expected to add new domain endings by the hundreds.

Nevertheless, many people still expect every address to end in .com, even though that domain was only intended for commercial entities.

Registries and registrars

At one point, an entity known as InterNIC handled all domain registration, under the top-level domain policy of an organization called IANA, which was in turn pretty much singlehandedly run by one guy, Jon Postel. Those were the days. Now it takes a massive and incomprehensible cluster of organizations, companies, government agencies, bureaucrats, lawyers, etc., to do the same things.

The basic structure now is that an organization called ICANN (which now operates IANA) runs the system and approves contracts with registries, which run the different top-level domains. (Verisign runs .com and .net, while various other companies and organizations run other top-level domains.) But registries don't directly register domains; that function is performed by registrars, which are separately accredited with ICANN and with the registries of the particular domains they support. End users (registrants) deal with registrars, who in turn deal with registries, who in turn deal with ICANN. Lots of middlemen are in the process now.

Alternate roots

In addition to the "official" DNS root of the Internet, there have been alternate roots created to support alternate addresses that are not part of the "standard" domain name structure. Anybody can set up servers using the same protocols as the "normal" roots to serve domains, and create different domains there which anybody who uses that server instead of the usual ones can access. Various alternate roots have sometimes achieved a degree of popularity in limited circles, though they have never been a threat to the IANA root since the "mainstream" ISPs don't use them, so only a few geeks who choose to use different roots have access to anything on the alternative domains.

Pseudo-domains

In addition to normal domains and alternative-root ones, there are some domains that are not in any root but have nevertheless been used in addresses on a particular network or protocol that doesn't operate on DNS yet is set up to use addresses resembling domains. For instance, the .onion pseudo-domain is used on the Tor network for sites that can be accessed anonymously if you have the proper software to handle the protocol. In the past, other pseudo-domains were used for addresses on non-Internet networks, such as .uucp and .bitnet.

Internationalization

Domain names are limited to a subset of ASCII characters consisting of letters, digits, and the hyphen (-), as well as the dot (.) used as a separator between levels. To support languages that need other characters, the Punycode encoding has been adopted: names containing Unicode characters outside the ASCII range can be registered, and are converted to ASCII for transmission. Some registries and registrars now support such names in certain top-level domains, with some character restrictions intended to reduce fraud. The concern is that people register domains using characters that resemble other characters, so that the name mimics an existing site for possible use in phishing schemes (where a site address is presented as that of your bank or another trusted entity but really belongs to a third party). Newer browsers support entering such internationalized domain names with raw characters, perhaps as part of IRIs (the internationalized form of URIs or URLs), which can also include non-ASCII characters in other parts such as the pathnames.
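The conversion between the Unicode and ASCII forms can be tried out with Python's built-in "idna" codec. This is a minimal sketch: the helper names are invented for the example, bücher.example is just a sample name, and the built-in codec follows the older IDNA 2003 rules, so it may not match what a given registry or browser accepts today.

```python
import re

# Letters, digits, and hyphen: the ASCII subset described above.
_LABEL_RE = re.compile(r"^[A-Za-z0-9-]+$")


def is_ascii_domain(name):
    """Check that every dot-separated label uses only letters, digits, hyphen."""
    labels = name.rstrip(".").split(".")
    return all(_LABEL_RE.match(label) for label in labels)


def to_ascii(name):
    """Convert a Unicode domain name to its ASCII (Punycode-based) form."""
    return name.encode("idna").decode("ascii")


def to_unicode(name):
    """Convert an ASCII form with xn-- labels back to Unicode."""
    return name.encode("ascii").decode("idna")


print(is_ascii_domain("bücher.example"))    # False
print(to_ascii("bücher.example"))           # xn--bcher-kva.example
print(to_unicode("xn--bcher-kva.example"))  # bücher.example
```

Labels that are already plain ASCII pass through unchanged; only labels with non-ASCII characters get the xn-- prefix.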

Use of domain names in file formats

Domain names appear in many roles in file formats. Often the value of some data element will be a URL (or URI or URN or IRI, to name some other related things which have technical distinctions not always adhered to in usage), and most forms of URL have a domain name in them (e.g., the common HTTP variety starts with http:// followed by a hostname usually expressed in domain form). E-mail addresses also contain domain names following the @ sign.
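As an illustration of pulling the domain name out of these two kinds of values, here is a small Python sketch. The function names are made up for this example, and the e-mail handling is deliberately simplistic (it takes whatever follows the last @ sign and does not validate the address).

```python
from urllib.parse import urlsplit


def domain_from_url(url):
    """Return the host part of a URL in lowercase, or None if there is none."""
    return urlsplit(url).hostname


def domain_from_email(address):
    """Return the part of an e-mail address after the last @ sign, or None."""
    _local, at_sign, domain = address.rpartition("@")
    return domain.lower() if at_sign and domain else None


print(domain_from_url("http://www.example.com/index.html"))  # www.example.com
print(domain_from_email("someone@example.org"))              # example.org
```

Note that the host part of a URL is not always a domain name; it can also be a literal IP address, which this sketch returns unchanged.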

A common system for generating unique identifiers (e.g., names of custom queues in iOS apps programmed under Cocoa in Objective-C; but similar schemes turn up in many places where identifiers are needed) is to begin the name with a reversed domain name based on the company or organization using the identifier, so that, for instance, if your company owns the domain example.com, your identifiers would start with com.example. This ensures that they don't clash with identifiers used by any other company; you merely need to adopt whatever internal policies are needed to prevent name clashes within your company. The reversed order is used to put highest-level parts first, since the low-level identifier you are creating comes at the end of the string. (Interestingly, the British-based academic network JANET used such a reversed naming system back in the 1980s, with hostnames starting with uk., but they had to flip the order the other way when they joined the Internet and had to follow its naming conventions.)
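The reversed-domain scheme is easy to sketch in Python. The function name and the myapp and uploadqueue parts below are hypothetical, chosen only to show the shape of the resulting identifier.

```python
def reverse_domain_identifier(domain, *parts):
    """Build an identifier such as com.example.myapp from example.com."""
    labels = domain.lower().rstrip(".").split(".")
    # Highest-level label first, then whatever local names the owner chooses.
    return ".".join(labels[::-1] + list(parts))


print(reverse_domain_identifier("example.com", "myapp", "uploadqueue"))
# com.example.myapp.uploadqueue
```

Everything after the reversed domain is up to the domain's owner, which is where the internal naming policy mentioned above comes in.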

Official documents

  • RFC 289 (hostnames before DNS)
  • RFC 819 (early description of DNS before domains even began to be assigned)
  • RFC 882 (formal introduction of domain names)
  • RFC 1034 (later description)
  • RFC 1591 (official descriptions of the original group of top level domains)
  • RFC 2606 (dummy domains reserved for test/example use)
  • RFC 3071 (reflections on domain name system)
  • RFC 6761 (updated descriptions of reserved dummy domains)
  • UDRP (policy for domain disputes)
