HTML

From Just Solve the File Format Problem
(Difference between revisions)
(Other resources: Added some resources covering historical perspectives on HTML development.)
m (Added sample files)
 
(55 intermediate revisions by 3 users not shown)
Line 1: Line 1:
 
{{FormatInfo
 
{{FormatInfo
 
|formattype=electronic
 
|formattype=electronic
|subcat=Markup
+
|subcat=Hypermedia
|extensions={{ext|html}}, {{ext|htm}}
+
|extensions={{ext|html}}, {{ext|htm}}, {{ext|xhtml}}, {{ext|xht}}
|mimetypes={{mimetype|text/html}}
+
|mimetypes={{mimetype|text/html}}, {{mimetype|application/xhtml+xml}}
 +
|pronom={{PRONOM|fmt/96}}, {{PRONOM|fmt/97}}, {{PRONOM|fmt/98}}, {{PRONOM|fmt/99}}, {{PRONOM|fmt/100}}, {{PRONOM|fmt/471}}, {{PRONOM|fmt/102}}, {{PRONOM|fmt/103}}
 +
|wikidata={{wikidata|Q8811}}, {{wikidata|Q62626012}}, {{wikidata|Q2892563}}, {{wikidata|Q41676552}}, {{wikidata|Q41676372}}, {{wikidata|Q3782232}}
 
|released=1990
 
|released=1990
 
}}
 
}}
 
+
'''HTML''' ('''h'''yper'''t'''ext '''m'''arkup '''l'''anguage) is one of the three pillars of the [[Web]] as originally developed by Tim Berners-Lee, along with [[HTTP]] and [[URL]]s. It is the markup language normally used for Web documents (although many other formats can also be used for material on the Web). It was originally an [[SGML]]-based markup language. XHTML is HTML redeveloped using the stricter [[XML]] rules. Disagreement among some browser vendors over the direction of W3C development led to the formation of the Web Hypertext Application Technology Working Group (WHATWG). The WHATWG maintains the spec known variously as HTML5, HTML Next, or the HTML Living Standard, which is no longer based on SGML. The W3C standardisation group will work to formalise the WHATWG specification as a series of standardised [http://wiki.whatwg.org/wiki/HTML_snapshots 'snapshots'] of the living standard. One version of this standard was "frozen" into a W3C HTML5 recommendation on October 28, 2014, while the ongoing "living standard" receives regular updates that are in due course frozen into subsequent W3C recommendations such as 5.1, 5.2, etc.
'''HTML''' ('''h'''yper'''t'''ext '''m'''arkup '''l'''anguage) originally was a [[SGML]] based markup language. XHTML is HTML redeveloped using the stricter [[XML]] rules. Disagreement over the direction of W3C developments from some of the browser vendors lead to the formation of the Web Hypertext Application Technology Working Group (WHATWG). They maintain the spec for the HTML5 or HTML Next or HTML Living Standard, which is not based on SGML any more. The W3C standardisation group will work to formalise the WHATWG specification as a series of standardised [http://wiki.whatwg.org/wiki/HTML_snapshots 'snapshots'] of the living standard.
+
 
+
== Identifiers ==
+
* File extension: '''.HTML''', '''.HTM'''
+
* MIME type (Internet media type): '''text/html'''
+
 
+
== Hints and tips ==
+
* [http://stackoverflow.com/questions/1403087/how-can-i-convert-an-html-table-to-csv Discussion on converting HTML tables to CSV]
+
  
 
== Specs ==
 
== Specs ==
 
* W3C specifications:
 
* W3C specifications:
** [http://www.w3.org/MarkUp/HTMLPlus/htmlplus_1.html HTML (1) specification]
+
** [https://www.w3.org/MarkUp/HTMLPlus/htmlplus_1.html HTML (1) specification]
** [http://www.w3.org/MarkUp/html-spec/ HTML 2.0 specification] (see also the [http://tools.ietf.org/html/rfc1866 RFC])
+
** [https://www.w3.org/MarkUp/html-spec/ HTML 2.0 specification] (see also the [http://tools.ietf.org/html/rfc1866 RFC])
** [http://www.w3.org/TR/REC-html32 HTML 3.2 specification]
+
** [https://www.w3.org/TR/REC-html32 HTML 3.2 specification]
** [http://www.w3.org/TR/REC-html40/ HTML 4.01 specification]
+
** [https://www.w3.org/TR/REC-html40/ HTML 4.01 specification]
** [http://www.w3.org/TR/html5/ HTML5 working draft]
+
** [https://www.w3.org/TR/html5/ HTML 5 specification]
** [http://www.w3.org/TR/xhtml1 XHTML 1.0 specification]
+
** [https://www.w3.org/TR/html51/ HTML 5.1 specification (latest)]
** [http://www.w3.org/TR/xhtml11 XHTML 1.1 specification]
+
*** [https://www.w3.org/TR/2017/REC-html51-20171003/ HTML 5.1 2nd Edition 2017-10-03]
 +
** [https://www.w3.org/TR/html52/ HTML 5.2 specification (latest)]
 +
*** [https://www.w3.org/TR/2017/REC-html52-20171214/ HTML 5.2 2017-12-14]
 +
** [https://www.w3.org/TR/html53/ HTML 5.3 [candidate] specification (latest)]
 +
** [https://www.w3.org/TR/html/ HTML specification (latest)]
 +
** [https://www.w3.org/TR/xhtml1 XHTML 1.0 specification]
 +
** [https://www.w3.org/TR/xhtml11 XHTML 1.1 specification]
 +
** [https://www.w3.org/WebPlatform/WG/PubStatus#html HTML specifications publication status]
 +
** [https://www.w3.org/html/landscape/ The HTML Landscape] enumerates the differences between the W3C HTML 5.0, 5.1 and the WHATWG Living Standard. The source for the landscape site is available [https://github.com/w3c/html-landscape here].
 
* Web Hypertext Application Technology Working Group (WHATWG) specifications:
 
* Web Hypertext Application Technology Working Group (WHATWG) specifications:
 
** [http://wiki.whatwg.org/ The WHATWG Wiki]
 
** [http://wiki.whatwg.org/ The WHATWG Wiki]
Line 32: Line 34:
 
** [http://wiki.whatwg.org/wiki/HTML_derivatives List of 'HTML derivatives' and other spin-off specifications]
 
** [http://wiki.whatwg.org/wiki/HTML_derivatives List of 'HTML derivatives' and other spin-off specifications]
 
** [http://wiki.whatwg.org/wiki/HTML_snapshots HTML Snapshots]
 
** [http://wiki.whatwg.org/wiki/HTML_snapshots HTML Snapshots]
 +
** [http://html-differences.whatwg.org/ Differences from HTML 4: Living Document]
 +
* [https://github.com/mozilla/servo/wiki/Relevant-spec-links Relevant spec links according to the Mozilla Servo project].
 +
 +
== HTML vs. XHTML ==
 +
 +
In HTML versions prior to HTML 5, there was a "fork" between HTML and XHTML, with the former being [[SGML]]-based and the latter [[XML]]-based. The features of the two are for the most part very similar, but some syntactic differences can trap the unwary. These usually cause no actual rendering problems in common browsers (which are very forgiving of errors), but they do prevent validation. For instance, any tag that does not require a matching end tag (e.g., <br>) needs an added slash in XHTML to make it self-closing (<br />); this slash should not be used in HTML. Another difference is that HTML tags and attributes are case-insensitive, so they can be entered in either uppercase or lowercase, while XHTML is case-sensitive and its standard tags are all lowercase. Some parts of the respective syntaxes cannot be mixed in a document that still validates as either variety, which is a problem when webmasters paste in code from diverse sources (including ad-network and affiliate links and scripts, whose terms-of-service contracts may mandate that they be used in unmodified form). HTML 5, however, which is not directly based on either SGML or XML, is more tolerant of such mixed syntax: its specs say that the underlying HTML document can be expressed in either syntax, and while you are still supposed to pick one or the other, the parsing rules for interpreting the document are very forgiving.
 +
 +
The "forgiving" processing of mixed syntax applies only to documents served with the MIME type "text/html"; if an XML MIME type is used, browsers are supposed to be stricter in interpreting the syntax and rejecting documents which are improper or which are of a form they don't understand.
 +
 +
== DOCTYPE ==
 +
 +
HTML and XHTML documents begin with a doctype declaration, whose format had a specific meaning in SGML. Browsers and validators could recognize different doctypes to determine which version of HTML was being used, and browsers sometimes switched between "standards" and "quirks" parsing modes based on the doctype. HTML 5, since it was intended as a "living standard" and was no longer based on SGML, uses a minimalist doctype <code>&lt;!doctype html&gt;</code> designed to trigger standards mode in all browsers; it no longer gives any indication of which specific variety of HTML5 (and up) is in use.
 +
 +
== Nonstandard extensions ==
 +
 +
The formal specs, of course, do not fully describe the HTML documents in use in the "real world". Quite a number of nonstandard elements, attributes, and other extensions have been implemented in various browsers (including the most popular ones). Browsers have also tended to be very forgiving of invalid markup, so sloppy coding has become widespread on the grounds that "it works in [name of popular browser], so that's all that matters!"
 +
 +
In 2013, the Mozilla organization announced the removal of support for the nonstandard BLINK element, which had been supported in various browsers since its introduction in the 1990s as a Netscape extension and had persisted despite a widespread belief that it was annoying. New versions of Firefox and other Gecko-based browsers no longer flash text enclosed in this element, nor text styled with the various [[Cascading Style Sheets|CSS]] rules that suggest blinking or flashing.
 +
 +
== Viewing source ==
 +
 +
Most browsers have a function to view the source code of the current page. HTML is inherently "open source", though the code might have some degree of obfuscation to make it hard to read in its raw form. This hasn't stopped some people, like a [https://twitter.com/GovParsonMO/status/1448697768311132160?s=20 governor of Missouri], from claiming that "decoding HTML" (viewing its plain-text source) is illegal hacking and threatening to prosecute for it.
 +
 +
== See also ==
 +
* [[Hypermedia#HTML]]
  
 
== Software ==
 
== Software ==
 +
 +
=== Viewers ===
 +
* [https://en.wikipedia.org/wiki/Timeline_of_web_browsers Wikipedia's timeline of web browsers]
 +
* [http://browsers.evolt.org/ The Evolt Browser Archive]
 +
* [http://arc.opera.com/pub/opera/ Opera public downloads]
 +
 +
=== Validators ===
 +
* [http://validator.w3.org/ W3C HTML validator]
 +
* [https://github.com/validator/validator.github.io Command-line-based validator]
 +
 +
=== Conversions to/from HTML ===
 
* [http://www.mpdf1.com/mpdf/index.php mPDF: convert HTML to PDF]
 
* [http://www.mpdf1.com/mpdf/index.php mPDF: convert HTML to PDF]
 +
* [http://textract.readthedocs.org/en/latest/ Textract: extract text from various document formats including HTML]
 +
* [http://stackoverflow.com/questions/1403087/how-can-i-convert-an-html-table-to-csv Discussion on converting HTML tables to CSV]
 +
* [http://johnmacfarlane.net/pandoc/ Pandoc: Document format conversion swiss-army knife]
  
== Other resources ==
+
=== Miscellaneous ===
* [http://www.w3.org/community/webhistory/ W3C Web History Community Group]
+
* [https://github.com/uds-datalab/PDBF PDBF: Create documents that are simultaneously valid PDF, HTML, and VirtualBox OVA.]
 +
 
 +
== Sample files ==
 
* [https://github.com/w3c/html-testsuite/ HTML test suite]
 
* [https://github.com/w3c/html-testsuite/ HTML test suite]
* [http://mrcoles.com/demo/markdown-css/ Markdown CSS: makes HTML look like plain text]
+
* {{DexvertSamples|text/html}}
 +
 
 +
== Historical information ==
 +
* [http://www.w3.org/community/webhistory/ W3C Web History Community Group]
 
* [http://lists.w3.org/Archives/Public/www-talk/1992JanFeb/0000.html Tim Berners-Lee discusses Web protocols/formats in Jan 1992]
 
* [http://lists.w3.org/Archives/Public/www-talk/1992JanFeb/0000.html Tim Berners-Lee discusses Web protocols/formats in Jan 1992]
* [https://www.eff.org/press/releases/eff-makes-formal-objection-drm-html5 EFF Makes Formal Objection to DRM in HTML5]
 
* [http://www.the-pope.com/lostHTML.htm The Lost Tags of HTML], documenting early HTML versions and the tags that have been dropped from the standards.
 
 
* [http://diveintohtml5.info/past.html Dive into HTML5 - How did we get here?] also documents how HTML has developed.
 
* [http://diveintohtml5.info/past.html Dive into HTML5 - How did we get here?] also documents how HTML has developed.
 +
* [http://www.the-pope.com/lostHTML.htm The Lost Tags of HTML], documenting early HTML versions and the tags that have been dropped from the standards.
 +
* [http://www.montulli.org/theoriginofthe%3Cblink%3Etag The Origins of the <Blink> Tag]
 +
* [http://home.web.cern.ch/topics/birth-web CERN's 'The birth of the web']. Includes work on restoring the first website and building a line-mode/terminal web browser simulation.
 +
* [http://zachholman.com/posts/only-90s-developers/ Only 90s Web Developers Remember This]
 +
* [http://w3c.github.io/elements-of-html/ List of HTML/XHTML elements past and present]
 +
* [http://blog.foolip.org/2014/07/21/history-of-the-fullscreen-api/ History of the Fullscreen API]
 +
* [http://ericsink.com/Browser_Wars.html Memoirs From the Browser Wars]
 +
 +
== Other resources ==
 +
* [http://mrcoles.com/demo/markdown-css/ Markdown CSS: makes HTML look like plain text]
 +
* [https://www.eff.org/press/releases/eff-makes-formal-objection-drm-html5 EFF Makes Formal Objection to DRM in HTML5]
 +
* [http://boingboing.net/2013/10/02/w3c-green-lights-adding-drm-to.html W3C green-lights adding DRM to the Web's standards, says it's OK for your browser to say "I can't let you do that, Dave"]
 +
* [https://bugzilla.mozilla.org/show_bug.cgi?id=923590 Bug 923590 - Pledge never to implement HTML5 DRM (Bugzilla@Mozilla)]
 +
* [http://programming.oreilly.com/2013/04/stop-standardizing-html.html Stop standardizing HTML]
 +
* [http://bridgeit.mobi/ BridgeIt: JavaScript library to add native mobile features to HTML 5 web apps]
 +
* [http://www.latimes.com/business/technology/la-fi-tn-1-10-americans-html-std-study-finds-20140304,0,1188415.story 1 in 10 Americans think HTML is an STD, study finds]
 +
* [http://darobin.github.io/after5/ After 5: The future of HTML]
 +
* [http://shop.oreilly.com/product/0636920021049.do What Is HTML5? (free e-book)]
 +
* [http://www.andreaforte.net/ParkICER2013.pdf Towards a Taxonomy of Errors in HTML and CSS]
 +
* [http://w3c.github.io/csvw/html-note/ Embedding Tabular Metadata in HTML (W3C)]
 +
 +
[[Category:Markup]]
 +
[[Category:Web]]
 +
[[Category:W3C]]

Latest revision as of 14:56, 28 December 2023

File Format
Name HTML
Ontology
Extension(s) .html, .htm, .xhtml, .xht
MIME Type(s) text/html, application/xhtml+xml
PRONOM fmt/96, fmt/97, fmt/98, fmt/99, fmt/100, fmt/471, fmt/102, fmt/103
Wikidata ID Q8811, Q62626012, Q2892563, Q41676552, Q41676372, Q3782232
Released 1990

HTML (hypertext markup language) is one of the three pillars of the Web as originally developed by Tim Berners-Lee, along with HTTP and URLs. It is the markup language normally used for Web documents (although many other formats can also be used for material on the Web). It was originally an SGML-based markup language. XHTML is HTML redeveloped using the stricter XML rules. Disagreement among some browser vendors over the direction of W3C development led to the formation of the Web Hypertext Application Technology Working Group (WHATWG). The WHATWG maintains the spec known variously as HTML5, HTML Next, or the HTML Living Standard, which is no longer based on SGML. The W3C standardisation group will work to formalise the WHATWG specification as a series of standardised 'snapshots' of the living standard. One version of this standard was "frozen" into a W3C HTML5 recommendation on October 28, 2014, while the ongoing "living standard" receives regular updates that are in due course frozen into subsequent W3C recommendations such as 5.1, 5.2, etc.

Specs

HTML vs. XHTML

In HTML versions prior to HTML 5, there was a "fork" between HTML and XHTML, with the former being SGML-based and the latter XML-based. The features of the two are for the most part very similar, but some syntactic differences can trap the unwary. These usually cause no actual rendering problems in common browsers (which are very forgiving of errors), but they do prevent validation. For instance, any tag that does not require a matching end tag (e.g., <br>) needs an added slash in XHTML to make it self-closing (<br />); this slash should not be used in HTML. Another difference is that HTML tags and attributes are case-insensitive, so they can be entered in either uppercase or lowercase, while XHTML is case-sensitive and its standard tags are all lowercase. Some parts of the respective syntaxes cannot be mixed in a document that still validates as either variety, which is a problem when webmasters paste in code from diverse sources (including ad-network and affiliate links and scripts, whose terms-of-service contracts may mandate that they be used in unmodified form). HTML 5, however, which is not directly based on either SGML or XML, is more tolerant of such mixed syntax: its specs say that the underlying HTML document can be expressed in either syntax, and while you are still supposed to pick one or the other, the parsing rules for interpreting the document are very forgiving.

The "forgiving" processing of mixed syntax applies only to documents served with the MIME type "text/html"; if an XML MIME type is used, browsers are supposed to be stricter in interpreting the syntax and rejecting documents which are improper or which are of a form they don't understand.
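The difference in strictness can be illustrated with a minimal Python sketch. It uses the standard library's `html.parser` as a stand-in for a forgiving text/html parser and `xml.etree.ElementTree` as a stand-in for a strict XML parser. Neither is what a browser actually runs, and the helper names (`TagCollector`, `lenient_tags`, `is_well_formed_xml`) and the two sample strings are invented for illustration.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class TagCollector(HTMLParser):
    """Records the name of every start tag the lenient parser reports."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names before calling this method.
        self.tags.append(tag)


def lenient_tags(markup):
    """Parse markup forgivingly and return the start tags that were seen."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


def is_well_formed_xml(markup):
    """Return True only if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Hypothetical samples: HTML-style (uppercase, unclosed) vs. XHTML-style.
HTML_STYLE = "<p>line one<BR>line two</p>"
XHTML_STYLE = "<p>line one<br />line two</p>"

if __name__ == "__main__":
    for sample in (HTML_STYLE, XHTML_STYLE):
        print(sample, lenient_tags(sample), is_well_formed_xml(sample))
```

Under these assumptions the lenient parser reports the same `p` and `br` tags for both samples, while the XML parser accepts only the XHTML-style one; the unclosed, uppercase `<BR>` is a well-formedness error there.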

DOCTYPE

HTML and XHTML documents begin with a doctype declaration, whose format had a specific meaning in SGML. Browsers and validators could recognize different doctypes to determine which version of HTML was being used, and browsers sometimes switched between "standards" and "quirks" parsing modes based on the doctype. HTML 5, since it was intended as a "living standard" and was no longer based on SGML, uses a minimalist doctype <!doctype html> designed to trigger standards mode in all browsers; it no longer gives any indication of which specific variety of HTML5 (and up) is in use.
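As a rough illustration of doctype-based detection, the following Python sketch pulls the doctype off the front of a document and distinguishes the minimal form from a legacy doctype that carries a public identifier. It is a simplified assumption-laden example, not the mode-switching algorithm any browser implements; the function name `classify_doctype` and its return labels are hypothetical.

```python
import re

# Matches a doctype declaration at the very start of the (stripped) document.
DOCTYPE_RE = re.compile(r"<!doctype\s+([^>]*)>", re.IGNORECASE)
# Captures the quoted public identifier of a legacy SGML-style doctype.
PUBLIC_ID_RE = re.compile(r'PUBLIC\s+"([^"]*)"', re.IGNORECASE)


def classify_doctype(document):
    """Return 'missing', 'minimal', the public identifier, or 'other'."""
    match = DOCTYPE_RE.match(document.lstrip())
    if match is None:
        return "missing"
    body = match.group(1).strip()
    if body.lower() == "html":
        # The minimalist form: says nothing about the HTML version in use.
        return "minimal"
    public = PUBLIC_ID_RE.search(body)
    if public:
        # Legacy doctypes name a specific DTD, and so a specific version.
        return public.group(1)
    return "other"


if __name__ == "__main__":
    legacy = ('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
              '"http://www.w3.org/TR/html4/strict.dtd"><p>hello')
    print(classify_doctype("<!doctype html><p>hello"))
    print(classify_doctype(legacy))
    print(classify_doctype("<p>hello"))
```

The legacy case returns the public identifier, which is the part that historically told validators which HTML version to check against; the minimal case deliberately yields no version information.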

Nonstandard extensions

The formal specs, of course, do not fully describe the HTML documents in use in the "real world". Quite a number of nonstandard elements, attributes, and other extensions have been implemented in various browsers (including the most popular ones). Browsers have also tended to be very forgiving of invalid markup, so sloppy coding has become widespread on the grounds that "it works in [name of popular browser], so that's all that matters!"

In 2013, the Mozilla organization announced the removal of support for the nonstandard BLINK element, which had been supported in various browsers since its introduction in the 1990s as a Netscape extension and had persisted despite a widespread belief that it was annoying. New versions of Firefox and other Gecko-based browsers no longer flash text enclosed in this element, nor text styled with the various CSS rules that suggest blinking or flashing.

Viewing source

Most browsers have a function to view the source code of the current page. HTML is inherently "open source", though the code might have some degree of obfuscation to make it hard to read in its raw form. This hasn't stopped some people, like a governor of Missouri, from claiming that "decoding HTML" (viewing its plain-text source) is illegal hacking and threatening to prosecute for it.

See also

Software

Viewers

Validators

Conversions to/from HTML

Miscellaneous

Sample files

Historical information

Other resources
