A Short Refresh On HTML5

I get asked a lot what changed between HTML4 and HTML5. In this blog post, we go through most of the changes and additions that were introduced with HTML5.

The Doctype

The doctype before HTML5 was pretty long and cumbersome. The new HTML5 doctype looks like this:

<!DOCTYPE html>

The doctype is case-insensitive; I mostly write it in all lowercase.

Specify the character set

In HTML4, the charset was defined with:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

This was shortened in HTML5 to:

<meta charset="UTF-8">

SVG and MathML

In HTML5, SVG and MathML elements may be used directly within an HTML document.

This makes the following example completely valid HTML:

<!doctype html>
<html lang="en">
  <head>
    <meta charset="UTF-8">
    <title>Using SVG in HTML</title>
  </head>
  <body>
    An SVG circle:
    <svg>
      <circle r="42" cx="42" cy="42" fill="rgb(28,77,70)"/>
    </svg>
  </body>
</html>

An SVG circle:

<svg height="84" width="84"><circle r="42" cx="42" cy="42" fill="rgb(28,77,70)"/></svg>

Other Changes

XML-like self-closing void elements can now be used in HTML5. This aligns HTML5 a bit with XHTML and makes both of the following elements valid:

<meta charset="UTF-8">
<meta charset="UTF-8"/>

Boolean attributes may omit the equals sign and value. Example:

<input type="checkbox" checked="checked"> <!-- HTML4 and XHTML -->
<input type="checkbox" checked>           <!-- HTML5 -->

Whitespace in attribute values is no longer normalized. This means that leading and trailing whitespace in id or similar attributes isn't ignored anymore, and the value attribute of the <input> element can now contain newline characters.
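
As a small illustration of the newline case, here is a minimal sketch. The field name note and its value are made up for this example. A hidden input is used on purpose, because text inputs strip line breaks from their value:

```html
<!-- Hypothetical field name and value, for illustration only -->
<form>
  <!-- The line break inside the attribute value is kept as part of the value -->
  <input type="hidden" name="note" value="first line
second line">
</form>
```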

New structuring elements

  • <main> represents the main content of the document or application. It is suggested to add the ARIA role "main" to the element until Internet Explorer no longer needs to be supported (<main role="main">).

  • <section> represents a generic document or application section. It should include a heading element as a child to indicate the document structure.

  • <article> represents independent content of a document, e.g. a blog post, a news article or any item of content.

  • <aside> represents a section of the document with content related to the rest, which could be considered separate from the document.

  • <header> represents a group of introductory or navigational aids. It may contain heading elements, the site logo, a search form, and so on.

  • <footer> represents a footer for its nearest section. It typically contains information about the author of the section or links to related content.

  • <nav> represents a section of the document intended for navigation.

  • <figure> represents self-contained content, e.g. an image, a video, a quote, a code snippet, and so on.

  • <figcaption> represents a caption associated with a figure. The <figcaption> element must be the first or last child of a <figure> element and is optional.

  • <template> can be used to include arbitrary HTML that is cloned at runtime by a JavaScript application.
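
To see how these elements fit together, here is a rough sketch of a page skeleton. The site name, links, file names and ids (related, related-item) are placeholders invented for this example, not part of any real site:

```html
<!doctype html>
<html lang="en">
  <head>
    <meta charset="UTF-8">
    <title>Structuring elements sketch</title>
  </head>
  <body>
    <header>
      <h1>Example site</h1>
      <nav>
        <a href="/">Home</a>
        <a href="/archive">Archive</a>
      </nav>
    </header>
    <main role="main">
      <article>
        <h2>An example post</h2>
        <p>...</p>
        <figure>
          <img src="diagram.png" alt="A placeholder diagram">
          <!-- figcaption must be the first or last child of figure -->
          <figcaption>A caption for the diagram</figcaption>
        </figure>
        <footer>Written by an example author</footer>
      </article>
      <aside>
        <h2>Related posts</h2>
        <ul id="related"></ul>
      </aside>
    </main>
    <!-- Inert markup: nothing in here is rendered until a script clones it -->
    <template id="related-item">
      <li><a href="#">Placeholder title</a></li>
    </template>
    <script>
      // Clone the template content once and append the copy to the list
      var template = document.getElementById('related-item');
      var item = document.importNode(template.content, true);
      document.getElementById('related').appendChild(item);
    </script>
  </body>
</html>
```

The script at the end is only there to show the cloning step for <template>. A real application would usually fill in the cloned nodes before appending them.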

Other new elements

  • <audio> is used to embed sound content in documents.

  • <picture> is a container, which wraps an <img> element, to specify multiple picture sources for the <img> element.

  • <video> is used to embed video content in documents.

  • <source> is used to specify multiple media resources for <audio>, <picture> and <video> elements.

  • <track> is used to specify time-based data (e.g. subtitles) for an <audio> or a <video> element.

  • <embed> is used to embed external applications or content. This element was not standardized prior to HTML5.

  • <mark> is used to highlight text for reference purposes.

  • <progress> is used in web applications to indicate loading progress of a task.

  • <meter> represents either a scalar value within a known range or a fractional value.

  • <time> represents a date or time.

  • <ruby>, <rt>, and <rp> allow for marking up ruby annotations.

  • <bdi> isolates a span of text that might be formatted in a different direction from other text outside it.

  • <wbr> represents a position where the browser may optionally break a line.

  • <canvas> is used to draw arbitrary graphics via JavaScript.

  • <datalist> contains a set of <option> elements that represent the values available for other controls. Example: <input list="something"><datalist id="something"><option>else</option></datalist>
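
A few of these elements in use might look like the following fragment. All file names, the date and the numeric values are placeholders chosen for this sketch:

```html
<!-- All file names below are placeholders -->
<video controls width="320">
  <!-- The browser picks the first source it can play -->
  <source src="clip.webm" type="video/webm">
  <source src="clip.mp4" type="video/mp4">
  <track kind="subtitles" src="clip.en.vtt" srclang="en" label="English">
  Your browser does not support the video element.
</video>

<picture>
  <!-- Used on wide viewports, otherwise the img source is used -->
  <source media="(min-width: 800px)" srcset="photo-large.jpg">
  <img src="photo-small.jpg" alt="A placeholder photo">
</picture>

<p>Upload: <progress value="30" max="100">30%</progress></p>
<p>Disk usage: <meter value="0.6" min="0" max="1">60%</meter></p>
<p>Published on <time datetime="2016-01-31">January 31, 2016</time>.</p>
<p>Search results for <mark>HTML5</mark>.</p>
```

The text inside <video>, <progress> and <meter> is fallback content for browsers that don't know these elements.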

Removed elements

The following elements were removed because one should use CSS to style a document:

  • <basefont>

  • <big>

  • <center>

  • <font>

  • <strike>

  • <tt>

The following elements were removed because they hurt accessibility and usability:

  • <frame>

  • <frameset>

  • <noframes>

The following elements were removed because they can be replaced with other elements:

  • <acronym>, use <abbr> for abbreviations instead

  • <applet>, use <object> instead

  • <isindex>, use an HTML form instead

  • <dir>, use <ul> instead
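
As a rough before-and-after sketch, this is how markup using some of the removed elements could be rewritten. The class name notice and the sample text are invented for this example:

```html
<!-- HTML4 style, no longer valid in HTML5 -->
<center><font color="red"><big>Sale!</big></font></center>
<p><acronym title="World Wide Web Consortium">W3C</acronym></p>

<!-- HTML5 style: this rule belongs in a stylesheet or in a style element in the head -->
<style>
  .notice { text-align: center; color: red; font-size: larger; }
</style>

<!-- The markup carries the meaning, the CSS above carries the presentation -->
<p class="notice">Sale!</p>
<p><abbr title="World Wide Web Consortium">W3C</abbr></p>
```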

New attributes

  • autofocus can be specified on the <button>, <input>, <select> and <textarea> elements. It provides a declarative way to focus a form control during page load.

  • placeholder can be specified on the <input> and <textarea> elements. It provides a hint intended to aid the user. The placeholder attribute should not be used as a replacement for the <label> element.

  • required can be specified on the <input>, <select> and <textarea> elements. It indicates that the user must fill in a value in order to submit the form. For the <select> element, the first <option> must be a placeholder with an empty value (<option value="">Choose something</option> or <option></option>).

  • disabled on a <fieldset> element disables all the descendant form controls.

  • The <input> element has received several new attributes for each new type. You can find all of them in the Mozilla Developer Network documentation for the <input> element.

  • maxlength, minlength and wrap were introduced to the <textarea> element.

  • The <form> element has a new attribute novalidate to disable any form validation during submission.

  • The async attribute on <script> makes script loading and execution asynchronous.

  • The <ol> element attribute reversed indicates that the list order is descending.

  • The <img> element has received a new attribute called crossorigin. It makes the browser fetch the image using CORS and, if the fetch is successful, allows the image data to be read with the canvas API.

  • The contenteditable attribute switches an element into an editable element which can be manipulated by the user.

  • The data-* attributes are intended for web developers only, to be used by JavaScript or CSS.

  • The hidden attribute can now be used on any HTML element to hide it from the user.

  • The role and aria-* attributes aid users of assistive technology.

  • The spellcheck attribute hints whether the content can be spellchecked or not.

  • The translate attribute gives a hint to translators whether the content should be translated.
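
Several of these attributes can be combined in one small fragment. The following is only a sketch: the form action, field names, script file name and the data-post-id value are hypothetical.

```html
<form action="/subscribe" method="post">
  <label for="email">Email</label>
  <!-- Focused on page load, must be filled in, shows a hint while empty -->
  <input id="email" name="email" type="email"
         placeholder="name@example.com" required autofocus>

  <label for="topic">Topic</label>
  <select id="topic" name="topic" required>
    <!-- Placeholder option with an empty value, needed for required -->
    <option value="">Choose something</option>
    <option value="html">HTML</option>
  </select>

  <!-- Every control inside this fieldset is disabled -->
  <fieldset disabled>
    <legend>Not available yet</legend>
    <label><input name="beta" type="checkbox"> Join the beta</label>
  </fieldset>

  <button>Subscribe</button>
</form>

<!-- Numbered 3, 2, 1 instead of 1, 2, 3 -->
<ol reversed>
  <li>Bronze</li>
  <li>Silver</li>
  <li>Gold</li>
</ol>

<p contenteditable="true" spellcheck="false" data-post-id="42">Edit me</p>
<p translate="no">topaxi.codes</p>
<p hidden>This paragraph is not rendered.</p>

<!-- Downloaded in parallel and executed as soon as it is available -->
<script async src="analytics.js"></script>
```

The novalidate attribute is left out here on purpose, since adding it to the <form> would switch off the checks that required and type="email" would otherwise trigger on submit.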

Several existing attributes were also changed or removed in HTML5. You can read up on those changes in the official W3C document, under HTML5 changed attributes and HTML5 removed attributes.

The HTML5 Sections and Outlines

While in previous HTML versions the document structure and outline were defined only through the heading elements, in HTML5 sections can be nested and have their own heading hierarchy.

Example of an HTML4 document:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">  
    <title>HTML4 Outline Example</title>
  </head>
  <body>
    <div class="header">
      <h1>topaxi.codes</h1>
    </div>
    <div class="article">
      <h2>A Short Refresh On HTML5</h2>
      <p>...</p>
      <h3>The Doctype</h3>
      <p>...</p>
      <h4>List of Doctypes</h4>
      <ul>...</ul>
      <h3>Specify the character set</h3>
      <p>...</p>
    </div>
  </body>
</html>

Leads to the following outline:

  1. topaxi.codes

    1. A Short Refresh On HTML5

      1. The Doctype

        1. List of Doctypes

      2. Specify the character set

Example of an HTML5 document:

<!doctype html>
<html>
  <head>
    <meta charset="UTF-8">  
    <title>HTML5 Outline Example</title>
  </head>
  <body>
    <header>
      <h1>topaxi.codes</h1>
    </header>
    <main role="main">
      <article>
        <header>
          <h1>A Short Refresh On HTML5</h1>
        </header>
        <section>
          <p>...</p>
          <h1>The Doctype</h1>
          <p>...</p>
          <h2>List of Doctypes</h2>
          <ul>...</ul>
          <h1>Specify the character set</h1>
          <p>...</p>
        </section>
      </article>
    </main>
  </body>
</html>

Leads to the following outline:

  1. topaxi.codes

    1. A Short Refresh On HTML5

      1. The Doctype

        1. List of Doctypes

      2. Specify the character set

As one can see, each section has its own hierarchy; the first heading defines the level at which the hierarchy starts.

There are many more minor changes in HTML5 that I haven't mentioned here. If you're interested, you should look up the original W3C HTML5 Changes document.