Integration of UPnP with HTML5

Unofficial Draft 17 November 2011

Bob Lund


This document describes how an HTML5 UA can fulfill the role of the User Interface (UI) application in a UPnP architecture by mapping existing HTML5 APIs to equivalent UPnP functions. New HTML5 APIs are proposed where required.

Status of This Document

This document is merely a public working draft of a potential specification. It has no official standing of any kind and does not represent the support or consensus of any standards organisation.

1. HTML5 and UPnP Integration Overview

UPnP [UPnP] defines a peer-to-peer network architecture providing applications with standards-based connectivity to devices and services in a local area network. An HTML5 [HTML5] UA can implement UPnP functions, thereby enabling Web applications to access home network services. This specification describes the HTML5 UA functions required for it to act as an application in a UPnP network.

New HTML5 UA functions needed for UPnP integration are highlighted in a box like this.

The UPnP A/V architecture [UPnP AV Arch] defines a 2-Box A/V model in which an application with a control point accesses ContentDirectory [UPnP CDS] and ConnectionManager [UPnP CMS] services to discover and create a connection to A/V content in a media server [UPnP MS]. The control point uses a transfer client, such as HTTP, to pull the content for playback.

Figure: UPnP 2-Box A/V Model (from UPnP A/V Architecture)

When HTML5 is integrated with the 2-Box A/V architecture, the HTML5 UA replaces the functionality of the control point. A Web page server constructs a user interface with audio and video elements whose src attributes are links to the content in the media server. The UPnP ContentDirectory and ConnectionManager services are not used; in fact no UPnP services are used in this case, which is described in [UPnP AV Arch] Section 4.4.1 Minimal Implementation.

Figure: HTML5 and UPnP 2-Box A/V Model

A 3-Box A/V model is also defined in [UPnP AV Arch] in which the control point is logically separate from the device consuming the media (MediaRenderer). The control point discovers MediaServers and MediaRenderers and plays back content using UPnP ContentDirectory, ConnectionManager and A/V Transport services.

Figure: UPnP 3-Box A/V Model (from UPnP A/V Architecture)

When HTML5 is integrated with the 3-Box A/V architecture, the HTML5 UA provides the control point functionality. The most important difference compared to the HTML5 and UPnP 2-Box A/V Model is that the HTML5 audio and video elements are not used to play content, since the MediaRenderer is external to the HTML5 UA.

Figure: HTML5 and UPnP 3-Box A/V Model
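
To make the control point role more concrete, the following Python sketch builds the SOAP requests a control point could send to a MediaRenderer's A/V Transport service to start playback. The action and argument names are those of the AVTransport:1 service definition; the content URL and the helper name soap_action are assumptions made for illustration, and actually posting the request to the renderer's control URL is left out.

```python
from xml.sax.saxutils import escape

AVT = "urn:schemas-upnp-org:service:AVTransport:1"  # AVTransport service type


def soap_action(service_type, action, args):
    """Return (headers, body) for a UPnP SOAP action request.

    `args` maps argument names to string values, in the order required
    by the service definition.
    """
    arg_xml = "".join(
        "<{0}>{1}</{0}>".format(name, escape(str(value)))
        for name, value in args.items()
    )
    body = (
        '<?xml version="1.0" encoding="utf-8"?>'
        '<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/" '
        's:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">'
        "<s:Body>"
        '<u:{0} xmlns:u="{1}">{2}</u:{0}>'
        "</s:Body></s:Envelope>"
    ).format(action, service_type, arg_xml)
    headers = {
        "Content-Type": 'text/xml; charset="utf-8"',
        "SOAPACTION": '"{0}#{1}"'.format(service_type, action),
    }
    return headers, body


# Hypothetical content URL, such as one found by browsing ContentDirectory.
set_uri = soap_action(AVT, "SetAVTransportURI", {
    "InstanceID": "0",
    "CurrentURI": "http://192.168.1.10:8200/media/1.mpg?a=1&b=2",
    "CurrentURIMetaData": "",
})
play = soap_action(AVT, "Play", {"InstanceID": "0", "Speed": "1"})
```

Note that no audio or video element is involved: the control point only tells the external MediaRenderer which URL to fetch and when to start playing it.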

The sections User interface discovery in UPnP, HTML5 UA support for 2-box UPnP A/V and In-band tracks in HTML5 describe the details of HTML5 integration with the 2-box UPnP A/V model. The section HTML5 UA support for 3-Box UPnP A/V describes HTML5 integration with the 3-box model. The Quality of Service section applies to both the 2-box and 3-box cases.

2. User interface discovery in UPnP

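UPnP defines a UI service abstraction consisting of a Remote UI Server (RUIS), which makes user interfaces available, and a Remote UI Client (RUIC), which presents them to the user.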

A UA acting as a RUIC can implement a RUI control point (distinct from an A/V control point) to discover a RUIS as described in [UPnP] 1. Discovery and 2. Description. A Web server acting as a RUIS must advertise itself, as described in [RUIS DEV] and [RUIS SVC], to be discoverable by a RUIC.

Figure: UA discovery of RUIS
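
As an illustration of the discovery step, the Python sketch below builds the SSDP M-SEARCH request a RUI control point would multicast and parses the headers of a reply. The message layout follows [UPnP] 1. Discovery; the search target string is assumed to be the device type from the RUIS device definition, and the function names are hypothetical.

```python
SSDP_ADDR = ("239.255.255.250", 1900)  # SSDP multicast address and port

# Assumed search target: the device type defined for a Remote UI Server.
RUIS_ST = "urn:schemas-upnp-org:device:RemoteUIServerDevice:1"


def build_msearch(search_target, mx=2):
    """Build an SSDP M-SEARCH request for the given search target."""
    lines = [
        "M-SEARCH * HTTP/1.1",
        "HOST: {0}:{1}".format(*SSDP_ADDR),
        'MAN: "ssdp:discover"',
        "MX: {0}".format(mx),  # maximum seconds a device may delay its reply
        "ST: {0}".format(search_target),
        "", "",  # the request ends with an empty line
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_ssdp_response(data):
    """Return the response headers as a dict with upper-cased names."""
    text = data.decode("utf-8", errors="replace")
    headers = {}
    for line in text.split("\r\n")[1:]:  # skip the status line
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().upper()] = value.strip()
    return headers
```

A control point would send the result of build_msearch(RUIS_ST) over a UDP socket to SSDP_ADDR and pass each reply to parse_ssdp_response; the LOCATION header of a reply gives the URL of the device description used in the Description step.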

A UA acting as a RUIC can be made discoverable by a RUIS as described in [RUIC DEV] and [RUIC SVC]. A Web server acting as a RUIS can implement a RUI control point that discovers advertised RUICs.

A RUI control point may be implemented as a functionally standalone process that discovers both RUICs and RUISs and uses the RUIC service [RUIC SVC] to inform RUICs of compatible RUISs.

Figure: External RUI control point discovery of RUIC and RUIS

In the specific case of RUI discovery, how a list of discovered RUISs is presented to the user is specific to the HTML5 UA implementation. The general UPnP discovery method described in the section HTML5 UA support for 3-Box UPnP A/V proposes an API that script could use for RUIS discovery, as well as for discovery of other UPnP servers.

As part of advertisement, a RUIC can enumerate a list of UI protocol types it supports as described in [RUIC SVC] 3.1.4. DeviceProfile. Each listed UI protocol type can be further distinguished by a free format string. A discoverable RUIS provides a GetCompatibleUIs action that returns a UIListing of UI protocol types that match the argument in the GetCompatibleUIs action as described in [RUIS SVC] 2.4.1 GetCompatibleUIs. A control point can use the RUIC and RUIS UI protocol information to select the most appropriate RUI service (client or server depending on the location of the RUI control point).
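
The selection step can be sketched as follows in Python. The data shapes are assumptions made for illustration: the RUIC's supported protocols are reduced to an ordered list of names, and the UIListing is reduced to a list of dictionaries rather than the XML document defined in [RUIS SVC]. The protocol names and URIs are hypothetical.

```python
def select_compatible_uis(client_protocols, ui_listing):
    """Return UIs from `ui_listing` whose protocol the client supports.

    `client_protocols` is ordered by client preference. The result is
    sorted in that order; UIs sharing a protocol keep their listing order.
    """
    rank = {proto: i for i, proto in enumerate(client_protocols)}
    matches = [ui for ui in ui_listing if ui["protocol"] in rank]
    return sorted(matches, key=lambda ui: rank[ui["protocol"]])  # stable sort


# Hypothetical data: protocol names and URIs are illustrative only.
client_protocols = ["HTML5", "VNC"]
ui_listing = [
    {"name": "Guide (VNC)", "protocol": "VNC", "uri": "vnc://192.168.1.20:5900"},
    {"name": "Guide (RDP)", "protocol": "RDP", "uri": "rdp://192.168.1.20"},
    {"name": "Guide (Web)", "protocol": "HTML5", "uri": "http://192.168.1.20/guide"},
]
best = select_compatible_uis(client_protocols, ui_listing)
```

Here the first entry of best is the UI offered over the client's most preferred protocol, and the UI offered only over an unsupported protocol is dropped.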

An HTML5 UA should implement the RUI control point so that it can discover RUI servers in the home network.

3. HTML5 UA support for 2-box UPnP A/V

A UA in the 2-Box UPnP A/V model does not access UPnP services for discovery or playback of UPnP content. There are, however, some requirements for a UA acting in this role.

UPnP Content Metadata

UPnP A/V content has associated metadata in the res property [UPnP CDS] that identifies the type of the content and provides information used in playback of the content:

MIME type
The content's MIME type is in the 3rd field of the res@protocolInfo property.
Content protection
Information specific to a content protection implementation may be in the 4th field of the res@protocolInfo property.
Size
The size of file-based content is in the res@size property.
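
The field positions above can be illustrated with a short Python sketch that splits a res@protocolInfo value into its four colon-separated fields. The names ProtocolInfo and parse_protocol_info are hypothetical, and the sample value is illustrative rather than taken from a real media server.

```python
from collections import namedtuple

ProtocolInfo = namedtuple(
    "ProtocolInfo", ["protocol", "network", "content_format", "additional_info"]
)


def parse_protocol_info(value):
    """Split a res@protocolInfo string into its four colon-separated fields."""
    # Split at most three times so that any further colons stay in the
    # 4th field (a defensive choice made for this sketch).
    fields = value.split(":", 3)
    if len(fields) != 4:
        raise ValueError("protocolInfo must have four fields: %r" % value)
    return ProtocolInfo(*fields)


# Illustrative value; "*" in the 4th field means no additional information.
info = parse_protocol_info("http-get:*:video/mpeg:*")
mime_type = info.content_format  # 3rd field: the content's MIME type
extra = info.additional_info     # 4th field: may carry protection information
```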

A Web page with links to UPnP content, as in the 2-Box A/V model, needs to convey the res property metadata. The type attribute of the HTML5 source element or the type parameter in the canPlayType() method can be used to convey the MIME type in the res@protocolInfo property. There is currently no standard way to convey content protection or size metadata. This could be accomplished in several ways:

  1. New attributes on the video, audio and source elements.
  2. The HTML5 UA and Web server use new HTTP headers to request and respond with this metadata.

The use of new attributes is the preferred approach and is described in [Bug 13625].
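
As a sketch of what a Web page server can do today, the Python fragment below renders a video element whose source element carries the MIME type from res@protocolInfo in its type attribute. The function names and the media URL are hypothetical. Content protection and size metadata are deliberately left out, since no standard attribute exists for them yet.

```python
from html import escape


def source_element(url, mime_type):
    """Render an HTML5 source element whose type attribute carries the MIME type."""
    return '<source src="{0}" type="{1}">'.format(
        escape(url, quote=True), escape(mime_type, quote=True)
    )


def video_element(url, mime_type):
    """Wrap a single source element in a video element with default controls."""
    return "<video controls>{0}</video>".format(source_element(url, mime_type))


# Hypothetical media server URL; the MIME type comes from res@protocolInfo.
markup = video_element(
    "http://192.168.1.10:8200/media/1.mpg?id=1&fmt=ps", "video/mpeg"
)
```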

UPnP Live Content

UPnP content may be file based or live. In the case of live content, tracks in the media resource may be added and deleted over time. The ability for script to be made aware of these changes is critical for live content support.

Support for addition and deletion of tracks is proposed in [Bug 13358] and [Bug 14492].

4. HTML5 UA support for 3-Box UPnP A/V

The section User interface discovery in UPnP describes how an HTML5 UA can make use of UPnP discovery to find Remote UI Server services. Web content access to other UPnP services and devices, such as [UPnP CDS], [UPnP Media Server] and [UPnP Media Renderer], can be used to provide additional home network A/V (and other) services.

Web content can access UPnP home network services in several ways:

  1. The user can be prompted by script to provide the URL (IP address and port) that links to the UPnP device description that contains the list of URLs of UPnP services. Script can then use XHR to communicate with the UPnP services. Cross-origin restrictions apply and UPnP asynchronous events would not work.
  2. Script can learn the IP subnet of the home network and query every IP address (on well-known ports) to try to obtain the UPnP device description. UPnP does not define well-known ports for obtaining device descriptions, so these would be implementation specific. Cross-origin restrictions apply and UPnP asynchronous events would not work.
  3. An API exposing home network device and service discovery can be added to the HTML5 UA. This approach has been discussed in [HNTF Req]. An API for discovery is proposed in [HNTF API]; it defines how an HTML5 UA discovers and exposes the URLs of UPnP and Zeroconf (based on [MC DNS] and [DNS SD]) network devices and services. A dialogue-based security mechanism is also described to control which home network services can be discovered by Web content. Cross-origin restrictions are relaxed for those service URLs that a user allows Web content to access.
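
Whichever approach supplies the device description URL, the next step is to read the service URLs from the description. The Python sketch below shows that step under a few assumptions: the description has already been fetched and is passed in as a string, the function name list_services is hypothetical, and only the service type and control URL are extracted. It is written in Python for brevity rather than as in-page script.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

NS = {"d": "urn:schemas-upnp-org:device-1-0"}  # UPnP device description namespace


def list_services(description_xml, description_url):
    """Return (serviceType, absolute controlURL) pairs from a device description."""
    root = ET.fromstring(description_xml)
    # Relative URLs resolve against URLBase if present, else the description URL.
    base = root.findtext("d:URLBase", default=description_url, namespaces=NS)
    services = []
    for svc in root.iterfind(".//d:service", NS):
        stype = svc.findtext("d:serviceType", namespaces=NS)
        control = svc.findtext("d:controlURL", namespaces=NS)
        services.append((stype, urljoin(base, control)))
    return services


# Minimal illustrative description; real descriptions carry many more elements.
SAMPLE = """<root xmlns="urn:schemas-upnp-org:device-1-0">
  <device>
    <deviceType>urn:schemas-upnp-org:device:MediaServer:1</deviceType>
    <serviceList>
      <service>
        <serviceType>urn:schemas-upnp-org:service:ContentDirectory:1</serviceType>
        <controlURL>/ctl/ContentDir</controlURL>
      </service>
    </serviceList>
  </device>
</root>"""

services = list_services(SAMPLE, "http://192.168.1.10:8200/rootDesc.xml")
```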

An HTML5 UA should implement a service discovery process with the functionality described in [HNTF API].

5. Quality of Service

UPnP defines support for prioritized and parameterized quality of service (QoS) [UPnP QoS]. If UPnP QoS is implemented, an HTML5 UA can label connections used to retrieve Web pages with an appropriate level of QoS. One possible algorithm is to assign such connections a QoS level with lower priority than that used for A/V content.
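
A minimal Python sketch of that algorithm follows. The traffic kinds, the numeric priority levels and the function names are all hypothetical; they are not taken from [UPnP QoS] and only show the ordering between Web page and A/V connections.

```python
# Hypothetical priority levels; larger numbers mean higher priority.
# These values are illustrative and are not taken from [UPnP QoS].
PRIORITY = {"av_stream": 5, "web_page": 3, "background": 1}


def qos_priority(traffic_kind):
    """Return the priority level to request for a connection of this kind."""
    return PRIORITY.get(traffic_kind, PRIORITY["background"])


def label_connections(connections):
    """Map each (url, kind) pair to a (url, priority) pair."""
    return [(url, qos_priority(kind)) for url, kind in connections]


labelled = label_connections([
    ("http://192.168.1.20/guide.html", "web_page"),
    ("http://192.168.1.10:8200/media/1.mpg", "av_stream"),
])
```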

6. In-band tracks in HTML5

Web and UPnP A/V content make use of video, audio and text tracks in a media resource to meet regulatory, accessibility and business requirements. Examples are closed captioning, subtitles, audio descriptions, secondary audio program and Enhanced TV.

How these tracks are identified, when in-band in a media resource, and the type of data in the track, may differ by media container, geographical region and service provider. [MPEG-2 TS Tracks] defines a method for an HTML5 UA to expose all video, audio and text tracks in an MPEG-2 transport stream [MPEG-2 TS] to script independent of region or service provider. A track-description text track is created by the HTML5 UA to make the media resource metadata describing the media resource tracks available to script. The method also defines how video, audio and text tracks in the media resource should be mapped to the equivalent VideoTrack, AudioTrack and TextTrack objects. Script can use the metadata in the track-description TextTrackCues to recognize the type of tracks in the media resource. Similar specifications will be created defining how this same method can be used to make all tracks in other media containers, such as WebM and Ogg, available to script in a generic manner.
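
The idea of mapping container-level track information to HTML5 track objects can be sketched in Python as below. The stream_type values are a small subset of those defined in [MPEG-2 TS]. The mapping to track objects, including treating private data and unrecognized types as text tracks so that nothing is hidden from script, is an assumption made for this example and is not the mapping defined in [MPEG-2 TS Tracks].

```python
# A few stream_type values from the MPEG-2 systems standard. The choice
# of HTML5 track object for each is an assumption made for illustration.
STREAM_TYPE_TO_TRACK = {
    0x01: "VideoTrack",  # MPEG-1 video
    0x02: "VideoTrack",  # MPEG-2 video
    0x1B: "VideoTrack",  # H.264/AVC video
    0x03: "AudioTrack",  # MPEG-1 audio
    0x04: "AudioTrack",  # MPEG-2 audio
    0x0F: "AudioTrack",  # AAC audio (ADTS)
    0x06: "TextTrack",   # PES private data, for example subtitles
}


def classify_tracks(pmt_entries):
    """Group (pid, stream_type) pairs from a program map table by track object."""
    tracks = {"VideoTrack": [], "AudioTrack": [], "TextTrack": []}
    for pid, stream_type in pmt_entries:
        # Unrecognized stream types fall back to TextTrack (an assumption).
        kind = STREAM_TYPE_TO_TRACK.get(stream_type, "TextTrack")
        tracks[kind].append(pid)
    return tracks


# Hypothetical program: one video, one audio, one private and one unknown stream.
tracks = classify_tracks([(0x100, 0x1B), (0x101, 0x0F), (0x102, 0x06), (0x103, 0x86)])
```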

  1. The HTML5 UA should implement a method to expose media resource track metadata. [MPEG-2 TS Tracks] proposes an implementation.
  2. [Bug 13357] proposes a new audio kind category for audio descriptions recognized by the HTML5 UA.
  3. [Bug 13359] proposes a new attribute to identify the type of metadata in text tracks recognized by the HTML5 UA.


Thanks are expressed by the editor to the following individuals for their input to and feedback on this specification to date (in alphabetical order).

Amol Bhagwat, Clarke Stevens and Mark Vickers

A. References

A.1 Normative references

[HTML5]
A vocabulary and associated APIs for HTML and XHTML
[MPEG-2 TS]
H.222.0 Infrastructure of audiovisual services - Transmission multiplexing and synchronization
[UPnP]
UPnP Device Architecture 1.1
[UPnP AV Arch]
UPnP AV Architecture:2
[UPnP CDS]
ContentDirectory:4 Service
[UPnP Media Renderer]
MediaRenderer:3 Device
[UPnP Media Server]
MediaServer:4 Device
[UPnP QoS]
UPnP-QoS Architecture:3
Remote UI Client and Server

A.2 Informative references

[Bug 13357]
HTML5 spec bug 13357
[Bug 13358]
HTML5 spec bug 13358
[Bug 13359]
HTML5 spec bug 13359
[Bug 13625]
HTML5 spec bug 13625
[Bug 14492]
HTML5 spec bug 14492
[DNS SD]
DNS-Based Service Discovery
[HNTF API]
Networked Service Discovery and Messaging
[HNTF Req]
Home Network TF Requirements
[MC DNS]
Multicast DNS
[MPEG-2 TS Tracks]
Mapping from MPEG-2 Transport to HTML5