The HTML5 MSE Standard Brings Ultra-Low-Latency Playback to Mobile Live-Streaming Applications


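With the large-scale adoption of mobile internet applications, the video playback experience on mobile devices is receiving more and more attention.

On November 17, 2016, the HTML5 MSE extension standard, led by internet giants such as Google and Microsoft, was officially published. This signals that the HLS protocol previously published by Apple will soon leave the stage, just as the Adobe Flash Player did on the PC: both are being replaced by new, more general technical standards. These changes will ultimately bring a substantial improvement to the video playback experience of end users, and will no doubt be widely supported and embraced by the industry.

Below is the MSE Recommendation. Some companies in China have already implemented it in their products.

The original text of the standard follows: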


   

Media Source Extensions™

W3C Recommendation

Please check the errata for any errors or issues reported since publication.

The English version of this specification is the only normative version. Non-normative translations may also be available.


Abstract

This specification extends HTMLMediaElement [HTML51]
to allow JavaScript to generate media streams for playback. Allowing
JavaScript to generate streams facilitates a variety of use cases like
adaptive streaming and time shifting live streams.

Status of This Document

This section describes the status of this document at the time
of its publication. Other documents may supersede this document. A list
of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.

   

The working group maintains a list of all bug reports. New features for this specification are expected to be incubated in the Web Platform Incubator Community Group.

One editorial issue (removing the exposure of createObjectURL(mediaSource) in workers) was addressed since the previous publication. For the list of changes done since the previous version, see the commits.

By publishing this Recommendation, W3C expects the functionality specified in this Recommendation will not be
affected by changes to File API. The Working Group will continue to
track these specifications.

This document was published by the HTML Media Extensions Working Group as a Recommendation. If you wish to make comments regarding this document, the GitHub repository is preferred for discussion of this specification. Historical discussion can also be found in the mailing list archives.

In September 2016, the Working Group used an implementation report to move this document to Recommendation.

This document has been reviewed by W3C Members, by software developers, and by other W3C groups and interested parties, and is endorsed by the Director as a W3C Recommendation. It is a stable document and may be used as reference material or cited from another document. W3C's
role in making the Recommendation is to draw attention to the
specification and to promote its widespread deployment. This enhances
the functionality and interoperability of the Web.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

This document is governed by the 1 September 2015 W3C Process Document.

1. Introduction
   

This section is non-normative.

This specification allows JavaScript to dynamically construct
media streams for <audio> and <video>. It defines a
MediaSource object that can serve as a source of media data for an
HTMLMediaElement. MediaSource objects have one or more SourceBuffer objects. Applications append data segments to the SourceBuffer objects, and can adapt the quality of appended data based on system performance and other factors. Data from the SourceBuffer
objects is managed as track buffers for audio, video and text data that
is decoded and played. Byte stream specifications used with these
extensions are available in the byte stream format registry [MSE-REGISTRY].

[Figure: Media Source Pipeline Model Diagram]

1.1 Goals
     

This specification was designed with the following goals in mind:

  • Allow JavaScript to construct media streams independent of how the media is fetched.

  • Define a splicing and buffering model that facilitates use
    cases like adaptive streaming, ad-insertion, time-shifting, and video
    editing.

  • Minimize the need for media parsing in JavaScript.

  • Leverage the browser cache as much as possible.

  • Provide requirements for byte stream format specifications.

  • Not require support for any particular media format or codec.

This specification defines:

  • Normative behavior for user agents to enable
    interoperability between user agents and web applications when
    processing media data.

  • Normative requirements to enable other specifications to define media formats to be used within this specification.

1.2 Definitions
     

  • Active Track Buffers

  • The track buffers that provide coded frames for the enabled audioTracks, the selected videoTracks, and the "showing" or "hidden" textTracks. All these tracks are associated with SourceBuffer objects in the activeSourceBuffers list.

  • Append Window

  • A presentation timestamp range used to filter out coded frames
    while appending. The append window represents a single continuous time
    range with a single start time and end time. Coded frames with presentation timestamp within this range are allowed to be appended to the SourceBuffer while coded frames outside this range are filtered out. The append window start and end times are controlled by the appendWindowStart and appendWindowEnd attributes respectively.

  • Coded Frame

  • A unit of media data that has a presentation timestamp, a decode timestamp, and a coded frame duration.

  • Coded Frame Duration

  • The duration of a coded frame. For video and text, the duration indicates how long the video frame or text SHOULD
    be displayed. For audio, the duration represents the sum of all the
    samples contained within the coded frame. For example, if an audio frame
    contained 441 samples @44100Hz the frame duration would be 10
    milliseconds.

  • Coded Frame End Timestamp

  • The sum of a coded frame presentation timestamp and its coded frame duration. It represents the presentation timestamp that immediately follows the coded frame.

  • Coded Frame Group

  • A group of coded frames that are adjacent and have monotonically increasing decode timestamps without any gaps. Discontinuities detected by the coded frame processing algorithm and abort() calls trigger the start of a new coded frame group.

  • Decode Timestamp

  • The decode timestamp indicates the latest time at which
    the frame needs to be decoded assuming instantaneous decoding and
    rendering of this and any dependent frames (this is equal to the presentation timestamp of the earliest frame, in presentation order, that is dependent on this frame). If frames can be decoded out of presentation order, then the decode timestamp MUST be present in or derivable from the byte stream. The user agent MUST run the append error algorithm if this is not the case. If frames cannot be decoded out of presentation order and a decode timestamp is not present in the byte stream, then the decode timestamp is equal to the presentation timestamp.

  • Initialization Segment

  • A sequence of bytes that contain all of the initialization information required to decode a sequence of media segments. This includes codec initialization data, Track ID mappings for multiplexed segments, and timestamp offsets (e.g., edit lists).

    Note

    The byte stream format specifications in the byte stream format registry [MSE-REGISTRY] contain format specific examples.

  • Media Segment

  • A sequence of bytes that contain packetized & timestamped media data for a portion of the media timeline. Media segments are always associated with the most recently appended initialization segment.

    Note

    The byte stream format specifications in the byte stream format registry [MSE-REGISTRY] contain format specific examples.

  • MediaSource object URL

  • A MediaSource object URL is a unique Blob URI [FILE-API] created by createObjectURL(). It is used to attach a MediaSource object to an HTMLMediaElement.

    These URLs are the same as a Blob URI, except that anything in the definition of that feature that refers to File and Blob objects is hereby extended to also apply to MediaSource objects.

    The origin of the MediaSource object URL is the relevant settings object of this during the call to createObjectURL().

    Note

    For example, the origin of the MediaSource object URL affects the way that the media element is consumed by canvas.

  • Parent Media Source

  • The parent media source of a SourceBuffer object is the MediaSource object that created it.

  • Presentation Start Time

  • The presentation start time is the earliest time point in the presentation and specifies the initial playback position and earliest possible position. All presentations created using this specification have a presentation start time of 0.

    Note

    For the purposes of determining if HTMLMediaElement.buffered contains a TimeRange that includes the current playback position, implementations MAY choose to allow a current playback position at or after presentation start time and before the first TimeRange to play the first TimeRange if that TimeRange starts within a reasonably short time, like 1 second, after presentation start time. This allowance accommodates the reality that muxed streams commonly do not begin all tracks precisely at presentation start time. Implementations MUST report the actual buffered range, regardless of this allowance.

  • Presentation Interval

  • The presentation interval of a coded frame is the time interval from its presentation timestamp to the presentation timestamp plus the coded frame's duration. For example, if a coded frame has a presentation timestamp of 10 seconds and a coded frame duration
    of 100 milliseconds, then the presentation interval would be [10-10.1).
    Note that the start of the range is inclusive, but the end of the range
    is exclusive.

  • Presentation Order

  • The order that coded frames are rendered in the presentation. The presentation order is achieved by ordering coded frames in monotonically increasing order by their presentation timestamps.

  • Presentation Timestamp

  • A reference to a specific time in the presentation. The presentation timestamp in a coded frame indicates when the frame SHOULD be rendered.

  • Random Access Point

  • A position in a media segment
    where decoding and continuous playback can begin without relying on any
    previous data in the segment. For video this tends to be the location
    of I-frames. In the case of audio, most audio frames can be treated as a
    random access point. Since video tracks tend to have a more sparse
    distribution of random access points, the location of these points are
    usually considered the random access points for multiplexed streams.

  • SourceBuffer byte stream format specification

  • The specific byte stream format specification that describes the format of the byte stream accepted by a SourceBuffer instance. The byte stream format specification, for a SourceBuffer object, is selected based on the type passed to the addSourceBuffer() call that created the object.

  • SourceBuffer configuration

  • A specific set of tracks distributed across one or more SourceBuffer objects owned by a single MediaSource instance.

    Implementations MUST support at least 1 MediaSource object with the following configurations:

    • A single SourceBuffer with 1 audio track and/or 1 video track.

    • Two SourceBuffers with one handling a single audio track and the other handling a single video track.

    MediaSource objects MUST
    support each of the configurations above, but they are only required to
    support one configuration at a time. Supporting multiple configurations
    at once or additional configurations is a quality of implementation
    issue.

  • Track Description

  • A byte stream format specific structure that provides the Track ID, codec configuration, and other metadata for a single track. Each track description inside a single initialization segment has a unique Track ID. The user agent MUST run the append error algorithm if the Track ID is not unique within the initialization segment.

  • Track ID

  • A Track ID is a byte stream format specific identifier that
    marks sections of the byte stream as being part of a specific track.
    The Track ID in a track description identifies which sections of a media segment belong to that track.
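The timing definitions above (coded frame duration, coded frame end timestamp, presentation interval, and append window) can be sketched as a few small functions. This is a minimal illustration rather than anything defined by the specification: the `SketchCodedFrame` type and all function names are hypothetical, times are in seconds, and the append window check assumes a frame is dropped when it starts before the window start or ends after the window end.

```typescript
// Hypothetical model of a coded frame; all times are in seconds.
interface SketchCodedFrame {
  presentationTimestamp: number;
  decodeTimestamp: number;
  duration: number;
}

// Audio coded frame duration in milliseconds: sample count over sample rate.
function audioFrameDurationMs(sampleCount: number, sampleRateHz: number): number {
  return (sampleCount * 1000) / sampleRateHz;
}

// Coded frame end timestamp: presentation timestamp plus coded frame duration.
function codedFrameEndTimestamp(frame: SketchCodedFrame): number {
  return frame.presentationTimestamp + frame.duration;
}

// Presentation interval [start, end): the start is inclusive, the end exclusive.
function inPresentationInterval(frame: SketchCodedFrame, time: number): boolean {
  return time >= frame.presentationTimestamp && time < codedFrameEndTimestamp(frame);
}

// Append window filter (assumed rule): keep a frame only if it starts at or
// after the window start and ends at or before the window end.
function passesAppendWindow(
  frame: SketchCodedFrame,
  appendWindowStart: number,
  appendWindowEnd: number,
): boolean {
  return (
    frame.presentationTimestamp >= appendWindowStart &&
    codedFrameEndTimestamp(frame) <= appendWindowEnd
  );
}
```

With these helpers, `audioFrameDurationMs(441, 44100)` reproduces the 10 millisecond example from the Coded Frame Duration definition.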

2. MediaSource Object
   

The MediaSource object represents a source of media data for an HTMLMediaElement. It keeps track of the readyState for this source as well as a list of SourceBuffer
objects that can be used to add media data to the presentation.
MediaSource objects are created by the web application and then attached
to an HTMLMediaElement. The application uses the SourceBuffer objects in sourceBuffers to add media data to this source. The HTMLMediaElement fetches this media data from the MediaSource object when it is needed during playback.

Each MediaSource object has a live seekable range variable that stores a normalized TimeRanges object. This variable is initialized to an empty TimeRanges object when the MediaSource object is created, is maintained by setLiveSeekableRange() and clearLiveSeekableRange(), and is used in HTMLMediaElement Extensions to modify HTMLMediaElement.seekable behavior.

enum ReadyState {
    "closed",
    "open",
    "ended"
};

     

Enumeration description
closed

Indicates the source is not currently attached to a media element.

open

The source has been opened by a media element and is ready for data to be appended to the SourceBuffer objects in sourceBuffers.

ended

The source is still attached to a media element, but endOfStream() has been called.

enum EndOfStreamError {
    "network",
    "decode"
};

     

Enumeration description
network

Terminates playback and signals that a network error has occurred.

Note

JavaScript applications SHOULD
use this status code to terminate playback with a network error. For
example, if a network error occurs while fetching media data.

decode

Terminates playback and signals that a decoding error has occurred.

Note

JavaScript applications SHOULD
use this status code to terminate playback with a decode error. For
example, if a parsing error occurs while processing out-of-band media
data.

[Constructor]
interface MediaSource : EventTarget {
    readonly attribute SourceBufferList    sourceBuffers;
    readonly attribute SourceBufferList    activeSourceBuffers;
    readonly attribute ReadyState          readyState;
             attribute unrestricted double duration;
             attribute EventHandler        onsourceopen;
             attribute EventHandler        onsourceended;
             attribute EventHandler        onsourceclose;
    SourceBuffer addSourceBuffer(DOMString type);
    void         removeSourceBuffer(SourceBuffer sourceBuffer);
    void         endOfStream(optional EndOfStreamError error);
    void         setLiveSeekableRange(double start, double end);
    void         clearLiveSeekableRange();
    static boolean isTypeSupported(DOMString type);
};

   

2.1 Attributes
     

2.2 Methods
     

2.3 Event Summary
     

Event name    Interface   Dispatched when…
sourceopen    Event       readyState transitions from "closed" to "open" or from "ended" to "open".
sourceended   Event       readyState transitions from "open" to "ended".
sourceclose   Event       readyState transitions from "open" to "closed" or "ended" to "closed".
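The event summary can be read as a lookup from a readyState transition to the event it dispatches. The following sketch encodes only the rows of the table; the type and function names are hypothetical, and transitions the table does not list return null.

```typescript
type SketchReadyState = "closed" | "open" | "ended";

// Returns the event name for a readyState transition according to the event
// summary table, or null when the table lists no event for that transition.
function eventForTransition(from: SketchReadyState, to: SketchReadyState): string | null {
  if (to === "open" && (from === "closed" || from === "ended")) return "sourceopen";
  if (to === "ended" && from === "open") return "sourceended";
  if (to === "closed" && (from === "open" || from === "ended")) return "sourceclose";
  return null;
}
```

An application that calls endOfStream() and later appends more data would therefore expect "sourceended" followed by a second "sourceopen".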

2.4 Algorithms
     

2.4.1 Attaching to a media element
       

A MediaSource object can be attached to a media element by assigning a MediaSource object URL to the media element src attribute or the src attribute of a <source> inside a media element. A MediaSource object URL is created by passing a MediaSource object to createObjectURL().
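As a rough sketch of this attachment flow, the function below registers a "sourceopen" listener, assigns the object URL to `src`, and appends a first segment once the source opens. To keep the example self-contained outside a browser, it uses hypothetical structural types (`AttachableElement`, `AttachableMediaSource`, `AttachableSourceBuffer`) that list only the members it touches, and the URL factory is passed in as a parameter; the function name and parameters are assumptions for illustration.

```typescript
// Structural stand-ins listing only the members this sketch touches.
interface AttachableSourceBuffer {
  appendBuffer(data: ArrayBuffer): void;
}
interface AttachableMediaSource {
  addSourceBuffer(type: string): AttachableSourceBuffer;
  addEventListener(type: string, listener: () => void): void;
}
interface AttachableElement {
  src: string;
}

function attachAndAppend(
  element: AttachableElement,
  source: AttachableMediaSource,
  createObjectURL: (source: AttachableMediaSource) => string,
  mimeType: string,
  firstSegment: ArrayBuffer,
): void {
  // Register the listener before attaching so "sourceopen" cannot be missed.
  source.addEventListener("sourceopen", () => {
    const buffer = source.addSourceBuffer(mimeType);
    buffer.appendBuffer(firstSegment);
  });
  // Assigning the MediaSource object URL to src attaches the source.
  element.src = createObjectURL(source);
}
```

In a browser, the same flow would typically use a `<video>` element, a `MediaSource` instance, and `URL.createObjectURL()`, with `firstSegment` being an initialization segment the application has already fetched.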

If the resource fetch algorithm
was invoked with a media provider object that is a MediaSource object
or a URL record whose object is a MediaSource object, then let mode be
local, skip the first step in the resource fetch algorithm (which may otherwise set mode to remote) and add the steps and clarifications below to the "Otherwise (mode is local)" section of the resource fetch algorithm.

Note

The resource fetch algorithm's
first step is expected to eventually align with selecting local mode
for URL records whose objects are media provider objects. The intent is
that if the HTMLMediaElement's src attribute or selected child <source>'s src attribute is a blob: URL matching a MediaSource object URL when the respective src
attribute was last changed, then that MediaSource object is used as the
media provider object and current media resource in the local mode
logic in the resource fetch algorithm.
This also means that the remote mode logic that includes observance of
any preload attribute is skipped when a MediaSource object is attached.
Even with that eventual change to [HTML51],
the execution of the following steps at the beginning of the local mode
logic is still required when the current media resource is a
MediaSource object.

Note

Relative to the action which triggered the media
element's resource selection algorithm, these steps are asynchronous.
The resource fetch algorithm is run after the task that invoked the
resource selection algorithm is allowed to continue and a stable state
is reached. Implementations may delay the steps in the "Otherwise" clause, below, until the MediaSource object is ready for use.

Note

An attached MediaSource does not use the remote mode steps in the resource fetch algorithm,
so the media element will not fire "suspend" events. Though future
versions of this specification will likely remove "progress" and
"stalled" events from a media element with an attached MediaSource, user
agents conforming to this version of the specification may still fire
these two events as these [HTML51] references changed after implementations of this specification stabilized.

2.4.2 Detaching from a media element
       

The following steps are run in any case where the media element is going to transition to NETWORK_EMPTY and queue a task to fire a simple event named emptied at the media element. These steps SHOULD be run right before the transition.

  1. Set the readyState attribute to "closed".

  2. Update duration to NaN.

  3. Remove all the SourceBuffer objects from activeSourceBuffers.

  4. Queue a task to fire a simple event named removesourcebuffer at activeSourceBuffers.

  5. Remove all the SourceBuffer objects from sourceBuffers.

  6. Queue a task to fire a simple event named removesourcebuffer at sourceBuffers.

  7. Queue a task to fire a simple event named sourceclose at the MediaSource.
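The seven detach steps can be sketched against a simplified in-memory model. `DetachModel` and `detachFromMediaElement` are hypothetical names, string labels stand in for real SourceBuffer objects, and queued tasks are represented as "event@target" strings instead of being dispatched.

```typescript
interface DetachModel {
  readyState: "closed" | "open" | "ended";
  duration: number;
  sourceBuffers: string[];       // labels standing in for SourceBuffer objects
  activeSourceBuffers: string[];
}

// Applies steps 1-7 in order and returns the queued events as "event@target".
function detachFromMediaElement(model: DetachModel): string[] {
  const queued: string[] = [];
  model.readyState = "closed";                            // step 1
  model.duration = NaN;                                   // step 2
  model.activeSourceBuffers.length = 0;                   // step 3
  queued.push("removesourcebuffer@activeSourceBuffers");  // step 4
  model.sourceBuffers.length = 0;                         // step 5
  queued.push("removesourcebuffer@sourceBuffers");        // step 6
  queued.push("sourceclose@MediaSource");                 // step 7
  return queued;
}
```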

Note

Going forward, this algorithm is intended to be externally called and run in any case where the attached MediaSource, if any, must be detached from the media element. It MAY be called on HTMLMediaElement [HTML51]
operations like load() and resource fetch algorithm failures in
addition to, or in place of, when the media element transitions to NETWORK_EMPTY.
Resource fetch algorithm failures are those which abort either the
resource fetch algorithm or the resource selection algorithm, with the
exception that the "Final step" [HTML51] is not considered a failure that triggers detachment.

2.4.3 Seeking
       

Run the following steps as part of the "Wait until the
user agent has established whether or not the media data for the new
playback position is available, and, if it is, until it has decoded
enough data to play back that position"
step of the seek algorithm:

  1. Note

    The media element looks for media segments containing the new playback position in each SourceBuffer object in activeSourceBuffers. Any position within a TimeRange in the current value of the HTMLMediaElement.buffered attribute has all necessary media segments buffered for that position.

    1. If the HTMLMediaElement.readyState attribute is greater than HAVE_METADATA, then set the HTMLMediaElement.readyState attribute to HAVE_METADATA.

      Note

      Per HTMLMediaElement ready states [HTML51] logic, HTMLMediaElement.readyState changes may trigger events on the HTMLMediaElement.

    2. The media element waits until an appendBuffer() call causes the coded frame processing algorithm to set the HTMLMediaElement.readyState attribute to a value greater than HAVE_METADATA.

      Note

      The web application can use buffered and HTMLMediaElement.buffered to determine what the media element needs to resume playback.

  2. The media element resets all decoders and initializes each one with data from the appropriate initialization segment.

  3. The media element feeds coded frames from the active track buffers into the decoders starting with the closest random access point before the new playback position.

  4. Resume the seek algorithm at the "Await a stable state" step.
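Two pieces of the seek steps lend themselves to a small sketch: checking whether the new playback position falls inside a buffered range, and picking the random access point from which decoding restarts. Both helpers are hypothetical, and treating a random access point located exactly at the new playback position as acceptable is an assumption of this example.

```typescript
interface SeekRange {
  start: number;
  end: number;
}

// True when the position lies inside one of the buffered ranges.
function isPositionBuffered(buffered: SeekRange[], position: number): boolean {
  return buffered.some((range) => position >= range.start && position < range.end);
}

// Latest random access point at or before the new playback position,
// or null when every random access point lies after it.
function closestRandomAccessPoint(
  randomAccessPoints: number[],
  newPlaybackPosition: number,
): number | null {
  let best: number | null = null;
  for (const time of randomAccessPoints) {
    if (time <= newPlaybackPosition && (best === null || time > best)) {
      best = time;
    }
  }
  return best;
}
```

For a video with I-frames every 2 seconds, a seek to 5 seconds would start decoding at 4 seconds and discard output until the target position is reached.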

2.4.4 SourceBuffer Monitoring
       

The following steps are periodically run during playback to make sure that all of the SourceBuffer objects in activeSourceBuffers have enough data to ensure uninterrupted playback. Changes to activeSourceBuffers also cause these steps to run because they affect the conditions that trigger state transitions.

Having enough data to ensure uninterrupted playback is an implementation specific condition where the user agent determines
that it currently has enough data to play the presentation without
stalling for a meaningful period of time. This condition is constantly
evaluated to determine when to transition the media element into and out
of the HAVE_ENOUGH_DATA
ready state. These transitions indicate when the user agent believes it
has enough data buffered or it needs more data respectively.

Note

An implementation MAY
choose to use bytes buffered, time buffered, the append rate, or any
other metric it sees fit to determine when it has enough data. The
metrics used MAY change during playback so web applications SHOULD only rely on the value of HTMLMediaElement.readyState to determine whether more data is needed or not.

Note

When the media element needs more data, the user agent SHOULD transition it from HAVE_ENOUGH_DATA to HAVE_FUTURE_DATA
early enough for a web application to be able to respond without
causing an interruption in playback. For example, transitioning when the
current playback position is 500ms before the end of the buffered data
gives the application roughly 500ms to append more data before playback
stalls.
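Because the "enough data" condition is implementation specific, the following is only one possible policy, written to mirror the 500ms example in the note: it measures how much buffered media lies ahead of the current playback position and compares it with a threshold. The names `bufferedAhead` and `hasEnoughData` and the default threshold are assumptions, not part of the specification.

```typescript
interface MonitorRange {
  start: number;
  end: number;
}

// Seconds of buffered media ahead of the position; 0 if it is not buffered.
function bufferedAhead(ranges: MonitorRange[], position: number): number {
  for (const range of ranges) {
    if (position >= range.start && position < range.end) {
      return range.end - position;
    }
  }
  return 0;
}

// Hypothetical policy: stay in HAVE_ENOUGH_DATA only while more than
// thresholdSeconds of media remains ahead of the playback position.
function hasEnoughData(
  ranges: MonitorRange[],
  position: number,
  thresholdSeconds: number = 0.5,
): boolean {
  return bufferedAhead(ranges, position) > thresholdSeconds;
}
```

A real user agent may combine several metrics, which is why the note advises applications to rely on HTMLMediaElement.readyState instead of reimplementing such a check.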

2.4.5 Changes to selected/enabled track state
       

During playback activeSourceBuffers needs to be updated if the selected video track, the enabled audio track(s), or a text track mode changes. When one or more of these changes occur the following steps need to be followed.

2.4.6 Duration change
       

Follow these steps when duration needs to change to a new duration.

  1. If the current value of duration is equal to new duration, then return.

  2. If new duration is less than the highest presentation timestamp of any buffered coded frames for all SourceBuffer objects in sourceBuffers, then throw an InvalidStateError exception and abort these steps.

    Note

    Duration reductions that would truncate currently buffered media are disallowed. When truncation is necessary, use remove() to reduce the buffered range before updating duration.

  3. Let highest end time be the largest track buffer ranges end time across all the track buffers across all SourceBuffer objects in sourceBuffers.

  4. If new duration is less than highest end time, then

    Note

    This condition can occur because the coded frame removal algorithm preserves coded frames that start before the start of the removal range.

    1. Update new duration to equal highest end time.

  5. Update duration to new duration.

  6. Update the media duration to new duration and run the HTMLMediaElement duration change algorithm.
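The duration change steps can be traced with a small model. `DurationModel` and `changeDuration` are hypothetical: the two "highest" values are supplied as plain numbers instead of being computed from real track buffers, the InvalidStateError is represented by an ordinary Error, and step 6 (updating the media element) is left out.

```typescript
interface DurationModel {
  duration: number;
  // Highest presentation timestamp of any buffered coded frame.
  highestBufferedPresentationTimestamp: number;
  // Largest track buffer ranges end time across all track buffers.
  highestEndTime: number;
}

function changeDuration(model: DurationModel, requestedDuration: number): void {
  let newDuration = requestedDuration;
  // Step 1: nothing to do when the value is unchanged.
  if (model.duration === newDuration) return;
  // Step 2: reject reductions that would truncate buffered media.
  if (newDuration < model.highestBufferedPresentationTimestamp) {
    throw new Error("InvalidStateError: new duration would truncate buffered media");
  }
  // Steps 3-4: never end before the buffered ranges do.
  if (newDuration < model.highestEndTime) {
    newDuration = model.highestEndTime;
  }
  // Step 5: commit the (possibly adjusted) value.
  model.duration = newDuration;
}
```

Requesting 4.95 seconds when the last frame starts at 4.9 and ends at 5.0 therefore yields a duration of 5.0, while requesting 3 seconds fails until remove() has trimmed the buffered range.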

2.4.7 End of stream algorithm
       

This algorithm gets called when the application signals the end of stream via an endOfStream() call or an algorithm needs to signal a decode error. This algorithm takes an error parameter that indicates whether an error will be signalled.

  1. Change the readyState attribute value to "ended".

  2. Queue a task to fire a simple event named sourceended at the MediaSource.

    1. Run the duration change algorithm with new duration set to the largest track buffer ranges end time across all the track buffers across all SourceBuffer objects in sourceBuffers.

      Note

      This allows the duration to properly
      reflect the end of the appended media segments. For example, if the
      duration was explicitly set to 10 seconds and only media segments for 0
      to 5 seconds were appended before endOfStream() was called, then the
      duration will get updated to 5 seconds.

    2. Notify the media element that it now has all of the media data.
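The example in the note (duration set to 10 seconds, media appended only for 0 to 5 seconds) can be reproduced with a minimal model. This sketch covers only the case where no error is signalled, sets the duration directly instead of running the full duration change algorithm, and uses hypothetical names throughout.

```typescript
interface EndOfStreamModel {
  readyState: string;
  duration: number;
  // End time of the buffered ranges of each track buffer, in seconds.
  trackBufferEndTimes: number[];
}

// Models endOfStream() without an error argument; returns the queued events.
function endOfStreamWithoutError(model: EndOfStreamModel): string[] {
  model.readyState = "ended";
  const queuedEvents = ["sourceended"];
  // The new duration is the largest track buffer ranges end time.
  let largestEndTime = 0;
  for (const endTime of model.trackBufferEndTimes) {
    if (endTime > largestEndTime) largestEndTime = endTime;
  }
  model.duration = largestEndTime;
  return queuedEvents;
}
```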

3. SourceBuffer Object
   

enum AppendMode {
    "segments",
    "sequence"
};

   

Enumeration description
segments

The timestamps in the media segment determine where the coded frames are placed in the presentation. Media segments can be appended in any order.

sequence

Media segments will be treated as adjacent in time
independent of the timestamps in the media segment. Coded frames in a
new media segment will be placed immediately after the coded frames in
the previous media segment. The timestampOffset
attribute will be updated if a new offset is needed to make the new
media segments adjacent to the previous media segment. Setting the timestampOffset attribute in "sequence"
mode allows a media segment to be placed at a specific position in the
timeline without any knowledge of the timestamps in the media segment.
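The difference between the two modes can be sketched at segment granularity. In this simplified model each segment carries its own start timestamp and duration; in "segments" mode those timestamps decide the placement, while in "sequence" mode the offset is recomputed so that each segment starts where the previous one ended. The types and the `placeSegments` function are hypothetical, and the real algorithm works on individual coded frames and coded frame groups.

```typescript
type SketchAppendMode = "segments" | "sequence";

// Hypothetical segment description: internal start timestamp and duration, in seconds.
interface SketchSegment {
  start: number;
  duration: number;
}

interface SegmentPlacement {
  start: number;
  end: number;
}

function placeSegments(mode: SketchAppendMode, segments: SketchSegment[]): SegmentPlacement[] {
  const placements: SegmentPlacement[] = [];
  let timestampOffset = 0;
  let previousEnd = 0; // where the next segment goes in "sequence" mode
  for (const segment of segments) {
    if (mode === "sequence") {
      // Shift the segment so that it begins right after the previous one.
      timestampOffset = previousEnd - segment.start;
    }
    const start = segment.start + timestampOffset;
    const end = start + segment.duration;
    placements.push({ start, end });
    previousEnd = end;
  }
  return placements;
}
```

Appending a segment stamped at 20 seconds followed by one stamped at 4 seconds keeps them at 20 and 4 in "segments" mode, but places them back to back starting at 0 in "sequence" mode.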

interface SourceBuffer : EventTarget {
             attribute AppendMode          mode;
    readonly attribute boolean             updating;
    readonly attribute TimeRanges          buffered;
             attribute double              timestampOffset;
    readonly attribute AudioTrackList      audioTracks;
    readonly attribute VideoTrackList      videoTracks;
    readonly attribute TextTrackList       textTracks;
             attribute double              appendWindowStart;
             attribute unrestricted double appendWindowEnd;
             attribute EventHandler        onupdatestart;
             attribute EventHandler        onupdate;
             attribute EventHandler        onupdateend;
             attribute EventHandler        onerror;
             attribute EventHandler        onabort;
    void appendBuffer(BufferSource data);
    void abort();
    void remove(double start, unrestricted double end);
};

   

3.1 Attributes
     

3.2 Methods
     

3.3 Track Buffers
     

A track buffer stores the track descriptions and coded frames for an individual track. The track buffer is updated as initialization segments and media segments are appended to the SourceBuffer.

Each track buffer has a last decode timestamp variable that stores the decode timestamp of the last coded frame appended in the current coded frame group. The variable is initially unset to indicate that no coded frames have been appended yet.

Each track buffer has a last frame duration variable that stores the coded frame duration of the last coded frame appended in the current coded frame group. The variable is initially unset to indicate that no coded frames have been appended yet.

Each track buffer has a highest end timestamp variable that stores the highest coded frame end timestamp across all coded frames in the current coded frame group that were appended to this track buffer. The variable is initially unset to indicate that no coded frames have been appended yet.

Each track buffer has a need random access point flag variable that keeps track of whether the track buffer is waiting for a random access point coded frame. The variable is initially set to true to indicate that a random access point coded frame is needed before anything can be added to the track buffer.

Each track buffer has a track buffer ranges variable that represents the presentation time ranges occupied by the coded frames currently stored in the track buffer.

Note

For track buffer ranges, these presentation time ranges are based on presentation timestamps, frame durations, and potentially coded frame group start times for coded frame groups across track buffers in a muxed SourceBuffer.

For specification purposes, this information is treated as if it were stored in a normalized TimeRanges object. Intersected track buffer ranges are used to report HTMLMediaElement.buffered, and MUST therefore support uninterrupted playback within each range of HTMLMediaElement.buffered.

Note

These coded frame group start times differ slightly from those mentioned in the coded frame processing algorithm in that they are the earliest presentation timestamp across all track buffers following a discontinuity. Discontinuities can occur within the coded frame processing algorithm or result from the coded frame removal algorithm, regardless of mode. The threshold for determining disjointness of track buffer ranges is implementation-specific. For example, to reduce unexpected playback stalls, implementations MAY approximate the coded frame processing algorithm's
discontinuity detection logic by coalescing adjacent ranges separated
by a gap smaller than 2 times the maximum frame duration buffered so far
in this track buffer. Implementations MAY also use coded frame group start times as range start times across track buffers in a muxed SourceBuffer to further reduce unexpected playback stalls.
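The coalescing heuristic mentioned in the note (merging adjacent ranges separated by a gap smaller than 2 times the maximum frame duration) might look like the sketch below. Since the threshold is implementation specific, this is one possible reading; `BufferRange` and `coalesceRanges` are hypothetical names.

```typescript
interface BufferRange {
  start: number;
  end: number;
}

// Merges ranges whose gap is smaller than 2 x maxFrameDuration.
// The input is copied and sorted by start time; the caller's array is untouched.
function coalesceRanges(ranges: BufferRange[], maxFrameDuration: number): BufferRange[] {
  const sorted = ranges
    .map((range) => ({ start: range.start, end: range.end }))
    .sort((a, b) => a.start - b.start);
  const merged: BufferRange[] = [];
  for (const range of sorted) {
    const last = merged[merged.length - 1];
    if (last !== undefined && range.start - last.end < 2 * maxFrameDuration) {
      // Small gap (or overlap): extend the previous range.
      last.end = Math.max(last.end, range.end);
    } else {
      merged.push(range);
    }
  }
  return merged;
}
```

With 40ms video frames, a 30ms hole between two ranges is bridged, whereas a one second hole still shows up as two separate entries in the reported buffered ranges.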

3.4 Event Summary
     

Event name    Interface   Dispatched when…
updatestart   Event       updating transitions from false to true.
update        Event       The append or remove has successfully completed. updating transitions from true to false.
updateend     Event       The append or remove has ended.
error         Event       An error occurred during the append. updating transitions from true to false.
abort         Event       The append or remove was aborted by an abort() call. updating transitions from true to false.
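Read together, the rows of this table suggest the order in which an application sees events for a single append or remove. The sketch below lists that order for the three outcomes; the names are hypothetical, and placing updateend last in every case is an assumption based on its description ("the append or remove has ended").

```typescript
type UpdateOutcome = "success" | "error" | "abort";

// Expected event order for one append or remove operation, per outcome.
const UPDATE_EVENT_SEQUENCES: Record<UpdateOutcome, string[]> = {
  success: ["updatestart", "update", "updateend"],
  error: ["updatestart", "error", "updateend"],
  abort: ["updatestart", "abort", "updateend"],
};

function updateEventSequence(outcome: UpdateOutcome): string[] {
  // Return a copy so callers cannot modify the shared table.
  return UPDATE_EVENT_SEQUENCES[outcome].slice();
}
```

In practice this means an application can wait for updateend to learn that the SourceBuffer is idle again, and use update, error, or abort to learn why.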

3.5 Algorithms
     

3.5.1 Segment Parser Loop
       

All SourceBuffer objects have an internal append state variable that keeps track of the high-level segment parsing state. It is initially set to WAITING_FOR_SEGMENT and can transition to the following states as data is appended.

Append state name       Description
WAITING_FOR_SEGMENT     Waiting for the start of an initialization segment or media segment to be appended.
PARSING_INIT_SEGMENT    Currently parsing an initialization segment.
PARSING_MEDIA_SEGMENT   Currently parsing a media segment.

The input buffer is a byte buffer that is used to hold unparsed bytes across appendBuffer() calls. The buffer is empty when the SourceBuffer object is created.

The buffer full flag keeps track of whether appendBuffer()
is allowed to accept more bytes. It is set to false when the
SourceBuffer object is created and gets updated as data is appended and
removed.

The group start timestamp variable keeps track of the starting timestamp for a new coded frame group in the "sequence" mode. It is unset when the SourceBuffer object is created and gets updated when the mode attribute equals "sequence" and the timestampOffset attribute is set, or the coded frame processing algorithm runs.

The group end timestamp variable stores the highest coded frame end timestamp across all coded frames in the current coded frame group. It is set to 0 when the SourceBuffer object is created and gets updated by the coded frame processing algorithm.

Note

The group end timestamp stores the highest coded frame end timestamp across all track buffers in a SourceBuffer. Therefore, care should be taken in setting the mode attribute when appending multiplexed segments in which the timestamps are not aligned across tracks.

The generate timestamps flag is a boolean variable that keeps track of whether timestamps need to be generated for the coded frames passed to the coded frame processing algorithm. This flag is set by addSourceBuffer() when the SourceBuffer object is created.

When the segment parser loop algorithm is invoked, run the following steps:

  1. Loop Top: If the input buffer is empty, then jump to the need more data step below.

  2. If the input buffer contains bytes that violate the SourceBuffer byte stream format specification, then run the append error algorithm and abort this algorithm.

  3. Remove any bytes that the byte stream format specifications say MUST be ignored from the start of the input buffer.

  4. If the append state equals WAITING_FOR_SEGMENT, then run the following steps:

    1. If the beginning of the input buffer indicates the start of an initialization segment, set the append state to PARSING_INIT_SEGMENT.

    2. If the beginning of the input buffer indicates the start of a media segment, set the append state to PARSING_MEDIA_SEGMENT.

    3. Jump to the loop top step above.

  5. If the append state equals PARSING_INIT_SEGMENT, then run the following steps:

    1. If the input buffer does not contain a complete initialization segment yet, then jump to the need more data step below.

    2. Run the initialization segment received algorithm.

    3. Remove the initialization segment bytes from the beginning of the input buffer.

    4. Set append state to WAITING_FOR_SEGMENT.

    5. Jump to the loop top step above.

  6. If the append state equals PARSING_MEDIA_SEGMENT, then run the following steps:

    1. If the first initialization segment received flag is false, then run the append error algorithm and abort this algorithm.

    2. If the input buffer contains one or more complete coded frames, then run the coded frame processing algorithm.

      Note

      The frequency at which the coded frame processing algorithm is run is implementation-specific. The coded frame processing algorithm MAY be called when the input buffer contains the complete media segment or it MAY be called multiple times as complete coded frames are added to the input buffer.

    3. If this SourceBuffer is full and cannot accept more media data, then set the buffer full flag to true.

    4. If the input buffer does not contain a complete media segment, then jump to the need more data step below.

    5. Remove the media segment bytes from the beginning of the input buffer.

    6. Set append state to WAITING_FOR_SEGMENT.

    7. Jump to the loop top step above.

  7. Need more data: Return control to the calling algorithm.
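The loop above is essentially a three-state machine driven by the input buffer. The following Python sketch models it under stated assumptions: the byte stream format is a hypothetical toy framing (one tag byte, `I` for an initialization segment or `M` for a media segment, then a one-byte payload length, then the payload), `ToySourceBuffer` and `AppendError` are illustrative names, step 3 (ignored bytes) is not modelled, and coded frames are only processed once a whole media segment has arrived, which the Note above permits.

```python
from dataclasses import dataclass, field

WAITING_FOR_SEGMENT = "WAITING_FOR_SEGMENT"
PARSING_INIT_SEGMENT = "PARSING_INIT_SEGMENT"
PARSING_MEDIA_SEGMENT = "PARSING_MEDIA_SEGMENT"


class AppendError(Exception):
    """Stands in for running the append error algorithm."""


@dataclass
class ToySourceBuffer:
    append_state: str = WAITING_FOR_SEGMENT
    input_buffer: bytearray = field(default_factory=bytearray)
    first_init_segment_received: bool = False
    parsed: list = field(default_factory=list)

    def _complete_segment_length(self):
        """Length of the segment at the buffer start, or None if incomplete."""
        if len(self.input_buffer) < 2:
            return None
        total = 2 + self.input_buffer[1]
        return total if len(self.input_buffer) >= total else None

    def _consume(self, kind, length):
        """Record the payload, drop the segment bytes, return to WAITING."""
        self.parsed.append((kind, bytes(self.input_buffer[2:length])))
        del self.input_buffer[:length]
        self.append_state = WAITING_FOR_SEGMENT

    def segment_parser_loop(self):
        while True:                                    # "Loop Top"
            if not self.input_buffer:                  # step 1
                return                                 # "need more data"
            tag = self.input_buffer[0]
            if tag not in b"IM":                       # step 2 (toy format check)
                raise AppendError("byte stream format violation")
            if self.append_state == WAITING_FOR_SEGMENT:       # step 4
                self.append_state = (PARSING_INIT_SEGMENT if tag == ord("I")
                                     else PARSING_MEDIA_SEGMENT)
                continue
            if (self.append_state == PARSING_MEDIA_SEGMENT
                    and not self.first_init_segment_received):  # step 6.1
                raise AppendError("media segment before initialization segment")
            length = self._complete_segment_length()
            if length is None:                         # steps 5.1 and 6.4
                return
            if self.append_state == PARSING_INIT_SEGMENT:       # steps 5.2 to 5.4
                self.first_init_segment_received = True
                self._consume("init", length)
            else:                                      # steps 6.2, 6.5, 6.6
                self._consume("media", length)
```

Appending `b"I\x02ab"` followed by a partial `b"M\x03x"` leaves the model in PARSING_MEDIA_SEGMENT with the unparsed bytes retained, which is the behaviour the input buffer exists to provide across appendBuffer() calls.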

3.5.2 Reset Parser State
       

When the parser state needs to be reset, run the following steps:

  1. If the append state equals PARSING_MEDIA_SEGMENT and the input buffer contains some complete coded frames, then run the coded frame processing algorithm until all of these complete coded frames have been processed.

  2. Unset the last decode timestamp on all track buffers.

  3. Unset the last frame duration on all track buffers.

  4. Unset the highest end timestamp on all track buffers.

  5. Set the need random access point flag on all track buffers to true.

  6. If the mode attribute equals "sequence", then set the group start timestamp to the group end timestamp.

  7. Remove all bytes from the input buffer.

  8. Set append state to WAITING_FOR_SEGMENT.

3.5.3 Append Error Algorithm
       

This algorithm is called when an error occurs during an append.

  1. Run the reset parser state algorithm.

  2. Set the updating attribute to false.

  3. Queue a task to fire a simple event named error at this SourceBuffer object.

  4. Queue a task to fire a simple event named updateend at this SourceBuffer object.

  5. Run the end of stream algorithm with the error parameter set to "decode".

3.5.4 Prepare Append Algorithm
       

When an append operation begins, the following steps are run to validate and prepare the SourceBuffer.

  1. If the SourceBuffer has been removed from the sourceBuffers attribute of the parent media source then throw an InvalidStateError exception and abort these steps.

  2. If the updating attribute equals true, then throw an InvalidStateError exception and abort these steps.

  3. If the HTMLMediaElement.error attribute is not null, then throw an InvalidStateError exception and abort these steps.

  4. If the readyState attribute of the parent media source is in the "ended" state then run the following steps:

    1. Set the readyState attribute of the parent media source to "open"

    2. Queue a task to fire a simple event named sourceopen at the parent media source.

  5. Run the coded frame eviction algorithm.

  6. If the buffer full flag equals true, then throw a QuotaExceededError exception and abort these steps.

    Note

    This is the signal that the implementation was
    unable to evict enough data to accommodate the append or the append is
    too big. The web application SHOULD use remove() to explicitly free up space and/or reduce the size of the append.
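The validation order matters to applications because it determines which exception they see. The sketch below models the six steps in Python. All names (`ToyMediaSource`, `AppendTarget`, `prepare_append`, and the exception classes standing in for the DOM exceptions) are illustrative assumptions, and the eviction step is a no-op placeholder.

```python
from dataclasses import dataclass, field


class InvalidStateError(Exception):
    """Stand-in for the DOM InvalidStateError exception."""


class QuotaExceededError(Exception):
    """Stand-in for the DOM QuotaExceededError exception."""


@dataclass
class ToyMediaSource:
    ready_state: str = "open"
    queued_events: list = field(default_factory=list)


@dataclass
class AppendTarget:
    parent: ToyMediaSource
    attached: bool = True             # still listed in parent.sourceBuffers
    updating: bool = False
    media_element_error: object = None
    buffer_full: bool = False

    def evict_coded_frames(self):
        """Placeholder for the coded frame eviction algorithm (no-op here)."""


def prepare_append(sb):
    if not sb.attached:                                  # step 1
        raise InvalidStateError("SourceBuffer was removed from its parent")
    if sb.updating:                                      # step 2
        raise InvalidStateError("an update is already in progress")
    if sb.media_element_error is not None:               # step 3
        raise InvalidStateError("media element is in an error state")
    if sb.parent.ready_state == "ended":                 # step 4
        sb.parent.ready_state = "open"
        sb.parent.queued_events.append("sourceopen")
    sb.evict_coded_frames()                              # step 5
    if sb.buffer_full:                                   # step 6
        raise QuotaExceededError("unable to evict enough data")
```

An application that catches the quota error would typically call remove() on an already played range, wait for updateend, and retry the append with the same or a smaller chunk.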

3.5.5 Buffer Append Algorithm
       

When appendBuffer() is called, the following steps are run to process the appended data.

  1. Run the segment parser loop algorithm.

  2. If the segment parser loop algorithm in the previous step was aborted, then abort this algorithm.

  3. Set the updating attribute to false.

  4. Queue a task to fire a simple event named update at this SourceBuffer object.

  5. Queue a task to fire a simple event named updateend at this SourceBuffer object.
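Taken together with the append error algorithm, these steps fix the order of events an application observes after appendBuffer(). The Python sketch below models only that ordering. `buffer_append`, the dictionary used for the SourceBuffer state, and the `parse` callback are hypothetical, and the reset parser state step of the append error algorithm is left out.

```python
class AppendError(Exception):
    """Raised by the parser stand-in when the segment parser loop aborts."""


def buffer_append(source_buffer, parse):
    """Run parse() and return the names of the events queued afterwards."""
    events = []
    try:
        parse()                                   # step 1: segment parser loop
    except AppendError:
        # Append error algorithm; the buffer append algorithm itself aborts.
        source_buffer["updating"] = False
        events += ["error", "updateend"]
        source_buffer["end_of_stream_error"] = "decode"
        return events
    source_buffer["updating"] = False             # step 3
    events += ["update", "updateend"]             # steps 4 and 5
    return events
```

In both outcomes updateend is the last event queued, so it is a convenient place to schedule the next append or to inspect the error state.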

3.5.6 Range Removal
       

Follow these steps when a caller needs to initiate a
JavaScript visible range removal operation that blocks other
SourceBuffer updates:

  1. Let start equal the starting presentation timestamp for the removal range, in seconds measured from presentation start time.

  2. Let end equal the end presentation timestamp for the removal range, in seconds measured from presentation start time.

  3. Set the updating attribute to true.

  4. Queue a task to fire a simple event named updatestart at this SourceBuffer object.

  5. Return control to the caller and run the rest of the steps asynchronously.

  6. Run the coded frame removal algorithm with start and end as the start and end of the removal range.

  7. Set the updating attribute to false.

  8. Queue a task to fire a simple event named update at this SourceBuffer object.

  9. Queue a task to fire a simple event named updateend at this SourceBuffer object.

3.5.7 Initialization Segment Received
       

The following steps are run when the segment parser loop successfully parses a complete initialization segment:

Each SourceBuffer object has an internal first initialization segment received flag that tracks whether the first initialization segment has been appended and received by this algorithm. This flag is set to false when the SourceBuffer is created and updated by the algorithm below.

  1. Update the duration attribute if it currently equals NaN:

    • If the initialization segment contains a duration:
      Run the duration change algorithm with new duration set to the duration in the initialization segment.

    • Otherwise:
      Run the duration change algorithm with new duration set to positive Infinity.

  2. If the initialization segment has no audio, video, or text tracks, then run the append error algorithm and abort these steps.

  3. If the first initialization segment received flag is true, then run the following steps:

    1. Verify the following properties. If any of the checks fail then run the append error algorithm and abort these steps.

      • The number of audio, video, and text tracks match what was in the first initialization segment.

      • The codecs for each track match what was specified in the first initialization segment.

      • If more than one track for a single type are present (e.g., 2 audio tracks), then the Track IDs match the ones in the first initialization segment.

    2. Add the appropriate track descriptions from this initialization segment to each of the track buffers.

    3. Set the need random access point flag on all track buffers to true.

  4. Let active track flag equal false.

  5. If the first initialization segment received flag is false, then run the following steps:

    1. If the initialization segment contains tracks with codecs the user agent does not support, then run the append error algorithm and abort these steps.

      Note

      User agents MAY consider codecs, that would otherwise be supported, as "not supported" here if the codecs were not specified in the type parameter passed to addSourceBuffer(). For example, MediaSource.isTypeSupported('video/webm;codecs="vp8,vorbis"') may return true, but if addSourceBuffer() was called with 'video/webm;codecs="vp8"' and a Vorbis track appears in the initialization segment, then the user agent MAY use this step to trigger a decode error.

    2. For each audio track in the initialization segment, run following steps:

      1. Let audio byte stream track ID be the Track ID for the current track being processed.

      2. Let audio language be a BCP 47 language tag for the language specified in the initialization segment for this track or an empty string if no language info is present.

      3. If audio language equals the 'und' BCP 47 value, then assign an empty string to audio language.

      4. Let audio label be a label specified in the initialization segment for this track or an empty string if no label info is present.

      5. Let audio kinds be a sequence of kind strings specified in the initialization segment for this track or a sequence with a single empty string element in it if no kind information is provided.

      6. For each value in audio kinds, run the following steps:

        1. Let current audio kind equal the value from audio kinds for this iteration of the loop.

        2. Let new audio track be a new AudioTrack object.

        3. Generate a unique ID and assign it to the id property on new audio track.

        4. Assign audio language to the language property on new audio track.

        5. Assign audio label to the label property on new audio track.

        6. Assign current audio kind to the kind property on new audio track.

        7. If audioTracks.length equals 0, then run the following steps:

          1. Set the enabled property on new audio track to true.

          2. Set active track flag to true.

        8. Add new audio track to the audioTracks attribute on this SourceBuffer object.

          Note

          This should trigger AudioTrackList [HTML51] logic to queue a task to fire a trusted event named addtrack, that does not bubble and is not cancelable, and that uses the TrackEvent interface, with the track attribute initialized to new audio track, at the AudioTrackList object referenced by the audioTracks attribute on this SourceBuffer object.

        9. Add new audio track to the audioTracks attribute on the HTMLMediaElement.

          Note

          This should trigger AudioTrackList [HTML51] logic to queue a task to fire a trusted event named addtrack, that does not bubble and is not cancelable, and that uses the TrackEvent interface, with the track attribute initialized to new audio track, at the AudioTrackList object referenced by the audioTracks attribute on the HTMLMediaElement.

      7. Create a new track buffer to store coded frames for this track.

      8. Add the track description for this track to the track buffer.

    3. For each video track in the initialization segment, run following steps:

      1. Let video byte stream track ID be the Track ID for the current track being processed.

      2. Let video language be a BCP 47 language tag for the language specified in the initialization segment for this track or an empty string if no language info is present.

      3. If video language equals the 'und' BCP 47 value, then assign an empty string to video language.

      4. Let video label be a label specified in the initialization segment for this track or an empty string if no label info is present.

      5. Let video kinds be a sequence of kind strings specified in the initialization segment for this track or a sequence with a single empty string element in it if no kind information is provided.

      6. For each value in video kinds, run the following steps:

        1. Let current video kind equal the value from video kinds for this iteration of the loop.

        2. Let new video track be a new VideoTrack object.

        3. Generate a unique ID and assign it to the id property on new video track.

        4. Assign video language to the language property on new video track.

        5. Assign video label to the label property on new video track.

        6. Assign current video kind to the kind property on new video track.

        7. If videoTracks.length equals 0, then run the following steps:

          1. Set the selected property on new video track to true.

          2. Set active track flag to true.

        8. Add new video track to the videoTracks attribute on this SourceBuffer object.

          Note

          This should trigger VideoTrackList [HTML51] logic to queue a task to fire a trusted event named addtrack, that does not bubble and is not cancelable, and that uses the TrackEvent interface, with the track attribute initialized to new video track, at the VideoTrackList object referenced by the videoTracks attribute on this SourceBuffer object.

        9. Add new video track to the videoTracks attribute on the HTMLMediaElement.

          Note

          This should trigger VideoTrackList [HTML51] logic to queue a task to fire a trusted event named addtrack, that does not bubble and is not cancelable, and that uses the TrackEvent interface, with the track attribute initialized to new video track, at the VideoTrackList object referenced by the videoTracks attribute on the HTMLMediaElement.

      7. Create a new track buffer to store coded frames for this track.

      8. Add the track description for this track to the track buffer.

    4. For each text track in the initialization segment, run following steps:

      1. Let text byte stream track ID be the Track ID for the current track being processed.

      2. Let text language be a BCP 47 language tag for the language specified in the initialization segment for this track or an empty string if no language info is present.

      3. If text language equals the 'und' BCP 47 value, then assign an empty string to text language.

      4. Let text label be a label specified in the initialization segment for this track or an empty string if no label info is present.

      5. Let text kinds be a sequence of kind strings specified in the initialization segment for this track or a sequence with a single empty string element in it if no kind information is provided.

      6. For each value in text kinds, run the following steps:

        1. Let current text kind equal the value from text kinds for this iteration of the loop.

        2. Let new text track be a new TextTrack object.

        3. Generate a unique ID and assign it to the id property on new text track.

        4. Assign text language to the language property on new text track.

        5. Assign text label to the label property on new text track.

        6. Assign current text kind to the kind property on new text track.

        7. Populate the remaining properties on new text track with the appropriate information from the initialization segment.

        8. If the mode property on new text track equals "showing" or "hidden", then set active track flag to true.

        9. Add new text track to the textTracks attribute on this SourceBuffer object.

          Note

          This should trigger TextTrackList [HTML51] logic to queue a task to fire a trusted event named addtrack, that does not bubble and is not cancelable, and that uses the TrackEvent interface, with the track attribute initialized to new text track, at the TextTrackList object referenced by the textTracks attribute on this SourceBuffer object.

        10. Add new text track to the textTracks attribute on the HTMLMediaElement.

          Note

          This should trigger TextTrackList [HTML51] logic to queue a task to fire a trusted event named addtrack, that does not bubble and is not cancelable, and that uses the TrackEvent interface, with the track attribute initialized to new text track, at the TextTrackList object referenced by the textTracks attribute on the HTMLMediaElement.

      7. Create a new track buffer to store coded frames for this track.

      8. Add the track description for this track to the track buffer.

    5. If active track flag equals true, then run the following steps:

      1. Add this SourceBuffer to activeSourceBuffers.

      2. Queue a task to fire a simple event named addsourcebuffer at activeSourceBuffers.

    6. Set first initialization segment received flag to true.

  6. If the HTMLMediaElement.readyState attribute is HAVE_NOTHING, then run the following steps:

    1. If one or more objects in sourceBuffers have first initialization segment received flag set to false, then abort these steps.

    2. Set the HTMLMediaElement.readyState attribute to HAVE_METADATA.

      Note

      Per HTMLMediaElement ready states [HTML51] logic, HTMLMediaElement.readyState changes may trigger events on the HTMLMediaElement. This particular transition should trigger HTMLMediaElement logic to queue a task to fire a simple event named loadedmetadata at the media element.

  7. If the active track flag equals true and the HTMLMediaElement.readyState attribute is greater than HAVE_CURRENT_DATA, then set the HTMLMediaElement.readyState attribute to HAVE_METADATA.

    Note

    Per HTMLMediaElement ready states [HTML51] logic, HTMLMediaElement.readyState changes may trigger events on the HTMLMediaElement.
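The per-track steps of this algorithm follow one pattern for audio, video, and text: normalise the language, default the kinds, create one track object per kind, and activate only the first track. The Python sketch below illustrates the audio case. `build_audio_tracks`, the dictionary layout, and the counter-based IDs are hypothetical, and the sketch assumes that the length check refers to the media element's audioTracks list.

```python
from itertools import count

_track_ids = count(1)   # hypothetical source of unique track IDs


def build_audio_tracks(language, label, kinds, media_element_audio_tracks):
    """Create one track description per kind; return the active track flag."""
    if language == "und":        # 'und' (undetermined) maps to an empty string
        language = ""
    if not kinds:                # no kind information: a single empty kind
        kinds = [""]
    active_track_flag = False
    for kind in kinds:
        track = {"id": f"audio-{next(_track_ids)}", "language": language,
                 "label": label, "kind": kind, "enabled": False}
        if len(media_element_audio_tracks) == 0:
            track["enabled"] = True      # only the first audio track is enabled
            active_track_flag = True
        media_element_audio_tracks.append(track)
    return active_track_flag
```

Under this reading, a track that declares two kinds produces two AudioTrack entries, of which only the first is enabled when no audio track existed before.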

3.5.8 Coded Frame Processing
       

When complete coded frames have been parsed by the segment parser loop then the following steps are run:

  1. For each coded frame in the media segment run the following steps:

    1. Loop Top:

      • If generate timestamps flag equals true:

        1. Let presentation timestamp equal 0.

        2. Let decode timestamp equal 0.

      • Otherwise:

        1. Let presentation timestamp be a double precision floating point representation of the coded frame's presentation timestamp in seconds.

          Note

          Special processing may be needed to determine the presentation and decode timestamps for timed text frames since this information may not be explicitly present in the underlying format or may be dependent on the order of the frames. Some metadata text tracks, like MPEG2-TS PSI data, may only have implied timestamps. Format specific rules for these situations SHOULD be in the byte stream format specifications or in separate extension specifications.

        2. Let decode timestamp be a double precision floating point representation of the coded frame's decode timestamp in seconds.

          Note

          Implementations don't have to internally store timestamps in a double precision floating point representation. This representation is used here because it is the representation for timestamps in the HTML spec. The intention here is to make the behavior clear without adding unnecessary complexity to the algorithm to deal with the fact that adding a timestampOffset may cause a timestamp rollover in the underlying timestamp representation used by the byte stream format. Implementations can use any internal timestamp representation they wish, but the addition of timestampOffset SHOULD behave in a similar manner to what would happen if a double precision floating point representation was used.

    2. Let frame duration be a double precision floating point representation of the coded frame's duration in seconds.

    3. If mode equals "sequence" and group start timestamp is set, then run the following steps:

      1. Set timestampOffset equal to group start timestamp - presentation timestamp.

      2. Set group end timestamp equal to group start timestamp.

      3. Set the need random access point flag on all track buffers to true.

      4. Unset group start timestamp.

    4. If timestampOffset is not 0, then run the following steps:

      1. Add timestampOffset to the presentation timestamp.

      2. Add timestampOffset to the decode timestamp.

    5. Let track buffer equal the track buffer that the coded frame will be added to.

    6. • If last decode timestamp for track buffer is set and decode timestamp is less than last decode timestamp, OR

      • If last decode timestamp for track buffer is set and the difference between decode timestamp and last decode timestamp is greater than 2 times last frame duration:

        1. If mode equals "segments", set group end timestamp to presentation timestamp. If mode equals "sequence", set group start timestamp equal to the group end timestamp.

        2. Unset the last decode timestamp on all track buffers.

        3. Unset the last frame duration on all track buffers.

        4. Unset the highest end timestamp on all track buffers.

        5. Set the need random access point flag on all track buffers to true.

        6. Jump to the Loop Top step above to restart processing of the current coded frame.

      • Otherwise:
        Continue.

    7. Let frame end timestamp equal the sum of presentation timestamp and frame duration.

    8. If presentation timestamp is less than appendWindowStart, then set the need random access point flag to true, drop the coded frame, and jump to the top of the loop to start processing the next coded frame.

      Note

      Some implementations MAY choose to collect some of these coded frames with presentation timestamp less than appendWindowStart and use them to generate a splice at the first coded frame that has a presentation timestamp greater than or equal to appendWindowStart even if that frame is not a random access point. Supporting this requires multiple decoders or faster than real-time decoding so for now this behavior will not be a normative requirement.

    9. If frame end timestamp is greater than appendWindowEnd, then set the need random access point flag to true, drop the coded frame, and jump to the top of the loop to start processing the next coded frame.

      Note

      Some implementations MAY choose to collect coded frames with presentation timestamp less than appendWindowEnd and frame end timestamp greater than appendWindowEnd and use them to generate a splice across the portion of the collected coded frames within the append window at time of collection, and the beginning portion of later processed frames which only partially overlap the end of the collected coded frames. Supporting this requires multiple decoders or faster than real-time decoding so for now this behavior will not be a normative requirement. In conjunction with collecting coded frames that span appendWindowStart, implementations MAY thus support gapless audio splicing.

    10. If the need random access point flag on track buffer equals true, then run the following steps:

      1. If the coded frame is not a random access point, then drop the coded frame and jump to the top of the loop to start processing the next coded frame.

      2. Set the need random access point flag on track buffer to false.

    11. Let spliced audio frame be an unset variable for holding audio splice information.

    12. Let spliced timed text frame be an unset variable for holding timed text splice information.

    13. If last decode timestamp for track buffer is unset and presentation timestamp falls within the presentation interval of a coded frame in track buffer, then run the following steps:

      1. Let overlapped frame be the coded frame in track buffer that matches the condition above.

      2. • If track buffer contains audio coded frames:
          Run the audio splice frame algorithm and if a splice frame is returned, assign it to spliced audio frame.

        • If track buffer contains video coded frames:

          1. Let remove window timestamp equal the overlapped frame presentation timestamp plus 1 microsecond.

          2. If the presentation timestamp is less than the remove window timestamp, then remove overlapped frame from track buffer.

            Note

            This is to compensate for minor errors in frame timestamp computations that can appear when converting back and forth between double precision floating point numbers and rationals. This tolerance allows a frame to replace an existing one as long as it is within 1 microsecond of the existing frame's start time. Frames that come slightly before an existing frame are handled by the removal step below.

        • If track buffer contains timed text coded frames:
          Run the text splice frame algorithm and if a splice frame is returned, assign it to spliced timed text frame.

    14. Remove existing coded frames in track buffer:

      • If highest end timestamp for track buffer is not set:
        Remove all coded frames from track buffer that have a presentation timestamp greater than or equal to presentation timestamp and less than frame end timestamp.

      • If highest end timestamp for track buffer is set and less than or equal to presentation timestamp:
        Remove all coded frames from track buffer that have a presentation timestamp greater than or equal to highest end timestamp and less than frame end timestamp.

    15. Remove all possible decoding dependencies on the coded frames removed in the previous two steps by removing all coded frames from track buffer between those frames removed in the previous two steps and the next random access point after those removed frames.

      Note

      Removing all coded frames until the next random access point is a conservative estimate of the decoding dependencies since it assumes all frames between the removed frames and the next random access point depended on the frames that were removed.

    16. • If spliced audio frame is set:
        Add spliced audio frame to the track buffer.

      • If spliced timed text frame is set:
        Add spliced timed text frame to the track buffer.

      • Otherwise:
        Add the coded frame with the presentation timestamp, decode timestamp, and frame duration to the track buffer.

    17. Set last decode timestamp for track buffer to decode timestamp.

    18. Set last frame duration for track buffer to frame duration.

    19. If highest end timestamp for track buffer is unset or frame end timestamp is greater than highest end timestamp, then set highest end timestamp for track buffer to frame end timestamp.

      Note

      The greater than check is needed because bidirectional prediction between coded frames can cause presentation timestamp to not be monotonically increasing even though the decode timestamps are monotonically increasing.

    20. If frame end timestamp is greater than group end timestamp, then set group end timestamp equal to frame end timestamp.

    21. If generate timestamps flag equals true, then set timestampOffset equal to frame end timestamp.

  2. If the HTMLMediaElement.readyState attribute is HAVE_METADATA and the new coded frames cause HTMLMediaElement.buffered to have a TimeRange for the current playback position, then set the HTMLMediaElement.readyState attribute to HAVE_CURRENT_DATA.

    Note

    Per HTMLMediaElement ready states [HTML51] logic, HTMLMediaElement.readyState changes may trigger events on the HTMLMediaElement.

  3. If the HTMLMediaElement.readyState attribute is HAVE_CURRENT_DATA and the new coded frames cause HTMLMediaElement.buffered to have a TimeRange that includes the current playback position and some time beyond the current playback position, then set the HTMLMediaElement.readyState attribute to HAVE_FUTURE_DATA.

    Note

    Per HTMLMediaElement ready states [HTML51] logic, HTMLMediaElement.readyState changes may trigger events on the HTMLMediaElement.

  4. If the HTMLMediaElement.readyState attribute is HAVE_FUTURE_DATA and the new coded frames cause HTMLMediaElement.buffered to have a TimeRange that includes the current playback position and enough data to ensure uninterrupted playback, then set the HTMLMediaElement.readyState attribute to HAVE_ENOUGH_DATA.

    Note

    Per HTMLMediaElement ready states [HTML51] logic, HTMLMediaElement.readyState changes may trigger events on the HTMLMediaElement.

  5. If the media segment contains data beyond the current duration, then run the duration change algorithm with new duration set to the maximum of the current duration and the group end timestamp.
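The timestamp bookkeeping in this algorithm is easier to follow in code. The Python sketch below models only that part for a single track buffer: the "sequence" mode offset, the restart at Loop Top after a discontinuity, and the append window check. `FrameGroupState` and `place_frame` are hypothetical names. The discontinuity test (a decode timestamp that moves backwards, or that jumps by more than twice the last frame duration) is taken from the published MSE Recommendation. Random access point handling, splicing, overlap removal, and the generate timestamps flag are deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FrameGroupState:
    mode: str = "segments"
    timestamp_offset: float = 0.0
    group_start: Optional[float] = None      # unset until a new group begins
    group_end: float = 0.0
    last_decode: Optional[float] = None
    last_duration: Optional[float] = None
    append_window_start: float = 0.0
    append_window_end: float = float("inf")


def place_frame(st, pts, dts, duration):
    """Return the (presentation, decode) timestamps used for buffering,
    or None if the append window drops the frame."""
    while True:                                       # "Loop Top"
        p, d = pts, dts
        if st.mode == "sequence" and st.group_start is not None:
            st.timestamp_offset = st.group_start - p
            st.group_end = st.group_start
            st.group_start = None
        if st.timestamp_offset != 0:
            p += st.timestamp_offset
            d += st.timestamp_offset
        discontinuity = st.last_decode is not None and (
            d < st.last_decode or d - st.last_decode > 2 * st.last_duration)
        if not discontinuity:
            break
        if st.mode == "segments":
            st.group_end = p
        else:
            st.group_start = st.group_end
        st.last_decode = st.last_duration = None      # then restart the frame
    frame_end = p + duration
    if p < st.append_window_start or frame_end > st.append_window_end:
        return None
    st.last_decode, st.last_duration = d, duration
    st.group_end = max(st.group_end, frame_end)
    return p, d
```

In "sequence" mode a frame stamped 100 s that arrives after a group ending at 12 s is placed at 12 s and timestampOffset becomes -88 s, which is how that mode plays segments back to back regardless of their original timestamps.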

3.5.9 Coded Frame Removal Algorithm
       

Follow these steps when coded frames for a specific time range need to be removed from the SourceBuffer:

  1. Let start be the starting presentation timestamp for the removal range.

  2. Let end be the end presentation timestamp for the removal range.

  3. For each track buffer in this source buffer, run the following steps:

    1. Let remove end timestamp be the current value of duration.

    2. If this track buffer has a random access point timestamp that is greater than or equal to end, then update remove end timestamp to that random access point timestamp.

      Note

      Random access point timestamps can be different across tracks because the dependencies between coded frames within a track are usually different than the dependencies in another track.

    3. Remove all media data, from this track buffer, that contain starting timestamps greater than or equal to start and less than the remove end timestamp.

      1. For each removed frame, if the frame has a decode timestamp equal to the last decode timestamp for the frame's track, run the following steps:

        • If mode equals "segments":
          Set group end timestamp to presentation timestamp.

        • If mode equals "sequence":
          Set group start timestamp equal to the group end timestamp.

      2. Unset the last decode timestamp on all track buffers.

      3. Unset the last frame duration on all track buffers.

      4. Unset the highest end timestamp on all track buffers.

      5. Set the need random access point flag on all track buffers to true.

    4. Remove all possible decoding dependencies on the coded frames removed in the previous step by removing all coded frames from this track buffer between those frames removed in the previous step and the next random access point after those removed frames.

      Note

      Removing all coded frames until the next random access point is a conservative estimate of the decoding dependencies since it assumes all frames between the removed frames and the next random access point depended on the frames that were removed.

    5. If this object is in activeSourceBuffers, the current playback position is greater than or equal to start and less than the remove end timestamp, and HTMLMediaElement.readyState is greater than HAVE_METADATA, then set the HTMLMediaElement.readyState attribute to HAVE_METADATA and stall playback.

      Note

      Per HTMLMediaElement ready states [HTML51] logic, HTMLMediaElement.readyState changes may trigger events on the HTMLMediaElement.

      Note

      This transition occurs because media data for the current position has been removed. Playback cannot progress until media for the current playback position is appended or the selected/enabled tracks change.

  4. If buffer full flag equals true and this object is ready to accept more bytes, then set the buffer full flag to false.
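A practical consequence of the remove end timestamp is that a removal can extend past the requested end, up to the next random access point. The Python sketch below shows this for one track buffer. `coded_frame_removal` and the (timestamp, is random access point) tuples are hypothetical, and the sketch assumes that "a random access point timestamp that is greater than or equal to end" means the earliest such timestamp.

```python
def coded_frame_removal(frames, start, end, duration):
    """frames: (presentation_timestamp, is_random_access_point) tuples.

    Returns (kept_frames, remove_end_timestamp).
    """
    remove_end = duration
    later_raps = [pts for pts, is_rap in frames if is_rap and pts >= end]
    if later_raps:
        remove_end = min(later_raps)      # assumption: earliest RAP at or after end
    # Because the range is widened to a random access point (or to duration),
    # no dependent frames are left behind between the removed frames and the
    # next random access point.
    kept = [(pts, is_rap) for pts, is_rap in frames
            if not (start <= pts < remove_end)]
    return kept, remove_end
```

With random access points every 4 seconds, a request to remove the range from 2 to 5 seconds also removes the frames at 5, 6, and 7 seconds, so an application should read the buffered attribute after updateend instead of assuming the exact range was removed.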

3.5.10 Coded Frame Eviction Algorithm
       

This algorithm is run to free up space in this source buffer when new data is appended.

  1. Let new data equal the data that is about to be appended to this SourceBuffer.

  2. If the buffer full flag equals false, then abort these steps.

  3. Let removal ranges equal a list of presentation time ranges that can be evicted from the presentation to make room for the new data.

    Note

    Implementations MAY use different methods for selecting removal ranges so web applications SHOULD NOT depend on a specific behavior. The web application can use the buffered attribute to observe whether portions of the buffered data have been evicted.

  4. For each range in removal ranges, run the coded frame removal algorithm with start and end equal to the removal range start and end timestamp respectively.

3.5.11 Audio Splice Frame Algorithm
       

Follow these steps when the coded frame processing algorithm needs to generate a splice frame for two overlapping audio coded frames:

  1. Let track buffer be the track buffer that will contain the splice.

  2. Let new coded frame be the new coded frame, that is being added to track buffer, which triggered the need for a splice.

  3. Let presentation timestamp be the presentation timestamp for new coded frame.

  4. Let decode timestamp be the decode timestamp for new coded frame.

  5. Let frame duration be the coded frame duration of new coded frame.

  6. Let overlapped frame be the coded frame in track buffer with a presentation interval that contains presentation timestamp.

  7. Update presentation timestamp and decode timestamp to the nearest audio sample timestamp based on sample rate of the audio in overlapped frame. If a timestamp is equidistant from both audio sample timestamps, then use the higher timestamp (e.g., floor(x * sample_rate + 0.5) / sample_rate).

    Note

    For example, given the following values:

    • The presentation timestamp of overlapped frame equals 10.

    • The sample rate of overlapped frame equals 8000 Hz.

    • presentation timestamp equals 10.01255

    • decode timestamp equals 10.01255

    presentation timestamp and decode timestamp are updated to 10.0125 since 10.01255 is closer to 10 + 100/8000 (10.0125) than 10 + 101/8000 (10.012625).

  8. If the user agent does not support crossfading then run the following steps:

    1. Remove overlapped frame from track buffer.

    2. Add a silence frame to track buffer with the following properties:

      Note

      Some implementations MAY apply fades to/from silence to coded frames on either side of the inserted silence to make the transition less jarring.

    3. Return to caller without providing a splice frame.

      Note

      This is intended to allow new coded frame to be added to the track buffer as if overlapped frame had not been in the track buffer to begin with.

  9. Let frame end timestamp equal the sum of presentation timestamp and frame duration.

  10. Let splice end timestamp equal the sum of presentation timestamp and the splice duration of 5 milliseconds.

  11. Let fade out coded frames equal overlapped frame as well as any additional frames in track buffer that have a presentation timestamp greater than presentation timestamp and less than splice end timestamp.

  12. Remove all the frames included in fade out coded frames from track buffer.

  13. Return a splice frame with the following properties:

    Note

    See the audio splice rendering algorithm for details on how this splice frame is rendered.
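The timestamp arithmetic in steps 7, 9, and 10 can be sketched as follows. This is only an illustration under the assumption that timestamps are plain numbers in seconds; `roundToSample`, `spliceWindow`, and `SPLICE_DURATION` are hypothetical names, not identifiers from the specification.

```javascript
// Step 7: round a timestamp (seconds) to the nearest audio sample boundary.
// Ties round up, matching floor(x * sample_rate + 0.5) / sample_rate.
function roundToSample(timestamp, sampleRate) {
  return Math.floor(timestamp * sampleRate + 0.5) / sampleRate;
}

// The splice duration of 5 milliseconds, expressed in seconds.
const SPLICE_DURATION = 0.005;

// Steps 9 and 10: the end of the new frame and the end of the crossfade window.
function spliceWindow(presentationTimestamp, frameDuration) {
  return {
    frameEndTimestamp: presentationTimestamp + frameDuration,
    spliceEndTimestamp: presentationTimestamp + SPLICE_DURATION,
  };
}
```

With the values from the note, `roundToSample(10.01255, 8000)` yields 10.0125.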

3.5.12 Audio Splice Rendering Algorithm
       

The following steps are run when a spliced frame, generated by the audio splice frame algorithm, needs to be rendered by the media element:

  1. Let fade out coded frames be the coded frames that are faded out during the splice.

  2. Let fade in coded frames be the coded frames that are faded in during the splice.

  3. Let presentation timestamp be the presentation timestamp of the first coded frame in fade out coded frames.

  4. Let end timestamp be the sum of the presentation timestamp and the coded frame duration of the last frame in fade in coded frames.

  5. Let splice timestamp be the presentation timestamp where the splice starts. This corresponds with the presentation timestamp of the first frame in fade in coded frames.

  6. Let splice end timestamp equal splice timestamp plus five milliseconds.

  7. Let fade out samples be the samples generated by decoding fade out coded frames.

  8. Trim fade out samples so that it only contains samples between presentation timestamp and splice end timestamp.

  9. Let fade in samples be the samples generated by decoding fade in coded frames.

  10. If fade out samples and fade in samples do not have a common sample rate and channel layout, then convert fade out samples and fade in samples to a common sample rate and channel layout.

  11. Let output samples be a buffer to hold the output samples.

  12. Apply a linear gain fade out with a starting gain of 1 and an ending gain of 0 to the samples between splice timestamp and splice end timestamp in fade out samples.

  13. Apply a linear gain fade in with a starting gain of 0 and an ending gain of 1 to the samples between splice timestamp and splice end timestamp in fade in samples.

  14. Copy samples between presentation timestamp and splice timestamp from fade out samples into output samples.

  15. For each sample between splice timestamp and splice end timestamp, compute the sum of a sample from fade out samples and the corresponding sample in fade in samples and store the result in output samples.

  16. Copy samples between splice end timestamp and end timestamp from fade in samples into output samples.

  17. Render output samples.

Note

Here is a graphical representation of this algorithm.

[Figure: Audio splice diagram]
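Steps 11 through 16 amount to a linear crossfade. The following is a minimal sketch for a single channel, assuming both inputs already share a sample rate (step 10) and that fade out samples has been trimmed (step 8). The function name `renderSplice` and the `overlap` parameter (the number of samples in the 5 millisecond window) are assumptions made for illustration.

```javascript
// fadeOutSamples: decoded samples from presentation timestamp to splice end timestamp.
// fadeInSamples: decoded samples from splice timestamp to end timestamp.
// overlap: number of samples between splice timestamp and splice end timestamp.
function renderSplice(fadeOutSamples, fadeInSamples, overlap) {
  if (overlap > fadeOutSamples.length || overlap > fadeInSamples.length) {
    throw new RangeError('overlap exceeds the available samples');
  }
  const head = fadeOutSamples.length - overlap; // samples before the splice timestamp
  const output = new Float32Array(head + fadeInSamples.length);

  // Step 14: copy the untouched start of the fade out samples.
  for (let i = 0; i < head; i++) output[i] = fadeOutSamples[i];

  // Steps 12, 13, 15: fade out gain runs 1 -> 0, fade in gain runs 0 -> 1, then sum.
  for (let i = 0; i < overlap; i++) {
    const t = overlap > 1 ? i / (overlap - 1) : 1;
    output[head + i] = fadeOutSamples[head + i] * (1 - t) + fadeInSamples[i] * t;
  }

  // Step 16: copy the remainder of the fade in samples.
  for (let i = overlap; i < fadeInSamples.length; i++) output[head + i] = fadeInSamples[i];

  return output; // Step 17 would hand this buffer to the audio renderer.
}
```

At 8000 Hz the 5 millisecond window corresponds to 40 samples, so `overlap` would be 40 in that case.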
         

3.5.13 Text Splice Frame Algorithm
       

Follow these steps when the coded frame processing algorithm needs to generate a splice frame for two overlapping timed text coded frames:

  1. Let track buffer be the track buffer that will contain the splice.

  2. Let new coded frame be the new coded frame, that is being added to track buffer, which triggered the need for a splice.

  3. Let presentation timestamp be the presentation timestamp for new coded frame.

  4. Let decode timestamp be the decode timestamp for new coded frame.

  5. Let frame duration be the coded frame duration of new coded frame.

  6. Let frame end timestamp equal the sum of presentation timestamp and frame duration.

  7. Let first overlapped frame be the coded frame in track buffer with a presentation interval that contains presentation timestamp.

  8. Let overlapped presentation timestamp be the presentation timestamp of the first overlapped frame.

  9. Let overlapped frames equal first overlapped frame as well as any additional frames in track buffer that have a presentation timestamp greater than presentation timestamp and less than frame end timestamp.

  10. Remove all the frames included in overlapped frames from track buffer.

  11. Update the coded frame duration of the first overlapped frame to presentation timestamp - overlapped presentation timestamp.

  12. Add first overlapped frame to the track buffer.

  13. Return to caller without providing a splice frame.

    Note

    This is intended to allow new coded frame to be added to the track buffer as if it hadn't overlapped any frames in track buffer to begin with.
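The text splice steps can be modelled on plain data. The sketch below assumes frames are `{ pts, duration }` objects in seconds and that the track buffer is an array. Both are simplifications chosen for illustration, and `spliceTextFrame` is a hypothetical name.

```javascript
// Returns a new track buffer in which frames overlapped by newFrame are removed
// and the first overlapped frame is shortened to end where newFrame begins.
function spliceTextFrame(trackBuffer, newFrame) {
  const pts = newFrame.pts;
  const frameEnd = pts + newFrame.duration;

  // Step 7: the frame whose presentation interval contains pts.
  const first = trackBuffer.find((f) => f.pts <= pts && pts < f.pts + f.duration);
  if (!first) return trackBuffer.slice(); // nothing overlaps, nothing to splice

  // Steps 9 and 10: drop the first overlapped frame and any frame starting inside (pts, frameEnd).
  const kept = trackBuffer.filter(
    (f) => f !== first && !(f.pts > pts && f.pts < frameEnd)
  );

  // Steps 11 and 12: re-add the first overlapped frame with a truncated duration.
  const truncated = { ...first, duration: pts - first.pts };
  return [...kept, truncated].sort((a, b) => a.pts - b.pts);
}
```

The caller would then add the new coded frame to the returned buffer as if no overlap had existed.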

4. SourceBufferList Object
   

SourceBufferList is a simple container object for SourceBuffer objects. It provides read-only array access and fires events when the list is modified.

interface SourceBufferList : EventTarget {
    readonly attribute unsigned long length;
             attribute EventHandler  onaddsourcebuffer;
             attribute EventHandler  onremovesourcebuffer;
    getter SourceBuffer (unsigned long index);
};

   

4.1 Attributes
     

4.2 Methods
     

  • getter

  • Allows the SourceBuffer objects in the list to be accessed with an array operator (i.e., []).

    Parameter Type Nullable Optional Description
    index unsigned long

    Return type: SourceBuffer

    When this method is invoked, the user agent must run the following steps:

    1. If index is greater than or equal to the length attribute then return undefined and abort these steps.

    2. Return the index'th SourceBuffer object in the list.
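From script, the getter is exercised with ordinary bracket indexing. The helpers below are a small sketch that works on any object exposing `length` and indexed access; the names `getSourceBuffer` and `sourceBuffersToArray` are illustrative and not defined by the specification.

```javascript
// Mirrors the getter steps: an index at or beyond length yields undefined.
function getSourceBuffer(list, index) {
  if (index >= list.length) return undefined;
  return list[index];
}

// Copy the list into a plain array so array methods can be used on it.
function sourceBuffersToArray(list) {
  const out = [];
  for (let i = 0; i < list.length; i++) out.push(list[i]);
  return out;
}
```

In a page these could be called as `sourceBuffersToArray(mediaSource.sourceBuffers)`.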

4.3 Event Summary
     

Event name          Interface  Dispatched when…
addsourcebuffer     Event      When a SourceBuffer is added to the list.
removesourcebuffer  Event      When a SourceBuffer is removed from the list.
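One possible way to react to these two events is sketched below. `trackSourceBufferCount` and its `onChange` callback are hypothetical names; the only things taken from the specification are the event names and the `length` attribute.

```javascript
// Call onChange with the current list length whenever a SourceBuffer is added or removed.
function trackSourceBufferCount(list, onChange) {
  const report = () => onChange(list.length);
  list.addEventListener('addsourcebuffer', report);
  list.addEventListener('removesourcebuffer', report);

  // Return a function that detaches both listeners again.
  return () => {
    list.removeEventListener('addsourcebuffer', report);
    list.removeEventListener('removesourcebuffer', report);
  };
}
```

A page might call `trackSourceBufferCount(mediaSource.sourceBuffers, (n) => console.log(n))` and invoke the returned function during teardown.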

5. URL Object Extensions
   

This section specifies extensions to the URL [FILE-API] object definition.

[Exposed=Window]
partial interface URL {
    static DOMString createObjectURL(MediaSource mediaSource);
};

   

5.1 Methods
     

  • createObjectURL, static

  • Creates URLs for MediaSource objects.

    Note

    This algorithm is intended to mirror the behavior of the createObjectURL() [FILE-API] method, which does not auto-revoke the created URL. Web authors are encouraged to use revokeObjectURL() [FILE-API] for any MediaSource object URL that is no longer needed for attachment to a media element.

    Parameter Type Nullable Optional Description
    mediaSource MediaSource

    Return type: DOMString

    When this method is invoked, the user agent must run the following steps:

    1. Return a unique MediaSource object URL that can be used to dereference the mediaSource argument.
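Because the created URL is not auto-revoked, an application may want to pair createObjectURL() with revokeObjectURL(). The sketch below shows one way to do so. It assumes the URL is no longer needed for attachment once sourceopen has fired, which is a common pattern rather than a requirement of this specification. `attachMediaSource` and the `urlApi` parameter are hypothetical; `urlApi` exists only so the sketch can be exercised with stand-in objects.

```javascript
// Attach mediaSource to mediaElement and revoke the object URL after attachment.
function attachMediaSource(mediaElement, mediaSource, urlApi = URL) {
  const objectUrl = urlApi.createObjectURL(mediaSource);

  mediaSource.addEventListener(
    'sourceopen',
    () => {
      // Assumption: after 'sourceopen' the URL is not needed for attachment any more.
      urlApi.revokeObjectURL(objectUrl);
    },
    { once: true }
  );

  mediaElement.src = objectUrl;
  return objectUrl;
}
```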

6. HTMLMediaElement Extensions
   

This section specifies what existing attributes on the HTMLMediaElement MUST return when a MediaSource is attached to the element.

The HTMLMediaElement.seekable attribute returns a new static normalized TimeRanges object created based on the following steps:

The HTMLMediaElement.buffered attribute returns a static normalized TimeRanges object based on the following steps.

  1. Let intersection ranges equal an empty TimeRanges object.

  2. If activeSourceBuffers.length does not equal 0 then run the following steps:

    1. Let active ranges be the ranges returned by buffered for each SourceBuffer object in activeSourceBuffers.

    2. Let highest end time be the largest range end time in the active ranges.

    3. Let intersection ranges equal a TimeRanges object containing a single range from 0 to highest end time.

    4. For each SourceBuffer object in activeSourceBuffers run the following steps:

      1. Let source ranges equal the ranges returned by the buffered attribute on the current SourceBuffer.

      2. If readyState is "ended", then set the end time on the last range in source ranges to highest end time.

      3. Let new intersection ranges equal the intersection between the intersection ranges and the source ranges.

      4. Replace the ranges in intersection ranges with the new intersection ranges.

  3. If the current value of this attribute has not been set by this algorithm or intersection ranges does not contain the exact same range information as the current value of this attribute, then update the current value of this attribute to intersection ranges.

  4. Return the current value of this attribute.
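The intersection logic of the buffered steps can be sketched on plain arrays. In this illustration each SourceBuffer's buffered ranges are given as an array of `[start, end]` pairs and `ended` stands for readyState being "ended". The function names are assumptions, and the caching described in step 3 is left out.

```javascript
// Intersect two sorted, non-overlapping lists of [start, end] pairs.
function intersectRanges(a, b) {
  const out = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    const start = Math.max(a[i][0], b[j][0]);
    const end = Math.min(a[i][1], b[j][1]);
    if (start < end) out.push([start, end]);
    // Advance whichever range finishes first.
    if (a[i][1] < b[j][1]) i++;
    else j++;
  }
  return out;
}

// sourceBuffered: one array of [start, end] pairs per entry in activeSourceBuffers.
function mediaElementBuffered(sourceBuffered, ended) {
  if (sourceBuffered.length === 0) return [];

  const highestEndTime = Math.max(0, ...sourceBuffered.flat().map(([, end]) => end));
  let intersection = [[0, highestEndTime]];

  for (const ranges of sourceBuffered) {
    const sourceRanges = ranges.map(([start, end]) => [start, end]); // copy, inputs stay untouched
    if (ended && sourceRanges.length > 0) {
      sourceRanges[sourceRanges.length - 1][1] = highestEndTime;
    }
    intersection = intersectRanges(intersection, sourceRanges);
  }
  return intersection;
}
```

For example, with buffers covering 0-10 and 15-20 in one SourceBuffer and 5-18 in another, the element reports 5-10 and 15-18 while the stream is open, and 5-10 and 15-20 once it has ended.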

7. AudioTrack Extensions
   

This section specifies extensions to the HTML AudioTrack definition.

partial interface AudioTrack {
    readonly attribute SourceBuffer? sourceBuffer;
};

     

Attributes

8. VideoTrack Extensions
   

This section specifies extensions to the HTML VideoTrack definition.

partial interface VideoTrack {
    readonly attribute SourceBuffer? sourceBuffer;
};

     

Attributes

9. TextTrack Extensions
   

This section specifies extensions to the HTML TextTrack definition.

partial interface TextTrack {
    readonly attribute SourceBuffer? sourceBuffer;
};

     

Attributes

10. Byte Stream Formats
   

The bytes provided through appendBuffer() for a SourceBuffer form a logical byte stream. The format and semantics of these byte streams are defined in byte stream format specifications. The byte stream format registry [MSE-REGISTRY] provides mappings between a MIME type that may be passed to addSourceBuffer() or isTypeSupported() and the byte stream format expected by a SourceBuffer created with that MIME type. Implementations are encouraged to register mappings for byte stream formats they support to facilitate interoperability. The byte stream format registry [MSE-REGISTRY] is the authoritative source for these mappings. If an implementation claims to support a MIME type listed in the registry, its SourceBuffer implementation MUST conform to the byte stream format specification listed in the registry entry.
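In practice an application usually probes isTypeSupported() before calling addSourceBuffer(). The helper below is a minimal sketch of that probe; `pickSupportedType` is a hypothetical name, and the support check is passed in as a function so the logic does not depend on a browser being present.

```javascript
// Return the first candidate MIME type accepted by isTypeSupported, or null if none is.
function pickSupportedType(candidates, isTypeSupported) {
  const match = candidates.find((type) => isTypeSupported(type));
  return match === undefined ? null : match;
}

// Possible use in a page (assumes MediaSource is available):
//   const type = pickSupportedType(
//     ['video/webm; codecs="vorbis,vp8"', 'video/mp4; codecs="avc1.42E01E,mp4a.40.2"'],
//     (t) => MediaSource.isTypeSupported(t)
//   );
//   if (type !== null) mediaSource.addSourceBuffer(type);
```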

Note

The byte stream format specifications in the registry
are not intended to define new storage formats. They simply outline the
subset of existing storage format structures that implementations of
this specification will accept.

Note

Byte stream format parsing and validation is implemented in the segment parser loop algorithm.

This section provides general requirements for all byte stream format specifications:

  • A byte stream format specification MUST define initialization segments and media segments.

  • A byte stream format SHOULD provide references for sourcing AudioTrack, VideoTrack, and TextTrack attribute values from data in initialization segments.

    Note

    If the byte stream format covers a format similar to one covered in the in-band tracks spec [INBANDTRACKS], then it SHOULD
    try to use the same attribute mappings so that Media Source Extensions
    playback and non-Media Source Extensions playback provide the same track
    information.

  • It MUST be possible to identify segment boundaries and segment type (initialization or media) by examining the byte stream alone.

  • The user agent MUST run the append error algorithm when any of the following conditions are met:

    1. The number and type of tracks are not consistent.

      Note

      For example, if the first initialization segment has 2 audio tracks and 1 video track, then all initialization segments that follow it in the byte stream MUST describe 2 audio tracks and 1 video track.

    2. Track IDs are not the same across initialization segments, for segments describing multiple tracks of a single type (e.g., 2 audio tracks).

    3. Codecs change across initialization segments.

      Note

      For example, a byte stream that starts with an initialization segment that specifies a single AAC track and later contains an initialization segment that specifies a single AMR-WB track is not allowed. Support for multiple codecs is handled with multiple SourceBuffer objects.

  • The user agent MUST support the following:

    1. Track IDs changing across initialization segments if the segments describe only one track of each type.

    2. Video frame size changes. The user agent MUST support seamless playback.

      Note

      This will cause the <video> display
      region to change size if the web application does not use CSS or HTML
      attributes (width/height) to constrain the element size.

    3. Audio channel count changes. The user agent MAY support this seamlessly and could trigger downmixing.

      Note

      This is a quality of implementation issue
      because changing the channel count may require reinitializing the audio
      device, resamplers, and channel mixers which tends to be audible.

  • The following rules apply to all media segments within a byte stream. A user agent MUST:

    1. Map all timestamps to the same media timeline.

    2. Support seamless playback of media segments having a timestamp gap smaller than the audio frame size. User agents MUST NOT reflect these gaps in the buffered attribute.

      Note

      This is intended to simplify switching between
      audio streams where the frame boundaries don't always line up across
      encodings (e.g., Vorbis).

  • The user agent MUST run the append error algorithm when any combination of an initialization segment and any contiguous sequence of media segments satisfies the following conditions:

    1. The number and type (audio, video, text, etc.) of all tracks in the media segments are not identified.

    2. The decoding capabilities needed to decode each track (i.e., codec and codec parameters) are not provided.

    3. Encryption parameters necessary to decrypt the content (except the encryption key itself) are not provided for all encrypted tracks.

    4. All information necessary to decode and render the earliest random access point in the sequence of media segments and all subsequent samples in the sequence (in presentation time) are not provided. This includes in particular:

      • Information that determines the intrinsic width and height of the video (specifically, this requires either the picture or pixel aspect ratio, together with the encoded resolution).

      • Information necessary to convert the video decoder output to a format suitable for display.

    5. Information necessary to compute the global presentation timestamp of every sample in the sequence of media segments is not provided.

    For example, if I1 is associated with M1, M2, M3 then the above MUST hold for all the combinations I1+M1, I1+M2, I1+M1+M2, I1+M2+M3, etc.

Byte stream specifications MUST at a minimum define constraints which ensure that the above requirements hold. Additional constraints MAY be defined, for example to simplify implementation.
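The consistency rules for successive initialization segments (track count and type, track IDs, codecs) can be illustrated with a small check. The segment shape `{ tracks: [{ id, type, codec }] }` and the function names are hypothetical and chosen only for this sketch; a real user agent derives this information from the byte stream.

```javascript
// Group the tracks of a segment by type ("audio", "video", "text").
function groupByType(segment) {
  const groups = new Map();
  for (const track of segment.tracks) {
    if (!groups.has(track.type)) groups.set(track.type, []);
    groups.get(track.type).push(track);
  }
  return groups;
}

// Returns null when `next` may follow `first`, otherwise a short reason
// for which the append error algorithm would be run.
function initSegmentMismatch(first, next) {
  const a = groupByType(first);
  const b = groupByType(next);

  // The number and type of tracks must stay consistent.
  for (const type of new Set([...a.keys(), ...b.keys()])) {
    if ((a.get(type) || []).length !== (b.get(type) || []).length) {
      return 'track count or type changed';
    }
  }

  // Track IDs may change only when there is a single track of each type.
  const hasMultiple = [...a.values()].some((tracks) => tracks.length > 1);
  for (const track of first.tracks) {
    const match = hasMultiple
      ? next.tracks.find((t) => t.type === track.type && t.id === track.id)
      : next.tracks.find((t) => t.type === track.type);
    if (!match) return 'track ID changed';
    if (match.codec !== track.codec) return 'codec changed';
  }
  return null;
}
```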

11. Conformance
   

As well as sections marked as non-normative, all authoring
guidelines, diagrams, examples, and notes in this specification are
non-normative. Everything else in this specification is normative.

The key words MAY, MUST, MUST NOT, SHOULD, and SHOULD NOT are to be interpreted as described in [RFC2119].

12. Examples
   

Example use of the Media Source Extensions

<script>
  function onSourceOpen(videoTag, e) {
    var mediaSource = e.target;
    if (mediaSource.sourceBuffers.length > 0) return;
    var sourceBuffer = mediaSource.addSourceBuffer('video/webm; codecs="vorbis,vp8"');

    videoTag.addEventListener('seeking', onSeeking.bind(videoTag, mediaSource));
    videoTag.addEventListener('progress', onProgress.bind(videoTag, mediaSource));
    var initSegment = GetInitializationSegment();
    if (initSegment == null) {
      // Error fetching the initialization segment. Signal end of stream with an error.
      mediaSource.endOfStream("network");
      return;
    }
    // Append the initialization segment.
    var firstAppendHandler = function(e) {
      var sourceBuffer = e.target;
      sourceBuffer.removeEventListener('updateend', firstAppendHandler);
      // Append some initial media data.
      appendNextMediaSegment(mediaSource);
    };
    sourceBuffer.addEventListener('updateend', firstAppendHandler);
    sourceBuffer.appendBuffer(initSegment);
  }

  function appendNextMediaSegment(mediaSource) {
    if (mediaSource.readyState == "closed") return;
    // If we have run out of stream data, then signal end of stream.
    if (!HaveMoreMediaSegments()) {
      mediaSource.endOfStream();
      return;
    }
    // Make sure the previous append is not still pending.
    if (mediaSource.sourceBuffers[0].updating) return;
    var mediaSegment = GetNextMediaSegment();
    if (!mediaSegment) {
      // Error fetching the next media segment.
      mediaSource.endOfStream("network");
      return;
    }
    // NOTE: If mediaSource.readyState == "ended", this appendBuffer() call will
    // cause mediaSource.readyState to transition to "open". The web application
    // should be prepared to handle multiple "sourceopen" events.
    mediaSource.sourceBuffers[0].appendBuffer(mediaSegment);
  }

  function onSeeking(mediaSource, e) {
    var video = e.target;
    if (mediaSource.readyState == "open") {
      // Abort current segment append.
      mediaSource.sourceBuffers[0].abort();
    }
    // Notify the media segment loading code to start fetching data at the
    // new playback position.
    SeekToMediaSegmentAt(video.currentTime);
    // Append a media segment from the new playback position.
    appendNextMediaSegment(mediaSource);
  }

  function onProgress(mediaSource, e) {
    appendNextMediaSegment(mediaSource);
  }
</script>
<video id="v" autoplay> </video>
<script>
  var video = document.getElementById('v');
  var mediaSource = new MediaSource();
  mediaSource.addEventListener('sourceopen', onSourceOpen.bind(this, video));
  video.src = window.URL.createObjectURL(mediaSource);
</script>

     

13. Acknowledgments
   

The editors would like to thank Alex Giladi, Bob Lund, Chris Poole,
Cyril Concolato, David Dorwin, David Singer, Duncan Rowden, Frank
Galligan, Glenn Adams, Jer Noble, Joe Steele, John Simmons, Kevin
Streeter, Mark Vickers, Matt Ward, Matthew Gregan, Michael Thornburgh,
Philip Jägenstedt, Pierre Lemieux, Ralph Giles, Steven Robertson, and
Tatsuya Igarashi for their contributions to this specification.

A. VideoPlaybackQuality
   

This section is non-normative.

The video playback quality metrics described in previous revisions of this specification (e.g., sections 5 and 10 of the Candidate Recommendation) are now being developed as part of [MEDIA-PLAYBACK-QUALITY]. Some implementations may have implemented the earlier draft VideoPlaybackQuality object and the HTMLVideoElement extension method getVideoPlaybackQuality() described in those previous revisions.
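Since not every implementation exposes getVideoPlaybackQuality(), a caller would normally feature-detect it first. The following is a minimal sketch; `droppedFrameRatio` is a hypothetical helper, while `droppedVideoFrames` and `totalVideoFrames` are attributes of the VideoPlaybackQuality object.

```javascript
// Return the fraction of dropped frames, or null when the method is unavailable.
function droppedFrameRatio(video) {
  if (typeof video.getVideoPlaybackQuality !== 'function') return null;
  const quality = video.getVideoPlaybackQuality();
  if (quality.totalVideoFrames === 0) return 0; // nothing presented yet
  return quality.droppedVideoFrames / quality.totalVideoFrames;
}
```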

B. References
   

B.1 Normative references
     

B.2 Informative references
     


