5. RTP Media API
The RTP media API lets a web application send and receive MediaStreamTracks over a peer-to-peer connection. Tracks, when added to an RTCPeerConnection, result in signaling; when this signaling is forwarded to a remote peer, it causes corresponding tracks to be created on the remote side.
Note
There is not an exact 1:1 correspondence between tracks sent by one RTCPeerConnection and received by the other. For one, IDs of tracks sent have no mapping to the IDs of tracks received. Also, replaceTrack changes the track sent by an RTCRtpSender without creating a new track on the receiver side; the corresponding RTCRtpReceiver will only have a single track, potentially representing multiple sources of media stitched together. Both addTransceiver and replaceTrack can be used to cause the same track to be sent multiple times, which will be observed on the receiver side as multiple receivers each with its own separate track. Thus it’s more accurate to think of a 1:1 relationship between an RTCRtpSender on one side and an RTCRtpReceiver’s track on the other side, matching senders and receivers using the RTCRtpTransceiver’s mid if necessary.
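As an illustration of matching by mid, the following is a minimal sketch and not part of the API. The helper name receiverTrackForMid is hypothetical; it relies only on getTransceivers() and the mid and receiver attributes. It assumes the application has learned the mid of interest by some means of its own, for example over its signaling channel.

```javascript
// Hypothetical helper: look up the received track that corresponds to a
// remote sender, given the mid shared by the two transceivers.
// `pc` is expected to be an RTCPeerConnection, or any object exposing
// getTransceivers() with the same shape.
function receiverTrackForMid(pc, mid) {
  // A null mid means "disassociated", so it cannot identify a pairing.
  if (mid === null) return null;
  const transceiver = pc.getTransceivers().find((t) => t.mid === mid);
  // No transceiver is associated with this mid (yet).
  if (!transceiver) return null;
  return transceiver.receiver.track;
}
```

Matching on mid rather than on track IDs reflects the point made in the note: track IDs on the two sides are unrelated.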
When sending media, the sender may need to rescale or resample the media to meet various requirements including the envelope negotiated by SDP.
Following the rules in [RFC8829] (section 3.6), the video MAY be downscaled in order to fit the SDP constraints. The media MUST NOT be upscaled to create fake data that did not occur in the input source, the media MUST NOT be cropped except as needed to satisfy constraints on pixel counts, and the aspect ratio MUST NOT be changed.
Note
The WebRTC Working Group is seeking implementation feedback on the need and timeline for a more complex handling of this situation. Some possible designs have been discussed in GitHub issue 1283.
When video is rescaled, for example for certain combinations of width or height and scaleResolutionDownBy values, the resulting width or height might not be an integer. In such situations the user agent MUST use the integer part of the result. What to transmit if the integer part of the scaled width or height is zero is implementation-specific.
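The integer-part rule can be illustrated with a small calculation. This is a sketch only: the function name scaledDimensions and the implementationSpecific flag are hypothetical, and rejecting factors below 1.0 is a choice made here to reflect the rule that media is not upscaled.

```javascript
// Hypothetical helper illustrating the integer-part rule for rescaled video.
function scaledDimensions(width, height, scaleResolutionDownBy) {
  // Upscaling is not allowed, so factors below 1.0 are rejected in this sketch.
  if (!(scaleResolutionDownBy >= 1)) {
    throw new RangeError('scaleResolutionDownBy must be at least 1.0');
  }
  // Keep only the integer part of each scaled dimension.
  const scaledWidth = Math.trunc(width / scaleResolutionDownBy);
  const scaledHeight = Math.trunc(height / scaleResolutionDownBy);
  // What is sent when a dimension reaches zero is implementation-specific,
  // so this sketch only flags the condition.
  const implementationSpecific = scaledWidth === 0 || scaledHeight === 0;
  return { width: scaledWidth, height: scaledHeight, implementationSpecific };
}

// 1280 / 3 is 426.66..., so the width becomes 426; 720 / 3 is exactly 240.
console.log(scaledDimensions(1280, 720, 3));
```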
The actual encoding and transmission of MediaStreamTracks is managed through objects called RTCRtpSenders. Similarly, the reception and decoding of MediaStreamTracks is managed through objects called RTCRtpReceivers. Each RTCRtpSender is associated with at most one track, and each track to be received is associated with exactly one RTCRtpReceiver.
The encoding and transmission of each MediaStreamTrack SHOULD be made such that its characteristics (width, height and frameRate for video tracks; sampleSize, sampleRate and channelCount for audio tracks) are to a reasonable degree retained by the track created on the remote side. There are situations when this does not apply; for example, there may be resource constraints at either endpoint or in the network, or there may be RTCRtpSender settings applied that instruct the implementation to act differently.
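An application that wants to observe how well these characteristics were retained could compare track settings from both sides. The following is a sketch under assumptions: changedCharacteristics is a hypothetical helper, its inputs are plain objects of the kind returned by MediaStreamTrack.getSettings(), and the sender's settings are assumed to reach the receiving application through its own signaling.

```javascript
// Characteristics named in the text, grouped by track kind.
const CHARACTERISTICS = {
  video: ['width', 'height', 'frameRate'],
  audio: ['sampleSize', 'sampleRate', 'channelCount'],
};

// Hypothetical helper: list the characteristics that differ between the
// settings of a sent track and those of the corresponding received track.
function changedCharacteristics(kind, sentSettings, receivedSettings) {
  return CHARACTERISTICS[kind].filter(
    (name) => sentSettings[name] !== receivedSettings[name]
  );
}
```

A non-empty result is not an error in itself, since resource constraints or sender settings may legitimately change these values.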
An RTCPeerConnection object contains a set of RTCRtpTransceivers, representing the paired senders and receivers with some shared state. This set is initialized to the empty set when the RTCPeerConnection object is created. RTCRtpSenders and RTCRtpReceivers are always created at the same time as an RTCRtpTransceiver, which they will remain attached to for their lifetime. RTCRtpTransceivers are created implicitly when the application attaches a MediaStreamTrack to an RTCPeerConnection via the addTrack() method, or explicitly when the application uses the addTransceiver method. They are also created when a remote description is applied that includes a new media description. Additionally, when a remote description is applied that indicates the remote endpoint has media to send, the relevant MediaStreamTrack and RTCRtpReceiver are surfaced to the application via the track event.
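From the application's point of view, the ways in which transceivers come into existence could look roughly as follows. This is a sketch: setUpMedia and onRemoteTrack are hypothetical names, and the choice of an audio recvonly transceiver is only an example.

```javascript
// `pc` is expected to be an RTCPeerConnection; `track` and `stream` a
// MediaStreamTrack and a MediaStream containing it.
function setUpMedia(pc, track, stream, onRemoteTrack) {
  // Implicit creation: addTrack() returns the RTCRtpSender of a transceiver
  // that is created (or reused) behind the scenes.
  const sender = pc.addTrack(track, stream);

  // Explicit creation: addTransceiver() returns the new RTCRtpTransceiver.
  const recvOnly = pc.addTransceiver('audio', { direction: 'recvonly' });

  // Remote media: the track event surfaces the track and its receiver.
  pc.addEventListener('track', (event) => {
    onRemoteTrack(event.track, event.receiver, event.transceiver);
  });

  return { sender, recvOnly };
}
```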
In order for an RTCRtpTransceiver to send and/or receive media with another endpoint, this must be negotiated with SDP such that both endpoints have an RTCRtpTransceiver object that is associated with the same media description.
When creating an offer, enough media descriptions will be generated to cover all transceivers on that end. When this offer is set as the local description, any disassociated transceivers get associated with media descriptions in the offer.
When an offer is set as the remote description, any media descriptions in it not yet associated with a transceiver get associated with a new or existing transceiver. In this case, only disassociated transceivers that were created via the addTrack() method may be associated. Disassociated transceivers created via the addTransceiver() method, however, won’t get associated even if media descriptions are available in the remote offer. Instead, new transceivers will be created and associated if there aren’t enough addTrack()-created transceivers. This sets addTrack()-created and addTransceiver()-created transceivers apart in a critical way that is not observable from inspecting their attributes.
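The difference can be made concrete with a simplified model that does not use the real API. Everything here is an assumption made for illustration: transceivers and media descriptions are plain objects, the createdBy field is hypothetical (no such attribute is exposed to applications), and matching on kind is a simplification of the full rules.

```javascript
// Simplified model of associating a remote offer's media descriptions.
// Transceivers: { kind, createdBy: 'addTrack' | 'addTransceiver' | 'remote',
// mid: string | null }. Media descriptions: { mid, kind }.
function applyRemoteOffer(transceivers, mediaDescriptions) {
  for (const description of mediaDescriptions) {
    // Already associated with this media description: nothing to do.
    if (transceivers.some((t) => t.mid === description.mid)) continue;

    // Only disassociated, addTrack()-created transceivers may be reused.
    let transceiver = transceivers.find(
      (t) =>
        t.mid === null &&
        t.createdBy === 'addTrack' &&
        t.kind === description.kind
    );
    if (!transceiver) {
      // Otherwise a new transceiver is created for the description.
      transceiver = { kind: description.kind, createdBy: 'remote', mid: null };
      transceivers.push(transceiver);
    }
    transceiver.mid = description.mid;
  }
  return transceivers;
}

const result = applyRemoteOffer(
  [
    { kind: 'video', createdBy: 'addTransceiver', mid: null },
    { kind: 'video', createdBy: 'addTrack', mid: null },
  ],
  [{ mid: '0', kind: 'video' }, { mid: '1', kind: 'video' }]
);
// prints [ 'addTransceiver:null', 'addTrack:0', 'remote:1' ]
console.log(result.map((t) => `${t.createdBy}:${t.mid}`));
```

In this run the addTransceiver()-created transceiver stays disassociated even though the offer contains a second video description; a new transceiver is created for it instead.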
When creating an answer, only media descriptions that were present in the offer may be listed in the answer. As a consequence, any transceivers that were not associated when setting the remote offer remain disassociated after setting the local answer. This can be remedied by the answerer creating a follow-up offer, initiating another offer/answer exchange, or in the case of using addTrack()-created transceivers, making sure that enough media descriptions are offered in the initial exchange.
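The follow-up offer remedy might be sketched as below. The names disassociatedTransceivers, negotiateLeftovers and sendOffer are hypothetical; sendOffer stands for whatever the application uses to deliver a description over its signaling channel. The sketch assumes a transceiver with a null mid is disassociated and does not treat stopped transceivers specially.

```javascript
// Hypothetical helper: after the local answer has been set, report the
// transceivers that are still disassociated (their mid is null).
function disassociatedTransceivers(pc) {
  return pc.getTransceivers().filter((t) => t.mid === null);
}

// Sketch of the remedy: if anything is left over, the answerer starts a
// follow-up offer/answer exchange. Resolves to true if an offer was sent.
async function negotiateLeftovers(pc, sendOffer) {
  if (disassociatedTransceivers(pc).length === 0) return false;
  await pc.setLocalDescription(await pc.createOffer());
  sendOffer(pc.localDescription);
  return true;
}
```

An application that already handles the negotiationneeded event may not need an explicit check like this one.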