Nabto Edge WebRTC Full Example

This guide dives into more detail on how to use the WebRTC demo device application. It assumes you have already gone through the WebRTC Quick Demo tutorial. Specifically, you are expected to have the device fingerprint configured in the Nabto Cloud Console and to be familiar with how to connect to the device from a client application.

Compared to the quick demo, this guide expands on: building the example, other options for streaming media feeds, and media track negotiation.

Note: This guide explains the example application, not the Nabto Edge WebRTC Device Library. All limitations described regarding RTSP, RTP, codecs, media track negotiation, and network issues are limitations of the application, not of the Nabto Edge WebRTC library in general.

Obtaining the source

The edge-device-webrtc GitHub repo references various third-party components as submodules, so to obtain the source code, you must clone the repo recursively:

git clone --recursive https://github.com/nabto/edge-device-webrtc.git

Building the example

To build the example in your own environment, you must have these tools installed:

  • CMake
  • C++ compiler
  • cURL library
  • OpenSSL library

The Dockerfile for the demo container can be used as a reference for a working Linux environment.

All other dependencies are built with the example from git submodules.

The example is built using CMake from the root of the repo:

mkdir build
cd build
cmake -DCMAKE_INSTALL_PREFIX=`pwd`/install ..
make -j16 install

Note: On macOS you may need to turn off sctp_werror, so the full set of commands becomes:

mkdir build
cd build
cmake -DCMAKE_INSTALL_PREFIX=`pwd`/install -Dsctp_werror=OFF ..
make -j16 install

You should now be able to start the example device using a command similar to the one passed to the Docker container in the demo guide:

Note: In the demo, the container itself ran an RTSP server providing the demo feed; you have not started one locally yet, so you will NOT be able to see a video feed at this point! The command also assumes the IAM home directory created in the demo exists. To start from a fresh IAM state, simply remove the -H argument to make the example use your current folder as the home directory.

./install/bin/edge_device_webrtc -r rtsp://127.0.0.1:8554/video -H ../webrtc-home -d <YOUR_DEVICE_ID> \
 -p <YOUR_PRODUCT_ID> -k <RAW_KEY_CREATED_ABOVE>

As the feed is not supposed to work at this point, we just want to verify that the binary can be executed and that the device is able to attach to the Nabto Edge Basestation. The next section goes into detail on how to start media feeds for your device.

Alternative Media Feeds

The example device supports getting media feeds either from an RTSP server or from RTP feeds on UDP sockets. So far, we have only seen a 1-way video feed from the device to the client; however, the example supports both 2-way video and 2-way audio feeds.

RTSP feeds

The example uses an RTSP server when a URL is provided with the -r argument, as shown in the demo. This should work with any RTSP server, as long as the proper codecs are used as described in the Media Track Negotiation section.

The implementation expects the RTSP server to offer at most one video and at most one audio feed. If more than one feed of a given type exists, an arbitrary one is used. With RTSP, 2-way feeds are not tested.

The device connects to the RTSP server only when a client connects to get the feeds; at that point, the device connects to the provided URL to set up the stream. For the first client connecting to the device, it sets up an RTP video stream on UDP port 45222 with RTCP on UDP port 45223, and an RTP audio stream on UDP port 45224 with RTCP on UDP port 45225. So that parallel connections do not clash, the second client connection uses UDP ports 45226-45229 (in general, connection n, counting from 0, uses UDP ports 45222+4n through 45222+4n+3). These ports must be available for the connections.
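
As a small illustration of this port scheme (the helper function below is hypothetical, not part of the example code):

// Hypothetical helper mirroring the port scheme described above.
// Connection n (counting from 0) uses four consecutive UDP ports.
int rtspProxyBasePort(int connectionIndex) {
    return 45222 + 4 * connectionIndex;
}
// connection 0: 45222 (video RTP), 45223 (video RTCP),
//               45224 (audio RTP), 45225 (audio RTCP)
// connection 1: 45226-45229, and so on.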

The UDP socket is bound to 0.0.0.0, so it will also work with remote RTSP hosts.

RTSP virtual video source for Linux

Nabto provides a standalone RTSP demo Docker container that provides the same simulated video feed as used in the combined WebRTC demo. For the ports mentioned in the previous section to be opened towards the host, the container must be started with --network host. This Docker feature is only available on Linux; see also the note below.

The README in the container repo can be used to build the container. However, to start the container you should use this command:

docker run --rm -it --network host rtsp-demo-server

As in the demo, this source does not include an audio feed.

Note: The RTSP demo Docker container cannot be used with the WebRTC demo on Windows and macOS due to the Docker network limitation described above. The container runs the gst-rtsp-launch utility; you can build this locally to get around the network issues, or use an RTP test video feed instead as outlined below.

RTP feeds

Removing the -r argument when starting the device makes it default to using RTP. This supports 2-way video as well as 2-way audio. This section shows how to start these 4 feeds using Gstreamer, but any RTP source/sink can be used as long as it uses the proper UDP ports and codecs. Gstreamer must be installed with the proper plugins for the commands below to work.

By default, the example expects an RTP video feed on UDP port 6000, an RTP video sink on UDP port 6001, an RTP audio feed on UDP port 6002, and an RTP audio sink on UDP port 6003. These ports can be changed using the --rtp-port argument, which sets the first port (defaulting to 6000); the 4 port numbers are always consecutive (e.g. --rtp-port 5000 yields ports 5000-5003). The examples below use the default ports.

All examples below produce feeds matching the codecs used by the example application. For codec-related details, see the Media Track Negotiation section below.

If an RTP feed is missing (video feed, video sink, audio feed, audio sink), the example will still run; only the particular feed will not work.

Start Gstreamer RTP video feed

The video feed used in the RTSP demo can be started as an RTP feed using:

gst-launch-1.0 videotestsrc ! clockoverlay ! video/x-raw,width=640,height=480 ! videoconvert ! queue ! \
  x264enc tune=zerolatency bitrate=1000 key-int-max=30 ! video/x-h264, profile=constrained-baseline ! \
  rtph264pay pt=96 mtu=1200 ! udpsink host=127.0.0.1 port=6000

If you have a webcam available, you can get a video stream like so (assuming a v4l2 video device at /dev/video0):

Note: If using 2-way video, the client will try to use the webcam, which will fail if the webcam feed is already in use by Gstreamer. For testing 2-way video, it is recommended to use the videotestsrc above.

gst-launch-1.0 v4l2src device=/dev/video0 ! video/x-raw,width=640,height=480 ! videoconvert ! queue ! \
  x264enc tune=zerolatency bitrate=1000 key-int-max=30 ! video/x-h264, profile=constrained-baseline ! \
  rtph264pay pt=96 mtu=1200 ! udpsink host=127.0.0.1 port=6000

Start Gstreamer RTP video sink

Below is shown how to make Gstreamer listen for RTP video on a UDP socket and play it. Testing this requires the client to send video to the device, which is currently not supported in the demo client. To test this feature, use the web client accompanying this example as described below.

gst-launch-1.0 udpsrc uri=udp://127.0.0.1:6001 ! application/x-rtp, payload=96  ! rtph264depay ! \
  h264parse ! avdec_h264 ! videoconvert ! xvimagesink

Start Gstreamer RTP Audio feed

An audio feed can be started as shown below. This uses a simple sine wave as source. Similarly to the video feed, Gstreamer can also use your mic as source (e.g. using the pulsesrc plugin). However, when using 2-way audio, the web browser will use the mic as its source, so using a sine wave here helps distinguish the feeds when testing locally.

gst-launch-1.0 -v audiotestsrc wave=sine freq=220 volume=0.01 ! audioconvert ! opusenc ! \
  rtpopuspay name=pay0 pt=111 ! udpsink host=127.0.0.1 port=6002

Start Gstreamer RTP Audio sink

Below is shown how to make Gstreamer listen for RTP audio on a UDP socket and play it on your speakers. Testing this requires the client to send audio to the device, which is currently not supported in the demo clients. To test this feature, use the web client accompanying this example as described below.

gst-launch-1.0 -v udpsrc uri=udp://127.0.0.1:6003 \
  caps="application/x-rtp,media=(string)audio,clock-rate=(int)48000,encoding-name=(string)X-GST-OPUS-DRAFT-SPITTKA-00" ! \
  rtpopusdepay ! opusdec ! audioconvert ! autoaudiosink sync=false

Example Web Client

In addition to the deployed demo website, the device example comes with a simple static web client. This example is described here.

Media Track Negotiation

When adding a Media Track to a WebRTC connection, it must be negotiated to find a media codec both peers support. For this, the example application defines a Media Track Negotiator interface that it uses to negotiate different media codecs with the other peer. In this design, a specific Media Track Negotiator implementation only handles negotiating one specific media codec.

The Media Track Negotiator described here is part of the example application, not the Nabto Edge WebRTC Device library. Your own application does not need to use this design; however, it provides insight into what is required to handle media tracks through the library.

The Media Track Negotiator only implements the logic for negotiating media codecs on a particular track; it does not handle any encoding or decoding of the data on the tracks. This essentially boils down to ensuring the SDP strings in the WebRTC signaling are correct.

Changing the Media Track Negotiators in the example application requires code changes. By default, the example uses an H264 constrained-baseline Media Track Negotiator for video (specifically, level-asymmetry-allowed=1, packetization-mode=1, and profile-level-id=42e01f). For audio, an OPUS Media Track Negotiator is used. Additionally, Media Track Negotiators for PCMU and VP8 are available, but not currently used.
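
For reference, the video parameters above correspond to an SDP media section along these lines (an illustrative fragment, not output captured from the example):

m=video 9 UDP/TLS/RTP/SAVPF 96
a=rtpmap:96 H264/90000
a=fmtp:96 level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=42e01f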

This section goes into detail on how you can implement your own Media Track Negotiator for the example application. For reference, see the H264, OPUS, and PCMU examples.

The example implements Media Track Negotiators using the TrackNegotiator interface. This interface defines five methods to be implemented: match(), createMedia(), payloadType(), ssrc(), and direction().
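
A skeleton of a custom negotiator might look as follows. The signatures are paraphrased from the H264/OPUS/PCMU negotiators in the repo and may not match the exact headers, so treat this as an outline rather than literal code:

// Outline of a custom negotiator implementing the example's TrackNegotiator
// interface. Method signatures are paraphrased and may differ from the repo.
class MyCodecNegotiator : public nabto::TrackNegotiator {
public:
    // Filter a client's offer down to the one codec this negotiator supports
    // and return the payload type the client assigned to it.
    int match(nabto::MediaTrackPtr track) override;

    // Build the media description used when the device makes the offer.
    rtc::Description::Media createMedia() override;

    // Payload type of the local RTP stream (defined further below).
    int payloadType() override;

    // SSRC identifying this source; must be unique within the connection.
    uint32_t ssrc() override { return 42; }

    // SEND_RECV makes the RTP client handle track data in both directions.
    Direction direction() override { return SEND_RECV; }
};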

Track Negotiator match()

The match() method is the most complicated to implement. This method is used when a client has made an offer to receive media from a track using this negotiator. This “client offers to receive media” pattern is required by some systems (e.g. Alexa). However, for systems without this requirement, it is preferred to ask the device to add the track using a CoAP request and avoid the need for this method.

This method must parse the SDP string of the provided track, look through all the codecs supported by the client, and remove all SDP codecs except the one supported by this Negotiator. The resulting SDP must then be set on the track, and the payload type of the chosen codec is returned. The SDP of the track must be updated before the method returns, as the WebRTC negotiation continues immediately.
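
A condensed sketch of what match() might look like is shown below. It assumes the track exposes getSdp()/setSdp() and leans on libdatachannel's rtc::Description::Media SDP helpers the way the repo's H264 negotiator does; helper names and parsing details may differ between versions, so consult the actual H264 example for working code:

int MyCodecNegotiator::match(nabto::MediaTrackPtr track)
{
    // Parse the media section of the client's offer.
    rtc::Description::Media media(track->getSdp());

    int chosen = -1;
    for (int pt : media.payloadTypes()) {
        auto* map = media.rtpMap(pt);
        // Keep the first codec this negotiator supports, drop the rest.
        if (chosen < 0 && map != nullptr && map->format == "H264") {
            chosen = pt;
        } else {
            media.removeRtpMap(pt);
        }
    }

    // Write the filtered SDP back before returning; the WebRTC negotiation
    // continues immediately with whatever is set on the track.
    track->setSdp(media.generateSdp());
    return chosen;
}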

Track Negotiator createMedia()

The createMedia() method is the inverse of match(), as it is used when the device wants to create an offer for the client. It must simply return the media description specifying the particular codec supported by this Negotiator. The media description object returned by the method is a convenience object from the underlying WebRTC library libdatachannel, but it is equivalent to the SDP string used by match(). The reason for using an SDP string in the MediaTrackPtr is simply to keep the third-party dependency out of the Nabto Edge Device WebRTC library interface.
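
A sketch of createMedia() for an H264 video negotiator could look like this, using libdatachannel's rtc::Description::Video convenience class (the payload type and profile match the defaults described above; treat the exact shape as illustrative):

rtc::Description::Media MyCodecNegotiator::createMedia()
{
    // Offer a single H264 codec with the profile used by this example.
    rtc::Description::Video media("video", rtc::Description::Direction::SendRecv);
    media.addH264Codec(96, "level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=42e01f");
    return media;
}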

Track Negotiator payloadType()

The payloadType() method must simply return the integer value of the payload type used by the local RTP stream. This payload type must match the one in the Gstreamer commands from the previous section if these are used as the local RTP stream source. If you have your own RTP/RTSP server, this should be whatever payload type that server uses.
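
For instance, a video negotiator matching the Gstreamer pipelines above (which pass pt=96 to rtph264pay) would simply return that constant:

int MyCodecNegotiator::payloadType() { return 96; } // matches rtph264pay pt=96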

Track Negotiator ssrc()

The ssrc() method must return the SSRC to identify this source in the WebRTC connection. This value must be unique across all media sources in a particular WebRTC connection.

Track Negotiator direction()

This method is used to determine whether the RTP client implementation needs to handle track data in both directions. In the demo, the video track only sends from the device to the client; however, bidirectional streaming is supported, so the example Negotiators return SEND_RECV. This makes the RTP client listen for data on the MediaTrack and forward it to the UDP socket.