My own spin on a video streaming service. The app allows a compliant client (streamer) to establish a WebRTC connection with the remote host and stream an H.264-encoded video. The video may then be shared with and watched by other remote WebRTC clients (viewers).
Some notable features:
- Establish WebRTC connections with streamer clients using the WHIP protocol
- Establish WebRTC connections with viewer clients using the WHEP protocol
- Forward streamer video packets down to respective viewer clients.
- Generate jpeg thumbnails for streamed videos.
- Server Sent Events API providing information about current streamer lobbies
This project is very much a learning exercise. I intentionally avoided using ready-made solutions (even though they would probably make the project easier to maintain). There are a few dependencies which I don't think I could do without, and so some will just have to stay.
First thing you need is to get the server up and running. You can follow the Building & development guide for that.
The app runs two servers: one on a UDP socket (handles video packets), the other on a TCP socket. The TCP socket has a barebones HTTP server with the following resources available:
- POST /whip - a WHIP protocol endpoint
- POST /whep - a WHEP protocol endpoint
- GET /rooms - get available rooms as JSON. A room is basically a virtual lobby, stored in app memory, containing information such as the room's id and the viewer_count. You'll need the id for interacting with the WHEP endpoint.
- GET /notifications - an SSE endpoint. Streams rooms JSON every so often.
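
To give a feel for what the /notifications stream could carry, here is a minimal sketch of framing a rooms payload as a Server-Sent Event. The `Room` struct, the numeric `id` type and the exact JSON shape are assumptions made for illustration (only the `id` and `viewer_count` fields are named above), so the app's real output may differ.

```rust
/// Hypothetical room record; only `id` and `viewer_count` are named in this README.
/// The numeric id type is an assumption.
struct Room {
    id: u32,
    viewer_count: usize,
}

/// Serialize rooms as a JSON array by hand (no external crates).
fn rooms_json(rooms: &[Room]) -> String {
    let items: Vec<String> = rooms
        .iter()
        .map(|r| format!(r#"{{"id":{},"viewer_count":{}}}"#, r.id, r.viewer_count))
        .collect();
    format!("[{}]", items.join(","))
}

/// Wrap a single-line payload in an SSE frame: a `data:` line followed by a blank line.
fn sse_frame(payload: &str) -> String {
    format!("data: {}\n\n", payload)
}
```

A browser-side EventSource would receive each frame as one message whose data is the JSON array.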
You'll need WHIP-compliant client software. My personal choice is OBS. Assuming it's yours too, the setup is as follows:
- Open Settings -> Stream
- Choose WHIP as a "Service"
- Insert the WHIP endpoint as the "Server" value. The WHIP endpoint corresponds to an HTTP address constructed from the TCP_ADDRESS and TCP_PORT environment variables, and the /whip resource pathname.
- Insert the WHIP_TOKEN environment variable value as the "Bearer Token" value.
For any other clients just follow the WHIP endpoint specification.
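
For clients other than OBS, the request boils down to an HTTP POST with the SDP offer as the body and the token in a Bearer header, as the WHIP specification describes. The sketch below only builds the URL and the raw request text. The function names are hypothetical, and the plain `http` scheme is an assumption (in production you'd presumably sit behind HTTPS).

```rust
/// Build the WHIP endpoint URL from the TCP_ADDRESS and TCP_PORT values.
/// The `http` scheme is an assumption for local development.
fn whip_url(tcp_address: &str, tcp_port: u16) -> String {
    format!("http://{}:{}/whip", tcp_address, tcp_port)
}

/// Build the raw HTTP/1.1 request a WHIP client would send:
/// the SDP offer as the body, the shared WHIP_TOKEN as a Bearer token.
fn whip_request(host: &str, token: &str, sdp_offer: &str) -> String {
    format!(
        "POST /whip HTTP/1.1\r\n\
         Host: {}\r\n\
         Authorization: Bearer {}\r\n\
         Content-Type: application/sdp\r\n\
         Content-Length: {}\r\n\
         \r\n\
         {}",
        host,
        token,
        sdp_offer.len(),
        sdp_offer
    )
}
```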
Once a connection is established, the streamer should acquire its own room. To get available rooms, check the /rooms endpoint.
The first thing you need is the id of the room you're interested in joining. Use the /rooms endpoint to check what's available.
A connection requires WHEP-compliant software. My personal choice is, well, the browser. Check out SigmaPlayer for an example implementation. The WHEP endpoint expects a query parameter target_id, which specifies a concrete room's id. If your client does not support the streamer's codecs, the WHEP request will fail.
Once you establish a WebRTC connection with the remote server, you should be forwarded the streamer's video packets.
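
As a rough illustration of the target_id query parameter, the sketch below builds a WHEP URL on the client side and pulls the parameter back out of a request target. Both function names are hypothetical, and the parsing is deliberately naive (no percent-decoding). This is not necessarily how the app itself parses requests.

```rust
/// Build a WHEP URL for a given room id. `base` is something like "http://127.0.0.1:8080".
fn whep_url(base: &str, room_id: &str) -> String {
    format!("{}/whep?target_id={}", base, room_id)
}

/// Extract `target_id` from a request target such as "/whep?target_id=42".
/// Naive on purpose: no percent-decoding, first match wins.
fn target_id(request_target: &str) -> Option<&str> {
    let (_, query) = request_target.split_once('?')?;
    query
        .split('&')
        .filter_map(|pair| pair.split_once('='))
        .find(|(key, _)| *key == "target_id")
        .map(|(_, value)| value)
}
```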
You'll need the following system dependencies:
- openssl
- openssl-devel
- libsrtp
The openssl library is used for establishing a DTLS connection with the remote peer. The libsrtp library is used for encrypting and decrypting RTP/SRTP packets. If you're still having trouble compiling the app, file an issue (or, if you feel adventurous, you may try to follow the compiler errors to figure out what dependencies are missing).
You'll need the following environment variables exported to your shell:
- TCP_ADDRESS
- TCP_PORT
- UDP_ADDRESS
- UDP_PORT
- WHIP_TOKEN - A secret token used to authorize clients using the WHIP route. This token is required for all clients that wish to become streamers. This token is shared by all streamer clients.
- FRONTEND_URL - A URL of the frontend web-app that interacts with the HTTP server. Ideally this is the URL of a web-app that works as a streaming platform, allowing clients to become viewers. Used for CORS.
- STORAGE_DIR - System directory where temporary thumbnail images of streamer videos will be generated and saved. The app should have write permissions for that directory.
- CERTS_DIR - System directory where the TLS key & certificate are stored. The files should be named key.pem and cert.pem. There is no good reason for this being so opinionated. These are used for establishing a DTLS connection with remote peers. You may use the following command to generate the needed files: openssl req -newkey rsa:2048 -new -nodes -x509 -days 3650 -keyout key.pem -out cert.pem
The STORAGE_DIR and CERTS_DIR directories need to actually exist in your file system - the app won't create them for you.
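
If you want to sanity-check your shell before running the app, something along these lines could do it. This is a standalone sketch, not code from the app: `missing_vars` and `certs_dir_ready` are hypothetical helpers, and the variable lookup is passed in as a closure so the check can run against a fake environment.

```rust
use std::path::Path;

/// The environment variables listed above.
const REQUIRED_VARS: [&str; 8] = [
    "TCP_ADDRESS", "TCP_PORT", "UDP_ADDRESS", "UDP_PORT",
    "WHIP_TOKEN", "FRONTEND_URL", "STORAGE_DIR", "CERTS_DIR",
];

/// Return the names of required variables that `lookup` cannot resolve.
fn missing_vars(lookup: impl Fn(&str) -> Option<String>) -> Vec<&'static str> {
    REQUIRED_VARS
        .iter()
        .copied()
        .filter(|name| lookup(*name).is_none())
        .collect()
}

/// Check that a CERTS_DIR candidate exists and holds key.pem and cert.pem.
fn certs_dir_ready(dir: &Path) -> bool {
    dir.is_dir() && dir.join("key.pem").is_file() && dir.join("cert.pem").is_file()
}
```

Against the real environment you would call `missing_vars(|name| std::env::var(name).ok())` and bail out if the returned list is not empty.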
You may then compile and run the app using cargo run. If everything goes right, you should see the TCP & UDP server addresses printed out to your shell.
To build a production release, use cargo build --release.
The app implements the following protocols:
Some of these are not followed "by the book", but rather by whatever felt absolutely necessary for my purposes. I do hope to make it, eventually, 100% RFC-compliant though.
The app binds to two network sockets, one UDP, one TCP. Both of these sockets are expected to be exposed to the public network. An HTTP server runs on the TCP socket. This server is responsible for opening client connections and sharing room data. The UDP server handles video & STUN packets. The high-level overview looks as follows:

When a remote client wishes to become a streamer, it sends a WHIP request containing an SDP offer. The SDP contains information like available video & audio codecs, ICE credentials, a certificate fingerprint, etc. If the server accepts the offer (e.g. the offer has all the valid codecs, demuxing audio and video), an SDP answer is sent back, indicating the agreed-upon codecs, ICE credentials, fingerprint and an ICE candidate. The ICE candidate is the UDP server.
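
The ICE credentials and the fingerprint travel as plain `a=` attribute lines in the SDP, so picking them out is mostly string handling. A minimal sketch, assuming a hypothetical `sdp_attribute` helper (the app's own SDP parser is likely stricter and also deals with media sections and codecs):

```rust
/// Return the value of the first `a=<name>:<value>` attribute line in an SDP body.
/// Session-level and media-level attributes are not distinguished here.
fn sdp_attribute<'a>(sdp: &'a str, name: &str) -> Option<&'a str> {
    sdp.lines()
        .filter_map(|line| line.trim_end().strip_prefix("a="))
        .filter_map(|attr| attr.split_once(':'))
        .find(|(key, _)| *key == name)
        .map(|(_, value)| value)
}
```

For example, `sdp_attribute(offer, "ice-ufrag")`, `sdp_attribute(offer, "ice-pwd")` and `sdp_attribute(offer, "fingerprint")` return the three values the later ICE and DTLS steps depend on.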

The ICE protocol is now used to establish a UDP connection with the host. The host acts as an ICE-lite server, whilst the remote client acts as a full ICE agent. The remote client will send STUN packets using the short-term credentials mechanism. The credentials are a combination of the host and remote ICE credentials exchanged at the WHIP endpoint step. At some point, the remote client must nominate its peer. It does so by attaching a special attribute (USE-CANDIDATE) to the STUN message. When that happens, the remote client is officially recognized as a streamer by the host app. The STUN binding requests will continue for the lifetime of the connection, serving as "life-checks".
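
To make the nomination step more concrete, here is a sketch that checks whether a datagram is a STUN binding request carrying the USE-CANDIDATE attribute. The constants come from the STUN and ICE specifications. The function name is hypothetical, and the sketch skips MESSAGE-INTEGRITY and FINGERPRINT validation, which a real server must perform before trusting the packet.

```rust
const MAGIC_COOKIE: [u8; 4] = [0x21, 0x12, 0xA4, 0x42];
const BINDING_REQUEST: u16 = 0x0001;
const USE_CANDIDATE: u16 = 0x0025;
const HEADER_LEN: usize = 20;

/// True if `packet` is a STUN binding request with a USE-CANDIDATE attribute.
/// Credentials are NOT verified here; this only inspects the wire format.
fn nominates_peer(packet: &[u8]) -> bool {
    if packet.len() < HEADER_LEN || packet[4..8] != MAGIC_COOKIE[..] {
        return false;
    }
    let msg_type = u16::from_be_bytes([packet[0], packet[1]]);
    let msg_len = u16::from_be_bytes([packet[2], packet[3]]) as usize;
    if msg_type != BINDING_REQUEST || packet.len() < HEADER_LEN + msg_len {
        return false;
    }
    // Walk the type-length-value attributes that follow the 20-byte header.
    let mut attrs = &packet[HEADER_LEN..HEADER_LEN + msg_len];
    while attrs.len() >= 4 {
        let attr_type = u16::from_be_bytes([attrs[0], attrs[1]]);
        let attr_len = u16::from_be_bytes([attrs[2], attrs[3]]) as usize;
        if attr_type == USE_CANDIDATE {
            return true;
        }
        // Attribute values are padded to a 4-byte boundary.
        let padded = (attr_len + 3) & !3;
        if attrs.len() < 4 + padded {
            break;
        }
        attrs = &attrs[4 + padded..];
    }
    false
}
```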

The next step is to establish a DTLS connection. The remote client will attempt to establish a DTLS connection with the host (as specified by the SDP offer/answer). The DTLS handshake uses the provided TLS key and certificate. The fingerprint value exchanged in the SDP process is a hash of each peer's TLS certificate. The remote peer knows it's really talking to the host when the host proves its certificate matches the fingerprint exchanged in the SDP. Since the SDP exchange happens at the HTTPS level (production-wise), we trust the fingerprint values. Once a DTLS connection is established, we get the secret keying material that is used to encrypt media packets. Note that neither the host nor the remote will send application data through DTLS. All data (STUN and video packets) is still sent directly over UDP. DTLS is used only to derive cryptographic keys.
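
Since STUN, DTLS and (S)RTP all arrive on the same UDP socket, the receiver has to tell them apart. A common way, described in RFC 7983, is to look at the first byte of each datagram. The sketch below follows those ranges; the `PacketKind` enum and `classify` function are hypothetical names, and I'm not claiming the app does it exactly this way.

```rust
#[derive(Debug, PartialEq)]
enum PacketKind {
    Stun,
    Dtls,
    Rtp, // covers RTP and RTCP, plain or SRTP-protected
    Unknown,
}

/// Classify a datagram by its first byte, using the RFC 7983 ranges.
fn classify(datagram: &[u8]) -> PacketKind {
    match datagram.first().copied() {
        Some(0..=3) => PacketKind::Stun,
        Some(20..=63) => PacketKind::Dtls,
        Some(128..=191) => PacketKind::Rtp,
        _ => PacketKind::Unknown,
    }
}
```
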
The host app will now accept video packets. The remote peer sends video packets to the host UDP socket. The packets are encapsulated in the RTP protocol. An RTP packet consists of a header, with information like the payload type, SSRC and sequence number, and a payload: the raw codec data. The RTP payload is encrypted using the keys derived from the DTLS connection, hence the remote peer actually sends SRTP packets. The host app is capable of decrypting these packets back into RTP.
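
For reference, the fixed 12-byte RTP header can be read with a handful of bit operations. SRTP leaves this header unencrypted (only the payload is protected), so the same parsing applies before decryption. A minimal sketch with hypothetical names, ignoring CSRC entries and header extensions:

```rust
#[derive(Debug, PartialEq)]
struct RtpHeader {
    marker: bool,
    payload_type: u8,
    sequence_number: u16,
    timestamp: u32,
    ssrc: u32,
}

/// Parse the fixed part of an RTP header (RFC 3550). Returns None for
/// packets that are too short or do not carry RTP version 2.
fn parse_rtp_header(packet: &[u8]) -> Option<RtpHeader> {
    if packet.len() < 12 || (packet[0] >> 6) != 2 {
        return None;
    }
    Some(RtpHeader {
        marker: (packet[1] & 0x80) != 0,
        payload_type: packet[1] & 0x7F,
        sequence_number: u16::from_be_bytes([packet[2], packet[3]]),
        timestamp: u32::from_be_bytes([packet[4], packet[5], packet[6], packet[7]]),
        ssrc: u32::from_be_bytes([packet[8], packet[9], packet[10], packet[11]]),
    })
}
```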

In a very similar manner, a host-remote connection is established with viewer clients, with a few exceptions. The streamer client is linked with a room, i.e. a virtual lobby representing its media stream. The viewer client is also linked with a room, but the client has no ownership over that data. Once the SRTP packet coming from a recognized streamer client is decrypted, two things happen. One - video thumbnail data is generated (if enough data is available), two - the video packet is forwarded to all viewer clients linked to the streamer's room. For each viewer client, the RTP packet is encrypted into an SRTP packet using the associated DTLS keys. The packet is then sent through UDP to the remote peer.
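
The fan-out itself can be pictured as a map from room id to the viewers linked to it. The sketch below is a simplified model with hypothetical names (`Rooms`, `join`, `forward`) and an assumed numeric room id. The per-viewer SRTP encryption and the actual socket write are left to the `send` callback.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;

#[derive(Default)]
struct Rooms {
    /// Room id -> addresses of the viewers linked to that room.
    viewers: HashMap<u32, Vec<SocketAddr>>,
}

impl Rooms {
    /// Link a viewer with a room.
    fn join(&mut self, room_id: u32, viewer: SocketAddr) {
        self.viewers.entry(room_id).or_default().push(viewer);
    }

    /// Hand one decrypted RTP packet to `send` once per viewer of the room.
    /// In a real server `send` would encrypt with that viewer's SRTP keys
    /// and write to the UDP socket. Returns the number of viewers reached.
    fn forward(&self, room_id: u32, rtp: &[u8], mut send: impl FnMut(SocketAddr, &[u8])) -> usize {
        let targets = self.viewers.get(&room_id).map(Vec::as_slice).unwrap_or(&[]);
        for viewer in targets {
            send(*viewer, rtp);
        }
        targets.len()
    }
}
```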

