Universal playback and streaming support using MP4 and Range request headers

March 17, 2024

This is part 3 of our blog on how we are building NeetoRecord, a Loom alternative. Here are part 1 and part 2.

In part 1 of our blog, we uploaded the recording from the browser to S3 in small parts and stitched them together to get the final WEBM video file. We could use this WEBM file to share our recording with our audience, but it has a few drawbacks:

WEBM is not universally supported. Most modern browsers play WEBM, but some browsers and devices, especially those in the Apple ecosystem, do not play it reliably.
The WEBM files produced by the browser recording do not contain metadata for timestamps and duration, so these videos are not "seekable." The player cannot show the video length, and we cannot move back and forth using the seek bar. When the user tries to drag the seek bar, the video starts playing again from the beginning.

Hence, we needed to convert the WEBM videos to a universally supported format to solve the above problems. We chose MP4.

MP4

MP4 is a widely used multimedia container format for video storage and streaming. It is an international standard that works with a vast range of devices. MP4 refers to the digital container file that acts as a wrapper around the video, not the video itself. The video content within MP4 files is typically encoded with a codec from the MPEG-4 family, such as H.264.

We chose MP4 because:

It works with the HTML5 video player.
It supports multiple streaming protocols.
It has comprehensive support across user devices and browsers.

WEBM to MP4 conversion

AWS MediaConvert service

Since our WEBM files were in an S3 bucket, our first idea was to use an AWS service to do the WEBM to MP4 conversion. We configured the AWS Elemental MediaConvert service and connected it to our WEBM bucket. When a user uploaded a WEBM file to the bucket, MediaConvert picked it up, converted it to MP4 and uploaded it to a new bucket.

MediaConvert worked as expected, but we had to find another solution because:

Cost - we found it too expensive for our use case.
Performance - It took a long time to do the conversion. While the smaller recordings took about 20-30s, large ones took minutes. The time taken grew linearly with the size of the WEBM file.

Manual transcoding using FFMPEG on AWS Lambda

Converting WEBM to MP4 involves transcoding. Transcoding is the process of changing the audio/video codecs in a container file. Codecs are algorithms used to encode and decode digital media data. Converting to MP4 would mean using codecs that are part of the MPEG-4 family, for example H.264 for video and AAC for audio. FFMPEG is a popular open source tool that can be used for transcoding WEBM to MP4.

ffmpeg -i input.webm -c:v libx264 -c:a aac output.mp4

-c:v libx264 sets the video encoder to libx264, a widely used encoder for the H.264 codec.
-c:a aac sets the audio codec to AAC, which is a commonly used audio codec.

We could install FFMPEG on our web server and run the transcoding process there, but that would not be easy to scale. So, we decided to use a serverless solution that would scale automatically. Since our input files were on AWS S3, AWS Lambda was the obvious choice.

We installed FFMPEG on AWS Lambda using a Layer as described in this post.

We configured our input S3 bucket (one to which WEBM was uploaded) to trigger Lambda whenever a new file was uploaded. FFMPEG would then transcode WEBM to MP4 and store the output in another S3 bucket.
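The trigger-and-transcode flow above can be sketched as a Lambda handler in Python. This is a minimal sketch under stated assumptions, not the code we actually deployed: the bucket name `example-mp4-bucket`, the `/tmp` file names and the helper function names are all hypothetical, and it assumes the `ffmpeg` binary from the Layer is on the `PATH`.

```python
import subprocess
import urllib.parse
from pathlib import PurePosixPath

# Hypothetical output bucket name, used only for illustration.
OUTPUT_BUCKET = "example-mp4-bucket"


def uploaded_objects(event):
    """Return (bucket, key) pairs from an S3 object-created event payload."""
    pairs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def mp4_key_for(webm_key):
    """Map 'recordings/abc.webm' to 'recordings/abc.mp4'."""
    return str(PurePosixPath(webm_key).with_suffix(".mp4"))


def transcode_command(src_path, dst_path):
    """Build the ffmpeg transcoding call as an argument list (-y overwrites output)."""
    return ["ffmpeg", "-y", "-i", src_path,
            "-c:v", "libx264", "-c:a", "aac", dst_path]


def handler(event, context):
    # boto3 is available in the Lambda Python runtime; importing it here keeps
    # the helper functions above usable without it.
    import boto3

    s3 = boto3.client("s3")
    for bucket, key in uploaded_objects(event):
        src = "/tmp/input.webm"   # /tmp is the writable directory in Lambda
        dst = "/tmp/output.mp4"
        s3.download_file(bucket, key, src)
        subprocess.run(transcode_command(src, dst), check=True)
        s3.upload_file(dst, OUTPUT_BUCKET, mp4_key_for(key))
```

Keeping the event parsing and command building in small pure functions makes them easy to check locally without AWS access.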

This worked as expected, but performance was still a problem. The time taken was proportional to the input file size and was longer than was acceptable for us.

Transmuxing instead of transcoding

Transmuxing or stream copy is a fast process that doesn't involve re-encoding but instead directly copies the existing audio and video streams into a new container format. This approach works well when the codecs used in the input file (WebM) are compatible with the output container format (MP4).

Popular browsers like Chrome, Brave and Safari use the H.264 codec for video encoding. This is compatible with MP4, so transmuxing works flawlessly. Firefox, however, uses the VP8 or VP9 codec, which is incompatible with MP4. Since we were planning to build a Chrome extension for NeetoRecord, we only needed to worry about Chrome and could ignore Firefox users for now.

ffmpeg -i input.webm -c:v copy -c:a copy output.mp4

We modified the ffmpeg command as shown above. It now uses the -c:v copy and -c:a copy options, which copy the video and audio streams from the input file to the output file without re-encoding. MP4 conversion became extremely fast, and the time taken no longer increased significantly with the size of the input file.
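One way to guard against inputs that cannot be stream-copied is to inspect the codecs first and fall back to re-encoding only where needed. The sketch below is illustrative rather than what we shipped: the function names are hypothetical, and the two "safe to copy" codec sets are assumptions that should be adjusted to whatever your target players accept.

```python
import json
import subprocess

# Assumed, illustrative sets of codecs that are stream-copied into MP4.
MP4_VIDEO_CODECS = {"h264"}
MP4_AUDIO_CODECS = {"aac"}


def probe_streams(path):
    """Ask ffprobe for each stream's codec_type and codec_name as JSON."""
    out = subprocess.run(
        ["ffprobe", "-v", "error",
         "-show_entries", "stream=codec_type,codec_name",
         "-of", "json", path],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)["streams"]


def conversion_args(streams):
    """Pick 'copy' for streams in the assumed sets, otherwise re-encode."""
    codecs = {s["codec_type"]: s["codec_name"] for s in streams}
    video = "copy" if codecs.get("video") in MP4_VIDEO_CODECS else "libx264"
    audio = "copy" if codecs.get("audio") in MP4_AUDIO_CODECS else "aac"
    return ["-c:v", video, "-c:a", audio]
```

The returned arguments slot between the input and output paths of the ffmpeg call, for example `["ffmpeg", "-i", "input.webm", *conversion_args(probe_streams("input.webm")), "output.mp4"]`.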

Streaming

Now that we had successfully generated MP4 files, it was time to think about delivering them efficiently to the client (browser) for playback. We had two problems to solve:

S3 is a storage service. It is not well suited to content delivery:
- Data transfer costs are relatively high.
- Storage is in one geographical region, resulting in slower delivery over the network to distant users.
Video files are large. Downloading the entire file before playing it back is inefficient in terms of speed and data transfer. We needed a way to stream the files, that is, to deliver chunks of data as and when the client needed them.

CloudFront as CDN

CloudFront is a content delivery network (CDN) service provided by AWS. It can be used as a CDN for S3, and this combination is a common architecture for distributing content globally with low latency and high transfer speeds.

We created a CloudFront distribution connected to our MP4 bucket. Once the distribution is deployed, we can access the MP4 files using the CloudFront domain name. When users request content through CloudFront, CloudFront checks its cache for the requested content. If the content is in the cache and is still valid (based on cache-control headers), CloudFront serves the content directly from its edge locations, reducing latency. If the content is not in the cache or has expired, CloudFront retrieves the content from the S3 bucket, caches it, and serves it to the user. This helps reduce the load on our S3 bucket and improves the performance of content delivery.
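The hit, miss and expiry behaviour described above can be modelled in a few lines. This is a toy model to illustrate the idea only, not how CloudFront is implemented: the `EdgeCache` class, its fixed TTL and the `fetch_from_origin` callable are all hypothetical stand-ins.

```python
import time


class EdgeCache:
    """Toy model of one edge location: serve from cache while fresh, else go to origin."""

    def __init__(self, fetch_from_origin, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch_from_origin   # callable: path -> bytes (stands in for S3)
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}                  # path -> (expires_at, body)
        self.origin_hits = 0

    def get(self, path):
        now = self._clock()
        cached = self._store.get(path)
        if cached is not None and cached[0] > now:
            return cached[1]              # cache hit: the origin is not contacted
        body = self._fetch(path)          # miss or expired: fetch from the origin
        self.origin_hits += 1
        self._store[path] = (now + self._ttl, body)
        return body
```

Requesting the same path twice within the TTL reaches the origin only once, which is the load reduction on the S3 bucket described above.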

The HTTP Range request header

HTTP Range requests allow clients to request specific portions of a file from a server. This lets clients stream or download only the parts of the file they need, reducing bandwidth usage and improving the user experience. At first, the client requests the range for the beginning of the video file and then, as playback proceeds, requests the subsequent parts. If the user moves back and forth in the video using the seek bar, the corresponding ranges are requested.

GET /example.mp4 HTTP/1.1
Host: example.com
Range: bytes=5000-9999

Range: bytes=5000-9999 is the Range header indicating the specific bytes the client wants to retrieve. In this case, the client requests bytes 5000 to 9999 of the MP4 file. The numbering starts from zero, so byte 5000 means the 5001st byte in the file.
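To make the inclusive, zero-based numbering concrete, here is a small sketch that lists the Range header values a client could send to fetch a whole file in fixed-size chunks. The function name and the fixed chunk size are assumptions for illustration. Real video players choose their own ranges based on buffering and seeking.

```python
def range_headers(total_size, chunk_size):
    """Yield Range header values that cover a file of total_size bytes in order."""
    if total_size < 0 or chunk_size <= 0:
        raise ValueError("total_size must be >= 0 and chunk_size must be > 0")
    for start in range(0, total_size, chunk_size):
        # The end offset is inclusive, so a 5000-byte chunk ends at start + 4999.
        end = min(start + chunk_size, total_size) - 1
        yield f"bytes={start}-{end}"
```

For a 12000-byte file and 5000-byte chunks this yields `bytes=0-4999`, `bytes=5000-9999` and `bytes=10000-11999`.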

The server responds with a 206 (Partial Content) status and the requested sequence of bytes in the body. If the server does not support range requests, it responds with a 200 and the full content.
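On the server side, handling a range boils down to turning the header into byte offsets and describing the slice in the response headers. The following is a simplified sketch with hypothetical function names. It handles a single range only and returns None for anything it cannot serve, whereas a real server would also distinguish malformed headers from unsatisfiable ranges (status 416).

```python
import re

_RANGE_RE = re.compile(r"bytes=(\d*)-(\d*)")


def resolve_range(header, size):
    """Return inclusive (start, end) offsets, or None if no partial response applies."""
    match = _RANGE_RE.fullmatch(header.strip())
    if match is None:
        return None
    first, last = match.groups()
    if first == "" and last == "":
        return None
    if first == "":                       # suffix form: "bytes=-500" is the last 500 bytes
        length = int(last)
        if length == 0 or size == 0:
            return None
        return max(size - length, 0), size - 1
    start = int(first)
    end = int(last) if last else size - 1  # "bytes=5000-" runs to the end of the file
    if start >= size or start > end:
        return None
    return start, min(end, size - 1)


def partial_response_headers(start, end, size):
    """Headers that accompany a 206 Partial Content body for the chosen range."""
    return {
        "Content-Range": f"bytes {start}-{end}/{size}",
        "Content-Length": str(end - start + 1),
    }
```

For the request shown earlier against a 146515-byte file, `resolve_range("bytes=5000-9999", 146515)` gives `(5000, 9999)`, and the response carries a `Content-Length` of 5000.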

Checking if the server supports Range requests

We can perform a check by issuing a HEAD request to the server to see if the server supports Range requests.

curl -I http://abc.com/1.mp4

If range requests are supported, the server responds with an Accept-Ranges: bytes header.

HTTP/1.1 200 OK
…
Accept-Ranges: bytes
Content-Length: 146515

We ran the test directly on our S3 bucket first, and then through CloudFront. Both S3 and CloudFront support Range request headers.
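The same check can be scripted. The sketch below uses only the Python standard library. The function names are hypothetical, and the URL in the usage note is the placeholder from the curl example above.

```python
import urllib.request


def accepts_byte_ranges(headers):
    """Return True if the response headers advertise byte-range support."""
    for name, value in headers.items():
        if name.lower() == "accept-ranges":
            units = [unit.strip().lower() for unit in value.split(",")]
            return "bytes" in units
    return False


def head_headers(url, timeout=10):
    """Issue a HEAD request (like curl -I) and return the response headers as a dict."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return dict(response.headers.items())
```

Calling `accepts_byte_ranges(head_headers("http://abc.com/1.mp4"))` performs the network request and reports whether `Accept-Ranges: bytes` was present.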

MP4, as mentioned above, supports streaming. It carries index metadata that maps playback time to byte offsets in the file, so the player knows which byte ranges to request. The HTML5 video player handles progressive download automatically by making use of HTTP Range headers.

So now we have our video in a file format that supports streaming (MP4), a web server that supports Range headers (S3 and CloudFront) and a client that uses Range headers for progressive download: all the ingredients needed to support streaming.
