

People upload hundreds of millions of videos to Facebook every day. Making sure every video is delivered at the best quality — with the highest resolution and as little buffering as possible — means optimizing not only when and how our video codecs compress and decompress videos for viewing, but also which codecs are used for which videos. But the sheer volume of video content on Facebook also means finding ways to do this that are efficient and don't consume a ton of computing power and resources.


To help with this, we employ a variety of codecs as well as adaptive bitrate streaming (ABR), which improves the viewing experience and reduces buffering by choosing the best quality based on a viewer's network bandwidth. But while more advanced codecs like VP9 provide better compression performance than older codecs like H264, they also consume more computing power. From a pure computing perspective, applying the most advanced codecs to every video uploaded to Facebook would be prohibitively inefficient. That means there needs to be a way to prioritize which videos should be encoded using more advanced codecs.

Today, Facebook handles its high demand for encoding high-quality video content by combining a benefit-cost model with a machine learning (ML) model that lets us prioritize advanced encoding for highly watched videos. By predicting which videos will be highly watched and encoding them first, we can reduce buffering, improve overall visual quality, and allow people on Facebook who may be limited by their data plans to watch more videos.

But this task isn't as straightforward as allowing content from the most popular uploaders, or those with the most friends or followers, to jump to the front of the line. Several factors have to be taken into consideration so that we can provide the best video experience for people on Facebook while also ensuring that content creators still have their content encoded fairly on the platform.

How we used to encode video on Facebook

Traditionally, once a video is uploaded to Facebook, the process to enable ABR kicks in and the original video is quickly re-encoded into multiple resolutions (e.g., 360p, 480p, 720p, 1080p). Once those encodings are made, Facebook's video encoding system tries to further improve the viewing experience by using more advanced codecs, such as VP9, or more expensive "recipes" (a video industry term for fine-tuned transcoding parameters), such as the H264 very slow profile, to compress the video file as much as possible. Different transcoding technologies (using different codec types or codec parameters) have different trade-offs between compression efficiency, visual quality, and how much computing power is needed.

The question of how to order jobs in a way that maximizes the overall experience for everyone has long been top of mind. Facebook has a specialized encoding compute pool and dispatcher. It accepts encoding job requests that have a priority value attached to them and puts them into a priority queue where higher-priority encoding tasks are processed first. The video encoding system's job is then to assign the right priority to each task. It did so by following a list of simple, hard-coded rules. Encoding tasks could be assigned a priority based on a number of factors, including whether a video is a licensed music video, whether the video is for a product, and how many friends or followers the video's owner has.
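
The dispatcher described above can be sketched with a standard priority queue. This is an illustrative sketch, not Facebook's actual system: the class, job names, and priority values are all hypothetical.

```python
import heapq

class EncodingDispatcher:
    """Hypothetical sketch of a dispatcher that pops higher-priority
    encoding jobs first (not Facebook's real API)."""

    def __init__(self):
        self._queue = []
        self._counter = 0  # tie-breaker so equal priorities stay FIFO

    def submit(self, job_name, priority):
        # heapq is a min-heap, so negate the priority: highest pops first
        heapq.heappush(self._queue, (-priority, self._counter, job_name))
        self._counter += 1

    def next_job(self):
        return heapq.heappop(self._queue)[2]

dispatcher = EncodingDispatcher()
dispatcher.submit("h264_480p_video42", priority=5.0)
dispatcher.submit("vp9_1080p_video7", priority=9.5)
dispatcher.submit("h264_360p_video13", priority=5.0)

print(dispatcher.next_job())  # vp9_1080p_video7
```

The interesting question, as the rest of the post explains, is not the queue itself but how the priority number attached to each job gets computed.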

But there were disadvantages to this approach. As new video codecs became available, the number of rules that needed to be maintained and tweaked kept expanding. And since different codecs and recipes have different trade-offs between computing requirements, visual quality, and compression performance, it is impossible to fully optimize the end user experience with a coarse-grained set of rules.

And, perhaps most important, Facebook's video consumption pattern is extremely skewed: videos are uploaded by people and pages that span a wide spectrum in terms of their number of friends or followers. Compare the Facebook page of a big company like Disney with that of a vlogger who might have 200 followers. The vlogger can upload a video at the same time, but Disney's video is likely to get more watch time. However, any video can go viral, even if the uploader has a small following. The challenge is to support content creators of all sizes, not just those with the largest audiences, while also acknowledging the reality that having a large audience likely means more views and longer watch times.

Enter the benefit-cost model

The new model still uses a set of quick initial H264 ABR encodings to ensure that all uploaded videos are encoded at good quality as soon as possible. What's changed, however, is how we calculate the priority of encoding jobs after a video is published.

The benefit-cost model grew out of a few fundamental observations:

A video consumes computing resources only the first time it is encoded. Once it has been encoded, the stored encoding can be delivered as many times as requested without requiring additional compute resources.

A relatively small percentage (roughly one-third) of all videos on Facebook generate the majority of overall watch time.

Facebook's data centers have limited amounts of energy to power compute resources.

We get the most bang for our buck, so to speak, in terms of maximizing everyone's video experience within the available power constraints, by applying more compute-intensive recipes and advanced codecs to the videos that are watched the most.


Based on these observations, we came up with the following definitions for benefit, cost, and priority:

Benefit = (relative compression efficiency of the encoding family at fixed quality) * (effective predicted watch time)

Cost = normalized compute cost of the missing encodings in the family

Priority = Benefit / Cost
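
The three definitions above translate directly into a few lines of code. This is a minimal sketch with made-up units and numbers; the function name and inputs are illustrative, not the production implementation.

```python
def priority(compression_efficiency, effective_watch_time_hours, compute_cost):
    """Priority = Benefit / Cost, per the definitions above.
    Benefit = relative compression efficiency * effective predicted watch time.
    All units here are illustrative."""
    benefit = compression_efficiency * effective_watch_time_hours
    return benefit / compute_cost

# The same VP9 job scores very differently on a popular video
# versus a rarely watched one (hypothetical numbers):
popular = priority(1.3, 100.0, 20.0)  # roughly 6.5
rare = priority(1.3, 2.0, 20.0)       # roughly 0.13
assert popular > rare
```

Because watch time multiplies the benefit while cost stays fixed, the formula naturally routes expensive codecs toward the videos that will be watched the most.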

Relative compression efficiency of the encoding family at fixed quality: We measure benefit in terms of the encoding family's compression efficiency. "Encoding family" refers to the set of encoded files that can be delivered together. For example, the H264 360p, 480p, 720p, and 1080p encoding lanes make up one family, and the VP9 360p, 480p, 720p, and 1080p lanes make up another. One challenge here is comparing compression efficiency between different families at the same visual quality.

To understand this, you first have to understand a metric we've developed called Minutes of Video at High Quality per GB data pack (MVHQ). MVHQ links compression efficiency directly to a question people wonder about their data plans: Given 1 GB of data, how many minutes of high-quality video can we stream?

Mathematically, MVHQ can be understood as:

MVHQ = (bits in 1 GB of data) / (minimum bitrate, in bits per second, that sustains high visual quality * 60 seconds per minute)
For example, let's say we have a video where the MVHQ is 153 minutes using the H264 fast preset, 170 minutes using the H264 slow preset, and 200 minutes using VP9. This means delivering the video using VP9 could extend the watch time available from 1 GB of data by 47 minutes (200 - 153) at a high visual quality threshold, compared with H264 fast. When calculating the benefit value of this particular video, we use H264 fast as the baseline: we assign 1.0 to H264 fast, 1.1 (170/153) to H264 slow, and 1.3 (200/153) to VP9.
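
The normalization step in that example is simple enough to show directly. The function and family labels below are illustrative; the MVHQ figures are the ones from the example above.

```python
def relative_efficiency(mvhq_by_family, baseline="h264_fast"):
    """Normalize each family's MVHQ against a baseline family, yielding
    the relative compression efficiency used in the benefit formula."""
    base = mvhq_by_family[baseline]
    return {family: round(mvhq / base, 1) for family, mvhq in mvhq_by_family.items()}

# MVHQ in minutes per GB, from the example in the text:
mvhq = {"h264_fast": 153, "h264_slow": 170, "vp9": 200}
print(relative_efficiency(mvhq))
# {'h264_fast': 1.0, 'h264_slow': 1.1, 'vp9': 1.3}
```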

The actual MVHQ can be calculated only once an encoding is produced, but we need the value before the encodings are available, so we use historical data to estimate the MVHQ for each of the encoding families of a given video.

Effective predicted watch time: As described further in the section below, we have a sophisticated ML model that predicts how long a video is going to be watched in the near future across its entire audience. Once we have the predicted watch time at the video level, we estimate how effectively an encoding family can be applied to the video. This accounts for the fact that not all people on Facebook have the latest devices, which can play newer codecs.

For example, about 20 percent of video consumption happens on devices that cannot play videos encoded with VP9. So if the predicted watch time for a video is 100 hours, the effective predicted watch time using the widely adopted H264 codec is 100 hours, while the effective predicted watch time of the VP9 encodings is 80 hours.
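
In code, this device-compatibility discount is just a multiplication. The function name and the per-codec playable shares below are illustrative assumptions based on the example in the text.

```python
def effective_watch_time(predicted_hours, playable_share):
    """Discount predicted watch time by the fraction of consumption that
    happens on devices able to decode this codec family (illustrative)."""
    return predicted_hours * playable_share

# From the example above: 100 hours predicted, VP9 playable on ~80% of consumption
assert effective_watch_time(100, 1.0) == 100  # H264: effectively universal
assert effective_watch_time(100, 0.8) == 80   # VP9: 20% of devices can't play it
```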

Normalized compute cost of the missing encodings in the family: This is the amount of logical computing cycles we need to make the encoding family deliverable. An encoding family requires a minimum set of resolutions to be made available before we can deliver a video. For example, for a particular video, the VP9 family may require at least four resolutions. But some encodings take longer than others, meaning not all of the resolutions for a video can be made available at the same time.

As an example, let's say Video A is missing all four lanes in the VP9 family. We can sum up the estimated CPU usage of all four lanes and assign the same normalized cost to all four jobs.

If we are only missing two out of four lanes, as with Video B, the compute cost is the sum of producing the remaining two encodings. The same cost is applied to both jobs. Since priority is benefit divided by cost, this has the effect of making a task more urgent as more of its lanes become available. Encoding lanes do not provide any value until the family is deliverable, so it is important to complete a family as quickly as possible. For example, having one video with all of its VP9 lanes adds more value than 10 videos with incomplete (and therefore undeliverable) VP9 lanes.
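
The Video A / Video B comparison can be made concrete with a short sketch. The per-lane CPU estimates below are invented for illustration; only the mechanism (cost = sum of missing lanes, so priority rises as lanes complete) comes from the text.

```python
# Hypothetical normalized CPU cost per VP9 lane (made-up numbers):
LANE_CPU = {"360p": 1.0, "480p": 1.5, "720p": 3.0, "1080p": 6.0}

def family_cost(missing_lanes):
    """Cost of making the family deliverable: sum of its missing lanes."""
    return sum(LANE_CPU[lane] for lane in missing_lanes)

benefit = 1.3 * 80.0  # efficiency * effective predicted watch time (illustrative)

# Video A: all four VP9 lanes still missing
cost_a = family_cost(["360p", "480p", "720p", "1080p"])  # 11.5
# Video B: only two lanes left to produce
cost_b = family_cost(["720p", "1080p"])                   # 9.0

# The nearly complete family gets the higher priority for the same benefit:
assert benefit / cost_b > benefit / cost_a
```

This is exactly the "finish what you started" effect described above: as a family approaches completion, its remaining jobs rise in the queue.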



Predicting watch time with ML

With a new benefit-cost model in place to tell us how certain videos should be encoded, the next piece of the puzzle is determining which videos should be prioritized for encoding. That's where we now use ML to predict which videos will be watched the most and thus should be prioritized for advanced encodings.


Our model looks at a number of factors to predict how much watch time a video will get within the next hour. It does this by looking at the video uploader's friend or follower count and the average watch time of their previously uploaded videos, as well as metadata from the video itself, including its duration, width, height, privacy status, post type (Live, Stories, Watch, etc.), how old it is, and its past popularity on the platform.

But using all this data to make decisions comes with several built-in challenges:

Watch time has high variance and a very long-tailed, skewed distribution. Even when we focus on predicting the next hour of watch time, a video's watch time can range anywhere from zero to over 50,000 hours, depending on its content, who uploaded it, and the video's privacy settings. The model must be able to tell not only whether a video will be popular, but also how popular.

The best indicator of next-hour watch time is a video's previous watch time trajectory. Video popularity is generally very volatile by nature. Different videos uploaded by the same content creator can sometimes have vastly different watch times, depending on how the community reacts to the content. After experimenting with multiple features, we found that past watch time trajectory is the best predictor of future watch time. This poses two technical challenges in terms of designing the model architecture and balancing the training data:

Newly uploaded videos don't have a watch time trajectory. The longer a video stays on Facebook, the more we can learn from its past watch time. This means that the most predictive features won't apply to new videos. We want our model to perform reasonably well with missing data, because the earlier the system can identify videos that will become popular on the platform, the more opportunity there is to deliver higher-quality content.

Popular videos have a tendency to dominate the training data. The patterns of the most popular videos are not necessarily applicable to all videos.

Watch time behavior varies by video type. Stories videos are shorter and get less watch time on average than other videos. Live streams get most of their watch time during the stream or a few hours afterward. Meanwhile, video on demand (VOD) can have a varied lifespan and can rack up watch time long after it is initially uploaded if people start sharing it later.

Improvements in ML metrics do not necessarily correlate directly with product improvements. Traditional regression loss functions, such as RMSE, MAPE, and Huber loss, are great for optimizing offline models. But a reduction in modeling error does not always translate directly into product improvement, such as a better user experience, more watch time coverage, or better compute utilization.

Building the ML model for video encoding


To solve these challenges, we decided to train our model using watch time event data. Each row of our training/evaluation data represents a decision point that the system has to make a prediction for.

Since our watch time event data can be skewed or imbalanced in many ways, as mentioned above, we performed data cleaning, transformation, bucketing, and weighted sampling on the dimensions we care about.
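
One plausible way to keep popular videos from dominating the training set is to bucket examples by order of magnitude of watch time and cap each bucket's sample. The bucketing scheme, function names, and cap below are assumptions for illustration; the post only says that bucketing and weighted sampling were applied.

```python
import math
import random

def bucket(watch_minutes):
    # Log-scale buckets 0..4: 0-9 min, 10-99 min, ... , 10,000+ min (illustrative)
    return min(int(math.log10(watch_minutes + 1)), 4)

def weighted_sample(rows, per_bucket, seed=0):
    """Cap each popularity bucket so no single bucket dominates training."""
    rng = random.Random(seed)
    by_bucket = {}
    for row in rows:
        by_bucket.setdefault(bucket(row["watch_minutes"]), []).append(row)
    sample = []
    for rows_in_bucket in by_bucket.values():
        k = min(per_bucket, len(rows_in_bucket))
        sample.extend(rng.sample(rows_in_bucket, k))
    return sample

# 100 barely watched videos and 5 very popular ones:
rows = [{"watch_minutes": m} for m in [0] * 100 + [50000] * 5]
balanced = weighted_sample(rows, per_bucket=10)
print(len(balanced))  # 15: ten from the low bucket, all five popular ones
```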

Also, since newly uploaded videos don't have a watch time trajectory to draw from, we decided to build two models: one for handling upload-time requests and another for view-time requests. The view-time model uses the three sets of features mentioned above. The upload-time model looks at the performance of other videos a content creator has uploaded and substitutes this for past watch time trajectories. Once a video has been on Facebook long enough to have some past trajectory available, we switch it over to the view-time model.
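
The routing between the two models could look like the sketch below. The function, field names, and the history cutoff are hypothetical; the two-model split itself is from the text.

```python
def choose_model(video, min_history_points=3):
    """Route to the view-time model once a video has accumulated enough
    watch-time trajectory; otherwise fall back to the upload-time model,
    which relies on the creator's past videos instead (illustrative)."""
    if len(video.get("watch_trajectory", [])) >= min_history_points:
        return "view_time_model"
    return "upload_time_model"

assert choose_model({"watch_trajectory": []}) == "upload_time_model"
assert choose_model({"watch_trajectory": [5, 9, 14, 30]}) == "view_time_model"
```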

During model development, we selected the best launch candidates by looking at both root mean square error (RMSE) and mean absolute percentage error (MAPE). We use both metrics because RMSE is sensitive to outliers while MAPE is sensitive to small values. Our watch time label has high variance, so we use MAPE to evaluate performance on videos that are popular or moderately popular and RMSE to evaluate less watched videos. We also care about the model's ability to generalize well across different video types, ages, and popularity levels, so our evaluation always includes per-category metrics as well.

MAPE and RMSE are good summary metrics for model selection, but they don't necessarily reflect direct product improvements. When two models have a similar RMSE and MAPE, we also translate the evaluation into a classification problem to understand the trade-off. For example, if a video receives 1,000 minutes of watch time but Model A predicts 10 minutes, Model A's MAPE is 99 percent. If Model B predicts 1,990 minutes of watch time, Model B's MAPE will be the same as Model A's (i.e., 99 percent), but Model B's prediction is more likely to result in the video getting a high-quality encoding.
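
The Model A / Model B example checks out numerically, as the small sketch below shows. The encoding threshold is a hypothetical number added for illustration; the two predictions come from the example above.

```python
def mape(actual, predicted):
    """Mean absolute percentage error for a single example."""
    return abs(actual - predicted) / actual

# Both models are off by 99 percent on a 1,000-minute video...
assert round(mape(1000, 10), 2) == 0.99    # Model A: severe under-prediction
assert round(mape(1000, 1990), 2) == 0.99  # Model B: over-prediction

# ...but only Model B's prediction would clear a (hypothetical)
# popularity threshold that triggers advanced encoding:
threshold = 500
assert 1990 > threshold and not 10 > threshold
```

Identical error, very different product outcome: this is why a regression metric alone cannot pick the launch candidate.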

We also evaluate the classifications that videos are given, because we want to capture the trade-off between applying advanced encodings too often and missing the opportunity to apply them when there would be a benefit. For example, at a threshold of 10 seconds, we count the number of videos where the actual watch time is less than 10 seconds and the prediction is also less than 10 seconds, and vice versa, in order to calculate the model's false positive and false negative rates. We repeat the same calculation for multiple thresholds. This method of evaluation gives us insight into how the model performs on videos of different popularity levels and whether it tends to suggest more encoding jobs than necessary or to miss some opportunities.
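
A minimal version of that thresholded evaluation might look like this. The function and the sample watch times are invented; the idea of binarizing both actual and predicted watch time at a popularity threshold is from the text.

```python
def confusion_at_threshold(actual, predicted, threshold):
    """Count false positives (we would over-encode) and false negatives
    (we would miss a worthwhile encoding) at one popularity threshold."""
    fp = fn = 0
    for a, p in zip(actual, predicted):
        if p >= threshold and a < threshold:
            fp += 1  # predicted popular, actually wasn't
        elif p < threshold and a >= threshold:
            fn += 1  # predicted unpopular, actually was popular
    return fp, fn

# Hypothetical actual vs. predicted watch times (seconds):
actual    = [5, 12, 300, 8, 40]
predicted = [15, 9, 280, 3, 35]
print(confusion_at_threshold(actual, predicted, threshold=10))  # (1, 1)
```

Sweeping the threshold, as the text describes, then shows whether the model errs toward wasted encodings or missed opportunities at each popularity level.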

The impact of the new video encoding model

In addition to improving the viewer experience for newly uploaded videos, the new model can identify older videos on Facebook that should have been encoded with more advanced encodings and route more computing resources to them. Doing this has shifted a large portion of watch time to advanced encodings, resulting in less buffering without requiring additional computing resources. The improved compression has also allowed people on Facebook with limited data plans, such as those in emerging markets, to watch more videos at better quality.

What's more, as we introduce new encoding recipes, we no longer have to spend a lot of time evaluating where in the priority range to assign them. Instead, depending on a recipe's benefit and cost values, the model automatically assigns a priority that maximizes overall benefit throughput. For example, we could introduce a very compute-intensive recipe that only makes sense for extremely popular videos, and the model can identify such videos. Overall, this makes it easier for us to continue investing in newer and more advanced codecs to give people on Facebook the best-quality video experience.



This work is the collective result of the entire Video Infra team at Facebook. The authors would like to personally thank Shankar Regunathan, Atasay Gokkaya, Volodymyr Kondratenko, Jamie Chen, Cosmin Stejerean, Denise Noyes, Zach Wang, Oytun Eskiyenenturk, Mathieu Henaire, Pankaj Sethi, and David Ronca for all their contributions.