Chromium. Web page rendering using Blink, CC and scheduler

The Chromium engine from Google consists of a vast number of internal mechanisms, subsystems, and other engines. In this article, we will delve into the process of composing and rendering web pages on the screen, and get better acquainted with the Blink engine, the compositor (or, as it is also called, the content collator), and the task scheduler.

Web Page Parsing

First of all, let's remember how the rendering of a web page actually happens.

After receiving the HTML document, the browser parses it. HTML is a tag-based markup language with a nested structure much like that of XML, so there is nothing particularly interesting for us at this stage. As a result of parsing, the browser obtains a hierarchical tree of objects, the DOM (Document Object Model).
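
To make the idea of "parsing into a tree" more tangible, here is a minimal sketch in Python. It uses the standard html.parser module, not anything from Blink, and the Node and TreeBuilder names are hypothetical helpers invented for this illustration. A real browser parser is far more involved (error recovery, void elements, scripts that modify the document while it is being parsed), so treat this only as a toy model of how nested tags become nested objects.

```python
from html.parser import HTMLParser


class Node:
    """A toy DOM node: a tag name, a parent and a list of children."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a tree of Node objects from well-formed, simple HTML."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        # An opening tag creates a child of the current node and descends into it.
        node = Node(tag, self.current)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        # A closing tag returns to the parent (no error recovery in this sketch).
        if self.current.parent is not None:
            self.current = self.current.parent

    def handle_data(self, data):
        # Non-whitespace text becomes a text node under the current element.
        if data.strip():
            self.current.children.append(Node("#text", self.current))


def outline(node, depth=0):
    """Returns the tree as a list of indented tag names."""
    lines = ["  " * depth + node.tag]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


builder = TreeBuilder()
builder.feed("<html><body><h1>Title</h1><p>Text</p></body></html>")
builder.close()
tree_lines = outline(builder.root)
print("\n".join(tree_lines))
```

The printed outline shows the document node at the top, html and body below it, and the h1 and p elements as siblings, each with a single text child.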

As the browser walks through the HTML and parses it into the DOM, it encounters elements such as styles and JS scripts (both inline and as remote resources). Such elements require additional processing. Scripts are parsed by the JavaScript engine into an AST and then laid out in memory as internal objects of the engine itself. Styles, on the other hand, are arranged into a cascading CSSOM tree. Inline styles of elements are added to this tree as well.

Having obtained the DOM and CSSOM, the browser can perform all the calculations needed to position and display the elements and, as a result, construct a single Render Tree, from which the graphics are rendered on the screen.

Web Page Rendering

At present, most browsers are multi-threaded, and browsers based on the Chromium engine are no exception.

Blink

A separate system called Blink is responsible for all content rendering in the browser tab. Overall, Blink is a large and complex engine that includes functions such as implementing the HTML specification in terms of DOM, CSS, and Web IDL, integrating the V8 engine and running JavaScript code, rendering graphics on the screen (via the Skia engine), requesting network resources, handling input operations, building trees (DOM, CSSOM, Render Tree), calculating styles and positioning, and much more, including the Chrome Compositor (CC).

The engine itself is not a standalone "out-of-the-box" solution and cannot be launched independently. It is a kind of fork of the WebCore component of the WebKit engine.

Blink is used in platforms such as Chromium, Android WebView, Opera, Microsoft Edge, and many other Chromium-based browsers.

Chrome Compositor (CC)

In the Chromium codebase, this mechanism is located in the cc/ directory. Historically, the abbreviation "CC" has been read as Chrome Compositor, although today it no longer functions as a compositor itself. Dana Jansens (danakj) even proposed an alternative name: Content Collator.

The CC is launched by the Blink engine and can operate in both single-threaded and multi-threaded modes. The operation of the CC requires a separate article; we will not delve into all the mechanics of the system here. Let's just note that the main entities operated by the CC are layers and layer trees. These entities are available from the CC as an API. Layers can be of various types, such as picture layers, texture layers, surface layers, and more. The task of the CC client (in our case, Blink serves as the client for the CC) is to construct its layer tree (LayerTreeHost) and inform the CC that it is ready and can be rendered. This approach allows the process of generating the final composition to be atomic.

Fundamental Rendering Scheme

Chromium is a multi-threaded engine, and separate threads are allocated for many specific rendering operations. The two basic ones are the main thread and the compositor thread.

The main thread is the general thread in which Blink operates. It is here that the final RenderTree and LayerTreeHost are composed. Grouped input operations are also processed here, and JavaScript code is executed. After all necessary operations and calculations, Blink notifies the CC that the trees are ready for rendering.

The compositor thread is responsible for the operation of the CC, i.e., for scheduling rendering tasks and drawing directly on the screen.

Blink strives to display graphics at a rate of 60 FPS (60 frames per second), i.e., one frame should be output to the screen approximately every 16.6 ms. This rate matches the 60 Hz refresh rate of typical displays and is perceived as smooth motion. A lower rate can lead to janking, juddering, and jittering.

The simplified rendering scheme is shown in the diagram above. As mentioned earlier, the CC runs in a separate thread. At a certain point, it decides that it is time to initiate frame rendering and signals the main thread to start a new frame. Blink receives the signal in the main thread and performs the necessary scheduled operations, such as batch processing of input, execution of JavaScript code (or, more precisely, JavaScript tasks from the Event Loop), and updating the RenderTree. Once the RenderTree is ready, the engine applies the corresponding changes to the LayerTreeHost (to its temporary copy) and sends a commit signal to the CC. The CC then picks up the results and issues the commands that draw the graphics on the output device through the graphics APIs available on the OS, such as OpenGL and DirectX.

This entire process is given a time window of 1000/60 = ~16.6 ms. If the engine cannot complete all necessary operations within this time, the frame will be delayed, and the frame rate will drop. Therefore, a crucial task for Blink is to calculate and predict the execution time of upcoming tasks. Knowing how long a particular operation will take, the engine can take on only the work it is able to finish within the allocated time and defer the rest until later.
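
The budgeting idea described above can be sketched in a few lines of Python. This is not Blink code: the plan_frame function, the task names and the cost estimates are all hypothetical, and the real engine estimates durations from its own statistics. The sketch only shows the arithmetic of fitting estimated work into a 1000/60 ms window and deferring what does not fit.

```python
FRAME_BUDGET_MS = 1000 / 60  # about 16.67 ms per frame at 60 FPS


def plan_frame(tasks, budget_ms=FRAME_BUDGET_MS):
    """Splits tasks into those that fit into the frame and those deferred.

    tasks is a list of (name, estimated_ms) pairs in priority order.
    A task that does not fit is deferred; later tasks that still fit may run.
    Returns (run, deferred, slack_ms).
    """
    run, deferred, used_ms = [], [], 0.0
    for name, estimated_ms in tasks:
        if used_ms + estimated_ms <= budget_ms:
            run.append(name)
            used_ms += estimated_ms
        else:
            deferred.append(name)
    return run, deferred, budget_ms - used_ms


# Hypothetical estimates for one frame, in milliseconds.
frame_tasks = [
    ("input", 2.0),
    ("javascript", 9.0),
    ("style and layout", 4.0),
    ("non-urgent callback", 5.0),
]
run, deferred, slack_ms = plan_frame(frame_tasks)
print(run, deferred, round(slack_ms, 2))
```

With these made-up numbers the first three tasks take 15 ms in total, so the fourth one (5 ms) would overshoot the window and is deferred to a later frame.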

There are also operations that do not involve specific calculations and do not use JavaScript, such as scrolling. The CC can perform such operations independently in its own thread, without blocking the main thread. Animations that need calculations on the main thread, by contrast, occupy it for a large number of frames. If the main thread is busy with other high-priority tasks, part of the animation work may be postponed or delayed.

Task Scheduler

The task scheduler is designed to minimize the likelihood of delays in updating the frame. It is launched in the main thread. Each task is placed by the engine mechanisms in one of the queues specific to that type of task. CC tasks go into their own queue, input processing operations go into their own queue, JavaScript code execution goes into its own queue, and processes for loading the page go into their own queue.

Tasks within a queue are executed in the order in which they were placed there (remember the Event Loop). The scheduler, however, is free to choose dynamically which queue the next task is taken from, and it sets queue priorities at its own discretion. It decides when to set which priority based on signals received from many different systems. For example, if a page is still loading, network requests and HTML parsing will be given priority. And if a touch event is detected, the scheduler will temporarily raise the priority of input operations for the next 100 ms in order to recognize possible gestures correctly. It is assumed that the next events within this interval might include scrolling, tapping, zooming, etc.
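
The queue selection described above can be modelled roughly as follows. This is a simplified sketch, not the Blink scheduler: the SketchScheduler class, its method names and the numeric priority levels are hypothetical. Only the four queue types, the FIFO order inside each queue, the loading signal and the 100 ms input boost come from the description above.

```python
from collections import deque

QUEUE_NAMES = ("compositor", "input", "javascript", "loading")
INPUT_BOOST_MS = 100  # input is prioritised for 100 ms after a touch event


class SketchScheduler:
    """Toy model: FIFO inside each queue, dynamic priority between queues."""

    def __init__(self):
        self.queues = {name: deque() for name in QUEUE_NAMES}
        self.page_loading = False
        self.input_boost_until_ms = -1.0

    def post(self, queue_name, task):
        self.queues[queue_name].append(task)

    def on_touch(self, now_ms):
        # Signal from the input system: expect a gesture to follow.
        self.input_boost_until_ms = now_ms + INPUT_BOOST_MS

    def _priority(self, queue_name, now_ms):
        # Lower number means "run sooner". The levels are arbitrary.
        if queue_name == "input" and now_ms < self.input_boost_until_ms:
            return 0
        if queue_name == "loading" and self.page_loading:
            return 1
        return 2

    def next_task(self, now_ms):
        ready = [name for name in QUEUE_NAMES if self.queues[name]]
        if not ready:
            return None
        # Ties are resolved by the order of QUEUE_NAMES.
        chosen = min(ready, key=lambda name: self._priority(name, now_ms))
        return self.queues[chosen].popleft()


scheduler = SketchScheduler()
scheduler.post("javascript", "js-1")
scheduler.post("input", "tap-1")
scheduler.post("input", "tap-2")
scheduler.post("loading", "parse-1")

scheduler.page_loading = True
first = scheduler.next_task(now_ms=0)    # loading work wins while the page loads
scheduler.on_touch(now_ms=10)
second = scheduler.next_task(now_ms=20)  # input is boosted until t = 110 ms
third = scheduler.next_task(now_ms=50)
fourth = scheduler.next_task(now_ms=200)
print(first, second, third, fourth)
```

Note that the two input tasks still come out in the order they were posted: the scheduler changes which queue is served next, not the order inside a queue.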

Having full information about the queues and the tasks in them, as well as signals from other components, the scheduler can calculate the approximate idle time of the system. Just above, we considered an example of frame rendering. Along with the signal from the CC about the start of frame rendering, the estimated time of the next frame (+16.6 ms) is also sent, if it is known. Knowing which frame rendering operations, input processing, and JavaScript tasks are pending, the scheduler can estimate how long they will take. And knowing the time of the next frame, it can also calculate how much idle time will be left. In fact, this period is not exactly idle. It can be used to perform a number of low-priority tasks (idle tasks). These tasks are placed in their own queue and executed in portions only after the other queues have emptied, and within a limited period of time. In particular, the garbage collector actively uses this queue. This is where most of the work on marking dead objects and memory defragmentation takes place. Conservative garbage collection is triggered with high priority only in extreme cases, such as when a memory shortage is detected. We will discuss this in more detail in the article Garbage Collection in V8.
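
A rough sketch of the idle-time arithmetic, again in Python and again with hypothetical names (idle_budget_ms, run_idle_tasks) and made-up cost estimates. It assumes that each idle task comes with an estimated duration and that idle work is cut into small steps, which is a simplification of what the engine actually does.

```python
from collections import deque


def idle_budget_ms(frame_start_ms, next_frame_ms, estimated_work_ms):
    """Time left before the next frame after all estimated frame work."""
    window_ms = next_frame_ms - frame_start_ms
    return max(0.0, window_ms - sum(estimated_work_ms))


def run_idle_tasks(idle_queue, budget_ms):
    """Runs idle tasks in FIFO order while the next one still fits.

    idle_queue is a deque of (name, estimated_ms) pairs and is modified
    in place. Returns the names of the tasks that were executed.
    """
    done = []
    while idle_queue and idle_queue[0][1] <= budget_ms:
        name, cost_ms = idle_queue.popleft()
        budget_ms -= cost_ms
        done.append(name)
    return done


# Frame starts at t = 0, the next frame is expected at t = 16.6 ms.
# Rendering, input and JavaScript are estimated at 4, 2 and 6 ms.
budget = idle_budget_ms(0.0, 16.6, [4.0, 2.0, 6.0])

idle_queue = deque([
    ("gc marking step", 2.0),
    ("gc marking step", 2.0),
    ("gc compaction step", 3.0),
])
done = run_idle_tasks(idle_queue, budget)
print(round(budget, 1), done, len(idle_queue))
```

Here about 4.6 ms remain after the frame work, which is enough for two marking steps; the compaction step stays in the queue for a later idle period.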

Frame Rate Regularity

I have already mentioned that Chromium aims to achieve a frame rate of 60 FPS. In order to achieve this goal, the engine is equipped with a scheduler, CC, and many other systems. However, in practice, achieving perfect regularity is almost impossible, as there can be a large number of unforeseen situations that can affect the process, both internal (one or more tasks may take longer to complete than estimated by the scheduler) and external (for example, CPU or GPU load from other processes).

This is approximately what the scrolling of the page https://www.google.com/chrome looks like in the tracer. Not all animation frames were able to fit into the allocated window of 16.6 ms.

Furthermore, in addition to delays in rendering, some frames may be rejected by the Blink engine itself. This often happens, for example, during animation. There are many reasons for dropping animation frames. Blink defines about twenty of them (at the time of writing this article, the Chromium version was 124.0.6326.0). In the source code they are listed as reasons why an animation cannot be composited.

/third_party/blink/renderer/core/animation/compositor_animations.h#65

enum FailureReason : uint32_t {
  kNoFailure = 0,
  
  // Cases where the compositing is disabled by an exterior cause.
  kAcceleratedAnimationsDisabled = 1 << 0,
  kEffectSuppressedByDevtools = 1 << 1,
  
  // There are many cases where an animation may not be valid (e.g. it is not
  // playing, or has no effect, etc). In these cases we would never composite
  // it in any world, so we lump them together.
  kInvalidAnimationOrEffect = 1 << 2,
  
  // The compositor is not able to support all setups of timing values; see
  // CompositorAnimations::ConvertTimingForCompositor.
  kEffectHasUnsupportedTimingParameters = 1 << 3,
  
  // Currently the compositor does not support any composite mode other than
  // 'replace'.
  kEffectHasNonReplaceCompositeMode = 1 << 4,
  
  // Cases where the target element isn't in a valid compositing state.
  kTargetHasInvalidCompositingState = 1 << 5,
  
  // Cases where the target is invalid (but that we could feasibly address).
  kTargetHasIncompatibleAnimations = 1 << 6,
  kTargetHasCSSOffset = 1 << 7,
  
  // This failure reason is no longer used, as multiple transform-related
  // animations are allowed on the same target provided they target different
  // transform properties (e.g. rotate vs scale).
  kObsoleteTargetHasMultipleTransformProperties = 1 << 8,
  
  // Cases relating to the properties being animated.
  kAnimationAffectsNonCSSProperties = 1 << 9,
  kTransformRelatedPropertyCannotBeAcceleratedOnTarget = 1 << 10,
  kFilterRelatedPropertyMayMovePixels = 1 << 12,
  kUnsupportedCSSProperty = 1 << 13,
  
  // This failure reason is no longer used, as multiple transform-related
  // animations are allowed on the same target provided they target different
  // transform properties (e.g. rotate vs scale).
  kObsoleteMultipleTransformAnimationsOnSameTarget = 1 << 14,
  
  kMixedKeyframeValueTypes = 1 << 15,
  
  // Cases where the scroll timeline source is not composited.
  kTimelineSourceHasInvalidCompositingState = 1 << 16,
  
  // Cases where there is an animation of compositor properties but they have
  // been optimized out so the animation of those properties has no effect.
  kCompositorPropertyAnimationsHaveNoEffect = 1 << 17,
  
  // Cases where we are animating a property that is marked important.
  kAffectsImportantProperty = 1 << 18,
  
  kSVGTargetHasIndependentTransformProperty = 1 << 19,
  
  // When adding new values, update the count below *and* add a description
  // of the value to CompositorAnimationsFailureReason in
  // tools/metrics/histograms/enums.xml .
  // The maximum number of flags in this enum (excluding itself). New flags
  // should increment this number but it should never be decremented because
  // the values are used in UMA histograms. It should also be noted that it
  // excludes the kNoFailure value.
  kFailureReasonCount = 20,
};

Of the entire set, one of the most common causes is the absence of a visual animation effect (kCompositorPropertyAnimationsHaveNoEffect): the animation does not change the graphics and therefore does not require redrawing.

A frame can also be dropped because of an unsupported CSS property (kUnsupportedCSSProperty). This can happen if the engine does not know how to recalculate a certain property, even if the property itself looks perfectly valid.

<style>
  #block1 {
    animation: expand 1s linear infinite;
  }

  @keyframes expand {
    to {
      height: auto;
    }
  }
</style>

<div id="block1"></div>

In the example above, the engine does not know how to calculate the height of the block, as the final value is undefined and cannot be calculated during the animation stage. As a result, the engine will drop the first frame of the animation and will not even attempt to calculate any further frames, since it makes no sense to do so.

In the tracer, in this case, we will find a record like this:

{"args":{"data":{"compositeFailed":8224,"unsupportedProperties":["height"]}},"cat":"blink.animations,...
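
The compositeFailed value in this record can be read as a bitmask. The sketch below assumes that the field carries the FailureReason flags from the listing above; that is an assumption based on the matching numbers, not something stated in the trace itself. The bit positions are copied from that listing, and the decode_composite_failed helper is hypothetical.

```python
# Bit position -> flag name, copied from the FailureReason listing above.
FAILURE_REASONS = {
    0: "kAcceleratedAnimationsDisabled",
    1: "kEffectSuppressedByDevtools",
    2: "kInvalidAnimationOrEffect",
    3: "kEffectHasUnsupportedTimingParameters",
    4: "kEffectHasNonReplaceCompositeMode",
    5: "kTargetHasInvalidCompositingState",
    6: "kTargetHasIncompatibleAnimations",
    7: "kTargetHasCSSOffset",
    8: "kObsoleteTargetHasMultipleTransformProperties",
    9: "kAnimationAffectsNonCSSProperties",
    10: "kTransformRelatedPropertyCannotBeAcceleratedOnTarget",
    12: "kFilterRelatedPropertyMayMovePixels",
    13: "kUnsupportedCSSProperty",
    14: "kObsoleteMultipleTransformAnimationsOnSameTarget",
    15: "kMixedKeyframeValueTypes",
    16: "kTimelineSourceHasInvalidCompositingState",
    17: "kCompositorPropertyAnimationsHaveNoEffect",
    18: "kAffectsImportantProperty",
    19: "kSVGTargetHasIndependentTransformProperty",
}


def decode_composite_failed(mask):
    """Returns the names of all flags set in the bitmask, lowest bit first."""
    return [
        name
        for bit, name in sorted(FAILURE_REASONS.items())
        if mask & (1 << bit)
    ]


reasons = decode_composite_failed(8224)
print(reasons)
# ['kTargetHasInvalidCompositingState', 'kUnsupportedCSSProperty']
```

Indeed, 8224 = 8192 + 32 = (1 << 13) + (1 << 5), so under this reading the record reports kUnsupportedCSSProperty together with kTargetHasInvalidCompositingState, which agrees with the "unsupportedProperties" entry naming height.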

All of this leads to the fact that the actual number of frames rendered per second may be less than 60.

Discrepancy as a Metric for Frame Rate Regularity

In animation-based applications, the regularity of frame rendering matters in addition to the average frame rate. If an application renders 60 (or close to 60) frames per second, but the intervals between frames vary significantly, the user may experience janking, juddering, or jittering. There are numerous methods for evaluating this phenomenon, from simply measuring the longest frame to calculating the variance of frame lengths. Each method has its advantages, but most cover only a range of cases and do not take the temporal order of frames into account. In particular, they cannot distinguish between two dropped frames that are close together and two that are far apart.

In response, Google developers proposed their own method for evaluating frame regularity. It is based on the discrepancy of the sequence of frame timestamps, a measure borrowed from the analysis of sampling sequences in Monte Carlo integration.

The theoretical basis of the method was presented at the 37th annual ACM SIGPLAN conference in 2016.

The figure below shows an example of the discrepancy of frame regularity.

Each line represents a set of timestamps. The rendered frames are indicated by black dots, while the dropped frames are indicated by white dots. The distance between the dots is 1 VSYNC, which equals 16.6 ms for a 60 Hz refresh rate. The final discrepancies are calculated in terms of VSYNC intervals:

D(S1) = 1
D(S2) = 2
D(S3) = 2
D(S4) = 25/9
D(S5) = 3

In the perfect case (S1), the discrepancy is equal to the interval between frames (1 VSYNC). If one of the frames was dropped (S2), the discrepancy will be equal to the greatest distance between two rendered frames. In the case of S2, the greatest distance will be between points 2 and 3, equal to 2 VSYNC. The same applies if there are two dropped frames, but far apart from each other (S3). This is because the method aims to identify the worst-case performance, rather than the average, as reflected in the calculation formulas. Therefore, the method is combined with the average frame duration to differentiate a single dropped frame from a series of repeated dropouts (the latter is obviously worse). In case S4, we see two dropped frames that are close to each other. Such frames are considered as a single missed area, and the discrepancy here will be 25/9 (~2.78) VSYNC. The situation is even worse in case S5, as there were no rendered frames between the two dropped frames. The greatest distance between the rendered frames here will be between points 2 and 4, which equals 3 intervals (3 VSYNC).
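
These values can be reproduced with a small brute-force calculation. The sketch below is my own simplified model, not the code used in Chromium: the absolute_discrepancy function is hypothetical, and the normalisation (mapping N timestamps onto the unit interval so that the first lands on 0.5/N and the last on (N - 0.5)/N) is an assumption chosen so that perfectly regular frames give exactly 1 VSYNC. The sequences are also assumed, since the figure is not reproduced here: each contains 10 rendered frames, with timestamps measured in VSYNC intervals.

```python
from fractions import Fraction


def absolute_discrepancy(timestamps):
    """Brute-force interval discrepancy, in the units of the timestamps.

    Expects integer (or Fraction) timestamps without duplicates.
    """
    ts = sorted(timestamps)
    n = len(ts)
    if n < 2:
        return Fraction(0)
    # Map timestamps linearly onto [0.5/n, (n - 0.5)/n].
    scale = Fraction(n - 1, n) / (ts[-1] - ts[0])
    pts = [Fraction(1, 2 * n) + (t - ts[0]) * scale for t in ts]
    best = Fraction(0)
    # Closed intervals [pts[i], pts[j]]: more points than the length suggests.
    for i in range(n):
        for j in range(i, n):
            share = Fraction(j - i + 1, n)
            best = max(best, share - (pts[j] - pts[i]))
    # Open intervals between any two bounds: fewer points than expected.
    bounds = [Fraction(0)] + pts + [Fraction(1)]
    for i in range(len(bounds)):
        for j in range(i + 1, len(bounds)):
            share = Fraction(j - i - 1, n)
            best = max(best, (bounds[j] - bounds[i]) - share)
    # Convert back from the unit interval to timestamp units.
    return best / scale


# Assumed sequences with 10 rendered frames each (timestamps in VSYNC).
s1 = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]     # no dropped frames
s2 = [0, 1, 2, 4, 5, 6, 7, 8, 9, 10]    # one dropped frame (slot 3)
s4 = [0, 1, 2, 3, 5, 7, 8, 9, 10, 11]   # dropped slots 4 and 6
s5 = [0, 1, 2, 3, 6, 7, 8, 9, 10, 11]   # dropped slots 4 and 5

for name, seq in (("S1", s1), ("S2", s2), ("S4", s4), ("S5", s5)):
    print(name, absolute_discrepancy(seq))
# S1 1
# S2 2
# S4 25/9
# S5 3
```

In the S4-like sequence the worst interval is the open one between timestamps 3 and 7: it is 4 VSYNC long but contains only one rendered frame, which under this normalisation gives 4 - 11/9 = 25/9.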

My telegram channels:

EN - https://t.me/frontend_almanac
RU - https://t.me/frontend_almanac_ru

Russian version: https://blog.frontend-almanac.ru/chromium-rendering