March 7

Garbage Collection in V8

In this article, we will take a detailed look at how the V8 engine performs garbage collection. We will get acquainted with the concept of generations and with Minor and Major Garbage Collection, and see how objects are allocated, traced, and marked in memory. We will also explore what happens to the empty spaces left after cleanup and how garbage collection is performed in the background.

What the garbage collector collects

Before diving into the garbage collection process itself, let's first understand what exactly V8 collects and how it traces objects.

While V8 is running, various data structures live in memory. Some of them are internal representations of V8 itself, such as hidden classes, lists, queues, and so on. Another part consists of JavaScript objects. Practically all data types inside V8 are objects, except for the so-called SMIs - small integers not exceeding 2^30 - 1 - which are stored in memory as direct binary values, while every variable holding such a value stores a reference to the memory area with that specific SMI value. I have written more about this in the article Deep JS. In memory of data and types. All other types are represented in memory as objects and simulate their mutability or immutability in accordance with the ECMA-262 specification, behaving as what we commonly call "primitive" or "reference" types.

In fact, the garbage collection process is not limited to JavaScript objects. The Chromium engine provides garbage collection for anything that can be collected, including removed HTML elements, unused CSS stylesheets, completed animations, and more. Since Chromium is written in C++, its internal engine objects are, naturally, C++ classes. JavaScript objects, however, although ultimately implemented in the same language, are more abstract entities and must adhere to the requirements of the ECMA-262 specification. In particular, JavaScript objects do not have a fixed size and can change dynamically, and the structure of the JavaScript heap differs from that of a C++ heap. Let's take a closer look at how such objects are placed in memory. It is important to understand that the process of collecting dead JavaScript objects differs somewhat from the collection of C++ objects.

Around 2018, the Chromium team began developing a system for collecting dead C++ objects. The system was named Oilpan and became part of the internal rendering engine, Blink. Later, in 2020, the developers decided to move the system into the V8 engine itself. On the one hand, V8 uses the same garbage collection mechanisms as Blink; on the other hand, the migration makes the system available to third-party developers who embed the V8 engine but do not use Blink, such as Node.js. As a result, the cppgc directory appeared in the V8 API.

Object Tracing

As I mentioned before, the essence of garbage collection is to free memory from "dead" objects. An object is considered dead when there is no longer any reference to it, meaning it cannot be accessed. Therefore, such an object can no longer be used and should be cleared. Conversely, a "live" object is an object that can be reached through references from the root.

A simplified tracing scheme is depicted in the figure above. References to all created objects are stored in a separate global table called GlobalGCInfoTable. During garbage collection, the objects are traversed iteratively starting from the roots (there can be several roots, and the traversal goes through all active ones). Objects reachable by the garbage collector are considered "live" and are marked black. All other objects are considered "dead" and are marked white.
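
To make the idea more concrete, here is a minimal, engine-agnostic sketch of such a traversal (an illustration rather than actual V8 or Oilpan code): everything reachable from the roots ends up in the "live" set, and whatever is left over is garbage.

#include <unordered_set>
#include <vector>

// Illustrative object: it only knows which other objects it references.
struct Object {
  std::vector<Object*> references;
};

// Iteratively walks the object graph from the roots and returns the set of
// reachable ("live", black) objects; everything else is "dead" (white).
std::unordered_set<const Object*> TraceLiveObjects(
    const std::vector<Object*>& roots) {
  std::unordered_set<const Object*> live;
  std::vector<Object*> worklist(roots.begin(), roots.end());
  while (!worklist.empty()) {
    Object* current = worklist.back();
    worklist.pop_back();
    if (!live.insert(current).second) continue;  // already visited
    for (Object* child : current->references) worklist.push_back(child);
  }
  return live;
}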

Tracing C++ Objects

As I said earlier, C++ objects differ from JavaScript objects. The key difference is that JavaScript objects are standardized and described in section 6.1.7 The Object Type of the specification, making them predictable for the garbage collector. On the other hand, C++ objects are not standardized, and their structures can vary significantly. Moreover, not every C++ object needs to be potentially collectible. Therefore, a special API was provided for C++ objects in Oilpan.

There is a post in the V8 blog titled High-performance garbage collection for C++. I won't retell the entire post; I'll just outline the main concept. C++ objects that are intended to be collected by the garbage collector must implement the GarbageCollected or GarbageCollectedMixin interface. In practice, this means that the object must have a Trace() method in which it triggers tracing of the other objects it references.

For example, here is what the tracing of the RunningAnimation class looks like (at the time of writing the article, in Chromium version 124.0.6339.1):

src/third_party/blink/renderer/core/animation/css/css_animations.h

class RunningAnimation final : public GarbageCollected<RunningAnimation> {
 public:
  RunningAnimation(Animation* animation, NewCSSAnimation new_animation)
      : animation(animation),
        name(new_animation.name),
        name_index(new_animation.name_index),
        specified_timing(new_animation.timing),
        style_rule(new_animation.style_rule),
        style_rule_version(new_animation.style_rule_version),
        play_state_list(new_animation.play_state_list) {}

  AnimationTimeline* Timeline() const {
    return animation->TimelineInternal();
  }
  const std::optional<TimelineOffset>& RangeStart() const {
    return animation->GetRangeStartInternal();
  }
  const std::optional<TimelineOffset>& RangeEnd() const {
    return animation->GetRangeEndInternal();
  }

  void Update(UpdatedCSSAnimation update) {
    DCHECK_EQ(update.animation, animation);
    style_rule = update.style_rule;
    style_rule_version = update.style_rule_version;
    play_state_list = update.play_state_list;
    specified_timing = update.specified_timing;
  }

  void Trace(Visitor* visitor) const {
    visitor->Trace(animation);
    visitor->Trace(style_rule);
  }

  Member<Animation> animation;
  AtomicString name;
  size_t name_index;
  Timing specified_timing;
  Member<StyleRuleKeyframes> style_rule;
  unsigned style_rule_version;
  Vector<EAnimPlayState> play_state_list;
};

Here we see that the class implements the GarbageCollected interface, and its Trace method triggers tracing of internal objects animation and style_rule.

Tracing JavaScript Objects

Unlike C++ objects and the Oilpan system, there are no official publications about tracing JavaScript objects. As a result, the tracing of JavaScript objects is often mistakenly equated with that of C++ objects. In reality, JavaScript objects are fully controlled by the V8 engine and reside in a dedicated V8 heap, so no additional externally managed API is needed. Each JavaScript object extends the HeapObject class, which has Iterate, IterateFast, IterateBody, and IterateBodyFast methods. The heap itself has a Visit method that initiates tracing through the objects in this heap (at the time of writing the article, in V8 version 12.4.147).

src/objects/heap-object.h#204

// HeapObject is the superclass for all classes describing heap allocated
// objects.
class HeapObject : public TaggedImpl<HeapObjectReferenceType::STRONG, Address> {
 public:
  
  ...
  
  // Iterates over pointers contained in the object (including the Map).
  // If it's not performance critical iteration use the non-templatized
  // version.
  void Iterate(PtrComprCageBase cage_base, ObjectVisitor* v);

  template <typename ObjectVisitor>
  inline void IterateFast(PtrComprCageBase cage_base, ObjectVisitor* v);

  template <typename ObjectVisitor>
  inline void IterateFast(Tagged<Map> map, ObjectVisitor* v);

  template <typename ObjectVisitor>
  inline void IterateFast(Tagged<Map> map, int object_size, ObjectVisitor* v);

  // Iterates over all pointers contained in the object except the
  // first map pointer.  The object type is given in the first
  // parameter. This function does not access the map pointer in the
  // object, and so is safe to call while the map pointer is modified.
  // If it's not performance critical iteration use the non-templatized
  // version.
  void IterateBody(PtrComprCageBase cage_base, ObjectVisitor* v);
  void IterateBody(Tagged<Map> map, int object_size, ObjectVisitor* v);

  template <typename ObjectVisitor>
  inline void IterateBodyFast(PtrComprCageBase cage_base, ObjectVisitor* v);

  template <typename ObjectVisitor>
  inline void IterateBodyFast(Tagged<Map> map, int object_size,
                            ObjectVisitor* v);
  
  ...
};

Generations of Garbage Collection

The procedure of finding dead objects and cleaning them up can be quite slow. In modern applications, memory can contain thousands of objects, with heap sizes exceeding 1 GB. When it comes to collecting C++ objects, the work can be offloaded to separate threads and executed on demand. JavaScript, however, is single-threaded and has no mechanism for synchronization between threads, so regularly performing a full object trace would likely lead to delays and reduced performance. From this point on, we will discuss the implementation of garbage collection specifically for JavaScript objects.

To avoid system overload and conduct the garbage collection process unnoticed by the user, the V8 team decided to break down the process into parts and work on each of them separately.

There is a hypothesis called the infant mortality or generational hypothesis, according to which, in most cases, young objects are more likely to die sooner than old ones.

Thus, in V8, the concept of generation was introduced. There are the young generation and the old generation.

The heap is conventionally divided into a small young generation (up to 16 MB), where all newly allocated objects are placed, and an old generation, intended for old objects, which can grow up to 1.4 GB.

Additionally, both generations are organized into so-called pages of 1 MB each. Objects larger than 600 KB are placed on separate pages and are considered part of the old generation.

The old generation also includes the code space, which holds all executable code objects, and the map space, which holds the hidden classes (maps) involved.

Semi-space Object Allocation in the Young Generation

To move forward and start understanding the process of removing objects from memory, let's first grasp how these objects are stored in memory.

I have already mentioned that JavaScript operates in a single thread. It does not require synchronization, and each JavaScript context gets its own personal heap. The entire subsequent algorithm will be built based on these facts and the need to fit all processes into a single thread.

So, the young generation is divided into two semi-spaces, active and inactive. All new objects are initially placed in the current active semi-space using the bump-pointer method.

The method keeps a pointer to the current free area of the heap. When a new object is created, this pointer marks the beginning of the object, and the end of the object is determined simply by adding its size. Once the size, and therefore the end of the object, is known, the pointer is bumped to the next free address.
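
As a rough sketch (illustrative only, not V8's actual allocator), bump-pointer allocation boils down to a pointer and a limit:

#include <cstddef>
#include <cstdint>

// A simplified bump-pointer space: `top_` points to the current free
// position, `limit_` to the end of the space. An allocation just advances
// the pointer by the object size.
class BumpPointerSpace {
 public:
  BumpPointerSpace(uint8_t* start, size_t capacity)
      : top_(start), limit_(start + capacity) {}

  // Returns the address of the new object, or nullptr if the space is full
  // (the point at which the Scavenger, described just below, kicks in).
  void* Allocate(size_t object_size) {
    if (top_ + object_size > limit_) return nullptr;
    void* result = top_;
    top_ += object_size;  // bump the pointer to the next free address
    return result;
  }

 private:
  uint8_t* top_;    // current free position
  uint8_t* limit_;  // end of the semi-space
};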

Once the semi-space is fully occupied, the Scavenger mechanism kicks in. Its task is to go through the live objects and move them to the new, inactive semi-space. This operation is called Minor Garbage Collection.

After that, the two semi-spaces swap roles: the evacuated semi-space is cleared and becomes inactive, while the other one, now holding the surviving objects, becomes active. Thus, from the second iteration on, the active semi-space already contains objects that have survived one collection. During the next Minor Garbage Collection, objects that have already been moved between semi-spaces once are considered old and are promoted to the old generation, while dead ones are deleted.
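
The survival and promotion bookkeeping can be sketched roughly like this (a conceptual illustration only; the real Scavenger copies object bodies and rewrites the references to them, which is omitted here):

#include <utility>
#include <vector>

// Illustrative young-generation object: whether it is reachable and whether
// it has already survived one Minor GC.
struct YoungObject {
  bool is_live = false;
  bool survived_once = false;
};

struct Generations {
  std::vector<YoungObject*> active_semispace;
  std::vector<YoungObject*> inactive_semispace;
  std::vector<YoungObject*> old_generation;
};

void Scavenge(Generations& heap) {
  for (YoungObject* obj : heap.active_semispace) {
    if (!obj->is_live) continue;           // dead objects are simply dropped
    if (obj->survived_once) {
      heap.old_generation.push_back(obj);  // second survival => promotion
    } else {
      obj->survived_once = true;
      heap.inactive_semispace.push_back(obj);  // first survival => evacuate
    }
  }
  heap.active_semispace.clear();  // the evacuated semi-space is reclaimed
  std::swap(heap.active_semispace, heap.inactive_semispace);  // swap roles
}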

The duration of Minor Garbage Collection depends on the number of live objects in the young generation. If all objects are dead, the process will take less than 1 ms. However, if all or most objects are alive, it will take significantly longer.

Old generation

The old generation also uses the bump-pointer method for object allocation, with the pointers stored in separate summary tables, similar to what I mentioned at the beginning of the article.

This generation is cleaned up by another process called Major Garbage Collection when the size of live objects in the old generation exceeds a heuristically calculated limit.

To reduce latency and memory consumption, the old generation is collected with the Mark-Sweep-Compact approach. The marking pause depends on the number of live objects that need to be marked; marking the entire heap can take up to 100 ms for large web pages. To avoid such pauses, V8 marks objects in small portions, so that each step does not exceed 5 ms.

The scheme itself is called the tricolor marking scheme. More detailed information on the marking process can be found in the post Concurrent marking in V8 in the V8 blog. Again, there is no point in retelling the entire post; I will only outline the essence of the process. Each object is in one of three statuses, symbolically denoted by colors. In reality, the object's status is a 2-bit field, where:

  • 00 - white - the initial status of all new objects. It means that the object has not been detected by the garbage collector yet.
  • 01 - gray - the object transitions to this status as soon as the collector reaches it. Such an object is placed on the list for further tracing.
  • 11 - black - the final status, indicating that the collector has visited all child nodes of the object, and it can be removed from the tracing list.

This scheme allows marking tasks to be queued in parts without blocking the main thread for a long time. The process will be completed when the list of objects for tracing becomes empty.
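
A minimal sketch of this worklist-based scheme might look as follows (illustrative only; in V8 the color bits live in marking bitmaps and each step is bounded by time, about 5 ms, rather than by a byte budget):

#include <cstddef>
#include <deque>
#include <vector>

enum class Color { kWhite, kGrey, kBlack };

// Illustrative heap object: a color, a size, and outgoing references.
struct Node {
  Color color = Color::kWhite;
  size_t size = 0;
  std::vector<Node*> children;
};

std::deque<Node*> marking_worklist;  // grey objects waiting to be traced

void MarkGrey(Node* node) {
  if (node->color == Color::kWhite) {
    node->color = Color::kGrey;
    marking_worklist.push_back(node);
  }
}

// Performs one incremental marking step bounded by `budget_bytes`; returns
// true once the whole reachable graph has been marked black.
bool MarkingStep(size_t budget_bytes) {
  size_t processed = 0;
  while (!marking_worklist.empty() && processed < budget_bytes) {
    Node* node = marking_worklist.front();
    marking_worklist.pop_front();
    for (Node* child : node->children) MarkGrey(child);
    node->color = Color::kBlack;  // all children have been discovered
    processed += node->size;
  }
  return marking_worklist.empty();
}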

However, there is one problem here. Between marking steps, JavaScript code is executed, which can add new objects or remove old ones, making the status of already marked objects stale. To solve this problem, the engine must notify the collector of every change in the object graph. This is done through a so-called write barrier. An example implementation of such a barrier is provided in the V8 blog.

// Called after `object.field = value`.
write_barrier(object, field_offset, value) {
  if (color(object) == black && color(value) == white) {
    set_color(value, grey);
    marking_worklist.push(value);
  }
}

Every time a reference to another object is written into an existing object, the barrier function is triggered. It checks whether the parent has already been marked black while the child is still white; if so, the child immediately transitions to the gray status and is pushed onto the tracing list.

This exact code is no longer relevant these days, of course. Barriers are much more complex and look somewhat different now, but the essence remains the same, with various implementations for different scenarios. For a more complete picture, here is the code of a real V8 barrier.

src/heap/marking-barrier-inl.h#64

void MarkingBarrier::MarkValueShared(Tagged<HeapObject> value) {
  // Value is either in read-only space or shared heap.
  DCHECK(InAnySharedSpace(value));
  // We should only reach this on client isolates (= worker isolates).
  DCHECK(!is_shared_space_isolate_);
  DCHECK(shared_heap_worklist_.has_value());
  // Mark shared object and push it onto shared heap worklist.
  if (marking_state_.TryMark(value)) {
    shared_heap_worklist_->Push(value);
  }
}

void MarkingBarrier::MarkValueLocal(Tagged<HeapObject> value) {
  DCHECK(!InReadOnlySpace(value));
  if (is_minor()) {
    // We do not need to insert into RememberedSet<OLD_TO_NEW> here because the
    // C++ marking barrier already does this for us.
    // TODO(v8:13012): Consider updating C++ barriers to respect
    // POINTERS_TO_HERE_ARE_INTERESTING and POINTERS_FROM_HERE_ARE_INTERESTING
    // page flags and make the following branch a DCHECK.
    if (Heap::InYoungGeneration(value)) {
      WhiteToGreyAndPush(value);  // NEW->NEW
    }
  } else {
    if (WhiteToGreyAndPush(value)) {
      if (V8_UNLIKELY(v8_flags.track_retaining_path)) {
        heap_->AddRetainingRoot(Root::kWriteBarrier, value);
      }
    }
  }
}

...

bool MarkingBarrier::WhiteToGreyAndPush(Tagged<HeapObject> obj) {
  if (marking_state_.TryMark(obj)) {
    current_worklist_->Push(obj);
    return true;
  }
  return false;
}

After the entire object graph has been marked, those objects reached by the garbage collector are marked with a black status. All others remain white. As with Minor Garbage Collection, live objects are moved to a new semi-space or a new old generation page.

The duration of Major Garbage Collection is linear and depends on the number of live objects in the old generation. Using an incremental garbage collection algorithm, V8 tries to keep the Major Garbage Collection execution time within 6 ms.

Page fragmentation

After marking, the collector can assess the level of fragmentation of each page. After cleanup, empty areas appear between live objects, and depending on which and how many objects were cleared, the resulting page can end up with a different level of fragmentation.

Examples of a highly fragmented page (left) and a lightly fragmented one (right) are shown in the image above. The V8 engine evaluates the degree of fragmentation and decides what to do with the page next.

Compaction

In the case of high fragmentation, the collector copies live objects to another page that is not being compacted, packing them together and getting rid of the empty areas. This process is called compaction. As a result, a defragmented memory area containing only live objects is obtained, and dead objects are cleared together with the evacuated page. However, there are often many long-living objects in memory, and compacting every page may be too costly. Therefore, in the case of low fragmentation, the collector takes a different path.

Sweeping

With low fragmentation, relatively large free areas remain between live objects. Instead of copying and rearranging objects, the collector simply records the addresses of the merged dead regions in a special free-list. Later, when the engine needs to allocate space for a new object, it first checks the free-list, and if there is a suitable gap, the new object is placed there. This process is called sweeping.
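
A simplified sketch of the idea (illustrative only; V8's real free-list is more elaborate and, for instance, groups free regions by size):

#include <cstddef>
#include <cstdint>
#include <list>

// A free region on a swept page.
struct FreeBlock {
  uint8_t* start;
  size_t size;
};

class FreeList {
 public:
  // Called by the sweeper for every merged dead region it finds on a page.
  void AddDeadRegion(uint8_t* start, size_t size) {
    blocks_.push_back({start, size});
  }

  // Called on allocation: returns a region big enough for the new object,
  // or nullptr if the free-list has nothing suitable.
  void* TryAllocate(size_t object_size) {
    for (auto it = blocks_.begin(); it != blocks_.end(); ++it) {
      if (it->size >= object_size) {
        void* result = it->start;
        it->start += object_size;
        it->size -= object_size;
        if (it->size == 0) blocks_.erase(it);
        return result;
      }
    }
    return nullptr;
  }

 private:
  std::list<FreeBlock> blocks_;
};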

Idle Tasks Scheduling

In the article Chromium. Web page rendering using Blink, CC and scheduler, I have described in detail the process of rendering web pages and task scheduling. Let's briefly recall what was discussed.

The Blink engine initiates the CC process in a separate compositor thread. CC receives signals from various Chromium systems and decides when and what to display on the screen. For instance, when an animation or scrolling is initiated, CC sends a signal to the main thread indicating the start of a new frame. Along with this signal, a deadline is passed – the time by which the frame must be generated. Since Chromium aims to maintain a frame rate of around 60 FPS, each frame lasts approximately 16.6 ms (1000/60). Blink has exactly this amount of time to sequentially perform tasks like processing user input, executing JavaScript code, handling other priority tasks, and updating the RenderTree. If Blink exceeds the specified time, the frame will be delayed, impacting the visual user experience (resulting in animation stuttering, flickering, etc.). However, in many cases, the engine requires much less time. With a small number of tasks, Chromium can handle a frame in less than 1 ms.

The remaining unused time is called idle time. However, idle time is not really idle. It is a task, just like the other tasks in the queue, but with a low priority. The duration of this task is equal to the time remaining until the next frame is formed. If there is no next frame expected, idle time is set to 50 ms. This number was chosen based on research showing that a user perceives a response to user input within 100 ms as instantaneous. If no new frames appear after 50 ms, a new idle time is queued and so on. However, if a signal is received to start a new frame, idle time will be interrupted without waiting for the time to end.

When idle time starts, tasks from a special low-priority queue can begin to be executed. These tasks are called idle tasks. To avoid exceeding the deadline, the end time of the idle period is passed to the task, and the task itself is responsible for finishing by that time. The task can perform part of the work within the allotted time and queue the remaining part as a new task. Similarly, if a task cannot complete any meaningful work within the given time, it should re-post itself to the queue as a new task.
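
The contract between the scheduler and an idle task can be sketched like this (the queue and helper names are assumptions made for the illustration, not the real Blink API):

#include <chrono>
#include <deque>
#include <functional>
#include <iostream>

using IdleTask = std::function<void(double deadline_seconds)>;

std::deque<IdleTask> idle_task_queue;  // stand-in for the low-priority queue
int remaining_work_units = 100;        // stand-in for pending work

double NowSeconds() {
  using namespace std::chrono;
  return duration<double>(steady_clock::now().time_since_epoch()).count();
}

// An idle task: does small units of work while time remains and re-posts
// itself if anything is left over.
void IdleWork(double deadline_seconds) {
  while (remaining_work_units > 0 && NowSeconds() < deadline_seconds) {
    --remaining_work_units;  // one small unit of work
  }
  if (remaining_work_units > 0) {
    idle_task_queue.push_back(IdleWork);  // finish during the next idle time
  }
}

int main() {
  idle_task_queue.push_back(IdleWork);
  // The "scheduler": runs queued idle tasks, each with a ~2 ms idle period.
  while (!idle_task_queue.empty()) {
    IdleTask task = idle_task_queue.front();
    idle_task_queue.pop_front();
    task(NowSeconds() + 0.002);
  }
  std::cout << "remaining work units: " << remaining_work_units << "\n";
}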

Idle tasks are not only the prerogative of the V8 engine. This mechanism is also available to web developers. Web browsers provide a method through the Web API - requestIdleCallback, which queues a task in the idle tasks. However, at the time of writing this article, the method is still in the process of development and is not supported by the Safari browser.

Idle Time Garbage Collection Scheduling

Earlier, we talked about how the garbage collection process can be quite intensive. In order not to block the system with this utility operation, V8 breaks down the entire process into small steps, with each step potentially taking up to 5 ms of CPU time. To make the process as unobtrusive as possible for the rest of the system, V8 queues garbage collection tasks specifically as idle tasks.

Minor Garbage Collection cannot be broken into parts and must be completed in one go.

Major Garbage Collection consists of three parts:

  • Start of the incremental marking
  • Multiple marking steps
  • Finalization

Each of the phases has its own impact on latency and memory consumption. While starting incremental marking is itself a fast operation, it commits the engine to multiple marking steps and a finalization, which can cause long pauses. Therefore, starting marking can worsen latency, but it has a significant positive impact on memory consumption.

Running Minor Garbage Collection during idle time reduces latency but has a relatively minor impact on memory consumption. However, regular execution of Minor Garbage Collection helps avoid falling back to conservative garbage collection, which is performed during non-idle time. Conservative garbage collection can be initiated in critical situations, such as when the engine is about to run out of memory for new allocations.

Therefore, the scheduler should balance between starting too early (resulting in prematurely promoting objects to the old generation) and starting too late, which can lead to the young generation size growing excessively.

Minor Garbage Collection Idle Time Scheduling

In order to implement idle time Minor Garbage Collection, the following is needed:

  1. a predicate that tells whether or not to perform Minor Garbage Collection, given the idle task's deadline and the state of the young generation
  2. a mechanism to post an idle task to the Blink scheduler that performs a Minor Garbage Collection at an appropriate time

V8 performs idle time Minor Garbage Collection only if its estimated time fits within the given idle task’s deadline and there are enough objects allocated in the young generation.

Let:

  • H be the total size in bytes of objects in the young generation,
  • S' be the average speed of previously observed Minor Garbage Collections in bytes per second,
  • T be the current idle task deadline in seconds,
  • T' be the average idle task deadline in seconds.

V8 performs Minor Garbage Collection if:

max(T' * S' - N, Hmin) < H <= S' * T

where N is the estimated number of bytes that will be allocated before the next idle task, and Hmin is the minimum young generation size that warrants garbage collection.

The (T' * S' - N) < H condition can be rewritten as (T' * S') < (H + N), which estimates whether the next idle time Minor Garbage Collection would overrun the average idle task deadline. If it would, V8 needs to perform an idle time Minor Garbage Collection in the current idle task. If not, the young generation does not yet contain enough objects for the collection to be worthwhile.
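
Put into code, the predicate might look like this (an illustrative sketch; all inputs are the running estimates described above):

#include <algorithm>

// Returns true if an idle time Minor GC should be performed now, i.e. if
// max(T' * S' - N, Hmin) < H <= S' * T.
bool ShouldPerformIdleMinorGC(double young_gen_size_bytes,        // H
                              double avg_minor_gc_speed,          // S'
                              double idle_deadline_seconds,       // T
                              double avg_idle_deadline_seconds,   // T'
                              double bytes_until_next_idle_task,  // N
                              double min_young_gen_size) {        // Hmin
  const double lower_bound =
      std::max(avg_idle_deadline_seconds * avg_minor_gc_speed -
                   bytes_until_next_idle_task,
               min_young_gen_size);
  const double upper_bound = avg_minor_gc_speed * idle_deadline_seconds;
  return lower_bound < young_gen_size_bytes &&
         young_gen_size_bytes <= upper_bound;
}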

In order to request idle time for Minor Garbage Collection from the Blink scheduler, V8 puts allocation bailout markers every N bytes of the young generation. When an allocation from JavaScript code crosses an allocation bailout marker, the allocation takes the slow path and invokes a V8 runtime function that posts an idle task to the Blink scheduler.

Through experimentation, it has been established that setting N = 512 KB is sufficient to maintain throughput without significantly compromising performance, while being low enough to give V8 several opportunities to schedule idle tasks.
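
The mechanism can be pictured roughly as follows (an illustrative sketch with made-up names; in reality the fast path lives in generated code and the slow path goes through a V8 runtime function):

#include <cstddef>
#include <cstdint>

constexpr size_t kBailoutIntervalBytes = 512 * 1024;  // N = 512 KB

// Stand-in for posting an idle task to the Blink scheduler.
bool minor_gc_idle_task_posted = false;
void PostMinorGCIdleTask() { minor_gc_idle_task_posted = true; }

struct YoungSpace {
  uint8_t* top;             // current allocation pointer (bump pointer)
  uint8_t* limit;           // end of the semi-space
  uint8_t* bailout_marker;  // next allocation bailout marker
};

// Returns nullptr when the semi-space is full (time for a Minor GC).
void* AllocateInYoungSpace(YoungSpace& space, size_t size) {
  uint8_t* new_top = space.top + size;
  if (new_top > space.limit) return nullptr;
  if (new_top >= space.bailout_marker) {
    // Slow path: a marker has been crossed, so give the scheduler a chance
    // to plan an idle time Minor GC and move the marker N bytes ahead.
    PostMinorGCIdleTask();
    space.bailout_marker += kBailoutIntervalBytes;
  }
  void* result = space.top;
  space.top = new_top;
  return result;
}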

Major Garbage Collection Idle Time Scheduling

The process of launching the incremental marking will be discussed in more detail later. For now, let's assume that it is already running. Immediately upon initialization, V8 inserts an idle task into the Blink scheduler. This task directly handles the marking process.

Based on the average marking speed, the idle task aims to achieve the maximum amount of work possible during the current idle period.

Let:

  • Tidle be the deadline in seconds of the idle task,
  • M be the marking speed in bytes per second.

Then Tidle * M is the number of bytes that will be marked during the idle task.
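
Expressed as a tiny helper (illustrative only):

// Estimates how many bytes the current idle marking step should process,
// given the remaining idle time and the observed marking speed.
double IdleMarkingStepBytes(double idle_deadline_seconds,
                            double marking_speed_bytes_per_second) {
  return idle_deadline_seconds * marking_speed_bytes_per_second;
}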

The idle task iteratively requeues itself until the entire heap is marked. Afterward, V8 schedules the task to finalize Major Garbage Collection. The finalization process will be executed if the estimated time for this task meets the deadline. V8 determines the finalization time based on the speed of previous finalizations.

Memory Reducer

The memory reducer is a controller for scheduling Major Garbage Collections that tries to reclaim memory of inactive web pages during idle time.

The Major Garbage Collector starts working when the heap reaches a certain predefined size. This size is calculated and set at the end of the previous Major Garbage Collection based on the heap growing factor f and the total size of live objects in the old generation.

limit' = f * size

The next Major Garbage Collection is queued when the memory allocated since the previous collection pushes the heap past this limit.
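
In code form the idea is simply the following (an illustrative sketch; V8's real heuristics for choosing the growing factor and clamping the limit are more involved):

// Computes the old generation limit set at the end of a Major GC:
// limit' = f * size, where `size` is the total size of live objects.
double NextOldGenerationLimit(double live_size_bytes, double growing_factor) {
  return live_size_bytes * growing_factor;
}

// A non-idle Major GC is requested once allocations push the old generation
// past that limit.
bool ReachedAllocationLimit(double old_generation_size_bytes, double limit) {
  return old_generation_size_bytes > limit;
}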

Everything is fine as long as the web page steadily allocates memory. However, if the web page becomes idle, memory allocation stops, and thus, the set limit will not be reached, and Major Garbage Collection will not be triggered.

Many web pages allocate memory intensively at the moment of loading because they are initializing their internal data structures. A few seconds or minutes after loading, the web page often becomes inactive, holding a large number of unnecessary objects in memory. This can lead to a decrease in the memory allocation rate and, consequently, in the speed of JavaScript execution.

The diagram above shows an example of Major Garbage Collection scheduling. The first collection occurs at time t1, when the heap size limit is reached. V8 then sets a new limit based on the heap growing factor and the heap size. The next Major Garbage Collection happens at time t2, when the new limit is reached, then at time t3, and so on. The dotted line represents the heap size without the memory reducer.

In this way, the memory reducer can trigger Major Garbage Collection without waiting for the allocation limit to be reached. However, lowering the limit in an uncontrolled way can degrade latency. Therefore, the V8 team developed a heuristic that relies not only on the idle time provided by the Blink scheduler between frames, but also on whether the page is currently inactive. The indicators of page activity are the JavaScript allocation rate and the rate of JavaScript calls from the browser; when both drop below a specified threshold, the page is considered inactive.

The diagram illustrates the states and transitions of the memory reducer. In the done state, the controller waits for a signal that enough allocations have been made to warrant garbage collection. At that point, a so-called non-idle Major Garbage Collection is triggered once the allocation limit is reached, and the controller transitions to the wait(i) state. In this state, the controller waits for the web page to become inactive. Once this happens, incremental marking is started, and the controller moves to the run(i) state until the Major Garbage Collection completes. If subsequent calculations suggest that another Major Garbage Collection is likely to free up more memory, the controller moves to the wait(i+1) state. Otherwise, it returns to the initial done state.

The key part of the controller is a well-tuned allocation rate threshold. Initially, this threshold was a fixed value. This approach worked well on desktop devices, but on slower systems, such as mobile devices, it did not. Therefore, the developers had to implement dynamic adaptation for different hardware configurations.

Let:

  • g be the Major Garbage Collection speed,
  • a be the allocation rate.

Then:

μ = g / (g + a)

This ratio μ can be thought of as the mutator utilization for the time window from now until the next Major Garbage Collection, under the assumption that the current allocation rate stays constant and the heap is currently empty:

Tmu = limit / a                 (mutator time)
Tgc = limit / g                 (GC time)
μ = Tmu / (Tmu + Tgc)           (mutator utilization)
  = (limit / a) / (limit / a + limit / g)
  = g / (g + a)

This gives us the condition for an inactive allocation rate: μ ≥ μinactive, where μinactive is a fixed constant.

In V8 code, µinactive has a value of 0.993.

bool Heap::HasLowOldGenerationAllocationRate() {
  double mu = ComputeMutatorUtilization(
      "Old generation",
      tracer()->OldGenerationAllocationThroughputInBytesPerMillisecond(),
      tracer()->CombinedMarkCompactSpeedInBytesPerMillisecond());
  const double kHighMutatorUtilization = 0.993;
  return mu > kHighMutatorUtilization;
}


My telegram channels:

EN - https://t.me/frontend_almanac
RU - https://t.me/frontend_almanac_ru

Russian version: https://blog.frontend-almanac.ru/v8-garbage-collection