April 23, 2024

JavaScript Optimisation. Inline Caches

I think it is not a secret for anyone that all popular JavaScript engines have a similar code execution pipeline. It looks something like this. The interpreter quickly compiles JavaScript code into bytecode "on the fly". The resulting bytecode starts executing and is processed in parallel by the optimizer. The optimizer needs time for this processing, but in the end, highly optimized code can be obtained, which will work much faster. In the V8 engine, Ignition acts as the interpreter, and Turbofan acts as the optimizer. In the Chakra engine, which is used in Edge, instead of one optimizer, there are two - SimpleJIT and FullJIT. In JSC (Safari), there are three Baseline, DFG, and FTL optimizers. The specific implementation may vary in different engines, but the basic scheme is generally the same.

Today, we will talk about one of the many optimization mechanisms called Inline Caches.

So, let's take a very ordinary function and try to understand how it will work at runtime.

function getX(obj) {
  return obj.x;
}

Our function is rather primitive. It is a simple getter that takes an object as an argument and returns the value of the x property of that object. From the perspective of a JavaScript developer, the function looks absolutely atomic. However, let's dive under the hood of the engine and see what it represents in its compiled form.

For experiments, as usual, we will use the most popular JS engine V8 (as of the time of writing this article, version 12.6.72). For full debugging, in addition to the body of the function itself, we need to call it with a real argument. Also, let's add output of information about the object, which will be needed a bit later.

function getX(obj) {
  %DebugPrint(obj);
  return obj.x;
}

getX({ x: 1 });

I would like to remind that %DebugPrint is a built-in function in V8. In order to use it in the code, you need to run the d8 runtime with the --allow-natives-syntax flag. Additionally, let's print the bytecode of the executable script to the console (using the --print-bytecode flag).

%> d8 --print-bytecode --allow-natives-syntax index.js 
...
// байткод функции getX
[generated bytecode for function: getX (0x0b75000d9c29 <SharedFunctionInfo getX>)]
Bytecode length: 10
Parameter count 2
Register count 0
Frame size 0
         0x28c500002194 @    0 : 65 af 01 03 01    CallRuntime [DebugPrint], a0-a0
         0x28c500002199 @    5 : 2d 03 00 00       GetNamedProperty a0, [0], [0]
         0x28c50000219d @    9 : ab                Return
Constant pool (size = 1)
0xb75000d9dcd: [FixedArray] in OldSpace
 - map: 0x0b7500000565 <Map(FIXED_ARRAY_TYPE)>
 - length: 1
           0: 0x0b7500002b91 <String[1]: #x>
Handler Table (size = 0)
Source Position Table (size = 0)
// Далее информация о переданном объекте
DebugPrint: 0xb75001c9475: [JS_OBJECT_TYPE]
 - map: 0x0b75000d9d81 <Map[16](HOLEY_ELEMENTS)> [FastProperties]
 - prototype: 0x0b75000c4b11 <Object map = 0xb75000c414d>
 - elements: 0x0b75000006cd <FixedArray[0]> [HOLEY_ELEMENTS]
 - properties: 0x0b75000006cd <FixedArray[0]>
 - All own properties (excluding elements): {
    0xb7500002b91: [String] in ReadOnlySpace: #x: 1 (const data field 0), location: in-object
 }
0xb75000d9d81: [Map] in OldSpace
 - map: 0x0b75000c3c29 <MetaMap (0x0b75000c3c79 <NativeContext[285]>)>
 - type: JS_OBJECT_TYPE
 - instance size: 16
 - inobject properties: 1
 - unused property fields: 0
 - elements kind: HOLEY_ELEMENTS
 - enum length: invalid
 - stable_map
 - back pointer: 0x0b75000d9d59 <Map[16](HOLEY_ELEMENTS)>
 - prototype_validity cell: 0x0b7500000a31 <Cell value= 1>
 - instance descriptors (own) #1: 0x0b75001c9485 <DescriptorArray[1]>
 - prototype: 0x0b75000c4b11 <Object map = 0xb75000c414d>
 - constructor: 0x0b75000c4655 <JSFunction Object (sfi = 0xb7500335385)>
 - dependent code: 0x0b75000006dd <Other heap object (WEAK_ARRAY_LIST_TYPE)>
 - construction counter: 0

So, essentially, the compiled bytecode of the getX function looks as follows.

0x28c500002194 @    0 : 65 af 01 03 01    CallRuntime [DebugPrint], a0-a0
0x28c500002199 @    5 : 2d 03 00 00       GetNamedProperty a0, [0], [0]
0x28c50000219d @    9 : ab                Return

The first line is a call to the %DebugPrint function. It is purely auxiliary and does not affect the executable code; we can safely omit it. The next instruction is GetNamedProperty. Its purpose is to retrieve the value of the specified property from the passed object. The instruction takes three parameters as input. The first one is a reference to the object. In our case, the reference to the object is stored in a0, the first argument of the getX function. The second and third parameters are the address of the hidden class and the offset of the property in the descriptor array.

Object's Shape

In the article Object Structure in JavaScript Engines, I explained in detail what hidden classes, descriptor arrays, and how objects in JavaScript are structured. I won’t retell the whole article but will highlight only the essential facts. Every object has one or several hidden classes. Hidden classes are an internal engine mechanism inaccessible to JavaScript developers, at least without using special built-in engine methods. A hidden class represents the so-called shape of an object and all the necessary information about the object's properties. The object itself only stores property values and a reference to the hidden class. If two objects have the same shape, they will have a reference to the same hidden class.

const obj1 = { x: 1 }
const obj2 = { x: 2 }

//          {}     <- empty map
//          |
//        { x }    <- common map
//       /    \
// <obj1>      <obj2>

As properties are added to the object, its hidden class tree grows.

const obj1 = {}
obj1.x = 1;
obj1.y = 2;
obj1.z = 3;

// {} -> { x } -> { x, y } -> { x, y, z } -> <obj1>

Let's go back to the function we started with. Let's assume we passed it an object of the following type.

function getX(obj) {
  return obj.x;
}

getX({ y: 1, x: 2, z: 3 });

In order to access the value of property x, the interpreter needs to know: a) the address of the object's last hidden class; b) the offset of this property in the descriptors array.

d8> const obj = { y: 1, x: 2, z: 3};
d8>
d8> %DebugPrint(obj);
DebugPrint: 0x2034001c9435: [JS_OBJECT_TYPE]
 - map: 0x2034000d9cf9 <Map[24](HOLEY_ELEMENTS)> [FastProperties]
 - prototype: 0x2034000c4b11 <Object map = 0x2034000c414d>
 - elements: 0x2034000006cd <FixedArray[0]> [HOLEY_ELEMENTS]
 - properties: 0x2034000006cd <FixedArray[0]>
 - All own properties (excluding elements): {
    0x203400002ba1: [String] in ReadOnlySpace: #y: 1 (const data field 0), location: in-object
    0x203400002b91: [String] in ReadOnlySpace: #x: 2 (const data field 1), location: in-object
    0x203400002bb1: [String] in ReadOnlySpace: #z: 3 (const data field 2), location: in-object
 }
0x2034000d9cf9: [Map] in OldSpace
 - map: 0x2034000c3c29 <MetaMap (0x2034000c3c79 <NativeContext[285]>)>
 - type: JS_OBJECT_TYPE
 - instance size: 24
 - inobject properties: 3
 - unused property fields: 0
 - elements kind: HOLEY_ELEMENTS
 - enum length: invalid
 - stable_map
 - back pointer: 0x2034000d9cd1 <Map[24](HOLEY_ELEMENTS)>
 - prototype_validity cell: 0x203400000a31 <Cell value= 1>
 - instance descriptors (own) #3: 0x2034001c9491 <DescriptorArray[3]>
 - prototype: 0x2034000c4b11 <Object map = 0x2034000c414d>
 - constructor: 0x2034000c4655 <JSFunction Object (sfi = 0x203400335385)>
 - dependent code: 0x2034000006dd <Other heap object (WEAK_ARRAY_LIST_TYPE)>
 - construction counter: 0

In the example above, the hidden class is located at the address 0x2034000d9cd1 (back pointer).

d8> %DebugPrintPtr(0x2034000d9cd1)
DebugPrint: 0x2034000d9cd1: [Map] in OldSpace
 - map: 0x2034000c3c29 <MetaMap (0x2034000c3c79 <NativeContext[285]>)>
 - type: JS_OBJECT_TYPE
 - instance size: 24
 - inobject properties: 3
 - unused property fields: 1
 - elements kind: HOLEY_ELEMENTS
 - enum length: invalid
 - back pointer: 0x2034000d9c89 <Map[24](HOLEY_ELEMENTS)>
 - prototype_validity cell: 0x203400000a31 <Cell value= 1>
 - instance descriptors #2: 0x2034001c9491 <DescriptorArray[3]>
 - transitions #1: 0x2034000d9cf9 <Map[24](HOLEY_ELEMENTS)>
     0x203400002bb1: [String] in ReadOnlySpace: #z: (transition to (const data field, attrs: [WEC]) @ Any) -> 0x2034000d9cf9 <Map[24](HOLEY_ELEMENTS)>
 - prototype: 0x2034000c4b11 <Object map = 0x2034000c414d>
 - constructor: 0x2034000c4655 <JSFunction Object (sfi = 0x203400335385)>
 - dependent code: 0x2034000006dd <Other heap object (WEAK_ARRAY_LIST_TYPE)>
 - construction counter: 0
0x2034000c3c29: [MetaMap] in OldSpace
 - map: 0x2034000c3c29 <MetaMap (0x2034000c3c79 <NativeContext[285]>)>
 - type: MAP_TYPE
 - instance size: 40
 - native_context: 0x2034000c3c79 <NativeContext[285]>

From there, we can access the descriptors array at the address 0x2034001c9491 (instance descriptors).

d8> %DebugPrintPtr(0x2034001c9491)
DebugPrint: 0x2034001c9491: [DescriptorArray]
 - map: 0x20340000062d <Map(DESCRIPTOR_ARRAY_TYPE)>
 - enum_cache: 3
   - keys: 0x2034000daaa9 <FixedArray[3]>
   - indices: 0x2034000daabd <FixedArray[3]>
 - nof slack descriptors: 0
 - nof descriptors: 3
 - raw gc state: mc epoch 0, marked 0, delta 0
  [0]: 0x203400002ba1: [String] in ReadOnlySpace: #y (const data field 0:s, p: 2, attrs: [WEC]) @ Any
  [1]: 0x203400002b91: [String] in ReadOnlySpace: #x (const data field 1:s, p: 1, attrs: [WEC]) @ Any
  [2]: 0x203400002bb1: [String] in ReadOnlySpace: #z (const data field 2:s, p: 0, attrs: [WEC]) @ Any
0x20340000062d: [Map] in ReadOnlySpace
 - map: 0x2034000004c5 <MetaMap (0x20340000007d <null>)>
 - type: DESCRIPTOR_ARRAY_TYPE
 - instance size: variable
 - elements kind: HOLEY_ELEMENTS
 - enum length: invalid
 - stable_map
 - non-extensible
 - back pointer: 0x203400000061 <undefined>
 - prototype_validity cell: 0
 - instance descriptors (own) #0: 0x203400000701 <DescriptorArray[0]>
 - prototype: 0x20340000007d <null>
 - constructor: 0x20340000007d <null>
 - dependent code: 0x2034000006dd <Other heap object (WEAK_ARRAY_LIST_TYPE)>
 - construction counter: 0

The interpreter can then locate the desired property by its name and retrieve the offset, which in this case is 1.

Thus, the interpreter's path will look something like this.

Our case is quite straightforward. Here, the values of the properties are stored inside the object itself, which makes access to them pretty fast. However, the determination of the property's position in the array of descriptors will still be proportional to the number of properties themselves. Furthermore, in the article Object Structure in JavaScript Engines, I mentioned that values are not always stored in such a "fast" manner. In some cases, the storage type may be changed to a slower "dictionary."

Inline Cache

Of course, in isolated cases for a JS engine, this does not pose significant difficulties, and the time it takes to access property values is hardly noticeable to the naked eye. But what if our case is not isolated? Let's say we need to execute a function hundreds or thousands of times in a loop? The total time of operation becomes critically important here. Given that the function actually performs the same operations, JavaScript authors have proposed optimizing the property search process to avoid executing it every time. All that is required for this is to store the object's hidden class address and the "offset" of the desired property.

Let's go back to the beginning of the article and the GetNamedProperty instruction, which has three parameters. We have already determined that the first parameter is a reference to the object. The second parameter is the hidden class address. The third parameter is the found "offset" of the property. Once these parameters are defined, the function can memorize them and not perform the search procedure again during the next execution. This caching is called Inline Cache (IC).

TurboFan

However, it is worth considering that parameter memoization also requires memory and some CPU time. This makes the mechanism efficient only during intensive function execution (so-called hot function). The intensity of function execution depends on the quantity and frequency of its calls. The optimizer is responsible for assessing this intensity. In the case of V8, TurboFan acts as the optimizer. First of all, the optimizer constructs a graph from the AST and bytecode and sends it to the "inlining" phase, where metrics for calls, loads, and cache storage are gathered. Next, if a function is suitable for inline caching, it is queued for optimization. Once the queue is full, the optimizer must identify the hottest function and perform caching on it. Then move on to the next one and so on. By the way, unlike its predecessor Crankshaft, which took functions one by one starting from the first. In general, this topic is quite extensive and deserves a separate article; there is no point in discussing all the details and peculiarities of TurboFan's heuristics now. It's better to move on to examples.

IC Types

To analyze the IC operation, you need to enable logging in the runtime environment. For d8, you can do this by specifying the --log-ic flag.

%> d8 --log-ic

function getX(obj) {
  return obj.x;
}

for (let i = 0; i < 10; i++) {
  getX({ x: i });
}

As I have mentioned before, TurboFan starts to cache object properties only in "hot" functions. To make our function "hot," we need to run it multiple times in a loop. In my simple script in the d8 environment, it required a minimum of 10 iterations. In practice, in different conditions and with other functions present, this number may vary and is likely to be higher. The resulting log can now be analyzed using the System Analyzer.

Grouped by type: 1#
>100.00%	1	LoadIC
Grouped by category: 1#
>100.00%	1	Load
Grouped by functionName: 1#
>100.00%	1	~getX
Grouped by script: 1#
>100.00%	1	Script(3): index.js
Grouped by sourcePosition: 1#
>100.00%	1	index.js:3:14
Grouped by code: 1#
>100.00%	1	Code(JS~)
Grouped by state: 1#
>100.00%	1	0 → 1
Grouped by key: 1#
>100.00%	1	x
Grouped by map: 1#
>100.00%	1	Map(0x3cd2000d9e61)
Grouped by reason: 1#
>100.00%	1	

In the IC List, we see the notation of the type LoadIC, which indicates accessing an object property from the cache. Here we have the function itself functionName: ~getX, the address of the hidden class Map(0x3cd2000d9e61), and the property name key: x. However, the most interesting part is state: 0 -> 1.

There are several types of IC. The complete list looks as follows:

/src/common/globals.h#1481

// State for inline cache call sites. Aliased as IC::State.
enum class InlineCacheState {
  // No feedback will be collected.
  NO_FEEDBACK,
  // Has never been executed.
  UNINITIALIZED,
  // Has been executed and only one receiver type has been seen.
  MONOMORPHIC,
  // Check failed due to prototype (or map deprecation).
  RECOMPUTE_HANDLER,
  // Multiple receiver types have been seen.
  POLYMORPHIC,
  // Many DOM receiver types have been seen for the same accessor.
  MEGADOM,
  // Many receiver types have been seen.
  MEGAMORPHIC,
  // A generic handler is installed and no extra typefeedback is recorded.
  GENERIC,
};

In the system analyzer, these types are denoted by symbols

0: UNINITIALIZED
X: NO_FEEDBACK
1: MONOMORPHIC
^: RECOMPUTE_HANDLER
P: POLYMORPHIC
N: MEGAMORPHIC
D: MEGADOM
G: GENERIC

For example, the type X (NO_FEEDBACK) indicates that the optimizer has not yet gathered sufficient statistics to optimize the function. In our case, seeing state: 0 -> 1 means that the function has transitioned to the state of "monomorphic IC".

Monomorphic

As mentioned before, in practice, the same function can take an object as an argument. The shape of this object in a specific function call may differ from the previous one, which complicates the optimizer's task. The least trouble arises when we pass only objects of the same shape to the function, as in our last example.

function getX(obj) {
  return obj.x; // Monomorphic IC
}

for (let i = 0; i < 10; i++) {
  getX({ x: i }); // all objects have the same shape
}

In such cases, the optimizer simply needs to remember the address of the hidden class and the offset property. This type of IC is called monomorphic.

Polymorphic

Now, let's try to add a function call with object of a different shape.

function getX(obj) {
  return obj.x; // Polymorphic IC
}

for (let i = 0; i < 10; i++) {
  getX({ x: i, y: 0 });
  getX({ y: 0, x: i });
}

In the "IC List" now, we can see:

50.00%        1    0 → 1
50.00%        1    1 → P

A function in runtime can receive multiple different forms of an object. Access to property x will vary for each form. Each form has its own hidden class address and its own offset of this property. In this case, the optimizer allocates two slots for each form of the object, where it saves the necessary parameters.

Such structure can be conceptually represented as an array of parameter sets.

[
  [Map1, offset1],
  [Map2, offset2],
  ...
]

This type of IC is called polymorphic. Polymorphic IC has a limit on the number of permissible forms.

/src/wasm/wasm-constants.h#192

// Maximum number of call targets tracked per call.
constexpr int kMaxPolymorphism = 4;

By default, a polymorphic type can have from 2 to 4 forms per function. However, this parameter can be adjusted with the flag --max-valid-polymorphic-map-count.

Megamorphic

If the number of object forms exceeds what Polymorphic can handle, the type changes to megamorphic.

function getX(obj) {
  return obj.x; // Megamorphic IC
}

for (let i = 0; i < 10; i++) {
  getX({ x: i });
  getX({ prop1: 0, x: i });
  getX({ prop1: 0, prop2: 1, x: i });
  getX({ prop1: 0, prop2: 1, prop3: 2, x: i });
  getX({ prop1: 0, prop2: 1, prop3: 2, prop4: 3, x: i });
}

The result of the IC List:

40.00%        2    P → P
20.00%        1    0 → 1
20.00%        1    1 → P
20.00%        1    P → N

In this case, the search for the required set of parameters negates the saving of CPU time and, therefore, does not make sense. That is why the optimizer simply caches the MegamorphicSymbol symbol. For subsequent function calls, this will mean that the cached parameters are not available here and will have to be taken in the usual way. There is also no point in further optimizing the function and collecting its metrics.

Conclusion

You have probably noticed that there is an additional type called MEGADOM in the IC types list. This type is used for caching the DOM tree nodes. The thing is, the inline caching mechanism is not limited to functions and objects only. It is actively used in many other places, including outside V8. Covering all the information about caching at once is physically impossible. Since today we are talking about JavaScript objects, we will not discuss the MegaDom type in this article.

Let's conduct a couple of tests and see how effective Turbofan optimization works in V8. The experiment will be conducted in the Node.js environment (the latest stable version at the time of writing this article is v20.12.2).

Experimental code:

const N = 1000;

//===

function getXMonomorphic(obj) {
  let sum = 0;

  for (let i = 0; i < N; i++) {
    sum += obj.x;
  }

  return sum;
}

console.time('Monomorphic');

for (let i = 0; i < N; i++) {
  getXMonomorphic({ x: i });
  getXMonomorphic({ x: i });
  getXMonomorphic({ x: i });
  getXMonomorphic({ x: i });
  getXMonomorphic({ x: i });
}

console.timeLog('Monomorphic');

//===

function getXPolymorphic(obj) {
  let sum = 0;

  for (let i = 0; i < N; i++) {
    sum += obj.x;
  }

  return sum;
}

console.time('Polymorphic');

for (let i = 0; i < N; i++) {
  getXPolymorphic({ x: i, y: 0 });
  getXPolymorphic({ y: 0, x: i });
  getXPolymorphic({ x: i, y: 0 });
  getXPolymorphic({ y: 0, x: i });
  getXPolymorphic({ x: i, y: 0 });
}

console.timeEnd('Polymorphic');

//===

function getXMegamorphic(obj) {
  let sum = 0;

  for (let i = 0; i < N; i++) {
    sum += obj.x;
  }

  return sum;
}

//===

console.time('Megamorphic');

for (let i = 0; i < N; i++) {
  getXMegamorphic({ x: i });
  getXMegamorphic({ prop1: 0, x: i });
  getXMegamorphic({ prop1: 0, prop2: 1, x: i });
  getXMegamorphic({ prop1: 0, prop2: 1, prop3: 2, x: i });
  getXMegamorphic({ prop1: 0, prop2: 1, prop3: 2, prop4: 3, x: i });
}

console.timeLog('Megamorphic');

To begin with, let's run the script with optimization turned off and observe the "clean" performance of the functions.

%> node --no-opt test.js
Monomorphic: 68.55ms
Polymorphic: 69.939ms
Megamorphic: 85.045ms

For the sake of the experiment's integrity, I repeated the tests several times. The performance between monomorphic and polymorphic objects is approximately the same. Polymorphic objects can sometimes even be faster. This is not so much related to the workings of V8 itself as it is to system resources. However, the speed of megamorphic objects is slightly lower due to the formation of a hidden class tree at this stage, making access to the properties of these objects inherently more complex than in the first two cases.

Now let's run the same script with optimization enabled:

%> node test.js
Monomorphic: 9.313ms
Polymorphic: 9.673ms
Megamorphic: 29.104ms

The speed of monomorphic and polymorphic functions has increased by about 7 times. Similar to the previous case, the difference between these two types is insignificant, and in repeated tests, polymorphic functions are sometimes even faster. However, the speed of the megamorphic function has increased by only 3 times. Generally, based on the theory described above, megamorphic functions were not supposed to show an increase in speed through optimization. However, it is not that simple. Firstly, they do have a side effect from optimization since such functions are excluded from the process of further metric collection. This is a small but still an advantage over the other types. Secondly, the optimization of JS work is not limited to inline caching of object property access. There are several other types of optimizations that we did not consider in this article.

Moreover, in 2023, Google released the Chromium M117 version, which included the new optimizer Maglev. It was integrated as a middleware between Ignition (interpreter) and Turbofan (optimizer). Now the optimization process has acquired a three-tiers architecture and looks like Ignition -> Maglev -> Turbofan. Maglev contributes to function optimization, particularly working with function arguments. We will discuss this in more detail another time. For now, we can conclude that megamorphic functions are approximately twice as slow as monomorphic and polymorphic ones.


My telegram channels:

EN - https://t.me/frontend_almanac
RU - https://t.me/frontend_almanac_ru

Русская версия: https://blog.frontend-almanac.ru/js-optimisation-ic