January 4

Why typeof null === "object"?

The unary typeof operator returns a string that names the type of its operand. For example, typeof 1 returns the string "number", and typeof "" returns "string". All the strings the operator can return are listed in the ECMA-262 specification, section 13.5.3. In principle, the returned value should correspond to the data types defined in the same specification. On closer examination, however, typeof null returns "object", even though Null is an independent type in its own right, described in section 6.1.2. The reason is ordinary human error: an innocent mistake in the code. This article tries to work out how that mistake came about.
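The mismatch is easy to observe in any modern engine. The snippet below is only an illustration; the samples table and the results array are names assumed for this example, not anything taken from the specification.

```javascript
// Each entry pairs a value with the string typeof is expected to return.
const samples = [
  [1, "number"],
  ["", "string"],
  [true, "boolean"],
  [undefined, "undefined"],
  [Symbol("s"), "symbol"],
  [10n, "bigint"],
  [() => {}, "function"],
  [{}, "object"],
  [null, "object"], // Null is its own type, yet typeof reports "object"
];

const results = samples.map(([value, expected]) => ({
  actual: typeof value,
  expected,
}));

for (const { actual, expected } of results) {
  console.log(actual, actual === expected);
}
```

Every row matches its expected string, including the last one, which is the case this article is about.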

Mocha

It is worth starting at the very beginning of JavaScript, namely with its prototype, Mocha, which Brendan Eich created in 1995 in just 10 days. It was later renamed LiveScript and then, in December 1995, JavaScript, the name we know today.

Unfortunately, the source code of Mocha was never published, so we do not know exactly what it looked like in 1995. However, in the comments to an article on Dr. Axel Rauschmayer's blog, Eich wrote that he used the "discriminated union" technique, also known as a "tagged union", built on a struct with two fields.

The structure could have looked something like this:

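enum JSType {
  OBJECT,
  FUNCTION,
  NUMBER,
  STRING,
  BOOLEAN
};

// Payload: only the member selected by the tag is meaningful.
union JSValue {
  double number;
  bool boolean;
  const char* string;
  void* object;
  // ... other details
};

// Tagged union: the type field says which JSValue member to read.
struct TypeOf {
  JSType type;
  JSValue values;
};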

In the same article, Axel Rauschmayer quotes the 1996 code of the SpiderMonkey engine (used in Mozilla Firefox):

JS_PUBLIC_API(JSType)
JS_TypeOfValue(JSContext *cx, jsval v)
{
    JSType type = JSTYPE_VOID;
    JSObject *obj;
    JSObjectOps *ops;
    JSClass *clasp;

    CHECK_REQUEST(cx);
    if (JSVAL_IS_VOID(v)) {
        type = JSTYPE_VOID;
    } else if (JSVAL_IS_OBJECT(v)) {
        obj = JSVAL_TO_OBJECT(v);
        if (obj &&
            (ops = obj->map->ops,
             ops == &js_ObjectOps
             ? (clasp = OBJ_GET_CLASS(cx, obj),
                clasp->call || clasp == &js_FunctionClass)
             : ops->call != 0)) {
            type = JSTYPE_FUNCTION;
        } else {
            type = JSTYPE_OBJECT;
        }
    } else if (JSVAL_IS_NUMBER(v)) {
        type = JSTYPE_NUMBER;
    } else if (JSVAL_IS_STRING(v)) {
        type = JSTYPE_STRING;
    } else if (JSVAL_IS_BOOLEAN(v)) {
        type = JSTYPE_BOOLEAN;
    }
    return type;
}

Although the algorithm differs from the original Mocha code, it illustrates the essence of the error well: there is simply no check for the Null type. Instead, when v is null, the algorithm falls into the else if (JSVAL_IS_OBJECT(v)) branch. Because obj is then a null pointer, the function test fails and the algorithm returns JSTYPE_OBJECT.

Why "object"?

The reason is that, in early versions of the language, a value was stored in a 32-bit word made up of a small type tag in the lowest bits (one to three of them) and the actual data. The following tags were used:

  • 000: object - the data is a reference to an object
  • 1: int - the data is a 31-bit signed integer
  • 010: double - the data is a reference to a floating-point number
  • 100: string - the data is a reference to a string
  • 110: boolean - the data is a boolean value

Null, in turn, was represented by the machine's NULL pointer, which on most platforms is 0x00000000.

Therefore, the check JSVAL_IS_OBJECT(0x00000000) returns true: the lowest three bits are 000, which corresponds to the object tag.
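The tagging scheme can be modeled in a few lines of JavaScript. This is a rough sketch, assuming the tag sits in the lowest bits of the word as Rauschmayer's article describes; the constants and the typeOfWord function are hypothetical names, not engine code.

```javascript
// Hypothetical model of the early value layout: a 32-bit word whose
// low bits hold the type tag. All names here are illustrative only.
const TAG_MASK = 0b111;
const TAG_OBJECT = 0b000;
const TAG_DOUBLE = 0b010;
const TAG_STRING = 0b100;
const TAG_BOOLEAN = 0b110;

const NULL_WORD = 0x00000000; // null: the all-zero machine pointer

function typeOfWord(word) {
  // Ints are marked by the lowest bit alone; the other 31 bits are payload.
  if ((word & 1) === 1) return "number";
  switch (word & TAG_MASK) {
    case TAG_OBJECT:
      return "object"; // no separate check for NULL_WORD, hence the bug
    case TAG_DOUBLE:
      return "number";
    case TAG_STRING:
      return "string";
    case TAG_BOOLEAN:
      return "boolean";
    default:
      return "unknown"; // tags not covered by this sketch
  }
}

// Tag an integer the way a 31-bit int would be stored: shift left, set bit 0.
const intWord = (21 << 1) | 1;

console.log(typeOfWord(NULL_WORD));           // "object"
console.log(typeOfWord(intWord));             // "number"
console.log(typeOfWord(0x1000 | TAG_STRING)); // "string"
```

A single extra comparison against NULL_WORD before the switch would have been enough to tell null apart from real object references.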

Attempts to fix the bug

Later, this behavior was recognized as a bug. In 2006, Eich proposed deprecating the typeof operator and replacing it with a type() function that would, among other things, handle Null correctly (an archived copy of the proposal exists). The function could be built in or be part of an optional reflection package. Either way, such a fix would not have been backward compatible with earlier versions of the language and would have caused many problems for existing JavaScript code written by developers around the world. It would have required a mechanism for code version checking and/or custom language options, which did not look realistic.

As a result, the proposal was not accepted, and the typeof operator in the ECMA-262 specification remained in its original form.
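In user code, the same effect is easy to get without changing the language. The helper below is a hypothetical sketch, not the type() function from Eich's proposal; the name typeOf and the "null" return value are assumptions made for this example.

```javascript
// Hypothetical user-land helper: like typeof, but null gets its own name.
function typeOf(value) {
  // Special-case null before falling back to the built-in operator.
  return value === null ? "null" : typeof value;
}

console.log(typeOf(null));      // "null"
console.log(typeOf({}));        // "object"
console.log(typeOf(undefined)); // "undefined"
```

Because the helper is a new function rather than a change to typeof, existing code that relies on typeof null === "object" keeps working.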

Later still, in 2017, another proposal was put forward: Builtin.is and Builtin.typeOf. Its main motivation was that the instanceof operator does not guarantee correct type checks for values that come from different realms. The proposal was not directly about Null, but its text suggested fixing this bug by introducing a new Builtin.typeOf() function. This proposal was not accepted either, because the edge case shown in its motivation, although not very elegant, can be handled with existing methods.
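Some existing checks do not depend on the identity of a constructor, which is what makes instanceof fragile across realms. The sketch below shows two of them, Array.isArray and Object.prototype.toString, on the assumption that these are the kind of existing methods meant here; classOf is a hypothetical helper name.

```javascript
// Reads the built-in tag, e.g. "[object Array]" becomes "Array".
function classOf(value) {
  return Object.prototype.toString.call(value).slice(8, -1);
}

console.log(classOf(null));       // "Null"
console.log(classOf([]));         // "Array"
console.log(classOf(new Date())); // "Date"
console.log(Array.isArray([]));   // true
```

Note that an object can override its tag through Symbol.toStringTag, so classOf is a convenience rather than a guarantee.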

Modern Null

As noted above, the bug appeared in 1995 in the Mocha prototype, before JavaScript itself existed, and as late as 2006 Brendan Eich had not given up hope of fixing it. Since 2017, however, neither developers nor Ecma have made further attempts. In the meantime, JavaScript has become much more complex, as have its implementations in popular engines.

SpiderMonkey

No trace remains of the SpiderMonkey code that Axel Rauschmayer published on his blog in 2013. Today the engine (FF 121 at the time of writing) derives the typeof result from the type tag stored in the value:

JSType js::TypeOfValue(const Value& v) {
  switch (v.type()) {
    case ValueType::Double:
    case ValueType::Int32:
      return JSTYPE_NUMBER;
    case ValueType::String:
      return JSTYPE_STRING;
    case ValueType::Null:
      return JSTYPE_OBJECT;
    case ValueType::Undefined:
      return JSTYPE_UNDEFINED;
    case ValueType::Object:
      return TypeOfObject(&v.toObject());
#ifdef ENABLE_RECORD_TUPLE
    case ValueType::ExtendedPrimitive:
      return TypeOfExtendedPrimitive(&v.toExtendedPrimitive());
#endif
    case ValueType::Boolean:
      return JSTYPE_BOOLEAN;
    case ValueType::BigInt:
      return JSTYPE_BIGINT;
    case ValueType::Symbol:
      return JSTYPE_SYMBOL;
    case ValueType::Magic:
    case ValueType::PrivateGCThing:
      break;
  }
  
  ReportBadValueTypeAndCrash(v);
}

The engine now knows exactly which type of value is passed to the operator, because every value carries a tag that identifies its type. For Null, the operator returns JSTYPE_OBJECT explicitly, as the specification requires:

enum JSValueType : uint8_t {
  JSVAL_TYPE_DOUBLE = 0x00,
  JSVAL_TYPE_INT32 = 0x01,
  JSVAL_TYPE_BOOLEAN = 0x02,
  JSVAL_TYPE_UNDEFINED = 0x03,
  JSVAL_TYPE_NULL = 0x04,
  JSVAL_TYPE_MAGIC = 0x05,
  JSVAL_TYPE_STRING = 0x06,
  JSVAL_TYPE_SYMBOL = 0x07,
  JSVAL_TYPE_PRIVATE_GCTHING = 0x08,
  JSVAL_TYPE_BIGINT = 0x09,
#ifdef ENABLE_RECORD_TUPLE
  JSVAL_TYPE_EXTENDED_PRIMITIVE = 0x0b,
#endif
  JSVAL_TYPE_OBJECT = 0x0c,

  // This type never appears in a Value; it's only an out-of-band value.
  JSVAL_TYPE_UNKNOWN = 0x20
};
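The same idea can be expressed as a lookup table, where the result for Null is a deliberate entry rather than a side effect of the bit layout. This is only a sketch: the tag strings mirror the ValueType cases quoted earlier, but TYPEOF_BY_TAG and typeOfTagged are hypothetical names, and objects are left out because telling a function from a plain object needs a further check.

```javascript
// Hypothetical tag names mirroring the ValueType cases quoted above.
const TYPEOF_BY_TAG = new Map([
  ["Double", "number"],
  ["Int32", "number"],
  ["String", "string"],
  ["Null", "object"], // deliberate: this is what the specification requires
  ["Undefined", "undefined"],
  ["Boolean", "boolean"],
  ["BigInt", "bigint"],
  ["Symbol", "symbol"],
]);

function typeOfTagged(value) {
  const result = TYPEOF_BY_TAG.get(value.tag);
  if (result === undefined) {
    // Stands in for the crash path taken by tags that never reach typeof.
    throw new TypeError(`unexpected tag: ${value.tag}`);
  }
  return result;
}

console.log(typeOfTagged({ tag: "Null" }));  // "object"
console.log(typeOfTagged({ tag: "Int32" })); // "number"
```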

V8

A similar approach is used in the V8 engine (version 12.2.165 at the time of writing). Here, Null is a so-called Oddball: a single object of the Null type is created before any JS code runs, and every subsequent reference to the null value points to that one object.

The initializer of the Oddball class looks like this:

void Oddball::Initialize(Isolate* isolate, Handle<Oddball> oddball,
                         const char* to_string, Handle<Object> to_number,
                         const char* type_of, uint8_t kind) {
  STATIC_ASSERT_FIELD_OFFSETS_EQUAL(HeapNumber::kValueOffset,
                                    offsetof(Oddball, to_number_raw_));

  Handle<String> internalized_to_string =
      isolate->factory()->InternalizeUtf8String(to_string);
  Handle<String> internalized_type_of =
      isolate->factory()->InternalizeUtf8String(type_of);
  if (IsHeapNumber(*to_number)) {
    oddball->set_to_number_raw_as_bits(
        Handle<HeapNumber>::cast(to_number)->value_as_bits(kRelaxedLoad));
  } else {
    oddball->set_to_number_raw(Object::Number(*to_number));
  }
  oddball->set_to_number(*to_number);
  oddball->set_to_string(*internalized_to_string);
  oddball->set_type_of(*internalized_type_of);
  oddball->set_kind(kind);
}

Besides the Isolate, a handle to the oddball itself, and an enum kind, the initializer explicitly takes the toString, toNumber and typeof values, which it then stores in the object. This makes it possible to set these Oddball parameters when the global heap is initialized:

// Initialize the null_value.
Oddball::Initialize(isolate(), factory->null_value(), "null",
                    handle(Smi::zero(), isolate()), "object", Oddball::kNull);

Here we see that when initializing Null, the following are passed to the class: toString="null", toNumber=0, typeof="object".

The typeof operator itself simply reads the value through the type_of() getter:

// static
Handle<String> Object::TypeOf(Isolate* isolate, Handle<Object> object) {
  if (IsNumber(*object)) return isolate->factory()->number_string();
  if (IsOddball(*object))
    return handle(Oddball::cast(*object)->type_of(), isolate); // <- typeof null === "object"
  if (IsUndetectable(*object)) {
    return isolate->factory()->undefined_string();
  }
  if (IsString(*object)) return isolate->factory()->string_string();
  if (IsSymbol(*object)) return isolate->factory()->symbol_string();
  if (IsBigInt(*object)) return isolate->factory()->bigint_string();
  if (IsCallable(*object)) return isolate->factory()->function_string();
  return isolate->factory()->object_string();
}
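The Oddball idea can be imitated in JavaScript to show why no type logic is needed for null at all. This is a minimal sketch under stated assumptions: the Oddball class, its field names, the three singletons, and typeOfSketch are all hypothetical stand-ins, not V8 code.

```javascript
// Hypothetical stand-in for an oddball: each singleton carries its own
// precomputed conversions, including the string typeof should report.
class Oddball {
  constructor(toStringValue, toNumberValue, typeOfValue) {
    this.toStringValue = toStringValue;
    this.toNumberValue = toNumberValue;
    this.typeOfValue = typeOfValue;
    Object.freeze(this);
  }
}

// Created once, before any user code would run.
const NULL_VALUE = new Oddball("null", 0, "object");
const UNDEFINED_VALUE = new Oddball("undefined", NaN, "undefined");
const TRUE_VALUE = new Oddball("true", 1, "boolean");

function typeOfSketch(value) {
  // Oddballs answer from their stored string; nothing is computed here.
  if (value instanceof Oddball) return value.typeOfValue;
  // Everything else defers to the host operator in this sketch.
  return typeof value;
}

console.log(typeOfSketch(NULL_VALUE));      // "object"
console.log(typeOfSketch(UNDEFINED_VALUE)); // "undefined"
console.log(typeOfSketch(TRUE_VALUE));      // "boolean"
```

With this design, "object" for null is just a string chosen at initialization time, which is why the historical behavior survives without any trace of the original bit tags.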

