samskivert: Stars (and brackets) on thars

09 February 2012

One of my colleagues recently committed a change with the comment:

Log Message:
-----------
Switching from "type *varname" to "type* varname"

It gave me hope for the future, that we might overcome the legacy of K&R's terrible mistake when they pioneered the styles foo *pointer and foo array[].

The world has mostly come around to the view that [] is part of the type, not the variable (i.e. foo[] is how you declare 'array of foo'). In spite of this enlightenment, people cling to the syntax foo *pointer even though obviously foo* is how you declare 'pointer to foo'.

The reasoning is that you can declare:

foo *pointerToFoo, stackAllocatedFoo;

which is plainly confusing and a bad idea.

Gosling was originally under the spell of K&R, but eventually came to his senses. Unfortunately, in Java the legacy of that madness lives on:

public class Test {
    public static void main (String[] args) {
        String[] one = {}, two = {}, three[] = {};
        // one and two are of type String[], three is of type String[][]
    }
}

Naturally, C clings firmly to the past. The following is perfectly legal, if not the most maintainable, C:

#include <stdio.h>

int main (int argc, char** argv) {
    char* foo = 0, bar = 0;
    printf("%ld %ld\n", sizeof(foo), sizeof(bar));
    return 0;
}

// prints 8 1 (or 4 1 if you're on ye olde 32-bit machine)

The perspective of the enlightened language designer is that special-purpose type modifiers like * and [] are a bad idea.

Arrays can be handled simply. They should be a parameterized type. Scala and Haskell (and probably other Haskell-influenced languages) get this right:

val one :Array[String] = { "foo", "bar" };

No need for special syntax. In Scala's case they've disallowed the declaration of multiple variables in the same clause, except for this weird construct which is a concession made to support Scala's highly unfortunate enum pattern:

val onePlusOne, twoTimesOne, fourDivTwo :Int = 2; // all vals bound to 2

Pointers are trickier business. For one, a civilized language doesn't have pointers, which cuts the conversion pretty short. However, civilized languages do have have reference types, which are basically the same thing without the promise of naughtiness and adventure.

One can mostly get by with glossing over the difference between reference types and value types. When you write:

String foo = "bar";
int bar = 0xf00;

it's not a great mystery that foo is a pointer to a string and bar is just an int on the stack. However, if you allow the creation of value types that are larger than the machine's register size (e.g. Vector3, Matrix4, etc.), you will invariably want to pass them by reference.

Java cleverly avoids this conundrum by restricting itself to only built-in value types which fit into registers. That's the sort of approach that's likely to get you zero credit on your math homework.

C# takes a crack at a non-zero solution with ref parameters. You can declare:

void invert (ref Matrix4 matrix) { ... }

and your calls to invert will not involve copying 16 doubles. We can deduce Hejlsberg's enlightened status by the fact that it's ref Matrix4 and not Matrix4 ref (the latter being the moral equivalent of Matrix4*).

However, I expect that if the JVM some day supports value types, Odersky's modeling of them in Scala will be something even more enlightened, like:

void invert (matrix :Ref[Matrix4]) { ... }

Such an approach nicely mirrors Array[Matrix4] and avoids the need to introduce a new keyword into the language. Scala already supports implicit conversions, so a conversion from A to Ref[A] is as simple as:

implict toRef[A <: AnyVal] (value :A) :Ref[A] = null // implemented by compiler magic!

Under the hood, the compiler would emit the appropriate byte codes to indicate by-reference-ness, but there's no need to sully the language with evidence of those machinations when there are already perfectly good abstractions available to model them.

C# also supports out parameters, allowing one to write:

bool getId (out int id) { ... }

int id;
if (getId(id)) {
  // I have an id!
}

id is considered uninitialized in the body of getId and must be initialized before getId returns (modulo exceptions).

I'm not sure such a thing could be modeled with a standard OOP type system, so perhaps this is a good case for special syntax. However, this whole mechanism exists solely to support multiple return values. The combination of a Tuple value type, deconstructing binding, and under-the-hood optimizations solve this problem without special cases. The following is already legal Scala code:

def getId() :(Int,Boolean) = ...

val (id, gotId) = getId()
if (gotId) {
  // I have an id!
}

Just add value types to the JVM and said under-the-hood optimizations, and you're good to go.

Anyhow, I seem to have rambled pretty far afield. I'll stop philosophizing and go back to being happy that the world contains a few fewer type *foo declarations. Some day we may live in a world where there are no *s upon ours.

©1999–2022 Michael Bayne