In The Grand Scheme Of Things

In which I try to describe what led me to develop Ki.

To develop an alternative is to implicitly criticize the status quo, and while I won't shy away from that, I will say explicitly that while I think we (and specifically I) can do better, what has come before is outstanding work. In no way do I think myself smarter, better educated, or superior to other programmers or language designers, and I think such polemics are obstacles to progress. We are all in this together.

Ki offers no new ideas. It can barely be argued that Ki is even an incremental improvement. Is a taco salad an improvement over a taco? Different yes, but an improvement?

While "better" should be up to the user, we can assess some things objectively. One may not be able to say that Go is better than C, but we can say Go is safer than C and therefore a better choice if safety is paramount.

Other attributes are quite subjective. Simplicity is an important goal of Ki, perhaps even as important as safety. But what is simplicity? Most programmers think that Python is a simple language, but it has a huge feature set, a huge standard library, complicated parsing demands, and very complicated runtime behavior. Brainfuck, on the other hand, has only 8 commands, yet programs written in it are inscrutable and often simply useless. Complexity has to live somewhere, but the way a language presents and organizes it determines the cognitive load required to deal with it.

Furthermore, "better" is rarely a simple choice. The majority of the time, factors other than features and suitability determine the use of a given language. Developing an iOS application generally means using Objective-C or Swift; developing a Windows application generally means using C++ or C#; writing a high-performance video game generally means using C++; developing an embedded platform for IoT devices generally means using C; developing an Android application generally means using Java; developing a website generally means using JavaScript. Even if we can objectively determine that a different language would be better for our specific task, the realities of platform, ecosystem, support, etc. almost always conspire to make our choices for us.

In order to avoid unproductive bias, I endeavor to be precise. For example, rather than refer to Python as a simple language, it is clearer to point out where the complexity lies: in the parser, the runtime, and the lack of access to machine-level details (atomicity of memory operations, integer sizes). I welcome any suggestions on improving the documentation here, or the writing anywhere.

And now, let's get on with it :)

Choosing a Language

Personally, I tend to choose C, Go, Python, and Lua when I can choose a programming language. My reasoning is straightforward: these languages are simple to read, simple to write, and do not require large runtimes or development environments. These qualities occupy the same "aesthetic headspace", for lack of a better phrase. When I work in these languages, I feel as though I can implement any system with the small number of tools provided to me, and that I can hold all of those tools in my head. The languages and tooling fit in my instruction cache, as it were.

Applications Languages vs. Systems Languages

I've been differentiating between "applications languages" and "systems languages" lately, and the discriminating qualities are:

  • You require 100% performance potential
  • You cannot accept GC pauses and memory overhead
  • You must work with arbitrary memory addresses

For want of a term, let's call these systems language requirements, or SLReqs.

I rarely need a language that satisfies the SLReqs (like C or C++); usually what I'm looking for is an applications language. After articulating this difference, I realized that because I so rarely need the above-listed capabilities, I'm willing to give them up completely if I can achieve improvements in other areas.

Safety

Contemporary languages provide safety in three main ways:

  • Garbage collection
  • Runtime checks
  • Type safety

In fact, there are only two mainstream languages without garbage collection and runtime checks: C and C++. Notably, Rust is on its way to becoming a mainstream language without garbage collection, but it still has runtime checks for Vec indexes and so on.
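
To make the Vec example concrete, here is a minimal Rust sketch: indexing is bounds-checked at runtime, and the checked accessor get surfaces the same check as an Option rather than a panic.

    fn main() {
        let v = vec![1, 2, 3];

        // Indexing is bounds-checked at runtime; this line would panic:
        // let x = v[10];

        // get() makes the check explicit by returning an Option
        // instead of panicking.
        match v.get(10) {
            Some(x) => println!("found {}", x),
            None => println!("index out of bounds"),
        }
    }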

There are many other ways programs can be unsafe: data races, NULL/nil/None values and pointers, out-of-bounds accesses, dangling pointers, uninitialized reads, integer overflow, etc. All mainstream languages have at least one of these issues, and nearly all of them exhibit nearly all of these issues at runtime.

Runtime errors are a tradeoff. On the one hand, some errors are much easier to check for at runtime: data races, integer overflow, out-of-bounds accesses, and so on. On the other, it would be better if we could discover these problems without having to run the program first, or if we could somehow structure our runtimes and programs so that these problems couldn't exist at all. Garbage collection is probably the most successful realization of this strategy: it completely fixes a large, seemingly intractable problem.

Garbage collection comes with well-known tradeoffs though: pauses and overhead. Applications that need the SLReqs therefore cannot use it, which means that the best method we have of providing memory safety is unavailable to the software which arguably needs it the most.

Such is the case with other safety features. Runtime checks like bounds checking, data race checking, and overflow checking come with moderate to severe performance penalties. Much C++ code is compiled with RTTI and exceptions disabled, forgoing much of the safety gained by its strong type system in the interest of performance.

Rust is bucking this trend. It is a systems language that guarantees memory safety without garbage collection. Granted, it doesn't provide memory safety and arbitrary memory manipulation at the same time; safe pointer arithmetic and manipulation of arbitrary memory is an oxymoron, which is why Rust requires that such manipulation be done in an unsafe block. That aside, Rust's capabilities have the potential to be an extraordinary contribution to systems programming. It is usable, expressive, powerful, and extensible, and I think that in the coming years, programmers releasing new C++ projects will have to defend their choice of C++ over Rust; the viable defenses seem few and decreasing in number.
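
As an illustration, here is a minimal Rust sketch of that boundary: reading through a raw pointer is only permitted inside an unsafe block.

    fn main() {
        let x: i32 = 42;
        let p = &x as *const i32;

        // Dereferencing a raw pointer is forbidden in safe code; the
        // unsafe block is the programmer's assertion that p is valid.
        let y = unsafe { *p };
        println!("{}", y);
    }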

Compile-Time Safety, Runtime Safety, and the Lack of Safety

As briefly noted above, Rust still provides many safety features as runtime checks, which have two major downsides:

  • It is harder to discover problems using only runtime checks
  • Runtime checks almost always incur some kind of performance overhead

Like runtime checks, compile-time checks are tradeoffs; they typically trade compiler complexity and overhead for improved determinism. Compared with runtime checks, their computational cost is paid at compile time instead of runtime, which may or may not be preferable depending upon the situation. For example, a compile-time check for out-of-bounds accesses is undecidable in the general case, and even in restricted cases where it is decidable, the computational effort required to ensure a program makes no out-of-bounds accesses is astronomical: on the order of increasing the time to compile a program from something like 10 minutes to hours, days, weeks, months, or even years.

When it comes to out-of-bounds accesses, faced with the choice between runtime performance degradation, unacceptably long compile times, or a lack of memory safety, C and C++ chose a lack of memory safety. Rust chooses runtime performance degradation, or the programmer can use the unsafe block. Go, Java, C# and modern scripting languages choose runtime performance degradation.
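
A small Rust sketch of that choice in practice: indexing is checked by default, and the unchecked alternative must be requested explicitly in an unsafe block.

    fn sum(values: &[u64]) -> u64 {
        let mut total = 0;
        for i in 0..values.len() {
            // Checked by default: each index incurs a runtime bounds
            // check unless the optimizer can prove it redundant.
            total += values[i];

            // The unchecked alternative trades the check away, and
            // must be wrapped in unsafe:
            // total += unsafe { *values.get_unchecked(i) };
        }
        total
    }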

A Fourth Option

I think it might be possible to structure a program so that unsafe operations are impossible. Furthermore, I think it's possible to provide a higher standard of safety than just memory safety, as long as we're willing to give up some speed potential.

Finally, I think this is all possible while providing a simple, relatively small language.

Improving on Scripting Languages

Scripting languages all use garbage collection, very few have strong, static type systems, and nearly all are much slower than C. JavaScript has some of the fastest JIT compilers, and even so it typically runs at something like half of C's speed on common benchmarks, uses far more memory, and requires a giant runtime.

Improving on Rust

Rust is not a hellaciously complicated language, but there are opportunities to simplify it if, again, we're willing to give up a small amount of performance potential. Rust is also occasionally inconsistent. Finally, Rust could be much safer.

Improving on C

It takes no small amount of hubris to consider improvements on something that has changed and shaped the world, so when I say "improving on C", I mean "things I personally wish C did". Whether other programmers consider these things improvements is up to them.

Improvements that are already covered by Ki's focus on safety are omitted here; they are addressed elsewhere. A sketch of how several of these conveniences look in an existing language follows the list.

  • Data structures
    • string
    • array
    • list
    • table
    • set
    • queue (deque)
  • Better types
    • Type safety
    • First-class types
    • Derivative types
  • Nicer fixed-width types
  • Better support for working with functions
    • Nicer function pointer syntax
    • Anonymous functions
  • Arbitrary precision numbers
  • More powerful iterators
    • For-each loop
  • UTF-8 everywhere
  • Module system
    • No preprocessor
    • No headers
  • Operator overloading
  • Namespaces
  • Generators/Coroutines
  • In-place initialization
  • Type inference
  • Consistent error signaling/handling
  • Access control (beyond opaque structs)
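
As a point of comparison, here is a sketch in Rust, which already has several of these conveniences (built-in collections, anonymous functions, a for-each loop, type inference, and UTF-8 strings). Ki's own syntax may differ; this is only meant to show the shape of the features.

    use std::collections::HashMap;

    fn main() {
        // A built-in table type, with type inference filling in the
        // key and value types.
        let mut stock = HashMap::new();
        stock.insert("carrot", 12);
        stock.insert("leek", 3);

        // An anonymous function with lightweight syntax.
        let double = |n: i32| n * 2;

        // A for-each loop over the table.
        for (name, count) in &stock {
            println!("{}: {}", name, double(*count));
        }

        // Strings are UTF-8 everywhere.
        println!("{}", "こんにちは");
    }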

Philosophical Design Decisions

Ki's guiding principles are rooted in empiricism. In designing it, I've endeavored to steal what has worked in other mainstream languages and avoid what hasn't. Again, of course, according to me.

"There should be one-- and preferably only one --obvious way to do it."

Ki is a pragmatic programming language. Some languages embody multiple programming paradigms, or allow the programmer to modify their syntax and semantics (Lisp-family macros). These features are clearly expressive and powerful. The Perl community has christened this paradigm "there is more than one way to do it", or TIMTOWTDI. Lisp programmers tout macros as the one feature that unquestionably makes Lisp programming superior.

The tradeoff for TIMTOWTDI's expressiveness and power is consistency. A language that provides multiple constructs that fundamentally perform the same task requires more thought to process. Instead of looking for a single pattern or idiom, programmers have to look for multiple such patterns.

TIMTOWTDI allows programmers to express logic in ways they are comfortable with, or in ways that may better track the underlying operations (folds, visitors, etc.). However, removing the possibility of choice from the programmer can paradoxically be freeing: they no longer have to choose a method.

For example, there are (arguably) no folds in C. A C programmer never has to consider whether to use a fold, and never has to look for one, either. Consequently:

  • they spend no time considering the merits of a fold over a loop
  • they spend no time looking for folds when reading code: in situations where either a loop or a fold could be used, only loops appear

The question is whether the benefits of folds outweigh the cognitive costs in having to deal with them. This is similar to the complexity issue: complexity has to exist somewhere, the only question is where. Supporting folds can bring clarity to many programming constructs, but settling on the loop as the only method of repeating an operation allows the programmer to focus on other matters.
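
To make the comparison concrete, here is a minimal Rust sketch of the same accumulation written as a fold and as a loop:

    fn main() {
        let prices = [3, 7, 2];

        // The fold names the pattern directly: reduce a sequence to a
        // single value.
        let total_fold: i32 = prices.iter().fold(0, |acc, p| acc + p);

        // The loop spells out the same accumulation step by step.
        let mut total_loop = 0;
        for p in &prices {
            total_loop += p;
        }

        assert_eq!(total_fold, total_loop);
    }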

Temporality

If the price of consistency is eventual obsolescence, then I make that trade happily. I think that two separate languages that are each internally consistent are preferable to one language that is internally inconsistent.

Functional Design Decisions

Macros

Macros are confusing and unsafe. Macros are an admission that the language does or will do something badly, and a plea to the programmer to keep using the language regardless. For example, most hash tables in C are implemented using macros, usually to maintain some semblance of type safety in a language without first-class types. In other words, the macros in this example say, "we know C has no support for first-class types, but you can fake it with macros, so please keep using C regardless". The same can be said for Ruby DSLs, Rust's try! macro, and really macros in general.
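
For instance, here is roughly what a use of Rust's try! macro expands to, a minimal sketch showing the boilerplate the macro exists to hide (modern Rust replaced try! with the ? operator, which does the same thing):

    use std::fs::File;
    use std::io::{self, Read};

    fn read_config(path: &str) -> io::Result<String> {
        // try!(File::open(path)) hides this early-return match.
        let mut file = match File::open(path) {
            Ok(f) => f,
            Err(e) => return Err(e),
        };
        let mut contents = String::new();
        match file.read_to_string(&mut contents) {
            Ok(_) => {}
            Err(e) => return Err(e),
        }
        Ok(contents)
    }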

Operator Overloading

I'm usually the first one to rant and rail against operator overloading, particularly C++'s implementation. But really what I dislike is C++'s downright malicious use of it. The most basic of operations, stream input/output, is a complete non sequitur: the bitwise left-shift operator << is repurposed to push data into a stream.

We already had chained operations; they're called functions. But now we have a whole new way of calling a function that can't be disambiguated from a bitwise left shift.

So I am opposed to confusing use of operator overloading, in the same way that I'm opposed to confusing use of anything. The definition of "confusing use" is "it looks like something it's unrelated to". Just don't do that, and operator overloading is fantastic.
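
A minimal Rust sketch of the unconfusing kind: overloading + for a small vector type, where the operator looks like what it does.

    use std::ops::Add;

    #[derive(Clone, Copy, Debug, PartialEq)]
    struct Vec2 {
        x: f64,
        y: f64,
    }

    // + on vectors reads as addition because it is addition.
    impl Add for Vec2 {
        type Output = Vec2;
        fn add(self, other: Vec2) -> Vec2 {
            Vec2 { x: self.x + other.x, y: self.y + other.y }
        }
    }

    fn main() {
        let a = Vec2 { x: 1.0, y: 2.0 };
        let b = Vec2 { x: 3.0, y: 4.0 };
        assert_eq!(a + b, Vec2 { x: 4.0, y: 6.0 });
    }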

Generics

Generics are a great idea because (along with operator overloading) they allow the programmer to implement their own first-class data structures. Furthermore, with a few simple rules, they can be implemented in such a way as to avoid C++'s implementation pitfalls (infinitely recursive templates).
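
A small Rust sketch of the idea: one generic function, type-checked against its declared bounds, with no macros involved.

    // Works for any type that supports ordering and cheap copying.
    fn largest<T: PartialOrd + Copy>(items: &[T]) -> Option<T> {
        let mut best = *items.first()?;
        for &item in items {
            if item > best {
                best = item;
            }
        }
        Some(best)
    }

    fn main() {
        assert_eq!(largest(&[3, 7, 2]), Some(7));
        assert_eq!(largest(&[1.5, 0.5]), Some(1.5));
        assert_eq!(largest::<i32>(&[]), None);
    }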

Variant Types

Really "variant types vs. option types"

Variant types are something of a fix. Rust uses option types (which are essentially a subset of variant types) to maintain type safety in the face of varying return values. The canonical example is searching; let's presume a function that searches for and returns a Vegetable. Search functions can return 0 or more sought-for values, and depending upon the situation, different value counts can represent errors. For example, if our design presumes that our GroceryStore will always have at least one Vegetable, but our search function with no filters returns zero results, that is an error condition. Similarly, if our GroceryList will only have a single instance of each Vegetable but we return more than one given a filter that should separate any Vegetable from any other, that too is an error condition. If the calling function doesn't handle these conditions, it might do something disastrous like dereference a NULL pointer, read from uninitialized memory, modify the wrong variable, etc.

To guard against this, Rust uses option types, which require the programmer to use the match construct, wherein all potential return values must be handled. This is expressive, safe, powerful, and an all-around good solution.
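
In Rust, the Vegetable search might look like the following minimal sketch (the function and data are mine, for illustration):

    // A hypothetical store, holding vegetable names.
    fn find_vegetable<'a>(store: &[&'a str], wanted: &str) -> Option<&'a str> {
        store.iter().find(|&&v| v == wanted).copied()
    }

    fn main() {
        let store = ["carrot", "leek"];
        // match forces both outcomes to be handled before the result
        // can be used.
        match find_vegetable(&store, "carrot") {
            Some(v) => println!("found {}", v),
            None => println!("no such vegetable"),
        }
    }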

Ki approaches the searching problem a little differently. It considers GroceryStore to be different from GroceryList, and it considers them both to be separate from DietLog, which is a list of all foods eaten and may contain duplicates or (more often than we'd like to admit) no Vegetables at all.

Therefore a function that searches for a specific, unique Vegetable fails with an error if nothing is found or if multiple values are found. A function that filters a collection of Vegetables down based on some predicate is free to return a collection of Vegetables with a length ranging from 0 to the length of the original collection.

This is not to say that one cannot panic in Rust if no/multiple results are found, or that one cannot use variant types in Ki just like Rust uses option types. The difference is in the idiom used. Variant types and option types are explicit and verbose as they require the programmer to discriminate between variants. This explicitness can aid clarity.

However, the common case for searching is "do something if this value is found, otherwise I don't care", and because variant and option types require the programmer to handle all cases, the "otherwise I don't care" cases become a distraction from the main intent of the code.
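
Rust itself acknowledges this with if let, which collapses the match when only one case matters; a minimal sketch:

    fn main() {
        let store = ["carrot", "leek"];

        // "Do something if found, otherwise I don't care": the
        // uninteresting None arm simply disappears.
        if let Some(v) = store.iter().find(|&&v| v == "carrot") {
            println!("found {}", v);
        }
    }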

The verbosity of variant types is warranted in cases where every outcome deserves equal handling; for a good example of this, see the example involving a file object.

NULL Pointers

NULL pointers are just a bad idea in general. At best they're a convenience for compiler writers; at worst they're conflated with special "zero" pointer values. Either way, we can do better.

Smart/Unique/Owned Pointers

C++'s unique_ptr is an amazingly great idea. In Ki you get two kinds of pointers, analogous to unique_ptr and shared_ptr, although Ki calls the first "heap values" and... I haven't decided what to call reference-counted pointers yet, haha. These correspond roughly to Rust's Box and Arc types.
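
A minimal Rust sketch of the two ownership models that Ki's pointers would roughly correspond to:

    use std::sync::Arc;
    use std::thread;

    fn main() {
        // A uniquely owned heap value: freed when `owned` goes out of
        // scope, with no reference counting at runtime.
        let owned: Box<Vec<i32>> = Box::new(vec![1, 2, 3]);
        println!("{:?}", owned);

        // A reference-counted heap value: shared across threads, freed
        // when the last clone is dropped.
        let shared = Arc::new(vec![4, 5, 6]);
        let handle = {
            let child = Arc::clone(&shared);
            thread::spawn(move || println!("{:?}", child))
        };
        handle.join().unwrap();
        println!("{:?}", shared);
    }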

Syntactic Sugar

The goal of syntactic sugar is to simplify the reading and writing of a common construct. While sometimes used as a derogatory term, many fundamental constructs such as function definitions, function calls, or loops could be considered syntactic sugar.

As decades go by, we find more and more things that programmers do a lot. At first it was loops and functions/procedures/subroutines, then it was composite data types, then it was the association of functions/procedures/subroutines with specific data types (objects), then it was inheritance, and so on.

Ki provides a fair amount of syntactic sugar, including shorthand for accessors/mutators, predicates, error handling, declaring constants, and so on. While this increases the complexity of the language and decreases its consistency, it also reduces the amount of boilerplate with two main goals:

  • reducing the potential for error, either via poor transcription or copy/paste errors
  • reducing the amount of irrelevant code or notation

Some notation and structure is important; the eye and the mind must be able to quickly ascertain the state of things based on keywords and the general structure of the code. However, too much structure becomes noise. Java is frequently trotted out as the offender here, but other languages (C++, C#) can be just as bad.

Accessibility

While advanced features like operator overloading or introspection may lead to bad, wildly overcomplicated, or outright incomprehensible code, they are so useful in some cases that leaving them out is counterproductive. Ki therefore aims to make advanced features accessible enough that programmers won't do crazy things like:

  • building their own object system
  • building an entire compile-time annotation system as a library (which, while super cool, is full-on terrifying)
  • using macros to implement higher-level concepts like coroutines and message passing
  • building their own systems for printing out the names of enumerated values
  • autogenerating accessor/mutator fields
  • eschewing accessor/mutator fields because of function call overhead
  • mandating accessor/mutator fields because it's going to happen someday, and someday may as well be today

I am least certain about design decisions in Ki that descend from this point. Every new bit of syntactic sugar is essentially a new, magic rule with no justification other than, "this is honestly just better". I'd like something a little more... intellectually grounded than that.