Urbit: A Primer | Red Vice

If you’re here reading this shit thinking it’s about a visual novel or reverse engineering, turn back; beyond here lies nothing of value.

I, as many of my friends will tell you, am enamored of the idea of Urbit. Fundamentally, Urbit is about what you would do if you were given a supercomputer black box and told to invent all of computing.

Right now, our entire tech stack is built on projects from the ’70s, with things duct-taped and stapled on top. Your computer is still, at its core, running an Intel 8086 processor and the UNIX operating system. Everything is in C. The Internet is a series of tubes. There is no god.

If you had never heard of UNIX or C and had never needed to worry about processor cycles or saving bytes of RAM, you would obviously not end up with anything even slightly resembling the current computing environment. That’s what Urbit is: looking at all the problems that we currently have, noticing that they stem from things too ingrained to fix, and writing a whole new world.

Instead of Assembly, which mimics a universal Turing machine (memory tape, corresponding opcodes), Urbit’s base language is Nock, which mimics the SKI combinators. The memory model, instead of addressing RAM, is accessing axes of a binary tree of bignums. There are no pointers.
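To make "accessing axes of a binary tree" a bit more concrete, here is a minimal Python sketch of that kind of addressing. It assumes the usual numbering where axis 1 is the whole tree and axes 2n and 2n+1 are the head and tail of whatever sits at axis n. Representing cells as Python 2-tuples and atoms as ints is my own choice for illustration, as is the function name slot.

```python
def slot(axis, noun):
    """Return the subtree of `noun` found at `axis`.

    Axis 1 is the whole noun; axis 2n is the head of axis n,
    and axis 2n + 1 is the tail of axis n.
    """
    if axis < 1:
        raise ValueError("axis must be at least 1")
    if axis == 1:
        return noun
    parent = slot(axis // 2, noun)
    if not isinstance(parent, tuple):
        raise ValueError("cannot descend into an atom")
    head, tail = parent
    return head if axis % 2 == 0 else tail


# The noun [[4 5] [6 14 15]], written with nested 2-tuples.
tree = ((4, 5), (6, (14, 15)))

print(slot(2, tree))   # the head of the whole tree
print(slot(7, tree))   # the tail of the tail
print(slot(15, tree))  # a leaf three levels down
```

Note that nothing here is an address into RAM: an axis is just a number that says "go left, go right" from the root.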

These concepts form the core of Urbit, and nearly all of the project is built on top of these first principles. If your entire computer can be described using a reduction of combinators on a context, then all computation is pure. Reducing nock(1 [4 0 1]) will never perform any task outside of directly computing 2. In fact, nock(1 [4 0 1]) will never not give you 2, no matter the state of the rest of your computer. Computation is both referentially transparent and pure; there is no hidden state or global variables influencing running functions, only what you pass in. Consequently, if you run a function with some arguments, the output is always the same when provided the same arguments a second time.
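The nock(1 [4 0 1]) example can be traced with a toy evaluator. This is only a sketch covering the three operations that example needs (0 fetches an axis of the subject, 1 returns a constant, 4 increments); it is not a full Nock interpreter, and the Python representation (2-tuples for cells, ints for atoms) is an assumption of mine.

```python
def slot(axis, noun):
    """Fetch the subtree of `noun` at `axis` (1 = whole, 2n = head, 2n+1 = tail)."""
    if axis < 1:
        raise ValueError("axis must be at least 1")
    if axis == 1:
        return noun
    parent = slot(axis // 2, noun)
    if not isinstance(parent, tuple):
        raise ValueError("cannot descend into an atom")
    return parent[0] if axis % 2 == 0 else parent[1]


def nock(subject, formula):
    """Reduce `formula` against `subject`; only opcodes 0, 1 and 4 are handled."""
    op, arg = formula
    if op == 0:              # [0 axis]: look up an axis of the subject
        return slot(arg, subject)
    if op == 1:              # [1 value]: return the constant, ignore the subject
        return arg
    if op == 4:              # [4 formula]: evaluate the inner formula, then add one
        value = nock(subject, arg)
        if isinstance(value, tuple):
            raise ValueError("cannot increment a cell")
        return value + 1
    raise NotImplementedError(f"opcode {op!r} is outside this sketch")


# [4 0 1] is the cell [4 [0 1]]: "increment whatever is at axis 1 of the subject".
increment = (4, (0, 1))
print(nock(1, increment))
```

The function reads nothing but its two arguments and writes nothing anywhere, which is the whole point: same subject and formula in, same result out.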

Sample Urbit install with some Nock run

Ok, so you have a weird language that can add numbers and is based on binary trees. So what? Purely functional languages like the Lambda Calculus or SKI combinators are neat, but they’re also pretty useless: you give it a program, it gives you an answer. Urbit solves this by building an event loop that makes use of Nock. Your machine has a global state, and it takes in various events (keyboard input, http response, etc.). It provides an answer by reducing nock(current-state events-in), and the result is in the form of [new-state events-out]: a new global state, which it will use for the next event it receives, and a list of events for the operating system to execute (print a character to the screen, issue an http request, etc.). You have a normal C runtime that collects the events, and then passes them into the pure, shiny, and chrome world of Nock, which gives events back to C to act upon.
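The shape of that loop can be sketched in a few lines of Python. Everything named here (step, run, the event kinds, the state layout) is made up for illustration; the only thing taken from the description above is the split between a pure [state event] -> [new-state events-out] function and an impure driver that feeds it events and carries out the results.

```python
def step(state, event):
    """Pure transition: takes (state, event), returns (new_state, effects)."""
    kind, payload = event
    if kind == "key":
        new_state = {**state, "typed": state["typed"] + payload}
        return new_state, [("print", payload)]
    if kind == "http-response":
        new_state = {**state, "responses": state["responses"] + 1}
        return new_state, []
    raise ValueError(f"unknown event kind: {kind!r}")


def run(state, events, perform):
    """Impure driver: the stand-in for the runtime that talks to the outside world."""
    for event in events:
        state, effects = step(state, event)
        for effect in effects:
            perform(effect)  # only the driver ever causes side effects
    return state


initial = {"typed": "", "responses": 0}
incoming = [("key", "h"), ("key", "i"), ("http-response", "200 OK")]
final = run(initial, incoming, perform=print)
```

The driver is the only place where anything touches the screen or the network; step itself could be replayed a thousand times without anyone noticing.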

This results in two really, really neat features. For one, you can describe the life of your entire computer as a list of events. And if you have a list of events, say by logging everything that your computer receives, and replay them from the prime state (which is just a downloaded kernel), then you will have a byte-wise exact replica of your end state. You can verify that your computer wasn’t changed by the NSA by replaying your entire life. You can copy your computer from Google’s datacenter to Amazon and ensure that no party has tampered with it because the value that Nock returns called new-state is your entire computer, independent of the non-persistent, difficult to serialize memory in RAM.
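Replay is easy to sketch once the transition function is pure. The toy below folds a logged list of events over an initial state and compares fingerprints of the results; the event kinds, the state layout, and the use of SHA-256 over JSON as a "fingerprint" are all assumptions for the sake of the example, not how Urbit actually stores or checks anything.

```python
import hashlib
import json
from functools import reduce

INITIAL_STATE = {"counter": 0, "notes": []}


def step(state, event):
    """Pure transition from one state to the next (effects omitted for brevity)."""
    kind, payload = event
    if kind == "add":
        return {**state, "counter": state["counter"] + payload}
    if kind == "note":
        return {**state, "notes": state["notes"] + [payload]}
    raise ValueError(f"unknown event kind: {kind!r}")


def replay(initial, log):
    """Rebuild the end state by folding every logged event over the initial state."""
    return reduce(step, log, initial)


def fingerprint(state):
    """Hash a canonical serialization so two states can be compared cheaply."""
    blob = json.dumps(state, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


log = [("add", 2), ("note", "hello"), ("add", 40)]
at_google = replay(INITIAL_STATE, log)
at_amazon = replay(INITIAL_STATE, log)
print(fingerprint(at_google) == fingerprint(at_amazon))
```

If someone had slipped an extra event into the log, or edited one, the fingerprints would no longer match.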

Which gives us the second neat feature: Urbit is crash-safe. If you get a bad event, say a malformed http request that causes Nock to crash (it’s not C, so you won’t have buffer overflows or dangling pointers or the like, but it can still crash to keep semantics), then your operating system can just roll back to the old state and throw away the event that crashed, purging it from the log. Because your entire computer is stored in the state value, there isn’t any concept of “turn it off and on again”: if you do, your computer just loads up the exact same state, so there’s no problem. If your operating system kernel panics, your entire event log is an ACID database that can’t be corrupted, so you can just replay the last couple events since your last state checkpoint. It’s basically magic.
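The rollback idea falls out almost for free when state is an immutable value. In this sketch (names and event kinds are hypothetical), an event that makes the transition function blow up is simply dropped: the old state is still sitting there untouched, and the event never reaches the log.

```python
def step(state, event):
    """Pure transition on a toy integer state."""
    kind, payload = event
    if kind == "add":
        return state + payload
    if kind == "div":
        return state // payload  # raises ZeroDivisionError when payload is 0
    raise ValueError(f"unknown event kind: {kind!r}")


def apply_event(state, log, event):
    """Try one event; on a crash, keep the old state and leave the log alone."""
    try:
        new_state = step(state, event)
    except Exception:
        return state, log           # bad event is thrown away, nothing to undo
    return new_state, log + [event]  # good event is committed to the log


state, log = 10, []
for event in [("add", 5), ("div", 0), ("div", 3)]:
    state, log = apply_event(state, log, event)

print(state)
print(log)
```

Because only events that succeeded are logged, replaying the log from the starting state lands on the same value again.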

With that said, a system built in this way forces you to fundamentally rethink pretty much everything else. You can’t use C, because Nock has the concept of functional purity and reduction built in, so you need a new language. You could probably use Haskell for this, or F# or something. Anything side-effect free. However, Nock also has the concept of a world-state that you can touch and replace. Think fenv in Lua, where you can setfenv(f, {add = sub}) and now calling f(4,2) gives you 2 instead of 6. So instead, Urbit has a language called Hoon, which compiles down to Nock. It is, vaguely, Lisp-like in that your program is a textual representation of AST nodes. You don’t have if a==b return 1 else return 2 end, you have :if(=(a b) 1 2): remember, everything is still a reduction, so evaluating an if has to return the value of reducing one of its arms. In Hoon, all AST nodes only have expressions under them, with no differentiation between r-values and l-values as in C, and unlike Lisp nearly all AST nodes know how many expressions they take. In Lisp, (if true a b) doesn’t know that it only takes three expressions because it’s just a generic function call. This leads to death by parentheses overdose, a most harrowing disease. In Hoon, :if knows it takes 3 values, and so it doesn’t need parentheses, making syntax and whitespace entirely optional. A function is a :gate a b, with a and b being arbitrary expressions. You don’t need to have piles of ))))) at the end, unless you’re doing a function call like (add 5 (sub 4 2)) or other n-ary expressions.
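The "every node knows how many children it takes" point is the reason the parentheses can go away, and it can be shown with a tiny prefix parser. To be clear, this is not Hoon syntax and not how the Hoon parser works; the keywords and the ARITY table are invented purely to illustrate fixed-arity parsing.

```python
# Hypothetical keywords and how many sub-expressions each one consumes.
ARITY = {"if": 3, "eq": 2, "add": 2}


def parse_one(tokens):
    """Parse one expression from the front of `tokens`; return (tree, leftover)."""
    if not tokens:
        raise ValueError("unexpected end of input")
    head, rest = tokens[0], tokens[1:]
    if head not in ARITY:
        return head, rest            # a plain leaf: a name or a number
    children = []
    for _ in range(ARITY[head]):     # the keyword itself says when to stop
        child, rest = parse_one(rest)
        children.append(child)
    return (head, *children), rest


def parse(source):
    """Parse a whole whitespace-separated program into a nested tuple."""
    tree, leftover = parse_one(source.split())
    if leftover:
        raise ValueError(f"trailing tokens: {leftover}")
    return tree


print(parse("if eq a b 1 2"))
print(parse("if eq a add b c 1 2"))
```

No closing delimiter is ever needed, because the parser always knows how many expressions are still owed to the current node.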

Urbit's online Hoon dojo

So you have an event loop and a programming language. You want to write IRC. Because your operating system event loop is [state events-in] -> [state events-out], your program also has to have this format. Luckily, this has a name: a state machine, and they are basically great. Because your operating system never reboots, your program won’t either. And because your program’s entire state is just a single value, much like your operating system, if you update your program’s code you can do it while it’s still running, so long as the type of the state nests within the definition of the new code’s state. If it doesn’t, then you can write a strongly typed adapter (f) from old-state to new-state: cast(new-state f(old-state)). This allows you to get code updates from the network, update your running code with no reboots or problems, and continue on. In fact, because programs mimic your operating system, you can do this with your entire kernel. Now the idea of “you can’t turn it off and on again” makes more sense, no?
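Here is a rough sketch of what a live upgrade with a state adapter could look like. Two versions of a chat program are written as pure transition functions with differently shaped state, and a small adapter converts the old state into the new shape so the program keeps going without a restart. All names and both state layouts are hypothetical, and Python will not check the types the way Hoon would.

```python
def step_v1(state, message):
    """Version 1: state is just a list of messages seen so far."""
    return state + [message], [("show", message)]


def step_v2(state, message):
    """Version 2: state also tracks a running count, shown next to each message."""
    new_state = {
        "messages": state["messages"] + [message],
        "count": state["count"] + 1,
    }
    return new_state, [("show", f"{new_state['count']}: {message}")]


def adapt_v1_to_v2(old_state):
    """The adapter f: turn a version 1 state into a valid version 2 state."""
    return {"messages": list(old_state), "count": len(old_state)}


# Run on the old code for a while.
state = []
state, _ = step_v1(state, "hi")
state, _ = step_v1(state, "yo")

# New code arrives: convert the state once, then keep handling events.
state = adapt_v1_to_v2(state)
state, effects = step_v2(state, "sup")
print(state)
print(effects)
```

Nothing was restarted and no message history was lost; the only thing that changed is which function handles the next event.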

Your IRC program, which is purely functional and magic, now wants to be able to send messages to your friend Bob. In 1970s UNIXLAND, this is done through a central server, probably run by The Goog or BookFace, which gives you Bob’s IP address so that you can send messages into the ether for him. But if you’re re-inventing the rest of the world, why stop there? Let’s see how deep the rabbit hole goes.

You want to send a message to Bob, entirely P2P and encrypted because that’s what all the cool people want. You don’t know Bob’s public key, though, and want it. HTTPS does this by having a chain of trust: you have Root CA, which has a public key that was distributed with your kernel, mini-CAs underneath that which have certificates (henceforth “cert”) signed by a Root CA, and then Bob with a cert signed by a mini-CA.

The problem with using this for routing the Internet is that if you don’t have IP addresses (because why would you? this isn’t 1970s UNIXLAND) you have no way to contact Bob or to verify his cert, and thus you don’t know who is in charge of his traffic and cannot confirm that he isn’t just the FBI. In Urbit, however, it’s simple. Instead of Bob picking an arbitrary identity and then getting a certificate, the Mini-CA gives Bob a number and that is his identity, reversibly juggled and memorable. The number is appended to the Mini-CA’s identity, which is in turn appended to the original CA’s. So if you want to send a message to Bob, and you know Bob is ~winpes-ladsyr or something stupid, you can take the low 16 bits, which identify Bob’s Mini-CA Inc. equivalent, or the low 8 bits for the Root CA[¹]. You can then just shoot off to your Mini-CA “get the public key for ~winpes-ladsyr”. If your Mini-CA doesn’t know, they can ask their own parent authority (The Goog). If The Goog doesn’t know, they can ask the root CA behind ~winpes-ladsyr (say, BookFace), who has to know how to contact Bob since they are the ones providing signatures for Bob’s certificate.
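The bit-chopping is plain arithmetic on the identity number. The sketch below assumes Bob's identity is a 32-bit number whose low 16 bits name his Mini-CA and whose low 8 bits name the Root CA, as described above. The numeric address is made up, the function names are mine, and the mapping between numbers and pronounceable names like ~winpes-ladsyr is not shown because it needs the syllable tables.

```python
def mini_ca_of(address):
    """Low 16 bits: the Mini-CA that handed out this identity."""
    return address & 0xFFFF


def root_ca_of(address):
    """Low 8 bits: the Root CA at the top of the chain."""
    return address & 0xFF


def signing_chain(address):
    """List the identity followed by each distinct authority above it."""
    chain = [address]
    for mask in (0xFFFF, 0xFF):
        parent = address & mask
        if parent != chain[-1]:   # skip levels that collapse to the same number
            chain.append(parent)
    return chain


bob = 0x12345678  # hypothetical 32-bit identity, not a real Urbit address
print([hex(n) for n in signing_chain(bob)])
```

Anyone holding Bob's number can work out who to ask about him without consulting a directory first, which is what makes the identity double as a routing address.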

Various users chatting

Now that you have Bob’s public key, you can just send messages addressed to ~winpes-ladsyr to your Mini-CA, who is probably your ISP, encrypted with his public key, and receive encrypted messages in return. Mini-CA can route messages to Bob, because it knows who is in charge of his traffic by simply looking at the bytes. The FBI can’t MITM the key exchange, because Bob’s cert has to be signed by Bob’s ISP and Bob’s Root CA, so it’s just as secure as HTTPS. And now that Bob has a canonical identity that we use for routing, we also have a canonical identity that we can use for his urbitwitter username, and just show a pretty handle next to it. Bam. P2P and Public Key Infrastructure, tied with a persistent identity system and ISP. I explained this horribly.

Urbit does this for everything. Why are UNIXLAND filesystems not immutable and versioned like Git? Because they are from 1970s UNIXLAND, and it’s hard to reverse a speeding train. Urbit filesystems should be, because Nock is referentially transparent and there’s no reason the filesystem shouldn’t be as well. Why are all files serialized to text, leading to tons of errors in C programs due to serializing and deserializing, instead of having files be fundamentally typed? Because UNIX says it’s so, and C’s type system is basically nonexistent[²]. Why can’t you simply cat /net/bob/home/file.txt and get Bob’s file, transparently over the internet? Probably because your operating system doesn’t have architecture for identity and public keys baked in the way Urbit does.

The combination of all of this should make it extremely easy to write decentralized apps. Events-out from an app are, fundamentally, typed RPC calls simply addressed to ~winpes-ladsyr or the like, because the operating system does all the hard work and the language supports it, giving you P2P encrypted typed data transfer basically for free. Your app shouldn’t have weird non-deterministic behavior that warrants a reboot, because Nock and your operating system are, by definition, deterministic and don’t require reboots. Your app, which is running individually on N computers because, hello, decentralized, can be updated by all of them just by their computer pulling down the code, making sure the app state works with the new program or converting it if it doesn’t, and then replacing it at runtime. All of this is basically black fucking magic.

If you got to this paragraph, congratulations. Most of the people on Hacker News or whatever have taken one look at a pitch like this made by the creators of Urbit and then gone screaming for the hills, because Urbit is different and not like anything in UNIXLAND. They see Urbit reinventing not just the wheel, but the pulley and fire too, and (quite reasonably) conclude that Mr. Curtis Yarvin is insane.

But can you imagine if it actually works?

Footnotes:

1: Each three-letter syllable has 256 different combinations, and so maps to one byte. There are 256 Root CAs, called galaxies, for example, so taking the last byte of an identity always tells you which root CA signed their certificate.
2: Nock also doesn’t have cycles, because its memory is just a binary tree of bignums; you can read an arbitrary Nock value, then test if it’s a valid instance of a Hoon type based off the shape of the tree.
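As a rough illustration of footnote 2, checking a value against a type "by shape" can be as simple as walking the tree. The shape language here (the string "atom" for any non-negative integer, a 2-tuple of shapes for a cell) is invented for the example; real Hoon types are far richer than this.

```python
ATOM = "atom"  # hypothetical shape descriptor meaning "any non-negative integer"


def fits(noun, shape):
    """Return True when `noun` has exactly the tree shape that `shape` describes."""
    if shape == ATOM:
        return isinstance(noun, int) and not isinstance(noun, bool) and noun >= 0
    head_shape, tail_shape = shape
    if not isinstance(noun, tuple) or len(noun) != 2:
        return False
    return fits(noun[0], head_shape) and fits(noun[1], tail_shape)


POINT = (ATOM, ATOM)     # a pair of numbers
SEGMENT = (POINT, POINT)  # a pair of points

print(fits((3, 4), POINT))
print(fits(((0, 0), (3, 4)), SEGMENT))
print(fits((3, (4, 5)), POINT))
```

Because a tree has no cycles, the walk always terminates, so any value read off the wire can be validated before the program trusts it.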