Debu.gs: Inferno Part 0: Namespaces

Pete Elmore, 2012-05-09 17:44

I’ve promised but not yet delivered on a post about getting up and running with Inferno, and doing incredible things; I’ve bloated it into a series, of which this is the introductory part. I’m going to talk a little about namespaces, hopefully in such a way that it makes sense. The namespace is a critical piece of the OS, and it is one of the things that set Plan 9 and Inferno apart from what you’re running on your desktop now.

Inferno at times gets dismissed, and I don’t intend to evangelize per se. I want to help close the gap between the backstory and the practice, to explain how to use Inferno and why it works the way it does.

I’ve got to make some excursions; please bear with me. I’m going to talk about the evolution of the namespace, not quite in chronological order, but in ascending order of power, and then illustrate how to use Inferno’s.

DOS

Remember running DOS? You have A: and B:, those are your floppies. You have C:, your hard disk, and so on. You’ve got a directory hierarchy under the drives. (I gloss over the details somewhat.) You can think of this as a flat, top-level, single-letter, hardware-based namespace. One disk, one letter.

Unix, Linux

Unix, though, had a top-level root directory, “/”. A root disk was mounted and other disks were mounted in subdirectories of root, subdirectories of subdirectories, wherever you felt you wanted the disk to be. You don’t need to know what’s on which partition of what disk unless you’re putting the system together or your program has filled up a disk. You can think of this as a hierarchical namespace, shared across the OS.

This is a much more flexible, powerful approach. Applications and their users need not know or care what disk something is on.

Later, Unix got filesystems of a different sort: synthetic filesystems. /proc, for example. Linux has /sys, and sometimes /dev is a synthetic filesystem. There is no physical disk storing the files in these filesystems, although no program needs to know that. The files show up in the namespace as any other file would, and programs need not even be modified to read or write files in these systems.

Somewhere between these are filesystems that do represent files on disk, though the machine mounting them has no way to prove it. NFS, for example, presents a remote service containing a hierarchy of files and directories.

Then there’s FUSE, which provides filesystems in user-space. Speaking FUSE to the kernel locally and backed by whatever you wish to write a driver for, you can create filesystem servers for arbitrary synthetic filesystems without stuffing new drivers into the kernel. (I gloss over BSD; fewer features seem to have made it there from Plan 9, although DragonFly has developed some interesting ideas.)

On the other hand, since the namespace is shared across the machine, you need to be root to modify it. Imagine a random user mounting something atop /sbin, obscuring even the unmount command. Per-user mounts are a minor exception, a sort of workaround that doesn’t violate the semantics: you can mount a FUSE filesystem, for example, on any directory you own. (And Linux now supports as a kernel module 9P2000, the protocol Plan 9 and Inferno speak.)

Previously, a filesystem could only be mounted in one place. “Bind mounts” in Linux get you slightly more flexibility; once a filesystem (or piece of a filesystem) is somewhere in the namespace, it can be placed elsewhere with `mount --bind`. It is a semi-recent addition, and it wasn’t until Linux 2.6.26 that a filesystem could be mounted read-write but bound read-only in a different spot.

Plan 9, Inferno

Plan 9 and Inferno take several large chunks of the historical Unix semantics and toss them completely out the window, by tossing out some of the core assumptions of the OS, assumptions that have been largely implicit. I’m going to talk about Inferno, but nearly all of this also applies to (and appeared first in) Plan 9. Wherever I say “in Inferno”, feel free to add a parenthetical “and also in Plan 9”. The assumptions are not listed in any specific order, but have been given ordinals so that they can be referred to.

The first assumption to go, the easiest to break and perhaps the most obvious in retrospect, is that files and directories inside the namespace represent on-disk objects.

The second assumption to go is that of a one-to-one correspondence between filesystems and mountpoints. That is to say, a given filesystem is mounted in exactly one place, and exactly one filesystem is mounted in that place. Getting rid of this assumption means that a filesystem and the place where it is mounted are independent of each other and that any number of filesystems may be mounted on the same mountpoint.

The third assumption to go is that of a one-to-one correspondence between a running system and a namespace. If changes to the namespace were seen only by a process and its children, there would be no need to protect users from changes to the namespace made by other users, and so no reason to restrict when and where a user could mount things. The user just needs read access to the filesystem to mount.

Dropping these assumptions about the namespace a given process inhabits allows for a much more powerful model of a system, in ways that are not immediately obvious when taken as a whole, but which represent as large a shift as that from the DOS concept of a namespace to the Unix concept.

Those three assumptions have been broken to various degrees in Unix-like operating systems, and with varying degrees of success. The first assumption of a given filesystem as a representation of objects on a disk has been successfully broken in Unix. You have /proc, for example, in nearly every Unix. FUSE allows users to mount filesystems, even separating them (not perfectly, of course) from other users, allowing for a sort of private namespace, almost breaking the third assumption. Bind mounts allow for parts of the namespace to appear elsewhere in the namespace, coming somewhat close to the second, but requiring root. Union mounts have also made it into mainstream Unix, sort of. But, with the exception of the first assumption, the solutions are all incomplete. What if we started from the beginning, without the Unix baggage?

Since the namespace is per-process (strictly speaking, any process can but does not have to fork the namespace), anyone can mount anything anywhere, or bind anything anywhere, without creating security problems. Since a mountpoint can contain any number of mounts, all of them visible, filesystems can be overlaid on each other. Since any filesystem may be “real” or “synthetic” provided it speaks the appropriate protocol (9P2000 in the case of Inferno), anything can be a filesystem (and everything is), and any program that interacts with files can use it.
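To make the per-process part concrete, here is a toy model in Python. It is only a sketch of the idea, not how Inferno implements anything: the `Namespace` class, its methods, and the path `/usr/pete/dis` are all made up for illustration. A namespace is treated as a private mount table, forking copies the table, and a bind in the child never shows up in the parent.

```python
class Namespace:
    """Toy mount table: mountpoint -> name of whatever is bound there."""

    def __init__(self, table=None):
        # Copy the table so that no two namespaces share state.
        self.table = dict(table or {})

    def fork(self):
        # A forked namespace starts out as a copy of its parent's table.
        return Namespace(self.table)

    def bind(self, source, mountpoint):
        # Only this namespace's table changes. No privilege check is needed,
        # because no other process can observe the change.
        self.table[mountpoint] = source

    def resolve(self, path):
        # A path with nothing bound over it resolves to itself.
        return self.table.get(path, path)


parent = Namespace()
child = parent.fork()
child.bind("/usr/pete/dis", "/dis")  # hypothetical directory

print(child.resolve("/dis"))   # /usr/pete/dis
print(parent.resolve("/dis"))  # /dis
```

The missing permission check in `bind` is the whole point: once changes are private, there is nobody left to protect.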

A lot of things that are either hacky or distasteful suddenly disappear, by virtue of no longer being necessary. Obviously, a lot of what we delegate to root disappears, as does sudo. chroot disappears. Symbolic links disappear. A huge number of in-kernel drivers disappear. NFS and /etc/exports disappear (good riddance). `mount -o loop` disappears. `mount --bind` disappears. FUSE disappears. sshfs disappears. Even $PATH disappears, and along with it the need to be root to install software.

…So?

One of the things that irks me about documentation for Inferno (and Plan 9) is that there are research papers, man pages, and source code, but very few tutorial- or “howto”-style documents. Maybe I’m spoiled by Linux, which came with a HOWTO for making coffee at least as far back as 1998. Getting started is a bit of a task, especially if you don’t know enough about the system to use the system to investigate itself. You jump in, there’s an unfamiliar shell that runs unfamiliar commands, and the man pages, being written as a reference, are less than helpful.

So I can appreciate a document aimed at addressing someone saying “So? I’ve heard promises before. How do I use this?” and I’m going to tie all of this together with a few concrete examples, explaining as I go.

After much ado, actual things you can really type

Remember all of the things that disappeared? They’ve each been replaced with something better, and some of them with several things that are all better. I’m going to (for the sake of simplicity among other reasons) ignore authentication in these examples, but suffice it to say that you have your choice of authentication and encryption methods in Inferno.

root privileges to mount disappear

This one’s easy enough:

; bind '#p' /n/prog

That mounts the prog filesystem (Inferno’s equivalent to the /proc filesystem) on /n/prog. As a side note, you can manage, debug, and do all sorts of things to running processes through the /prog filesystem, just like in Unix. The difference here is that, if you mount a remote machine’s /prog locally, you can do all of those things from where you are, without ssh, and without even having debugging tools on the remote machine. That is, you can ‘kill 15’ and it will kill pid 15 on the remote machine.

Loopback devices and FUSE disappear

I have a bunch of .iso files all over my computer. Let’s say that they’re in /n/cds, and I want to mount one of them:

; 9660srv /n/cds/dfly-i386-3.0.2_REL.iso /n/dfly
; ls /n/dfly

Now the current process and its children can see the contents of the CD. Nothing special needed; 9660srv(4) knows how to read an ISO-9660 filesystem on one end and speak 9P2000 on the other.

Inferno also ships with, for example, tarfs(4), which treats a tar archive as a read-only filesystem.

In-kernel drivers disappear

Well, most of them. The ones that remain just export a filesystem, a welcome break from ever having to ioctl() again. The drivers provided in the kernel can be listed by looking at /dev/drivers, which is a somewhat sparse list, sparser at least than the output of lsmod on a typical Ubuntu or even Arch install.

Where you would need a kernel driver to support a new filesystem in most OSs, Inferno requires no modprobe (and thus no superuser privileges) and no touching of the kernel to get a new filesystem. You can get ext2 support (read-only) by downloading ext2fs, much like how the kernel doesn’t need to know about the ISO-9660 filesystem in the example above. (Please do not get me started on the duplication of code involved when

chroot disappears

Since a process can craft a namespace for itself, anything that might be harmful for an untrusted process to see can be unmounted from the namespace, and the special system call pctl(2) can be employed to prevent the process from mounting devices. An extreme example which unmounts everything (and thus can do nothing but run shell builtins):

; load std
; pctl newns
; pctl nodevs
; unmount /
; ls /
sh: ls: './ls' file does not exist

Slightly more precision than unmounting / is needed if you want to do anything useful of course, but the idea is simple.

You can also, in fact, construct a temporary sandbox if you want, by mounting an in-memory filesystem on top of root:

; mount -bc {memfs -s} /
; echo Mad with power! > /asdf

And, of course, /asdf does not appear elsewhere. (You may want to note, though, that this only applies to newly created files.)

Symlinks disappear

bind takes care of this nicely by allowing you to place a directory on top of another one. For example, I like the tinytk icon set more than the default icon set, so I bind them over the defaults before starting the window manager:

; bind -bc /icons/tinytk /icons/tk

You can even bind a file on top of another file:

; echo Hello > some-file
; bind some-file /NOTICE
; cat /NOTICE
Hello

Since symlinks are not needed and not present in Inferno, the problems with cyclic filesystem references very nearly disappear.

$PATH, $LDPATH and root privileges for installation disappear

I hate fiddling with $PATH. Vast chunks of my .bashrc are devoted to the practice of figuring out which comes first, and not adding redundant paths or paths that point at binaries for different architectures. I have the Plan 9 Ports installed, but don’t want the Plan 9 version of man to interfere with the Linux version, for example.

In Inferno, everything you want to run is in /dis (/bin in Plan 9). If you want to run executables from elsewhere, you bind their directory before or after /dis.

; bind -bc $home/dis /dis

So, if you want to install something but can’t write to /dis, install it wherever, and bind it over /dis. No sudo needed. It’s the same with /lib and /module.

NFS disappears

I’m going to ignore authentication (the -A flag does this) and encryption for now, so we’re going to just export something that doesn’t matter: memfs(4) again. The first machine:

; mount -c {memfs -s} /n/mem
; echo something > /n/mem/asdf
; listen -A 'tcp!*!1234' { export /n/mem & }

The second machine:

; mount -A 'tcp!thefirstmachine!1234' /n/remotememfs
; cat /n/remotememfs/asdf
something

Since most file-serving commands can be instructed to communicate over stdin/stdout rather than staking out a mountpoint, the filesystem need not even be mounted locally to be available to the network:

; styxlisten -A 'tcp!*!1234' {memfs -s}

And on the second machine:

; mount -Ac 'tcp!thefirstmachine!1234' /n/remotememfs

The listen(1) and dial(1) commands act as convenient wrappers around interactions with the /net filesystem, which is the interface to the network in Inferno. Sockets are handled by the BSD sockets library (with accompanying system calls) in Unix, and are significantly trickier to use. You can think of listen and dial as analogous to netcat. styxlisten is the same, but expects to speak the 9P2000 protocol (referred to as Styx before its adoption in Plan 9) instead of plain bytes.
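For contrast, here is roughly what the same "listen, then hand one payload to whoever dials in" exchange looks like through the BSD sockets API, sketched in Python. Python's `socket` module already hides a good deal of the C-level ceremony, so take this as a floor for the amount of bookkeeping rather than a fair fight. The helper names `serve_once` and `fetch` are mine, and the port is chosen by the OS rather than fixed at 1234.

```python
import socket
import threading


def serve_once(listener, payload):
    # Accept a single connection, send the payload, and hang up.
    conn, _addr = listener.accept()
    with conn:
        conn.sendall(payload)


def fetch(host, port):
    # Dial the listener and read until the other side closes.
    chunks = []
    with socket.create_connection((host, port)) as s:
        while True:
            data = s.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


# The "listen" half: create, configure, bind, listen.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
listener.listen(1)
port = listener.getsockname()[1]

server = threading.Thread(target=serve_once, args=(listener, b"something\n"))
server.start()

# The "dial" half.
reply = fetch("127.0.0.1", port)
server.join()
listener.close()

print(reply)  # b'something\n'
```

Everything here is a special-purpose call on a special-purpose object; none of it can be driven with cat or echo.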

Powerful primitives mean never having to say you’re sorry

All of this is accomplished with a few brief commands which are themselves thin wrappers around Inferno’s syscalls. You have bind(1), mount(1), and unmount(1). If you want introspection, ns(1) is well-designed.

Every OS-level object being represented as a filesystem means that programs like cat, sed, and echo can be used to, for example, access the network, using the ip(3) filesystem. ftpfs(4) lets you ls and cp without using an interactive client, which makes it easier to script: there is no piping into the FTP client and, in fact, no need for your script to even realize it is talking to an FTP server rather than any other file on disk. zipfs lets you mount a zip file. Combining them, you can mount an ISO that is inside a zip file that is on an FTP server, and then copy some files out of it.
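For comparison, here is what that kind of layering looks like when it has to live inside a single program instead of in the namespace. This is a sketch in Python with some loud substitutions: the standard library has no ISO-9660 reader, so a tar archive stands in for the ISO, the zip is built in memory rather than fetched over FTP, and the names `make_nested`, `read_nested`, `backup.tar`, and `docs/NOTICE` are invented for the example.

```python
import io
import tarfile
import zipfile


def make_nested():
    """Build, in memory, a zip file that contains a tar that contains a file."""
    data = b"Hello\n"
    tar_buf = io.BytesIO()
    with tarfile.open(fileobj=tar_buf, mode="w") as tar:
        info = tarfile.TarInfo(name="docs/NOTICE")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

    zip_buf = io.BytesIO()
    with zipfile.ZipFile(zip_buf, "w") as z:
        z.writestr("backup.tar", tar_buf.getvalue())
    return zip_buf.getvalue()


def read_nested(zip_bytes, tar_name, member):
    """Peel both layers by hand and return the bytes of one inner file."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as z:
        tar_bytes = z.read(tar_name)  # layer 1: the zip
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r") as tar:
        inner = tar.extractfile(member)  # layer 2: the tar
        if inner is None:
            raise FileNotFoundError(member)
        return inner.read()


print(read_nested(make_nested(), "backup.tar", "docs/NOTICE"))  # b'Hello\n'
```

It works, but only this one program gets the benefit. When each layer is mounted instead, the layering is done once and ls, cp, and everything else see plain files.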

Somewhere in the future

We’ve already hit an interesting spot. A large chunk of the population owns at least a laptop and a smartphone, and of them, a large chunk also owns a desktop, has a work computer, has a tablet, etc. In the heyday of Unix®, there was one computer and many users. It seems to be stating the obvious, but we’re rapidly approaching a point where the common case is one user with many computers rather than vice versa.

Managing all of the computing resources (disk and CPU, particularly) is already a pain. Syncing files, syncing contacts, etc. If your phone hasn’t the computes to process something, you have to go to your desktop (physically or by means of ssh). If you want to watch a movie on your phone, you have to copy it over. What’s your solution to reading an article on your phone and then deciding you’d rather read it on your laptop? Send an email or IM yourself? (Still better than DBus, but that is a digression.)

I don’t see this as sustainable. The stopgap is handing all of your data to Google, Facebook, or whatever “cloud” service, but that solution leaves much to be desired (and enumerating the reasons is another digression).

A software solution to the problem of managing computing resources and getting all of these machines (local and remote) to cooperate is inevitable. At least for the time being, I see Inferno as the best candidate for a platform that makes this happen. That may not go all the way to explaining why I’m suddenly in love with the system, but it should at least go part of the way.

Next on the menu

This barely scratches the surface. “Everything is a filesystem” is only one of the big ideas built into Inferno.

I’ve written a pile of code for Limbo (a predecessor of Go and very similar) and Inferno’s brilliant sh(1), by far the nicest shell I have ever used. I’m sorting out some authentication problems with clustering the machines in my house and plan to document how to build an Inferno cluster here, with some map-reduce examples.

I won’t write much about Limbo; plenty has been written already, and it is easy enough to get started writing code. Brian Kernighan’s article is a great place to start, Dennis M. Ritchie (requiescat in pace) has written a somewhat more formal overview, and Inferno Programming with Limbo is a good reference.

I’d like to write some about the shell, mostly in the form of examples. If you are interested but do not want to wait, the article written by the author of the shell is a great introduction, and I have a solution on Github for my favorite Project Euler problem which should serve as a fun example of doing math in the shell. It may look a bit messy; after all, the shell is not for math. But comparing it to the roughly equivalent bash solution (including the timing information) is illuminating.

I’m tempted to write a comparison between BEAM and Dis, Inferno’s VM, since there are some very close similarities (e.g., message-passing, cheap concurrency, a secure VM) and some interesting differences (the type system, Inferno’s channels versus Erlang’s messages, the speed difference between the two VMs, Inferno’s sh versus the Erlang shell, and the language choice in Inferno versus on BEAM). I think that might tend towards evangelism, though.

So, expect some time later a guided tour of the shell and a brief introduction to building a cluster out of scattered Inferno VMs, although not necessarily in that order. There is also a non-zero probability that a comparison of BEAM and Dis will show up here at some point, although that is slightly less likely.

An aside about my favorite Project Euler problem

It’s number 31. The algorithm is easy to understand and implementations can range from slow to fast, and the problem seems to lend itself, like 99 Bottles of Beer on the Wall, to giving a tour of the language, so it’s usually one of the first problems I do in any given language.

Update for clarification

I’ve gotten a question about mountpoint conflicts. The -a and -b flags passed to mount and bind are for mounting “after” or “before” whatever filesystems are already present on the mountpoint. You can think of the directories as linked lists, and the -a and -b options append or prepend, respectively. I’ll demonstrate with an example:

; mkdir /tmp/a
; mkdir /tmp/b
; echo 'a' > /tmp/a/foo
; echo 'b' > /tmp/b/foo
; bind /tmp/a /n/dasha
; bind -ac /tmp/b /n/dasha
; lc /n/dasha
foo foo
; cat /n/dasha/foo
a
; bind /tmp/a /n/dashb
; bind -bc /tmp/b /n/dashb
; cat /n/dashb/foo
b
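The linked-list picture can be modelled in a few lines of Python. This is a toy, not Inferno's implementation: the `UnionDir` class is invented for the example, directories are plain dicts mapping a file name to its contents, and the -c flag (which governs where new files may be created) is ignored. It reproduces the transcript above: a listing shows every layer's entries, while opening a name returns the first match.

```python
class UnionDir:
    """Toy union directory: an ordered list of layers, searched front to back."""

    def __init__(self, original):
        # A mountpoint starts as a one-element list: the directory itself.
        self.layers = [original]

    def bind(self, directory, flag=None):
        if flag == "-b":
            self.layers.insert(0, directory)  # prepend: searched first
        elif flag == "-a":
            self.layers.append(directory)     # append: searched last
        else:
            self.layers = [directory]         # plain bind replaces the list

    def lc(self):
        # A listing shows every layer's entries, duplicates included.
        return [name for layer in self.layers for name in sorted(layer)]

    def cat(self, name):
        # Opening a name walks the list and stops at the first hit.
        for layer in self.layers:
            if name in layer:
                return layer[name]
        raise FileNotFoundError(name)


tmp_a = {"foo": "a\n"}
tmp_b = {"foo": "b\n"}

dasha = UnionDir({})
dasha.bind(tmp_a)
dasha.bind(tmp_b, "-a")

dashb = UnionDir({})
dashb.bind(tmp_a)
dashb.bind(tmp_b, "-b")

print(dasha.lc())               # ['foo', 'foo']
print(dasha.cat("foo"), end="")  # a
print(dashb.cat("foo"), end="")  # b
```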

Translations

Russian translation by Alex Efros.

Tags: historic inferno
