fork() vs Mach task API

Actually, I do know one compelling alternative to fork(), and it is (you've guessed it) the Mach task API.

The reason fork() is so useful on Unix is that Unix has virtually no API for controlling other processes. Most syscalls operate on the calling process, so you cannot manipulate another process's file descriptors or memory mappings, you cannot make another process chroot or drop capabilities, et cetera.

The only way to work around these limitations is to run code you control on behalf of, and in the context of, the other process. One way to do this is to ptrace the other process, but that is, first of all, a hack; second, it requires the process to already have a mostly sane internal state (e.g. a mapped and initialized libc), unless, of course, you choose to load your own temporary libc into its address space... The other way is what fork() lets you do: technically, it's not the parent running its code in the child, but the same piece of code controls both, so the child can perform exactly the manipulations the parent wants performed.
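To make the fork() side concrete, here is a minimal sketch of that pattern. The helper name pwd_in_child() is made up for illustration, and the sketch assumes a POSIX system with /bin/sh. After the fork, the child changes its own working directory and redirects its own stdout, and only then execs:

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `pwd` in a child whose working directory is
 * `dir`, capturing its output in `out`. Returns the child's exit status,
 * or -1 on error. */
int pwd_in_child(const char *dir, char *out, size_t cap)
{
    int fds[2];

    assert(out != NULL && cap > 0);
    if (pipe(fds) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: still our own code, so it applies to itself exactly
         * the setup the parent wants, using ordinary syscalls. */
        close(fds[0]);
        if (chdir(dir) < 0)
            _exit(126);
        if (dup2(fds[1], STDOUT_FILENO) < 0)
            _exit(126);
        if (fds[1] != STDOUT_FILENO)
            close(fds[1]);
        execl("/bin/sh", "sh", "-c", "pwd", (char *)NULL);
        _exit(127); /* reached only if exec failed */
    }

    /* Parent: read whatever the child printed, then reap it. */
    close(fds[1]);
    size_t len = 0;
    ssize_t n;
    while (len < cap - 1 &&
           (n = read(fds[0], out + len, cap - 1 - len)) > 0)
        len += (size_t)n;
    out[len] = '\0';
    close(fds[0]);

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Note that neither chdir() nor dup2() takes a "which process" argument; the only reason they affect the child is that the child itself calls them.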

In contrast, all Mach task APIs (with very few exceptions) operate on whatever task port you invoke them on. It may be mach_task_self(), the calling task, or it may not be; having access to a task's task port is enough to fully control it. Of course, you can only get another task's task port under certain controlled circumstances, but creating that task is one of them.

Upon being created with task_create(), a new task has no threads, so it's not running. The parent task gets the task port for the new task, and using this task port, it can set the new task up any way it deems necessary, using the exact same APIs that the task would use to set itself up, but passing child_task instead of mach_task_self(). Then, when everything's ready, it can create and start the initial thread in the child task.
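Here is a rough sketch of what that looks like, assuming GNU Mach as used by the Hurd (other Mach flavors, XNU for one, declare task_create() differently). The helper name build_empty_task() is made up, and on anything that isn't GNU/Hurd the sketch compiles to a stub that returns 1. It stops short of loading code and setting registers, since that part is architecture-specific:

```c
#include <assert.h>

#if defined(__GNU__) /* GNU/Hurd only; see the assumptions above */
#include <mach.h>

/* Hypothetical helper: create a task, set it up from the outside, then
 * tear it down. Returns 0 on success, -1 on any Mach error. */
int build_empty_task(void)
{
    task_t child;
    thread_t thread;
    vm_address_t self_addr = 0, child_addr = 0;
    int rc = -1;

    /* A page in our own address space: the target is mach_task_self(). */
    if (vm_allocate(mach_task_self(), &self_addr, 4096, TRUE) != KERN_SUCCESS)
        return -1;

    /* A new task with no inherited memory and no threads. */
    if (task_create(mach_task_self(), FALSE, &child) != KERN_SUCCESS)
        goto out_self;
    assert(child != MACH_PORT_NULL);

    /* The very same call, but now the target is the child task. */
    if (vm_allocate(child, &child_addr, 4096, TRUE) != KERN_SUCCESS)
        goto out_child;

    /* Threads are created suspended. A real launcher would now load
     * code, call thread_set_state(), and finally thread_resume(). */
    if (thread_create(child, &thread) != KERN_SUCCESS)
        goto out_child;
    mach_port_deallocate(mach_task_self(), thread);
    rc = 0;

out_child:
    task_terminate(child);
    mach_port_deallocate(mach_task_self(), child);
out_self:
    vm_deallocate(mach_task_self(), self_addr, 4096);
    return rc;
}
#else
/* Not GNU Mach: nothing to demonstrate, report "unsupported". */
int build_empty_task(void)
{
    return 1;
}
#endif
```

The two vm_allocate() calls differ only in their first argument, and nothing in the child ever has to run for the child to get its memory.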

Mach still supports having the new task inherit virtual memory from the parent task. I'm not sure why; perhaps to make it possible to implement fork() efficiently on top of it. Hurd does that, but it still has to copy port rights to the child task, and that takes multiple context switches between userspace (the parent task) and the kernel, so that's where Hurd's fork() is slow.

The Mach API is somewhat cumbersome and nowhere near as convenient to use as a fork()/exec() pair, but this is easily fixable with library wrappers.

P.S. I'm guessing other capability-based microkernels have similar APIs for manipulating tasks as well, and I'm definitely planning to learn more about them. In particular, I'm interested in seL4 — expect some posts about it!