Of course, this is a bit of an apples-to-oranges comparison, since we're comparing a high-level, ergonomic framework to an HTTP implementation shipped with Go. There are high-level web frameworks for Go as well; for example, check out Echo.
This post is designed to be read side-by-side with the Go tutorial. Here's the link again. All the code, with step-by-step commits, will be available in this repo on my GitHub.
Let's create a repo:
$ cargo new rustwiki
$ cd rustwiki
cargo new has generated a Hello World for us, so let's compile and run it:
$ cargo run
Compiling rustwiki v0.1.0 (/home/sergey/dev/rustwiki)
Finished dev [unoptimized + debuginfo] target(s) in 0.89s
Running `target/debug/rustwiki`
Hello, world!
Let's define a struct to represent a wiki page:
#[derive(Debug, PartialEq, Eq)]
struct Page {
title: String,
body: String,
}
and two methods to save it to and load it from a text file:
use std::io::{self, Read, Write};
use std::fs::File;
impl Page {
fn load(title: String) -> io::Result<Page> {
let file_name = format!("{}.txt", title);
let mut file = File::open(file_name)?;
let mut body = String::new();
file.read_to_string(&mut body)?;
Ok(Page { title, body })
}
fn save(&self) -> io::Result<()> {
let file_name = format!("{}.txt", self.title);
let mut file = File::create(file_name)?;
write!(file, "{}", self.body)
}
}
Now, in main(), let's create a page, save it to a text file and then load it back:
fn main() -> io::Result<()> {
let page = Page {
title: String::from("Test"),
body: String::from("This is a sample page"),
};
page.save()?;
let page = Page::load(String::from("Test"))?;
println!("{:#?}", page);
Ok(())
}
I've used the {:#?} format specifier to pretty-print the page. Now run
$ cargo run
Finished dev [unoptimized + debuginfo] target(s) in 0.04s
Running `target/debug/rustwiki`
Page {
title: "Test",
body: "This is a sample page",
}
$ ls
blog.md Cargo.lock Cargo.toml src target Test.txt
We see it has indeed created the file and read it back correctly. Great; now it's time to actually start using Rocket!
At the moment (Jun 2019), Rocket still requires nightly Rust. Thankfully, that's really easy to set up with rustup:
$ rustup override set nightly
info: override toolchain for '/home/sergey/dev/rustwiki' set to 'nightly-x86_64-unknown-linux-gnu'
this will automatically download and install the nightly Rust toolchain (if you don't have it already) and install it as a default toolchain for this project (directory).
Now, let's set up a Rocket 🚀 Hello World. Following this guide, let's add this to the Cargo.toml:
[dependencies]
rocket = "0.4.1"
and this to main.rs:
#![feature(proc_macro_hygiene, decl_macro)]
#[macro_use] extern crate rocket;
// ... Page stuff goes here ...
#[get("/")]
fn index() -> &'static str {
"Hello, world!"
}
and replace our main() with:
fn main() {
rocket::ignite().mount("/", routes![index]).launch();
}
Now, running cargo run again causes Cargo to download and compile a number of crates; and then Rocket starts 🚀:
Compiling rustwiki v0.1.0 (/home/sergey/dev/rustwiki)
Finished dev [unoptimized + debuginfo] target(s) in 36.80s
Running `target/debug/rustwiki`
🔧 Configured for development.
=> address: localhost
=> port: 8000
=> log: normal
=> workers: 16
=> secret key: generated
=> limits: forms = 32KiB
=> keep-alive: 5s
=> tls: disabled
🛰 Mounting /:
=> GET / (index)
🚀 Rocket has launched from http://localhost:8000
If you visit http://localhost:8000 in your browser now, you should see "Hello, world!" displayed in the browser and the following appear in the log:
GET / text/html:
=> Matched: GET / (index)
=> Outcome: Success
=> Response succeeded.
GET /favicon.ico image/webp:
=> Error: No matching routes for GET /favicon.ico image/webp.
=> Warning: Responding with 404 Not Found catcher.
=> Response succeeded.
Let's add another route underneath index():
#[get("/view/<title>")]
fn view(title: String) -> io::Result<String> {
let page = Page::load(title)?;
let res = format!("<h1>{}</h1><div>{}</div>", page.title, page.body);
Ok(res)
}
and register it inside main() like this:
fn main() {
rocket::ignite()
.mount("/", routes![index, view])
.launch();
}
Now, if you restart the app and open http://localhost:8000/view/Test, you should see "<h1>Test</h1><div>This is a sample page</div>" in your browser. The reason you're seeing raw, unrendered HTML code is that by default Rocket serves String as text/plain and not text/html.
Let's ask for HTML explicitly by wrapping our String into Html:
use rocket::response::content::Html;
#[get("/view/<title>")]
fn view(title: String) -> io::Result<Html<String>> {
let page = Page::load(title)?;
let res = format!("<h1>{}</h1><div>{}</div>", page.title, page.body);
Ok(Html(res))
}
Now it should work and render like this:
Test
This is a sample page
Nice!
Let's add another route:
#[get("/edit/<title>")]
fn edit(title: String) -> Html<String> {
let page = Page::load(title.clone())
.unwrap_or(Page::blank(title));
let res = format!("
<h1>Editing {title}</h1>
<form action=\"/save/{title}\" method=\"POST\">
<textarea name=\"body\">{body}</textarea><br>
<input type=\"submit\" value=\"Save\">
</form>", title = page.title, body = page.body);
Html(res)
}
Here, I'm using a helper method for creating blank pages, so let's also add that:
impl Page {
fn blank(title: String) -> Page {
Page {
title,
body: String::new()
}
}
// ...
Don't forget to add it to the list in main()!
fn main() {
rocket::ignite()
.mount("/", routes![index, view, edit])
.launch();
}
Again, if you try opening http://localhost:8000/edit/Test in your browser, you should be able to edit the text in a <textarea>. Submitting it doesn't work though, since we haven't yet implemented /save/<title>. But before we do that, we need to deal with the hardcoded HTML. While it's nice that we're able to use Rust's multi-line string literal and formatting, it would be a lot better to put the template into its own file and use a proper templating engine.
Rocket has built-in support for templates; but it doesn't include its own templating engine — we can use whichever one we like. The two mentioned in the guide section on templates are Handlebars and Tera. I'm going to use Tera for this post.
Let's put this into templates/edit.html.tera:
<h1>Editing {{title}}</h1>
<form action="/save/{{title}}" method="POST">
<div>
<textarea name="body" rows="20" cols="80">{{body}}</textarea>
</div>
<div>
<input type="submit" value="Save">
</div>
</form>
Then add this to your Cargo.toml:
[dependencies.rocket_contrib]
version = "0.4.1"
default-features = false
features = ["tera_templates"]
(or "handlebars_templates" if you're using Handlebars instead).
Now, let's change our edit() method to use the template:
use rocket_contrib::templates::Template;
#[get("/edit/<title>")]
fn edit(title: String) -> Template {
let page = Page::load(title.clone())
.unwrap_or(Page::blank(title));
Template::render("edit", page)
}
Nice and tidy, isn't it? We don't have to wrap the Template in Html anymore, and we don't have to manually format the string.
We need to do two more things to get this to work. First, we have to make our Page type serializable in order for Template to be able to pass it to the templating engine. To do that, add Serde to Cargo.toml:
[dependencies.serde]
version = "1.0"
features = ["derive"]
and derive the Serialize trait for Page in addition to the ones we already derive:
use serde::Serialize;
#[derive(Debug, PartialEq, Eq, Serialize)]
struct Page {
title: String,
body: String,
}
If you run the app now and try accessing /edit/<title>, you'll see this in the log:
GET /edit/Test text/html:
=> Matched: GET /edit/<title> (edit)
=> Error: Attempted to retrieve unmanaged state!
=> Error: Uninitialized template context: missing fairing.
=> To use templates, you must attach `Template::fairing()`.
=> See the `Template` documentation for more information.
=> Outcome: Failure
=> Warning: Responding with 500 Internal Server Error catcher.
=> Response succeeded.
That is the second thing we need to fix — we need to attach the template fairing to our Rocket app:
fn main() {
rocket::ignite()
.mount("/", routes![index, view, edit])
.attach(Template::fairing())
.launch();
}
With this, it will work.
Let's also switch the view page to a Tera template:
#[get("/view/<title>")]
fn view(title: String) -> io::Result<Template> {
let page = Page::load(title)?;
let res = Template::render("view", page);
Ok(res)
}
and in templates/view.html.tera
:
<h1>{{title}}</h1>
<p>[<a href="/edit/{{title}}">edit</a>]</p>
<div>{{body}}</div>
If somebody tries to open a non-existent page, we should suggest them to create it instead of returning an error from failing to open the file. Let's do that:
use rocket::response::Redirect;
#[get("/view/<title>")]
fn view(title: String) -> Result<Template, Redirect> {
if let Ok(page) = Page::load(title.clone()) {
let res = Template::render("view", page);
Ok(res)
} else {
Err(Redirect::to(uri!(edit: title)))
}
}
Notice the uri! macro, which allows us to create URIs in a type-safe manner instead of retyping and manually filling in a URI template. This way, we declare once what a URI for a route looks like and what arguments it accepts, and then reference that definition from other places with the uri! macro.
Finally, let's implement /save/<title>. Since the new body is submitted to us as an HTML form, we're going to need to define a structure to represent that form and derive FromForm for it:
#[derive(Debug, FromForm)]
struct SaveForm {
body: String
}
Then we can use it in the route arguments like so:
use rocket::request::Form;
#[post("/save/<title>", data = "<form>")]
fn save(title: String, form: Form<SaveForm>) -> io::Result<Redirect> {
let form = form.into_inner();
let page = Page {
title: title.clone(),
body: form.body,
};
page.save()?;
Ok(Redirect::to(uri!(view: title)))
}
The first (and the only) thing we do with the Form<> wrapper is unwrap it using form.into_inner(). Its purpose is to serve as a type guard telling Rocket how to collect the input for this argument (from an HTML form), not to be a fancy container full of functionality. It does implement Deref, so we could use it as-is, but we need to move form.body out of it, which is what the form.into_inner() call is for.
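The Deref-versus-into_inner distinction isn't specific to Rocket; it can be sketched with a plain-Rust stand-in for the wrapper (Wrapper here is a hypothetical type for illustration, not Rocket's actual implementation):

```rust
use std::ops::Deref;

// A stand-in for a Form-like wrapper: its job is to mark how the value was
// obtained, not to add functionality.
struct Wrapper<T>(T);

impl<T> Wrapper<T> {
    // Consumes the wrapper, letting us move the inner value out.
    fn into_inner(self) -> T {
        self.0
    }
}

impl<T> Deref for Wrapper<T> {
    type Target = T;
    fn deref(&self) -> &T {
        &self.0
    }
}

struct SaveForm {
    body: String,
}

fn main() {
    let form = Wrapper(SaveForm { body: String::from("hello") });
    // Thanks to Deref, fields are readable without unwrapping...
    assert_eq!(form.body, "hello");
    // ...but moving `body` out requires consuming the wrapper first.
    let SaveForm { body } = form.into_inner();
    println!("{}", body);
}
```

If we only needed to read the body, the Deref route would do; it's the move that forces the into_inner() call.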
And again, we need to remember to add the new route to main():
fn main() {
rocket::ignite()
.mount("/", routes![index, view, edit, save])
.attach(Template::fairing())
.launch();
}
Now you can edit and save some pages!
Actually, we don't need to do anything here; we're already dealing with errors properly! Rocket will automatically return an error if we return an Err value of io::Result, and if a template fails to render. To verify this works, try changing the requested template name, e.g.
#[get("/edit/<title>")]
fn edit(title: String) -> Template {
let page = Page::load(title.clone())
.unwrap_or(Page::blank(title));
Template::render("foo", page)
}
I don't think we have to do anything here, either. Rocket and Tera already handle everything for us. Not only will they preload and cache the templates, they will actually watch the filesystem for changes and live-reload the templates when we edit them.
This is another one of those things Rocket gives us for free.
Try opening http://localhost:8000/view/foo/bar — you'll get a 404. That's because the <title> part in our /view/<title> route only matches a single path segment. If you want to allow passing multiple path segments, you have to write it this way:
use std::path::PathBuf;

#[get("/view/<path..>")]
fn view(path: PathBuf) -> ...
Even then, it's smart enough not to accept paths like ../../etc/passwd. Mind-blowing, isn't it?
Side note: if you try opening http://localhost:8000/view/../../etc/passwd in your browser, your browser may decide to automatically collapse that into http://localhost:8000/etc/passwd, since it believes the first .. is undoing the view/ part, and the second .. has nothing more to undo and is thus ignored. You can use curl --path-as-is, which doesn't do this:
$ curl --path-as-is http://localhost:8000/view/../../etc/passwd
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>404 Not Found</title>
</head>
<body align="center">
<div align="center">
<h1>404: Not Found</h1>
<p>The requested resource could not be found.</p>
<hr />
<small>Rocket</small>
</div>
</body>
</html>
You can check which path Rocket actually sees in its log.
Well, that concludes our tutorial! Again, you can find the repo on GitHub.
Let's start with a brief definition of a microkernel. A microkernel is an OS kernel that implements a significantly smaller part of functionality than conventional ("monolithic") kernels such as Linux or BSD kernels — in particular, no file system and no network protocol stacks. Instead, a microkernel provides the functionality (primarily scheduling, virtual memory and an IPC mechanism) required to implement these missing parts, and more, in user space.
Among oft-cited advantages of microkernel-based systems — where a lot of the code is run as userspace programs rather than being in the kernel — are increased security due to more isolation and better resilience to crashes and other kinds of errors. Personally, I'm more intrigued by their flexibility and extensibility.
There are a number of problems with microkernels too. I'll describe some of them in more detail later; but the often mentioned ones are poor IPC performance and concerns around scalability.
They must be the reason that, despite microkernels being widely deployed and used by many people without them even knowing it, no popular user-facing operating system is built on top of a true microkernel (Darwin and NT do not count). This could change with Fuchsia, a new operating system from Google, based on its own microkernel, Zircon; it is expected that Fuchsia will replace Android as Google's primary operating system for consumer devices.
The idea of a microkernel is nothing new. As Wikipedia helpfully tells me, the first attempt at building a microkernel, RC 4000, was made back in 1969, about the same time Unix was being created.
In 1975-1976, the Aleph kernel (part of a system called RIG) was created at the University of Rochester. In 1979-1985, Carnegie Mellon University developed Accent based on the ideas from Aleph. Starting from 1985, they worked on Mach, which began as an attempt to make it possible to run/reimplement BSD Unix on top of Accent design concepts, and ended up being one of the two most famous microkernels in existence, used by more than a billion users today (as part of the iOS kernel).
I may write a longer post on capability-based security later; but here are the basics.
In a capability-based system, in order to access a resource (such as a file, a process or an on-screen window), a program (process, task) has to be explicitly granted the right to access that resource by some other program that already has the right (for example, the creator of the resource), as opposed to simply possessing blanket authority over that kind of resource (such as being root on Unix). A capability, then, is this unforgeable access token; it both names/designates/references the resource and grants its holder the right to access/manipulate it. Programs grant each other access to resources by exchanging capabilities, and there is no way to name a resource without holding a capability to it (so, no global PIDs and window IDs). To manipulate any resource, a program names, using a capability, which resource it wants to act on; the program that implements the resource (be it the kernel or a userspace task) then doesn't even have to ask whether the calling program should be allowed to perform the operation — it has the capability, so, yes.
Capability-based security is much more flexible and dynamic than authority-based security; moreover, it solves some issues that the latter has: notably, it's not prone to race conditions or the confused deputy problem. It's also a "mechanism, not policy", and in fact it's simpler and easier to implement on the kernel side than even UID checks (essentially, the kernel leaves the question of access control up to userspace), which makes it a natural fit for microkernels.
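The core property — holding a capability is the authorization — can be illustrated with a toy model in Rust (hypothetical names, purely for illustration; no real kernel looks exactly like this):

```rust
// Toy model of capability-based access control (illustration only).
mod kernel {
    use std::collections::HashMap;

    // A capability both designates a resource and grants access to it.
    // The slot field is private, so code outside this module cannot forge one;
    // the only ways to get a capability are to create a resource or be handed one.
    #[derive(Clone)]
    pub struct Capability { slot: usize }

    pub struct Kernel { resources: HashMap<usize, String>, next_slot: usize }

    impl Kernel {
        pub fn new() -> Kernel {
            Kernel { resources: HashMap::new(), next_slot: 0 }
        }

        // Creating a resource returns the one and only handle to it.
        pub fn create(&mut self, data: &str) -> Capability {
            let slot = self.next_slot;
            self.next_slot += 1;
            self.resources.insert(slot, data.to_string());
            Capability { slot }
        }

        // No access check here: presenting the capability *is* the authorization.
        pub fn read(&self, cap: &Capability) -> &str {
            &self.resources[&cap.slot]
        }
    }
}

fn main() {
    let mut k = kernel::Kernel::new();
    let cap = k.create("an on-screen window");
    // Delegation: a task grants access by handing over a copy of the capability.
    let delegated = cap.clone();
    println!("{}", k.read(&delegated));
}
```

Note how read() performs no permission check at all — exactly the "it has the capability, so, yes" logic described above.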
In Mach, capabilities are tightly integrated with IPC. Mach ports are essentially kernel-maintained message queues, the userspace accesses them via port rights, which are capabilities — a lot like Unix programs access open files via file descriptors. While each port is just a message queue, it typically represents some resource maintained by whoever has receive right for the port (aka a server); to manipulate the resource, other programs (clients) send specially crafted messages to the port, which the server interprets as commands to perform actions on the resource. There are tools — primarily, MIG, the Mach interface generator — to implement this model using convenient remote procedure calls (RPC), which makes it look like the client directly invokes functions in the server.
The giant, glaring problem with these first-generation microkernels is their low IPC performance. That is not to say that traditional monolithic kernels have much faster IPC (Linux does, but that's because of the massive effort that went into optimizing it), but this is not much of a problem for monolithic kernels, because there IPC is used only occasionally, when there is, indeed, a need for inter-process communication (for example, for sending logs to a syslog daemon). In contrast, microkernel-based systems use IPC pervasively to implement all kinds of operations, so IPC performance starts to matter a lot.
Furthermore, on a monolithic kernel, IPC would typically be thought of in terms of (asynchronous) message passing, and on a microkernel in terms of (synchronous) remote procedure calls. Even if RPC is internally implemented on top of message passing, RPC brings with it an expectation that the calls will be reasonably fast. Maybe not as fast as direct function calls, but you still wouldn't want each of these calls to involve the kernel parsing, copying and buffering the message, then waiting for the target process to get scheduled, then copying the message into the target process's memory, then the other process parsing it, and then doing it all over again to send the response.
In short, microkernels need IPC to be fast because they rely on it way more than monolithic kernels do.
Central to the second generation of microkernels is the work of Jochen Liedtke. In his 1993 paper titled "Improving IPC by Kernel Design" he describes an experimental microkernel named L3 which was able to pass messages up to 22 times faster than Mach. That's right, not 20% faster, not 2 times faster, but jaw-dropping, mind-boggling twenty-two times faster. Can you even begin to imagine that?
The way to bring about such an exciting performance improvement, it turns out, was to make the kernel even more micro-, to make it handle even less. L3 is top-to-bottom designed for high IPC performance, not for the features. It does away with capabilities and capability-passing; this way the kernel does not have to parse the messages. Instead of sending a message to a port, you explicitly specify which thread you're sending the message to. This additionally removes one level of indirection and frees the kernel from maintaining bookkeeping information about ports and port rights. Another notable and familiar feature of Mach ports that L3 does not implement is message buffering in the kernel: instead of copying the message to a kernel buffer and subsequently to the destination buffer, as it's done in Mach, L3 directly switches control to the receiving thread without going through re-scheduling and buffering.
L3 also employs a number of tricks to make IPC and context-switching faster. One worth mentioning is that for short messages, L3 uses an additional optimization where it would pass the message body to another thread entirely in CPU registers, further avoiding copying and RAM access; it was found that between 50% and 80% of messages contain 8 bytes or less, so this optimization is worthwhile. This additionally simplifies the work the userspace has to do, because there's no need to serialize and deserialize short messages into and from buffers. Instead, the RPC routine can be inlined, and values placed into the registers automatically by the compiler. In the same way, in the server program, the register values can be directly used by the compiler to perform some computation on them.
As a result of all those simplifications, L3 is so much smaller than the first-generation microkernels that it becomes feasible to count, and then try and minimize, the number of processor instructions executed on the fast path of message sending.
Kernels this small are also called nanokernels, to emphasize their small size even in comparison to "traditional" microkernels. Nanokernels generally follow the principle of minimality, as formulated by Liedtke: "A concept is tolerated inside the microkernel only if moving it outside the kernel, i.e., permitting competing implementations, would prevent the implementation of the system's required functionality." Essentially, a nanokernel contains the code necessary to manage virtual memory and perform context switches and not much else; this is why nanokernels are sometimes described as being "CPU drivers".
Of course, in the absence of capabilities and other niceties provided by the kernel, it becomes much more difficult and troublesome to build a complete operating system on top of a nanokernel. A traditional monolithic operating system may consist of the kernel and "some utilities" like a shell and a text editor on top of it (though modern operating systems have a much, much larger userspace with many daemons/services, graphical desktop environments and so on). In an operating system based on a nanokernel, basically all of the operating system resides in userspace, and the kernel itself is more of a utility. A common approach to get a microkernel-based system to do something useful is to run a version of Unix or Linux virtualized (or paravirtualized, i.e. aware it's not running directly on the hardware); this lets one achieve some working results fast, but is "cheating", because the resulting system is still monolithic even though the guest kernel does not run in the hardware kernel mode, which means that all of the supposed advantages — security, crash resilience, flexibility and extensibility — are lost.
A reworked version of the L3 kernel described in "Improving IPC by Kernel Design" was named L4, and it spawned a whole family of L4 microkernels that differ in the implementation and some aspects of the design, but share the same basic idea and principles — most importantly, the principle of minimality. Microkernels of the L4 family (OKL4 in particular) are widely deployed to billions of users inside embedded devices such as secure coprocessors in modern smartphones.
So this is what the second generation of microkernels is about. An order of magnitude better performance and a complete lack of features. The problems with the second-generation microkernels fall into two categories: access control and resource accounting.
Without capabilities, the system is in need of another way to manage access control. One approach is implementing capabilities in userspace, either in a central server, or by having each task track the capabilities it has given out to other tasks. The former approach has a large performance overhead and somewhat nullifies the IPC performance improvements made possible by removing capabilities from the kernel space. The latter approach works, but has other complications around passing capabilities between tasks and accepting capabilities from unknown tasks.
There are also capability-less approaches, and L4 implements one of them, known as clans and chiefs, in the kernel. Basically, like Unix and unlike Mach, L4 tracks parent-child relationships between tasks. A group of tasks created by a given task forms a clan, and the creating task is called its chief. Within a clan, messages can be sent freely; but messages to other clans have to go through their chiefs which are able to filter, manipulate or redirect the message. This model is quite simple and does not impose a noticeable performance overhead on L4 IPC, which is why L4 adopted it. Still, it's not very beautiful, flexible or useful. In particular, either each message has to go through multiple redirections, or all the threads on the system need to know each other's thread IDs, which is bad for encapsulation and stability, and even then messages have to go through the chiefs.
There are more questions to access control than IPC messaging. For example, there needs to be a way to let tasks share memory, and yet there should be some access control on when a task is allowed to access the memory of another task. L4 solves this using an interesting mechanism called recursive memory mapping, which is simple — basically, a thread can grant another thread access to a part of its own memory, and these grants form an inheritance tree — but has many issues of its own.
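The grant-tree idea behind recursive mapping can be sketched as a toy model (hypothetical types for illustration; this is not L4's actual interface):

```rust
// Toy model of L4-style recursive memory mapping: each grant derives a child
// mapping, and revoking a mapping implicitly revokes everything derived from it.
struct Mapping {
    owner: &'static str,
    children: Vec<Mapping>,
}

impl Mapping {
    fn new(owner: &'static str) -> Mapping {
        Mapping { owner, children: Vec::new() }
    }

    // Granting access to another task derives a child node in the tree.
    fn grant(&mut self, to: &'static str) -> &mut Mapping {
        self.children.push(Mapping::new(to));
        self.children.last_mut().unwrap()
    }

    // Counts every task that can currently reach this page.
    fn holders(&self) -> usize {
        1 + self.children.iter().map(Mapping::holders).sum::<usize>()
    }
}

fn main() {
    let mut root = Mapping::new("pager");
    let app = root.grant("app");
    app.grant("plugin");
    assert_eq!(root.holders(), 3);
    // Revoking the app's mapping drops the whole subtree, plugin included.
    root.children.clear();
    assert_eq!(root.holders(), 1);
    println!("{} still holds the page", root.owner);
}
```

Revocation here is simply dropping a subtree; the point is that the grants form a tree rooted at the original owner, so access can always be traced back and withdrawn recursively.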
The other large problem is resource accounting. It's not actually specific to second-generation microkernels — first-generation microkernels and monolithic kernels suffer from it as well — but as is the case with IPC performance, resource accounting does not pose as much of a problem for monolithic kernels.
The system has only a limited amount of resources available. This primarily refers to the physical RAM and the CPU time, but can include other resources such as threads that need their data to be allocated in that RAM and scheduled to take up a part of that CPU time. In a traditional monolithic system, the kernel can just allocate as much memory as it needs to service the userspace requests. To prevent a single (possibly malicious) process from consuming all of the system resources there usually are various limits and quotas in place.
In the same way, in a microkernel-based system, the kernel will try to allocate as much memory as it needs, and so will userspace servers. Often, these resources will be needed not for the service itself, but for it to be able to service a request made by another task. The problem is, it is much harder to track these resource requirements in such a distributed system where many servers all perform some kind of services for each other, so it makes little sense to put quotas on resource usage of each particular server.
The third-generation microkernels are characterized, first and foremost, by the return of capability-based security.
It turned out that capabilities bring more solutions than problems, after all. It is possible to have extremely fast IPC even with the level of indirection introduced by endpoints (message queues, like Mach ports, except without actually queuing messages). In return, capabilities free the kernel and the userspace from trying to build an access control system on top of an inflexible mechanism such as clans and chiefs, and, it turns out, they can also be used to solve resource accounting in the same elegant way they solve access control.
The idea is simple: the kernel does not allocate dynamic memory at all. Whenever it needs some memory to service a userspace request — such as to create a new thread — the userspace has to allocate the memory itself and pass a kind of pointer to this memory to the kernel, via capabilities. The way this works is: on startup, the kernel enumerates the available physical memory, creates a bunch of capabilities that represent this memory, and gives them to the initial thread. The userspace is then free to pass these capabilities between threads as usual and implement any allocation strategies it wants. Creating a kernel object, such as a new thread, requires passing in a free memory ("untyped" in seL4 parlance) capability of sufficient size, and may in fact be implemented as a method call on the free memory capability.
This mechanism, on the one hand, helps reduce the kernel even further by freeing it from dynamic memory management concerns. Indeed, in accordance with the minimality principle, this makes it possible for the userspace to use several competing strategies for memory allocation, i.e. several allocator implementations. On the other hand, it provides a generic and flexible framework for resource accounting. The resource pools, in the form of capability sets, can be passed between tasks/threads; it is expected that each application-level task would be provided with such a pool upon startup (or on request), and each server would require the caller to provide, along with each request, a resource pool to service the request. The server would use the pool to allocate its own resources that it needs for handling this request, as well as transitively passing parts of it to other servers when making requests to them. This way, the total amount of resources allocated to performing all the actions necessary to fulfill one top-level request is limited by the resource pool explicitly passed when making that request, and the total resource pool given to a task serves as a quota for how much of the system resources the task can use.
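Here's a toy sketch of the untyped-memory idea (hypothetical types and made-up object sizes; this is a model of the concept, not seL4's real API):

```rust
// Toy model of seL4-style untyped memory: kernel objects are created by
// "retyping" caller-provided free memory, so the caller's pool is the quota.
struct Untyped { bytes: usize }

enum KernelObject { Thread, Endpoint }

impl Untyped {
    // Carve a kernel object out of this untyped region; the object's cost
    // (sizes here are invented) is deducted from the caller-supplied pool.
    fn retype(&mut self, obj: KernelObject) -> Result<KernelObject, &'static str> {
        let cost = match &obj {
            KernelObject::Thread => 4096,
            KernelObject::Endpoint => 16,
        };
        if self.bytes < cost {
            return Err("untyped capability too small");
        }
        self.bytes -= cost;
        Ok(obj)
    }
}

fn main() {
    // The initial task receives a pool of untyped memory from the kernel
    // and can subdivide it or pass parts of it along with requests.
    let mut pool = Untyped { bytes: 8192 };
    assert!(pool.retype(KernelObject::Thread).is_ok());
    assert!(pool.retype(KernelObject::Endpoint).is_ok());
    // A request the remaining pool cannot cover simply fails:
    // resource accounting falls out of the mechanism by construction.
    assert!(pool.retype(KernelObject::Thread).is_err());
    println!("remaining: {} bytes", pool.bytes);
}
```

The kernel never allocates on its own behalf; whoever wants a kernel object must bring the memory, which is exactly what makes quotas enforceable end to end.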
The most widely known, and the most successful, third-generation microkernel is seL4, from the L4 family, which was developed in the late 2000s and publicly released as free software in 2014. IPC on seL4 is among the fastest, if not the fastest, of any microkernel, including L3 and the original L4. In addition to that, seL4 is the first, and still the only, microkernel (or kernel of any kind) whose correctness has been successfully verified using formal methods. This guarantees that seL4 is essentially free of implementation bugs and vulnerabilities, which is a rare quality for software of any kind, enabling seL4 to be widely used in high-assurance systems such as military helicopters.
The only problem preventing further seL4 adoption seems to be that there doesn't exist a general-purpose, user-friendly operating system built on seL4. There are some prototypes of such a system though. There is RefOS, which is more of a demo of what's possible than an actual OS, and there exists a port of the Genode operating system framework to run on top of seL4.
The microkernels of the third generation, and seL4 in particular, solve the problems and the shortcomings of the previous generations, and provide a solid foundation for building secure, high-assurance systems on top of them.
I haven't mentioned all the microkernels; I'm not even familiar with all of them. Here are some names for you to do your own searching:
Still, I think it's a fairly good post, and I know it helped some people to "get" the borrow checker, so...
Here's my take on describing these things. Once you grasp it, it all seems intuitively obvious and beautiful, and you have no idea what part of it you were missing before.
I am not going to teach you from scratch, nor repeat what The Book says (although sometimes I will) — if you haven't yet, you should read the corresponding chapters from it now. This post is meant to complement The Book, not replace it.
I can also recommend you to read this excellent article. It actually talks about similar topics, but focuses on other aspects of them.
Let's talk resources. A resource is something valuable, "heavy", something that can be acquired and released (or destroyed) — think a socket, an open file, a semaphore, a lock, an area of heap memory. All these things are traditionally created by calling a function that returns some kind of reference to the resource itself — a memory pointer, a file descriptor — that needs to be explicitly closed when the program considers itself done with the resource.
There are problems with this approach. First, it's all too easy to forget to release a resource, causing what is known as a leak. Even worse, one might attempt to access a resource that has already been released (use-after-free). If lucky, they will get an error message that hopefully helps them identify and fix the bug. Otherwise, the reference they have — while invalid as far as the logic goes — might still refer to some "place" which has already been taken by another resource: memory where something else is already stored, a file descriptor some other open file uses. Trying to access the old resource via an invalid reference can corrupt the other resource or crash the program outright.
These issues I'm talking about are not imaginary. They happen all the time. Look, for example, at the Google Chrome release blog: there are lots of vulnerabilities and crashes getting fixed that were caused by use-after-free — and it costs them a lot of time and work (and money) to identify and fix those.
It's not that developers are dumb and oblivious. The logic flow itself is error-prone: it requires you to release resources, but doesn’t enforce it. Furthermore, you do not usually notice that you forgot to release a resource as there rarely is an observable effect.
Sometimes achieving simple goals requires inventing complex solutions, and those bring complicated logic. It's hard not to get lost in a giant codebase, and it's not surprising that bugs pop up here and there. Most of them are easy to spot. Resource-related bugs, however, are hard to notice, yet very dangerous when exploited in the wild.
Of course, a new language like Rust cannot fix your bugs for you. What it can do though — and it perfectly succeeds in it — is influence your way of thinking, bringing some structure into your thoughts, thus making these kinds of errors a lot less likely to appear.
Rust provides you with a safe and clear way to manage your resources. And it doesn't let you manage them in any other way. This is, well, very restrictive, but this is what we came for.
These restrictions are awesome for several reasons:
They make you think in the right way. After some Rust experience, you will often find yourself trying to apply the same concepts when developing in other languages, even if they are not built right into the syntax.
They make your code safe. Except for several pretty rare corner cases all of your "safe" Rust code is guaranteed to be free of the bugs we're talking about.
Rust feels as pleasurable as high-level languages with garbage collection (who am I kidding by saying JavaScript is pleasurable?), while being as fast and as native as other low-level compiled languages.
With that in mind, let's look at some goodies Rust has.
In Rust, there are very clear rules about which piece of code owns a resource. In the simplest case, it's the block of code that created the object representing the resource. At the end of the block the object is destroyed and the resource is released. The important difference here is that the object is not some kind of a "weak reference" that is easy to "just forget". While internally the object is just a wrapper for the exact same reference, from the outside it appears to be the resource it represents. Dropping it — that is, reaching the end of the code that owns it — automatically and predictably releases the resource. There is no way to “forget to do it” — it is done for you, automatically, in a predictable and fully specified manner.
(At this point you might be asking yourself why I am describing these trivial, obvious things instead of just telling you that smart guys call it RAII. Okay, you're right. Let’s proceed.)
This concept works fine for temporary objects. Say, we need to write some text into a file. The dedicated block of code (say, a function) would open a file — getting a file object (that wraps a file descriptor) as a result — then do some work with it, then at the end of the block the file object would get dropped and the file descriptor closed.
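As a sketch of the scenario just described (the file name and helper function are made up for illustration), the file object owns the descriptor and closes it when it goes out of scope:

```rust
use std::fs::File;
use std::io::{self, Write};

// Hypothetical helper: opens a file, writes to it, and lets RAII close it.
fn write_note(text: &str) -> io::Result<()> {
    let mut file = File::create("note.txt")?; // acquire the resource
    file.write_all(text.as_bytes())?;         // work with it
    Ok(())
} // `file` is dropped here: the descriptor is closed automatically

fn main() -> io::Result<()> {
    write_note("hello")?;
    Ok(())
}
```

Note that there is no explicit close call anywhere: dropping the object is the release.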
But in many cases this concept doesn't work. You may want to pass your resource to someone else, share it among several "users" or even between threads.
Let's go over these. First, you may want to pass the resource to someone else — transfer ownership — so that it’s them who now own the resource, do whatever they want with it and, perhaps more importantly, are responsible for releasing it.
Rust supports this very well — in fact, this is what happens to resources by default when you give them to someone else.
fn print_sum(v: Vec<i32>) {
println!("{}", v[0] + v[1]);
// v is dropped and deallocated here
}
fn main() {
let mut v = Vec::new(); // creating the resource
for i in 1..1000 {
v.push(i);
}
// at this point, v is using
// no less than 4000 bytes of memory
// -------------------
// transfer ownership to print_sum:
print_sum(v);
// we no longer own or control v in any way
// it would be a compile-time error to try to access v here
println!("We're done");
// no deallocation happening here,
// because print_sum is responsible for everything
}
The process of transferring ownership is also called moving, because the resource is moved from the old location (say, a local variable) to the new location (a function argument). Performance-wise, it's only the "weak reference" that is moved, so everything is still blazing fast; yet to the code it looks like we actually moved the whole resource to the new place.
Moving is different from copying. Under the hood, they both mean copying the data (which in this case would be the "weak reference", if Rust allowed copying resources), but after a move, the contents of the original variable are considered no longer valid or important. Rust actually pretends the variable is "logically uninitialized" — that is, filled with some garbage, like a variable that was just declared. It is forbidden to use such a variable (unless you re-initialize it with a new value). When it gets dropped, there is no resource deallocation: whoever owns the resource now is responsible for cleaning up when they're done.
Moving is not limited to passing arguments. You can move to a variable. You can move to the "return value" — or from the return value — or from a variable, or a function argument, for that matter. Basically, it's everywhere where there is an explicit or implicit assignment.
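For instance, a plain assignment to a variable moves just like passing an argument does (a minimal sketch):

```rust
fn main() {
    let s1 = String::from("resource");
    let s2 = s1; // ownership moves from s1 to s2
    // println!("{}", s1); // compile-time error: s1 is "logically uninitialized"
    println!("{}", s2); // s2 now owns the buffer and will drop it
}
```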
While move semantics can be the perfectly reasonable way to deal with a resource — and I'm going to demonstrate it in a moment — for plain old primitive (numeric) variables they would be a disaster (imagine not being able to copy one int value to another!). Fortunately, Rust has the Copy trait. Types that implement it (all the primitive ones do) use copy semantics when assigned; all the other types use move semantics. Pretty straightforward. You can implement the Copy trait for your own type if you want it to be copied — it's an opt-in.
fn print_sum(a: i32, b: i32) {
println!("{}", a + b);
// the copied a and b are dropped and deallocated here
}
fn main() {
let a = 35;
let b = 42;
// copy the values and transfer
// ownership over the copies to print_sum:
print_sum(a, b);
// we still retain full control over
// the original a and b variables here
println!("We still have {} and {}", a, b);
// the original a and b are dropped and deallocated here
}
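The opt-in for your own type looks like this (a sketch; deriving Copy also requires Clone, and every field must itself be Copy):

```rust
#[derive(Clone, Copy, Debug)]
struct Point {
    x: i32,
    y: i32,
}

fn main() {
    let p1 = Point { x: 1, y: 2 };
    let p2 = p1; // a copy, not a move
    // both are still usable:
    println!("{:?} {:?}", p1, p2);
}
```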
Now, why would move semantics ever be useful? It's all so perfect without them. Well, not quite. Sometimes it’s the most logical thing to do. Consider a function (like this one) that allocates a string buffer and then returns it to the caller. The ownership is transferred, and the function doesn’t care about the buffer’s fate anymore, whereas the caller gets full control over the buffer, including being responsible for its deallocation.
(It's the same in C. Functions like strdup() would allocate memory, hand it to you, and expect you to manage and eventually deallocate it. The difference is that it's just a pointer, and the most they can do is ask/remind you to free() it when you're done — and the linked documentation above almost fails to do it — whereas in Rust it's an inalienable part of the language.)
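Returning an owned buffer out of a function can be sketched like this (the function name is made up):

```rust
// Allocates a buffer and hands ownership of it to the caller.
fn make_buffer() -> String {
    let mut s = String::new();
    s.push_str("allocated inside");
    s // ownership moves out through the return value
}

fn main() {
    let buf = make_buffer(); // main now owns the buffer...
    println!("{}", buf);
} // ...and deallocates it here
```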
Another example would be an iterator adapter like this one that consumes the iterator it gets, so it would make no sense to access the iterator afterwards anyway.
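For illustration, here is a standard adapter consuming the iterator it is given (a sketch):

```rust
fn main() {
    let v = vec![1, 2, 3];
    let it = v.into_iter(); // `it` now owns the data
    let doubled = it.map(|x| x * 2); // `map` takes `it` by value, consuming it
    // it.count(); // compile-time error: `it` was moved into `map`
    let result: Vec<i32> = doubled.collect();
    println!("{:?}", result);
}
```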
The opposite question is under which circumstances we would need multiple references to the same resource. The most obvious use case is multithreading. Otherwise, if all of your operations are performed sequentially, move semantics could almost always work. Still, it would be very inconvenient to move things back and forth all the time.
Sometimes, despite the code being run strictly sequentially, it still feels like there are several things happening simultaneously. Imagine iterating over a vector. The iterator could transfer you the ownership of the vector in question after the loop is done, but you wouldn't be able to get any access to the vector inside the loop — that is, unless you kick the ownership back and forth between your code and the iterator on each iteration, which would be a terrible mess. It also seems like there would be no way to traverse a tree without destructuring it onto the stack — and then constructing it back, provided that you want to do something else with it afterwards.
And we wouldn't be able to do multithreading at all. It's inconvenient, even ugly. Thankfully, there is another cool Rust concept that is going to help us. Enter borrowing!
There are multiple ways to reason about borrowing:
It allows us to have multiple references to a resource while still adhering to the "single owner, single place of responsibility" concept.
References are similar to pointers in C.
A reference is an object too. Mutable references are moved, immutable ones are copied. When a reference is dropped, the borrow ends (subject to the lifetime rules, see the next section).
In the simplest case references behave "just like" moving ownership back and forth without doing it explicitly.
Here's what I mean by the last one:
// without borrowing
fn print_sum1(v: Vec<i32>) -> Vec<i32> {
println!("{}", v[0] + v[1]);
// returning v as a means of transferring ownership back
// by the way, there's no need to use "return" if it's the last line
// because Rust is expression-based
v
}
// with borrowing, explicit references
fn print_sum2(vr: &Vec<i32>) {
println!("{}", (*vr)[0] + (*vr)[1]);
// vr, the reference, is dropped here
// thus the borrow ends
}
// this is how you should actually do it
fn print_sum3(v: &Vec<i32>) {
println!("{}", v[0] + v[1]);
// same as in print_sum2
}
fn main() {
let mut v = Vec::new(); // creating the resource
for i in 1..1000 {
v.push(i);
}
// at this point, v is using
// no less than 4000 bytes of memory
// transfer ownership to print_sum and get it back after they're done
v = print_sum1(v);
// now we again own and control v
println!("(1) We still have v: {}, {}, ...", v[0], v[1]);
// take a reference to v (borrow it) and pass this reference to print_sum2
print_sum2(&v);
// v is still completely ours
println!("(2) We still have v: {}, {}, ...", v[0], v[1]);
// exactly the same here
print_sum3(&v);
println!("(3) We still have v: {}, {}, ...", v[0], v[1]);
// v is dropped and deallocated here
}
Let's see what’s going on here. First, we could get away with always transferring ownership — but we’re already convinced that’s not what we want.
The second one is more interesting. We take a reference to the vector, then pass it to the function. Just like in C, we explicitly dereference it to get to the object behind it. Since there is no complicated lifetime stuff going on, the borrow ends as soon as the reference is dropped. While it otherwise looks just like the first example, there is an important difference. The main function is responsible for the vector all the time — it's just a bit limited in what it can do to the vector while it’s borrowed. In this example the main function doesn’t have a chance to even observe the vector while it’s borrowed, so it’s not a big deal.
The third function combines the nice parts of the first one (no need to dereference) and the second one (no messing with ownership). It works due to Rust auto-dereferencing rules. Those are a little complicated, but for the most part they allow you to write your code almost as if references were just the objects they point to — thus being similar to C++ references.
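A tiny illustration of those rules (a sketch):

```rust
fn main() {
    let v = vec![1, 2, 3];
    let r: &Vec<i32> = &v;
    // Auto-dereferencing at work: no explicit (*r) needed
    // for method calls or indexing through the reference.
    assert_eq!(r.len(), 3);
    assert_eq!(r[0], (*r)[0]);
}
```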
Out of the blue, here is another example:
// takes v by (immutable) reference
fn count_occurrences(v: &Vec<i32>, val: i32) -> usize {
v.iter().filter(|&&x| x == val).count()
}
fn main() {
let v = vec![2, 9, 3, 1, 3, 2, 5, 5, 2];
// borrowing v for the iteration
for &item in &v {
// the first borrow is still active
// we borrow it the second time here!
let res = count_occurrences(&v, item);
println!("{} is repeated {} times", item, res);
}
}
You don't need to care what is happening inside the count_occurrences() function; suffice it to say that it borrows the vector (again, without moving it). The loop is borrowing the vector too, so we have two borrows active at the same time. After the loop ends, the main function drops the vector.
(I am going to be a bit evil. I mentioned multithreading as a primary reason to have references, yet all the examples I show are single-threaded. If you are really interested, you can get some details on multithreading in Rust here and here.)
Acquiring and dropping references seems to work as if there was garbage collection involved. This is not the case. Everything is done at compile-time. To accomplish this, Rust needs one more magical concept. Let's consider this sample code:
fn middle_name(full_name: &str) -> &str {
full_name.split_whitespace().nth(1).unwrap()
}
fn main() {
let name = String::from("Harry James Potter");
let res = middle_name(&name);
assert_eq!(res, "James");
}
It works, while this doesn't:
// this does not compile
fn middle_name(full_name: &str) -> &str {
full_name.split_whitespace().nth(1).unwrap()
}
fn main() {
let res;
{
let name = String::from("Harry James Potter");
res = middle_name(&name);
}
assert_eq!(res, "James");
}
First, let me clarify the confusion with string types. A String is an owned string buffer, and a &str — a string slice — is a "view" into someone else's String, or into some other memory (it doesn't really matter here).
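To make the distinction concrete (a sketch):

```rust
fn main() {
    let owned: String = String::from("Harry James Potter"); // owns its buffer
    let view: &str = &owned[6..11]; // a slice borrowing part of that buffer
    assert_eq!(view, "James");
    let literal: &str = "Potter"; // a slice into memory baked into the binary
    assert_eq!(literal.len(), 6);
}
```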
To make it even more obvious, let me write something similar in pure C:
(Unrelated note: in C, you cannot have a "view" into the middle of a string, because marking its end would require modifying the string, so we're limited to finding the last name here.)
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
const char *last_name(const char *full_name)
{
return strrchr(full_name, ' ') + 1;
}
int main() {
char *buffer = strcpy(malloc(80), "Harry Potter");
const char *res = last_name(buffer);
free(buffer);
printf("%s\n", res);
return 0;
}
You see it now? The buffer is dropped and deallocated before the result is used. That's a trivial example of use-after-free. This C code compiles and runs just fine provided that the printf() implementation doesn't immediately reuse the memory for something else. Still, in a less trivial example it would be a source of crashes, bugs and security vulnerabilities. That's exactly what we talked about before introducing ownership.
You wouldn't even be able to compile it in Rust (the Rust code above, I mean). This static analysis machinery is built right into the language and works via lifetimes.
Resources in Rust have lifetimes. They live from the moment they are created to the moment they are dropped. The lifetimes are usually thought of as being scopes, or blocks, but that is not actually an accurate representation because a resource can be moved between blocks, as we have already seen. It's not possible to have a reference to an object that hasn’t yet been created or has already been dropped, and we’ll soon see how this requirement is enforced. Otherwise, it’s all pretty obvious and not really different from the concept of ownership.
So here's the hard part. References, among other objects, have lifetimes too, and those can be different from the lifetime of the borrow they represent (called the associated lifetime).
Let me rephrase it. A borrow may last longer than the reference it is controlled by. That is generally because it's possible to have another reference that is dependent on the borrow being active — either borrowing the same object or its part, like a string slice in the example above.
In fact, each reference remembers the lifetime of the borrow it represents — that is, there is a lifetime attached to each and every reference. Like all the "borrow checking"-related things, this is done at compile time and accounts for exactly zero runtime overhead. Unlike other things, you must sometimes specify lifetime details explicitly.
With all of that said, let's dive right in:
fn middle_name<'a>(full_name: &'a str) -> &'a str {
full_name.split_whitespace().nth(1).unwrap()
}
fn main() {
let name = String::from("Harry James Potter");
let res = middle_name(&name);
assert_eq!(res, "James");
// won't compile:
/*
let res;
{
let name = String::from("Harry James Potter");
res = middle_name(&name);
}
assert_eq!(res, "James");
*/
}
We didn't have to explicitly denote lifetimes in the previous examples because those were trivial enough for the Rust compiler to automatically figure out (see lifetime elision for details). Here we’ve done it anyway in order to demonstrate how they work.
The <'a> thing means that the function is generic over a lifetime we call 'a; that is, for any input reference with an associated lifetime it would return another reference with the same associated lifetime. (Let me remind you again that an associated lifetime means the lifetime of the borrow, not that of the reference.)
It might not be immediately obvious as to what it means in practice, so let's look at it the reverse way. The returned reference is being stored in the res variable, which lives for the whole scope of main(). That is the lifetime of the reference, so the borrow (the associated lifetime) lives at least as long. This means that the associated lifetime of the function's input argument must be the same, so we can conclude that name must be borrowed for the whole function. And this is exactly what happens.
In the use-after-free example (commented out here), the lifetime of res is still the whole function, whereas name just "doesn't live long enough" for the borrow to last the whole function. This is the exact error you would get if you tried to compile this code.
So what happens is the Rust compiler tries to make the borrow lifetime as short as possible, ideally ending as soon as the reference is dropped (this is "the simplest case" I was talking about at the beginning of the Borrowing section). Constraints like "this borrow lives as long as that one" — working in the reverse way, from the lifetime of the result to that of the original borrow — drag the lifetime out to be longer and longer. This process stops as soon as all the constraints are satisfied; if that's impossible to achieve, you're left with an error.
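With non-lexical lifetimes (the default since the Rust 2018 edition), this shortening is visible in practice: a borrow can end at the reference's last use rather than at the end of its scope. A sketch:

```rust
fn main() {
    let mut v = vec![1, 2, 3];
    let first = &v[0]; // immutable borrow begins
    println!("{}", first); // last use of the reference: the borrow can end here
    v.push(4); // fine: no borrow is active anymore
    assert_eq!(v.len(), 4);
}
```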
Oh, and you can't fool Rust by saying your function returns a borrowed value with a completely unrelated lifetime, because then you would get the same "does not live long enough" error within the function, since that unrelated lifetime can be a lot longer than the input one. (OK, I’m lying. Actually, the error would be different, but it’s nice to think it’s the same one.)
Let's go over this example:
fn search<'a, 'b>(needle: &'a str, haystack: &'b str) -> Option<&'b str> {
// imagine some clever algorithm here
// that returns a slice of the original string
let len = needle.len();
if haystack.chars().nth(0) == needle.chars().nth(0) {
Some(&haystack[..len])
} else if haystack.chars().nth(1) == needle.chars().nth(0) {
Some(&haystack[1..len+1])
} else {
None
}
}
fn main() {
let haystack = "hello little girl";
let res;
{
let needle = String::from("ello");
res = search(&needle, haystack);
}
match res {
Some(x) => println!("found {}", x),
None => println!("nothing found")
}
// outputs "found ello"
}
The search() function accepts two references with totally unrelated associated lifetimes. While there is a constraint on the haystack, the only thing we require of the needle is that the borrow must be valid while the function itself is executing. After it's done, the borrow immediately ends and we can safely deallocate the associated memory, while still keeping the function result around.
The haystack is initialized with a string literal. Those are string slices of type &'static str — a "borrow" that is always "active". Thus we are able to keep the res variable around for as long as we need it. This is an exception to the "a borrow lasts as short as it can" rule. You can think of it as another constraint on the borrowed string: the string literal's borrow must last for the whole execution time of the program.
Finally, we're returning not the reference itself, but a compound object (an Option) of which the reference is a field. This is fully supported and doesn't influence our lifetime logic.
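The 'static part can also be seen in isolation (a sketch):

```rust
fn main() {
    let res: &'static str;
    {
        res = "hello little girl"; // the literal lives in the binary itself
    } // the inner scope ends, but the borrow is still valid
    assert_eq!(&res[..5], "hello");
}
```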
So in this example, the function accepted two arguments and was generic over two lifetimes. Let's see what happens if we force the lifetimes to be the same:
fn the_longest<'a>(s1: &'a str, s2: &'a str) -> &'a str {
if s1.len() > s2.len() { s1 } else { s2 }
}
fn main() {
let s1 = String::from("Python");
// explicitly borrowing to ensure that
// the borrow lasts longer than s2 exists
let s1_b = &s1;
{
let s2 = String::from("C");
let res = the_longest(s1_b, &s2);
println!("{} is the longest if you judge by name", res);
}
}
I've made an explicit borrow outside the inner block so that the borrow has to last for the rest of main(). That is clearly not the same lifetime as that of &s2. Why is it OK to call the function if it only accepts two arguments with the same associated lifetime?
Turns out that associated lifetimes are subject to type coercion. Unlike in most languages (at least those known to me), primitive (integer) values in Rust do not coerce — you always have to cast them explicitly. You can still find coercion in some less obvious places, like these associated lifetimes and dynamic dispatch with type erasure.
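The no-implicit-integer-coercion part, for comparison (a sketch):

```rust
fn main() {
    let x: i32 = 5;
    // let y: i64 = x; // compile-time error: no implicit widening
    let y: i64 = x as i64; // an explicit cast is always required
    assert_eq!(y, 5);
}
```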
Let me bring in this piece of C++ code for comparison:
struct A {
int x;
};
struct B: A {
int y;
};
struct C: B {
int z;
};
B func(B arg)
{
return arg;
}
int main() {
A a;
B b;
/* this works fine:
* a B value is a valid A value
* to put it another way, you can use a B value
* whenever an A value is expected
*/
a = b;
/* on the other hand,
* this would be an error:
*/
// b = a;
// this works just fine
C arg;
A res = func(arg);
return 0;
}
Derived types coerce to their base types. When we're passing an instance of C, it coerces to B, only to be returned back, coerced to A, and then stored in the res variable.
Similarly, in Rust longer borrows can coerce to be shorter. It won't affect the borrow itself, but only make it accepted wherever a shorter borrow is wanted. So you can pass a function a borrow with a longer lifetime than it expects — it will be coerced — and you can coerce the borrow it returns to be even shorter.
Considering this example one more time:
fn middle_name<'a>(full_name: &'a str) -> &'a str {
full_name.split_whitespace().nth(1).unwrap()
}
fn main() {
let name = String::from("Harry James Potter");
let res = middle_name(&name);
assert_eq!(res, "James");
// won't compile:
/*
let res;
{
let name = String::from("Harry James Potter");
res = middle_name(&name);
}
assert_eq!(res, "James");
*/
}
One would often wonder whether such a function declaration means that the argument's associated lifetime must be (at least) as long as the return value's — or vice versa.
The answer should be obvious now. To the function, both lifetimes are exactly the same. But due to coercion, you can pass it a longer borrow and even possibly shorten the associated lifetime of the result after you obtain it. Thus the right answer is — argument must live at least as long as the return value.
And if you create a function that takes several arguments by reference and declare that they must have an equal associated lifetime — like in our previous example — the actual arguments given to the function will be coerced to the shortest lifetime among them. It simply means that the result can't outlive any of the argument borrows.
This plays nicely with the reverse constraints rule we were talking about earlier. The callee does not care — it just gets and returns borrows of the same lifetime. The caller, on the other hand, makes sure that arguments' associated lifetimes are never shorter than that of the result, achieving it by extending them.
You can't move out of a borrowed value, because after the borrow ends the value must stay valid. You can't move out of it even if you move something back in on the very next line. But there is mem::replace(), which lets you do both at the same time.
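A minimal mem::replace() sketch: the old value moves out and a new one moves in as a single operation, so the borrowed place is never left empty:

```rust
use std::mem;

fn main() {
    let mut slot = String::from("old");
    // Move `slot`'s value out while moving a replacement in:
    let old = mem::replace(&mut slot, String::from("new"));
    assert_eq!(old, "old");
    assert_eq!(slot, "new");
}
```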
If you want an owning pointer — something like unique_ptr in C++, there is the Box type. If you want some basic reference counting — like shared_ptr and weak_ptr in C++, there is this standard module.
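A quick sketch of both:

```rust
use std::rc::Rc;

fn main() {
    // Box: a single owner, like unique_ptr
    let boxed: Box<i32> = Box::new(42);
    assert_eq!(*boxed, 42); // the heap allocation is freed when `boxed` drops

    // Rc: reference counting, like shared_ptr
    let first = Rc::new(String::from("shared"));
    let second = Rc::clone(&first); // bumps the count; no deep copy
    assert_eq!(Rc::strong_count(&first), 2);
    drop(second);
    assert_eq!(Rc::strong_count(&first), 1);
}
```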
If you really really need to get around the restrictions Rust puts on you, you can always resort to unsafe code.
Thanks to Meredith Summer for proofreading this post.
fork(), and it is — you've guessed it — the Mach task API.
The reason fork() is so useful on Unix is that Unix has virtually no API to control other processes. Most of the syscalls work on the calling process, so you cannot manipulate file descriptors and memory mappings of another process, you cannot make another process chroot or drop capabilities, et cetera.
The only way to work around these limitations is to run code controlled by you on behalf of, and in the context of, another process. One hack to do this is ptrace'ing the other process; but that is, first of all, a hack, and second, it requires the process to already have a mostly sane internal state, e.g. a mapped and initialized libc — unless, of course, you choose to load your own temporary libc into its address space... The other way to do this is what fork() lets you do: it's not technically the parent running its code in the child, but it's the same piece of code that controls both, so it can perform the exact manipulations the parent wants performed.
In contrast, all Mach task APIs (with very few exceptions) work on whatever task port you invoke them on. It can be mach_task_self(), the calling task, or it may not be; having access to a task's task port is enough to fully control it. Of course, you can only get access to another task's task port under some controlled circumstances, but creating the other task is one of them.
Upon being created with task_create(), a new task has no threads, so it's not running. The parent task gets the task port for the new task, and using this task port, it can set the new task up any way it deems necessary, using the exact same APIs that the task would use to set itself up, but passing child_task instead of mach_task_self(). Then, when everything's ready, it can create and start the initial thread in the child task.
Mach still has support for having the new task inherit the virtual memory of the parent task. I'm not sure why; perhaps to make it possible to efficiently implement fork() on top of it. Hurd does that, but it still has to copy port rights to the child task, and that takes multiple context switches between userspace (the parent task) and the kernel, so that's where Hurd's fork() is slow.
The Mach API is somewhat cumbersome and nowhere near as convenient to use as a fork()/exec() pair, but this is easily fixable with library wrappers.
P.S. I'm guessing other capability-based microkernels have similar APIs for manipulating tasks as well, and I'm definitely planning to learn more about them. In particular, I'm interested in seL4 — expect some posts about it!