2016-10-06

The Arduous Journey of Porting C to Rust

Hey, have you heard the great news about our lord and saviour Rust?

Jokes aside, Rust is a seriously good language and is worthy of the endless evangelism you've probably seen around the internet. If you already know a fair amount of Rust, great! If not, oh dear.

This blog post is aimed at programmers who are fans of Rust, and want to spread it until it blocks out the sun. C knowledge would probably have been useful to have whilst I was porting, but since I got by without it, you probably don't need to know much either.

The first question you probably want to ask yourself is, why even port C to Rust? If you're already a big Rust fan, you could probably list a ton of reasons off the top of your head, but in the end they all boil down to one thing:

Safety. Safety is a big, huge, mega-deal. Safety is the reason Rust's so rad. Rust was made by very smart language designers to deal with the kinds of problems well known to very smart language designers. In particular, a major goal of Rust is to eliminate an entire class of software bugs: the memory-safety kind.

How does it do this? By being clever.

...What? I'm not smart enough to explain the intricate details of Rust. Plenty of other people have done it better than I could already.

Right, that's enough talking about great languages, let's talk about C.

C is a fractal minefield. On top of each of C's footgun-shaped mines, there's another complete minefield of smaller mines on top of it. No matter how far you go down you will still find more and more footgun mines. All C code in production is a precarious network of support beams that could topple into an explosion of segfaults, memory corruption, faulty protocols, and security vulnerabilities at any second.

To sum it up, reducing the amount of C code in the world is a very good thing to do. Before Rust, there weren't many alternatives that could really replace C, but once Rust stepped in, a perfect candidate was born.

So to answer my earlier question: Yes! Porting C to Rust is almost always an excellent idea. The only thing to consider now is, is it worth it?

Porting C to Rust is not easy. Even if an entire codebase of C follows sensible programming standards and paradigms (which is extremely rare), Rust tends to do things very differently, simply because it has more tools under its belt. Rust doesn't need to string together horse hair to make rope when it's got carbon nanotubes pouring out of its ears.

So, what makes a C project worth porting? This is largely a context sensitive question to ask, but generally if it's used a lot and is of poor quality, porting it to Rust will leave it in a far more maintainable state. Other reasons include ease of use and the safety of Rust's multithreading capabilities, and having far simpler access to libraries. Even though Rust's ecosystem is young, the experience of using Rust libraries is far superior to that of using C libraries.

Once you've figured out what you want to port and why you want to port it, you can finally take step 0:

Getting your feet wet

You ain't gonna be a C porting whizz in a single day. Hell, first you have to figure out how you're even going to go about porting, which by itself can be pretty hard. I'm far from an expert in this, so I'll be illustrating my thought process over time:

"Okay, let's do this bottom-up! I'm going to try and figure out how to replace functions one by one!"

Two weeks of suffering later...

"Augh! Hmm, let's try porting from the top down this time, maybe that'll go better. See, if I redefine all the top level functions and have them just call their C counterparts until I port them, I can just... Wait, no. Hmm. If i-- Dammit, no_mangle. But-- ugh. Yeah, this ain't working."

A closer look at the bottom-up approach later...

"Oh, hey, I just have to do some basic things to get this horrible build chain to stuff my Rust code into it, now we might be getting somewhere!"

Now to go into detail about what I did that finally worked!

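On the Rust side: Make a new cargo project (I imaginatively named mine "rustport") and change it to a cdylib (a plain dylib also works, but cdylib is the crate type meant for libraries loaded from other languages):

[package]
# ...

[lib]
crate-type = ["cdylib"]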

[dependencies]
# ...

On the C side: Oh jeez, I have no idea what your project is up to so this part is hard. Assuming you have a ./configure, and a Makefile, it will probably look like this:

./configure LDFLAGS="-L/absolute/path/to/rustport/target/release/ -lrustport"
make

(Remember to run cargo build --release on the Rust side first)

What this does is kinda-inject the librustport.so built by cargo into the building of whatever C code you're porting. This is gonna enable all the cool stuff we're about to do.

Before anything, we gotta understand how dynamic linking works. I don't. You might not either, so the basics are that if a function is declared in a .h, the linker first looks for its definition in the compiled .c files, but if it's not there (because you deleted it, mwahaha) it'll instead look in the libraries it's linked with! This essentially means we don't even need to write any header files ourselves, and can just piggy-back off of the original ones, as long as our Rust functions match those declarations exactly.

Pick out your first function to port. A good choice would be one that just does a bit of maths, and doesn't rely on anything else. (It's not impossible to port functions that need other C code, but it's more complicated)
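For the more complicated case, here's a minimal sketch of what a ported function that still needs C code might look like. It assumes the C function is declared in an extern block on the Rust side; `abs` from the C standard library stands in for one of your project's own not-yet-ported functions, and `distance` is a made-up name.

```rust
use std::os::raw::c_int;

// Declaration of a function that still lives on the C side.
// `abs` (C standard library) is only a stand-in here; in a real
// port this would be one of the project's own functions.
extern "C" {
    fn abs(x: c_int) -> c_int;
}

// Hypothetical ported function that leans on the C one.
#[no_mangle]
pub unsafe extern "C" fn distance(a: c_int, b: c_int) -> c_int {
    // Calling into C is always unsafe: Rust can't check what
    // happens on the other side.
    unsafe { abs(a - b) }
}
```

The signature in the extern block has to match the C declaration by hand, because nothing checks it for you.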

Once you've picked one, delete it from the C code! Poof, boosh, it's gone forever (not really). Say you're porting this abnormally simple C function:

float sq(float x) {
    return x * x;
}

If it's defined in maths.c, what I'd port it to look like in Rust is:

// src/lib.rs
extern crate libc; // you will inevitably need this

// if it's not pub, it won't be externally visible. i think.
pub mod maths;

// src/maths.rs
#[no_mangle] // needed for C code to know what the heck this is
pub unsafe extern fn sq(x: f32) -> f32 {
    // patience! no need to rush this complicated implementation!
    unimplemented!()
}

Once this is done, recompile the Rust and run make clean; make (probably) on the C side. If all's gone well, it'll be happy with a beheaded sq() function that's ready for us to replace.

Running stuff that relies on this Rust-augmented C library is a bit more work, but this will probably do it:

# tell everything you run in this shell (yes, everything. it
# probably won't cause issues) to use your freshly compiled
# library .so instead of whatever it was using before
set -x LD_PRELOAD /absolute/path/to/c/libwhatever.so
# add our rust library to the list of places the dynamic
# linker looks to search
set -x LD_LIBRARY_PATH /absolute/path/to/rustport/target/release/
./run_your_whatever

(Sorry, I'm a big fan of fish-shell)

If all goes well after running this, it'll probably work for a while and then panic with the following message as soon as anything needs squaring:

thread 'main' panicked at 'not yet implemented'
(and something about not being able to unwind properly because c)

Success! I know a panic doesn't seem like much of a success at first, but take a moment to think about what we just did: We buried a piece of Rust code deep down in the call chain of an application that might be going from Python, through Swig, into C++, etc, but it still gets all the way to your Rust code!

With this function it's a simple case of replacing unimplemented!() with x * x, but you can easily see how the process would continue on for the rest of the C code you want to port.
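Here's roughly what the finished function looks like, plus a sketch of one way to deal with that unwinding complaint once your ported code can genuinely panic. The `checked_sqrt` and `safe_sqrt` names are made up for illustration, and returning NaN on failure is just an assumption about what the C callers could tolerate.

```rust
use std::panic;

// The finished port: same name and signature the C header declares.
// (On the 2024 edition the attribute is spelled #[unsafe(no_mangle)].)
#[no_mangle]
pub unsafe extern "C" fn sq(x: f32) -> f32 {
    x * x
}

// Hypothetical stand-in for ported logic that can panic.
fn checked_sqrt(x: f32) -> f32 {
    if x < 0.0 {
        panic!("negative input");
    }
    x.sqrt()
}

// Hypothetical guarded wrapper: catch the panic before it tries to
// unwind into C frames, and hand back an error value instead.
#[no_mangle]
pub extern "C" fn safe_sqrt(x: f32) -> f32 {
    match panic::catch_unwind(|| checked_sqrt(x)) {
        Ok(value) => value,
        Err(_) => f32::NAN,
    }
}
```

The panic message still gets printed to stderr, but the C side only ever sees an ordinary return value.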

Part 1: Racing along

I highly recommend starting off by porting file by file, by making a new module for each .c file you're porting and reimplementing all the functions in it. Once you're done with all the functions and the .c file is empty, delete it entirely! (and make sure to remove any references to it in makefiles etc)

Tada, you've ported an entire file, and C code is crumbling all around you! Now, not all C code is as trivial as what I showed you so here's some helpers I learned for quick and easy C translations:

void take_ptr(void *thing) {
    // ...
}

Oh no! An evil void pointer! Unfortunately it's pretty hard to refactor things into nicer code before you've already ported it, so our Rust code will be using void pointers until enough code is ported to change it. Rust has no concept of void pointers (thankfully), but the libc crate has something that acts just like it!

#[no_mangle]
pub unsafe extern fn take_ptr(thing: *mut libc::c_void) {
    // ...
}
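To actually do anything with a void pointer you have to cast it back to whatever it really points at. A minimal sketch, assuming (purely for illustration) that the callers always pass a pointer to an int; it uses the `c_void` and `c_int` types from std, which are interchangeable with the libc crate's ones:

```rust
use std::ffi::c_void;
use std::os::raw::c_int;

#[no_mangle]
pub unsafe extern "C" fn take_ptr(thing: *mut c_void) {
    // C callers love passing NULL, so check before touching it.
    if thing.is_null() {
        return;
    }
    // Assumption for this sketch: the pointer really is an int*.
    // Nothing verifies this; it's exactly as trusting as the C was.
    let value = thing as *mut c_int;
    unsafe {
        *value += 1;
    }
}
```

From Rust, a call looks like `take_ptr(&mut n as *mut c_int as *mut c_void)`: one cast from reference to raw pointer, and a second to erase the type.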

Pointers and pointer arithmetic are staples of C. They're also horrible. Let's see how they look in C and then in Rust:

*c = 123;
int thing = *pointer;
take_ptr(&thing);
object->foob(object->barb);
array_ptr[thing];

Whew, what a mess. Now in Rust:

*c = 123; // whoa, identical!
let mut thing = *pointer; // note that dereferencing raw pointers can only
                          // be done in unsafe code. actually it's one
                          // of the only things unsafe changes!
take_ptr(&mut thing as *mut _ as *mut libc::c_void); // not pretty :(
((*object).foob)((*object).barb); // just more explicit in general
*array_ptr.offset(thing as isize); // see?

See the Rust docs for more about raw pointers.
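Those fragments don't compile on their own, so here's a small self-contained sketch of the array-indexing case. `sum_ints` is a made-up function, and passing the length as a separate argument is an assumption about how the C side would call it.

```rust
use std::os::raw::c_int;

// Hypothetical port of a C loop doing `total += array_ptr[i]`.
#[no_mangle]
pub unsafe extern "C" fn sum_ints(array_ptr: *const c_int, len: usize) -> c_int {
    if array_ptr.is_null() {
        return 0;
    }
    let mut total: c_int = 0;
    for i in 0..len {
        // `add` is `offset` for unsigned indices, so no `as isize`.
        // Reading past `len` elements is undefined behaviour, same as in C.
        total += unsafe { *array_ptr.add(i) };
    }
    total
}
```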

What about structs? They're harder when using types whose size C doesn't pin down (like int, which is 4 bytes on most platforms, 64-bit ones included, but isn't guaranteed to be) but with the explicitly sized things it's easier. The libc crate has aliases like c_int and c_char that track whatever the platform's C compiler uses. Note that struct definitions tend to be in header files. Also note that if the header file definition is empty it's an opaque struct and you get to do whatever you want! Yay!

// example shamelessly stolen from wikipedia
typedef struct {
    int account_number;
    char *first_name;
    char *last_name;
    float balance;
} account;

Assuming nobody's messed with type definitions:

#[repr(C)]
pub struct account {
    account_number: libc::c_int, // i32 on pretty much every platform
                                 // you'll meet, 64-bit ones included
    first_name: *mut libc::c_char, // fun fact: the signedness of chars
                                   // in c is implementation-defined!
    last_name: *mut libc::c_char,
    balance: f32,
}

This should end up using the same exact memory representation as the C version, assuming the moon is in the right phase and the gods of porting are benevolent. Note that no #[no_mangle] is needed because the struct name itself never needs to cross the FFI boundary. It's just setting how the memory is laid out, so as long as the memory is the same you'll get the same struct.
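If you'd rather not rely on the moon, you can at least ask Rust what layout it came up with and compare that against sizeof on the C side. A rough sketch, using the C type aliases from std so the field widths follow the platform's C ABI; `Account` and `account_layout` are names I made up here:

```rust
use std::mem::{align_of, size_of};
use std::os::raw::{c_char, c_float, c_int};

#[repr(C)]
pub struct Account {
    account_number: c_int,
    first_name: *mut c_char,
    last_name: *mut c_char,
    balance: c_float,
}

// Returns (size, alignment) in bytes, for comparing against
// sizeof(account) and _Alignof(account) printed from C.
pub fn account_layout() -> (usize, usize) {
    (size_of::<Account>(), align_of::<Account>())
}
```

On a typical x86-64 Linux box this should come out as 32 and 8: the int is padded up to the pointers' 8-byte alignment, and the trailing float is padded so the whole struct stays a multiple of 8.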

Next up, those weird C functions that try to be object oriented but fail terribly. Which is lucky for us because actual OOP would be a nightmare to port.

float get_nth_x(Point *self, int n) {
    return self->array[n].x;
}

Skipping over a bit of struct definitions, it'll end up like this:

#[no_mangle]
pub unsafe extern fn get_nth_x(self_: *mut Point, n: libc::c_int) -> f32 {
    // this assert isn't in the c version but you'll always want it
    assert!(!self_.is_null());
    let self_ = &mut *self_; // not needed for such a short
                             // function, but it's nice to
                             // shadow the raw ptr with a mut ref
    (*self_.array.offset(n as isize)).x
}
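For completeness, here's a version with the skipped struct definitions filled in so it compiles by itself. The `Coord` struct and the shape of `Point` are guesses on my part (the original C definitions aren't shown), so treat them as placeholders:

```rust
use std::os::raw::c_int;

// Hypothetical definitions: an element with an x field, and a
// Point that holds a raw pointer to an array of them.
#[repr(C)]
pub struct Coord {
    pub x: f32,
    pub y: f32,
}

#[repr(C)]
pub struct Point {
    pub array: *mut Coord,
}

#[no_mangle]
pub unsafe extern "C" fn get_nth_x(self_: *mut Point, n: c_int) -> f32 {
    assert!(!self_.is_null());
    unsafe {
        let self_ = &mut *self_;
        // No bounds check here, exactly like the C version: the
        // struct doesn't carry a length, so there's nothing to check.
        (*self_.array.offset(n as isize)).x
    }
}
```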

That's a few of the handy tips I've learned along the way, so I hope they help you on your journey! For more of the basic FFI elements be sure to check out The FFI Omnibus. It's a great resource, albeit limited.

Part 2: Making everything Rustic

When porting C code 1:1, you are inevitably going to get a horrible result. I know I hate looking at my ported code, eugh. Luckily, once you have enough ported, you get to Rustify it! Obviously this works best if you already know your Rust.

I can't actually give you many tips on doing this as it's a very broad topic (as broad as Rust itself!), but in general what your code should look like after porting is: A nice super-rustic inner layer with no unsafe at all, and an FFI layer which is just a bunch of functions that expose the same functionality as before, but this time using your new fancy clean internals.
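As a tiny illustration of that split, here's a sketch with a safe inner function and a thin FFI wrapper around it. Everything in it is hypothetical: the `Coord` type, the function names, the idea that the C callers can pass a length, and the choice of NaN as the error value.

```rust
use std::slice;

#[repr(C)]
#[derive(Clone, Copy)]
pub struct Coord {
    pub x: f32,
    pub y: f32,
}

// Inner layer: plain safe Rust. Slices know their length, so an
// out-of-range index becomes None instead of memory corruption.
pub fn nth_x(coords: &[Coord], n: usize) -> Option<f32> {
    coords.get(n).map(|c| c.x)
}

// FFI layer: turns raw pointers into safe types, calls the inner
// layer, and turns the result back into something C understands.
#[no_mangle]
pub unsafe extern "C" fn coords_nth_x(ptr: *const Coord, len: usize, n: usize) -> f32 {
    if ptr.is_null() {
        return f32::NAN;
    }
    // The caller promises ptr points at len valid Coords; this is
    // the one place that promise has to be trusted.
    let coords = unsafe { slice::from_raw_parts(ptr, len) };
    nth_x(coords, n).unwrap_or(f32::NAN)
}
```

The nice part of this shape is that the unsafe code shrinks down to the conversions at the boundary, and everything behind it can be tested like ordinary Rust.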

Warning: This will take a LOT OF TIME, but the results will be worth it. You'll probably fix a whole bunch of bugs that haven't even been discovered yet, simply by doing things the Rust way. If you make sure to do this every time you have enough ported to Rustify stuff, you won't even have much left to do once you're done porting!

Speaking of:

Part 3: Once you're done Porting

I... Uh... Okay, I'll be honest, I haven't gotten to this step yet. I'm still halfway through the C library I'm currently porting, so I'm not sure what I'll do once I'm done. I definitely want to end up in a way that it can just be an easily swappable .so library, but I'm not sure how I'll manage that. I've got my eye on Cheddar when it comes to generating header files, but apart from that I'm not sure.

I will probably write a part 2 of this blog post once I'm done! Uhh, whenever that is.

Further reading: Baby Steps: Slowly Porting musl to Rust by Adam Perry

r/rust thread, some good tips in there!

2020 update: I ended up giving up :(