Unsafe Rust

CIS 198 Lecture 12


Unsafe Rust

  • Rust’s safety checks are convenient, but highly conservative.
  • Some Rust programs are completely safe, but the compiler won’t accept them.
  • Rust has “unsafe” parts to allow you to escape some of the compiler’s restrictions.
    • Sometimes you need to “just frob some dang bits.” -Gankro

Unsafe Rust

  • Unsafe Rust is useful for:
    • Talking to other languages (e.g. C)
    • Writing some low-level abstraction that std doesn’t provide
    • Using particular types that convey no ownership semantics
    • Writing std itself!

Unsafe Rust

  • All unsafe code must be delimited with the unsafe keyword.
  • Unsafe code can be wrapped in an unsafe block.
  • Entire functions can be marked as unsafe to call.
    • Code which calls unsafe functions must be in unsafe blocks.
  1. unsafe fn foo() { }
  2. fn main() {
  3. unsafe {
  4. foo();
  5. }
  6. }
  • Traits can be designated unsafe.
    • Thus, their impls must be also marked as unsafe.
  1. unsafe trait Sync { }
  2. unsafe impl Sync for Foo { }

Unsafe vs. Safe

  • “Safe” operations are ones that don’t break any of Rust’s ownership, type-checking, or memory safety rules.
  • These are dangerous, but are still considered “safe” in Rust:
    • Deadlocks
    • Memory leaks
    • Integer overflow
    • Exiting without running destructors

Unsafe vs. Safe

  • “Unsafe” operations are ones that might have dangerous effects that the Rust compiler cannot guard against by analysis.
  • These are bad, and are only possible in unsafe Rust:
    • Data races
    • Dereferencing invalid pointers
    • Reading uninitialized memory
    • Creating invalid primitives
    • etc.

Unsafe Rust

  • What does unsafe let you do?
    • Dereference raw pointers
    • Mutate static variables
    • Implement unsafe traits
    • Call unsafe functions
  • That’s it!
  • unsafe does not disable the borrow checker or otherwise let you do anything that would break regular Rust semantics.
  • Everything that’s normally safe is still safe in an unsafe context.
  • Everything else that the compiler would reject is still disallowed.

Where Is unsafe Used?

  • All over the place!
  • Many, many types in std are built on unsafe in some way.
    • Vec
    • Cell/RefCell
    • Box
    • Rc/Arc
    • Send/Sync
    • HashMap
    • Lots and lots of other types

Correctness of Unsafe Code

  • If std is built on unsafe, how can it possibly be correct?
  • unsafe is used in very careful ways within std.
  • There’s no formalization of std‘s safety, so you pretty much need to just trust that std is implemented correctly.
    • Rustbelt is an ongoing project to formally verify the encapsulation of unsafe code. Seeking PhD students!

Correctness of Unsafe Code

  • As it turns out, using unsafe pollutes the safety of everything that can directly modify data that the unsafe code depends on.
    • And if not written correctly, safe functions which use unsafe will pollute the safety of any code which uses it.
  • For example, take this (simplified) definition of Vec.
    • ptr is a pointer to the data the Vec owns.
    • cap is the Vec‘s capacity, len is its length.
    • len should never be greater than cap, but this is not enforced.
  1. pub struct Vec<T> {
  2. ptr: *mut T,
  3. cap: usize,
  4. len: usize,
  5. }

Correctness of Unsafe Code

  • Now consider this new method on Vec, written in totally safe code.
  • Despite this code’s total safety, it does a dangerous thing.
    • The length of the Vec is now potentially longer than its capacity.
  • This method allows clients of Vec to access uninitialized memory!
  1. impl<T> Vec<T> {
  2. // ...
  3. pub fn evil(&mut self) {
  4. self.len += 2;
  5. }
  6. }

¹ Example taken from The Scope of Unsafe


Borrow Splitting

  • Mutual exclusivity of mutable references is safe, but can be very limiting.
  • Struct borrows can be split to allow multiple mutable borrows, as the borrow checker understands structs well enough:
  1. struct Point {
  2. x: i32,
  3. y: i32,
  4. }
  5. let mut p = Point { x: 0, y: 1 };
  6. let x1 = &mut p.x;
  7. let y1 = &mut p.y;
  8. *x1 += 1;
  9. *y1 += 1; // OK

Borrow Splitting

  • However, arrays and slices cause problems.
  • While the borrow checker could understand this, more complex examples with trees, etc., are unfeasible.
  1. let mut arr = [1, 2, 3];
  2. let x = &mut arr[0];
  3. let y = &mut arr[1];
  4. // ^ cannot borrow `arr[..]` as mutable more than once at a time
  5. let z = &arr[2];

Borrow Splitting

  • Unsafety can allow us to carefully overcome this limitation.
  • Slices can be split into parts and subsequently lent out.
  • std provides a function split_at_mut, that looks like this:
  1. fn split_at_mut(&mut self, mid: usize) -> (&mut [T], &mut [T]) {
  2. let len = self.len();
  3. let ptr = self.as_mut_ptr();
  4. assert!(mid <= len);
  5. unsafe {
  6. (from_raw_parts_mut(ptr, mid),
  7. from_raw_parts_mut(ptr.offset(mid as isize), len - mid))
  8. }
  9. }
  • This is a classic example of how to use unsafe.
    • We know this function works safely, but the borrow checker can’t validate this property.
    • Unsafe code can be used as a careful escape hatch.

Raw Pointers

  • Rust defines two types of C-like “raw pointers”:
    • *const T, an immutable pointer to a T
    • *mut T, a mutable pointer to a T
  • Raw pointers are unsafe to dereference.
  • Raw pointers confer no ownership semantics, just like C pointers.
  • Raw pointers may be initialized to null with null() and null_mut(), respectively.
    • These are actually safe - only dereferencing is unsafe.

Raw Pointers

  • Raw pointers may be (safely) created in several ways:
  1. // Reference coercion
  2. let mut x = 0i32;
  3. let const_ptr = &x as *const i32;
  4. let mut_ptr = &mut x as *mut i32; // Allowed to alias pointers
  5. let box_y = Box::new(1);
  6. let raw_y = &*box_y as *const i32; // Does not consume `box_y`

Raw Pointers

  • Raw pointers may be (safely) created in several ways:
  1. // Box consumption
  2. let raw_y_2 = Box::into_raw(box_y);
  3. // ... to properly clean up, later ...
  4. unsafe { drop(Box::from_raw(raw_y_2)); }
  • Raw pointers may need to be deallocated manually.
    • If you’ve taken ownership of the pointer, not if you’ve aliased it.
  • Raw pointers can, but should not, be used after their lifetime expires.
    • Doing so will cause a segmentation fault.

Unique

  • A wrapper around *mut T to indicate that the Unique struct owns the pointer.
  • Confers regular Rust ownership semantics, unlike *mut T.
  • Implies T should not be modified without a unique path to T.
  • Unsafe to create, dereference, etc.
  • Useful for building abstractions such as Box.
  • Currently unstable :(
    • Available on nightly behind a feature gate:
  1. #![feature(unique)]
  2. use std::ptr::Unique;
  3. let mut x = 0i32;
  4. let mut_ptr = &mut x as *mut i32;
  5. unsafe {
  6. let unique_ptr = Unique::new(mut_ptr);
  7. }

Shared

  • A wrapper around *mut T to indicate that the Shared struct has shared ownership of the pointer.
  • Confers ownership semantics, unlike *mut T.
  • Unsafe to create, dereference, etc.
  • Useful for building abstractions such as Rc, Arc.
  • Also currently unstable :(

UnsafeCell

  • A wrapper around a T that provides interior mutability.
  • Implies that T may be modified using unsafe operations.
  • A primitive cell type that is the basis for Cell/RefCell.

Uninitialized Memory

  • If you really, really want some uninitialized memory, you can make some with std::mem::uninitialized<T>().
    • This function just pretends to make a value of type T.
    • This function is unsafe to invoke.
  • Reading uninitialized memory always has undefined behaviour!
  • Writing to uninitialized memory may also be undefined.
    • The compiler believes the value is initialized, so it may try to drop a value that isn’t there, causing a panic.
  • The only safe way to initialize uninitialized memory is by using one of the functions from std::ptr.
    • write, copy, or copy_nonoverlapping

Leaking Memory

  • Leaking memory is a safe operation, and can be done with std::mem::forget().
  • You rarely ever want to do this, but you might want to if:
    • you have an uninitialized value that you don’t want to get dropped
    • you have two copies of a value, but only want to drop one to avoid a double free
    • you want to transfer a resource across the FFI boundary and into another language
  • This function is safe because Rust does not guarantee that destructors are always run.

Transmutation

  • Is the type system just too… pesky for you?
  • Want to turn any type into any other type as much as you please?
  • mem::transmute may be right for you!¹

¹Ask your compiler before using mem::transmute. Side effects may include your laundry being eaten by nasal goblins.


Transmutation

  • mem::transmute<T, U> takes a value of type T and reinterprets it to be of type U.
  • The only restriction on this function is that T and U must be the same size.
  • Arbitrary type transmutation can have wildly undefined behaviour!

Transmutation

  • Creating an instance of any type with invalid state causes arbitrary behaviour.
    • e.g. a Vec with a capacity smaller than its length.
    • The above may or may not be possible with an actual Vec, it’s just an example.
  • transmute can create invalid primitives, e.g. a bool that is neither 0 nor 1 underneath.
  • transmute can turn an &T into an &mut T, which is wildly, always undefined
  • transmute on a reference without an explicit lifetime produces an unbounded lifetime.

Transmutation

  • There’s also a variant named transmute_copy which is even more unsafe!
  • transmute_copy<T, U> copies size_of<U> bytes from an &T and interprets them as a U.
  • No more pesky size check like in regular transmute.
  • U may be larger than T, which causes undefined behaviour.

Unbounded Lifetimes

  • Unsafe code often produces a reference or lifetime out of nowhere.
    • These lifetimes are unbounded.
  • Commonly, these are produced from dereferencing a raw pointer.
  • Unbounded lifetimes expand to become as large as they need to be.
  • This may produce lifetimes more powerful than 'static.
  • Generally, you can think of these as 'static.
    • Almost no references are actually 'static, though.

Implementing Vec

  • Let’s take a deep dive into how std::vec::Vec is actually implemented.
  • This code will only compile on nightly, since it uses unstable features.
  • Warning: this is going to get pretty advanced.

¹All content taken from The Rustonomicon


Vec Layout

  • A Vec has three parts:
    • a pointer to the allocated data
    • the size of the allocation
    • the number of elements in the vector
  • Naively, this translates into this struct:
  1. pub struct Vec<T> {
  2. ptr: *mut T,
  3. cap: usize,
  4. len: usize,
  5. }
  • Simple, right?

Vec Layout

  • Sadly, this won’t quite work.
  • Some lifetime variance problems:
    • An &Vec<&'static str> can’t be used in place of an &Vec<&'a str>.
  • Some drop checker problems:
    • *mut T conveys no ownership.
    • The drop checker assumes we own no values of type T.

Vec Layout

  • Using a Unique in place of an *mut T solves these problems!
    • Unique conveys ownership.
    • It also lets us be Send & Sync if T is, and auto-derefs to *mut T.
    • It also is guaranteed to never be null, allowing null pointer optimizations.
  1. #![feature(unique)]
  2. use std::ptr::{Unique, self};
  3. pub struct Vec<T> {
  4. ptr: Unique<T>,
  5. cap: usize,
  6. len: usize,
  7. }

Vec Allocation

  • Now that we use a Unique to store our data, what does an empty Vec look like?
    • It can’t have a null pointer…
    • …but we don’t want to allocate any data!
  • Turns out, we can just fill the Vec with garbage! 🚮
  • cap is 0, so we don’t need to worry about accidentally accessing garbage.
  • std exposes a value as alloc::heap::EMPTY to represent this value.

Vec Allocation

  1. #![feature(alloc, heap_api)]
  2. use alloc::heap::EMPTY;
  3. impl<T> Vec<T> {
  4. fn new() -> Self {
  5. unsafe {
  6. assert!(mem::size_of::<T>() != 0);
  7. Vec { ptr: Unique::new(heap::EMPTY as *mut _),
  8. len: 0,
  9. cap: 0 }
  10. }
  11. }
  12. }

??? We need to assert that T is not a ZST, since apparently ZSTs will break everything horribly at this point in the implementation.


Vec Allocation

  • Now that we can allocate no space, we need to figure out how to allocate space.
  • The heap module is our friend here.
    • This lets us talk to Rust’s allocator (jemalloc by default).
  • We also need to figure out how to handle an out-of-memory condition.
    • panic! is no good, since it can cause allocations.
    • std usually executes an illegal instruction to crash the program.
  • We’ll define an out-of-memory error like so:
  1. fn oom() { ::std::process::exit(-9999); }

Vec Allocation

  • Our logic for growing the Vec is roughly this:
  1. if cap == 0 {
  2. allocate()
  3. cap = 1
  4. } else {
  5. reallocate()
  6. cap *= 2
  7. }
  • Unfortunately, this is not so easy in practice…

Vec Allocation

  • Because of some serious LLVM shenanigans, a Vec can only contain isize::MAX elements.
    • This also means we only care about byte-sized allocations, since e.g. isize::MAX u16s will only just fit in memory.
  • Zero-size types are also tricky to allocate due to LLVM.
  • tl;dr Read the ch. 10 in the Rustonomicon, this info is totally out of scope.

Vec Allocation

  • Time to actually allocate data.
  • The heap module exposes an allocate function that puts data into the heap.
  • It also exposes a reallocate function that takes an existing allocation and resizes it.
  • Our pseudocode from a few slides ago can now be implemented, with a few extra parts.

Vec Allocation

  1. fn grow(&mut self) {
  2. unsafe {
  3. let align = mem::align_of::<T>();
  4. let elem_size = mem::size_of::<T>();
  5. let (new_cap, ptr) = if self.cap == 0 {
  6. let ptr = heap::allocate(elem_size, align);
  7. (1, ptr)
  8. } else {
  9. let new_cap = self.cap * 2;
  10. let old_num_bytes = self.cap * elem_size;
  11. assert!(old_num_bytes <=
  12. (::std::isize::MAX as usize) / 2);
  13. let new_num_bytes = old_num_bytes * 2;
  14. let ptr = heap::reallocate(*self.ptr as *mut _,
  15. old_num_bytes, new_num_bytes, align);
  16. (new_cap, ptr)
  17. }
  18. };
  19. if ptr.is_null() { oom(); }
  20. self.ptr = Unique::new(ptr as *mut _);
  21. self.cap = new_cap;
  22. }

??? align_of specifies the alignment of a type. We can always assume self.cap < isize::MAX, so no need to check this when we assign new_cap. Assertion against old_num_bytes ensures we don’t overflow the vector’s capacity. If allocate or reallocate fails, we get null back.


Vec Push & Pop

  • Actual functionality! Woo!
  • push only needs to check if the Vec is full to grow, write to the next available index, and increment the length.
    • Be careful! This method shouldn’t read from the memory it writes to, since this is uninitialized.
    • Even calling v[idx] = x will try to drop the old value of v[idx].
  • To avoid reading uninitialized memory, use ptr::write.
  1. pub fn push(&mut self, elem: T) {
  2. if self.len == self.cap { self.grow(); }
  3. unsafe {
  4. ptr::write(self.ptr.offset(self.len as isize), elem);
  5. }
  6. self.len += 1;
  7. }

??? self.ptr.offset does pointer arithmetic to compute an offset from a pointer. The above usage writes to the memory slot just past the end of the vector’s last element.


Vec Push & Pop

  • With pop, the data being removed is initialized, so we can read it.
  • Unfortunately, just moving the value out doesn’t work.
    • This would leave memory uninitialized.
  • ptr::read does the trick here, copying the data out.
  1. fn pop(&mut self) -> Option<T> {
  2. if self.len == 0 {
  3. None
  4. } else {
  5. self.len -= 1;
  6. unsafe {
  7. Some(ptr::read(self.ptr.offset(self.len as isize)))
  8. }
  9. }
  10. }

Vec Deallocation

  • Now we want to be able to deallocate a Vec.
  • This actually requires a manual implementation of Drop.
  • The whole Vec can be deallocated by popping it down to zero, then calling heap::deallocate.
  • Obviously, we shouldn’t try to deallocate anything if the Vec has no memory allocated.

Vec Deallocation

  1. impl<T> Drop for Vec<T> {
  2. fn drop(&mut self) {
  3. if self.cap != 0 {
  4. while let Some(_) = self.pop {}
  5. let align = mem::align_of::<T>();
  6. let elem_size = me::size_of::<T>();
  7. let num_bytes = elem_size * self.cap;
  8. unsafe {
  9. heap::deallocate(*self.ptr as *mut _, num_bytes,
  10. align);
  11. }
  12. }
  13. }
  14. }

Vec Deref

  • How do you implementDeref & DerefMut for Vec?
  • For Vec<T>, these methods return an &[T]/&mut [T], respectively.
  • Basically, this boils down to calling ::std::slice::from_raw_parts(*self.ptr, self.len).

Vec Insert & Remove

  • Inserting into a Vec needs to shift all elements over to the right from the given index.
  • Fortunately, we have an easy way to do this using ptr::copy, which is like memmove from C.
  • ptr::copy copies some chunk of memory from here to there, and the source and destination may overlap.
  • Inserting at index i shifts [i .. len] to [i+1 .. len+1], using the old len.

Vec Insert & Remove

  1. pub fn insert(&mut self, index: usize, elem: T) {
  2. assert!(index <= self.len);
  3. if self.cap == self.len { self.grow(); }
  4. unsafe {
  5. if index < self.len {
  6. ptr::copy(self.ptr.offset(index as isize),
  7. self.ptr.offset(index as isize + 1),
  8. self.len - index);
  9. }
  10. ptr::write(self.ptr.offset(index as isize), elem);
  11. self.len += 1;
  12. }
  13. }

??? We’re allowed to insert at index == len, since that’s just push


Vec Insert & Remove

  • Removing elements by index just behaves in the opposite direction.
  • Shift all elements from [i+1 .. len+1] to [i .. len] using the new len.

Vec Insert & Remove

  1. pub fn remove(&mut self, index: usize) -> T {
  2. assert!(index < self.len);
  3. unsafe {
  4. self.len -= 1;
  5. let t = ptr::read(self.ptr.offset(index as usize));
  6. ptr::copy(self.ptr.offset(index as isize + 1),
  7. self.ptr.offset(index as isize),
  8. self.len - index);
  9. t
  10. }
  11. }

Rust FFI

  • FFI: Foreign Function Interface.
  • FFI means calling one language from another.
  • For us, this means:
    • Rust calling C
    • C calling Rust
    • Other languages (e.g. Python/Ruby) calling Rust
  • Why C? C is the de facto language used for FFI.
    • Often referred to as the programming lingua franca.

Calling C from Rust

  • Why?
    • Sometimes you need to call another library - e.g. OpenSSL - that would be too costly to reimplement in Rust.
    • Sometimes you need to interface with a particular language; many native programming languages can make C bindings.
  • Calling foreign functions from Rust is unsafe!
    • Because, of course, C is unsafe.

Calling C from Rust

  • Compile C to static libraries (.a/.lib).
    • cc -c -o foo.o foo.c
    • ar rcs libfoo.a foo.o
  • Or to dynamic libraries (.so/.dylib/.dll).
    • cc -c -fPIC -o foo.o foo.c
    • cc -shared -fPIC -o libfoo.so foo.o

Calling C from Rust

  • In C:
  1. int32_t foo() { return 10; }
  • In Rust:
  1. #[link(name = "foo", kind = "static")] // links libfoo.a.
  2. extern {
  3. fn foo() -> i32;
  4. }
  5. #[link(name = "foo")] // links libfoo.so.
  6. extern { // By default, this is also
  7. fn foo() -> i32; // statically linked.
  8. }
  • Calling foreign functions is unsafe:
  1. fn main() {
  2. println!("foo: {}", unsafe { foo() });
  3. }

Calling C from Rust

  • In C APIs, it’s common to use incomplete struct definitions to define types whose implementation shouldn’t be exposed to users.
    • These can’t be instantiated by the user.
  1. struct OpaqueThing;
  • To represent a type like this in Rust (for FFI), we want to make a type that we can’t instantiate! So we do this:
  1. enum OpaqueThing { }
  • Now, we can’t have one of these, but we can have a pointer to one: *mut OpaqueThing.
    • Pointers to opaque types are very common in C interfaces.

Calling Rust from C

  • Why?
    • Writing particularly dubiously-safe code in Rust.
    • Writing any part of a C program in a nicer language.

Rust from C: #[repr(C)], extern "C"

  • Rust has its own rules about memory layout and calling convention, which are different from C’s.
  • In order to call from C (or any other language!), we have to use C rules. (C doesn’t have generics, enums with fields, etc.)
  1. #[repr(C)]
  2. pub enum Color { Red = 1, Blue, Green = 5, Yellow }
  3. #[repr(C)]
  4. pub struct Bikeshed { height: f64, area: f64 }
  5. extern "C" pub fn paint(bs: Bikeshed, c: Color) { /* ... */ }
  • Use opaque structs to hide stuff that C can’t use:
  1. struct X<T>(T); // C doesn't have generics.
  2. #[repr(C)]
  3. pub struct Xi32(X<i32>); // This struct hides the type parameter.

Rust from C: Static Linking

  • Compile Rust to static libraries (.a) and link at build time.
    • Compile to target/release/libfoo.a.
  • Cargo.toml:
  1. [lib]
  2. crate-type = ["staticlib"]
  • $ cc -Ltarget/release -lfoo -o main main.c
  1. #include <stdint.h>
  2. int32_t foo(); // Function prototype for Rust function
  3. int main() {
  4. printf("foo() -> %d\n", foo());
  5. }

Rust from C: Dynamic Linking

  • Compile Rust to dynamic libraries (.so) and load at runtime.
    • Compile to target/release/libfoo.so.
  • Cargo.toml:
  1. [lib]
  2. crate-type = ["dylib"]
  • In C:
  1. #include <dlfcn.h>
  2. int main() {
  3. void *handle = dlopen("target/release/libfoo.so", RTLD_LAZY);
  4. int32_t (*foo)() = dlsym(handle, "foo"); // get function ptr
  5. // plus error checking
  6. printf("foo() -> %d\n", foo());
  7. dlclose(handle);
  8. }

Calling Rust from (e.g.) Python

  • Why?
    • Many languages (Python, Ruby, etc.) are extremely slow compared with C and Rust.
    • Traditionally, high-performance functions are written in C or C++, and called from scripting languages.
    • Rust is an ideal alternative high-performance language.
    • Great for data processing, scientific computing, multithreaded code, etc.

Rust from Python

  • Compile Rust to a dynamic library (.so/.dylib/.dll).
  • In Python:
  1. import ctypes
  2. libfoo = ctypes.CDLL("target/release/libfoo.so")
  3. print("foo() -> {}".format(libfoo.foo()))

Rust from Ruby

  • Compile Rust to a dynamic library (.so/.dylib/.dll).
  • In Ruby:
  1. require 'fiddle'
  2. require 'fiddle/import'
  3. module Foo
  4. extend Fiddle::Importer
  5. dlload 'target/release/libfoo.so'
  6. extern 'int foo()'
  7. end
  8. puts Foo.foo()

rust-bindgen

  • Generates Rust bindings for a C header.
    • Install via cargo install bindgen
  1. typedef char my_bool;
  2. typedef struct st_mysql_field { char *name; /* */ } MYSQL_FIELD;
  3. my_bool STDCALL mysql_thread_init(void);
  • $ bindgen -l mysql -match mysql.h -o mysql.rs /usr/include/mysql/mysql.h
  1. pub type my_bool = ::std::os::raw::c_char;
  2. #[repr(C)] #[derive(Copy)] pub struct Struct_st_mysql_field {
  3. pub name: *mut ::std::os::raw::c_char, /* */ }
  4. impl ::std::clone::Clone for Struct_st_mysql_field { /* */ }
  5. impl ::std::default::Default for Struct_st_mysql_field { /* */ }
  6. pub type MYSQL_FIELD = Struct_st_mysql_field;
  7. #[link(name = "mysql")]
  8. extern "C" { pub fn mysql_thread_init() -> my_bool; }

rusty-cheddar

  • Generates C bindings (header) for Rust code.

    • Generation is done in the build.rs script.
  • Only C-like enums (no fields).

  • No diverging (-> !) functions.

¹ disclaimer: we haven’t tried this.