Refactoring robotfindskitten

Refactoring robotfindskitten

This section details the refactoring script used to transform the initial Rusttranslation of robotfindskitten, as generated by c2rust transpile, into asafe Rust program. We divide the refactoring process into several major steps:

ncurses macro cleanup: The ncurses library implements parts of its APIusing C preprocessor macros, and a few of those macros expand to relativelycomplex code. We replace these expanded macro bodies with calls toequivalent functions, which are easier to recognize and refactor.
String formatting: robotfindskitten calls several printf-stylestring-formatting functions. We replace these unsafe variable-argumentfunction calls with safe wrappers using Rust's format family of macros.Aside from improving memory safety, this also allows the Rust compiler tomore accurately typecheck the format arguments, which is helpful for latertype-directed refactoring passes.
Static string constants: robotfindskitten has two global variablescontaining string constants, which are translated to Rust as static mutdefinitions containing C-style *const c_char pointers. We refactor toremove both sources of unsafety, replacing raw pointers with checked&'static str references and converting the mutable statics to immutableones.
Heap allocations: robotfindskitten uses a heap allocated array to trackthe objects in the game world. This array is represented as a raw pointer,and the underlying storage is managed explicitly with malloc and free.We replace the array with a memory-safe collection type, avoiding unsafe FFIcalls and preventing out-of-bounds memory accesses.
Using the pancurses library: Calling ncurses library functions directlythrough the Rust FFI requires unsafe code at every call site. We replaceunsafe ncurses function calls with calls to the safe wrappers provided bythe pancurses crate.
Moving global state to the stack: robotfindskitten uses mutable globalvariables to store the game state, which turn into unsafe static mutdefinitions in Rust. We collect all such variables into a singlestack-allocated struct, which can be mutated without unsafety.
libc calls: We replace calls to miscellaneous libc functions, such assleep and rand, with calls to safe Rust equivalents.
Function argument types: Two remaining functions in robotfindskitten takeraw pointers as arguments. We change each function's signature to use onlysafe Rust types, and update their callers to match.
String conversion cleanup: Several of the previous refactoring passes insertconversions between Rust and C string types. In several places, theseconversions form cycles, such as &str -> *const c_char -> &str, which areboth redundant and a source of unsafe code. We remove such conversioncycles to avoid unnecessary raw pointer manipulation.
Final cleanup: At this point, we have removed all the unsafe code we can.Only a few cleanup steps remain, such as removing unused unsafe qualifiersand deleting unused extern "C" definitions. In the end, we are left witha correct Rust translation of robotfindskitten that contains only a singleline of unsafe code.

ncurses macro cleanup

robotfindskitten uses a variety of macros provided by the ncurses library.Since c2rust transpile runs the C preprocessor before translating to Rust,the expansions of those macros effectively get inlined in the Rust code at eachcall site. In many cases, this is harmless: for example, move(y, x) expandsto wmove(stdscr, y, x), which is not much harder to refactor than theoriginal. However, the attr_get and attrset macros are more complex: theyexpand to multiple lines of code involving several conditionals and complexexpressions. In this step, we convert the expanded code into simple functioncalls, which are easier to manipulate in later refactoring passes.

Fortunately, the ncurses library provides functions implementing the sameoperations as the troublesome macros, and we can call those functions throughRust's FFI. We begin by providing Rust declarations for these foreignfunctions. For ease of reading, we put the new declarations just after theexisting extern "C" block:

select target 'crate; child(foreign_mod); last;' ;
create_item
    '
        extern "C" {
            fn wattr_get(win: *mut WINDOW, attrs: *mut attr_t,
                pair: *mut libc::c_short, opts: *mut libc::c_void) -> libc::c_int;
            fn wattrset(win: *mut WINDOW, attrs: libc::c_int) -> libc::c_int;
        }
    '
    after ;

Diff #1

Now we can use rewrite_expr to find Rust code that comes from the expansionsof the wattrset macro and replace it with calls to the wattrset function:

rewrite_expr
    '
        if !(__win as *const libc::c_void).is_null() {
            (*__win)._attrs = __attrs
        } else {
        }
    '
    'wattrset(__win, __attrs as libc::c_int)' ;

Diff #2

The win and attrs metavariables in the pattern correspond to thearguments of the original C macro, and are used in the replacement to constructthe equivalent Rust function call.

Next, we do the same thing for the more complicated wattr_get macro:

rewrite_expr
    '
        if !(__win as *const libc::c_void).is_null() {
            if !(&mut __attrs as *mut attr_t as *const libc::c_void).is_null() {
                __attrs = (*__win)._attrs
            } else {
            };
            if !(&mut __pair as *mut libc::c_short as *const libc::c_void).is_null() {
                __pair = (((*__win)._attrs as libc::c_ulong
                    & ((1u32 << 8i32).wrapping_sub(1u32) << 0i32 + 8i32) as libc::c_ulong)
                    >> 8i32) as libc::c_int as libc::c_short
            } else {
            };
        } else {
        }
    '
    'wattr_get(__win, &mut __attrs, &mut __pair, ::std::ptr::null_mut())' ;

Finally, we are done with this bit of cleanup, so we write the changes to diskbefore continuing on:

commit ;

Diff #4

String formatting

robotfindskitten calls several printf-style variable-argument functions toperform string formatting. Since variable-argument function calls areconsidered unsafe in Rust, we must replace these with Rust-style stringformatting using format! and related macros. Specifically, for each stringformatting function such as printf, we will create a safe wrapperfmt_printf that takes a Rust fmt::Arguments object, and replaceprintf(…) calls with fmt_printf(format_args!(…)). This approachisolates all the unsafety into the fmt_printf wrapper, where it can beeliminated by later passes.

The replacement itself happens in two steps. First, we convert printf callsfrom printf(<C format args…>) to printf(format_args!(<Rust format args…>)). Note that the code does not typecheck in this intermediate state:C's printf function cannot accept the std::fmt::Arguments produced by theformat_args! macro. The second step then replaces the printf call with acall to the fmt_printf wrapper, which does accept std::fmt::Arguments.

printf format argument conversion

We run a few commands to mark the nodes involved in string formatting, beforefinally running the convert_format_args command to perform the actualtransformation.

First, we use select and mark_arg_uses to mark the first argument of everyprintf call as targets:

select target 'item(printf);' ;
mark_arg_uses 0 target ;

Diff #5

convert_format_args will treat the target argument at each call site as aprintf-style format string, and will treat all later arguments as formatargs.

Next, we mark the format string literal with fmt_str, which tellsconvert_format_args the exact string literal it should use as the formatstring. This usually is not the same as the target argument, sincec2rust-transpile inserts several casts to turn a Rust string literal into a*const libc::c_char.

select fmt_str 'marked(target); desc(expr && !match_expr(__e as __t));' ;

Diff #6

With both target and fmt_str marks in place, we can apply the actualtransformation:

convert_format_args ;

Diff #7

Finally, we clean up from this step by clearing all the marks.

clear_marks ;

commit would alsoclear the marks, but we don't want to commit these changes until we've fixedthe type errors introduced in this step.

Creating a printf wrapper

As a reminder, we currently have code that looks like this:

printf(format_args!("Hello, {}!\n", "world"))

printf itself can't accept the std::fmt::Arguments returned byformat_args!, so we will define a wrapper that does acceptstd::fmt::Arguments and then rewrite these printf calls to call the wrapperinstead.

First, we insert the wrapper:

select target 'crate; child(foreign_mod); last;' ;
create_item
    '
        fn fmt_printf(args: ::std::fmt::Arguments) -> libc::c_int {
            print!("{}", args);
            0
        }
    '
    after ;

Diff #9

Since Rust provides a print! macro with similar functionality to printf,our "wrapper" actually just calls print! directly, avoiding the stringconversions necessary to call the actual C printf. (See the next subsectionfor an example of a "real" wrapper function.)

With the wrapper in place, we can now update the call sites:

rewrite_expr 'printf' 'fmt_printf' ;

Diff #10

Now that we've finished this step and the crate typechecks again, we can safelycommit the changes:

commit ;

Diff #11

Other string formatting functions

Aside from printf, robotfindskitten also uses the ncurses printw andmvprintw string-formatting functions. The refactoring script for printw issimilar to the previous two steps combined:

select target 'item(printw);' ;
mark_arg_uses 0 target ;
select fmt_str 'marked(target); desc(expr && !match_expr(__e as __t));' ;
convert_format_args ;
clear_marks ;
select target 'crate; child(foreign_mod); last;' ;
create_item
    '
        fn fmt_printw(args: ::std::fmt::Arguments) -> libc::c_int {
            unsafe {
                ::printw(b"%s\0" as *const u8 as *const libc::c_char,
                         ::std::ffi::CString::new(format!("{}", args))
                             .unwrap().as_ptr())
            }
        }
    '
    after ;
rewrite_expr 'printw' 'fmt_printw' ;
commit ;

Diff #12

Aside from replacing the name printf with printw, the other notabledifference from the printf script is the body of fmt_printw. There is noconvenient replacement for printw in the Rust standard library, so instead wecall the original printw function, passing in the result of Rust stringformatting (converted to a C string) as an argument.

The mvprintw replacement is also similar, just with a few extra arguments:

select target 'item(mvprintw);' ;
mark_arg_uses 2 target ;
select fmt_str 'marked(target); desc(expr && !match_expr(__e as __t));' ;
convert_format_args ;
clear_marks ;
select target 'crate; child(foreign_mod); last;' ;
create_item
    '
        fn fmt_mvprintw(y: libc::c_int, x: libc::c_int,
                        args: ::std::fmt::Arguments) -> libc::c_int {
            unsafe {
                ::mvprintw(y, x, b"%s\0" as *const u8 as *const libc::c_char,
                         ::std::ffi::CString::new(format!("{}", args))
                             .unwrap().as_ptr())
            }
        }
    '
    after ;
rewrite_expr 'mvprintw' 'fmt_mvprintw' ;
commit ;

Diff #13

Static string constant - ver

robotfindskitten defines a static string constant, ver, to store the game'sversion. Using ver is currently unsafe, first because its Rust type is a rawpointer (*mut c_char), and second because it's mutable. To make ver usagesafe, we first change its type to &'static str (and fix up the resulting typeerrors), and then we change it from a static mut to an ordinary immutablestatic. Note that we must change the type first because Rust does not allowraw pointers to be stored in safe (non-mut) statics.

We change the type using rewrite_ty:

select target 'item(ver); child(ty);' ;
rewrite_ty 'marked!(*mut libc::c_char)' "&'static str" ;
delete_marks target ;

Diff #14

The combination of select and the marked! matching form ensures that onlyver's type annotation is modified. We delete the mark afterward, since it'sno longer needed.

Simply replacing mut c_char with &str introduces type errors throughoutthe crate. The initializer for ver still has type mut c_char, and alluses of ver are still expecting a *mut c_char.

Fixing ver's initializer

Fixing the ver initializer is straightforward: we simply remove all thecasts, then convert the binary string (&[u8]) literal to an ordinary stringliteral. For the casts, we mark all cast expressions in ver's definition,then replace each one with its subexpression:

select target 'item(ver); desc(match_expr(__e as __t));' ;
rewrite_expr 'marked!(__e as __t)' '__e' ;
delete_marks target ;

Diff #15

Only the binary string literal remains, so we mark it and change it to anordinary str:

select target 'item(ver); child(expr);' ;
bytestr_to_str ;
delete_marks target ;

Diff #16

Fixing ver's uses

ver's initializer is now well-typed, but its uses are still expecting a *mut c_char instead of a &str. To fix these up, we use the type_fix_rulescommand, which rewrites expressions anywhere a type error occurs:

type_fix_rules '*, &str, *const __t => __old.as_ptr()' ;

Diff #17

Here we run type_fix_rules with only one rule: in any position (), if anexpression has type &str but is expected to have a raw pointer type (const __t), then wrap the original expression in a call to .as_ptr(). This turnsout to be enough to fix all the errors at uses of ver.

Making ver immutable

Now that all type errors have been corrected, we can finish our refactoring ofver. We make it immutable, then commit the changes.

select target 'item(ver);' ;
set_mutability imm ;
commit ;

Diff #18

Static string array - messages

Aside from ver, robotfindskitten contains a static array of strings, calledmessages. Like ver, accessing messages is unsafe because each element isa raw *mut c_char pointer and because messages itself is a static mut.

We rewrite the type and initializer of messages using the same strategy asfor ver:

select target 'item(messages); child(ty); desc(ty);' ;
rewrite_ty 'marked!(*mut libc::c_char)' "&'static str" ;
delete_marks target ;
select target 'item(messages); child(expr); desc(expr);' ;
rewrite_expr 'marked!(__e as __t)' '__e' ;
bytestr_to_str ;
delete_marks target ;

Diff #19

We use type_fix_rules to fix up the uses of messages, as we did for ver:

type_fix_rules
    '*, &str, *const __t => __old.as_ptr()'
    '*, &str, *mut __t => __old.as_ptr() as *mut __t' ;

Diff #20

Here we needed a second rule for mut pointers, similar to the one forconst, because robotfindskitten mistakenly declares messages as an arrayof char instead of const char.

With all type errors fixed, we can make messages immutable and commit thechanges:

select target 'item(messages);' ;
set_mutability imm ;
commit ;

Diff #21

Heap allocations

The screen variable stores a heap-allocated two-dimensional array,represented in C as an int*. In Rust, this becomes mut *mut c_int, whichis unsafe to access. We replace it with CArray<CArray<c_int>>, whereCArray is a memory-safe collection type provided by the c2rust_runtimelibrary. CArray is convenient for this purpose because it supports C-styleinitialization and access patterns (including pointer arithmetic) while stillguaranteeing memory safety.

We actually perform the conversion from mut to CArray in two steps.First, we replace mut with the simpler CBlockPtr type, also defined inc2rust_runtime. CBlockPtr provides some limited bounds checking, butotherwise functions much like a raw pointer. It serves as a usefulintermediate step, letting us fix up the differences between the raw-pointerand CArray APIs in two stages instead of attempting to do it all at once.Once screen has been fully converted to CBlockPtr<CBlockPtr<c_int>>, wefinish the conversion to CArray in the second step.

As a preliminary, we need to add an import of the c2rust_runtime library:

select target 'crate;' ;
create_item 'extern crate c2rust_runtime;' inside ;

Diff #22

Now we can proceed with the actual refactoring.

Converting to CBlockPtr

We further break down the transition from mut mut c_int toCBlockPtr<CBlockPtr<c_int>> into two steps, first converting the innerpointer (leaving the overall type as *mut CBlockPtr<c_int>) and then theouter. We change the type annotation first, as we did for var andmessages:

select target 'item(screen); child(ty);' ;
rewrite_ty 'marked!(*mut *mut __t)'
    '*mut ::c2rust_runtime::CBlockPtr<__t>' ;

Diff #23

This introduces type errors, letting us easily find (and fix) relatedexpressions using type_fix_rules:

type_fix_rules
    'rval, *mut __t, ::c2rust_runtime::CBlockPtr<__u> =>
        unsafe { ::c2rust_runtime::CBlockPtr::from_ptr(__old) }'
    'rval, *mut __t, *mut __u => __old as *mut __u'
    ;

Diff #24

The first rule provided here handles the later part of screen'sinitialization, where the program allocates a mut c_int array (nowCBlockPtr<c_int>) for each row of the screen. The second rule handles theearlier part, where it allocates the top-level mut mut c_int (now mut CBlockPtr<c_int>). Both allocations now need a cast, since the type of therows has changed.

One category of type errors remains: the initialization code tries todereference the result of offsetting the array pointer, which is not possibledirectly with the CBlockPtr API. We add the necessary method call usingrewrite_expr:

rewrite_expr
    '*typed!(__e, ::c2rust_runtime::block_ptr::CBlockOffset<__t>)'
    '*__e.as_mut()' ;

Diff #25

Here, the pattern filters for dereferences of CBlockOffset expressions, whichresult from calling offset on a CBlockPtr, and adds a call to as_mut()before the dereference.

The conversion of screen to *mut CBlockPtr<c_int> is now complete. Theconversion to CBlockPtr<CBlockPtr<c_int>> uses a similar refactoring script:

select target 'crate; item(screen); child(ty);' ;
rewrite_ty 'marked!(*mut __t)'
    '::c2rust_runtime::CBlockPtr<__t>' ;
type_fix_rules
    'rval, *mut __t, ::c2rust_runtime::CBlockPtr<__u> =>
        unsafe { ::c2rust_runtime::CBlockPtr::from_ptr(__old) }'
    'rval, *mut __t, *mut __u => __old as *mut __u'
    ;
rewrite_expr
    '*typed!(__e, ::c2rust_runtime::block_ptr::CBlockOffset<__t>)'
    '*__e.as_mut()' ;

Diff #26

The only change is in the rewrite_ty step.

There's one last bit of cleanup to perform: now that screen has the desiredCBlockPtr<CBlockPtr<c_int>> type, we can rewrite the allocations thatinitialize it. At this point the allocations use the unsafe malloc functionfollowed by the unsafe CBlockPtr::from_ptr, but we can change that to use thesafe CBlockPtr::alloc method instead:

rewrite_expr 'malloc(__e) as *mut __t as *mut __u' 'malloc(__e) as *mut __u' ;
rewrite_expr
    '::c2rust_runtime::CBlockPtr::from_ptr(malloc(__e) as *mut __t)'
    '::c2rust_runtime::CBlockPtr::alloc(
        __e as usize / ::std::mem::size_of::<__t>())'
    ;

Diff #27

This doesn't remove the unsafe blocks wrapping each allocation - we leavethose until the end of our refactoring, when we remove unnecessary unsafeblocks throughout the entire crate at once.

At this point, the refactoring of screen to isdone, and we can commit the changes:

commit ;

Diff #28

Converting to CArray

The CArray and CBlockPtr APIs are deliberately quite similar, which makesthis part of the screen refactoring fairly straightforward.

First, we replace all uses of CBlockPtr with CArray, both in types and infunction calls:

rewrite_ty '::c2rust_runtime::CBlockPtr<__t>' '::c2rust_runtime::CArray<__t>' ;
rewrite_expr
    '::c2rust_runtime::CBlockPtr::from_ptr'
    '::c2rust_runtime::CArray::from_ptr' ;
rewrite_expr
    '::c2rust_runtime::CBlockPtr::alloc'
    '::c2rust_runtime::CArray::alloc' ;

Diff #29

Next, we fix up calls to offset. Unlike CBlockPtr (and raw pointers ingeneral), CArray distinguishes between mutable and immutable offset pointers.We handle this by simply replacing all offset calls with offset_mut:

rewrite_expr
    'typed!(__e, ::c2rust_runtime::CArray<__t>).offset(__f)'
    '__e.offset_mut(__f)' ;

Diff #30

This works fine for robotfindskitten, though in other codebases it may benecessary to properly distinguish mutable and immutable uses of offset.

With this change, the code typechecks with screens new memory-safe type, sowe could stop here. However, unlike CBlockPtr, CArray supports arrayindexing - ptr[i] - in place of the convoluted *arr.offset(i).as_mut()syntax. So we perform a simple rewrite to make the code a little easier toread:

rewrite_expr
    'typed!(__e, ::c2rust_runtime::CArray<__t>).offset_mut(__f).as_mut()'
    '&mut __e[__f as usize]' ;
rewrite_expr '*&mut __e' '__e' ;

Diff #31

Finally, we remove unsafety from screen's static initializer. It currentlycalls CArray::fromptr(0 as *mut ), which is unsafe becauseCArray::from_ptr requires its pointer argument to must satisfy certainproperties. But CArray also provides a safe method specifically forinitializing a CArray to null, which we can use instead:

rewrite_expr
    '::c2rust_runtime::CArray::from_ptr(cast!(0))'
    '::c2rust_runtime::CArray::empty()' ;

Diff #32

This completes the refactoring of screen, as all raw pointer manipulationshave been replaced with safe CArray method calls. The only remainingunsafety arises from the fact that screen is a static mut, which we addressin a later refactoring step.

commit ;

Using the pancurses library

The pancurses library provides safe wrappers around ncurses APIs. Sincethe pancurses and ncurses APIs are so similar, we can automatically convertthe unsafe ncurses FFI calls in robotfindskitten to safe pancurses calls,avoiding the need to maintain safe wrappers in robotfindskitten itself.

There are two preliminary steps before we do the actual conversion. First, wemust import the pancurses library:

select target 'crate;' ;
create_item 'extern crate pancurses;' inside ;

Diff #33

And second, we must create a global variable to store the main pancursesWindow:

select target 'crate;' ;
create_item 'static mut win: Option<::pancurses::Window> = None;' inside ;

Diff #34

pancurses doesn't have an equivalent of the global stdscr window thatncurses provides. Instead, the pancurses initialization function createsan initial Window object that must be passed around to each function thatupdates the display. We store that initial Window in the global winvariable so that it's accessible everywhere that stdscr is used.

Note that making win a static mut makes it unsafe to access. However, alater refactoring pass will gather up all static muts, including win,and collect them into a stack-allocated struct, at which point accessing winwill no longer be unsafe.

General library calls

We convert ncurses library calls to pancurses ones in a few stages.

First, for functions that don't require a window object, we simply replace eachncurses function with its equivalent in the pancurses library:

rewrite_expr 'nonl' '::pancurses::nonl' ;
rewrite_expr 'noecho' '::pancurses::noecho' ;
rewrite_expr 'cbreak' '::pancurses::cbreak' ;
rewrite_expr 'has_colors' '::pancurses::has_colors' ;
rewrite_expr 'start_color' '::pancurses::start_color' ;
rewrite_expr 'endwin' '::pancurses::endwin' ;
rewrite_expr 'init_pair' '::pancurses::init_pair' ;

Diff #35

Next, functions taking a window are replaced with method calls on the staticwin variable we defined earlier:

rewrite_expr 'wrefresh(stdscr)' 'win.refresh()' ;
rewrite_expr 'wrefresh(curscr)' 'win.refresh()' ;
rewrite_expr 'keypad(stdscr, __bf)' 'win.keypad(__bf)' ;
rewrite_expr 'wmove(stdscr, __my, __mx)' 'win.mv(__my, __mx)' ;
rewrite_expr 'wclear(stdscr)' 'win.clear()' ;
rewrite_expr 'wclrtoeol(stdscr)' 'win.clrtoeol()' ;
rewrite_expr 'waddch(stdscr, __ch)' 'win.addch(__ch)' ;
rewrite_expr
    'wattr_get(stdscr, __attrs, __pair, __e)'
    '{
        let tmp = win.attrget();
        *__attrs = tmp.0;
        *__pair = tmp.1;
        0
    }' ;
rewrite_expr
    'wattrset(stdscr, __attrs)'
    'win.attrset(__attrs as ::pancurses::chtype)' ;

Diff #36

For simplicity, we write win.f(…) in the rewrite_expr replacementarguments, even though win is actually an Option<Window>, not a Window.Later, we replace win with win.as_ref().unwrap() throughout the crate tocorrect the resulting type errors.

We next replace some ncurses global variables with calls to correspondingpancurses functions:

rewrite_expr 'LINES' 'win.get_max_y()' ;
rewrite_expr 'COLS' 'win.get_max_x()' ;

Diff #37

Finally, we handle a few special cases.

waddnstr takes a string argument, which in general could be any *const c_char. However, robotfindskitten calls it only on string literals, whichlets us perform a more specialized rewrite that avoids unsafe C stringconversions:

rewrite_expr
    'waddnstr(stdscr, __str as *const u8 as *const libc::c_char, __n)'
    "win.addnstr(::std::str::from_utf8(__str).unwrap().trim_end_matches('\0'),
                 __n as usize)" ;

Diff #38

intrflush has no pancurses equivalent, so we replace it with a no-op of thesame type:

rewrite_expr 'intrflush(__e, __f)' '0' ;

Diff #39

That covers all of the "ordinary" ncurses functions used inrobotfindskitten. The remaining subsections cover the more complex cases.

String formatting

We previously replaced calls to the ncurses printw and mvprintwstring-formatting functions with code using Rust's safe string formattingmacros. This removes unsafety from the call site, but uses wrapper functions(fmt_printw and fmt_mvprintw) that call unsafe code internally. But nowthat we are using the pancurses library, we can replace those wrappers withsafer equivalents.

select target 'item(fmt_printw);' ;
create_item '
    fn fmt_printw(args: ::std::fmt::Arguments) -> libc::c_int {
        unsafe {
            win.printw(&format!("{}", args))
        }
    }
' after ;
delete_items ;
clear_marks ;
select target 'item(fmt_mvprintw);' ;
create_item '
    fn fmt_mvprintw(y: libc::c_int, x: libc::c_int,
                    args: ::std::fmt::Arguments) -> libc::c_int {
        unsafe {
            win.mvprintw(y, x, &format!("{}", args))
        }
    }
' after ;
delete_items ;
clear_marks ;

Diff #40

The wrappers still use unsafe code to access win, a static mut, but nolonger make FFI calls or manipulate raw C strings. When we later remove allstatic muts from the program, these functions will become entirely safe.

Input handling

Adapting ncurses-based input handling to use pancurses requires some extracare. The pancurses getch function returns a Rust enum, while thencurses version simply returns an integer. robotfindskitten matches thoseintegers against various ncurses keycode constants, which, after macroexpansion, become integer literals in the Rust code.

The more idiomatic approach would be to replace each integer literal with thematching pancurses::Input enum variant when switching from ncurses getchto the pancurses version. However, we instead take the easier approach ofconverting pancurses::Input values back to ncurses integer keycodes, sothe existing robotfindskitten input handling code can remain unchanged.

First, we inject a translation function from pancurses to ncurses keycodes:

select target 'item(initialize_ncurses);' ;
create_item '
    fn encode_input(inp: Option<::pancurses::Input>) -> libc::c_int {
        use ::pancurses::Input::*;
        let inp = match inp {
            Some(x) => x,
            None => return -1,
        };
        match inp {
            // TODO: unicode inputs in the range 256 .. 512 can
            // collide with ncurses special keycodes
            Character(c) => c as u32 as libc::c_int,
            Unknown(i) => i,
            special => {
                let idx = ::pancurses::SPECIAL_KEY_CODES.iter()
                    .position(|&k| k == special).unwrap();
                let code = idx as i32 + ::pancurses::KEY_OFFSET;
                if code > ::pancurses::KEY_F15 {
                    code + 48
                } else {
                    code
                }
            },
        }
    }
' after ;

Diff #41

Then, we translate ncurses wgetch calls to use the pancurses getchmethod, wrapping the result in encode_input to keep the results unchanged.

rewrite_expr 'wgetch(stdscr)' '::encode_input(win.getch())' ;

Diff #42

Final steps

As mentioned previously, we use win to obtain the current window objectthroughout the ncurses refactoring process, even though win is actually anOption<Window>, not a Window. Now that we are done with all the rewrites,we can update thote uses to access the Window properly:

rewrite_expr 'win' 'win.as_ref().unwrap()' ;

Diff #43

The final step is to initialize win. This corresponds to the call to thencurses initscr initialization function:

rewrite_expr 'initscr()' 'win = Some(::pancurses::initscr())' ;

Diff #44

We save this for last only so that the win to win.as_ref().unwrap() rewritedoesn't produce an erroneous assignment win.as_ref().unwrap() = ….

At this point, we are done with the current refactoring step:robotfindskitten has been fully adapted to use the safe pancurses API inplace of raw ncurses FFI calls.

commit

Diff #45

Moving global state to the stack

robotfindskitten uses global variables - static muts in Rust - to store thegame state. Accessing these globals is unsafe, due to the difficulty ofpreventing simultaneous borrowing and mutation. In this refactoring step, wemove the global state onto the stack and pass it by reference to every functionthat needs it, which allows the borrow checker to analyze its usage and ensuresafety.

Most of the work in this step is handled by the static_to_local_refrefactoring command. This command identifies all functions that use a givenstatic, and modifies those functions to access the global through a reference(passed as an argument to the function) instead of accessing it directly. (Seethe static_to_local_ref command documentation for examples.)

However, running static_to_local_ref separately on each ofrobotfindskitten's seven global variables would add up to seven new argumentsto many of robotfindskitten's functions, making their signatures difficult toread. Instead, we proceed in two steps. First, we gather up all the globalvariables into a single global struct. Then, we run static_to_local_ref onjust the struct, achieving safety while adding only a single new argument toeach affected function.

We collect the statics into a struct using static_collect_to_struct:

select target 'crate; child(static && mut);' ;
static_collect_to_struct State S

Diff #46

Then we run static_to_local_ref to pass a reference to the new State objecteverywhere it is used:

select target 'crate; child(static && name("S"));' ;
select user 'crate; desc(fn && !name("main|main_0"));' ;
static_to_local_ref ;

Diff #47

The functions that previously accessed the global S now use a referenceargument S_, removing a source of unsafety.

The only function that still accesses S directly is main_0. And sincemain_0 is called only once per run of the program, we can replace the globalS with a local variable declared inside main_0 without affecting thebehavior of the program. The static_to_local command performs the necessarytransformation (using the marks we previous set up for static_to_local_ref):

static_to_local

Diff #48

Now there are no static muts remaining in the program.

There is one final cleanup step to perform. The struct State appears in thesignature of several public functions, but State itself is not public, sorustc reports an error. We could make State public, but since there is noreason for the functions in question to be public in the first place, we makethe functions private instead:

select target 'crate; desc(fn && !name("main"));' ;
set_visibility '' ;
commit

Diff #49

libc calls

robotfindskitten makes a number of calls to libc functions, such as sleepand rand, using the FFI. Rust's standard library provides most of the samefunctionality, so we can replace these libc calls with safe equivalents.

We replace sleep with std::thread::sleep:

rewrite_expr 'sleep(__e)'
    '::std::thread::sleep(
        ::std::time::Duration::from_secs(__e as u64))' ;

Diff #50

We replace atoi with a call to from_str:

rewrite_expr 'atoi(__e)'
    '<libc::c_int as ::std::str::FromStr>::from_str(
        ::std::ffi::CStr::from_ptr(__e).to_str().unwrap()).unwrap()' ;

Diff #51

In the version of glibc we used for translating robotfindskitten, atoi isactually provided as an inline wrapper function in the libc headers. Thatmeans the Rust translation of robotfindskitten actually includes a fulldefinition of fn atoi(…) { … }. Now that we've replaced the atoi call,we can delete the definition as well:

select target 'item(atoi);' ;
delete_items ;
clear_marks ;

Diff #52

We replace exit with std::process::exit:

rewrite_expr 'exit(__e)' '::std::process::exit(__e as i32)' ;

Diff #53

For rand, no equivalent is available in the Rust standard library. Instead,we import the rand crate from crates.io:

select target 'crate;' ;
create_item 'extern crate rand;' inside ;
clear_marks ;

Diff #54

robotfindskitten uses the common srand(time()) pattern to initialize therandom number generator, suggesting it does not rely on the ability to controlor reuse seeds. That means we can use the thread-local RNG provided by therand crate, instead of explicitly constructing an RNG with a specific seed.So we replace rand with calls to rand::random:

rewrite_expr 'rand()'
    '(::rand::random::<libc::c_uint>() >> 1) as libc::c_int' ;

Diff #55

And we delete srand calls entirely, relying on the rand crate's automaticinitialization of the thread-local RNG:

rewrite_expr 'srand(__e)' '()' ;

Diff #56

At this point, the only remaining FFI call is to signal. robotfindskittensets up a SIGINT handler to ensure that ncurses (now pancurses) is shutdown properly and the terminal is returned to a normal state when the userterminates the program with ^C. Unfortunately, there is no general way tomake signal handling safe: to achieve memory safety, signal handling functionsmust obey a number of special restrictions above and beyond Rust's normalnotions of safety, and these properties cannot be checked by the Rust compiler.

We therefore leave the call to signal as unsafe code. Since this will be theonly unsafe operation in the program once we finish refactoring, we wrap it inits own unsafe block:

rewrite_expr 'signal(__e, __f)' 'unsafe { signal(__e, __f) }' ;

Diff #57

We've now covered all of the libc functions used by robotfindskitten, andreplaced nearly all of them with safe code.

commit

Function argument types

Two functions in robotfindskitten accept raw pointers: message takes apointer to a string to display on the screen, and main_0 takes an array ofstring pointers argv containing the program's command line arguments. Tomake these functions safe, we must replace their raw pointer arguments withsafe equivalents.

message

We begin with message because it is simpler. This function takes a singleargument of type *mut c_char, which we want to replace with &str:

select target
    'item(message); child(arg); child(match_ty(*mut libc::c_char));' ;
rewrite_ty 'marked!(*mut libc::c_char)' '&str' ;
delete_marks target ;

Diff #59

Of course, simply changing the type annotation is not sufficient. Like when weretyped the ver and messages constants, this change has introduced twokinds of type errors: callers of message still pass mut c_char where&str is now expected, and the body of message still uses the message_0: &strargument in contexts that require a mut c_char. We fix these usingtype_fix_rules:

type_fix_rules
    '*, *mut __t, &str =>
        ::std::ffi::CStr::from_ptr(__old).to_str().unwrap()'
    '*, &str, *const __t =>
        ::std::ffi::CString::new(__old.to_owned()).unwrap().as_ptr()'
    ;

Diff #60

The first rule handles callers of message, using CStr methods to converttheir mut c_char raw pointers into safe &str references. The secondhandles errors in the body of message, using CString to convert &strsback into const c_char. Note we must use CString instead of CStr in thesecond rule because an allocation is required: a &str is not guaranteed toend with a null terminator, so CString must copy it into a larger buffer andadd the null terminator to produce a valid *const c_char string pointer.Since the CString is temporary, it will be deallocated at the end of thecontaining expression, but this is good enough for the code we encounter insideof message. More complex string manipulation, however, would likely requirea different refactoring approach.

main_0

The Rust function main_0 is the translation of the C main function ofrobotfindskitten. The Rust main is a c2rust-generated wrapper thathandles the differences between C's main signature and Rust's before invokingmain_0.

As in the message case, we wish to replace the unsafe pointer types inmain_0's argument list with safe equivalents. However, in this case ourchoice of safe reference type is more constrained. main_0 callsargv.offset to access the individual command-line arguments, so we must useCArray (which supports such access patterns) for the outer pointer. For theinner pointer, we use Option<&CStr>: CStr supports the conversions we willneed to perform in main and main_0, and Option<&CStr> can be safelyzero-initialized, which is required by CArray.

We begin, as with message, by rewriting the argument type:

select target
    'item(main_0); child(arg && name("argv")); child(ty);' ;
rewrite_ty 'marked!(*mut *mut libc::c_char)'
    '::c2rust_runtime::CArray<Option<&::std::ffi::CStr>>' ;
delete_marks target ;

Diff #61

Next, we fix type errors in main, which is the only caller of main_0.Since c2rust always generates the same main wrapper function, rather thanrefactor it, we can simply replace it entirely with a new version that iscompatible with main_0's new signature:

select target 'item(main);' ;
create_item '
    fn main() {
        // Collect argv into a vector.
        let mut args_owned: Vec<::std::ffi::CString> = Vec::new();
        for arg in ::std::env::args() {
            args_owned.push(::std::ffi::CString::new(arg).unwrap());
        }
        // Now that the length is known, we can build a CArray.
        let mut args: ::c2rust_runtime::CArray<Option<&::std::ffi::CStr>> =
            ::c2rust_runtime::CArray::alloc(args_owned.len() + 1);
        for i in 0 .. args_owned.len() {
            args[i] = Some(&args_owned[i]);
        }
        // The last element of `args` remains `None`.
        unsafe {
            ::std::process::exit(main_0(
                (args.len() - 1) as libc::c_int,
                args) as i32);
        }
    }
' after ;
delete_items ;
clear_marks ;

Diff #62

Now to fix errors in main_0 itself. We changed both the inner and outerpointer types of argv, so there are two kinds of errors to clean up.

For the outer pointer, where we changed mut T to CArray<T>, the problemwe see is that argv.offset(…) returns &CArrayOffset<T>, not mut T, and&CArrayOffset<T> requires two derefs to obtain a T (&CArrayOffset<T>derefs to CArrayOffset<T>, which derefs to T) instead of just one. Wehandle this with type_fix_rules, looking for cases where a single derefresulted in CArrayOffset<T> but some other type was expected, and adding thesecond deref:

type_fix_rules
    '*, ::c2rust_runtime::array::CArrayOffset<__t>, __u => *__old'
    ;

Diff #63

For the inner pointer type, which we changed from mut c_char toOption<&CStr>, we need only insert a simple conversion anywhere the new typeis used but mut c_char is expected:

type_fix_rules
    '*, ::std::option::Option<&::std::ffi::CStr>, *const i8 =>
        opt_c_str_to_ptr(__old)'
    ;

Diff #64

The only quirk here is that we wrap up the conversion in a helper function,making it easier to recognize in the later refactoring step where we clean upredundant string conversions. Of course, now we must define that helperfunction:

select target 'item(main);' ;
create_item '
    fn opt_c_str_to_ptr(x: Option<&::std::ffi::CStr>) -> *const libc::c_char {
        match x {
            None => ::std::ptr::null(),
            Some(x) => x.as_ptr(),
        }
    }
' after ;
clear_marks ;

Diff #65

And with that, we are done. All raw pointer arguments in robotfindskittenhave now been replaced with safe equivalents.

commit

String conversion cleanup

A number of the previous refactoring steps involved changing the type of somevariable from a raw C string (*const c_char) to a safe Rust string (&str),inserting conversions between the two forms everywhere the variable wasinitialized or used. But now that we have finished transitioning the entirecrate to Rust strings, many of those conversions have become redundant.Essentially, we began with code like this:

fn f(s1: *const c_char) { ... }
fn g(s2: *const c_char) {
    ... f(s2) ...
}

By incrementally refactoring C strings into Rust string, we firsttransitioned to code like this:

fn f(s1: &str) { ... }
fn g(s2: *const c_char) {
    ... f(c_str_to_rust(s2)) ...
}

And then to code like this:

fn f(s1: &str) { ... }
fn g(s2: &str) {
    ... f(rust_str_to_c(c_str_to_rust(s2))) ...
}

But rust_str_to_c(c_str_to_rust(s2)) is the same as just s2 - the twoconversions are redundant and can be removed:

fn f(s1: &str) { ... }
fn g(s2: &str) {
    ... f(s2) ...
}

This doesn't merely affect readability - the actual conversion operationsrepresented by c_str_to_rust are unsafe, so we must remove them to completeour refactoring of robotfindskitten.

The actual refactoring process we apply to robotfindskitten mostly consistsof removing specific types of redundant conversions with rewrite_expr. Thepatterns we use here are general, taking advantage of overlap between differentconversion cases rather than hardcoding a rewrite for each distinct conversionin robotfindskitten.

To begin with, converting CString to *const c_char to CStr can bereplaced with a no-op (CString derefs to CStr, so it can be used almostanywhere a CStr is required):

rewrite_expr
    '::std::ffi::CStr::from_ptr(
        cast!(typed!(__e, ::std::ffi::CString).as_ptr()))'
    '__e' ;

Diff #67

Converting String to CString to Option<&str> is not strictly a no-op, butcan still be simplified:

rewrite_expr
    '::std::ffi::CString::new(__e).unwrap().to_str()'
    'Some(&__e)' ;

Diff #68

In some places, the code actually converts &str to const c_char directly,rather than using CString, and then converts const c_char to CStr to&str. This is memory-safe only when the &str already includes a nullterminator, and the CStr to str conversion will trim it off. We rewritethe code to simply trim off the null terminator directly, avoiding thesecomplex (and unsafe) conversions:

rewrite_expr
    '::std::ffi::CStr::from_ptr(
        cast!(typed!(__e, &str).as_ptr())).to_str()'
    "Some(__e.trim_end_matches('\0'))" ;

Diff #69

For code in main_0 using the opt_c_str_to_ptr helper function we introducedearlier, the Option<&CStr> to &CStr conversion can be replaced with asimple unwrap():

rewrite_expr
    '::std::ffi::CStr::from_ptr(cast!(opt_c_str_to_ptr(__e)))'
    '__e.unwrap()' ;

Diff #70

Conversions of bytestring literals (b"…", whose type is &[u8; ]) to*const c_char to CStr to str can be simplified down to a directconversion from &[u8; ] to &str, plus removal of the null terminator:

rewrite_expr
    '::std::ffi::CStr::from_ptr(
        cast!(typed!(__e, &[u8; __f]))).to_str()'
    "Some(::std::str::from_utf8(__e).unwrap().trim_end_matches('\0'))" ;

Diff #71

This removes the unsafety, but with a little more work, we can further improvereadability. First, we convert the byte strings to ordinary string literals(b"…" to "…"):

select target
    'crate; desc(match_expr(::std::str::from_utf8(__e))); desc(expr);' ;
bytestr_to_str ;
clear_marks ;

Diff #72

This introduces type errors, as the type of the literal has changed from &strto &[u8]. We fix these by inserting calls to str::as_bytes:

type_fix_rules '*, &str, &[u8] => __old.as_bytes()' ;

Diff #73

Finally, we remove the redundant conversion from &str to &[u8] to &str:

rewrite_expr
    '::std::str::from_utf8(__e.as_bytes())'
    'Some(__e)' ;

Diff #74

With the replacements above, we have removed all redundant string conversionsfrom the crate. This was the last major source of unnecessary unsafety inrobotfindskitten.

The last few changes we make are purely cosmetic - they do not affect safety.First, Some(x).unwrap() is the same as just x:

rewrite_expr
    'Some(__x).unwrap()'
    '__x' ;

Diff #75

And second, "foo\0".trim_end_matches('\0') is the same as just "foo". Thisone is a little more complicated to rewrite. We first remove null terminatorsthroughout the crate, then remove the calls to trim_end_matches:

select target 'crate; desc(expr);' ;
remove_null_terminator ;
clear_marks ;
rewrite_expr "__e.trim_end_matches('\0')" '__e' ;

Diff #76

This indiscriminate use of remove_null_terminator could introduce bugs(including memory unsafety) if the program still contained code that relies onthe presence of the null terminator, such as calls to CStr::from_ptr orlibc string functions. But previous refactoring steps have already removedall uses of those functions from robotfindskitten, so this transformation issafe.

commit

Final cleanup

At this point, we have removed all the major sources of unsafety fromrobotfindskitten. We finish the refactoring with an assortment of minorcleanup steps.

We want to remove unnecessary unsafe blocks, but right now every unsafeblock is considered unused because they all occur inside unsafe fns. None ofthese functions actually need to be unsafe at this point, so we mark them safe:

select target 'crate; desc(item && fn);' ;
set_unsafety safe ;
clear_marks ;

Diff #78

This part can't be fully automated. In general, there is no easy way to tellwhether the safety of a given function relies on unchecked assumptions aboutits input, or whether it might break invariants that other functions rely on.In the case of robotfindskitten, every function really is safe, but for otherapplications or libraries, it might be necessary to be more selective whenremoving the unsafe qualifier.

Now that all functions are safe, fix_unused_unsafe will remove any unsafeblocks that contain no unsafe operations:

fix_unused_unsafe

Diff #79

Next, we remove a number of unused items from the crate. We have replaced alluses of the FFI declarations generated by c2rust with alternatives, exceptfor one call to signal. We generate a new extern "C" block containing onlythe declaration of signal, then delete the old unused extern "C" blocks.remove the declarations now:

select target 'crate; desc(foreign_mod); last;' ;
create_item '
    extern "C" {
        fn signal(sig: libc::c_int, handler: __sighandler_t) -> __sighandler_t;
    }
' after ;
select target 'crate; desc(foreign_mod && !marked(new));' ;
delete_items ;

Diff #80

Furthermore, we can delete a number of type declarations that were previouslyused only in foreign functions:

select target '
    item(__time_t);
    item(time_t);
    item(pdat);
    item(_win_st);
    item(WINDOW);
' ;
delete_items ;

Diff #81

Similarly, we can delete the opt_c_str_to_ptr helper function, which we usedonly temporarily while cleaning up string-pointer function arguments:

select target '
    item(opt_c_str_to_ptr);
' ;
delete_items ;

Diff #82

Now we are done refactoring robotfindskitten. We have preserved thefunctionality of the original C program, but all unsafe code has been removed,with the exception of a single signal call that cannot be made safe.The refactored Rust version of robotfindskitten is still unidiomatic andsomewhat difficult to read, but by removing nearly all of the unsafe code, wehave established a solid foundation for future improvements.

Final output

Here is the final refactored version of robotfindskitten:

Diff #83

7.11. robotfindskitten

Refactoring robotfindskitten

ncurses macro cleanup

String formatting

printf format argument conversion

Creating a printf wrapper

Other string formatting functions

Static string constant - ver

Fixing ver's initializer

Fixing ver's uses

Making ver immutable

Static string array - messages

Heap allocations

Converting to CBlockPtr

Converting to CArray

Using the pancurses library

General library calls

String formatting

Input handling

Final steps

Moving global state to the stack

libc calls

Function argument types

message

main_0

String conversion cleanup

Final cleanup

Final output