Refactoring robotfindskitten
This section details the refactoring script used to transform the initial Rusttranslation of robotfindskitten
, as generated by c2rust transpile
, into asafe Rust program. We divide the refactoring process into several major steps:
ncurses
macro cleanup: Thencurses
library implements parts of its APIusing C preprocessor macros, and a few of those macros expand to relativelycomplex code. We replace these expanded macro bodies with calls toequivalent functions, which are easier to recognize and refactor.String formatting:
robotfindskitten
calls severalprintf
-stylestring-formatting functions. We replace these unsafe variable-argumentfunction calls with safe wrappers using Rust'sformat
family of macros.Aside from improving memory safety, this also allows the Rust compiler tomore accurately typecheck the format arguments, which is helpful for latertype-directed refactoring passes.Static string constants:
robotfindskitten
has two global variablescontaining string constants, which are translated to Rust asstatic mut
definitions containing C-style*const c_char
pointers. We refactor toremove both sources of unsafety, replacing raw pointers with checked&'static str
references and converting the mutablestatic
s to immutableones.Heap allocations:
robotfindskitten
uses a heap allocated array to trackthe objects in the game world. This array is represented as a raw pointer,and the underlying storage is managed explicitly withmalloc
andfree
.We replace the array with a memory-safe collection type, avoiding unsafe FFIcalls and preventing out-of-bounds memory accesses.Using the
pancurses
library: Callingncurses
library functions directlythrough the Rust FFI requires unsafe code at every call site. We replaceunsafencurses
function calls with calls to the safe wrappers provided bythepancurses
crate.Moving global state to the stack:
robotfindskitten
uses mutable globalvariables to store the game state, which turn into unsafestatic mut
definitions in Rust. We collect all such variables into a singlestack-allocated struct, which can be mutated without unsafety.libc
calls: We replace calls to miscellaneouslibc
functions, such assleep
andrand
, with calls to safe Rust equivalents.Function argument types: Two remaining functions in
robotfindskitten
takeraw pointers as arguments. We change each function's signature to use onlysafe Rust types, and update their callers to match.String conversion cleanup: Several of the previous refactoring passes insertconversions between Rust and C string types. In several places, theseconversions form cycles, such as
&str -> *const c_char -> &str
, which areboth redundant and a source of unsafe code. We remove such conversioncycles to avoid unnecessary raw pointer manipulation.Final cleanup: At this point, we have removed all the unsafe code we can.Only a few cleanup steps remain, such as removing unused
unsafe
qualifiersand deleting unusedextern "C"
definitions. In the end, we are left witha correct Rust translation ofrobotfindskitten
that contains only a singleline of unsafe code.
ncurses macro cleanup
robotfindskitten
uses a variety of macros provided by the ncurses
library.Since c2rust transpile
runs the C preprocessor before translating to Rust,the expansions of those macros effectively get inlined in the Rust code at eachcall site. In many cases, this is harmless: for example, move(y, x)
expandsto wmove(stdscr, y, x)
, which is not much harder to refactor than theoriginal. However, the attr_get
and attrset
macros are more complex: theyexpand to multiple lines of code involving several conditionals and complexexpressions. In this step, we convert the expanded code into simple functioncalls, which are easier to manipulate in later refactoring passes.
Fortunately, the ncurses
library provides functions implementing the sameoperations as the troublesome macros, and we can call those functions throughRust's FFI. We begin by providing Rust declarations for these foreignfunctions. For ease of reading, we put the new declarations just after theexisting extern "C"
block:
select target 'crate; child(foreign_mod); last;' ;
create_item
'
extern "C" {
fn wattr_get(win: *mut WINDOW, attrs: *mut attr_t,
pair: *mut libc::c_short, opts: *mut libc::c_void) -> libc::c_int;
fn wattrset(win: *mut WINDOW, attrs: libc::c_int) -> libc::c_int;
}
'
after ;
Diff #1
Now we can use rewrite_expr
to find Rust code that comes from the expansionsof the wattrset
macro and replace it with calls to the wattrset
function:
rewrite_expr
'
if !(__win as *const libc::c_void).is_null() {
(*__win)._attrs = __attrs
} else {
}
'
'wattrset(__win, __attrs as libc::c_int)' ;
Diff #2
The win
and attrs
metavariables in the pattern correspond to thearguments of the original C macro, and are used in the replacement to constructthe equivalent Rust function call.
Next, we do the same thing for the more complicated wattr_get
macro:
rewrite_expr
'
if !(__win as *const libc::c_void).is_null() {
if !(&mut __attrs as *mut attr_t as *const libc::c_void).is_null() {
__attrs = (*__win)._attrs
} else {
};
if !(&mut __pair as *mut libc::c_short as *const libc::c_void).is_null() {
__pair = (((*__win)._attrs as libc::c_ulong
& ((1u32 << 8i32).wrapping_sub(1u32) << 0i32 + 8i32) as libc::c_ulong)
>> 8i32) as libc::c_int as libc::c_short
} else {
};
} else {
}
'
'wattr_get(__win, &mut __attrs, &mut __pair, ::std::ptr::null_mut())' ;
Finally, we are done with this bit of cleanup, so we write the changes to diskbefore continuing on:
commit ;
Diff #4
String formatting
robotfindskitten
calls several printf
-style variable-argument functions toperform string formatting. Since variable-argument function calls areconsidered unsafe in Rust, we must replace these with Rust-style stringformatting using format!
and related macros. Specifically, for each stringformatting function such as printf
, we will create a safe wrapperfmt_printf
that takes a Rust fmt::Arguments
object, and replaceprintf(…)
calls with fmt_printf(format_args!(…))
. This approachisolates all the unsafety into the fmt_printf
wrapper, where it can beeliminated by later passes.
The replacement itself happens in two steps. First, we convert printf
callsfrom printf(<C format args…>)
to printf(format_args!(<Rust format args…>))
. Note that the code does not typecheck in this intermediate state:C's printf
function cannot accept the std::fmt::Arguments
produced by theformat_args!
macro. The second step then replaces the printf
call with acall to the fmt_printf
wrapper, which does accept std::fmt::Arguments
.
printf format argument conversion
We run a few commands to mark the nodes involved in string formatting, beforefinally running the convert_format_args
command to perform the actualtransformation.
First, we use select
and mark_arg_uses
to mark the first argument of everyprintf
call as target
s:
select target 'item(printf);' ;
mark_arg_uses 0 target ;
Diff #5
convert_format_args
will treat the target
argument at each call site as aprintf
-style format string, and will treat all later arguments as formatargs.
Next, we mark the format string literal with fmt_str
, which tellsconvert_format_args
the exact string literal it should use as the formatstring. This usually is not the same as the target
argument, sincec2rust-transpile
inserts several casts to turn a Rust string literal into a*const libc::c_char
.
select fmt_str 'marked(target); desc(expr && !match_expr(__e as __t));' ;
Diff #6
With both target
and fmt_str
marks in place, we can apply the actualtransformation:
convert_format_args ;
Diff #7
Finally, we clean up from this step by clearing all the marks.
clear_marks ;
commit
would alsoclear the marks, but we don't want to commit
these changes until we've fixedthe type errors introduced in this step.
Creating a printf wrapper
As a reminder, we currently have code that looks like this:
printf(format_args!("Hello, {}!\n", "world"))
printf
itself can't accept the std::fmt::Arguments
returned byformat_args!
, so we will define a wrapper that does acceptstd::fmt::Arguments
and then rewrite these printf
calls to call the wrapperinstead.
First, we insert the wrapper:
select target 'crate; child(foreign_mod); last;' ;
create_item
'
fn fmt_printf(args: ::std::fmt::Arguments) -> libc::c_int {
print!("{}", args);
0
}
'
after ;
Diff #9
Since Rust provides a print!
macro with similar functionality to printf
,our "wrapper" actually just calls print!
directly, avoiding the stringconversions necessary to call the actual C printf
. (See the next subsectionfor an example of a "real" wrapper function.)
With the wrapper in place, we can now update the call sites:
rewrite_expr 'printf' 'fmt_printf' ;
Diff #10
Now that we've finished this step and the crate typechecks again, we can safelycommit the changes:
commit ;
Diff #11
Other string formatting functions
Aside from printf
, robotfindskitten
also uses the ncurses
printw
andmvprintw
string-formatting functions. The refactoring script for printw
issimilar to the previous two steps combined:
select target 'item(printw);' ;
mark_arg_uses 0 target ;
select fmt_str 'marked(target); desc(expr && !match_expr(__e as __t));' ;
convert_format_args ;
clear_marks ;
select target 'crate; child(foreign_mod); last;' ;
create_item
'
fn fmt_printw(args: ::std::fmt::Arguments) -> libc::c_int {
unsafe {
::printw(b"%s\0" as *const u8 as *const libc::c_char,
::std::ffi::CString::new(format!("{}", args))
.unwrap().as_ptr())
}
}
'
after ;
rewrite_expr 'printw' 'fmt_printw' ;
commit ;
Diff #12
Aside from replacing the name printf
with printw
, the other notabledifference from the printf
script is the body of fmt_printw
. There is noconvenient replacement for printw
in the Rust standard library, so instead wecall the original printw
function, passing in the result of Rust stringformatting (converted to a C string) as an argument.
The mvprintw
replacement is also similar, just with a few extra arguments:
select target 'item(mvprintw);' ;
mark_arg_uses 2 target ;
select fmt_str 'marked(target); desc(expr && !match_expr(__e as __t));' ;
convert_format_args ;
clear_marks ;
select target 'crate; child(foreign_mod); last;' ;
create_item
'
fn fmt_mvprintw(y: libc::c_int, x: libc::c_int,
args: ::std::fmt::Arguments) -> libc::c_int {
unsafe {
::mvprintw(y, x, b"%s\0" as *const u8 as *const libc::c_char,
::std::ffi::CString::new(format!("{}", args))
.unwrap().as_ptr())
}
}
'
after ;
rewrite_expr 'mvprintw' 'fmt_mvprintw' ;
commit ;
Diff #13
Static string constant - ver
robotfindskitten
defines a static string constant, ver
, to store the game'sversion. Using ver
is currently unsafe, first because its Rust type is a rawpointer (*mut c_char
), and second because it's mutable. To make ver
usagesafe, we first change its type to &'static str
(and fix up the resulting typeerrors), and then we change it from a static mut
to an ordinary immutablestatic
. Note that we must change the type first because Rust does not allowraw pointers to be stored in safe (non-mut
) static
s.
We change the type using rewrite_ty
:
select target 'item(ver); child(ty);' ;
rewrite_ty 'marked!(*mut libc::c_char)' "&'static str" ;
delete_marks target ;
Diff #14
The combination of select
and the marked!
matching form ensures that onlyver
's type annotation is modified. We delete the mark afterward, since it'sno longer needed.
Simply replacing mut c_char
with &str
introduces type errors throughoutthe crate. The initializer for ver
still has type mut c_char
, and alluses of ver
are still expecting a *mut c_char
.
Fixing ver's initializer
Fixing the ver
initializer is straightforward: we simply remove all thecasts, then convert the binary string (&[u8]
) literal to an ordinary stringliteral. For the casts, we mark all cast expressions in ver
's definition,then replace each one with its subexpression:
select target 'item(ver); desc(match_expr(__e as __t));' ;
rewrite_expr 'marked!(__e as __t)' '__e' ;
delete_marks target ;
Diff #15
Only the binary string literal remains, so we mark it and change it to anordinary str
:
select target 'item(ver); child(expr);' ;
bytestr_to_str ;
delete_marks target ;
Diff #16
Fixing ver's uses
ver
's initializer is now well-typed, but its uses are still expecting a *mut c_char
instead of a &str
. To fix these up, we use the type_fix_rules
command, which rewrites expressions anywhere a type error occurs:
type_fix_rules '*, &str, *const __t => __old.as_ptr()' ;
Diff #17
Here we run type_fix_rules
with only one rule: in any position (), if anexpression has type
&str
but is expected to have a raw pointer type (const __t
), then wrap the original expression in a call to .as_ptr()
. This turnsout to be enough to fix all the errors at uses of ver
.
Making ver immutable
Now that all type errors have been corrected, we can finish our refactoring ofver
. We make it immutable, then commit the changes.
select target 'item(ver);' ;
set_mutability imm ;
commit ;
Diff #18
Static string array - messages
Aside from ver
, robotfindskitten
contains a static array of strings, calledmessages
. Like ver
, accessing messages
is unsafe because each element isa raw *mut c_char
pointer and because messages
itself is a static mut
.
We rewrite the type and initializer of messages
using the same strategy asfor ver
:
select target 'item(messages); child(ty); desc(ty);' ;
rewrite_ty 'marked!(*mut libc::c_char)' "&'static str" ;
delete_marks target ;
select target 'item(messages); child(expr); desc(expr);' ;
rewrite_expr 'marked!(__e as __t)' '__e' ;
bytestr_to_str ;
delete_marks target ;
Diff #19
We use type_fix_rules
to fix up the uses of messages
, as we did for ver
:
type_fix_rules
'*, &str, *const __t => __old.as_ptr()'
'*, &str, *mut __t => __old.as_ptr() as *mut __t' ;
Diff #20
Here we needed a second rule for mut
pointers, similar to the one forconst
, because robotfindskitten
mistakenly declares messages
as an arrayof char
instead of const char
.
With all type errors fixed, we can make messages
immutable and commit thechanges:
select target 'item(messages);' ;
set_mutability imm ;
commit ;
Diff #21
Heap allocations
The screen
variable stores a heap-allocated two-dimensional array,represented in C as an int*
. In Rust, this becomes mut *mut c_int
, whichis unsafe to access. We replace it with CArray<CArray<c_int>>
, whereCArray
is a memory-safe collection type provided by the c2rust_runtime
library. CArray
is convenient for this purpose because it supports C-styleinitialization and access patterns (including pointer arithmetic) while stillguaranteeing memory safety.
We actually perform the conversion from mut
to CArray
in two steps.First, we replace mut
with the simpler CBlockPtr
type, also defined inc2rust_runtime
. CBlockPtr
provides some limited bounds checking, butotherwise functions much like a raw pointer. It serves as a usefulintermediate step, letting us fix up the differences between the raw-pointerand CArray
APIs in two stages instead of attempting to do it all at once.Once screen
has been fully converted to CBlockPtr<CBlockPtr<c_int>>
, wefinish the conversion to CArray
in the second step.
As a preliminary, we need to add an import of the c2rust_runtime
library:
select target 'crate;' ;
create_item 'extern crate c2rust_runtime;' inside ;
Diff #22
Now we can proceed with the actual refactoring.
Converting to CBlockPtr
We further break down the transition from mut mut c_int
toCBlockPtr<CBlockPtr<c_int>>
into two steps, first converting the innerpointer (leaving the overall type as *mut CBlockPtr<c_int>
) and then theouter. We change the type annotation first, as we did for var
andmessages
:
select target 'item(screen); child(ty);' ;
rewrite_ty 'marked!(*mut *mut __t)'
'*mut ::c2rust_runtime::CBlockPtr<__t>' ;
Diff #23
This introduces type errors, letting us easily find (and fix) relatedexpressions using type_fix_rules
:
type_fix_rules
'rval, *mut __t, ::c2rust_runtime::CBlockPtr<__u> =>
unsafe { ::c2rust_runtime::CBlockPtr::from_ptr(__old) }'
'rval, *mut __t, *mut __u => __old as *mut __u'
;
Diff #24
The first rule provided here handles the later part of screen
'sinitialization, where the program allocates a mut c_int
array (nowCBlockPtr<c_int>
) for each row of the screen. The second rule handles theearlier part, where it allocates the top-level mut mut c_int
(now mut CBlockPtr<c_int>
). Both allocations now need a cast, since the type of therows has changed.
One category of type errors remains: the initialization code tries todereference the result of offset
ting the array pointer, which is not possibledirectly with the CBlockPtr
API. We add the necessary method call usingrewrite_expr
:
rewrite_expr
'*typed!(__e, ::c2rust_runtime::block_ptr::CBlockOffset<__t>)'
'*__e.as_mut()' ;
Diff #25
Here, the pattern filters for dereferences of CBlockOffset
expressions, whichresult from calling offset
on a CBlockPtr
, and adds a call to as_mut()
before the dereference.
The conversion of screen
to *mut CBlockPtr<c_int>
is now complete. Theconversion to CBlockPtr<CBlockPtr<c_int>>
uses a similar refactoring script:
select target 'crate; item(screen); child(ty);' ;
rewrite_ty 'marked!(*mut __t)'
'::c2rust_runtime::CBlockPtr<__t>' ;
type_fix_rules
'rval, *mut __t, ::c2rust_runtime::CBlockPtr<__u> =>
unsafe { ::c2rust_runtime::CBlockPtr::from_ptr(__old) }'
'rval, *mut __t, *mut __u => __old as *mut __u'
;
rewrite_expr
'*typed!(__e, ::c2rust_runtime::block_ptr::CBlockOffset<__t>)'
'*__e.as_mut()' ;
Diff #26
The only change is in the rewrite_ty
step.
There's one last bit of cleanup to perform: now that screen
has the desiredCBlockPtr<CBlockPtr<c_int>>
type, we can rewrite the allocations thatinitialize it. At this point the allocations use the unsafe malloc
functionfollowed by the unsafe CBlockPtr::from_ptr
, but we can change that to use thesafe CBlockPtr::alloc
method instead:
rewrite_expr 'malloc(__e) as *mut __t as *mut __u' 'malloc(__e) as *mut __u' ;
rewrite_expr
'::c2rust_runtime::CBlockPtr::from_ptr(malloc(__e) as *mut __t)'
'::c2rust_runtime::CBlockPtr::alloc(
__e as usize / ::std::mem::size_of::<__t>())'
;
Diff #27
This doesn't remove the unsafe
blocks wrapping each allocation - we leavethose until the end of our refactoring, when we remove unnecessary unsafe
blocks throughout the entire crate at once.
At this point, the refactoring of screen
to isdone, and we can commit the changes:
commit ;
Diff #28
Converting to CArray
The CArray
and CBlockPtr
APIs are deliberately quite similar, which makesthis part of the screen
refactoring fairly straightforward.
First, we replace all uses of CBlockPtr
with CArray
, both in types and infunction calls:
rewrite_ty '::c2rust_runtime::CBlockPtr<__t>' '::c2rust_runtime::CArray<__t>' ;
rewrite_expr
'::c2rust_runtime::CBlockPtr::from_ptr'
'::c2rust_runtime::CArray::from_ptr' ;
rewrite_expr
'::c2rust_runtime::CBlockPtr::alloc'
'::c2rust_runtime::CArray::alloc' ;
Diff #29
Next, we fix up calls to offset
. Unlike CBlockPtr
(and raw pointers ingeneral), CArray
distinguishes between mutable and immutable offset pointers.We handle this by simply replacing all offset
calls with offset_mut
:
rewrite_expr
'typed!(__e, ::c2rust_runtime::CArray<__t>).offset(__f)'
'__e.offset_mut(__f)' ;
Diff #30
This works fine for robotfindskitten
, though in other codebases it may benecessary to properly distinguish mutable and immutable uses of offset
.
With this change, the code typechecks with screen
s new memory-safe type, sowe could stop here. However, unlike CBlockPtr
, CArray
supports arrayindexing - ptr[i]
- in place of the convoluted *arr.offset(i).as_mut()
syntax. So we perform a simple rewrite to make the code a little easier toread:
rewrite_expr
'typed!(__e, ::c2rust_runtime::CArray<__t>).offset_mut(__f).as_mut()'
'&mut __e[__f as usize]' ;
rewrite_expr '*&mut __e' '__e' ;
Diff #31
Finally, we remove unsafety from screen
's static initializer. It currentlycalls CArray::fromptr(0 as *mut )
, which is unsafe becauseCArray::from_ptr
requires its pointer argument to must satisfy certainproperties. But CArray
also provides a safe method specifically forinitializing a CArray
to null, which we can use instead:
rewrite_expr
'::c2rust_runtime::CArray::from_ptr(cast!(0))'
'::c2rust_runtime::CArray::empty()' ;
Diff #32
This completes the refactoring of screen
, as all raw pointer manipulationshave been replaced with safe CArray
method calls. The only remainingunsafety arises from the fact that screen
is a static mut
, which we addressin a later refactoring step.
commit ;
Using the pancurses library
The pancurses
library provides safe wrappers around ncurses
APIs. Sincethe pancurses
and ncurses
APIs are so similar, we can automatically convertthe unsafe ncurses
FFI calls in robotfindskitten
to safe pancurses
calls,avoiding the need to maintain safe wrappers in robotfindskitten
itself.
There are two preliminary steps before we do the actual conversion. First, wemust import the pancurses
library:
select target 'crate;' ;
create_item 'extern crate pancurses;' inside ;
Diff #33
And second, we must create a global variable to store the main pancurses
Window
:
select target 'crate;' ;
create_item 'static mut win: Option<::pancurses::Window> = None;' inside ;
Diff #34
pancurses
doesn't have an equivalent of the global stdscr
window thatncurses
provides. Instead, the pancurses
initialization function createsan initial Window
object that must be passed around to each function thatupdates the display. We store that initial Window
in the global win
variable so that it's accessible everywhere that stdscr
is used.
Note that making win
a static mut
makes it unsafe to access. However, alater refactoring pass will gather up all static mut
s, including win
,and collect them into a stack-allocated struct, at which point accessing win
will no longer be unsafe.
General library calls
We convert ncurses
library calls to pancurses
ones in a few stages.
First, for functions that don't require a window object, we simply replace eachncurses
function with its equivalent in the pancurses
library:
rewrite_expr 'nonl' '::pancurses::nonl' ;
rewrite_expr 'noecho' '::pancurses::noecho' ;
rewrite_expr 'cbreak' '::pancurses::cbreak' ;
rewrite_expr 'has_colors' '::pancurses::has_colors' ;
rewrite_expr 'start_color' '::pancurses::start_color' ;
rewrite_expr 'endwin' '::pancurses::endwin' ;
rewrite_expr 'init_pair' '::pancurses::init_pair' ;
Diff #35
Next, functions taking a window are replaced with method calls on the staticwin
variable we defined earlier:
rewrite_expr 'wrefresh(stdscr)' 'win.refresh()' ;
rewrite_expr 'wrefresh(curscr)' 'win.refresh()' ;
rewrite_expr 'keypad(stdscr, __bf)' 'win.keypad(__bf)' ;
rewrite_expr 'wmove(stdscr, __my, __mx)' 'win.mv(__my, __mx)' ;
rewrite_expr 'wclear(stdscr)' 'win.clear()' ;
rewrite_expr 'wclrtoeol(stdscr)' 'win.clrtoeol()' ;
rewrite_expr 'waddch(stdscr, __ch)' 'win.addch(__ch)' ;
rewrite_expr
'wattr_get(stdscr, __attrs, __pair, __e)'
'{
let tmp = win.attrget();
*__attrs = tmp.0;
*__pair = tmp.1;
0
}' ;
rewrite_expr
'wattrset(stdscr, __attrs)'
'win.attrset(__attrs as ::pancurses::chtype)' ;
Diff #36
For simplicity, we write win.f(…)
in the rewrite_expr
replacementarguments, even though win
is actually an Option<Window>
, not a Window
.Later, we replace win
with win.as_ref().unwrap()
throughout the crate tocorrect the resulting type errors.
We next replace some ncurses
global variables with calls to correspondingpancurses
functions:
rewrite_expr 'LINES' 'win.get_max_y()' ;
rewrite_expr 'COLS' 'win.get_max_x()' ;
Diff #37
Finally, we handle a few special cases.
waddnstr
takes a string argument, which in general could be any *const c_char
. However, robotfindskitten
calls it only on string literals, whichlets us perform a more specialized rewrite that avoids unsafe C stringconversions:
rewrite_expr
'waddnstr(stdscr, __str as *const u8 as *const libc::c_char, __n)'
"win.addnstr(::std::str::from_utf8(__str).unwrap().trim_end_matches('\0'),
__n as usize)" ;
Diff #38
intrflush
has no pancurses
equivalent, so we replace it with a no-op of thesame type:
rewrite_expr 'intrflush(__e, __f)' '0' ;
Diff #39
That covers all of the "ordinary" ncurses
functions used inrobotfindskitten
. The remaining subsections cover the more complex cases.
String formatting
We previously replaced calls to the ncurses
printw
and mvprintw
string-formatting functions with code using Rust's safe string formattingmacros. This removes unsafety from the call site, but uses wrapper functions(fmt_printw
and fmt_mvprintw
) that call unsafe code internally. But nowthat we are using the pancurses
library, we can replace those wrappers withsafer equivalents.
select target 'item(fmt_printw);' ;
create_item '
fn fmt_printw(args: ::std::fmt::Arguments) -> libc::c_int {
unsafe {
win.printw(&format!("{}", args))
}
}
' after ;
delete_items ;
clear_marks ;
select target 'item(fmt_mvprintw);' ;
create_item '
fn fmt_mvprintw(y: libc::c_int, x: libc::c_int,
args: ::std::fmt::Arguments) -> libc::c_int {
unsafe {
win.mvprintw(y, x, &format!("{}", args))
}
}
' after ;
delete_items ;
clear_marks ;
Diff #40
The wrappers still use unsafe code to access win
, a static mut
, but nolonger make FFI calls or manipulate raw C strings. When we later remove allstatic mut
s from the program, these functions will become entirely safe.
Input handling
Adapting ncurses
-based input handling to use pancurses
requires some extracare. The pancurses
getch
function returns a Rust enum, while thencurses
version simply returns an integer. robotfindskitten
matches thoseintegers against various ncurses
keycode constants, which, after macroexpansion, become integer literals in the Rust code.
The more idiomatic approach would be to replace each integer literal with thematching pancurses::Input
enum variant when switching from ncurses
getch
to the pancurses
version. However, we instead take the easier approach ofconverting pancurses::Input
values back to ncurses
integer keycodes, sothe existing robotfindskitten
input handling code can remain unchanged.
First, we inject a translation function from pancurses
to ncurses
keycodes:
select target 'item(initialize_ncurses);' ;
create_item '
fn encode_input(inp: Option<::pancurses::Input>) -> libc::c_int {
use ::pancurses::Input::*;
let inp = match inp {
Some(x) => x,
None => return -1,
};
match inp {
// TODO: unicode inputs in the range 256 .. 512 can
// collide with ncurses special keycodes
Character(c) => c as u32 as libc::c_int,
Unknown(i) => i,
special => {
let idx = ::pancurses::SPECIAL_KEY_CODES.iter()
.position(|&k| k == special).unwrap();
let code = idx as i32 + ::pancurses::KEY_OFFSET;
if code > ::pancurses::KEY_F15 {
code + 48
} else {
code
}
},
}
}
' after ;
Diff #41
Then, we translate ncurses
wgetch
calls to use the pancurses
getch
method, wrapping the result in encode_input
to keep the results unchanged.
rewrite_expr 'wgetch(stdscr)' '::encode_input(win.getch())' ;
Diff #42
Final steps
As mentioned previously, we use win
to obtain the current window objectthroughout the ncurses
refactoring process, even though win
is actually anOption<Window>
, not a Window
. Now that we are done with all the rewrites,we can update thote uses to access the Window
properly:
rewrite_expr 'win' 'win.as_ref().unwrap()' ;
Diff #43
The final step is to initialize win
. This corresponds to the call to thencurses
initscr
initialization function:
rewrite_expr 'initscr()' 'win = Some(::pancurses::initscr())' ;
Diff #44
We save this for last only so that the win
to win.as_ref().unwrap()
rewritedoesn't produce an erroneous assignment win.as_ref().unwrap() = …
.
At this point, we are done with the current refactoring step:robotfindskitten
has been fully adapted to use the safe pancurses
API inplace of raw ncurses
FFI calls.
commit
Diff #45
Moving global state to the stack
robotfindskitten
uses global variables - static mut
s in Rust - to store thegame state. Accessing these globals is unsafe, due to the difficulty ofpreventing simultaneous borrowing and mutation. In this refactoring step, wemove the global state onto the stack and pass it by reference to every functionthat needs it, which allows the borrow checker to analyze its usage and ensuresafety.
Most of the work in this step is handled by the static_to_local_ref
refactoring command. This command identifies all functions that use a givenstatic, and modifies those functions to access the global through a reference(passed as an argument to the function) instead of accessing it directly. (Seethe static_to_local_ref
command documentation for examples.)
However, running static_to_local_ref
separately on each ofrobotfindskitten
's seven global variables would add up to seven new argumentsto many of robotfindskitten
's functions, making their signatures difficult toread. Instead, we proceed in two steps. First, we gather up all the globalvariables into a single global struct. Then, we run static_to_local_ref
onjust the struct, achieving safety while adding only a single new argument toeach affected function.
We collect the statics into a struct using static_collect_to_struct
:
select target 'crate; child(static && mut);' ;
static_collect_to_struct State S
Diff #46
Then we run static_to_local_ref
to pass a reference to the new State
objecteverywhere it is used:
select target 'crate; child(static && name("S"));' ;
select user 'crate; desc(fn && !name("main|main_0"));' ;
static_to_local_ref ;
Diff #47
The functions that previously accessed the global S
now use a referenceargument S_
, removing a source of unsafety.
The only function that still accesses S
directly is main_0
. And sincemain_0
is called only once per run of the program, we can replace the globalS
with a local variable declared inside main_0
without affecting thebehavior of the program. The static_to_local
command performs the necessarytransformation (using the marks we previous set up for static_to_local_ref
):
static_to_local
Diff #48
Now there are no static mut
s remaining in the program.
There is one final cleanup step to perform. The struct State
appears in thesignature of several public functions, but State
itself is not public, sorustc
reports an error. We could make State
public, but since there is noreason for the functions in question to be public in the first place, we makethe functions private instead:
select target 'crate; desc(fn && !name("main"));' ;
set_visibility '' ;
commit
Diff #49
libc calls
robotfindskitten
makes a number of calls to libc
functions, such as sleep
and rand
, using the FFI. Rust's standard library provides most of the samefunctionality, so we can replace these libc
calls with safe equivalents.
We replace sleep
with std::thread::sleep
:
rewrite_expr 'sleep(__e)'
'::std::thread::sleep(
::std::time::Duration::from_secs(__e as u64))' ;
Diff #50
We replace atoi
with a call to from_str
:
rewrite_expr 'atoi(__e)'
'<libc::c_int as ::std::str::FromStr>::from_str(
::std::ffi::CStr::from_ptr(__e).to_str().unwrap()).unwrap()' ;
Diff #51
In the version of glibc
we used for translating robotfindskitten
, atoi
isactually provided as an inline wrapper function in the libc
headers. Thatmeans the Rust translation of robotfindskitten
actually includes a fulldefinition of fn atoi(…) { … }
. Now that we've replaced the atoi
call,we can delete the definition as well:
select target 'item(atoi);' ;
delete_items ;
clear_marks ;
Diff #52
We replace exit
with std::process::exit
:
rewrite_expr 'exit(__e)' '::std::process::exit(__e as i32)' ;
Diff #53
For rand
, no equivalent is available in the Rust standard library. Instead,we import the rand
crate from crates.io:
select target 'crate;' ;
create_item 'extern crate rand;' inside ;
clear_marks ;
Diff #54
robotfindskitten
uses the common srand(time())
pattern to initialize therandom number generator, suggesting it does not rely on the ability to controlor reuse seeds. That means we can use the thread-local RNG provided by therand
crate, instead of explicitly constructing an RNG with a specific seed.So we replace rand
with calls to rand::random
:
rewrite_expr 'rand()'
'(::rand::random::<libc::c_uint>() >> 1) as libc::c_int' ;
Diff #55
And we delete srand
calls entirely, relying on the rand
crate's automaticinitialization of the thread-local RNG:
rewrite_expr 'srand(__e)' '()' ;
Diff #56
At this point, the only remaining FFI call is to signal
. robotfindskitten
sets up a SIGINT
handler to ensure that ncurses
(now pancurses
) is shutdown properly and the terminal is returned to a normal state when the userterminates the program with ^C
. Unfortunately, there is no general way tomake signal handling safe: to achieve memory safety, signal handling functionsmust obey a number of special restrictions above and beyond Rust's normalnotions of safety, and these properties cannot be checked by the Rust compiler.
We therefore leave the call to signal
as unsafe code. Since this will be theonly unsafe operation in the program once we finish refactoring, we wrap it inits own unsafe block:
rewrite_expr 'signal(__e, __f)' 'unsafe { signal(__e, __f) }' ;
Diff #57
We've now covered all of the libc
functions used by robotfindskitten
, andreplaced nearly all of them with safe code.
commit
Function argument types
Two functions in robotfindskitten
accept raw pointers: message
takes apointer to a string to display on the screen, and main_0
takes an array ofstring pointers argv
containing the program's command line arguments. Tomake these functions safe, we must replace their raw pointer arguments withsafe equivalents.
message
We begin with message
because it is simpler. This function takes a singleargument of type *mut c_char
, which we want to replace with &str
:
select target
'item(message); child(arg); child(match_ty(*mut libc::c_char));' ;
rewrite_ty 'marked!(*mut libc::c_char)' '&str' ;
delete_marks target ;
Diff #59
Of course, simply changing the type annotation is not sufficient. Like when weretyped the ver
and messages
constants, this change has introduced twokinds of type errors: callers of message
still pass mut c_char
where&str
is now expected, and the body of message still uses the message_0: &str
argument in contexts that require a mut c_char
. We fix these usingtype_fix_rules
:
type_fix_rules
'*, *mut __t, &str =>
::std::ffi::CStr::from_ptr(__old).to_str().unwrap()'
'*, &str, *const __t =>
::std::ffi::CString::new(__old.to_owned()).unwrap().as_ptr()'
;
Diff #60
The first rule handles callers of message
, using CStr
methods to converttheir mut c_char
raw pointers into safe &str
references. The secondhandles errors in the body of message
, using CString
to convert &str
sback into const c_char
. Note we must use CString
instead of CStr
in thesecond rule because an allocation is required: a &str
is not guaranteed toend with a null terminator, so CString
must copy it into a larger buffer andadd the null terminator to produce a valid *const c_char
string pointer.Since the CString
is temporary, it will be deallocated at the end of thecontaining expression, but this is good enough for the code we encounter insideof message
. More complex string manipulation, however, would likely requirea different refactoring approach.
main_0
The Rust function main_0
is the translation of the C main
function ofrobotfindskitten
. The Rust main
is a c2rust
-generated wrapper thathandles the differences between C's main
signature and Rust's before invokingmain_0
.
As in the message
case, we wish to replace the unsafe pointer types inmain_0
's argument list with safe equivalents. However, in this case ourchoice of safe reference type is more constrained. main_0
callsargv.offset
to access the individual command-line arguments, so we must useCArray
(which supports such access patterns) for the outer pointer. For theinner pointer, we use Option<&CStr>
: CStr
supports the conversions we willneed to perform in main
and main_0
, and Option<&CStr>
can be safelyzero-initialized, which is required by CArray
.
We begin, as with message
, by rewriting the argument type:
select target
'item(main_0); child(arg && name("argv")); child(ty);' ;
rewrite_ty 'marked!(*mut *mut libc::c_char)'
'::c2rust_runtime::CArray<Option<&::std::ffi::CStr>>' ;
delete_marks target ;
Diff #61
Next, we fix type errors in main
, which is the only caller of main_0
.Since c2rust
always generates the same main
wrapper function, rather thanrefactor it, we can simply replace it entirely with a new version that iscompatible with main_0
's new signature:
select target 'item(main);' ;
create_item '
fn main() {
// Collect argv into a vector.
let mut args_owned: Vec<::std::ffi::CString> = Vec::new();
for arg in ::std::env::args() {
args_owned.push(::std::ffi::CString::new(arg).unwrap());
}
// Now that the length is known, we can build a CArray.
let mut args: ::c2rust_runtime::CArray<Option<&::std::ffi::CStr>> =
::c2rust_runtime::CArray::alloc(args_owned.len() + 1);
for i in 0 .. args_owned.len() {
args[i] = Some(&args_owned[i]);
}
// The last element of `args` remains `None`.
unsafe {
::std::process::exit(main_0(
(args.len() - 1) as libc::c_int,
args) as i32);
}
}
' after ;
delete_items ;
clear_marks ;
Diff #62
Now to fix errors in main_0
itself. We changed both the inner and outerpointer types of argv
, so there are two kinds of errors to clean up.
For the outer pointer, where we changed mut T
to CArray<T>
, the problemwe see is that argv.offset(…)
returns &CArrayOffset<T>
, not mut T
, and&CArrayOffset<T>
requires two derefs to obtain a T
(&CArrayOffset<T>
derefs to CArrayOffset<T>
, which derefs to T
) instead of just one. Wehandle this with type_fix_rules
, looking for cases where a single derefresulted in CArrayOffset<T>
but some other type was expected, and adding thesecond deref:
type_fix_rules
'*, ::c2rust_runtime::array::CArrayOffset<__t>, __u => *__old'
;
Diff #63
For the inner pointer type, which we changed from mut c_char
toOption<&CStr>
, we need only insert a simple conversion anywhere the new typeis used but mut c_char
is expected:
type_fix_rules
'*, ::std::option::Option<&::std::ffi::CStr>, *const i8 =>
opt_c_str_to_ptr(__old)'
;
Diff #64
The only quirk here is that we wrap up the conversion in a helper function,making it easier to recognize in the later refactoring step where we clean upredundant string conversions. Of course, now we must define that helperfunction:
select target 'item(main);' ;
create_item '
fn opt_c_str_to_ptr(x: Option<&::std::ffi::CStr>) -> *const libc::c_char {
match x {
None => ::std::ptr::null(),
Some(x) => x.as_ptr(),
}
}
' after ;
clear_marks ;
Diff #65
And with that, we are done. All raw pointer arguments in robotfindskitten
have now been replaced with safe equivalents.
commit
String conversion cleanup
A number of the previous refactoring steps involved changing the type of somevariable from a raw C string (*const c_char
) to a safe Rust string (&str
),inserting conversions between the two forms everywhere the variable wasinitialized or used. But now that we have finished transitioning the entirecrate to Rust strings, many of those conversions have become redundant.Essentially, we began with code like this:
fn f(s1: *const c_char) { ... }
fn g(s2: *const c_char) {
... f(s2) ...
}
By incrementally refactoring C strings into Rust string, we firsttransitioned to code like this:
fn f(s1: &str) { ... }
fn g(s2: *const c_char) {
... f(c_str_to_rust(s2)) ...
}
And then to code like this:
fn f(s1: &str) { ... }
fn g(s2: &str) {
... f(rust_str_to_c(c_str_to_rust(s2))) ...
}
But rust_str_to_c(c_str_to_rust(s2))
is the same as just s2
- the twoconversions are redundant and can be removed:
fn f(s1: &str) { ... }
fn g(s2: &str) {
... f(s2) ...
}
This doesn't merely affect readability - the actual conversion operationsrepresented by c_str_to_rust
are unsafe, so we must remove them to completeour refactoring of robotfindskitten
.
The actual refactoring process we apply to robotfindskitten
mostly consistsof removing specific types of redundant conversions with rewrite_expr
. Thepatterns we use here are general, taking advantage of overlap between differentconversion cases rather than hardcoding a rewrite for each distinct conversionin robotfindskitten
.
To begin with, converting CString
to *const c_char
to CStr
can bereplaced with a no-op (CString
derefs to CStr
, so it can be used almostanywhere a CStr
is required):
rewrite_expr
'::std::ffi::CStr::from_ptr(
cast!(typed!(__e, ::std::ffi::CString).as_ptr()))'
'__e' ;
Diff #67
Converting String
to CString
to Option<&str>
is not strictly a no-op, butcan still be simplified:
rewrite_expr
'::std::ffi::CString::new(__e).unwrap().to_str()'
'Some(&__e)' ;
Diff #68
In some places, the code actually converts &str
to const c_char
directly,rather than using CString
, and then converts const c_char
to CStr
to&str
. This is memory-safe only when the &str
already includes a nullterminator, and the CStr
to str
conversion will trim it off. We rewritethe code to simply trim off the null terminator directly, avoiding thesecomplex (and unsafe) conversions:
rewrite_expr
'::std::ffi::CStr::from_ptr(
cast!(typed!(__e, &str).as_ptr())).to_str()'
"Some(__e.trim_end_matches('\0'))" ;
Diff #69
For code in main_0
using the opt_c_str_to_ptr
helper function we introducedearlier, the Option<&CStr>
to &CStr
conversion can be replaced with asimple unwrap()
:
rewrite_expr
'::std::ffi::CStr::from_ptr(cast!(opt_c_str_to_ptr(__e)))'
'__e.unwrap()' ;
Diff #70
Conversions of bytestring literals (b"…"
, whose type is &[u8; ]
) to*const c_char
to CStr
to str
can be simplified down to a directconversion from &[u8;
]
to &str
, plus removal of the null terminator:
rewrite_expr
'::std::ffi::CStr::from_ptr(
cast!(typed!(__e, &[u8; __f]))).to_str()'
"Some(::std::str::from_utf8(__e).unwrap().trim_end_matches('\0'))" ;
Diff #71
This removes the unsafety, but with a little more work, we can further improvereadability. First, we convert the byte strings to ordinary string literals(b"…"
to "…"
):
select target
'crate; desc(match_expr(::std::str::from_utf8(__e))); desc(expr);' ;
bytestr_to_str ;
clear_marks ;
Diff #72
This introduces type errors, as the type of the literal has changed from &str
to &[u8]
. We fix these by inserting calls to str::as_bytes
:
type_fix_rules '*, &str, &[u8] => __old.as_bytes()' ;
Diff #73
Finally, we remove the redundant conversion from &str
to &[u8]
to &str
:
rewrite_expr
'::std::str::from_utf8(__e.as_bytes())'
'Some(__e)' ;
Diff #74
With the replacements above, we have removed all redundant string conversionsfrom the crate. This was the last major source of unnecessary unsafety inrobotfindskitten
.
The last few changes we make are purely cosmetic - they do not affect safety.First, Some(x).unwrap()
is the same as just x
:
rewrite_expr
'Some(__x).unwrap()'
'__x' ;
Diff #75
And second, "foo\0".trim_end_matches('\0')
is the same as just "foo"
. Thisone is a little more complicated to rewrite. We first remove null terminatorsthroughout the crate, then remove the calls to trim_end_matches
:
select target 'crate; desc(expr);' ;
remove_null_terminator ;
clear_marks ;
rewrite_expr "__e.trim_end_matches('\0')" '__e' ;
Diff #76
This indiscriminate use of remove_null_terminator
could introduce bugs(including memory unsafety) if the program still contained code that relies onthe presence of the null terminator, such as calls to CStr::from_ptr
orlibc
string functions. But previous refactoring steps have already removedall uses of those functions from robotfindskitten
, so this transformation issafe.
commit
Final cleanup
At this point, we have removed all the major sources of unsafety fromrobotfindskitten
. We finish the refactoring with an assortment of minorcleanup steps.
We want to remove unnecessary unsafe
blocks, but right now every unsafe
block is considered unused because they all occur inside unsafe fn
s. None ofthese functions actually need to be unsafe at this point, so we mark them safe:
select target 'crate; desc(item && fn);' ;
set_unsafety safe ;
clear_marks ;
Diff #78
This part can't be fully automated. In general, there is no easy way to tellwhether the safety of a given function relies on unchecked assumptions aboutits input, or whether it might break invariants that other functions rely on.In the case of robotfindskitten
, every function really is safe, but for otherapplications or libraries, it might be necessary to be more selective whenremoving the unsafe
qualifier.
Now that all functions are safe, fix_unused_unsafe
will remove any unsafe
blocks that contain no unsafe operations:
fix_unused_unsafe
Diff #79
Next, we remove a number of unused items from the crate. We have replaced alluses of the FFI declarations generated by c2rust
with alternatives, exceptfor one call to signal
. We generate a new extern "C"
block containing onlythe declaration of signal
, then delete the old unused extern "C"
blocks.remove the declarations now:
select target 'crate; desc(foreign_mod); last;' ;
create_item '
extern "C" {
fn signal(sig: libc::c_int, handler: __sighandler_t) -> __sighandler_t;
}
' after ;
select target 'crate; desc(foreign_mod && !marked(new));' ;
delete_items ;
Diff #80
Furthermore, we can delete a number of type declarations that were previouslyused only in foreign functions:
select target '
item(__time_t);
item(time_t);
item(pdat);
item(_win_st);
item(WINDOW);
' ;
delete_items ;
Diff #81
Similarly, we can delete the opt_c_str_to_ptr
helper function, which we usedonly temporarily while cleaning up string-pointer function arguments:
select target '
item(opt_c_str_to_ptr);
' ;
delete_items ;
Diff #82
Now we are done refactoring robotfindskitten
. We have preserved thefunctionality of the original C program, but all unsafe code has been removed,with the exception of a single signal
call that cannot be made safe.The refactored Rust version of robotfindskitten
is still unidiomatic andsomewhat difficult to read, but by removing nearly all of the unsafe code, wehave established a solid foundation for future improvements.
Final output
Here is the final refactored version of robotfindskitten
:
Diff #83