I/O & Serialization

CIS 198 Lecture 8


I/O


Traits!

  pub trait Read {
      fn read(&mut self, buf: &mut [u8]) -> Result<usize>;
      // Other methods implemented in terms of read().
  }
  pub trait Write {
      fn write(&mut self, buf: &[u8]) -> Result<usize>;
      fn flush(&mut self) -> Result<()>;
      // Other methods implemented in terms of write() and flush().
  }
  • Standard IO traits implemented for a variety of types:
    • Files and TcpStreams (Read and Write), Vec<u8>s (Write), &[u8]s (Read).
  • Careful: return types are std::io::Result, not std::result::Result!
    • type Result<T> = std::result::Result<T, std::io::Error>;
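
Because these are traits, a function written against Read and Write works with any implementor. A minimal sketch, assuming in-memory data: `copy_upper` and `upper` are hypothetical helpers (not part of std) that read from a `&[u8]` and write into a `Vec<u8>`.

```rust
use std::io::{self, Read, Write};

// Hypothetical helper: copies everything from `src` to `dst`, upper-casing ASCII.
// Note the return type is io::Result, i.e. Result<usize, io::Error>.
fn copy_upper<R: Read, W: Write>(mut src: R, mut dst: W) -> io::Result<usize> {
    let mut data = Vec::new();
    let n = src.read_to_end(&mut data)?;
    data.make_ascii_uppercase();
    dst.write_all(&data)?;
    Ok(n)
}

// Hypothetical wrapper: &[u8] is the reader, &mut Vec<u8> is the writer.
fn upper(input: &[u8]) -> io::Result<Vec<u8>> {
    let mut out = Vec::new();
    copy_upper(input, &mut out)?;
    Ok(out)
}

fn main() -> io::Result<()> {
    let out = upper(b"hello")?;
    println!("{}", String::from_utf8_lossy(&out));
    Ok(())
}
```

The same `copy_upper` would accept a File or a TcpStream on either side without changes.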

std::io::Read

  use std::io;
  use std::io::prelude::*;
  use std::fs::File;
  let mut f = File::open("foo.txt")?;
  let mut buffer = [0; 10];
  // read up to 10 bytes
  f.read(&mut buffer)?;
  • buffer is an array, so the max length to read is encoded into the type.
  • read returns the number of bytes read, or an Err specifying the problem.
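    • A return value of Ok(n) guarantees that n <= buf.len().
    • n can be smaller than buf.len() even if more data will arrive later.
    • Ok(0) means the reader has reached end-of-input (or buf has length 0).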
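
A sketch of the usual read loop, assuming a small 4-byte buffer so the chunking is visible. `chunk_sizes` is a hypothetical helper that records how many bytes each call to read returned and stops at Ok(0).

```rust
use std::io::{self, Read};

// Reads `src` in chunks of at most 4 bytes and records each chunk's size.
fn chunk_sizes<R: Read>(mut src: R) -> io::Result<Vec<usize>> {
    let mut buffer = [0u8; 4];
    let mut sizes = Vec::new();
    loop {
        let n = src.read(&mut buffer)?;
        if n == 0 {
            break; // end of input
        }
        sizes.push(n);
    }
    Ok(sizes)
}

fn main() -> io::Result<()> {
    // A byte slice is a reader; ten bytes come back as 4 + 4 + 2.
    let sizes = chunk_sizes(&b"0123456789"[..])?;
    println!("{:?}", sizes);
    Ok(())
}
```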

Ways of Reading

  /// Required.
  fn read(&mut self, buf: &mut [u8]) -> Result<usize>;
  /// Reads to end of the Read object.
  fn read_to_end(&mut self, buf: &mut Vec<u8>) -> Result<usize>;
  /// Reads to end of the Read object into a String.
  fn read_to_string(&mut self, buf: &mut String) -> Result<usize>;
  /// Reads exactly the length of the buffer, or returns an error.
  fn read_exact(&mut self, buf: &mut [u8]) -> Result<()>;
  • Read provides a few different ways to read into a variety of buffers.
    • Default implementations are provided for them using read.
  • Notice the different type signatures.
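
To see how the signatures differ in use, here is a small sketch under an assumed input layout (a 4-byte header followed by UTF-8 text). `split_header` is a hypothetical helper: read_exact fills a fixed-size array, while read_to_string grows a String.

```rust
use std::io::{self, Read};

// Splits `input` into a fixed 4-byte header and the remaining text.
fn split_header(mut input: &[u8]) -> io::Result<([u8; 4], String)> {
    let mut header = [0u8; 4];
    // Fails with an error if fewer than 4 bytes are available.
    input.read_exact(&mut header)?;
    let mut rest = String::new();
    // Fails with an error if the remaining bytes are not valid UTF-8.
    input.read_to_string(&mut rest)?;
    Ok((header, rest))
}

fn main() -> io::Result<()> {
    let (header, rest) = split_header(b"RUSTferris")?;
    println!("{:?} / {}", header, rest);
    Ok(())
}
```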

Reading Iterators

  fn bytes(self) -> Bytes<Self> where Self: Sized
  // Unstable!
  fn chars(self) -> Chars<Self> where Self: Sized
  • bytes transforms some Read into an iterator which yields its contents byte-by-byte.
  • The associated Item is Result<u8>.
    • So the type returned from calling next() on the iterator is Option<Result<u8>>.
    • Hitting an EOF corresponds to None.
  • chars does the same, and will try to interpret the reader’s contents as a UTF-8 character sequence.
    • Unstable; Rust team is not currently sure what the semantics of this should be. See issue #27802.
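
A minimal sketch of consuming the bytes iterator. `count_newlines` is a hypothetical helper; each item is a Result<u8>, so it has to be unwrapped (here with `?`) before it can be compared.

```rust
use std::io::{self, Read};

// Counts newline bytes by iterating over the reader one byte at a time.
fn count_newlines<R: Read>(src: R) -> io::Result<usize> {
    let mut count = 0;
    for byte in src.bytes() {
        // `byte` is an io::Result<u8>; `?` propagates any read error.
        if byte? == b'\n' {
            count += 1;
        }
    }
    Ok(count) // the loop ends when next() returns None, i.e. at EOF
}

fn main() -> io::Result<()> {
    println!("{}", count_newlines(&b"a\nb\nc"[..])?);
    Ok(())
}
```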

Iterator Adaptors

  fn chain<R: Read>(self, next: R) -> Chain<Self, R>
      where Self: Sized
  • chain takes a second reader as input, and returns a reader which yields all bytes from self, then all bytes from next.
  fn take(self, limit: u64) -> Take<Self>
      where Self: Sized
  • take creates a reader which is limited to the first limit bytes of the underlying reader.
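
Both adaptors return something that itself implements Read, so they compose. A sketch with a hypothetical helper `head_then_tail`, using byte slices as the two readers:

```rust
use std::io::{self, Read};

// Concatenates two readers, then keeps only the first `limit` bytes.
fn head_then_tail(head: &[u8], tail: &[u8], limit: u64) -> io::Result<String> {
    let mut out = String::new();
    head.chain(tail).take(limit).read_to_string(&mut out)?;
    Ok(out)
}

fn main() -> io::Result<()> {
    println!("{}", head_then_tail(b"Hello, ", b"Ferris!", 10)?);
    Ok(())
}
```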

std::io::Write

  pub trait Write {
      fn write(&mut self, buf: &[u8]) -> Result<usize>;
      fn flush(&mut self) -> Result<()>;
      // Other methods omitted.
  }
  • Write is a trait with two required methods, write() and flush().
    • Like Read, it provides other default methods implemented in terms of these.
  • write attempts to write the contents of buf into the writer and returns the number of bytes written (or queued), which may be fewer than buf.len().
  • flush ensures that all written data has been pushed to the target.
    • Writes may be queued up, for optimization.
    • Returns Err if not all queued bytes can be written successfully.
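
To make the two required methods concrete, here is a sketch of a toy writer. `Trickle` is a hypothetical type, not from std, that accepts at most 3 bytes per call, which shows why the byte count returned by write matters.

```rust
use std::io::{self, Write};

// Hypothetical writer that accepts at most 3 bytes per `write` call.
struct Trickle {
    data: Vec<u8>,
}

impl Write for Trickle {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let n = buf.len().min(3);
        self.data.extend_from_slice(&buf[..n]);
        Ok(n) // may be less than buf.len()
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(()) // nothing is queued, so there is nothing to push
    }
}

// Performs a single `write` and reports what was accepted.
fn single_write(input: &[u8]) -> io::Result<(usize, Vec<u8>)> {
    let mut w = Trickle { data: Vec::new() };
    let n = w.write(input)?;
    Ok((n, w.data))
}

fn main() -> io::Result<()> {
    let (n, data) = single_write(b"Hello")?;
    println!("accepted {} bytes: {:?}", n, data);
    Ok(())
}
```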

Writing

  let mut buffer = File::create("foo.txt")?;
  // write takes a byte slice (&[u8]), so use a byte string literal.
  buffer.write(b"Hello, Ferris!")?;

Writing Methods

  /// Attempts to write entire buffer into self.
  fn write_all(&mut self, buf: &[u8]) -> Result<()> { ... }
  /// Writes a formatted string into self.
  /// Don't call this directly, use `write!` instead.
  fn write_fmt(&mut self, fmt: Arguments) -> Result<()> { ... }
  /// Borrows self by mutable reference.
  fn by_ref(&mut self) -> &mut Self where Self: Sized { ... }
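
A sketch of write_all and by_ref together, assuming an in-memory Vec<u8> as the writer. `write_greeting` and `greet_twice` are hypothetical helpers; by_ref lets a function that takes a writer by value borrow it instead of consuming it.

```rust
use std::io::{self, Write};

// Takes a writer by value, as many generic helpers do.
fn write_greeting<W: Write>(mut w: W, name: &str) -> io::Result<()> {
    // write_all keeps calling write until the whole slice is accepted.
    w.write_all(b"Hello, ")?;
    w.write_all(name.as_bytes())?;
    w.write_all(b"!\n")
}

fn greet_twice(a: &str, b: &str) -> io::Result<Vec<u8>> {
    let mut out = Vec::new();
    // by_ref() lends `out` to the helper, so it is still usable afterwards.
    write_greeting(out.by_ref(), a)?;
    write_greeting(out.by_ref(), b)?;
    Ok(out)
}

fn main() -> io::Result<()> {
    let out = greet_twice("Ferris", "world")?;
    print!("{}", String::from_utf8_lossy(&out));
    Ok(())
}
```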

write!

  • Calling writer methods directly gets clumsy in general application code.
    • Especially if you need to format your output.
  • The write! macro provides string formatting by abstracting over write_fmt.
  • Returns a Result.
  let mut buf = File::create("foo.txt")?;
  write!(buf, "Hello {}!", "Ferris").unwrap();
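
write! works with any writer, not just files. A sketch that formats into a Vec<u8>, with a hypothetical helper `render_greeting` that propagates the Result instead of unwrapping it:

```rust
use std::io::{self, Write};

// Formats a greeting into an in-memory buffer.
fn render_greeting(name: &str, visits: u32) -> io::Result<Vec<u8>> {
    let mut out = Vec::new();
    // write! forwards to out.write_fmt(...) and returns an io::Result<()>.
    write!(out, "Hello {}! Visit #{:03}.", name, visits)?;
    Ok(out)
}

fn main() -> io::Result<()> {
    let out = render_greeting("Ferris", 7)?;
    println!("{}", String::from_utf8_lossy(&out));
    Ok(())
}
```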

IO Buffering

  • IO operations are really slow.
  • Like, really slow:
  1. TODO: demonstrate how slow IO is.
  • Why?

IO Buffering

  • Your running program has very few privileges.
  • Reads are done through the operating system (via system call).
    • Your program will do a context switch, temporarily stopping execution so the OS can gather input and relay it to your program.
    • This is veeeery slow.
  • Doing a lot of reads in rapid succession suffers hugely if you make a system call on every operation.
    • Solve this with buffers!
    • Read a huge chunk at once, store it in a buffer, then access it little-by-little as your program needs.
  • Exact same story with writes.
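
The effect can be sketched without touching the OS. `CountingReader` is a hypothetical wrapper that counts calls to the inner read, standing in for system calls; the program then reads one byte at a time, with and without a BufReader in between. The exact buffered count depends on BufReader's internal buffer size, so treat it as illustrative.

```rust
use std::io::{self, BufReader, Read};

// Hypothetical wrapper: counts how often the inner reader is asked for data.
struct CountingReader<R> {
    inner: R,
    calls: usize,
}

impl<R: Read> Read for CountingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.calls += 1;
        self.inner.read(buf)
    }
}

// Reads `data` one byte at a time; returns the number of inner `read` calls.
fn inner_reads(data: &[u8], buffered: bool) -> io::Result<usize> {
    let mut counter = CountingReader { inner: data, calls: 0 };
    let mut byte = [0u8; 1];
    if buffered {
        // BufReader pulls a large chunk from `counter`, then serves from memory.
        let mut reader = BufReader::new(&mut counter);
        while reader.read(&mut byte)? != 0 {}
    } else {
        // Every one-byte read goes straight to the inner reader.
        while counter.read(&mut byte)? != 0 {}
    }
    Ok(counter.calls)
}

fn main() -> io::Result<()> {
    let data = vec![0u8; 1000];
    println!("unbuffered: {} inner reads", inner_reads(&data, false)?);
    println!("buffered:   {} inner reads", inner_reads(&data, true)?);
    Ok(())
}
```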

BufReader

  fn new(inner: R) -> BufReader<R>;

  let f = File::open("foo.txt")?;
  let buffered_reader = BufReader::new(f);
  • BufReader is a struct that adds buffering to any reader.
  • BufReader itself implements Read, so you can use it transparently.

BufReader

  • BufReader also implements a separate interface BufRead.
  pub trait BufRead: Read {
      fn fill_buf(&mut self) -> Result<&[u8]>;
      fn consume(&mut self, amt: usize);
      // Other optional methods omitted.
  }
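
fill_buf and consume are meant to be used as a pair: look at the buffered bytes, then report how many of them you are done with. A sketch with hypothetical helpers `skip_spaces` and `trimmed_start`:

```rust
use std::io::{self, BufRead, BufReader, Read};

// Hypothetical helper: discards leading spaces without copying any data out.
fn skip_spaces<B: BufRead>(reader: &mut B) -> io::Result<()> {
    loop {
        // Borrow the internal buffer, refilling it from the source if empty.
        let buf = reader.fill_buf()?;
        if buf.is_empty() {
            return Ok(()); // end of input
        }
        let spaces = buf.iter().take_while(|&&b| b == b' ').count();
        let hit_non_space = spaces < buf.len();
        // Tell the reader how many buffered bytes we are done with.
        reader.consume(spaces);
        if hit_non_space {
            return Ok(());
        }
    }
}

// Skips leading spaces, then reads the rest through the plain Read interface.
fn trimmed_start(input: &[u8]) -> io::Result<String> {
    let mut reader = BufReader::new(input);
    skip_spaces(&mut reader)?;
    let mut rest = String::new();
    reader.read_to_string(&mut rest)?;
    Ok(rest)
}

fn main() -> io::Result<()> {
    println!("[{}]", trimmed_start(b"   hi there")?);
    Ok(())
}
```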

BufReader

  • Because BufReader has access to a lot of data that has not technically been read by your program, it can do more interesting things.
  • BufRead defines two alternative methods of reading from your input, each reading up until a certain byte has been reached (a newline, in the case of read_line).
  fn read_until(&mut self, byte: u8, buf: &mut Vec<u8>)
      -> Result<usize> { ... }
  fn read_line(&mut self, buf: &mut String)
      -> Result<usize> { ... }
  • It also defines two iterators.
  fn split(self, byte: u8)
      -> Split<Self> where Self: Sized { ... }
  fn lines(self)
      -> Lines<Self> where Self: Sized { ... }
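
A sketch of these methods on in-memory input. The three helper functions are hypothetical; note that read_until keeps the delimiter in the buffer, while split and lines strip it.

```rust
use std::io::{self, BufRead, BufReader};

// lines(): yields each line as a String with the line ending removed.
fn non_empty_lines(input: &[u8]) -> io::Result<Vec<String>> {
    let mut out = Vec::new();
    for line in BufReader::new(input).lines() {
        let line = line?;
        if !line.is_empty() {
            out.push(line);
        }
    }
    Ok(out)
}

// split(): yields byte vectors separated by the delimiter.
fn fields(input: &[u8]) -> io::Result<Vec<Vec<u8>>> {
    BufReader::new(input).split(b',').collect()
}

// read_until(): reads up to and including the first `=`, if there is one.
fn first_key(input: &[u8]) -> io::Result<Vec<u8>> {
    let mut reader = BufReader::new(input);
    let mut key = Vec::new();
    reader.read_until(b'=', &mut key)?;
    if key.last() == Some(&b'=') {
        key.pop(); // drop the delimiter that read_until kept
    }
    Ok(key)
}

fn main() -> io::Result<()> {
    println!("{:?}", non_empty_lines(b"a\n\nb\r\nc")?);
    println!("{:?}", fields(b"x,y,z")?);
    println!("{:?}", first_key(b"name=Ferris")?);
    Ok(())
}
```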

BufWriter

  • BufWriter does the same thing, wrapping around writers.
  let f = File::create("foo.txt")?;
  let mut writer = BufWriter::new(f);
  writer.write(b"Hello world")?;
  • BufWriter doesn’t implement a second interface like BufReader does.
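  • Instead, it caches writes in memory and passes them to the inner writer in large batches: when its buffer fills up, when flush() is called, or when the BufWriter goes out of scope.
    • Errors during the implicit flush on drop are ignored, so call flush() yourself if you need to see them.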
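
The buffering can be observed by wrapping an in-memory Vec<u8> and peeking at it with get_ref. A sketch with a hypothetical helper `flush_demo`, assuming the message is small enough to fit in BufWriter's internal buffer:

```rust
use std::io::{self, BufWriter, Write};

// Returns how many bytes had reached the inner writer before and after flush().
fn flush_demo(msg: &[u8]) -> io::Result<(usize, usize)> {
    let mut writer = BufWriter::new(Vec::new());
    writer.write_all(msg)?;
    // A small message is still sitting in BufWriter's own buffer here.
    let before = writer.get_ref().len();
    writer.flush()?;
    let after = writer.get_ref().len();
    Ok((before, after))
}

fn main() -> io::Result<()> {
    let (before, after) = flush_demo(b"Hello world")?;
    println!("before flush: {}, after flush: {}", before, after);
    Ok(())
}
```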

Stdin

  let mut buffer = String::new();
  io::stdin().read_line(&mut buffer)?;
  • This is a very typical way of reading from standard input (terminal input).
  • io::stdin() returns a value of struct Stdin.
  • Stdin implements read_line directly, instead of using BufRead.

StdinLock

  • A “lock” on standard input means only the holder of that lock can read from standard input.
    • So no two threads can read from standard input at the same time.
  • All read methods on Stdin call self.lock() internally.
  • You can also create a StdinLock explicitly with the Stdin::lock() method.
  let stdin = io::stdin();
  let lock: io::StdinLock = stdin.lock();
  • A StdinLock instance implements Read and BufRead, so you can call any of the methods defined by those traits.
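
Because the lock implements BufRead, code written against BufRead can be fed from standard input or from test data. A sketch with hypothetical helpers `sum_lines` and `sum_stdin`; main uses in-memory input so it does not wait on a terminal.

```rust
use std::io::{self, BufRead};

// Sums one integer per line from any buffered reader, ignoring other lines.
fn sum_lines<R: BufRead>(reader: R) -> io::Result<i64> {
    let mut total = 0;
    for line in reader.lines() {
        if let Ok(n) = line?.trim().parse::<i64>() {
            total += n;
        }
    }
    Ok(total)
}

// Locks standard input once and hands the lock to the generic function.
#[allow(dead_code)]
fn sum_stdin() -> io::Result<i64> {
    let stdin = io::stdin();
    sum_lines(stdin.lock())
}

fn main() -> io::Result<()> {
    // A byte slice implements BufRead too, so it can stand in for stdin.
    println!("{}", sum_lines(&b"1\n2\n3\n"[..])?);
    Ok(())
}
```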

Stdout

  • Similar to Stdin but interfaces with standard output instead.
  • Directly implements Write.
  • You don’t typically use stdout directly.
    • Prefer print! or println! instead, which provide string formatting.
  • You can also explicitly lock standard out with Stdout::lock().
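
One case where locking explicitly helps is printing many lines in a row: the lock is taken once rather than on every print call. A sketch with a hypothetical helper `write_table` that accepts any writer, including a locked stdout:

```rust
use std::io::{self, Write};

// Writes one tab-separated row per entry to any writer.
fn write_table<W: Write>(mut out: W, rows: &[(&str, u32)]) -> io::Result<()> {
    for (name, value) in rows {
        writeln!(out, "{}\t{}", name, value)?;
    }
    out.flush()
}

fn main() -> io::Result<()> {
    let stdout = io::stdout();
    // Lock once up front; the lock is released when it is dropped.
    write_table(stdout.lock(), &[("ferris", 1), ("corro", 2)])
}
```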

Special IO Structs

  • repeat(byte: u8): A reader which will infinitely yield the specified byte.
    • It will always fill the provided buffer.
  • sink(): “A writer which will move data into the void.”
  • empty(): A reader which will always return Ok(0).
  • copy(reader: &mut R, writer: &mut W) -> Result<u64>: copies all bytes from the reader into the writer.
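
A sketch that exercises all four, with a hypothetical helper `special_demo`. Since repeat never reaches EOF, it is bounded with take before reading to the end.

```rust
use std::io::{self, Read};

// Returns (bytes read from repeat, bytes copied into sink, bytes copied from empty).
fn special_demo() -> io::Result<(Vec<u8>, u64, u64)> {
    // repeat() is infinite, so limit it to three bytes.
    let mut zs = Vec::new();
    io::repeat(b'z').take(3).read_to_end(&mut zs)?;

    // copy() into sink() discards the data but still reports the byte count.
    let discarded = io::copy(&mut &b"hello"[..], &mut io::sink())?;

    // empty() is always at EOF, so nothing is copied.
    let mut out = Vec::new();
    let copied = io::copy(&mut io::empty(), &mut out)?;

    Ok((zs, discarded, copied))
}

fn main() -> io::Result<()> {
    println!("{:?}", special_demo()?);
    Ok(())
}
```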

Serialization


rustc-serialize

  • Implements automatic serialization for Rust structs.
    • (Via compiler support.)
  • Usually used with JSON output:
  extern crate rustc_serialize;
  use rustc_serialize::json;
  #[derive(RustcDecodable, RustcEncodable)]
  pub struct X { a: i32, b: String }
  fn main() {
      let object = X { a: 6, b: String::from("half dozen") };
      let encoded = json::encode(&object).unwrap();
      // ==> the string {"a":6,"b":"half dozen"}
      let decoded: X = json::decode(&encoded).unwrap();
  }
  • Also has support for hex- and base64-encoded text output.

Serde

  • Serialization/Deserialization.
  • Next generation of Rust serialization: faster, more flexible.
    • But API is currently in flux! We’re talking about serde 0.7.0, released yesterday. (Not on crates.io as of this writing.)
  • Serde is easy in Rust nightly!
    • A compiler plugin creates attributes and auto-derived traits.
  • Slightly harder to use in Rust stable:
    • Compiler plugins aren’t available.
    • Instead, Rust code is generated before building (via build.rs).
      • serde_codegen generates .rs files from .rs.in files.
    • And you use the include! macro to include the resulting files.
  • Separate crates for each output format:
    • Support for binary, JSON, MessagePack, XML, YAML.

Serde

  • Code looks similar to rustc_serialize:
  #![feature(custom_derive, plugin)]
  #![plugin(serde_macros)]
  extern crate serde;
  extern crate serde_json;
  #[derive(Serialize, Deserialize, Debug)]
  pub struct X { a: i32, b: String }
  fn main() {
      let object = X { a: 6, b: String::from("half dozen") };
      let encoded = serde_json::to_string(&object).unwrap();
      // ==> the string {"a":6,"b":"half dozen"}
      let decoded: X = serde_json::from_str(&encoded).unwrap();
  }

Serde

  • But there are more features!
  • Serializers are generated using the visitor pattern, producing code like the following.
    • Which can also be written manually and customized.
  use serde;
  use serde::*;
  use serde::ser::*;
  struct Point { x: i32, y: i32 }
  impl Serialize for Point {
      fn serialize<S>(&self, sr: &mut S)
          -> Result<(), S::Error> where S: Serializer {
          sr.serialize_struct("Point",
              PointMapVisitor { value: self, state: 0 })
      }
  }

Serde

  struct PointMapVisitor<'a> { value: &'a Point, state: u8 }
  impl<'a> MapVisitor for PointMapVisitor<'a> {
      fn visit<S>(&mut self, sr: &mut S)
          -> Result<Option<()>, S::Error> where S: Serializer {
          match self.state {
              0 => { // On first call, serialize x.
                  self.state += 1;
                  Ok(Some(try!(sr.serialize_struct_elt("x", &self.value.x))))
              }
              1 => { // On second call, serialize y.
                  self.state += 1;
                  Ok(Some(try!(sr.serialize_struct_elt("y", &self.value.y))))
              }
              _ => Ok(None) // Subsequently, there is no more to serialize.
          }
      }
  }
  • Deserialization code is also generated - similar but messier.

Serde

  • Custom serializers are flexible, but complicated.
  • Serde also provides customization via #[serde(something)] attributes. something can be:
    • On fields and enum variants:
      • rename = "foo": overrides the serialized key name
    • On fields:
      • default: use Default trait to generate default values
      • default = "func": use func() to generate default values
      • skip_serializing: skips this field
      • skip_serializing_if = "func": skips this field if func(val) returns true
      • serialize_with = "enc": serialize w/ enc(val, serializer)
      • deserialize_with = "dec": deserialize w/ dec(deserializer)
    • On containers (structs, enums):
      • deny_unknown_fields: error instead of ignoring unknown fields