layout: post
title: “Swapping type-safety for high performance using compiler directives”
description: “An experiment in how to have your cake and eat it too”
categories: []
TL;DR; An experiment: you can use lots of domain modelling types at development time and swap them out for a more performant implementation later using compiler directives.
Domain Modelling vs. Performance
I am a big fan of using types for domain modelling — lots and lots and lots of types!
These types act both as documentation and as a compile time constraint to ensure that only valid data is used.
For example, if I have two types CustomerId
and OrderId
, I can represent them as separate types:
type CustomerId = CustomerId of int
type OrderId = OrderId of int
and by doing this, I guarantee that I can’t use an OrderId
where I need an CustomerId
.
The problem is that adding a layer of indirection like this can affect performance:
- the extra indirection can cause data access to be much slower.
- the wrapper class needs extra memory, creating memory pressure.
- this in turn triggers the garbage collector more often, which can often be the cause of performance problems in managed code.
In general, I don’t generally worry about micro-performance like this at design-time.
Many many things will have a much bigger impact on performance, including any kind of I/O, and the algorithms you choose.
As a result, I am very much against doing micro-benchmarks out of context. You should always profile a real app in a real context,
rather than worrying too much over things that might not be important.
Having said that, I am now going to do some micro-benchmarks!
Micro-benchmarking a wrapper type
Let’s see how a wrapper type fares when used in large numbers. Let’s say we want to:
- create ten million customer ids
- then, map over them twice
- then, filter them
Admittedly, it’s a bit silly adding 1 to a customer id — we’ll look at a better example later.
Anyway, here’s the code:
// type is an wrapper for a primitive type
type CustomerId = CustomerId of int
// create two silly functions for mapping and filtering
let add1ToCustomerId (CustomerId i) =
CustomerId (i+1)
let isCustomerIdSmall (CustomerId i) =
i < 100000
// ---------------------------------
// timed with a 1 million element array
// ---------------------------------
#time
Array.init 1000000 CustomerId
// map it
|> Array.map add1ToCustomerId
// map it again
|> Array.map add1ToCustomerId
// filter it
|> Array.filter isCustomerIdSmall
|> ignore
#time
The code sample above is available on GitHub.
(Again, let me stress that this is a terrible way to profile code!)
A typical timed result looks like this:
Real: 00:00:00.296, CPU: 00:00:00.296, GC gen0: 6, gen1: 4, gen2: 0
That is, it takes about 0.3 seconds to do those steps, and it creates quite a bit of garbage, triggering four gen1 collections.
If you are not sure what “gen0”, “gen1”, and “gen2” mean, then this is a good place to start.
DISCLAIMER: I’m going to be doing all my benchmarking in F# interactive. Compiled code with optimizations might have a completely different performance profile.
Past performance is no guarantee of future results. Draw conclusions at your own risk. Etc., etc.
If we increase the array size to 10 million, we get a more than 10x slower result:
Real: 00:00:03.489, CPU: 00:00:03.541, GC gen0: 68, gen1: 46, gen2: 2
That is, it takes about 3.5 seconds to do those steps, and it creates a lot of garbage, including a few gen2 GC’s, which are really bad.
In fact, you might even get an “out of memory” exception, in which case, you’ll have to restart F# Interactive!
So what are the alternatives to using a wrapper? There are two common approaches:
- Using type aliases
- Using units of measure
Let’s start with type aliases.
Using type aliases
In the type alias approach, I would simply dispense with the wrapper, but keep the type around as documentation.
type CustomerId = int
type OrderId = int
If I want to use the type as documentation, I must then annotate the functions appropriately.
For example, in the add1ToCustomerId
below
both the parameter and the return value have been annotated so that it has the type CustomerId -> CustomerId
rather than int -> int
.
let add1ToCustomerId (id:CustomerId) :CustomerId =
id+1
Micro-benchmarking a type alias
Let’s create another micro-benchmark:
type CustomerId = int
// create two silly functions for mapping and filtering
let add1ToCustomerId (id:CustomerId) :CustomerId =
id+1
// val add1ToCustomerId : id:CustomerId -> CustomerId
let isCustomerIdSmall (id:CustomerId) =
id < 100000
// val isCustomerIdSmall : id:CustomerId -> bool
// ---------------------------------
// timed with a 1 million element array
// ---------------------------------
#time
Array.init 1000000 (fun i -> i)
// map it
|> Array.map add1ToCustomerId
// map it again
|> Array.map add1ToCustomerId
// filter it
|> Array.filter isCustomerIdSmall
|> Array.length
#time
The code sample above is available on GitHub.
The results are spectacularly better!
Real: 00:00:00.017, CPU: 00:00:00.015, GC gen0: 0, gen1: 0, gen2: 0
It takes about 17 milliseconds to do those steps, and more importantly, very little garbage was generated.
If we increase the array size to 10 million, we get a 10x slower result, but still no garbage:
Real: 00:00:00.166, CPU: 00:00:00.156, GC gen0: 0, gen1: 0, gen2: 0
Compared with the earlier version at over three seconds, that’s excellent.
Problems with type aliases
Alas, the problem with type aliases is that we have completely lost type safety now!
To demonstrate, here’s some code that creates a CustomerId
and an OrderId
:
type CustomerId = int
type OrderId = int
// create two
let cid : CustomerId = 12
let oid : OrderId = 12
And sadly, the two ids compare equal, and we can pass an OrderId
to function expecting a CustomerId
without any complaint from the compiler.
cid = oid // true
// pass OrderId to function expecting a CustomerId
add1ToCustomerId oid // CustomerId = 13
Ok, so that doesn’t look promising! What next?
Using units of measure
The other common option is to use units of measure to distinguish the two types, like this:
type [<Measure>] CustomerIdUOM
type [<Measure>] OrderIdUOM
type CustomerId = int<CustomerIdUOM>
type OrderId = int<OrderIdUOM>
CustomerId
and OrderId
are still two different types, but the unit of measure is erased, so by the time the JIT sees it the type looks like an primitive int.
We can see that this is true when we time the same steps as before:
// create two silly functions for mapping and filtering
let add1ToCustomerId id =
id+1<CustomerIdUOM>
let isCustomerIdSmall i =
i < 100000<CustomerIdUOM>
// ---------------------------------
// timed with a 1 million element array
// ---------------------------------
#time
Array.init 1000000 (fun i -> LanguagePrimitives.Int32WithMeasure<CustomerIdUOM> i)
// map it
|> Array.map add1ToCustomerId
// map it again
|> Array.map add1ToCustomerId
// filter it
|> Array.filter isCustomerIdSmall
|> ignore
#time
The code sample above is available on GitHub.
A typical timed result looks like this:
Real: 00:00:00.022, CPU: 00:00:00.031, GC gen0: 0, gen1: 0, gen2: 0
Again, the code is very fast (22 milliseconds), and just as importantly, very little garbage was generated again.
If we increase the array size to 10 million, we maintain the high performance (just as with the type alias approach) and still no garbage:
Real: 00:00:00.157, CPU: 00:00:00.156, GC gen0: 0, gen1: 0, gen2: 0
Problems with units of measure
The advantage of units of measure is that the CustomerId
and OrderId
types are incompatible, so we get the type safety that we want.
But I find them unsatisfactory from an esthetic point of view. I like my wrapper types!
And also, units of measure are really meant to be used with numeric values. For example, I can create a customer id and order id:
let cid = 12<CustomerIdUOM>
let oid = 4<OrderIdUOM>
and then I can divide CustomerId(12) by OrderId(4) to get three…
let ratio = cid / oid
// val ratio : int<CustomerIdUOM/OrderIdUOM> = 3
Three what though? Three customer ids per order id? What does that even mean?
Yes, surely this will never happen in practice, but still it bothers me!
Using compiler directives to get the best of both worlds
Did I mention that I really like wrapper types? I really like them up until I get a call saying that production systems are having performance hiccups because of too many big GCs.
So, can we get the best of both worlds? Type-safe wrapper types AND fast performance?
I think so, if you are willing to put up with some extra work during development and build.
The trick is to have both the “wrapper type” implemention and the “type alias” implementation available to you, and then switch between them based on a compiler directive.
For this to work:
- you will need to tweak your code to not access the type directly, but only via functions and pattern matching.
- you will need to create a “type alias” implementation that implements a “constructor”, various “getters” and for pattern matching, active patterns.
Here’s an example, using the COMPILED
and INTERACTIVE
directives so that you can play with it interactively.
Obviously, in real code, you would use your own directive such as FASTTYPES
or similar.
#if COMPILED // uncomment to use aliased version
//#if INTERACTIVE // uncomment to use wrapped version
// type is an wrapper for a primitive type
type CustomerId = CustomerId of int
// constructor
let createCustomerId i = CustomerId i
// get data
let customerIdValue (CustomerId i) = i
// pattern matching
// not needed
#else
// type is an alias for a primitive type
type CustomerId = int
// constructor
let inline createCustomerId i :CustomerId = i
// get data
let inline customerIdValue (id:CustomerId) = id
// pattern matching
let inline (|CustomerId|) (id:CustomerId) :int = id
#endif
You can see that for both versions I’ve created a constructor createCustomerId
and a getter customerIdValue
and, for the type alias version, an active pattern that looks just like CustomerId
.
With this code in place, we can use CustomerId
without caring about the implementation:
// test the getter
let testGetter c1 c2 =
let i1 = customerIdValue c1
let i2 = customerIdValue c2
printfn "Get inner value from customers %i %i" i1 i2
// Note that the signature is as expected:
// c1:CustomerId -> c2:CustomerId -> unit
// test pattern matching
let testPatternMatching c1 =
let (CustomerId i) = c1
printfn "Get inner value from Customers via pattern match: %i" i
match c1 with
| CustomerId i2 -> printfn "match/with %i" i
// Note that the signature is as expected:
// c1:CustomerId -> unit
let test() =
// create two ids
let c1 = createCustomerId 1
let c2 = createCustomerId 2
let custArray : CustomerId [] = [| c1; c2 |]
// test them
testGetter c1 c2
testPatternMatching c1
And now we can run the same micro-benchmark with both implementations:
// create two silly functions for mapping and filtering
let add1ToCustomerId (CustomerId i) =
createCustomerId (i+1)
let isCustomerIdSmall (CustomerId i) =
i < 100000
// ---------------------------------
// timed with a 1 million element array
// ---------------------------------
#time
Array.init 1000000 createCustomerId
// map it
|> Array.map add1ToCustomerId
// map it again
|> Array.map add1ToCustomerId
// filter it
|> Array.filter isCustomerIdSmall
|> Array.length
#time
The code sample above is available on GitHub.
The results are similar to the previous examples. The aliased version is much faster and does not create GC pressure:
// results using wrapped version
Real: 00:00:00.408, CPU: 00:00:00.405, GC gen0: 7, gen1: 4, gen2: 1
// results using aliased version
Real: 00:00:00.022, CPU: 00:00:00.031, GC gen0: 0, gen1: 0, gen2: 0
and for the 10 million element version:
// results using wrapped version
Real: 00:00:03.199, CPU: 00:00:03.354, GC gen0: 67, gen1: 45, gen2: 2
// results using aliased version
Real: 00:00:00.239, CPU: 00:00:00.202, GC gen0: 0, gen1: 0, gen2: 0
A more complex example
In practice, we might want something more complex than a simple wrapper.
For example, here is an EmailAddress
(a simple wrapper type, but constrained to be non-empty and containing a “@”) and
some sort of Activity
record that stores an email and the number of visits, say.
module EmailAddress =
// type with private constructor
type EmailAddress = private EmailAddress of string
// safe constructor
let create s =
if System.String.IsNullOrWhiteSpace(s) then
None
else if s.Contains("@") then
Some (EmailAddress s)
else
None
// get data
let value (EmailAddress s) = s
module ActivityHistory =
open EmailAddress
// type with private constructor
type ActivityHistory = private {
emailAddress : EmailAddress
visits : int
}
// safe constructor
let create email visits =
{emailAddress = email; visits = visits }
// get data
let email {emailAddress=e} = e
let visits {visits=a} = a
As before, for each type there is a constructor and a getter for each field.
NOTE: Normally I would define a type outside a module, but because the real constructor needs to be private,
I’ve put the type inside the module and given the module and the type the same name. If this is too awkward, you can rename the module to be different
from the type, or use the OCaml convention of calling the main type in a module just “T”, so you get EmailAddress.T
as the type name.
To make a more performant version, we replace EmailAddress
with a type alias, and Activity
with a struct, like this:
module EmailAddress =
// aliased type
type EmailAddress = string
// safe constructor
let inline create s :EmailAddress option =
if System.String.IsNullOrWhiteSpace(s) then
None
else if s.Contains("@") then
Some s
else
None
// get data
let inline value (e:EmailAddress) :string = e
module ActivityHistory =
open EmailAddress
[<Struct>]
type ActivityHistory(emailAddress : EmailAddress, visits : int) =
member this.EmailAddress = emailAddress
member this.Visits = visits
// safe constructor
let create email visits =
ActivityHistory(email,visits)
// get data
let email (act:ActivityHistory) = act.EmailAddress
let visits (act:ActivityHistory) = act.Visits
This version reimplements the constructor and a getter for each field.
I could have made the field names for ActivityHistory
be the same in both cases too, but. in the struct case, type inference would not work.
By making them different, the user is forced to use the getter functions rather than dotting in.
Both implementations have the same “API”, so we can create code that works with both:
let rand = new System.Random()
let createCustomerWithRandomActivityHistory() =
let emailOpt = EmailAddress.create "abc@example.com"
match emailOpt with
| Some email ->
let visits = rand.Next(0,100)
ActivityHistory.create email visits
| None ->
failwith "should not happen"
let add1ToVisits activity =
let email = ActivityHistory.email activity
let visits = ActivityHistory.visits activity
ActivityHistory.create email (visits+1)
let isCustomerInactive activity =
let visits = ActivityHistory.visits activity
visits < 3
// execute creation and iteration for a large number of records
let mapAndFilter noOfRecords =
Array.init noOfRecords (fun _ -> createCustomerWithRandomActivity() )
// map it
|> Array.map add1ToVisits
// map it again
|> Array.map add1ToVisits
// filter it
|> Array.filter isCustomerInactive
|> ignore // we don't actually care!
Pros and cons of this approach
One nice thing about this approach is that it is self-correcting — it forces you to use the “API” properly.
For example, if I started accessing fields directly by dotting into
the ActivityHistory
record, then that code would break when the compiler directive was turned on and the struct implementation was used.
Of course, you could also create a signature file to enforce the API.
On the negative side, we do lose some of the nice syntax such as {rec with ...}
.
But you should really only be using this technique with small records (2-3 fields), so not having with
is not a big burden.
Timing the two implementations
Rather than using #time
, this time I wrote a custom timer that runs a function 10 times and prints out the GC and memory used on each run.
/// Do countN repetitions of the function f and print the
/// time elapsed, number of GCs and change in total memory
let time countN label f =
let stopwatch = System.Diagnostics.Stopwatch()
// do a full GC at the start but NOT thereafter
// allow garbage to collect for each iteration
System.GC.Collect()
printfn "Started"
let getGcStats() =
let gen0 = System.GC.CollectionCount(0)
let gen1 = System.GC.CollectionCount(1)
let gen2 = System.GC.CollectionCount(2)
let mem = System.GC.GetTotalMemory(false)
gen0,gen1,gen2,mem
printfn "======================="
printfn "%s (%s)" label WrappedOrAliased
printfn "======================="
for iteration in [1..countN] do
let gen0,gen1,gen2,mem = getGcStats()
stopwatch.Restart()
f()
stopwatch.Stop()
let gen0',gen1',gen2',mem' = getGcStats()
// convert memory used to K
let changeInMem = (mem'-mem) / 1000L
printfn "#%2i elapsed:%6ims gen0:%3i gen1:%3i gen2:%3i mem:%6iK" iteration stopwatch.ElapsedMilliseconds (gen0'-gen0) (gen1'-gen1) (gen2'-gen2) changeInMem
The code sample above is available on GitHub.
Let’s now run mapAndFilter
with a million records in the array:
let size = 1000000
let label = sprintf "mapAndFilter: %i records" size
time 10 label (fun () -> mapAndFilter size)
The results are shown below:
=======================
mapAndFilter: 1000000 records (Wrapped)
=======================
# 1 elapsed: 820ms gen0: 13 gen1: 8 gen2: 1 mem: 72159K
# 2 elapsed: 878ms gen0: 12 gen1: 7 gen2: 0 mem: 71997K
# 3 elapsed: 850ms gen0: 12 gen1: 6 gen2: 0 mem: 72005K
# 4 elapsed: 885ms gen0: 12 gen1: 7 gen2: 0 mem: 72000K
# 5 elapsed: 6690ms gen0: 16 gen1: 10 gen2: 1 mem:-216005K
# 6 elapsed: 714ms gen0: 12 gen1: 7 gen2: 0 mem: 72003K
# 7 elapsed: 668ms gen0: 12 gen1: 7 gen2: 0 mem: 71995K
# 8 elapsed: 670ms gen0: 12 gen1: 7 gen2: 0 mem: 72001K
# 9 elapsed: 6676ms gen0: 16 gen1: 11 gen2: 2 mem:-215998K
#10 elapsed: 712ms gen0: 13 gen1: 7 gen2: 0 mem: 71998K
=======================
mapAndFilter: 1000000 records (Aliased)
=======================
# 1 elapsed: 193ms gen0: 7 gen1: 0 gen2: 0 mem: 25325K
# 2 elapsed: 142ms gen0: 8 gen1: 0 gen2: 0 mem: 23779K
# 3 elapsed: 143ms gen0: 8 gen1: 0 gen2: 0 mem: 23761K
# 4 elapsed: 138ms gen0: 8 gen1: 0 gen2: 0 mem: 23745K
# 5 elapsed: 135ms gen0: 7 gen1: 0 gen2: 0 mem: 25327K
# 6 elapsed: 135ms gen0: 8 gen1: 0 gen2: 0 mem: 23762K
# 7 elapsed: 137ms gen0: 8 gen1: 0 gen2: 0 mem: 23755K
# 8 elapsed: 140ms gen0: 8 gen1: 0 gen2: 0 mem: 23777K
# 9 elapsed: 174ms gen0: 7 gen1: 0 gen2: 0 mem: 25327K
#10 elapsed: 180ms gen0: 8 gen1: 0 gen2: 0 mem: 23762K
Now this code no longer consists of only value types, so the profiling is getting muddier now!
The mapAndFilter
function uses createCustomerWithRandomActivity
which in turn uses Option
, a reference type,
so there will be a large number of reference types being allocated. Just as in real life, it’s hard to keep things pure!
Even so, you can see that the wrapped version is slower than the aliased version (approx 800ms vs. 150ms) and creates more garbage on each iteration (approx 72Mb vs 24Mb)
and most importantly has two big GC pauses (in the 5th and 9th iterations), while the aliased version never even does a gen1 GC, let alone a gen2.
NOTE: The fact that aliased version is using up memory and yet there are no gen1s makes me a bit suspicious of these figures. I think they might be different if run outside of
F# interactive.
What about non-record types?
What if the type we want to optimise is a discriminated union rather than a record?
My suggestion is to turn the DU into a struct with a tag for each case, and fields for all possible data.
For example, let’s say that we have DU that classifies an Activity
into Active
and Inactive
, and for the Active
case we store the email and visits and for
the inactive case we only store the email:
module Classification =
open EmailAddress
open ActivityHistory
type Classification =
| Active of EmailAddress * int
| Inactive of EmailAddress
// constructor
let createActive email visits =
Active (email,visits)
let createInactive email =
Inactive email
// pattern matching
// not needed
To turn this into a struct, I would do something like this:
module Classification =
open EmailAddress
open ActivityHistory
open System
[<Struct>]
type Classification(isActive : bool, email: EmailAddress, visits: int) =
member this.IsActive = isActive
member this.Email = email
member this.Visits = visits
// constructor
let inline createActive email visits =
Classification(true,email,visits)
let inline createInactive email =
Classification(false,email,0)
// pattern matching
let inline (|Active|Inactive|) (c:Classification) =
if c.IsActive then
Active (c.Email,c.Visits)
else
Inactive (c.Email)
Note that Visits
is not used in the Inactive
case, so is set to a default value.
Now let’s create a function that classifies the activity history, creates a Classification
and then filters and extracts the email only for active customers.
open Classification
let createClassifiedCustomer activity =
let email = ActivityHistory.email activity
let visits = ActivityHistory.visits activity
if isCustomerInactive activity then
Classification.createInactive email
else
Classification.createActive email visits
// execute creation and iteration for a large number of records
let extractActiveEmails noOfRecords =
Array.init noOfRecords (fun _ -> createCustomerWithRandomActivityHistory() )
// map to a classification
|> Array.map createClassifiedCustomer
// extract emails for active customers
|> Array.choose (function
| Active (email,visits) -> email |> Some
| Inactive _ -> None )
|> ignore
The code sample above is available on GitHub.
The results of profiling this function with the two different implementations are shown below:
=======================
extractActiveEmails: 1000000 records (Wrapped)
=======================
# 1 elapsed: 664ms gen0: 12 gen1: 6 gen2: 0 mem: 64542K
# 2 elapsed: 584ms gen0: 14 gen1: 7 gen2: 0 mem: 64590K
# 3 elapsed: 589ms gen0: 13 gen1: 7 gen2: 0 mem: 63616K
# 4 elapsed: 573ms gen0: 11 gen1: 5 gen2: 0 mem: 69438K
# 5 elapsed: 640ms gen0: 15 gen1: 7 gen2: 0 mem: 58464K
# 6 elapsed: 4297ms gen0: 13 gen1: 7 gen2: 1 mem:-256192K
# 7 elapsed: 593ms gen0: 14 gen1: 7 gen2: 0 mem: 64623K
# 8 elapsed: 621ms gen0: 13 gen1: 7 gen2: 0 mem: 63689K
# 9 elapsed: 577ms gen0: 11 gen1: 5 gen2: 0 mem: 69415K
#10 elapsed: 609ms gen0: 15 gen1: 7 gen2: 0 mem: 58480K
=======================
extractActiveEmails: 1000000 records (Aliased)
=======================
# 1 elapsed: 254ms gen0: 32 gen1: 1 gen2: 0 mem: 33162K
# 2 elapsed: 221ms gen0: 33 gen1: 0 gen2: 0 mem: 31532K
# 3 elapsed: 196ms gen0: 32 gen1: 0 gen2: 0 mem: 33113K
# 4 elapsed: 185ms gen0: 33 gen1: 0 gen2: 0 mem: 31523K
# 5 elapsed: 187ms gen0: 33 gen1: 0 gen2: 0 mem: 31532K
# 6 elapsed: 186ms gen0: 32 gen1: 0 gen2: 0 mem: 33095K
# 7 elapsed: 191ms gen0: 33 gen1: 0 gen2: 0 mem: 31514K
# 8 elapsed: 200ms gen0: 32 gen1: 0 gen2: 0 mem: 33096K
# 9 elapsed: 189ms gen0: 33 gen1: 0 gen2: 0 mem: 31531K
#10 elapsed: 3732ms gen0: 33 gen1: 1 gen2: 1 mem:-256432K
As before, the aliased/struct version is more performant, being faster and generating less garbage (although there was a GC pause at the end, oh dear).
Questions
Isn’t this a lot of work, creating two implementations?
Yes! I don’t think you should do this in general. This is just an experiment on my part.
I suggest that turning records and DUs into structs is a last resort, only done after you have eliminated all other bottlenecks first.
However, there may be a few special cases where speed and memory are critical, and then, perhaps, it might be worth doing something like this.
What are the downsides?
In addition to all the extra work and maintenance, you mean?
Well, because the types are essentially private, we do lose some of the nice syntax available when you have access to the internals of the type,
such as {rec with ...}
, but as I said, you should really only be using this technique with small records anyway.
More importantly, value types like structs are not a silver bullet. They have their own problems.
For example, they can be slower when passed as arguments (because of copy-by-value) and you must be careful not to box them implicitly,
otherwise you end up doing allocations and creating garbage. Microsoft has guidelines on using classes vs structs,
but see also this commentary on breaking these guidelines and these rules.
What about using shadowing?
Shadowing is used when the client wants to use a different implementation. For example, you can
switch from unchecked to checked arithmetic by opening the Checked module.
More details here.
But that would not work here — I don’t want each client to decide which version of the type they will use. That would lead to all sorts of incompatibility problems.
Also, it’s not a per-module decision, it’s a decision based on deployment context.
What about more performant collection types?
I am using array
everywhere as the collection type. If you want other high performing collections,
check out FSharpx.Collections or Funq collections.
You’ve mixed up allocations, mapping, filtering. What about a more fine-grained analysis?
I’m trying to keep some semblage of dignity after I said that micro-benchmarking was bad!
So, yes, I deliberately created a case with mixed usage and measured it as a whole rather than benchmarking each part separately.
Your usage scenarios will obviously be different, so I don’t think there’s any need to go deeper.
Also, I’m doing all my benchmarking in F# interactive. Compiled code with optimizations might have a completely different performance profile.
What other ways are there to increase performance?
Since F# is a .NET language, the performance tips for C# work for F# as well, standard stuff like:
- Make all I/O async. Use streaming IO over random access IO if possible. Batch up your requests.
- Check your algorithms. Anything worse than O(n log(n)) should be looked at.
- Don’t do things twice. Cache as needed.
- Keep things in the CPU cache by keeping objects in contiguous memory and avoiding too many deep reference (pointer) chains. Things that help with this are using arrays instead of lists, value types instead of reference types, etc.
- Avoid pressure on the garbage collector by minimizing allocations. Avoid creating long-lived objects that survive gen0 collections.
To be clear, I don’t claim to be an expert on .NET performance and garbage collection. In fact, if you see something wrong with this analysis, please let me know!
Here are some sources that helped me:
- The book Writing High-Performance .NET Code by Ben Watson.
- Martin Thompson has a great blog
on performance and some excellent videos, such as Top 10 Performance Folklore.
(Good summary here.) - Understanding Latency, a video by Gil Tene.
- Essential Truths Everyone Should Know about Performance in a Large Managed Codebase, a video by Dustin Cambell at Microsoft.
- For F# in particular:
- Yan Cui has some blog posts on records vs structs and memory layout.
- Jon Harrop has a number of good articles such as this one but some of it is behind a paywall.
- Video: High Performance F# in .NET and on the GPU with Jack Pappas. The sound is bad, but the slides and discussion are good!
- Resources for Math and Statistics on fsharp.org
Summary
“Keep it clean; keep it simple; aim to be elegant.”
— Martin Thompson
This was a little experiment to see if I could have my cake and eat it too. Domain modelling using lots of types, but with the ability to get performance when needed in an elegant way.
I think that this is quite a nice solution, but as I said earlier, this optimization (and uglification)
should only ever be needed for a small number of heavily used core types that are allocated many millions of times.
Finally, I have not used this approach myself in a large production system (I’ve never needed to),
so I would be interested in getting feedback from people in the trenches on what they do.
The code samples used in this post are available on GitHub.