jsPerf.com

While Benchmark.js is useful for testing the performance of your code in whatever JS environment you’re running, it cannot be stressed enough that you need to compile test results from lots of different environments (desktop browsers, mobile devices, etc.) if you want to have any hope of reliable test conclusions.

For example, Chrome on a high-end desktop machine is not likely to perform anywhere near the same as Chrome mobile on a smartphone. And a smartphone with a full battery charge is not likely to perform anywhere near the same as a smartphone with 2% battery life left, when the device is starting to power down the radio and processor.

If you want to make assertions like “X is faster than Y” in any reasonable sense across more than just a single environment, you’re going to need to actually test as many of those real world environments as possible. Just because Chrome executes some X operation faster than Y doesn’t mean that all browsers do. And of course you also probably will want to cross-reference the results of multiple browser test runs with the demographics of your users.

There’s an awesome website for this purpose called jsPerf (http://jsperf.com). It uses the Benchmark.js library we talked about earlier to run statistically accurate and reliable tests, and publishes each test at an openly available URL that you can pass around to others.

Each time a test is run, the results are collected and persisted with the test, and the cumulative test results are graphed on the page for anyone to see.

When creating a test on the site, you start out with two test cases to fill in, but you can add as many as you need. You also have the ability to define setup code that is run at the beginning of each test cycle and teardown code that is run at the end of each cycle.

Note: A trick for doing just one test case (if you’re benchmarking a single approach instead of a head-to-head) is to fill in the second test input boxes with placeholder text on first creation, then edit the test and leave the second test blank, which will delete it. You can always add more test cases later.

You can define the initial page setup (importing libraries, defining utility helper functions, declaring variables, etc.). There are also options for defining setup and teardown behavior if needed — consult the “Setup/Teardown” section in the Benchmark.js discussion earlier.
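
To make the cycle-versus-iteration distinction concrete, here is a minimal sketch of where those boxes end up: jsPerf is built on the Benchmark.js library, and the setup/teardown code becomes the setup and teardown options of the underlying test. The test name and bodies below are placeholders for illustration, not jsPerf’s actual generated code.

    var bench = new Benchmark(
        "array append",
        // the test case itself, run many times within each cycle
        function(){
            x.push( "x" );
        },
        {
            // runs once at the start of each cycle, not before every iteration;
            // Benchmark.js compiles setup/test/teardown together, so `x` is shared
            setup: function(){
                var x = [];
            },
            // runs once at the end of each cycle
            teardown: function(){
                x = null;
            }
        }
    );

    bench.run();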

Sanity Check

jsPerf is a fantastic resource, but there’s an awful lot of published tests that, when you analyze them, are quite flawed or bogus, for any of a variety of reasons as outlined so far in this chapter.

Consider:

    // Case 1
    var x = [];
    for (var i=0; i<10; i++) {
        x[i] = "x";
    }

    // Case 2
    var x = [];
    for (var i=0; i<10; i++) {
        x[x.length] = "x";
    }

    // Case 3
    var x = [];
    for (var i=0; i<10; i++) {
        x.push( "x" );
    }

Some observations to ponder about this test scenario:

  • It’s extremely common for devs to put their own loops into test cases, forgetting that Benchmark.js already does all the repetition you need. There’s a really strong chance that the for loops in these cases are totally unnecessary noise (a loop-free restatement of these cases is sketched just after this list).
  • Declaring and initializing x is included in each test case, possibly unnecessarily. Recall from earlier that if x = [] were in the setup code, it wouldn’t actually be run before each test iteration, but instead once at the beginning of each cycle. That means x would continue growing quite large, not just to the size 10 implied by the for loops.

    So is the intent to make sure the tests are constrained only to how the JS engine behaves with very small arrays (size 10)? That could be the intent, but if so, you have to consider whether that’s focusing far too much on nuanced internal implementation details.

    On the other hand, does the intent of the test embrace the context that the arrays will actually be growing quite large? Is the JS engines’ behavior with larger arrays relevant and accurate when compared with the intended real world usage?

  • Is the intent to find out how much x.length or x.push(..) add to the performance of the operation to append to the x array? OK, that might be a valid thing to test. But then again, push(..) is a function call, so of course it’s going to be slower than [..] access. Arguably, cases 1 and 2 are fairer than case 3.
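
To make the first observation concrete, here is one sketch of how the cases could be restated without their own loops, letting Benchmark.js supply the repetition. With no loop counter, the original case 1 collapses into case 2, and with var x = [] in the per-cycle setup, x still grows across all iterations within a cycle, which is exactly the trade-off described above.

    // Per-cycle setup (see the Setup/Teardown discussion earlier):
    var x = [];

    // Case 1 (covers the original cases 1 and 2)
    x[x.length] = "x";

    // Case 2 (the original case 3)
    x.push( "x" );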

Here’s another example that illustrates a common apples-to-oranges flaw:

    // Case 1
    var x = ["John","Albert","Sue","Frank","Bob"];
    x.sort();

    // Case 2
    var x = ["John","Albert","Sue","Frank","Bob"];
    x.sort( function mySort(a,b){
        if (a < b) return -1;
        if (a > b) return 1;
        return 0;
    } );

Here, the obvious intent is to find out how much slower the custom mySort(..) comparator is than the built-in default comparator. But by specifying the function mySort(..) as an inline function expression, you’ve created an unfair/bogus test. The second case is not only testing a custom user JS function, but it’s also testing the creation of a new function expression for each iteration.

Would it surprise you to find out that if you run a similar test but update it to isolate only for creating an inline function expression versus using a pre-declared function, the inline function expression creation can be from 2% to 20% slower!?

Unless your intent with this test is to consider the inline function expression creation “cost,” a better/fairer test would put mySort(..)‘s declaration in the page setup — don’t put it in the test setup as that’s unnecessary redeclaration for each cycle — and simply reference it by name in the test case: x.sort(mySort).
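
For illustration, a fairer arrangement might look something like this sketch, with mySort(..) declared once in the page setup and each test case only referencing it by name:

    // Page setup (runs once, before any test cycles):
    function mySort(a,b) {
        if (a < b) return -1;
        if (a > b) return 1;
        return 0;
    }

    // Case 1
    var x = ["John","Albert","Sue","Frank","Bob"];
    x.sort();

    // Case 2
    var x = ["John","Albert","Sue","Frank","Bob"];
    x.sort( mySort );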

Building on the previous example, another pitfall is in opaquely avoiding or adding “extra work” to one test case that creates an apples-to-oranges scenario:

    // Case 1
    var x = [12,-14,0,3,18,0,2.9];
    x.sort();

    // Case 2
    var x = [12,-14,0,3,18,0,2.9];
    x.sort( function mySort(a,b){
        return a - b;
    } );

Setting aside the previously mentioned inline function expression pitfall, the second case’s mySort(..) works here because you have provided it numbers, but it would of course have failed with strings. The first case doesn’t throw an error, but it actually behaves differently and has a different outcome! It should be obvious, but: a different outcome between two test cases almost certainly invalidates the entire test!

But beyond the different outcomes, in this case, the built-in sort(..)‘s comparator is actually doing “extra work” that mySort(..) does not, in that the built-in one coerces the compared values to strings and does lexicographic comparison. The first snippet results in [-14, 0, 0, 12, 18, 2.9, 3] while the second snippet results (likely more accurately based on intent) in [-14, 0, 0, 2.9, 3, 12, 18].

So that test is unfair because it’s not actually doing the same task between the cases. Any results you get are bogus.
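
One way to catch this kind of flaw before trusting any timings is a quick outcome check in a plain console, outside the benchmark. JSON.stringify(..) is used here only as a blunt but easy way to compare array contents:

    var input = [12,-14,0,3,18,0,2.9];

    // run both candidate cases on copies of the same input
    var result1 = input.slice().sort();
    var result2 = input.slice().sort( function(a,b){ return a - b; } );

    JSON.stringify( result1 ) === JSON.stringify( result2 );    // false

A false here means the two cases aren’t doing the same task, so any timing comparison between them is meaningless.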

These same pitfalls can even be much more subtle:

    // Case 1
    var x = false;
    var y = x ? 1 : 2;

    // Case 2
    var x;
    var y = x ? 1 : 2;

Here, the intent might be to test the performance impact of the coercion to a Boolean that the ? : operator will do if the x expression is not already a Boolean (see the Types & Grammar title of this book series). So, you’re apparently OK with the fact that there is extra work to do the coercion in the second case.

The subtle problem? You’re setting x‘s value in the first case and not setting it in the other, so you’re actually doing work in the first case that you’re not doing in the second. To eliminate any potential (albeit minor) skew, try:

    // Case 1
    var x = false;
    var y = x ? 1 : 2;

    // Case 2
    var x = undefined;
    var y = x ? 1 : 2;

Now there’s an assignment in both cases, so the thing you want to test — the coercion of x or not — has likely been more accurately isolated and tested.