Generating counterbalanced orders

Problem

You want to generate counterbalanced sequences for an experiment.

Solution

The function latinsquare() (defined below) can be used to generate Latin squares.

  1. latinsquare(4)
  2. #> [,1] [,2] [,3] [,4]
  3. #> [1,] 1 2 4 3
  4. #> [2,] 2 1 3 4
  5. #> [3,] 3 4 1 2
  6. #> [4,] 4 3 2 1
  7. # To generate 2 Latin squares of size 4 (in sequence)
  8. latinsquare(4, reps=2)
  9. #> [,1] [,2] [,3] [,4]
  10. #> [1,] 3 4 1 2
  11. #> [2,] 4 3 2 1
  12. #> [3,] 1 2 4 3
  13. #> [4,] 2 1 3 4
  14. #> [5,] 4 2 1 3
  15. #> [6,] 2 3 4 1
  16. #> [7,] 1 4 3 2
  17. #> [8,] 3 1 2 4
  18. # It is better to put the random seed in the function call, to make it repeatable
  19. # This will return the same sequence of two Latin squares every time
  20. latinsquare(4, reps=2, seed=5873)
  21. #> [,1] [,2] [,3] [,4]
  22. #> [1,] 1 4 2 3
  23. #> [2,] 4 1 3 2
  24. #> [3,] 2 3 4 1
  25. #> [4,] 3 2 1 4
  26. #> [5,] 3 2 4 1
  27. #> [6,] 1 4 2 3
  28. #> [7,] 4 3 1 2
  29. #> [8,] 2 1 3 4

There are 576 Latin squares of size 4. The latinsquare function will, in effect, randomly select n of these squares and return them in sequence. This is known as a replicated Latin square design.

Once you generate your Latin squares, it is a good idea to inspect them to make sure that there are not many duplicated sequences. This is not uncommon with smaller squares (3x3 or 4x4).

A function for generating Latin squares

This function generates Latin squares. It uses a somewhat brute-force algorithm to generate each square, which can sometimes fail because it runs out of available numbers to put in a given location. In such cases, it just tries again. There may be an elegant way out there to do it, but I am not aware of it.

  1. ## - len is the size of the latin square
  2. ## - reps is the number of repetitions - how many Latin squares to generate
  3. ## - seed is a random seed that can be used to generate repeatable sequences
  4. ## - returnstrings tells it to return a vector of char strings for each square,
  5. ## instead of a big matrix. This option is only really used for checking the
  6. ## randomness of the squares.
  7. latinsquare <- function(len, reps=1, seed=NA, returnstrings=FALSE) {
  8. # Save the old random seed and use the new one, if present
  9. if (!is.na(seed)) {
  10. if (exists(".Random.seed")) { saved.seed <- .Random.seed }
  11. else { saved.seed <- NA }
  12. set.seed(seed)
  13. }
  14. # This matrix will contain all the individual squares
  15. allsq <- matrix(nrow=reps*len, ncol=len)
  16. # Store a string id of each square if requested
  17. if (returnstrings) { squareid <- vector(mode = "character", length = reps) }
  18. # Get a random element from a vector (the built-in sample function annoyingly
  19. # has different behavior if there's only one element in x)
  20. sample1 <- function(x) {
  21. if (length(x)==1) { return(x) }
  22. else { return(sample(x,1)) }
  23. }
  24. # Generate each of n individual squares
  25. for (n in 1:reps) {
  26. # Generate an empty square
  27. sq <- matrix(nrow=len, ncol=len)
  28. # If we fill the square sequentially from top left, some latin squares
  29. # are more probable than others. So we have to do it random order,
  30. # all over the square.
  31. # The rough procedure is:
  32. # - randomly select a cell that is currently NA (call it the target cell)
  33. # - find all the NA cells sharing the same row or column as the target
  34. # - fill the target cell
  35. # - fill the other cells sharing the row/col
  36. # - If it ever is impossible to fill a cell because all the numbers
  37. # are already used, then quit and start over with a new square.
  38. # In short, it picks a random empty cell, fills it, then fills in the
  39. # other empty cells in the "cross" in random order. If we went totally randomly
  40. # (without the cross), the failure rate is much higher.
  41. while (any(is.na(sq))) {
  42. # Pick a random cell which is currently NA
  43. k <- sample1(which(is.na(sq)))
  44. i <- (k-1) %% len +1 # Get the row num
  45. j <- floor((k-1) / len) +1 # Get the col num
  46. # Find the other NA cells in the "cross" centered at i,j
  47. sqrow <- sq[i,]
  48. sqcol <- sq[,j]
  49. # A matrix of coordinates of all the NA cells in the cross
  50. openCell <-rbind( cbind(which(is.na(sqcol)), j),
  51. cbind(i, which(is.na(sqrow))))
  52. # Randomize fill order
  53. openCell <- openCell[sample(nrow(openCell)),]
  54. # Put center cell at top of list, so that it gets filled first
  55. openCell <- rbind(c(i,j), openCell)
  56. # There will now be three entries for the center cell, so remove duplicated entries
  57. # Need to make sure it's a matrix -- otherwise, if there's just
  58. # one row, it turns into a vector, which causes problems
  59. openCell <- matrix(openCell[!duplicated(openCell),], ncol=2)
  60. # Fill in the center of the cross, then the other open spaces in the cross
  61. for (c in 1:nrow(openCell)) {
  62. # The current cell to fill
  63. ci <- openCell[c,1]
  64. cj <- openCell[c,2]
  65. # Get the numbers that are unused in the "cross" centered on i,j
  66. freeNum <- which(!(1:len %in% c(sq[ci,], sq[,cj])))
  67. # Fill in this location on the square
  68. if (length(freeNum)>0) { sq[ci,cj] <- sample1(freeNum) }
  69. else {
  70. # Failed attempt - no available numbers
  71. # Re-generate empty square
  72. sq <- matrix(nrow=len, ncol=len)
  73. # Break out of loop
  74. break;
  75. }
  76. }
  77. }
  78. # Store the individual square into the matrix containing all squares
  79. allsqrows <- ((n-1)*len) + 1:len
  80. allsq[allsqrows,] <- sq
  81. # Store a string representation of the square if requested. Each unique
  82. # square has a unique string.
  83. if (returnstrings) { squareid[n] <- paste(sq, collapse="") }
  84. }
  85. # Restore the old random seed, if present
  86. if (!is.na(seed) && !is.na(saved.seed)) { .Random.seed <- saved.seed }
  87. if (returnstrings) { return(squareid) }
  88. else { return(allsq) }
  89. }

Testing the function for randomness

Some algorithms for generating Latin squares may not create them very randomly. There are 576 unique 4x4 squares, and each one should have an equal probability of being created, but some algorithms do not properly do this. There is probably no need to test the randomness of the function above, but here is some code that will do it. Previous algorithms that I used did not have a good random distribution, which was discovered by running the code below.

This code creates 10,000 4x4 Latin squares, then counts how often each of the 576 unique squares appears. The counts should form a not-too-wide normal distribution; otherwise the distribution not very random. I believe the expected standard deviation of the distribution (assuming randomly generated squares) is sqrt(10000/576).

  1. # Set up the size and number of squares to generate
  2. squaresize <- 4
  3. numsquares <- 10000
  4. # Get number of unique squares of a given size.
  5. # There is not a general solution to finding the number of unique nxn squares
  6. # so we just hard-code the values here. (From http://oeis.org/A002860)
  7. uniquesquares <- c(1, 2, 12, 576, 161280, 812851200)[squaresize]
  8. # Generate the squares
  9. s <- latinsquare(squaresize, numsquares, seed=122, returnstrings=TRUE)
  10. # Get the list of all squares and counts for each
  11. slist <- rle(sort(s))
  12. scounts <- slist[[1]]
  13. hist(scounts, breaks=(min(scounts):(max(scounts)+1)-.5))
  14. cat(sprintf("Expected and actual standard deviation: %.4f, %.4f\n",
  15. sqrt(numsquares/uniquesquares), sd(scounts) ))
  16. #> Expected and actual standard deviation: 4.1667, 4.0883

plot of chunk unnamed-chunk-3