David Dobor
The Good, the Bad, and the OK Shuffles
Introduction
A few years ago, the Software Security Group at Reliable Software Technologies discovered a major security flaw in the implementation of card games distributed by ASF Software Inc., a company that no longer exists. At least three major online poker firms that allow their clients to compete for real money, including planetpoker.com, had used ASF's poker implementation and found themselves vulnerable to a variety of exploits. The problem was in the shuffling of a deck of cards, and in the random number generator used to shuffle it. That generator wasn't random at all. It was perfectly predictable.
The piece of code that performed the shuffle included a call to a random number generator that chose a seed based on the number of milliseconds since midnight. There are only about 86 and a half million milliseconds in a day, a number alarmingly small compared to the $52!$ possible shuffles of a deck of cards. With only about 86 million shuffles of the deck to consider, it became possible for an attacker to determine exactly, in real time, during play, the unexposed cards in the hands of the opponents. An interesting account of the consequences of this can be found in this article .
But today's blog isn't about software security. It is about a simple shuffling algorithm that long predates computers. Sometimes called Knuth's shuffle, this algorithm is very efficient (its running time is linear in the size of the deck to be shuffled). It is also easy to implement, as we'll see in a minute, but just as easy to implement incorrectly without suspecting anything awry. In fact, it was an incorrect implementation of a simple shuffling algorithm that led to ASF's demise.
A Shuffle
Programmers usually first see shuffling algorithms in the context of sorting. It may sound counter intuitive, but sorting is actually one of the better ways of shuffling.
A shuffle is a uniform random permutation of array elements. Consider a computer program that takes as input an array of size $n$ and outputs one of the $n!$ permutations of the array elements. If each output has an equal probability of $1/n!$ of being output, then we'll call the outputs fair shuffles , or uniform random shuffles, and the program a fair shuffler.
Most computer languages allow for a quick and easy way of calling a (pseudo) random number generator that returns draws from the uniform distribution over [0, 1] ($U(0,1)$). So here's one way to shuffle an array of size $n$:
- for every array entry, generate a random number from $U(0,1)$
- sort the array using those random numbers as keys.
This is an effective shuffle. It is easy to show that it produces a uniform random permutation of the input (assuming no duplicate values).
But this shuffle requires a sort, and a sort seems like a lot of work for this problem. Sorting takes $O(nlog(n))$ time. Do we really have to pay that much in time in order to shuffle an array of $n$ elements?
We can do better than this! Here's how to do it in linear time.
A Better Shuffle and a Bad Shuffle
Here is a simple way of rearranging an array a so that the result is a uniform random permutation of its elements.
Let i be the loop variable that runs through the deck. In one pass through the deck, do:
- in iteration i, pick integer r between 0 and i uniformly at random
- swap a[i] and a[r]
Thus we are always swapping the $i^{th}$ card with a randomly selected card to its left, or itself. (Note: swapping the $i^{th}$ card with a randomly selected unseen card to its right would also work.)
It was shown well before the advent of computers [Fisher-Yates 1938] that this algorithm produces a uniformly random shuffle in linear time.
For comparison consider the following wrong implementation of shuffling by ASF. This algorithm is not a fair shuffler. However, ironically, it was publicly advertised by ASF as the fair way to shuffle a deck of cards.
Again, let i be the loop variable that runs through the deck. In one pass through the deck, do:
- in iteration i, pick integer r between 0 and N-1 uniformly at random
- swap a[i] and a[r]
To see that this second shuffle does not produce a uniform shuffle, consider shuffling an array of just 3 elements. Build a decision tree of depth $3$ by tracing the steps of the algorithm. Each depth level in this tree corresponds to the value of the loop variable $i$ - the $i^{th}$ card currently being considered. At each node at depth $i$ the decision is: which of the $3$ randomly selected cards should card $i$ be swapped with? Thus each node, except for the leaves, has $3$ branches leaving it. Leaves correspond to permutations of the 3 elements, the shuffles. Notice that not all permutations occur with the same frequency and therefore the algorithm does not produce a fair shuffle.
Anyway, here's a picture of the incorrect shuffle of a 3-element array. Notice that deck $231$ occurs more often than deck $123$. These uneven probabilities become even more exaggerated as the size of the array increases.
Java Implementation
To see all this in action, I've created a java class called FairShuffler. It has a static method shuffle() that implements the correct shuffling algorithm discussed above. To run it, we'll need a separate text file that represents the deck of cards. You will find a link to one such file in the comments of the code that follows. Instructions on how to compile and run the code are also in the comments:
/************************************************************************* * Compilation: javac FairShuffler.java * Execution: java FairShuffler * Data files: https://raw2.github.com/david-dobor/algs/master/algs4/shuffle/cards.txt * * Reads in a list of strings and prints them in random order. * The Knuth (or Fisher-Yates) shuffling algorithm guarantees * to rearrange the elements in uniformly random order, under * the assumption that Math.random() generates independent and * uniformly distributed numbers between 0 and 1. * * the cards.txt file is assumed to be stored locally * % more cards.txt * 2C 3C 4C 5C 6C 7C 8C 9C 10C JC QC KC AC * 2D 3D 4D 5D 6D 7D 8D 9D 10D JD QD KD AD * 2H 3H 4H 5H 6H 7H 8H 9H 10H JH QH KH AH * 2S 3S 4S 5S 6S 7S 8S 9S 10S JS QS KS AS * * % java FairShuffler * 8H AS JC 5S * 6C 8S 6D 2S * 2H 6H 10H 9D * 7C KC 3D 5C * ... * QH 10C 3S 3H * *************************************************************************/ import java.io.File; import java.io.IOException; import java.util.Scanner; public class FairShuffler { public static void shuffle(Object[] a) { int N = a.length; for (int i = 0; i < N; i++) { // choose index uniformly in [i, N-1] int r = i + (int) (Math.random() * (N - i)); Object swap = a[r]; a[r] = a[i]; a[i] = swap; } } public static void main(String[] args) { // read in the data String[] deck = read("cards.txt"); FairShuffler.shuffle(deck); // print permutation for (int i = 0; i < deck.length; i++) System.out.printf("%-4s%s", deck[i], ((i + 1) % 4 == 0) ? "\n" : ""); } //read in a local text file containing the cards //return the cards as a String[] array public static String[] read(String fileName) { Scanner scanner = null; try { // read local file File file = new File(fileName); if (file.exists()) { scanner = new Scanner(file); } } catch (IOException ioe) { System.err.println("Could not open " + fileName); } if (!scanner.hasNextLine()) { return null; } // reference: http://weblogs.java.net/blog/pat/archive/2004/10/stupid_scanner_1.html //return scanner.useDelimiter("\\A").next(); String[] cards = scanner.useDelimiter("\\A").next().trim().split("\\s+"); return cards; } }
Empirical Tests
Exercise: run an empirical test to check that this algorithm works as described (I mentioned that the math proof was easy but never gave it). Write a program called ShuffleTest.java that takes command-line arguments M and N, does N shuffles of an array of size M that is initialized with a[i] = i before each shuffle, and prints an M-by-M table such that each row i gives the number of times i wound up in position j for all j. All entries in this array should be close to M/N.
Solution: here's my take on it:
public class ShuffleTest { public static void main(String[] args) { int M = 7; //number of elements in array (deck size) int N = 700000; //number of times to shuffle the deck //for all shuffles,count the number of times element i ends up in position j int[][] count = new int[M][M]; for (int n = 0; n < N; n++) { Integer[] deck = makeDeck(M); FairShuffler.shuffle(deck); for (int i = 0; i < M; i++) { for (int j = 0; j < M; j++) { if (deck[i] == j) count[i][j]++; } } } //print the counts for (int i = 0; i < M; i++) { for (int j = 0; j < M; j++) { System.out.printf("%-7d ", count[i][j]); } System.out.println(); } } //initialize and return a deck of size M public static Integer[] makeDeck(int M) { Integer[] deck = new Integer[M]; for (int i = 0; i < M; i++) { deck[i] = i; } return deck; } }
Final Word
Finally, here's the recommended approach for shuffling a deck of cards if you are in the business of online poker or something like that:
- get a cryptographically secure pseudo-random number generator,
- assign a random 64-bit number to each card,
- sort the cards according to their number.
That's it for today. And you thought shuffling a deck of cards was easy.