I wrote on how to shuffle an array in Java nearly four years ago. Since then, it has come to my attention that the algorithms presented in that original post are wrong and definitely incomplete. Commenter Ken Arnold pointed this out and I was very curious to see some data on the variability of randomization using the original algorithm, his modified version and another powered my the built in Collections
shuffle method in Java.
This all pertains to the randomness of cards and specifically the distribution of cards when shuffled from a fresh deck. Following that, any state after that initial shuffle is harder to predict but it may still be predictable. You can view the original algorithm defined in the original post from Leepoint. However, while this example is clearly defined in the context of cards, it is not strictly deck-card based – if you need to randomize any set, using that original algorithm probably will not yield the results you are looking for. You will see why soon.
The modified version runs along these lines.
public static int[] getOriginalShuffledDeck() { Random rnd = new Random(); int[] cards = new int[52]; for (int i = 0; i < cards.length; i++) { cards[i] = i; } for (int i = 0; i < cards.length; i++) { int position = i + rnd.nextInt(cards.length - i); int temp = cards[i]; cards[i] = cards[position]; cards[position] = temp; } return cards; }
Aside from the minor stylistic differences, the major logical difference is the in the declaration of the position
. The i + rnd.nextInt(cards.length - i)
changes the position properly, so that elements are not repeatedly moved around the deck of cards. I am not entirely sure of the logic here – an example chart could be useful, but I’ll leave that to you.
The other algorithm I tested against was the built-in Collections.shuffle
randomization. It appears that this method is consistent with the modified method above, which is good to know. Because there is a lack of shuffling for a regular non-object array, there is a cumbersome conversion between Integer objects and an ArrayList and then back to regular ints
.
public static int[] getCollectionsShuffledDeck() { int[] cards = new int[52]; ArrayList<Integer> cards_objs = new ArrayList<Integer>(); for (int i=0; i < cards.length; i++) { cards_objs.add(i); } Collections.shuffle(cards_objs); for (int i = 0; i < cards.length; i++) { cards[i] = cards_objs.get(i); } return cards; }
With those three methods, I ran a test with each. I ran one million iterations of randomizing fresh decks of cards. I recorded each position (position 0 to 51) and each card in those positions (card 0 to 51, because we don’t actually care which card, just that they’re all uniquely identifiable) and the number of times they appeared in those million trials. So without further ado, look at the astoundingly poor randomness generated by the original algorithm.
The spikes are where there is an obvious failure. I am definitely not a statistician but I can guarantee this is not a great distribution of what is supposed to be random values, especially when the number of trials is so high. With that said, I think it is quite clear that the original method of randomizing a deck of cards is flawed. Now, that’s not to say it is not random, since there is some randomness there, but just not enough. Now, on to the modified method’s chart.
The distribution here is much better. It just so happens that one million divided by 52 is approximately 19230, which is what most of these position-card indexes reach. Clearly, the modified version is better in this case.
The Collections.shuffle
random was about the same. This is a great benefit of using the built in routine because you know that this way, it will always be random with that great low variance. That’s not to say there aren’t highs and lows in these two data sets, but it is lesser than the original. I want to be clear though: this may seem at first limited to strictly a card-deck problem but this could present itself anywhere that something starts ordered and needs to be non-ordered after some operation. You may need to randomize your genome project, your particle layout or maybe even your favorite file cabinet.
You can grab the testing program and the excel file for the raw data used to make the charts.
Update: You can read an interesting Stack Overflow question on, “Is Collection.shuffle random enough?“