Lazy evaluation is no friend of mutable state
Posted by Jonas Elfström Sat, 01 Jan 2011 16:08:00 GMT
A couple of days ago I accidentally landed on a page about implementing merge sort in C#. I thought it would be a nice exercise to try to implement that as a generic method and so I did.
I also wanted to learn more about the characteristics of IEnumerable so I used IEnumerable<T>
instead of List<T>
. That choice got me in trouble and I opted for help on Stack Overflow.
People said it was sorting correctly but Jon Skeet also asked if I tested it correctly and that I did not. I digged deeper into the problem and extended the question on SO. I had a hunch that it was the mutable state of List<T>
and the lazy evaluation of IEnumerable
that was the problem but I couldn't quite figure out how.
Along came Kobi and his answer finally made me understand why a .Sort()
of the list messed up the result of my sorting.
I then changed the implementation to be fully lazy evaluated and now it looks like this.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 |
public class MergeSort<T> { public IEnumerable<T> Sort(IEnumerable<T> arr) { if (arr.Count() <= 1) return arr; int middle = arr.Count() / 2; var left = arr.Take(middle); var right = arr.Skip(middle); return Merge(Sort(left), Sort(right)); } private static IEnumerable<T> Merge(IEnumerable<T> left, IEnumerable<T> right) { IEnumerable<T> arrSorted = Enumerable.Empty<T>(); while (left.Count() > 0 && right.Count() > 0) { if (Comparer<T>.Default.Compare(left.First(), right.First()) < 0) { arrSorted=arrSorted.Concat(left.Take(1)); left = left.Skip(1); } else { arrSorted=arrSorted.Concat(right.Take(1)); right = right.Skip(1); } } return arrSorted.Concat(left).Concat(right); } } |
Please be aware that this is but an exercise and not very efficient.
Now to the problems that you can encounter with the above and an explanation to the title of the post. Let's MergeSort a simple List<int>
followed by a call to the built-in .Sort()
on List<T>
.
1 2 3 4 5 6 |
var ints = new List<int> { 2, 3, 1 }; var mergeSortInt = new MergeSort<int>(); var sortedInts = mergeSortInt.Sort(ints); // sortedInts.ToList() is {1, 2, 3} ints.Sort(); // sortedInts.ToList() is {3, 1, 2} |
So what's going on here? As far as I can tell it's something like this. sortedInts
isn't evaluated until the first call to MoveNext()
(or Tolist()
or any of those). Before that it only has lazy pointers to the original enumerable values. ints.Sort()
messes up the beauty of lazy evaluation by changing the underlying data structure. It can do that because List<T>
is mutable (writeable).
But why {3, 1, 2} after ints.Sort()
? The original sequence was { 2, 3, 1} and that is what the MergeSort sorted, not by creating a new sequence but only by lazy pointers.
At first the MergeSort sorts like this
but after the source list has been sorted/changed it will do this
What can you do to stop this to happening to you? The boring way is to never use lazy evaluation but that is kind of hard if you happen to use LINQ (and you should, it's great). Another way is to use immutable data structures and that is how functional languages tackles this problem. In C# we have the ReadOnlyCollectionSort()
method.
A nice feature of the MergeSort above is that it has no problems with an immutable collection.
1 2 3 4 5 |
var rints = new ReadOnlyCollection<int>(ints); var sortedRints = mergeSortInt.Sort(rints); // sortedRints.ToList() is {1, 2, 3} ints.Sort(); // sortedRints.ToList() is {3, 1, 2} |
What?! How could an immutable collection get messed up like this? It turns out that ReadOnlyCollection<T>
is only a facade for mutable collections and that is what bit us here. You have to pass a copy of the list. Example follows.
1 2 3 |
var rints = new ReadOnlyCollection<int>(new List<int>(ints)); var sortedRints = mergeSortInt.Sort(rints); ints.Sort(); |
That also works for List<T>
.
1 2 3 4 5 6 |
var ints = new List<int> { 2, 3, 1 }; var mergeSortInt = new MergeSort<int>(); var sortedInts = mergeSortInt.Sort(new List<int>(ints)); // sortedInts.ToList() is {1, 2, 3} ints.Sort(); // sortedInts.ToList() is {1, 2, 3} |
Finally I would like to send a big thank you to Kobi for sorting out my problems with the original solution.
Today the .NET team released a preview of Immutable Collections. http://blogs.msdn.com/b/bclteam/archive/2012/12/18/preview-of-immutable-collections-released-on-nuget.aspx