<p>Alice, Bob, and Mallory: Lazy evaluation is no friend of mutable state</p>
<p>Jonas Elfström, published 2011-01-01, updated 2011-01-03</p>
<p>A couple of days ago I accidentally landed on <a href="http://www.cs.nyu.edu/~vs667/articles/mergesort/">a page</a> about implementing <a href="http://en.wikipedia.org/wiki/Merge_sort">merge sort</a> in C#. I thought it would be a nice exercise to try to <a href="http://stackoverflow.com/questions/4545090/listt-and-ienumerable-difference">implement that</a> as a generic method and so I did. <a href="http://www.sorting-algorithms.com/merge-sort"><img title="merge sort" src="http://www.sorting-algorithms.com/animation/40/random-initial-order/merge-sort.gif" style="display: inline-block; float:right" border="0"></a> </p>
<p>I also wanted to learn more about the characteristics of <a href="http://msdn.microsoft.com/en-us/library/9eekhta0.aspx">IEnumerable</a> so I used <code>IEnumerable<T></code> instead of <code>List<T></code>. That choice got me in trouble and I asked for help on <a href="http://www.stackoverflow.com/">Stack Overflow</a>.</p>
<p>People said it was sorting correctly but <a href="http://stackoverflow.com/questions/4545090/listt-and-ienumerable-difference/4545170#4545170">Jon Skeet also asked if I had tested it correctly</a>, and I had not. I dug deeper into the problem and extended the question on SO. I had a hunch that the combination of the mutable state of <code>List<T></code> and the lazy evaluation of <code>IEnumerable</code> was the problem but I couldn't quite figure out how. </p>
<p>Along came <a href="http://stackoverflow.com/users/7586/kobi">Kobi</a> and <a href="http://stackoverflow.com/questions/4545090/listt-and-ienumerable-difference/4565811#4565811">his answer</a> finally made me understand why a <code>.Sort()</code> of the list messed up the result of my sorting. </p>
<p>I then changed the implementation to be fully lazily evaluated and now it looks like this.</p>
<table class="CodeRay"><tr>
<td class="code"><pre>public class MergeSort<T>
{
    public IEnumerable<T> Sort(IEnumerable<T> arr)
    {
        <span style="color:#080;font-weight:bold">if</span> (arr.Count() <= <span style="color:#00D">1</span>) <span style="color:#080;font-weight:bold">return</span> arr;
        <span style="color:#0a5;font-weight:bold">int</span> middle = arr.Count() / <span style="color:#00D">2</span>;
        var left = arr.Take(middle);
        var right = arr.Skip(middle);
        <span style="color:#080;font-weight:bold">return</span> Merge(Sort(left), Sort(right));
    }

    private <span style="color:#088;font-weight:bold">static</span> IEnumerable<T> Merge(IEnumerable<T> left, IEnumerable<T> right)
    {
        IEnumerable<T> arrSorted = Enumerable.Empty<T>();
        <span style="color:#080;font-weight:bold">while</span> (left.Count() > <span style="color:#00D">0</span> && right.Count() > <span style="color:#00D">0</span>)
        {
            <span style="color:#080;font-weight:bold">if</span> (Comparer<T>.Default.Compare(left.First(), right.First()) < <span style="color:#00D">0</span>)
            {
                arrSorted = arrSorted.Concat(left.Take(<span style="color:#00D">1</span>));
                left = left.Skip(<span style="color:#00D">1</span>);
            }
            <span style="color:#080;font-weight:bold">else</span>
            {
                arrSorted = arrSorted.Concat(right.Take(<span style="color:#00D">1</span>));
                right = right.Skip(<span style="color:#00D">1</span>);
            }
        }
        <span style="color:#080;font-weight:bold">return</span> arrSorted.Concat(left).Concat(right);
    }
}</pre></td>
</tr></table>
<p><br>
Please be aware that this is but an exercise and not very efficient.</p>
<p>Now to the problems that you can encounter with the above and an explanation of the title of the post. Let's MergeSort a simple <code>List<int></code> and then call the built-in <code>.Sort()</code> on that same <code>List<T></code>.</p>
<table class="CodeRay"><tr>
<td class="code"><pre>var ints = new List<<span style="color:#0a5;font-weight:bold">int</span>> { <span style="color:#00D">2</span>, <span style="color:#00D">3</span>, <span style="color:#00D">1</span> };
var mergeSortInt = new MergeSort<<span style="color:#0a5;font-weight:bold">int</span>>();
var sortedInts = mergeSortInt.Sort(ints);
<span style="color:#777">// sortedInts.ToList() is {1, 2, 3}</span>
ints.Sort();
<span style="color:#777">// sortedInts.ToList() is {3, 1, 2}</span></pre></td>
</tr></table>
<p><br>
So what's going on here? As far as I can tell it's something like this. The elements of <code>sortedInts</code> aren't fetched until the first call to <code>MoveNext()</code> (which <code>ToList()</code>, <code>foreach</code> and the like make for you), and they are fetched again every time the sequence is enumerated. Before that it only has lazy pointers to the original enumerable values. <code>ints.Sort()</code> messes up the beauty of lazy evaluation by changing the underlying data structure. It can do that because <code>List<T></code> is mutable (writeable).</p>
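<p>The same effect can be seen without any sorting at all. The following is only a minimal sketch, and <code>DeferredDemo</code> is a made-up name for illustration: a <code>Skip</code>/<code>Take</code> query over a <code>List<int></code> is enumerated once before and once after the list is sorted.</p>

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // Enumerates the same query before and after the source list changes.
    public static (List<int> Before, List<int> After) Run()
    {
        var ints = new List<int> { 2, 3, 1 };

        // Nothing is read from the list here; Skip/Take only describe
        // which element to fetch once somebody enumerates the query.
        IEnumerable<int> second = ints.Skip(1).Take(1);

        var before = second.ToList(); // enumerates now: second element is 3
        ints.Sort();                  // ints is now {1, 2, 3}
        var after = second.ToList();  // enumerates again: second element is 2

        return (before, after);
    }

    public static void Main()
    {
        var (before, after) = Run();
        Console.WriteLine(string.Join(", ", before)); // 3
        Console.WriteLine(string.Join(", ", after));  // 2
    }
}
```

<p>The query object never changed, only the list it points at did.</p>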
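<p>But why {3, 1, 2} after <code>ints.Sort()</code>? The original sequence was {2, 3, 1} and that is what the MergeSort sorted: the comparisons in <code>Merge</code> ran against those values, but the result was built not by creating a new sequence, only by chaining lazy pointers into the list. </p>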
<p>At first the MergeSort sorts like this </p>
<p><img title="diagram before .Sort()" src="http://dl.dropbox.com/u/26840/beforeSort.png" border="0"/></p>
<p>but after the source list has been sorted/changed it will do this
<img title="diagram after .Sort()" src="http://dl.dropbox.com/u/26840/afterSort.png" border="0"/></p>
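<p>One way to picture those lazy pointers, as far as I can tell from tracing the <code>Take</code>/<code>Skip</code> chain by hand for {2, 3, 1}: the result behaves as if it remembered the positions 2, 0, 1 in the list rather than the values 1, 2, 3. The sketch below mimics that with a hypothetical helper, <code>ByPosition</code>, which is not part of the MergeSort above.</p>

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PositionDemo
{
    // Hypothetical stand-in for the lazy MergeSort result: the order was
    // decided while the list was {2, 3, 1} (smallest value at index 2,
    // then index 0, then index 1), but the values are read on enumeration.
    public static IEnumerable<int> ByPosition(IList<int> source)
    {
        foreach (var i in new[] { 2, 0, 1 })
            yield return source[i];
    }

    public static (List<int> Before, List<int> After) Run()
    {
        var ints = new List<int> { 2, 3, 1 };
        var view = ByPosition(ints);

        var before = view.ToList(); // {1, 2, 3}
        ints.Sort();                // ints is now {1, 2, 3}
        var after = view.ToList();  // {3, 1, 2}

        return (before, after);
    }

    public static void Main()
    {
        var (before, after) = Run();
        Console.WriteLine(string.Join(", ", before)); // 1, 2, 3
        Console.WriteLine(string.Join(", ", after));  // 3, 1, 2
    }
}
```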
<p>What can you do to stop this from happening to you? The boring way is to never use lazy evaluation but that is kind of hard if you happen to use <a href="http://en.wikipedia.org/wiki/Language_Integrated_Query">LINQ</a> (and you should, it's great). Another way is to use immutable data structures, and that is how functional languages tackle this problem. In C# we have the <a href="http://msdn.microsoft.com/en-us/library/ms132474.aspx">ReadOnlyCollection<T></a> and it obviously has no <code>Sort()</code> method. </p>
<p>A nice feature of the MergeSort above is that it has no problems with an immutable collection.</p>
<table class="CodeRay"><tr>
<td class="code"><pre>var rints = new ReadOnlyCollection<<span style="color:#0a5;font-weight:bold">int</span>>(ints);
var sortedRints = mergeSortInt.Sort(rints);
<span style="color:#777">// sortedRints.ToList() is {1, 2, 3}</span>
ints.Sort();
<span style="color:#777">// sortedRints.ToList() is {3, 1, 2}</span></pre></td>
</tr></table>
<p><br>
What?! How could an immutable collection get messed up like this? It turns out that <code>ReadOnlyCollection<T></code> is only <a href="http://blogs.msdn.com/b/jaredpar/archive/2008/04/22/api-design-readonlycollection-t.aspx">a facade for mutable collections</a> and that is what bit us here. You have to pass a copy of the list. Example follows.</p>
<table class="CodeRay"><tr>
<td class="code"><pre>var rints = new ReadOnlyCollection<<span style="color:#0a5;font-weight:bold">int</span>>(new List<<span style="color:#0a5;font-weight:bold">int</span>>(ints));
var sortedRints = mergeSortInt.Sort(rints);
ints.Sort();</pre></td>
</tr></table>
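<p>To see the facade behaviour in isolation, here is a small sketch (<code>FacadeDemo</code> is just an illustrative name). One <code>ReadOnlyCollection<int></code> wraps the list itself and another wraps a private copy; only the second one is unaffected when the list is sorted.</p>

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

public static class FacadeDemo
{
    // Returns the first element of each wrapper after the source is sorted.
    public static (int Wrapped, int Copied) FirstAfterSort()
    {
        var ints = new List<int> { 2, 3, 1 };

        // Wraps ints itself: read-only for us, but ints can still change.
        var wrapper = new ReadOnlyCollection<int>(ints);
        // Wraps a copy that nobody else holds a reference to.
        var snapshot = new ReadOnlyCollection<int>(new List<int>(ints));

        ints.Sort(); // ints is now {1, 2, 3}

        return (wrapper[0], snapshot[0]);
    }

    public static void Main()
    {
        var (wrapped, copied) = FirstAfterSort();
        Console.WriteLine(wrapped); // 1, the wrapper follows the sorted list
        Console.WriteLine(copied);  // 2, the copy kept the original order
    }
}
```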
<p><br>
That also works for <code>List<T></code>.</p>
<table class="CodeRay"><tr>
<td class="code"><pre>var ints = new List<<span style="color:#0a5;font-weight:bold">int</span>> { <span style="color:#00D">2</span>, <span style="color:#00D">3</span>, <span style="color:#00D">1</span> };
var mergeSortInt = new MergeSort<<span style="color:#0a5;font-weight:bold">int</span>>();
var sortedInts = mergeSortInt.Sort(new List<<span style="color:#0a5;font-weight:bold">int</span>>(ints));
<span style="color:#777">// sortedInts.ToList() is {1, 2, 3}</span>
ints.Sort();
<span style="color:#777">// sortedInts.ToList() is {1, 2, 3}</span></pre></td>
</tr></table>
<p><br></p>
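<p>Copying the input is not the only option. Another approach, sketched here under the assumption that you can afford to evaluate the result right away, is to materialize the output with <code>ToList()</code> before anyone gets a chance to touch the source. The MergeSort class is repeated so the example compiles on its own; <code>SnapshotDemo</code> is a made-up name.</p>

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same lazy MergeSort as in the post, repeated to keep the example self-contained.
public class MergeSort<T>
{
    public IEnumerable<T> Sort(IEnumerable<T> arr)
    {
        if (arr.Count() <= 1) return arr;
        int middle = arr.Count() / 2;
        return Merge(Sort(arr.Take(middle)), Sort(arr.Skip(middle)));
    }

    private static IEnumerable<T> Merge(IEnumerable<T> left, IEnumerable<T> right)
    {
        IEnumerable<T> arrSorted = Enumerable.Empty<T>();
        while (left.Count() > 0 && right.Count() > 0)
        {
            if (Comparer<T>.Default.Compare(left.First(), right.First()) < 0)
            {
                arrSorted = arrSorted.Concat(left.Take(1));
                left = left.Skip(1);
            }
            else
            {
                arrSorted = arrSorted.Concat(right.Take(1));
                right = right.Skip(1);
            }
        }
        return arrSorted.Concat(left).Concat(right);
    }
}

public static class SnapshotDemo
{
    public static List<int> Run()
    {
        var ints = new List<int> { 2, 3, 1 };

        // ToList() enumerates the lazy result immediately and stores the
        // values, so later changes to ints can no longer affect it.
        var sorted = new MergeSort<int>().Sort(ints).ToList();

        ints.Sort();
        return sorted;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Run())); // 1, 2, 3
    }
}
```

<p>The price is that the result is no longer lazy, which may or may not matter for your use case.</p>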
<p>Finally I would like to send a big thank you to <a href="https://twitter.com/Kobi">Kobi</a> for sorting out my problems with the original solution.</p>
<p>Comment by Jonas, 2012-12-20:</p>
<p>Today the .NET team released a preview of Immutable Collections.
<a href="http://blogs.msdn.com/b/bclteam/archive/2012/12/18/preview-of-immutable-collections-released-on-nuget.aspx">http://blogs.msdn.com/b/bclteam/archive/2012/12/18/preview-of-immutable-collections-released-on-nuget.aspx</a></p>