<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div>On Mon, Aug 04, 2014 at 09:25:12AM -0700, Chris Barker wrote:<br>
<br>
If only that were the case, but it isn't. Here's a cautionary tale for<br>
<div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">It shouldn't be hard to demonstrate the difference between repeated<br>
string concatenation and join, all you need do is defeat sum()'s<br>
<div><br></div><div>a_numpy_array.sum()</div><div><br></div><div>is going to be a lot faster than:</div><div><br></div><div>sum(a_numpy_array)</div><div><br></div><div>and why I'll tell everyone that is working with lots of numbers to use numpy. ndarray.sum know what data type it's deaing with,a nd can do the loop in C. similarly with ''.join() (though not as optimized. </div>
<div><br></div><div>But I'm not sure we're seeing the big O difference here at all -- but rather the extra calls though each element in the list's __add__ method.</div><div><br></div><div>In the case where you already HAVE a big list of strings, then yes, ''.join is the clear winner. </div>
<div><br></div><div>But I think the case we're often talking about, and I've tested with students, is when you are building up a long string on the fly out of little strings. In that case, you need to profile the full "append to list, then call join()", not just the join() call:</div>
<div><br></div><div># continued adding of strings ( O(n^2)? )</div><div><div>In [6]: def add_strings(l):</div><div>Â Â ...: Â Â s = ''</div><div>Â Â ...: Â Â for i in l:</div><div>Â Â ...: Â Â Â Â s+=i</div><div>Â Â ...: Â Â return s</div>
</div><div><br></div><div>Using append and then join ( O(n)? )</div><div><div>In [14]: def join_strings(list_of_strings):</div><div>Â Â ....: Â Â l = []</div><div>Â Â ....: Â Â for i in list_of_strings:</div><div>Â Â ....: Â Â Â Â l.append(i)</div>
<div>Â Â ....: Â Â return ''.join(l)</div></div><div><br></div><div><div>In [23]: timeit add_strings(strings)</div><div>1000000 loops, best of 3: 831 ns per loop</div><div><br></div><div>In [24]: timeit join_strings(strings)</div>
<div>100000 loops, best of 3: 1.87 µs per loop</div></div><div><br></div><div>## hmm -- concatenating is faster for a small list of tiny strings....</div><div><br></div><div><div>In [31]: strings = list('Hello World')* 1000</div>
</div><div><br></div><div>strings *= 1000</div><div><div>In [26]: timeit add_strings(strings)</div>
<br>
<div>100 loops, best of 3: 10.1 ms per loop</div><div>
<div><br></div><div>In [33]: timeit join_strings(strings)</div><div>1 loops, best of 3: 1.05 s per loop</div></div><div><br></div><div>there we go -- slight advantage to joining.....</div><div><br></div><div>So this is why we've said that the common wisdom about string concatenating isn't really a practical issue.</div>
<div><br></div><div>But if you already have the strings all in a list, then yes, join() is a major win over sum()</div><div><br></div><div>In fact, I tried the above with sum() -- and it was really, really slow. So slow I didn't have the patience to wait for it.</div>
<div><br></div><div>Here is a smaller example:</div><div><br></div><div><div>In [22]: strings = list('Hello World')* 10000</div><div><br></div><div>In [23]: timeit add_strings(strings)</div><div>100 loops, best of 3: 9.61 ms per loop</div>
<div><br></div><div>In [24]: timeit sum( strings, Faker() )</div><div>1 loops, best of 3: 246 ms per loop</div></div><div><br></div><div>So why is sum() so darn slow with strings compared to a simple loop with += ?</div>
<div>-Chris</div><div><br></div><div><br></div><div><br></div></div><br clear="all"><div><br></div>-- <br><br>Christopher Barker, Ph.D.<br>Oceanographer<br><br>Emergency Response Division<br>NOAA/NOS/OR&R Â Â Â Â Â Â <a href="tel:%28206%29%20526-6959" value="+12065266959" target="_blank">(206) 526-6959</a>Â Â voice<br>
7600 Sand Point Way NE Â Â <a href="tel:%28206%29%20526-6329" value="+12065266329" target="_blank">(206) 526-6329</a>Â Â fax<br>
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4