Update:  With the last batch of checkins, all sorts on Kevin's company
database are faster (a little to a killer lot) under 2.3a0 than under 2.2.1.

A reminder of what this looks like:

> A record looks like this after running his script to turn them
> into Python dicts:
>
>   {'Address': '395 Page Mill Road\nPalo Alto, CA 94306',
>    'Company': 'Agilent Technologies Inc.',
>    'Exchange': 'NYSE',
>    'NumberOfEmployees': '41,000',
>    'Phone': '(650) 752-5000',
>    'Profile': 'http://biz.yahoo.com/p/a/a.html',
>    'Symbol': 'A',
>    'Web': 'http://www.agilent.com'}
>
> It appears to me that the XML file is maintained by hand, in order
> of ticker symbol.  But people make mistakes when alphabetizing
> by hand, and there are 37 indices i such that
>
>     data[i]['Symbol'] > data[i+1]['Symbol']
>
> So it's "almost sorted" by that measure ...
> The proper order of Yahoo profile URLs is also strongly correlated
> with ticker symbol, while both the company name and web address
> look weakly correlated
> [and Address, NumberOfEmployees, and Phone are essentially
>  randomly ordered]

Here are the latest (and I expect the last) timings, in milliseconds per
sort, on the list of (key, index, record) tuples

    values = [(x.get(fieldname), i, x) for i, x in enumerate(data)]

[I wrote a little generator to simulate 2.3's enumerate() in 2.2.1]

There are 6635 companies in the database, but not all fields are present
in all records; .get() plugs in a key of None for those cases, and the
index is there to prevent equal-key cases from falling back on breaking
the tie via expensive dict comparison (each record x is a dict!):

Sorting on field 'Address'
    2.2.1: 41.57
    2.3a0: 40.96

Sorting on field 'Company'
    2.2.1: 40.14
    2.3a0: 29.79

Sorting on field 'Exchange'
    2.2.1: 53.83
    2.3a0: 24.79

Sorting on field 'NumberOfEmployees'
    2.2.1: 47.89
    2.3a0: 45.74

Sorting on field 'Phone'
    2.2.1: 48.09
    2.3a0: 47.15

Sorting on field 'Profile'
    2.2.1: 58.41
    2.3a0: 8.77

Sorting on field 'Symbol'
    2.2.1: 40.78
    2.3a0: 6.30

Sorting on field 'Web'
    2.2.1: 46.79
    2.3a0: 35.64

This may have been sorted more times by now than any other database on
Earth <wink>.
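
For anyone who wants to poke at the (key, index, record) decoration and
the "almost sorted" measure quoted above, here is a minimal sketch. It
assumes a modern Python 3 rather than 2.2.1 or 2.3a0, so a None key can't
be ordered against a string; the (is_present, value) wrapper is my
workaround for that, not part of the original timing script. The function
names and the toy records are made up for illustration.

```python
def decorate(data, fieldname):
    """Return (key, index, record) tuples for sorting on fieldname."""
    values = []
    for i, x in enumerate(data):
        v = x.get(fieldname)
        # Assumption: Python 3 cannot order None against str, so wrap the
        # key as (is_present, value); records missing the field sort first.
        key = (v is not None, "" if v is None else v)
        values.append((key, i, x))
    return values


def sort_by_field(data, fieldname):
    """Sort records on fieldname without ever comparing the dicts."""
    values = decorate(data, fieldname)
    # Equal keys are settled by the unique index, so tuple comparison
    # never reaches the record dicts.
    values.sort()
    return [record for _key, _i, record in values]


def count_descents(data, fieldname):
    """Count indices i whose key is greater than the key at i+1."""
    keys = [key for key, _i, _x in decorate(data, fieldname)]
    return sum(1 for a, b in zip(keys, keys[1:]) if a > b)


# Toy records, invented for illustration only.
data = [
    {"Symbol": "B", "Exchange": "NYSE"},
    {"Symbol": "A", "Exchange": "NYSE"},
    {"Exchange": "NASDAQ"},
    {"Symbol": "A", "Exchange": "NASDAQ"},
]

by_symbol = sort_by_field(data, "Symbol")
print([r.get("Symbol") for r in by_symbol])
print(count_descents(data, "Symbol"))
```

Dropping the index from the tuple would make ties on the key fall through
to comparing the records themselves, which is slow in the Pythons timed
above and an outright TypeError for dicts in Python 3.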