Showing content from https://github.com/GlobalNamesArchitecture/gnparser/issues/322 below:

Detect Bacteria and figure out if we can distinguish strains from authorships · Issue #322 · GlobalNamesArchitecture/gnparser · GitHub

egrep " [A-Z][a-z ]+.*[A-Z].* (19|20)[0-9][0-9] " ~/a/ot/repo/reference-taxonomy/tax/ncbi/taxonomy.tsv | fgrep -v "." >tmp.tmp

(should also work with the names.txt file that ships with NCBI)
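For anyone who wants to reproduce the filter without the shell pipeline, here is a minimal Python sketch of the same two steps: keep lines that match the egrep pattern, then drop any line that contains a period. The function names and the sample lines are hypothetical placeholders and are not taken from the NCBI dump.

```python
import re

# Same pattern as the egrep call: a space, a capitalized word, a later
# capital letter, then a year from 1900 to 2099 with a space on each side.
CANDIDATE = re.compile(r" [A-Z][a-z ]+.*[A-Z].* (19|20)[0-9][0-9] ")


def is_candidate(line: str) -> bool:
    """Mimic `egrep PATTERN | fgrep -v "."` for a single line."""
    return CANDIDATE.search(line) is not None and "." not in line


def candidates(lines):
    """Return the lines that survive both filters, in their original order."""
    return [line for line in lines if is_candidate(line)]


if __name__ == "__main__":
    # Hypothetical sample lines, for illustration only.
    sample = [
        "1 | Genus species Strain 2001 | species",
        "2 | Genus sp. Strain 2001 | species",
        "3 | Genus species | species",
    ]
    for line in candidates(sample):
        print(line)
```

To run it against a real file, pass an open file handle to `candidates`; it accepts any iterable of lines.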

This yields 72 results, many of which are parsed incorrectly. Unfortunately, any heuristic rules you write to handle these are going to be baroque, and increasingly so as you try to bring the false positive and false negative rates down. So I really don't expect this issue to be addressed, but I thought you should know.

Examples:

But I'm also impressed by how many gnparser gets right, e.g.

