parse_number_fxx Test Data
This repository contains test data for parse_number_fxx implementations (for fxx being f16, f32 or f64), also known as StringToDouble, strtod, atof, etc. These convert from an ASCII string to a 16-, 32- or 64-bit value (IEEE 754 half-, single- or double-precision floating point).
Most of the data/*.txt files were derived by running script/extract-numbery-strings.go on various repositories or zip files, listed further below. Their contents look like:
3C00 3F800000 3FF0000000000000 1
3D00 3FA00000 3FF4000000000000 1.25
3D9A 3FB33333 3FF6666666666666 1.4
57B7 42F6E979 405EDD2F1A9FBE77 123.456
622A 44454000 4088A80000000000 789
7C00 7F800000 7FF0000000000000 123.456e789
For example, parsing "1.4" as a float32 gives the bits 0x3FB33333.
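As a rough illustration of what one column means, the following Python sketch reproduces the float32 bits for a few of the sample strings. The helper name `f32_hex` is hypothetical. It parses to a float64 first and then rounds to float32, which is an assumption of convenience: this double rounding can differ from a correctly rounded direct float32 parse in rare cases, and `struct.pack` raises `OverflowError` for finite values too large for float32, so treat it as a sketch rather than a reference parser.

```python
import struct


def f32_hex(text: str) -> str:
    """Return the IEEE 754 single-precision bits of text as 8 hex digits.

    Sketch only: goes via Python's float (a float64), then rounds to float32.
    """
    value = float(text)
    # ">f" packs a big-endian 32-bit float, so the hex digits read
    # most-significant first, matching the layout of the data files.
    return struct.pack(">f", value).hex().upper()


print(f32_hex("1.4"))
```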
In the sample data above, the final line's float16, float32 and float64 values are all infinity. The largest finite float{16,32,64} values are approximately 6.55e+4, 3.40e+38 and 1.80e+308.
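The quoted limits can be checked by decoding the largest finite bit pattern of each format. This is a minimal sketch: the bit patterns 7BFF, 7F7FFFFF and 7FEFFFFFFFFFFFFF come from the IEEE 754 layouts rather than from this repository, and the helper name `from_hex` is hypothetical.

```python
import struct


def from_hex(fmt: str, hex_digits: str) -> float:
    """Decode big-endian hex digits using a struct format (">e", ">f" or ">d")."""
    return struct.unpack(fmt, bytes.fromhex(hex_digits))[0]


# Largest finite values: maximum non-infinity exponent, all mantissa bits set.
MAX_F16 = from_hex(">e", "7BFF")
MAX_F32 = from_hex(">f", "7F7FFFFF")
MAX_F64 = from_hex(">d", "7FEFFFFFFFFFFFFF")

# One step higher in each format is the infinity pattern seen in the sample.
INF_F16 = from_hex(">e", "7C00")

print(MAX_F16, MAX_F32, MAX_F64)
```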
For each line of these data/*.txt files, the f16, f32 and f64 hexadecimal digits and the ASCII string occupy fixed subslices:
- [0..4], [5..13], [14..30] and [31..] with 0-based indexes (end exclusive).
- [1..5], [6..14], [15..31] and [32..] with 1-based indexes (end still exclusive).

The first half (the high 16 bits) of the f32 hexadecimal digits is also known as the bfloat16 format.
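The fixed offsets map directly onto string slices in any language with 0-based, end-exclusive slicing. A minimal Python sketch follows; the names `Row`, `split_row` and `bfloat16_hex` are hypothetical and not part of this repository.

```python
from typing import NamedTuple


class Row(NamedTuple):
    f16_hex: str
    f32_hex: str
    f64_hex: str
    text: str


def split_row(line: str) -> Row:
    """Split one data line into its three hex columns and the ASCII string."""
    line = line.rstrip("\r\n")
    # 0-based, end-exclusive slices: [0..4], [5..13], [14..30] and [31..].
    return Row(line[0:4], line[5:13], line[14:30], line[31:])


def bfloat16_hex(row: Row) -> str:
    """Return the high 16 bits of the f32 column (its first four hex digits)."""
    return row.f32_hex[:4]


row = split_row("3D9A 3FB33333 3FF6666666666666 1.4\n")
print(row, bfloat16_hex(row))
```

Slicing by offset rather than splitting on whitespace keeps the ASCII string intact whatever characters it contains.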
In the data directory:
- exhaustive-float16.txt is an exhaustive list of float16 values.
- freetype-2-7.txt was extracted from FreeType 2.7.
- google-double-conversion.txt was extracted from google/double-conversion.
- google-wuffs.txt was extracted from google/wuffs.
- ibm-fpgen.txt was extracted from IBM's IEEE 754R test suite.
- lemire-fast-double-parser.txt was extracted from lemire/fast_double_parser.
- lemire-fast-float.txt was extracted from lemire/fast_float.
- more-test-cases.txt was extracted from this repository's manually curated collection of more test cases.
- tencent-rapidjson.txt was extracted from Tencent/rapidjson.
- ulfjack-ryu.txt was extracted from ulfjack/ryu.

The data/remyoudompheng-fptest-?.txt files were created in three steps: running go test -test.run=TestTortureAtof64 in the remyoudompheng/fptest repository (with the following patch), running the resultant TestTortureAtof64.txt file through script/extract-numbery-strings.go, and then using sed to split what would be a 189 MiB file into multiple (million line) files:
diff --git a/torture_test.go b/torture_test.go
index 87ba7e7..59887ff 100644
--- a/torture_test.go
+++ b/torture_test.go
@@ -1,8 +1,11 @@
 package fptest
 
 import (
+	"bufio"
 	"bytes"
+	"fmt"
 	"math"
+	"os"
 	"strconv"
 	"testing"
 
@@ -124,6 +127,11 @@ func TestTortureShortest32(t *testing.T) {
 }
 
 func TestTortureAtof64(t *testing.T) {
+	tmpFile, _ := os.Create("/tmp/TestTortureAtof64.txt")
+	defer tmpFile.Close()
+	tmpWriter := bufio.NewWriter(tmpFile)
+	defer tmpWriter.Flush()
+
 	count := 0
 	buf := make([]byte, 64)
 	roundUp := false
@@ -140,6 +148,7 @@ func TestTortureAtof64(t *testing.T) {
 			t.Errorf("could not parse %q: %s", s, err)
 			return
 		}
+		fmt.Fprintf(tmpWriter, "%s\n", s)
 		expect := x
 		if roundUp {
 			expect = y
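The final splitting step was done with sed; the exact command is not recorded here. As a hedged illustration of the same idea, this Python sketch groups lines into million-line chunks. The name `chunk_lines` is hypothetical, and writing each chunk to a numbered file is left to the caller.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def chunk_lines(lines: Iterable[str], chunk_size: int = 1_000_000) -> Iterator[List[str]]:
    """Yield consecutive groups of at most chunk_size lines."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


# Small demonstration with a chunk size of 2 instead of one million.
for index, chunk in enumerate(chunk_lines(["a", "b", "c", "d", "e"], 2)):
    print(index, chunk)
```

With the line counts reported further below, 3,885,708 lines split this way give three full files and a fourth with 885,708 lines.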
Programs that use this test data set:
- script/manual-test-parse-number-f64.cc in google/wuffs
- testsuite/json/test_json_decimal_to_number.adb in AdaCore/VSS

As of November 2021, data/*.txt contains over 5 million test cases. Parsing them all should take tens of seconds at most. For example, on a mid-range x86_64 laptop (2016; Skylake):
$ grep model.name /proc/cpuinfo | uniq
model name : Intel(R) Core(TM) m3-6Y30 CPU @ 0.90GHz
$ git clone --depth 1 --quiet https://github.com/google/wuffs.git
$ gcc -O3 wuffs/script/manual-test-parse-number-f64.cc
$ time ./a.out data/*.txt
31745 OK in data/exhaustive-float16.txt
3566 OK in data/freetype-2-7.txt
564745 OK in data/google-double-conversion.txt
10744 OK in data/google-wuffs.txt
102792 OK in data/ibm-fpgen.txt
94313 OK in data/lemire-fast-double-parser.txt
3299 OK in data/lemire-fast-float.txt
60 OK in data/more-test-cases.txt
1000000 OK in data/remyoudompheng-fptest-0.txt
1000000 OK in data/remyoudompheng-fptest-1.txt
1000000 OK in data/remyoudompheng-fptest-2.txt
885708 OK in data/remyoudompheng-fptest-3.txt
3563 OK in data/tencent-rapidjson.txt
599458 OK in data/ulfjack-ryu.txt
real 0m6.790s
user 0m6.707s
sys 0m0.082s
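A checker in the same spirit can be sketched in a few lines of Python, here comparing Python's built-in `float` against the f64 column. The names `f64_hex` and `check_f64` are hypothetical. This is a sketch under assumptions: `float` accepts a slightly different input grammar than other parsers (for example, underscores between digits), and the sketch has only been reasoned through on the sample lines shown earlier, not run over the full data set.

```python
import struct
from typing import Iterable, Tuple


def f64_hex(value: float) -> str:
    """Return the IEEE 754 double-precision bits of value as 16 hex digits."""
    return struct.pack(">d", value).hex().upper()


def check_f64(lines: Iterable[str]) -> Tuple[int, int]:
    """Count lines whose ASCII string parses to the expected f64 bits.

    Returns (ok, bad). Blank lines are skipped.
    """
    ok = bad = 0
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        want, text = line[14:30], line[31:]
        if f64_hex(float(text)) == want:
            ok += 1
        else:
            bad += 1
    return ok, bad


sample = [
    "3C00 3F800000 3FF0000000000000 1",
    "3D9A 3FB33333 3FF6666666666666 1.4",
    "7C00 7F800000 7FF0000000000000 123.456e789",
]
print(check_f64(sample))
```

To run it over a real file, pass an open file object, for example `check_f64(open("data/more-test-cases.txt"))`.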