## Contents

* [Changes in 1.0](ReleaseNotes.md#changes-in-10)
* [Changes in 0.9.0](ReleaseNotes.md#changes-in-090)
* [Changes in 0.8.1](ReleaseNotes.md#changes-in-081)
* [Changes in 0.8.0](ReleaseNotes.md#changes-in-080)
* [Changes in 0.0.2](ReleaseNotes.md#changes-in-002)
* [Changes in 0.0.1](ReleaseNotes.md#changes-in-001)

## Changes in 1.0
May 6, 2024

Improvements present in 1.0:

Framework:
- Initialize/finalize BLIS via a new `bli_pthread_switch_t` API. (Field Van Zee, Devin Matthews)
- Revamped `bli_init()` to use TLS where feasible. (Field Van Zee, Edward Smyth, Minh Quan Ho)
- Implemented support for fat multithreading.
- Implemented tile-level load balancing (tlb), or tile-level partitioning, in jr/ir loops for `gemm`, `gemmt`, and `trmm` macrokernels. (Field Van Zee, Devin Matthews, Leick Robinson, Minh Quan Ho)
- Added padding to `thrcomm_t` fields to avoid false sharing of cache lines. (Leick Robinson)
- Rewrote/fixed broken tree barrier implementation. (Leick Robinson)
- Refactored some `rntm_t` management code. (Field Van Zee, Devin Matthews)
- Initialize `rntm_t` nt/ways fields with 1 (not -1). (Field Van Zee, Jeff Diamond, Leick Robinson, Devin Matthews)
- Defined `invscalv`, `invscalm`, `invscald` operations.
- Added consistent `NaN`/`Inf` handling in `sumsqv`. (Devin Matthews)
- Implemented support for HPX as a threading backend option. (Christopher Taylor, Srinivas Yadav)
- Relocated the pba, sba pool (from the `rntm_t`), and `mem_t` (from the `cntl_t`) to the `thrinfo_t` object.
- Modified which communicator is associated with a given node of the `thrinfo_t` tree. (Devin Matthews)
- Refactored level-3 thread decorator into two parts: a thread launcher and a function to pass operands. (Devin Matthews)
- Refactored structure awareness in `bli_packm_blk_var1.c`. (Devin Matthews)
- Reimplemented `bli_l3_determine_kc()`. (Devin Matthews)
- Implemented `cntx_t` pointer caching in gks. (Field Van Zee, Harihara Sudhan S)
- Added `const` keyword to pointers in kernel APIs. (Field Van Zee, Nisanth M P)
- Migrated all kernel APIs to use `void*` pointers.
- Defined new global scalar constants: `BLIS_ONE_I`, `BLIS_MINUS_ONE_I`, `BLIS_NAN`. (Devin Matthews)
- Disabled modification of KC in the `gemmsup` kernels. (Devin Matthews)
- Defined `lt`, `lte`, `gt`, `gte` operations and other miscellaneous updates.
- Consolidated `INSERT_` macro sets via variadic macros. (Devin Matthews)
- De-templatized macrokernels for `gemmt`, `trmm`, and `trsm` to match that of `gemm`. (Devin Matthews)
- De-templatized `bli_l3_sup_var1n2m.c` and unified `_sup_packm_a/b()`. (Devin Matthews)
- Fixed 1m enablement for `herk`/`her2k`/`syrk`/`syr2k`. (Devin Matthews)
- Fixed `trmm[3]`/`trsm` performance bug introduced in `cf7d616`. (Field Van Zee, Leick Robinson)
- Fixed a 1m optimization bug in right-sided `hemm`/`symm`. (Field Van Zee, Nisanth M P)
- Fixed a bug in sup threshold registration. (Devin Matthews, Field Van Zee)
- Fixed brokenness in the small block allocator (sba) when the sba is disabled. (Field Van Zee, John Mather)
- Fixed type bug in `bli_cntx_set_ukr_prefs()`. (Field Van Zee, Leick Robinson, Devin Matthews, Jeff Diamond)
- Fixed incorrect `sizeof(type)` in edge case macros. (@moon-chilled)
- Fixed bugs and added sanity check in `bli_pool.c`. (Devin Matthews)
- Fixed a typo in the macro definition for `VEXTRACTF64X2` in `bli_x86_asm_macros.h`. (Harsh Dave)
- Fixed a typo in `bli_type_defs.h` where `BLIS_BLAS_INT_TYPE_SIZE` was misspelled. (Devin Matthews)
- Typecast `printf()` args in `bli_thread_range_tlb.c` to avoid compiler warnings. (Lee Killough)
- Minor tweaks to `bli_l3_check.c`.
- Partial addition of `const` to all interfaces above the (micro)kernels. (Devin Matthews)
- Fixed a harmless misspelling of `xpbys` in the `gemm` macrokernel.
- Various internal API renaming/reorganization.
- Various other fixes.

Compatibility:
- Implemented `[cz]symv_()`, `[cz]syr_()`, `[cz]rot_()`. (Field Van Zee, James Foster)
- Fixed compilation errors when `BLIS_DISABLE_BLAS_DEFS` is defined. (Field Van Zee, Edward Smyth, Devin Matthews)
- Include `bli_config.h` before `bli_system.h` in `cblas.h` so that `BLIS_ENABLE_SYSTEM` is defined in time for proper OS detection. (Edward Smyth)

Kernels:
- Updated ARMv8a kernels to fix two prefetching issues and re-enable general stride IO. (Jeff Diamond)
- Restored general storage case to `armsve` kernels. (RuQing Xu)
- Added arm64 `dgemmsup` with extended MR and NR. (RuQing Xu)
- Reorganized the way `packm` kernels are stored within the `cntx_t` so that BLIS only stores two `packm` kernels per datatype: one for MRxk upanels and one for kxNR upanels. (Devin Matthews)
- Fixed bugs in `scal2v` reference kernel when alpha == 1.
- Fixed out-of-bounds read in `haswell` `gemmsup` kernels. (Daniël de Kok, Bhaskar Nallani, Madeesh Kannan)
- Fixed k = 0 edge case in `power10` microkernels. (Nisanth M P)
- Disabled `power10` kernels other than `sgemm`, `dgemm`. (Nisanth M P)
- Fixed `bli_gemm_small()` prototype mismatch. (Jeff Diamond)

Extras:
- Use the conventional level-3 sup thread decorator within the `gemmlike` sandbox.
- Fixed type-mismatch errors in `power10` sandbox. (Nisanth M P)
- Fixed `gemmlike` sandbox bug that stems from reuse of `bli_thrinfo_sup_grow()`.

Build system:
- Added two arm64 subconfigs: `altra` and `altramax`. (Jeff Diamond, Leick Robinson)
- Added support for RISC-V configuration targets. (Angelika Schwarz, Lee Killough)
- Auto-detect the RISC-V ABI of the compiler and use `-mabi=` during RISC-V builds. (Lee Killough)
- Added `sifive_x280` subconfig and kernel set. (Aaron Hutchinson, Lee Killough, Devin Matthews, and Angelika Schwarz)
- Added AddressSanitizer (`--enable-asan`) option to `configure`. (Devin Matthews)
- Added option to disable thread-local storage via `--disable-tls`. (Field Van Zee, Nick Knight)
- Exclude `-lrt` on Android with Bionic libraries. (Lee Killough)
- Omit `-fPIC` option when shared library build is disabled. (Field Van Zee, Nick Knight)
- Move `-fPIC` option insertion to subconfigs' `make_defs.mk` files. (Field Van Zee, Nick Knight)
- Install one-line helper headers to the `INCDIR` prefix so that users can `#include "blis.h"` instead of `#include <blis/blis.h>` (and/or `"cblas.h"` instead of `<blis/cblas.h>`, if CBLAS is enabled). (Field Van Zee, Jed Brown, Devin Matthews, Mo Zhou)
- Enhanced detection of Fortran compiler when checking the version string for the purposes of determining a default return convention for complex domain values. (Bart Oldeman)
- Added detection of the NVIDIA nvhpc compiler (`nvc`) in `configure`. (Ajay Panyala)
- Updated `zen3` subconfig to support NVHPC compilers. (Abhishek Bagusetty)
- Use kernel CFLAGS for `kernels` subdirs in addons. (AMD, Mithun Mohan)
- Created `power` umbrella configuration family (which currently includes `power9` and `power10` subconfigs). (Nisanth M P)
- Defined `BLIS_VERSION_STRING` in `blis.h` instead of via command line argument during compilation. (Field Van Zee, Mohsen Aznaveh, Tim Davis)
- Rewrote `regen-symbols.sh` as `gen-libblis-symbols.sh`. (Field Van Zee)
- Support `clang` targeting MinGW. (Isuru Fernando)
- Added autodetection (via `/proc/cpuinfo`) for POWER7, POWER9 and POWER10 microarchitectures. (Alexander Grund)
- Added `#line` directives to flattened `blis.h` to facilitate easier debugging. (Devin Matthews)
- Added `--nosup` and `--sup` shorthand options to `configure`.
- Use here-document syntax for `configure --help` output. (Lee Killough)
- Updated `configure` to pass all `shellcheck` checks. (Lee Killough)
- Tweaks to `.dir-locals.el` to enhance Emacs formatting of C files. (Lee Killough)
- Removed buggy cruft from `power10` subconfig. (Field Van Zee, Nicholai Tukanov)
- Added missing `#include <io.h>` for Windows. (@h-vetinari)
- Fixed hardware auto-detection for `firestorm` (Apple M1) subconfig. (Devin Matthews)
- Fixed bug in detection of Fortran compiler vendor. (Devin Matthews)
- Fixed version check for `znver3`, which needs gcc >= 10.3. (Jed Brown)
- Fixed typo in `configure --help` text. (Lee Killough)
- Fixed warning about regular expressions with stray backslashes as the result of recent changes to `grep`.
- Added `output.testsuite` to `.gitignore`.
- Minor changes to .gitignore and LICENSE files. (Jeff Diamond)
- Minor decluttering of top-level directory.
- Very minor tweaks to common.mk.

Testing:
- Rewrote `test/3` drivers to take parameters via command line arguments. (Field Van Zee, Jeff Diamond, Leick Robinson)
- Added `arm64` entry to `.travis.yml` so that Travis CI will compile/test ARM builds. (Field Van Zee, RuQing Xu)
- Test the `gemmlike` sandbox via AppVeyor. (Jeff Diamond)
- Added `-q` quiet mode option to testsuite.
- Fixed non-deterministic segfault in standalone `test/3` drivers. (Field Van Zee, Leick Robinson)
- Fixed a crash that occurs when either `cblat1` or `zblat1` is linked with a build of BLIS that was compiled with `--complex-return=intel`. (Bart Oldeman)
- Other minor fixes/tweaks.

Documentation:
- Added Discord documentation (`docs/Discord.md`) and logo to `README.md`.
- Added the `mm_algorithm` files (for bp and pb) to `docs/diagrams`.
- Added mention of Wilkinson Prize to `README.md`.
- Minor fixes and improvements to `docs/Multithreading.md`.
- Fixed typos in docs and example code comments. (Igor Zhuravlov)
- Fixed broken "tagged releases" link in `README.md`.
- Added SMU citation to `README.md` intro.

## Changes in 0.9.0
April 1, 2022