Showing content from https://github.com/flame/blis/commit/06dddf1e51ccff70d77ee8cb731c3217e70eb730 below:

ReleaseNotes.md update. · flame/blis@06dddf1 · GitHub


## Contents

* [Changes in 1.0](ReleaseNotes.md#changes-in-10)
* [Changes in 0.9.0](ReleaseNotes.md#changes-in-090)
* [Changes in 0.8.1](ReleaseNotes.md#changes-in-081)
* [Changes in 0.8.0](ReleaseNotes.md#changes-in-080)
* [Changes in 0.0.2](ReleaseNotes.md#changes-in-002)
* [Changes in 0.0.1](ReleaseNotes.md#changes-in-001)

## Changes in 1.0

May 6, 2024

Improvements present in 1.0:

Framework:

- Initialize/finalize BLIS via a new `bli_pthread_switch_t` API. (Field Van Zee, Devin Matthews)
- Revamped `bli_init()` to use TLS where feasible. (Field Van Zee, Edward Smyth, Minh Quan Ho)
- Implemented support for fat multithreading.
- Implemented tile-level load balancing (tlb), or tile-level partitioning, in jr/ir loops for `gemm`, `gemmt`, and `trmm` macrokernels. (Field Van Zee, Devin Matthews, Leick Robinson, Minh Quan Ho)
- Added padding to `thrcomm_t` fields to avoid false sharing of cache lines. (Leick Robinson)
- Rewrote/fixed broken tree barrier implementation. (Leick Robinson)
- Refactored some `rntm_t` management code. (Field Van Zee, Devin Matthews)
- Initialize `rntm_t` nt/ways fields with 1 (not -1). (Field Van Zee, Jeff Diamond, Leick Robinson, Devin Matthews)
- Defined `invscalv`, `invscalm`, `invscald` operations.
- Added consistent `NaN`/`Inf` handling in `sumsqv`. (Devin Matthews)
- Implemented support for HPX as a threading backend option. (Christopher Taylor, Srinivas Yadav)
- Relocated the pba, sba pool (from the `rntm_t`), and `mem_t` (from the `cntl_t`) to the `thrinfo_t` object.
- Modified which communicator is associated with a given node of the `thrinfo_t` tree. (Devin Matthews)
- Refactored level-3 thread decorator into two parts: a thread launcher and a function to pass operands. (Devin Matthews)
- Refactored structure awareness in `bli_packm_blk_var1.c`. (Devin Matthews)
- Reimplemented `bli_l3_determine_kc()`. (Devin Matthews)
- Implemented `cntx_t` pointer caching in gks. (Field Van Zee, Harihara Sudhan S)
- Added `const` keyword to pointers in kernel APIs. (Field Van Zee, Nisanth M P)
- Migrated all kernel APIs to use `void*` pointers.
- Defined new global scalar constants: `BLIS_ONE_I`, `BLIS_MINUS_ONE_I`, `BLIS_NAN`. (Devin Matthews)
- Disabled modification of KC in the `gemmsup` kernels. (Devin Matthews)
- Defined `lt`, `lte`, `gt`, `gte` operations and other miscellaneous updates.
- Consolidated `INSERT_` macro sets via variadic macros. (Devin Matthews)
- De-templatized macrokernels for `gemmt`, `trmm`, and `trsm` to match that of `gemm`. (Devin Matthews)
- De-templatized `bli_l3_sup_var1n2m.c` and unified `_sup_packm_a/b()`. (Devin Matthews)
- Fixed 1m enablement for `herk`/`her2k`/`syrk`/`syr2k`. (Devin Matthews)
- Fixed `trmm[3]`/`trsm` performance bug introduced in `cf7d616`. (Field Van Zee, Leick Robinson)
- Fixed a 1m optimization bug in right-sided `hemm`/`symm`. (Field Van Zee, Nisanth M P)
- Fixed a bug in sup threshold registration. (Devin Matthews, Field Van Zee)
- Fixed brokenness in the small block allocator (sba) when the sba is disabled. (Field Van Zee, John Mather)
- Fixed type bug in `bli_cntx_set_ukr_prefs()`. (Field Van Zee, Leick Robinson, Devin Matthews, Jeff Diamond)
- Fixed incorrect `sizeof(type)` in edge case macros. (@moon-chilled)
- Fixed bugs and added sanity check in `bli_pool.c`. (Devin Matthews)
- Fixed a typo in the macro definition for `VEXTRACTF64X2` in `bli_x86_asm_macros.h`. (Harsh Dave)
- Fixed a typo in `bli_type_defs.h` where `BLIS_BLAS_INT_TYPE_SIZE` was misspelled. (Devin Matthews)
- Typecast `printf()` args in `bli_thread_range_tlb.c` to avoid compiler warnings. (Lee Killough)
- Minor tweaks to `bli_l3_check.c`.
- Partial addition of `const` to all interfaces above the (micro)kernels. (Devin Matthews)
- Fixed a harmless misspelling of `xpbys` in gemm macrokernel.
- Various internal API renaming/reorganization.
- Various other fixes.
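
The tile-level load balancing item above can be illustrated with a small sketch. This is not the BLIS implementation (which lives in files such as `bli_thread_range_tlb.c`); the type `tile_range_t` and the helpers `tile_range()` and `tile_to_jr_ir()` are hypothetical names invented here. The sketch assumes a rectangular grid of microtiles and ignores the triangular structure that `gemmt` and `trmm` involve. The idea shown is only that the jr and ir loops are flattened into one sequence of tiles, which is then split so that no two threads differ by more than one tile.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open range [start, end) of flattened tile indices owned by one thread. */
typedef struct { size_t start; size_t end; } tile_range_t;

/* Hypothetical helper (not a BLIS API): split m_tiles * n_tiles microtiles
   as evenly as possible among nt threads. The first (total % nt) threads
   receive one extra tile. */
static tile_range_t tile_range(size_t m_tiles, size_t n_tiles,
                               size_t nt, size_t tid)
{
    assert(nt > 0 && tid < nt);

    size_t total = m_tiles * n_tiles;
    size_t base  = total / nt;   /* tiles every thread receives */
    size_t extra = total % nt;   /* leftover tiles, one each for low tids */

    tile_range_t r;
    r.start = tid * base + (tid < extra ? tid : extra);
    r.end   = r.start + base + (tid < extra ? 1 : 0);
    return r;
}

/* Map a flattened tile index back to (jr, ir) loop indices, assuming the
   ir loop (over m_tiles) is the innermost of the two. */
static void tile_to_jr_ir(size_t t, size_t m_tiles, size_t *jr, size_t *ir)
{
    assert(m_tiles > 0);
    *jr = t / m_tiles;
    *ir = t % m_tiles;
}
```

For example, with 3 x 5 = 15 tiles and 4 threads, this split assigns 4, 4, 4, and 3 tiles. Partitioning only the 5 jr iterations among 4 threads would instead give one thread 2 columns (6 tiles) and the others 1 column (3 tiles) each.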


Compatibility:

- Implemented `[cz]symv_()`, `[cz]syr_()`, `[cz]rot_()`. (Field Van Zee, James Foster)
- Fixed compilation errors when `BLIS_DISABLE_BLAS_DEFS` is defined. (Field Van Zee, Edward Smyth, Devin Matthews)
- Include `bli_config.h` before `bli_system.h` in `cblas.h` so that `BLIS_ENABLE_SYSTEM` is defined in time for proper OS detection. (Edward Smyth)


Kernels:

- Updated ARMv8a kernels to fix two prefetching issues and re-enable general stride IO. (Jeff Diamond)
- Restored general storage case to `armsve` kernels. (RuQing Xu)
- Added arm64 `dgemmsup` with extended MR and NR. (RuQing Xu)
- Reorganized the way `packm` kernels are stored within the `cntx_t` so that BLIS only stores two `packm` kernels per datatype: one for MRxk upanels and one for kxNR upanels. (Devin Matthews)
- Fixed bugs in `scal2v` reference kernel when alpha == 1.
- Fixed out-of-bounds read in `haswell` `gemmsup` kernels. (Daniël de Kok, Bhaskar Nallani, Madeesh Kannan)
- Fixed k = 0 edge case in `power10` microkernels. (Nisanth M P)
- Disabled `power10` kernels other than `sgemm`, `dgemm`. (Nisanth M P)
- Fixed `bli_gemm_small()` prototype mismatch. (Jeff Diamond)
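
To make the k = 0 edge case above concrete, here is a minimal reference sketch of what a `gemm` microkernel computes, C := beta * C + alpha * A * B. The notes do not describe the actual `power10` bug, so this only shows the expected semantics: when k is 0 the product term is empty and the result is simply beta * C. The function name `ref_dgemm_ukr()`, its signature, and the packed layouts are assumptions made for illustration and do not match the real BLIS microkernel interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference microkernel (not the BLIS kernel signature).
   a: m x k micropanel, element (i, p) stored at a[p * m + i]
   b: k x n micropanel, element (p, j) stored at b[p * n + j]
   c: m x n tile with row stride rs_c and column stride cs_c */
static void ref_dgemm_ukr(size_t m, size_t n, size_t k,
                          double alpha, const double *a, const double *b,
                          double beta, double *c, size_t rs_c, size_t cs_c)
{
    assert(c != NULL);

    for (size_t j = 0; j < n; ++j) {
        for (size_t i = 0; i < m; ++i) {
            /* With k == 0 this loop never runs, so a and b are never read
               and ab stays 0. */
            double ab = 0.0;
            for (size_t p = 0; p < k; ++p)
                ab += a[p * m + i] * b[p * n + j];

            double *cij = &c[i * rs_c + j * cs_c];
            /* beta == 0 overwrites C rather than scaling it, so stale
               NaN/Inf values in C are not propagated. */
            *cij = (beta == 0.0) ? alpha * ab
                                 : beta * (*cij) + alpha * ab;
        }
    }
}
```

Calling this with k = 0, alpha = 1, beta = 2 on a 2 x 2 tile holding 1, 2, 3, 4 leaves it holding 2, 4, 6, 8, without touching `a` or `b`.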


Extras:

- Use the conventional level-3 sup thread decorator within the `gemmlike` sandbox.
- Fixed type-mismatch errors in `power10` sandbox. (Nisanth M P)
- Fixed `gemmlike` sandbox bug that stems from reuse of `bli_thrinfo_sup_grow()`.


Build system:

- Added two arm64 subconfigs: `altra` and `altramax`. (Jeff Diamond, Leick Robinson)
- Added support for RISC-V configuration targets. (Angelika Schwarz, Lee Killough)
- Auto-detect the RISC-V ABI of the compiler and use `-mabi=` during RISC-V builds. (Lee Killough)
- Added `sifive_x280` subconfig and kernel set. (Aaron Hutchinson, Lee Killough, Devin Matthews, and Angelika Schwarz)
- Added AddressSanitizer (`--enable-asan`) option to `configure`. (Devin Matthews)
- Added option to disable thread-local storage via `--disable-tls`. (Field Van Zee, Nick Knight)
- Exclude `-lrt` on Android with Bionic libraries. (Lee Killough)
- Omit `-fPIC` option when shared library build is disabled. (Field Van Zee, Nick Knight)
- Move `-fPIC` option insertion to subconfigs' `make_defs.mk` files. (Field Van Zee, Nick Knight)
- Install one-line helper headers to `INCDIR` prefix so that users can `#include "blis.h"` instead of `#include <blis/blis.h>` (and/or `"cblas.h"` instead of `<blis/cblas.h>` if CBLAS is enabled). (Field Van Zee, Jed Brown, Devin Matthews, Mo Zhou)
- Enhanced detection of Fortran compiler when checking the version string for the purposes of determining a default return convention for complex domain values. (Bart Oldeman)
- Added detection of the NVIDIA nvhpc compiler (`nvc`) in `configure`. (Ajay Panyala)
- Updated `zen3` subconfig to support NVHPC compilers. (Abhishek Bagusetty)
- Use kernel CFLAGS for `kernels` subdirs in addons. (AMD, Mithun Mohan)
- Created `power` umbrella configuration family (which currently includes `power9` and `power10` subconfigs). (Nisanth M P)
- Defined `BLIS_VERSION_STRING` in `blis.h` instead of via command line argument during compilation. (Field Van Zee, Mohsen Aznaveh, Tim Davis)
- Rewrote `regen-symbols.sh` as `gen-libblis-symbols.sh`. (Field Van Zee)
- Support `clang` targeting MinGW. (Isuru Fernando)
- Added autodetection (via `/proc/cpuinfo`) for POWER7, POWER9 and POWER10 microarchitectures. (Alexander Grund)
- Added `#line` directives to flattened `blis.h` to facilitate easier debugging. (Devin Matthews)
- Added `--nosup` and `--sup` shorthand options to `configure`.
- Use here-document syntax for `configure --help` output. (Lee Killough)
- Updated `configure` to pass all `shellcheck` checks. (Lee Killough)
- Tweaks to `.dir-locals.el` to enhance emacs formatting of C files. (Lee Killough)
- Removed buggy cruft from `power10` subconfig. (Field Van Zee, Nicholai Tukanov)
- Added missing `#include <io.h>` for Windows. (@h-vetinari)
- Fixed hardware auto-detection for `firestorm` (Apple M1) subconfig. (Devin Matthews)
- Fixed bug in detection of Fortran compiler vendor. (Devin Matthews)
- Fixed version check for `znver3`, which needs gcc >= 10.3. (Jed Brown)
- Fixed typo in `configure --help` text. (Lee Killough)
- Fixed warning about regular expressions with stray backslashes as the result of recent changes to `grep`.
- Added `output.testsuite` to `.gitignore`.
- Minor changes to `.gitignore` and `LICENSE` files. (Jeff Diamond)
- Minor decluttering of top-level directory.
- Very minor tweaks to `common.mk`.
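
As background for the `#line` item above: a flattened header concatenates many source files into one, so without help the compiler and debugger report positions in the monolithic file. The standard C `#line` directive resets the line number and file name that the compiler associates with the lines that follow. The sketch below shows only that language mechanism; the file name is hypothetical, and the way BLIS's flattening script actually emits the directives is not shown in these notes.

```c
#include <assert.h>
#include <string.h>

/* Pretend the two functions below were pasted from line 42 of a separate
   header. The path is a made-up example, not a real BLIS file. */
#line 42 "frame/hypothetical_header.h"
static int reported_line(void) { return __LINE__; }
static const char *reported_file(void) { return __FILE__; }

/* Self-check: diagnostics for the code above point at the original header
   and line, not at the position inside the flattened file. */
static void check_line_directive(void)
{
    assert(reported_line() == 42);
    assert(strcmp(reported_file(), "frame/hypothetical_header.h") == 0);
}
```

Compiler errors, `assert()` failures, and debugger breakpoints inside such a region would then refer to the original file and line rather than to the flattened one.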


Testing:

- Rewrote `test/3` drivers to take parameters via command line arguments. (Field Van Zee, Jeff Diamond, Leick Robinson)
- Added `arm64` entry to `.travis.yml` so that Travis CI will compile/test ARM builds. (Field Van Zee, RuQing Xu)
- Test the `gemmlike` sandbox via AppVeyor. (Jeff Diamond)
- Added `-q` quiet mode option to testsuite.
- Fixed non-deterministic segfault in standalone `test/3` drivers. (Field Van Zee, Leick Robinson)
- Fixed a crash that occurs when either `cblat1` or `zblat1` is linked with a build of BLIS that was compiled with `--complex-return=intel`. (Bart Oldeman)
- Other minor fixes/tweaks.


Documentation:

- Added Discord documentation (`docs/Discord.md`) and logo to `README.md`.
- Added the `mm_algorithm` files (for bp and pb) to `docs/diagrams`.
- Added mention of Wilkinson Prize to `README.md`.
- Minor fixes and improvements to `docs/Multithreading.md`.
- Fixed typos in docs and example code comments. (Igor Zhuravlov)
- Fixed broken "tagged releases" link in `README.md`.
- Added SMU citation to `README.md` intro.


## Changes in 0.9.0


April 1, 2022
