Releases · google/highway

14 Aug 07:39

jan-wassenberg

1.3.0

ac0d5d2

1.3.0 Latest

Add:

AddLower, PairwiseAdd/Sub, MaskedAbsOr, BitsFromMask
AVX10_2 and Loongson LASX/LSX targets
AVX3_SPR F16, WASM_EMU256 F64 types
CeilInt/FloorInt, DemoteToNearestInt and F16/F64 NearestInt
Complex number operations, F16/BF16 assignment operators
emulated bf16/f16 Load/StoreInterleaved
hwy::Warn/HWY_WARN, use instead of fprintf
HWY_UNREACHABLE, HWY_VISIT_TARGETS
i16 Dot, AverageRound, RoundingShiftRight/RoundingShr
InterleaveEvenBlocks/InterleaveOddBlocks, MinMagnitude/MaxMagnitude
masked comparisons, promote, round, GetBiasedExponent
MulByPow2/MulByFloorPow2, MulRound, MulLower/MulAddLower
PositiveInfOrHighestValue/NegativeInfOrLowestValue
RVV groundwork for runtime dispatch, enable tuples
spin wait, NanoSleep, Counter2/4 barrier, Divisor64, perf_counters
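
As a rough illustration of what two of the ops listed above compute per lane, here is a scalar sketch in plain C++. It assumes AverageRound means (a + b + 1) >> 1 and that RoundingShiftRight adds half of the divisor before shifting. The helper names AverageRoundScalar and RoundingShiftRightScalar are hypothetical and not part of Highway, and the sketch covers only unsigned lanes of up to 32 bits.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical scalar model of a rounding average: (a + b + 1) >> 1.
// The sum is widened to 64 bits so it cannot overflow the lane type.
template <typename T>
T AverageRoundScalar(T a, T b) {
  static_assert(std::is_unsigned<T>::value && sizeof(T) <= 4,
                "sketch covers unsigned lanes up to 32 bits");
  const uint64_t sum = static_cast<uint64_t>(a) + static_cast<uint64_t>(b) + 1;
  return static_cast<T>(sum >> 1);
}

// Hypothetical scalar model of a rounding right shift: add half of
// 2^kShift before shifting, so ties round upward (an assumption here).
template <int kShift, typename T>
T RoundingShiftRightScalar(T v) {
  static_assert(std::is_unsigned<T>::value && sizeof(T) <= 4,
                "sketch covers unsigned lanes up to 32 bits");
  static_assert(kShift >= 1 && kShift < static_cast<int>(sizeof(T) * 8),
                "shift amount must be within the lane width");
  const uint64_t bias = uint64_t{1} << (kShift - 1);
  return static_cast<T>((static_cast<uint64_t>(v) + bias) >> kShift);
}
```

For example, under these assumptions AverageRoundScalar<uint8_t>(1, 2) gives 2 where a truncating average would give 1, and RoundingShiftRightScalar<1, uint32_t>(5u) gives 3.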

Improvements:

dpbf16 WidenMulPairwiseAdd Exp2, AVX10.2 float->int, AVX3 GetExponent
header-only abort.h/cc, tests runnable with Bazel8
HWY_BROKEN_*: allow individual override
Lanes: 'optional constexpr', AllBits1
MaskedEq/Ne, NEON SumOfMulQuadAccumulate, MaskedReduceMin/Max, MulEven
Profiler: report concurrency stats, 1.36x less overhead
RVV various ops via superoptimizer
SetThreadName: support more systems
SVE2 SatWidenMulPairwiseAccumulate, SSE2/SSSE3 U16 Min/Max
TargetName: no longer returns unknown for other arch
ThreadPool autotune, avoid WakeAll
topology: add NUMA node, support Windows/Apple

Fixes:

avoid wraparound for -ftrapv, topology for offline CPUs/RVV
warnings from -Wmissing-declarations/prototypes
AdvSIMD_HPFPCvt on OSX
f32->bf16 rounding: avoid unspecified built-in cast
MSAN, PPC InvariantTicksPerSecond on QEMU, HWY_RCAST_ALIGNED, IsNaN
vqsort for ascending order, add 8-bit test

Thanks to all contributors, especially johnplatts and eustas!

31 May 17:04

jan-wassenberg

1.2.0

457c891

1.2.0

Add InterleaveEven/InterleaveOdd, BitShuffle, GatherIndexNOr
Add IsNegative, IfNegativeThenElseZero, IfNegativeThenZeroElse
Add NEON_BF16, HWY_VERSION_GE/LT, HWY_EXPORT_T/HWY_DYNAMIC_DISPATCH_T
Add PromoteInRangeTo/ConvertInRangeTo/DemoteInRangeTo
Add Rol/Ror, RotateLeft/RotateLeftSame/RotateRightSame
Add SatWidenMulPairwiseAccumulate, SatWidenMulAccumFixedPoint
Add stats.h, bit_set.h, IsEitherNaN
Add UI8/UI32/UI64 MulHigh, I64 MulEven/MulOdd/Mul128
Add WidenMulAccumulate, MulEvenAdd, MulOddAdd
contrib/bit_pack: support 32/64-bit lanes
contrib/math: Add Exp2, Hypot
contrib/matvec: Add MatVecAdd
contrib/sort: Add VQ/HeapSelect, partial sort
contrib/topology: add affinity, detect topology/cache size/CPU name
Enable runtime dispatch for NEON/RVV, bazel modules, abort handler
Remove DASSERT for negative Gather indices
Support opting out of GUnit dependency
Use SPR/ZEN4 bf16 dot product
Known GCC 13 RVV issue: parts of sort_test and bit_pack_test disabled
Known Clang RVV/QEMU issue: incorrect rounding mode in upper/lower halves
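
The rotate ops added in this release (Rol/Ror, RotateLeft, RotateLeftSame/RotateRightSame) can be pictured per lane with the scalar sketch below. It is plain C++ rather than Highway code, the helper names RotateLeftScalar and RotateRightScalar are hypothetical, and reducing the shift count modulo the lane width is an assumption of this sketch, not something the notes above state.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical scalar model of a per-lane rotate left.
// Assumption: the count is reduced modulo the lane width in bits.
template <typename T>
T RotateLeftScalar(T v, unsigned bits) {
  static_assert(std::is_unsigned<T>::value, "sketch covers unsigned lanes");
  constexpr unsigned kLaneBits = sizeof(T) * 8;
  bits &= kLaneBits - 1;
  if (bits == 0) return v;  // avoids shifting by the full lane width
  // Narrow lanes are promoted to int, so cast back to discard the
  // bits shifted out of the lane.
  return static_cast<T>((v << bits) | (v >> (kLaneBits - bits)));
}

// Rotate right, expressed as a rotate left by the complementary count.
template <typename T>
T RotateRightScalar(T v, unsigned bits) {
  constexpr unsigned kLaneBits = sizeof(T) * 8;
  bits &= kLaneBits - 1;
  return RotateLeftScalar<T>(v, (kLaneBits - bits) & (kLaneBits - 1));
}
```

With these definitions, rotating the 8-bit value 0x81 left by one bit yields 0x03: the top bit wraps around to bit 0.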

18 Feb 01:33

jan-wassenberg

1.1.0

58b52a7

1.1.0

Add BitCastScalar, DispatchedTarget, Foreach
Add Div/Mod and MaskedDiv/ModOr, SaturatedAbs, SaturatedNeg
Add InterleaveWholeLower/Upper, Dup128VecFromValues
Add IsInteger, IsIntegerLaneType, RemoveVolatile, RemoveCvRef
Add MaskedAdd/Sub/Mul/Div/Gather/Min/Max/SatAdd/SatSubOr
Add MaskFalse, IfNegativeThenNegOrUndefIfZero, PromoteEven/OddTo
Add ReduceMin/Max, 8-bit reductions, f16 <-> f64 conversions
Add Span, AlignedArray, matrix-vector mul
Add SumsOf2/4, I8 SumsOf8, SumsOfAdjQuadAbsDiff, SumsOfShuffledQuadAbsDiff
Add ThreadPool, hierarchical profiler
Build: use bazel_platforms
Enable clang16 Arm/PPC runtime dispatch, F16 for GCC AVX3_SPR
Extend Dot to f32*bf16, FMA to integer
Fix: RVV 8-bit overflow, UB in vqsort, big-endian bugs, PPC HTM
Improved codegen in various ops, fp16/bf16 tests and conversions
New targets: HWY_Z14, HWY_Z15
Test: add foreign_arch builders, CodeQL

30 Aug 07:06

jan-wassenberg

1.0.7

ba0900a

1.0.7

Add LoadNOr, GatherIndexN, ScatterIndexN
Add additional float<->int conversions
Codegen improvements for 8-bit shift, PPC Compress/Expand
Fixes for MSVC, PPC, RVV, WASM, GCC 13, GCC 8.2, i686, f16 type, QEMU 7.2
Support CMake args in Debian packaging

11 Aug 15:01

jan-wassenberg

1.0.6

591ad35

1.0.6

Add MaskedGatherIndex, MaskedScatterIndex, LoadN, StoreN
Add SatWidenMulPairwiseAdd, SumOfMulQuadAccumulate, PromoteUpperLowerTo
Add F64 for Wasm, F64 AbsDiff
Add F16 support to AVX3_SPR, RVV tuple (both not yet enabled)
Validate all D args in x86 function signatures
License: now dual Apache2/BSD3
Doc: new users, vcpkg install instructions, AVX10 plans
Doc: advice on dynamic dispatch plus -march flags
Build: avoid installing hwy_test if !HWY_ENABLE_TESTS
Codegen: improved PPC9 Find*True, variable-length CopyBytes
Fix: GCC 8.2, MSVC, ICC, PPC9, SVE, arm64 MSVC issues
Fix: IfNegativeThenElse, MulFixedPoint15, Debian changelog format
Tests: faster builds (split up), use release builds

19 Jul 16:10

jan-wassenberg

1.0.5

f61a223

1.0.5

Add Insert/ExtractBlock, BroadcastBlock/Lane, NumBlocks
Add integer Le/Ge and [Neg]MulAdd, extend DemoteTo/PromoteTo
Add Leading/TrailingZeroCount, HighestSetBitIndex, ReverseBits
Add MaskedLoadOr, tuple Get/Set/Create, ReduceSum, WidenMulPairwiseAdd
Add [ZeroExtend]ResizeBitCast, BitwiseIfThenElse, Find[Known]LastTrue
Add AESRoundInv, AESKeyGenAssist
Add contrib/math Atan2/SinCos, contrib/unroller
Add fp16/bf16 support (Armv8, SVE, RVV), HWY_DYNAMIC_POINTER
Add OrderedTruncate2To, Per4LaneBlockShuffle, TwoTablesLookupLanes
Add SlideUp/Down[Blocks/Lanes], Slide1Up/Down, ReverseLaneBytes
Add SetBeforeFirst, SetAtOrBefore/AfterFirst, SetOnlyFirst
Add 8-bit Reverse2/4/8, Shl/Shr, RotateRight, Reverse, Mul
Add 8/16-bit DupEven/Odd, TableLookupLanes
Add F64 ApproximateReciprocal[Sqrt], 32/64-bit SaturatedAdd/Sub
Build: Support Bazel modules
Codegen improvements
Compiler: support Clang 15/16
Doc: add Github pages, support policy, evaluation
Doc: publish AVX-512 throttling/startup findings
Release: add signing
Test: add GCC to Github Actions
VQSort: small N speedups: fix seeding, func ptr, 8-wide network
VQSort: add BenchAllColdSort, VQSortStatic
VQSort: fix subnormal/inf/NaN, support fp16, fix KV types
Workarounds: RVV VXRM, x87 excess precision, missing intrinsics
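
For readers unfamiliar with the bit-manipulation ops added here (Leading/TrailingZeroCount, ReverseBits), the following scalar sketch shows one plausible per-lane reading in plain C++. The helper names are hypothetical and not Highway functions. Returning the lane width in bits for a zero input is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical scalar model: number of zero bits above the highest set bit.
// Assumption: a zero input yields the lane width in bits.
template <typename T>
int LeadingZeroCountScalar(T v) {
  static_assert(std::is_unsigned<T>::value, "sketch covers unsigned lanes");
  constexpr int kLaneBits = static_cast<int>(sizeof(T) * 8);
  int count = 0;
  for (int i = kLaneBits - 1; i >= 0; --i) {
    if ((v >> i) & 1) break;
    ++count;
  }
  return count;
}

// Hypothetical scalar model: number of zero bits below the lowest set bit.
template <typename T>
int TrailingZeroCountScalar(T v) {
  static_assert(std::is_unsigned<T>::value, "sketch covers unsigned lanes");
  constexpr int kLaneBits = static_cast<int>(sizeof(T) * 8);
  int count = 0;
  for (int i = 0; i < kLaneBits; ++i) {
    if ((v >> i) & 1) break;
    ++count;
  }
  return count;
}

// Hypothetical scalar model: reverse the order of all bits in the lane.
template <typename T>
T ReverseBitsScalar(T v) {
  static_assert(std::is_unsigned<T>::value, "sketch covers unsigned lanes");
  T reversed = 0;
  for (unsigned i = 0; i < sizeof(T) * 8; ++i) {
    reversed = static_cast<T>((reversed << 1) | (v & 1));
    v = static_cast<T>(v >> 1);
  }
  return reversed;
}
```

A vector implementation would apply the same mapping to every lane independently. The loops here favor clarity over speed.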

17 Mar 15:33

jan-wassenberg

1.0.4

46e365d

1.0.4

Add PPC8..10, SSE2, AVX3_ZEN4, NEON_WITHOUT_AES targets
Add Expand, LoadExpand, integer AbsDiff, SumsOf8AbsDiff
Improved Half/Twice support, codegen for Shift*Same
Support Wasm in Godbolt
Faster KV128 sorting
Fix armv7 build config, CMake config mode
Update RVV intrinsics for 1.0-draft

19 Jan 15:20

jan-wassenberg

1.0.3

58746ca

1.0.3

Add RearrangeToOddPlusEven, Xor3, 8-bit CompressStore, HWY_ASSUME
Add contrib/bit_pack for 8/16-bit lanes
Add WASM_EMU256 target
Documentation improvements
Allow opting out of C++ stdlib usage for Compiler Explorer
Update for new RVV intrinsics; faster WASM min/max and extmul/q15mul
Fix UB, GCC atomic

28 Oct 11:05

jan-wassenberg

1.0.2

293693e

1.0.2

Add ExclusiveNeither, FindKnownFirstTrue, Ne128
Add 16-bit SumOfLanes/ReorderWidenMulAccumulate/ReorderDemote2To
Faster sort for low-entropy input, improved pivot selection
Add GN build system, Highway FAQ, k32v32 type to vqsort
CMake: Support find_package(GTest), add rvv-inl.h, add HWY_ENABLE_TESTS
Fix MIPS and C++20 build, Apple LLVM 10.3 detection, EMU128 AllTrue on RVV
Fix missing exec_prefix, RVV build, warnings, libatomic linking
Work around GCC 10.4 issue, disabled RDCYCLE, arm7 with vfpv3
Documentation/example improvements
Support static dispatch to SVE2_128 and SVE_256

24 Aug 16:43

jan-wassenberg

1.0.1

22e3d72

1.0.1

Add Eq128, i64 Mul, unsigned->float ConvertTo
Faster sort for few unique keys, more robust pivot selection
Fix: floating-point generator for sort tests, Min/MaxOfLanes for i16
Fix: avoid always_inline in debug, link atomic
GCC warnings: string.h, maybe-uninitialized, ignored-attributes
GCC warnings: preprocessor int overflow, spurious use-after-free/overflow
Doc: <=HWY_AVX3, Full32/64/128, how to use generic-inl
