Releases: gorgonia/tensor
Bugfix: `ShallowClone` had a subtle bug where not all the properties of `Dense` were cloned.
v0.9.4 Fixes a bug in `ShallowClone` (#56)
Pointer Arithmetics fixed
Thanks to the new `checkptr` feature in Go tip, a bunch of pointer arithmetic issues were found and fixed.
No API changes
Bugfix: axis decrement in aggregation functions
@bdleitner fixed a bug where axes in the along array were only ever decremented by 1 to account for prior reductions, no matter how many reductions had occurred.
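The axis bookkeeping described above can be sketched as follows. This is a minimal standalone illustration, not the package's code: `adjustAxes` is a hypothetical helper, and it assumes the axes are distinct, are expressed against the original shape, and that each reduction removes its dimension.

```go
package main

import "fmt"

// adjustAxes is a hypothetical helper (not part of the tensor package).
// Given the axes to reduce, expressed against the original shape, it returns
// the axis to use at each successive reduction step, assuming every reduction
// removes its dimension and that the axes are distinct.
func adjustAxes(along []int) []int {
	adjusted := make([]int, len(along))
	for i, axis := range along {
		shift := 0
		// Every earlier reduction of a lower-numbered axis has removed one
		// dimension in front of this axis, so count all of them, not just one.
		for _, prev := range along[:i] {
			if prev < axis {
				shift++
			}
		}
		adjusted[i] = axis - shift
	}
	return adjusted
}

func main() {
	// Reducing axes 0, 2 and 3 of a 4-D tensor, one after the other.
	fmt.Println(adjustAxes([]int{0, 2, 3})) // [0 1 1]
}
```

Decrementing the last axis by only 1 would give 2 here, which is out of range once two dimensions have already been removed.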
v0.9.1
v0.9.0-beta
v0.9.0
The changes made in this PR are aimed at better supporting v0.9.0 of Gorgonia itself. Along the way there are some new features and optimizations, as well as some bug fixes.
The majority of the work in supporting v0.9.0 of Gorgonia is to shore up the underlying architecture to support CUDA-related engines. This means moving more things to rely on `Engine` while keeping the engine interface overheads low. Additionally, this also means better support for column-major data layouts.
- Heavier reliance on `Engine` for most functions. This allows for extensibility on the data structure.
- Long standing bugbear - concepts of `RowVec` and `ColVec` have been removed (thanks to @matdodgson)
  - Touch points: `ap.go`, `iterator.go`, `iterator_mult.go`. `shape.go` and the tests that were correct prior to this change have semantic meaning changes too.
  - POTENTIAL TECH DEBT: `iterator_mult.go` - the solution of filling with ones is a little too dodgy for my liking. The alternative would be to change `BroadcastStrides`, which will change even more things (`Concat`, `Stack` etc.)
- Optimization:
  - `AP` has been depointerized in `*Dense` (thanks to @docmerlin). This reduces some amount of GC pointer chasing, but not all.
  - Allocation is slightly improved. `(array).fromSliceOrArrayer`, `(array).fix()` and `(array).forcefix()` are part of the improvement around the logic of allocating data.
- Bug fixes:
  - Fixes subtle errors in linear algebra functions. The result is a slightly longer function but easier to reason about.
  - Fixes some subtle bugs in `Concat` - see also gorgonia/gorgonia#218
  - Fixed some small bugs with regards to `SampleIndex` that only show up when the slices have extreme lengths. This API should have been deprecated 2 years ago, but eh... it touched a lot of external projects.
- API changes:
  - `Diag` is made available. It relies heavily on an `Engine`'s implementation.
  - `NewFlatIterator` is unexported.
  - `NewAP` is unexported. `MakeAP` is used instead.
  - `(Tensor).DataOrder()` is added to the definition of what a `Tensor` is.
  - `(Shape).IsScalarEquiv()` is a new method. This corresponds to the change of semantics of what a `Shape` should be.
  - `(Shape).CalcStrides()` is exported now. This enables users to correctly calculate strides that are consistent with what the package expects.
  - `(Shape).CalcStridesColMajor()` is exported as the method to calculate the strides of a Col-Major `*Dense`.
- New Interfaces:
  - `NonStdEngine` is an `Engine` that does not allocate using the default allocator. This allows embedding a `DefaultEngine` while overriding the allocation behaviour.
  - `Diager` - any engine that can return a tensor that only contains the diagonal values of the input
  - `NaNChecker` and `InfChecker` - engines that can check a tensor for NaN and Inf
- New Features:
  - New Subpackages:
    - `native` is a subpackage that essentially gives users a native, Go-based iterator. Basically the ability to go from a `*Dense` to a `[][]T` or `[][][]T` without extra allocations (for the data). This was pulled into `master` earlier, but as of v0.9.0, the generic version is available too.
- Semantic Changes:
  - `Shape` has semantic changes regarding whether or not a shape is scalar. A scalar shape is defined to be `Shape{}` or `Shape{1}` only. Formerly, `Shape{1,1}` was also considered to be scalar. Now it is considered to be `ScalarEquivalent` (along with `Shape{1, 1, ..., 1}`).
  - A `Dtype` that is orderable is also now comparable for equality. If `RegisterOrd` is called with a new `Dtype`, it is also automatically registered as `Eq`.
- Cosmetic Changes:
  - README has been updated to point to correct doc pages
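To make the scalar semantics and the two stride layouts mentioned above concrete, here is a minimal standalone sketch. It does not use the package: `isScalar`, `isScalarEquiv`, `rowMajorStrides` and `colMajorStrides` are hypothetical stand-ins operating on plain `[]int` shapes, strides are counted in elements, and treating `{}` and `{1}` as scalar-equivalent too is an assumption of the sketch.

```go
package main

import "fmt"

// All functions below are local stand-ins for illustration; they are not the
// tensor package's implementations.

// isScalar reports whether shape is {} or {1}, the only scalar shapes under
// the semantics described in the release notes.
func isScalar(shape []int) bool {
	return len(shape) == 0 || (len(shape) == 1 && shape[0] == 1)
}

// isScalarEquiv reports whether every dimension is 1, e.g. {1, 1}.
// Returning true for {} and {1} as well is an assumption of this sketch.
func isScalarEquiv(shape []int) bool {
	for _, d := range shape {
		if d != 1 {
			return false
		}
	}
	return true
}

// rowMajorStrides computes element strides where the last axis varies fastest.
func rowMajorStrides(shape []int) []int {
	strides := make([]int, len(shape))
	step := 1
	for i := len(shape) - 1; i >= 0; i-- {
		strides[i] = step
		step *= shape[i]
	}
	return strides
}

// colMajorStrides computes element strides where the first axis varies fastest.
func colMajorStrides(shape []int) []int {
	strides := make([]int, len(shape))
	step := 1
	for i := range shape {
		strides[i] = step
		step *= shape[i]
	}
	return strides
}

func main() {
	shape := []int{2, 3, 4}
	fmt.Println(rowMajorStrides(shape)) // [12 4 1]
	fmt.Println(colMajorStrides(shape)) // [1 2 6]

	fmt.Println(isScalar([]int{1, 1}), isScalarEquiv([]int{1, 1})) // false true
}
```

Code that needs strides consistent with the package should call the exported `(Shape).CalcStrides()` and `(Shape).CalcStridesColMajor()` methods rather than reimplementing them like this.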
v0.8.1
v0.8.1 sees the built-in transpose function use a different algorithm to perform in-place transpose.
Prior to this version, the transpose used a cycle-chasing algorithm. This turns out to have poor cache locality. So the solution is to replace that with one that allocates a new temporary array. The transpose operation is then simply an iterative copy into the new array. The data is then copied from the temporary array back to the original array.
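The copy-through-a-temporary approach can be sketched for the 2-D, row-major case as follows. This is an illustration only: `transposeInPlace` is a hypothetical function, and the package's own transpose handles arbitrary dimensions and data types.

```go
package main

import "fmt"

// transposeInPlace is a hypothetical 2-D illustration, not the package's code.
// data holds a rows x cols matrix in row-major order. On return it holds the
// cols x rows transpose, also in row-major order, in the same backing array.
func transposeInPlace(data []float64, rows, cols int) {
	// Temporary buffer that receives the transposed elements.
	tmp := make([]float64, len(data))
	for i := 0; i < rows; i++ {
		for j := 0; j < cols; j++ {
			// Element (i, j) of the source becomes element (j, i) of the result.
			tmp[j*rows+i] = data[i*cols+j]
		}
	}
	// Copy the result back over the original array.
	copy(data, tmp)
}

func main() {
	m := []float64{1, 2, 3, 4, 5, 6} // 2x3: [[1 2 3] [4 5 6]]
	transposeInPlace(m, 2, 3)
	fmt.Println(m) // [1 4 2 5 3 6]
}
```

The source is read sequentially, at the cost of one extra allocation the size of the data.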
v0.8.1 also sees an improvement contributed by @stuartcarnie on the `FlatIterator` structure. Here are the benchmark results.
    benchmark                     old ns/op     new ns/op     delta
    BenchmarkComplicatedGet-8     228778        199737        -12.69%

    benchmark                     old allocs     new allocs     delta
    BenchmarkComplicatedGet-8     2              2              +0.00%

    benchmark                     old bytes     new bytes     delta
    BenchmarkComplicatedGet-8     112           112           +0.00%