Commit 3b39252

Expose DiskArrays.cache (#417)

* forward DiskArrays.cache
* add some docs
* bump version

1 parent 664590f · commit 3b39252

File tree: 5 files changed, +179 −133 lines

Project.toml (1 addition, 1 deletion)

````diff
@@ -1,7 +1,7 @@
 name = "YAXArrays"
 uuid = "c21b50f5-aa40-41ea-b809-c0f5e47bfa5c"
 authors = ["Fabian Gans <[email protected]>"]
-version = "0.5.9"
+version = "0.5.10"

 [deps]
 CFTime = "179af706-886a-5703-950a-314cd64e0468"
````

docs/src/UserGuide/cache.md (18 additions, 0 deletions)

`````diff
@@ -0,0 +1,18 @@
+# Caching YAXArrays
+
+For some applications, such as interactive plotting of large datasets, accessing the same data several times cannot be avoided. In these cases it can be useful to store recently accessed data in a cache. In YAXArrays this is easily achieved with the `cache` function. For example, if we open a large dataset from a remote source and want to keep data in a cache of size 500 MB, one can use:
+
+````julia
+using YAXArrays, Zarr
+ds = open_dataset("path/to/source")
+cachesize = 500 #MB
+cache(ds,maxsize = cachesize)
+````
+
+The above wraps every array in the dataset into its own cache, with the 500 MB distributed equally across the arrays.
+Alternatively, individual caches can be applied to single `YAXArray`s:
+
+````julia
+yax = ds.avariable
+cache(yax,maxsize = 1000)
+````
`````
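The diff above only exposes the wrapper; the actual caching lives inside DiskArrays. As a rough intuition for what such a cache buys you, here is a self-contained sketch of an LRU chunk cache in plain Julia (the `load_chunk`/`cached_chunk` names and the chunk-count capacity are illustrative, not part of the YAXArrays or DiskArrays API):

````julia
# Minimal sketch of the chunk-caching idea (illustrative only; YAXArrays
# delegates the real work to DiskArrays.cache).
reads = Ref(0)                              # counts simulated backend reads
load_chunk(i) = (reads[] += 1; fill(i, 4))  # stand-in for a slow chunk read

cache_store = Dict{Int,Vector{Int}}()
order = Int[]                               # least recently used chunk first
maxchunks = 2                               # capacity in chunks, not bytes

function cached_chunk(i)
    if haskey(cache_store, i)
        deleteat!(order, findfirst(==(i), order))  # hit: mark recently used
    else
        length(order) >= maxchunks && delete!(cache_store, popfirst!(order))
        cache_store[i] = load_chunk(i)             # miss: read from backend
    end
    push!(order, i)
    cache_store[i]
end

cached_chunk(1); cached_chunk(1)   # second access is served from the cache
@assert reads[] == 1
cached_chunk(2); cached_chunk(3)   # chunk 1 is evicted (capacity 2)
cached_chunk(1)                    # must be re-read after eviction
@assert reads[] == 4
````

Repeated accesses to the same region then cost one backend read instead of many, which is exactly the interactive-plotting scenario the docs describe.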

src/Cubes/Cubes.jl (3 additions, 2 deletions)

````diff
@@ -3,7 +3,7 @@ The functions provided by YAXArrays are supposed to work on different types of c
 Data types that
 """
 module Cubes
-using DiskArrays: DiskArrays, eachchunk, approx_chunksize, max_chunksize, grid_offset, GridChunks
+using DiskArrays: DiskArrays, eachchunk, approx_chunksize, max_chunksize, grid_offset, GridChunks, cache
 using Distributed: myid
 using Dates: TimeType, Date
 using IntervalSets: Interval, (..)
@@ -17,7 +17,7 @@ using Tables: istable, schema, columns
 using DimensionalData: DimensionalData as DD, AbstractDimArray, NoName
 import DimensionalData: name

-export concatenatecubes, caxes, subsetcube, readcubedata, renameaxis!, YAXArray, setchunks
+export concatenatecubes, caxes, subsetcube, readcubedata, renameaxis!, YAXArray, setchunks, cache

 """
 This function calculates a subset of a cube's data
@@ -179,6 +179,7 @@ function Base.permutedims(c::YAXArray, p)
     newchunks = DiskArrays.GridChunks(eachchunk(c).chunks[collect(dimnums)])
     YAXArray(newdims, newdata, c.properties, newchunks, c.cleaner)
 end
+DiskArrays.cache(a::YAXArray;maxsize=1000) = DD.rebuild(a,cache(a.data;maxsize))

 # DimensionalData overloads
````
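The new one-line method delegates to `DD.rebuild`: it swaps the wrapped data for a cached version while keeping dimensions and properties intact. A minimal sketch of that rebuild pattern, using a hypothetical `MiniArray` type rather than the real `YAXArray` (so it runs without DimensionalData installed):

````julia
# Sketch of the rebuild pattern (hypothetical MiniArray, not the real
# YAXArray): only the data field is swapped, metadata survives unchanged.
struct MiniArray{T}
    data::T
    dims::Tuple
    properties::Dict{String,Any}
end

# Analogue of DD.rebuild: same metadata, new data container.
rebuild(a::MiniArray, newdata) = MiniArray(newdata, a.dims, a.properties)

# Analogue of DiskArrays.cache for this toy type; a plain copy stands in
# for wrapping the data in a caching layer.
cache_toy(a::MiniArray; maxsize = 1000) = rebuild(a, copy(a.data))

a = MiniArray(rand(3, 4), (:lon, :lat), Dict{String,Any}("units" => "K"))
b = cache_toy(a)
@assert b.dims === a.dims && b.properties === a.properties
@assert b.data !== a.data   # the data container was replaced
````

Because only the data field changes, the cached array keeps behaving like the original in every axis- and metadata-aware operation.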

src/DatasetAPI/Datasets.jl (9 additions, 0 deletions)

````diff
@@ -145,6 +145,15 @@ function Base.getindex(x::Dataset, i::Vector{Symbol})
     cubesnew = [j => x.cubes[j] for j in i]
     Dataset(; cubesnew...)
 end
+function DiskArrays.cache(ds::Dataset;maxsize=1000)
+    #Distribute cache size equally across cubes
+    maxsize = maxsize ÷ length(ds.cubes)
+    cachedcubes = OrderedDict{Symbol,YAXArray}(
+        k => DiskArrays.cache(ds.cubes[k];maxsize) for k in keys(ds.cubes)
+    )
+    Dataset(cachedcubes,ds.axes,ds.properties)
+end

 function fuzzyfind(s::String, comp::Vector{String})
     sl = lowercase(s)
````
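The `maxsize = maxsize ÷ length(ds.cubes)` line splits the budget with integer division, so the summed per-cube budgets can never exceed the requested total (a little headroom may be left over). A quick check of that arithmetic, assuming a hypothetical three-variable dataset and the 500 MB budget from the docs example:

````julia
# Per-cube budget split as in DiskArrays.cache(ds; maxsize); the dataset
# size here (3 cubes) is an assumption for illustration.
maxsize = 500                 # MB, as in the docs example
ncubes = 3                    # hypothetical number of variables
percube = maxsize ÷ ncubes    # integer division, rounds down
@assert percube == 166
@assert percube * ncubes <= maxsize   # never more than 500 MB in total
````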
