
Partial updates in rasdaman

Very large data files that do not fit in main memory often need to be inserted into rasdaman. One way to insert such a file is to split it into smaller parts and then import the parts one by one via partial updates, until the original image is reconstructed in rasdaman.

This is done in two steps: initializing an MDD in a collection, and inserting each part in this MDD.
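As a rough illustration of the splitting step, the following Python sketch computes the spatial domains of the parts of a 2D image. The names `tile_domains` and `to_minterval` and the tile size are hypothetical and not part of rasdaman; bounds are inclusive, as in rasql.

```python
def tile_domains(width, height, tile_w, tile_h):
    """Return inclusive (x_lo, x_hi, y_lo, y_hi) bounds of the tiles covering a width x height image."""
    domains = []
    for x in range(0, width, tile_w):
        for y in range(0, height, tile_h):
            # Clip the last tile in each direction to the image border.
            x_hi = min(x + tile_w, width) - 1
            y_hi = min(y + tile_h, height) - 1
            domains.append((x, x_hi, y, y_hi))
    return domains


def to_minterval(domain):
    """Format tile bounds as a rasql-style interval such as [0:99,0:99]."""
    x_lo, x_hi, y_lo, y_hi = domain
    return f"[{x_lo}:{x_hi},{y_lo}:{y_hi}]"


if __name__ == "__main__":
    # A 250 x 100 image cut into tiles of at most 100 x 100 pixels.
    for d in tile_domains(250, 100, 100, 100):
        print(to_minterval(d))
```

Each resulting interval is the position at which the corresponding part would be placed by a partial update.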

Initialization

Updates replace an area in a target MDD object with the data from a source MDD object, so the target MDD object first needs to be initialized in a collection. To initialize an MDD object it is sufficient to insert an MDD object of size 1 (a single point) into the collection:

insert into Coll values marray it in [0:0,0:0,...] values 0

Inserting some part of the data would work as well.
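Since the initialization query differs only in the number of dimensions and the cell value, it can be generated. The sketch below is a hypothetical helper (`init_query` is not a rasdaman function) that only builds the query string; sending it to the server is left out.

```python
def init_query(coll, dims, zero="0"):
    """Build a rasql insert statement that creates a single-point MDD with `dims` dimensions."""
    if dims < 1:
        raise ValueError("an MDD needs at least one dimension")
    domain = ",".join(["0:0"] * dims)
    return f"insert into {coll} values marray it in [{domain}] values {zero}"


if __name__ == "__main__":
    # Single-cell float array in a 3D collection.
    print(init_query("Coll", 3, "0f"))
```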

Partial updates

Once an MDD has been initialized in the collection, it can be updated with the individual parts using the rasql update statement.

The update statement has the following syntax:

update collName as collIterator
set updateSpec assign mddExpr
[ where booleanExpr ]

Each element of the set named collName which fulfils the selection predicate booleanExpr gets assigned the result of mddExpr. updateSpec can optionally contain a restricting interval for the area which can be updated, so either collIterator or collIterator[mintervalExpr] can be used.

mddExpr can be any expression that results in an MDD object M, like an marray construct, a format conversion function, etc. The position where M will be placed in the target MDD (collIterator) is determined by the spatial domain of M. When importing data in some format via the inv_* functions, by default the resulting MDD has a spatial domain of [0:width-1,0:height-1,...], which will place M at [0,0,...] in the target MDD. In order to place it in a different position, the spatial domain of M has to be explicitly set via the --mdddomain parameter of rasql, or by using the shift function in the query, for example:

rasql -q 'update Coll as c set c assign inv_tiff($1)' --file in.tif --mddtype MddType --mdddomain "[100:200,100:200]"

is equivalent to

rasql -q 'update Coll as c set c assign shift(inv_tiff($1),[100,100])' --file in.tif
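The equivalence of the two commands comes down to simple interval arithmetic. The following Python sketch mimics what shift does to a spatial domain, assuming a 101 x 101 pixel input whose decoded domain is [0:100,0:100]; `shift_domain` and `fmt` are illustrative helpers, not rasdaman functions.

```python
def shift_domain(domain, offset):
    """Translate every inclusive (lo, hi) interval of `domain` by the matching offset."""
    if len(domain) != len(offset):
        raise ValueError("domain and offset must have the same number of dimensions")
    return [(lo + o, hi + o) for (lo, hi), o in zip(domain, offset)]


def fmt(domain):
    """Format a list of (lo, hi) pairs in rasql interval notation."""
    return "[" + ",".join(f"{lo}:{hi}" for lo, hi in domain) + "]"


if __name__ == "__main__":
    decoded = [(0, 100), (0, 100)]  # assumed domain of the decoded TIFF
    print(fmt(shift_domain(decoded, [100, 100])))
```

For this assumed input the shifted domain matches the one passed to --mdddomain above.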

Dynamically expanding MDDs

The update statement allows MDDs to be expanded dynamically (up to the limits of the MDD type, if any have been specified), so it is not necessary to fully materialize an MDD.

When the MDD is first initialized with:

insert into Coll values marray it in [0:0,0:0,...] values 0

it has a spatial domain of [0:0,0:0,...] and only one point is materialized in the database. Updating this MDD later on further expands the spatial domain when the source array M doesn't intersect any already materialized parts of the target array T (to be changed with ticket #123). Three cases can be distinguished (in the figures below, M is the red rectangle and T is orange, with T being an MDD already materialized in the database):

  1. M is within T: in this case the corresponding area of T is replaced by M, and the spatial domain stays the same.

  2. M is completely outside of T: M is materialized and becomes part of the MDD object T, and the spatial domain (dashed line) expands accordingly.

Note that in this case, after the update T will have some non-materialized empty parts; a query that accesses only these parts fails with a "Specified domain does not intersect with spatial domain of MDD" error. This error is a bug, which will be fixed in ticket #122 so that appropriate null values (usually 0) are returned. Until the bug is fixed, the non-materialized empty parts have to be filled with 0 (or another null value) via an marray statement if queries are expected to access them, e.g.:

update Coll as c
set c assign marray it in [non-materialized bounding box] values 0

  3. M intersects T: here only the part of T that intersects M gets updated; the rest of M is not materialized and the spatial domain is not expanded. This will be fixed with ticket #123 to materialize the non-overlapping area as well, updating the spatial domain accordingly.

An update of this type on a 3D MDD, where slicing is used to specify the target domain, results in a "rasdaman error 202: Exception: Index violation ( index range [$low,$high], index $index )." error. This will be fixed in ticket #73.
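Which of the three cases applies can be checked before issuing an update. The following Python sketch classifies a source domain M against a target domain T; it is only a model of the case distinction above, with a hypothetical function name, and does not query rasdaman.

```python
def classify(m, t):
    """Classify source domain m against target domain t.

    Both are lists of inclusive (lo, hi) intervals, one per dimension.
    Returns "within", "outside" or "intersects".
    """
    if len(m) != len(t):
        raise ValueError("domains must have the same number of dimensions")
    pairs = list(zip(m, t))
    # M is within T if it is contained in T along every dimension.
    if all(t_lo <= m_lo and m_hi <= t_hi for (m_lo, m_hi), (t_lo, t_hi) in pairs):
        return "within"
    # M is outside T if the intervals are disjoint along at least one dimension.
    if any(m_hi < t_lo or t_hi < m_lo for (m_lo, m_hi), (t_lo, t_hi) in pairs):
        return "outside"
    return "intersects"


if __name__ == "__main__":
    T = [(0, 10), (0, 10)]
    print(classify([(2, 4), (2, 4)], T))
    print(classify([(20, 30), (0, 10)], T))
    print(classify([(5, 15), (5, 15)], T))
```

A part classified as "intersects" could, for instance, be cut along the border of T into pieces that each fall cleanly into one of the first two cases.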

Example: 3D timeseries

Create the 3D collection first (suppose it's for arrays of type float):

create collection Coll FloatSet3

Initialize an array with a single cell in the collection:

insert into Coll values marray it in [0:0,0:0,0:0] values 0f

Update array with data at the first time slice:

rasql -q 'update Coll as c set c[0,*:*,*:*] assign inv_tiff($1)' --file data_1.tif

Update array with data at the second time slice:

rasql -q 'update Coll as c set c[1,*:*,*:*] assign inv_tiff($1)' --file data_2.tif

And so on.
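For many time slices the commands can be generated rather than typed. The Python sketch below builds one rasql argument list per input file, assuming the files are given in time order and the slice index starts at 0; `slice_commands` is a hypothetical helper, and the commands are only printed, not executed.

```python
import shlex


def slice_commands(coll, files):
    """Return one rasql argument list per file, placing file i at time slice i."""
    commands = []
    for t, path in enumerate(files):
        query = f"update {coll} as c set c[{t},*:*,*:*] assign inv_tiff($1)"
        commands.append(["rasql", "-q", query, "--file", path])
    return commands


if __name__ == "__main__":
    for cmd in slice_commands("Coll", ["data_1.tif", "data_2.tif", "data_3.tif"]):
        # shlex.join quotes the query so the shell does not expand $1.
        print(shlex.join(cmd))
```

Each argument list could also be handed to subprocess.run directly, which bypasses the shell and therefore needs no quoting of $1.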

Example: 3D cube of multiple 3D arrays

In this case we build a 3D cube by concatenating multiple smaller 3D cubes along a certain dimension, in other words we build a 3D mosaic.

Create the 3D collection first (suppose it's for arrays of type float):

create collection Coll FloatSet3

Initialize an array with a single cell in the collection:

insert into Coll values marray it in [0:0,0:0,0:0] values 0f

Update array with the first cube, which itself has sdom [0:3,0:100,0:100]:

rasql -q 'update Coll as c set c[0:3,0:100,0:100] assign decode($1, "netcdf")' --file data_1.nc

After this Coll has sdom [0:3,0:100,0:100].

Update array with the second cube, which itself has sdom [0:5,0:100,0:100]; note that we now want to place this one on top of the first one with respect to the first dimension, so its origin must be shifted by 5, giving it an effective sdom of [5:10,0:100,0:100]:

rasql -q 'update Coll as c set c[5:10,0:100,0:100] assign shift(decode($1, "netcdf"), [5,0,0])' --file data_2.nc

The sdom of Coll is now [0:10,0:100,0:100].

Update array with the third cube, which itself has sdom [0:2,0:100,0:100]; note that we now want to place this one next to the first two with respect to the second dimension and 5 pixels higher, so that its effective sdom is [5:7,100:200,0:100]:

rasql -q 'update Coll as c set c[5:7,100:200,0:100] assign shift(decode($1, "netcdf"), [5,100,0])' --file data_3.nc

The sdom of Coll is now [0:10,0:200,0:100].
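The overall spatial domain after a sequence of updates can be cross-checked with a few lines of Python. This sketch assumes that every update expands the domain to the bounding box of all placed parts, which is the intended behaviour; the ticket #123 caveat for intersecting updates is not modelled. `hull` is a hypothetical helper.

```python
def hull(domains):
    """Return the bounding box of several domains given as lists of inclusive (lo, hi) pairs."""
    dims = len(domains[0])
    if any(len(d) != dims for d in domains):
        raise ValueError("all domains must have the same number of dimensions")
    return [
        (min(d[i][0] for d in domains), max(d[i][1] for d in domains))
        for i in range(dims)
    ]


if __name__ == "__main__":
    # Effective domains of the three cubes after shifting, as used in the update statements.
    placements = [
        [(0, 3), (0, 100), (0, 100)],
        [(5, 10), (0, 100), (0, 100)],
        [(5, 7), (100, 200), (0, 100)],
    ]
    box = hull(placements)
    print("[" + ",".join(f"{lo}:{hi}" for lo, hi in box) + "]")
```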

Last modified on Nov 4, 2016 8:17:42 AM
