h5flow.data
- h5flow.data.lib.dereference(sel, ref, data=None, region=None, mask=None, ref_direction=(0, 1), indices_only=False, as_masked=True)
Load data referred to by ref that corresponds to the desired positions specified in sel.
- Parameters
sel – iterable of indices, an index, or a slice to match against ref[:,ref_direction[0]]. The return value will have the same first dimension as sel, e.g. dereference(slice(100), ref, data).shape[0] == 100
ref – a shape (N,2) h5py.Dataset or array of pairs of indices linking sel and data
data – a h5py.Dataset or array to load dereferenced data from; can be omitted if indices_only==True
region – a 1D h5py.Dataset or array with a structured array type of [('start','i8'), ('stop','i8')]; 'start' defines the earliest index within the ref dataset for each value in sel, and 'stop' defines the last index + 1 within the ref dataset (optional). If a h5py.Dataset is used, the sel spec will be used to load data from the dataset (i.e. region[sel]); otherwise len(sel) == len(region) and a 1:1 correspondence is assumed
mask – mask off specific items in selection (boolean, True == don’t dereference selection), len(mask) == len(np.r_[sel])
ref_direction – defines how to interpret the second dimension of ref. ref[:,ref_direction[0]] are matched against items in sel, and ref[:,ref_direction[1]] are indices into the data array (default=(0,1)). So for a simple example: dereference([0,1,2], [[1,0], [2,1]], ['A','B','C','D'], ref_direction=(0,1)) returns an array equivalent to [[],['A'],['B']] and dereference([0,1,2], [[1,0], [2,1]], ['A','B','C','D'], ref_direction=(1,0)) returns an array equivalent to [['B'],['C'],[]]
indices_only – if True, only returns the indices into data, does not fetch data from data
- Returns
numpy masked array (or if as_masked=False a list) of length equivalent to sel
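To make the matching rule concrete, here is a minimal pure-Python sketch of the semantics described for sel, ref, and ref_direction. It is not h5flow's implementation: the helper name naive_dereference is hypothetical, it ignores region and mask, and it returns plain nested lists rather than a masked array.

```python
def naive_dereference(sel, ref, data, ref_direction=(0, 1)):
    """Toy model of the matching rule; hypothetical helper, not part of h5flow."""
    match_col, index_col = ref_direction
    result = []
    for s in sel:
        # Collect every data item whose reference pair points back at `s`.
        result.append(
            [data[pair[index_col]] for pair in ref if pair[match_col] == s]
        )
    return result


ref = [[1, 0], [2, 1]]
data = ['A', 'B', 'C', 'D']

forward = naive_dereference([0, 1, 2], ref, data, ref_direction=(0, 1))
backward = naive_dereference([0, 1, 2], ref, data, ref_direction=(1, 0))

print(forward)   # [[], ['A'], ['B']]
print(backward)  # [['B'], ['C'], []]
```

The sketch scans every reference pair for every selected index, which is the cost a region lookup table is meant to avoid.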
- h5flow.data.lib.dereference_chain(sel, refs, data=None, regions=None, mask=None, ref_directions=None, indices_only=False)
Load a “chain” of references. Allows traversal of multiple layers of references, e.g. for three datasets A, B, and C linked A->B->C. One can use a selection in A and load the C data associated with it.
Example usage:
# f is an open h5py.File
sel = slice(0,100)
refs = [f['A/ref/B/ref'], f['C/ref/B/ref']]
ref_dirs = [(0,1), (1,0)]
data = f['C/data']
regions = [f['A/ref/B/ref_region'], f['B/ref/C/ref_region']]
mask = np.r_[sel] > 50
c_data = dereference_chain(sel, refs, data, regions=regions, mask=mask, ref_directions=ref_dirs)
c_data.shape # (100, max_a2b_assoc, max_b2c_assoc)
- Parameters
sel – iterable of indices, a slice, or an integer, see sel argument in dereference
refs – a list of reference datasets to load, in order, see ref argument in dereference
data – a dataset to load dereferenced data from, optional if indices_only=True
regions – lookup table into refs for each selection, see region argument in dereference
mask – a boolean mask into the first selection, True will not load the entry
ref_directions – interpretation of reference datasets, see ref_direction argument in dereference
indices_only – flag to skip loading the data and instead just return indices into the final dataset
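The traversal can be pictured with a small pure-Python sketch that follows two layers of reference pairs, A->B and then B->C. The helpers follow and naive_chain are hypothetical and are not h5flow code; they ignore regions and mask and return ragged nested lists, whereas the documented return value has a padded shape such as (100, max_a2b_assoc, max_b2c_assoc).

```python
def follow(sel, ref, ref_direction=(0, 1)):
    """Return, for each index in sel, the list of indices it refers to."""
    match_col, index_col = ref_direction
    return [[pair[index_col] for pair in ref if pair[match_col] == s] for s in sel]


def naive_chain(sel, refs, data, ref_directions):
    """Toy multi-layer traversal; hypothetical helper, not part of h5flow."""
    def expand(index, depth):
        if depth == len(refs):
            return data[index]  # reached the final dataset
        children = follow([index], refs[depth], ref_directions[depth])[0]
        return [expand(child, depth + 1) for child in children]

    return [expand(s, 0) for s in sel]


a_to_b = [[0, 0], [0, 1], [1, 2]]  # (A index, B index) pairs
c_to_b = [[0, 0], [1, 2], [2, 2]]  # (C index, B index) pairs, stored from C's side
c_data = ['c0', 'c1', 'c2']

# The second layer is stored as C->B, so it is read in the (1, 0) direction.
out = naive_chain([0, 1], [a_to_b, c_to_b], c_data, [(0, 1), (1, 0)])
print(out)  # [[['c0'], []], [['c1', 'c2']]]
```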
- class h5flow.data.h5flow_data_manager.H5FlowDataManager(filepath, mode='a', mpi=True, drop_list=None)
Coordinates access to the output data file across multiple processes.
To initialize:
hfdm = H5FlowDataManager(<path to file>, mode=<'r'/'a'>, mpi=<True/False>)
Opening and closing the underlying resource is handled automatically when using the dedicated file access API, e.g.:
hfdm.dset_exists(...)
hfdm.create_dset(...)
hfdm.get_ref(...)
hfdm.reserve_data(...)
hfdm.write_ref(...)
hfdm[...]
...
- attr_exists(name, key)
Check if attribute key exists for name
- Parameters
name – str path to object, e.g. stage0/obj0 or stage0
key – str attribute name
- Returns
True if attribute exists
- create_dset(dataset_name, dtype, shape=())
Create a 1D dataset of dataset_name with datatype dtype, if it doesn’t already exist
- Parameters
dataset_name – str path to dataset, e.g. stage0/obj0
dtype – np.dtype of dataset, can be a structured dtype
- create_ref(parent_dataset_name, child_dataset_name)
Create a 1D dataset of references of parent_dataset_name -> child_dataset_name, if it doesn’t already exist. Both datasets must already exist.
- Parameters
parent_dataset_name – str path to parent dataset, e.g. stage0/obj0
child_dataset_name – str path to child dataset, e.g. stage0/obj1
- delete(name)
Delete the object at name and any references to it. Ignored if path is in temp file.
- Parameters
name – str path to dataset to be deleted
- dset_exists(dataset_name)
Check if data object of dataset_name exists
- Parameters
dataset_name – str path to dataset, e.g. stage0/obj0
- Returns
True if data object exists
- exists(path)
Check if a path exists
- Parameters
path – str path to check
- Returns
True if path is present
- property fh
Direct access to the underlying h5py File object. Not recommended for use. Instead, use get_dset(...), write_data(...), or the implemented __getitem__().
- get_attrs(name)
Get attributes of name
- Parameters
name – str path to object, e.g. stage0
- Returns
h5py.AttributeManager
- get_dset(dataset_name)
Get dataset of dataset_name
- Parameters
dataset_name – str path to dataset, e.g. stage0/obj0
- Returns
h5py.Dataset, e.g. stage0/obj0/data
- get_ref(parent_dataset_name, child_dataset_name)
Get references of parent_dataset_name -> child_dataset_name
- Parameters
parent_dataset_name – str path to parent dataset, e.g. stage0/obj0
child_dataset_name – str path to child dataset, e.g. stage0/obj1
- Returns
tuple of h5py.Dataset, reference direction; e.g. (stage0/obj0/ref/stage0/obj1/ref, (0,1))
- get_ref_region(parent_dataset_name, child_dataset_name)
Get reference lookup regions for parent_dataset_name -> child_dataset_name
- Parameters
parent_dataset_name – str path to parent dataset, e.g. stage0/obj0
child_dataset_name – str path to child dataset, e.g. stage0/obj1
- Returns
h5py.Dataset, stage0/obj0/ref/stage0/obj1/ref_region, (0,1)
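A lookup region bounds the rows of the reference dataset that can mention a given parent index, so a lookup only has to scan ref[start:stop]. The sketch below shows one way such a table could be derived from a list of (parent, child) pairs. It is an illustration only: build_regions is a hypothetical helper, the use of (0, 0) for parents without references is an assumption, and how h5flow populates ref_region internally is not described here.

```python
def build_regions(ref, n_parent, match_col=0):
    """Toy lookup table: (start, stop) rows of `ref` bounding each parent's references."""
    regions = [(0, 0)] * n_parent  # assumption: (0, 0) marks "no references"
    for row, pair in enumerate(ref):
        parent = pair[match_col]
        start, stop = regions[parent]
        if start == stop:
            # First reference seen for this parent.
            regions[parent] = (row, row + 1)
        else:
            # Widen the window: 'stop' is the last matching row + 1.
            regions[parent] = (min(start, row), max(stop, row + 1))
    return regions


ref = [[1, 0], [1, 3], [2, 1]]
regions = build_regions(ref, n_parent=3)
print(regions)  # [(0, 0), (0, 2), (2, 3)]

# Only rows inside the window need to be compared against parent 1.
start, stop = regions[1]
children_of_1 = [pair[1] for pair in ref[start:stop] if pair[0] == 1]
print(children_of_1)  # [0, 3]
```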
- ref_exists(parent_dataset_name, child_dataset_name)
Check if references for parent_dataset_name -> child_dataset_name exist
- Parameters
parent_dataset_name – str path to parent dataset, e.g. stage0/obj0
child_dataset_name – str path to child dataset, e.g. stage0/obj1
- Returns
True if references exist
- ref_region_exists(parent_dataset_name, child_dataset_name)
Check if reference table for parent_dataset_name -> child_dataset_name exists
- Parameters
parent_dataset_name – str path to parent dataset, e.g. stage0/obj0
child_dataset_name – str path to child dataset, e.g. stage0/obj1
- Returns
True if reference table exists
- reserve_data(dataset_name, spec)
Coordinate access into dataset_name. Depending on the type of spec a different access mode will be performed:
int: access in append mode - will grant access to spec rows at the end of the dataset
slice or list of int or list of slice: access a specific section(s) of the dataset - will resize dataset if section does not exist
- Parameters
dataset_name – str path to dataset, e.g. stage0/obj0
spec – see function description
- Returns
slice into dataset_name where access is given
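The two access modes can be modelled with a small in-memory stand-in. ToyDataset below is hypothetical and is not h5flow code: it tracks only a length, handles only int and slice specs (not lists), and does none of the cross-process coordination that the data manager provides.

```python
class ToyDataset:
    """In-memory stand-in for a resizable 1D dataset; hypothetical, not h5flow code."""

    def __init__(self):
        self.length = 0

    def reserve(self, spec):
        if isinstance(spec, int):
            # Append mode: hand out `spec` new rows at the end.
            sl = slice(self.length, self.length + spec)
            self.length = sl.stop
            return sl
        if isinstance(spec, slice):
            # Section mode: grow the dataset if the section lies past the end.
            self.length = max(self.length, spec.stop)
            return spec
        raise TypeError("this sketch only models int and slice specs")


dset = ToyDataset()
first = dset.reserve(10)             # slice(0, 10)
second = dset.reserve(5)             # slice(10, 15)
third = dset.reserve(slice(20, 30))  # dataset grows to 30 rows
print(first, second, third, dset.length)
```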
- set_attrs(name, **attrs)
Update attributes of name. Attribute key: value pairs are passed in as additional keyword arguments
- Parameters
name – str path to object, e.g. stage0
- write_data(dataset_name, spec, data)
Write data into dataset_name at spec
- Parameters
dataset_name – str path to dataset, e.g. stage0/obj0
spec – slice into dataset_name to write data
data – numpy array or iterable to write
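A common pattern is to combine create_dset, reserve_data, and write_data to append rows. The sketch below uses only the signatures listed in this reference and takes the manager as an argument. The helper append_rows and the RecordingManager stand-in are both hypothetical; the stand-in exists only so the example runs without h5flow or a file on disk, and it does not reproduce the real class's behavior beyond what is needed here.

```python
def append_rows(manager, dataset_name, dtype, rows):
    """Append `rows` to `dataset_name` using the three documented calls."""
    manager.create_dset(dataset_name, dtype)            # does nothing if it already exists
    sl = manager.reserve_data(dataset_name, len(rows))  # int spec selects append mode
    manager.write_data(dataset_name, sl, rows)
    return sl


class RecordingManager:
    """Hypothetical in-memory stand-in; mirrors the documented method signatures only."""

    def __init__(self):
        self.store = {}

    def create_dset(self, dataset_name, dtype, shape=()):
        self.store.setdefault(dataset_name, [])

    def reserve_data(self, dataset_name, spec):
        rows = self.store[dataset_name]
        sl = slice(len(rows), len(rows) + spec)
        rows.extend([None] * spec)  # placeholder rows until they are written
        return sl

    def write_data(self, dataset_name, spec, data):
        self.store[dataset_name][spec] = list(data)


manager = RecordingManager()
first = append_rows(manager, 'stage0/obj0', 'i8', [1, 2, 3])
second = append_rows(manager, 'stage0/obj0', 'i8', [4, 5])
print(manager.store['stage0/obj0'])  # [1, 2, 3, 4, 5]
```

With a real H5FlowDataManager instance, dtype would be an np.dtype and rows a numpy array or iterable, as described in the parameter lists.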
- write_ref(parent_dataset_name, child_dataset_name, refs)
Add refs for parent_dataset_name -> child_dataset_name. Note that references are never updated and can’t be removed after they are created.
- Parameters
refs – an integer array of shape (N,2) with refs[:,0] corresponding to the index in the parent dataset and refs[:,1] corresponding to the index in the child dataset
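The (N,2) layout of refs is a flat list of [parent index, child index] rows, one row per association. As a small illustration, the hypothetical helper below flattens a per-parent mapping into that layout using plain Python lists; it is not part of h5flow.

```python
def pairs_from_children(children_by_parent):
    """Flatten {parent index: [child indices]} into (N, 2) [parent, child] rows."""
    return [
        [parent, child]
        for parent, children in sorted(children_by_parent.items())
        for child in children
    ]


# Parent 0 is linked to children 4 and 5; parent 2 is linked to child 6.
refs = pairs_from_children({0: [4, 5], 2: [6]})
print(refs)  # [[0, 4], [0, 5], [2, 6]]
```

Before passing such a list to write_ref, it would presumably need converting to an integer array, for example with np.asarray(refs, dtype=int).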