Reference

class multitables.Streamer(filename, **kw_args)

Provides methods for streaming data out of HDF5 files.

class Queue(request_pool, stop, block_size)

Abstract queue that is backed by the internal circular buffer.

close()

Signals to the background processes to stop, and closes the queue.

get()

Get the next element from the queue of data. This method returns a guard object that synchronises access to the underlying buffer. The guard, when placed in a with statement, returns a reference to the next available element in the buffer. This method blocks until data is available.

Returns: A guard object that returns a reference to the element.

iter()

Convenience method for easy iteration over elements in the queue. Each iteration of the iterator will block until an element is available to be read.

Returns: An iterator for the queue.

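
A minimal sketch of consuming a queue through iter() and the guard objects it yields. The helper name count_streamed_rows is hypothetical, and the sketch assumes that each block supports len(), as a numpy array does.

```python
def count_streamed_rows(queue):
    """Count the rows delivered by a queue, one block at a time."""
    total = 0
    # iter() yields one guard per block; each step blocks until data is ready.
    for guard in queue.iter():
        # Entering the guard gives a reference to the next block in the buffer.
        with guard as block:
            # Only touch the block inside the with statement, because the
            # guard synchronises access to the underlying buffer.
            total += len(block)
    return total
```

Here queue would be the object returned by get_queue. Calling queue.close() afterwards signals the background processes to stop.
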
get_generator(path, n_procs=None, read_ahead=None, cyclic=False, block_size=None, ordered=False, field=None, remainder=True)

Get a generator that allows convenient access to the streamed data. Elements from the dataset are returned from the generator one row at a time. Unlike the direct access queue, this generator also returns the remainder elements. Additional arguments are forwarded to get_queue. See the get_queue method for documentation of these parameters.

Parameters: path – The HDF5 path to the dataset that should be read.
Returns: A generator that iterates over the rows in the dataset.

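
As a rough illustration of row-at-a-time access, the sketch below averages one column using the generator. The helper name column_mean is hypothetical, and the sketch assumes that the selected field holds scalar numeric values, so each yielded row can be converted with float().

```python
def column_mean(streamer, path, field):
    """Average a single numeric column, reading one row at a time."""
    total = 0.0
    count = 0
    # field is forwarded to get_queue, so only that column is read.
    for value in streamer.get_generator(path, field=field):
        total += float(value)
        count += 1
    # The generator includes the remainder elements by default,
    # so count covers every row in the dataset.
    return total / count if count else float("nan")
```

Here streamer would be a multitables.Streamer instance opened on the HDF5 file.
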
get_queue(path, n_procs=None, read_ahead=None, cyclic=False, block_size=None, ordered=False, field=None, remainder=False)

Get a queue that allows direct access to the internal buffer. If the dataset to be read is chunked, the block_size should be a multiple of the chunk size to maximise performance. In this case it is best to leave it at the default. When cyclic=False and block_size does not divide the dataset evenly, the remainder elements will not be returned by the queue. When cyclic=True, the remainder elements will be part of a block that wraps around the end and includes elements from the beginning of the dataset. By default, blocks are returned in the order in which they become available. The ordered option will force blocks to be returned in on-disk order.

Parameters:
  • path – The HDF5 path to the dataset that should be read.
  • n_procs – The number of background processes used to read the dataset in parallel.
  • read_ahead – The number of blocks to allocate in the internal buffer.
  • cyclic – True if the queue should wrap at the end of the dataset.
  • block_size – The size along the outer dimension of the blocks to be read. Defaults to a multiple of the chunk size, or to a 128KB sized block if the dataset is not chunked.
  • ordered – Force the reader to return data in on-disk order. May result in a performance penalty.
  • field – The field or column name to read. If omitted, all fields/columns are read.
  • remainder – Also return the remainder elements. These will be returned as an array smaller than the block size.
Returns:

A queue object that allows access to the internal buffer.
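
The block and remainder behaviour described above can be modelled with plain index arithmetic. This is only a sketch of the documented semantics, not the library's implementation: the function names are hypothetical, and it assumes that blocks cover consecutive rows along the outer dimension.

```python
def block_layout(n_rows, block_size):
    """Return (number of full blocks, number of remainder rows)."""
    return divmod(n_rows, block_size)


def cyclic_block_rows(n_rows, block_size, block_index):
    """Row indices covered by one block when the queue wraps around."""
    start = block_index * block_size
    return [(start + offset) % n_rows for offset in range(block_size)]


# With cyclic=False, 10 rows and block_size=4 give two full blocks and
# two remainder rows that the queue does not return.
print(block_layout(n_rows=10, block_size=4))   # (2, 2)

# With cyclic=True, the third block wraps: it holds the two remainder
# rows followed by the first two rows of the dataset.
print(cyclic_block_rows(10, 4, 2))             # [8, 9, 0, 1]
```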

get_remainder(path, block_size)

Get the remainder elements. These elements are not read when using direct queue access with cyclic=False.

Parameters:
  • path – The HDF5 path to the dataset to be read.
  • block_size – The block size is used to calculate which elements will remain.
Returns:

A copy of the remainder elements as a numpy array.
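
One way to visit every row with direct queue access is to drain the queue and then handle the remainder separately. The sketch below uses a hypothetical helper for_each_chunk. It assumes that iteration ends once a non-cyclic queue is exhausted, and that the same block_size is passed to both calls so the remainder is computed consistently.

```python
def for_each_chunk(streamer, path, block_size, handle):
    """Call handle() on every full block, then on the remainder rows."""
    queue = streamer.get_queue(path, block_size=block_size)
    for guard in queue.iter():
        with guard as block:
            # The block is a reference into the shared buffer, so handle()
            # should not keep it beyond this call.
            handle(block)
    # Rows left over when block_size does not divide the dataset evenly.
    leftover = streamer.get_remainder(path, block_size)
    if len(leftover):
        # The remainder is a copy, so it may be kept freely.
        handle(leftover)
```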

class multitables.Reader(filename, n_procs=4, notify=None, **kw_args)

Provides methods for random access of HDF5 datasets.

close(wait=False)

Close the reader. After this point, no more requests can be made. Pending requests will still be fulfilled. Any attempt to make additional requests will raise an exception. Once all requests have been fulfilled, the background processes and threads will be shut down.

Parameters: wait – If True, block until all background threads/processes have shut down. False by default.

get_dataset(path)

Create a dataset proxy that can be used to create requests.

Parameters: path – The internal HDF5 path to the dataset within the HDF5 file.
Returns: A dataset proxy object.

request(key, stage)

Generate and queue a request. The details of the request should be provided in the key argument, through operations on one of the dataset proxy objects generated by get_dataset. The result of the request will be stored in the provided stage. A request object will be returned, which can be used to wait on the result and access the result when it is ready.

Parameters:
  • key – Operations created by a dataset proxy.
  • stage – A stage or stage pool in which the result will be stored.
Returns: A request object.
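
The request flow can be sketched as a small helper. The name read_rows is hypothetical. The stage is assumed to have been created elsewhere and to be large enough for the requested rows, since stage creation is not covered in this reference.

```python
def read_rows(reader, path, stage, start, stop):
    """Fetch rows [start, stop) from a dataset and return a private copy."""
    # The dataset proxy records operations rather than reading data itself.
    dataset = reader.get_dataset(path)
    key = dataset.read(start=start, stop=stop)
    # Queue the request; the result is written into the given stage.
    request = reader.request(key, stage)
    # get() copies the result out, so it stays valid after the stage is reused.
    return request.get()
```

Here reader would be a multitables.Reader instance. Calling reader.close(wait=True) once all requests are done blocks until the background workers have shut down.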

stop()

Stop the reader. All background processes and threads will immediately shut down. This will invalidate all pending requests. Attempts to access pending requests, or requests that are already being waited on, will raise an exception stating that the reader has stopped.

class multitables.RequestPool

A helper class for managing a pool of requests.

add(req)

Add a request to the pool.

Parameters: req – An object instance that should be placed in the pool.

next()

Get the next object in the pool. Blocks until an object is available.

Returns: The next object in the pool.
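
A possible pattern for the pool is to add several requests and then collect one result per request. The helper name collect_results is hypothetical. The sketch assumes that next() hands back the request objects that were added, and it makes no assumption about the order in which they come back.

```python
def collect_results(pool, requests):
    """Add requests to a pool, then gather a copy of each result."""
    pending = 0
    for req in requests:
        pool.add(req)
        pending += 1
    results = []
    for _ in range(pending):
        # next() blocks until an object in the pool is available.
        ready = pool.next()
        # get() returns a copy that is safe to keep.
        results.append(ready.get())
    return results
```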

exception multitables.QueueClosedException

exception multitables.SubprocessException

Base class for forwarding exceptions that happen inside a subprocess.

exception multitables.SharedMemoryError

class multitables.dataset.TableDataset(reader, path, dtype, shape)

Proxy for dataset operations on pytables Tables.

col(name)

Proxy a column retrieval operation. The interface for this method is equivalent to the pytables method of the same name.

read(start=None, stop=None, step=None, field=None)

Proxy a read operation. The interface for this method is equivalent to the pytables method of the same name.

read_coordinates(coords, field=None)

Proxy a coordinate read operation. The interface for this method is equivalent to the pytables method of the same name.

read_sorted(sortby, checkCSI=False, field=None, start=None, stop=None, step=None)

Proxy a sorted read operation. The interface and requirements for this method are equivalent to the pytables method of the same name.

where(condition, condvars=None, start=None, stop=None, step=None)

Proxy a conditional selection operation. The interface for this method is equivalent to the pytables method of the same name.

class multitables.dataset.ArrayDataset(reader, path, dtype, shape)

read(start=None, stop=None, step=None)

Proxy a read operation. The interface for this method is equivalent to the pytables method of the same name.

class multitables.dataset.VLArrayDataset(reader, path, dtype, shape)

read(start=None, stop=None, step=None)

Proxy a read operation. The interface for this method is equivalent to the pytables method of the same name.

class multitables.request.Request(details, stage)

Public interface for managing requests.

get()

A safe method for accessing the result of the request. This method makes a copy of the result and returns it. This copy can be used in any fashion, as it is no longer subject to resource constraints.

Returns: A copy of the result of the request.

get_direct(action)

A safer method for directly accessing the shared memory. This method blocks until the request is fulfilled. Once ready, it calls the provided action function with a direct reference to the shared memory as an argument. Care should be taken that this direct reference does not leave the scope of the function, or else the problems enumerated in the get_unsafe context manager may result.

Parameters: action – A function that takes one argument, which will be supplied as a direct reference to the shared memory.

get_proxy()

A safe context manager for indirectly accessing the shared memory. This manager waits until the request is fulfilled. Once ready, it yields a proxy to the underlying shared memory. Once the context manager expires, the proxy will be released, and access to the shared memory is no longer possible. Any attempt to access the shared memory past this point raises an exception.

get_unsafe()

A context manager for accessing the result of the request directly. This manager waits until the request is fulfilled. Once ready, it yields a direct reference to the underlying shared memory. If an exception was raised when fielding this request, the exception is re-raised here. Use of this context manager can be unsafe, as it yields a direct reference to the shared memory. If this reference is not properly managed, it can lead to a dangling pointer that causes an exception when the associated stage is closed. The contents of this dangling pointer will also change when the associated stage is re-used for another request. It is recommended to use a safer access method, or immediately delete or set to None the local variable bound to the yielded reference after use.
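
The two direct-access styles can be compared in a short sketch. Both helper names are hypothetical, and the sketch assumes the result is a one-dimensional numeric array. In each case only a plain Python float leaves the function, so no reference to the shared memory escapes.

```python
def checksum_direct(request):
    """Sum the result via get_direct without leaking the shared reference."""
    result = {}

    def action(shared):
        # Reduce to a plain float inside the action; shared itself is
        # never stored anywhere.
        result["total"] = float(sum(shared))

    request.get_direct(action)
    return result["total"]


def checksum_unsafe(request):
    """Sum the result via get_unsafe, dropping the reference straight away."""
    with request.get_unsafe() as shared:
        total = float(sum(shared))
        # Clear the local name so no dangling reference outlives the stage.
        shared = None
    return total
```

When a copy is affordable, get() avoids these concerns entirely.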