API overview

The entire organization of pyRSKtools revolves around the RSK class. In fact, most (if not all) ancillary classes you will find later in this API reference, like those documented in the Channels and Datatypes sections, are used by the RSK class internally. This section serves to provide insight into the inner workings of the RSK class.

RSK attributes

The RSK class has several instance attributes that we have split into three logical groups. Logically grouping attributes allows us to explain how those within a group relate to one another and how each group differs in utility. Below, we summarize each grouping, if you are looking for information about attributes within each group, please refer to the RSK class documentation.

Internal state:

Internal state attributes hold metadata relating to the RSK class itself. For example, RSK.version holds the current pyRSKtools version, while RSK.logs is used to log information about the actions/methods conducted/invoked during the lifetime of an RSK class instance.

Informational:

Informational attributes hold metadata relating to an RSK file. They are populated with (meta)data read in from the opened RSK file you are dealing with. For example, RSK.dbInfo and RSK.instrument are populated when you invoke RSK.open(), they respectively contain information about the RSK database (e.g., version and type) and the instrument (e.g., serialID and model) the RSK file pertains to.

It is worth noting that, although these attributes contain a large amount of information about an RSK file, they primarily will be used internally by methods already provided by the RSK class. Despite this, we keep them accessible to the curious user or for any potential advanced/custom development.

Computational:

Computational attributes hold the sample/channel data contained within an RSK. For example, the RSK.data and RSK.channelNames computational attributes are populated by the RSK.readdata() method, they respectively contain data of the RSK file and the channel names used to index into said data.

Importantly, computational fields such as RSK.data are exposed as NumPY structured arrays. Below we provide a brief overview of a few key NumPY concepts to help pyRSKtools users get started, we recommend checking out the official NumPY reference documentation for more information.

A brief NumPY review

The two key NumPY concepts pyRSKtools users should get familiar with are structured arrays and datetime64 objects.

Structured arrays:

The NumPY structured array is a convenient datatype that allows users to efficiently store heterogeneous compound/composite data in a way that can be easily accessed/indexed via named labels.

To manually create a structured array, users must specify a properly formed dtype argument when creating a standard NumPY array type.

import numpy as np

data = np.array(
    [
        (1660571192060, 42.784, 22.93, 9.96),
        (1660571192065, 42.785, 22.92, 9.95),
    ],
    dtype=[
        ("timestamp", "datetime64[ms]"),
        ("conductivity", "float64"),
        ("temperature", "float64"),
        ("pressure", "float64"),
    ],
)

The above example creates a structured array with four labeled columns and two rows of data. The values along a given column may now be accessed by their respective labels, as shown below:

timestamps = data["timestamp"] # = ['2022-08-15T13:46:32.060', '2022-08-15T13:46:32.065']
c = data["conductivity"]       # = [42.784, 42.785]
t = data["temperature"]        # = [22.93, 22.92]
d = data["pressure"]           # = [9.96, 9.95]

Important: indexing a structured array by number will yield the entire row (starting from index 0), not a column. To access a specific value of a row from a given column, simply specify the row and column name. See the examples below:

data[0]                                 # = ('2022-08-15T13:46:32.060', 42.784, 22.93, 9.96)
data[0]['conductivity']                 # = 42.784
data["conductivity"][0]                 # = 42.784 (equivalent to above)
data[0][["conductivity", "pressure"]]   # = (42.784, 9.96)
data.dtype.names                        # = ('timestamp', 'conductivity', 'temperature', 'pressure')

Given that RSK data consists of multiple samples, each of which havs a fixed number of channels, structured arrays become and convenient way to store data in pyRSKtools. If you were to refer back to our getting started guide, you may find it more apparent that a structured array underpins RSK.data and RSK.channelNames simply returns all the channel (dtype) names of RSK.data (excluding the “timestamp” column).

Datetime64 objects:

In the code examples above, you may have noticed that the “timestamp” field was given the type datetime64[ms]; a NumPY datetime64 object. The NumPY datetime64 is used throughout pyRSKtools for representation, conversion, and processing of any date/time related fields, including the timestamp of each sample in RSK.data.

Examples of manually creating datetime64 objects are given below:

# Using the standard ISO 8601 format (precision of seconds in this example)
dt = np.datetime64("2022-08-15T11:18:34")
# Convert to a 64-bit unsigned integer
seconds = dt.astype(np.uint64)
# Using milliseconds
np.datetime64(1660562314000, "ms")