Classes

TopoStats uses its own Pydantic Data Classes to store the original data, configuration settings and derived datasets. These are described in detail in the API. Using Pydantic means that data validation, checking that an attribute which is meant to be an integer is an integer, or that an attribute that is meant to be a Numpy array actually is, is done automatically. This makes code development much easier as the automated validation avoids errors that can arise when passing data of the wrong "type" into dataclasses that is inherent in dynamically typed languages such as Python.

This page aims to...

Give an overview of the classes and their attributes.
Details how to go about adding classes and all of the additional changes that are required.
How to load .topostats files and reconstruct the classes.

Class Overview

Class	Description
`TopoStats`	Top-level object, all other classes are attributes of this class.
`GrainCrop`	A region of an image that contains a "grain" (typically a molecule of some description).
`DisorderedTrace`	Outline of a grain, but without ordering of coordinates.
`Node`	A junction in a grain where one or more molecules overlap.
`OrderedTrace`	Outline of a grain, with coordinates ordered.
`MatchedBranch`
`UnmatchedBranch`
`Molecule`	A sub-unit of a grain after de-tangling overlapping molecules.

The attributes of each class are described below. The type of an attribute is noted which informs you what the value should be. In many cases the attributes can also be None which is for convenience so that intermediary objects can be constructed, for example when first creating a TopoStats object after reading it from disk there will be no grain_crops to include because the flattening and grain detection has not been performed.

`TopoStats`

Attribute	Type	Description
`grain_crops`	`dict[int, GrainCrop]`	A dictionary of `GrainCrop` objects detected within the image.
`filename`	`str`	The filename the image was loaded from.
`pixel_to_nm_scaling`	`str`	Pixel to nanometre scaling, typically derived from the image itself.
`img_path`	`str`	Original path to image.
`image`	`npt.NDArray`	Flattened image (post `Filter()`).
`image_original`	`npt.NDArray`	Original image.
`full_mask_tensor`	`npt.NDArray`	Tensor mask for the full image.
`topostats_version`	`str`	TopoStats version the image was last processed with.
`config`	`dict[str, Any]`	Configuration used when processing the grain.

`GrainCrop`

Attribute	Type	Description
`image`	`npt.NDArray[np.float32]`	2-D Numpy array of the cropped image.
`mask`	`npt.NDArray[np.bool_]`	3-D Numpy tensor of the cropped mask.
`padding`	`int`	Padding added to the bounding box of the grain during cropping.
`bbox`	`tuple[int, int, int, int]`	Bounding box of the crop including padding.
`pixel_to_nm_scaling`	`float`	Pixel to nanometre scaling factor for the crop.
`thresholds`	`float`	Thresholds used to find the grain.
`filename`	`str`	Filename of the image from which the crop was taken.
`skeleton`	`npt.NDArray[np.bool_]`	3-D Numpy tensor of the skeletonised mask.
`height_profiles`	`dict[int, [int, npt.NDArray[np.float32]]]`	3-D Numpy tensor of the height profiles.
`stats`	`dict[int, dict[int, Any]]`	Dictionary of grain statistics.
`disordered_trace`	`DisorderedTrace`	A disordered trace for the grain.
`nodes`	`dict[str, Nodes]`	Dictionary of grain nodes.
`ordered_trace`	`OrderedTrace`	An ordered trace for the grain.
`threshold_method`	`str`	Threshold method used to find grains.`

`DisorderedTrace`

Attribute	Type	Description
`images`	`dict[str: npt.NDArray]`	Dictionary of images generated during disordered tracing, should include ''pruned_skeleton''.
`grain_endpoints`	`int`	Number of Grain endpoints.
`grain_junctions`	`int`	Number of Grain junctions.
`total_branch_length`	`float`	Total branch length in nanometres.
`grain_width_mean`	`float`	Mean grain width in nanometres.

`Node`

Attribute	Type	Description
`error`	`bool`	Whether an error occurred calculating statistics for this node.
`pixel_to_nm_scaling`	`np.float64`	Pixel to nanometre scaling.
`branch_stats`	`dict[int, MatchedBranch]`	Dictionary of branch statistics.
`unmatched_branch_stats`	`dict[int, UnMatchedBranch]`	Dictionary of unmatched branch statistics.
`node_coords`	`npt.NDArray[np.int32]`	Numpy array of node coordinates.
`confidence`	`np.float64`	Confidence in ???.
`reduced_node_area`	`???`	Reduced node area.
`node_area_skeleton`	`npt.NDArray[np.int32]`	Numpy array of skeleton.
`node_branch_mask`	`npt.NDArray[np.int32]`	Numpy array of branch mask.
`node_avg_mask`	`npt.NDArray[np.int32]`	Numpy array of averaged mask.`

`OrderedTrace`

Attribute	Type	Description
`molecule_data`	`dict[int, Molecule]`	Dictionary of ordered trace data for individual molecules within the grain indexed by molecule number.
`tracing_stats`	`dict`	Tracing statistics.
`grain_molstats`	`Any`	Grain molecule statistics.
`molecules`	`int`	Number of molecules within the grain.
`writhe`	`str`	The writhe sign, can be either `+`, `-` or `0` for positive, negative or no writhe.
`pixel_to_nm_scaling`	`np.float64`	Pixel to nm scaling.
`images`	`dict[str, npt.NDArray]`	Dictionary of diagnostic images for debugging.
`error`	`bool`	Errors encountered?

`MatchedBranch`

Attribute	Type	Description
`ordered_coords`	`npt.NDArray[np.int32]`	Numpy array of ordered coordinates.
`heights`	`npt.NDArray[np.number]`	Numpy array of heights.
`distances`	`npt.NDArray[np.number]`	Numpy array of distances.
`fwhm`	`float`	Full-width half maximum.
`fwhm_half_maxs`	`list[float]`	Half-maximums from a matched branch.
`fwhm_peaks`	`list[float]`	Peaks from a matched branch.
`angles`	`float`	Angle between branches.`

`UnmatchedBranch`

Attribute	Type	Description
`angles`	`float`	Angle between branches.

`Molecule`

Class for Molecules identified during ordered tracing.

Attribute	Type	Description
`circular`	`str, bool`	Whether the molecule is circular or linear.
`topology`	`str`	Topological classification of the molecule.
`topology_flip`	`Any`	Unknown?
`ordered_coords`	`npt.NDArray`	Ordered coordinates for the molecule.
`splined_coords`	`npt.NDArray`	Smoothed (aka splined) coordinates for the molecule.
`contour_length`	`float`	Length of the molecule.
`end_to_end_distance`	`float`	Distance between ends of molecule. Will be `0.0` for circular molecules which don't have ends.
`heights`	`npt.NDArray`	Height along molecule.
`distances`	`npt.NDArray`	Distance between points on the molecule.
`curvature_stats`	`npt.NDArray, optional`	Angle changes along molecule. NB - These are always positive due to use of `np.abs()` during calculation.
`bbox`	`tuple[int, int, int, int]`	Bounding box.

Hierarchical Structure

The top-level object is always the TopoStats class, the remaining objects are nested within. The key attribute being TopoStats.grain_crops which holds a dictionary of GrainCrops, each of which in turn holds various attributes, often dictionaries, of other attributes such as DisorderedTrace, Nodes, OrderedTrace and in turn Molecule nested within.

The structure of an image with a single grain and all attributes is shown below.

output/processed/minicircle.topostats
├config
│ └ ...
├filename
├full_image_plots
| └
│ ├all_molecules
│ ├branch_indexes
│ ├branch_types
│ ├connected_nodes
│ ├convolved_skeletons
│ ├node_centres
│ ├ordered_traces
│ ├over_under
│ ├pruned_skeleton
│ ├skeleton
│ ├smoothed_mask
│ └trace_segments
├full_mask_tensor
├grain_crops
│ ├0
│ │ ├bbox
│ │ ├convolved_skeleton
│ │ ├disordered_trace
│ │ │ ├grain_endpoints
│ │ │ ├grain_junctions
│ │ │ ├grain_width_mean
│ │ │ ├images
│ │ │ │ ├branch_indexes
│ │ │ │ ├branch_types
│ │ │ │ ├grain
│ │ │ │ ├image
│ │ │ │ ├pruned_skeleton
│ │ │ │ ├skeleton
│ │ │ │ └smoothed_mask
│ │ │ ├stats
│ │ │ │ ├0
│ │ │ │ │ ├basename
│ │ │ │ │ ├branch_distance
│ │ │ │ │ ├branch_type
│ │ │ │ │ ├connected_segments
│ │ │ │ │ ├image
│ │ │ │ │ ├mean_pixel_value
│ │ │ │ │ ├median_value
│ │ │ │ │ ├middle_value
│ │ │ │ │ ├min_value
│ │ │ │ │ └stdev_pixel_value
│ │ │ │ ├1
│ │ │ │ │ ├basename
│ │ │ │ │ ├branch_distance
│ │ │ │ │ ├branch_type
│ │ │ │ │ ├connected_segments
│ │ │ │ │ ├image
│ │ │ │ │ ├mean_pixel_value
│ │ │ │ │ ├median_value
│ │ │ │ │ ├middle_value
│ │ │ │ │ ├min_value
│ │ │ │ │ └stdev_pixel_value
│ │ │ │ └2
│ │ │ │   ├basename
│ │ │ │   ├branch_distance
│ │ │ │   ├branch_type
│ │ │ │   ├connected_segments
│ │ │ │   ├image
│ │ │ │   ├mean_pixel_value
│ │ │ │   ├median_value
│ │ │ │   ├middle_value
│ │ │ │   ├min_value
│ │ │ │   └stdev_pixel_value
│ │ │ └total_branch_length
│ │ ├filename
│ │ ├height_profiles
│ │ │ └1
│ │ │   └0
│ │ ├image
│ │ ├mask
│ │ ├nodes
│ │ │ └0
│ │ │   ├branch_stats
│ │ │   │ ├0
│ │ │   │ │ ├angles
│ │ │   │ │ ├branch_statistics
│ │ │   │ │ │ ├basename
│ │ │   │ │ │ ├fwhm
│ │ │   │ │ │ ├fwhm_half_maxs
│ │ │   │ │ │ ├fwhm_peaks
│ │ │   │ │ │ └image
│ │ │   │ │ ├distances
│ │ │   │ │ ├fwhm
│ │ │   │ │ ├fwhm_half_maxs
│ │ │   │ │ ├fwhm_peaks
│ │ │   │ │ ├heights
│ │ │   │ │ └ordered_coords
│ │ │   │ └1
│ │ │   │   ├angles
│ │ │   │   ├branch_statistics
│ │ │   │   │ ├basename
│ │ │   │   │ ├fwhm
│ │ │   │   │ ├fwhm_half_maxs
│ │ │   │   │ ├fwhm_peaks
│ │ │   │   │ └image
│ │ │   │   ├distances
│ │ │   │   ├fwhm
│ │ │   │   ├fwhm_half_maxs
│ │ │   │   ├fwhm_peaks
│ │ │   │   ├heights
│ │ │   │   └ordered_coords
│ │ │   ├confidence
│ │ │   ├error
│ │ │   ├node_area_skeleton
│ │ │   ├node_avg_mask
│ │ │   ├node_branch_mask
│ │ │   ├node_coords
│ │ │   ├pixel_to_nm_scaling
│ │ │   ├unmatched_branch_stats
│ │ │   │ ├0
│ │ │   │ │ └angles
│ │ │   │ ├1
│ │ │   │ │ └angles
│ │ │   │ ├2
│ │ │   │ │ └angles
│ │ │   │ └3
│ │ │   │   └angles
│ │ │   └writhe
│ │ ├ordered_trace
│ │ │ ├images
│ │ │ │ ├all_molecules
│ │ │ │ ├ordered_traces
│ │ │ │ ├over_under
│ │ │ │ └trace_segments
│ │ │ ├molecule_data
│ │ │ │ └0
│ │ │ │   ├bbox
│ │ │ │   ├circular
│ │ │ │   ├contour_length
│ │ │ │   ├curvature_stats
│ │ │ │   ├distances
│ │ │ │   ├end_to_end_distance
│ │ │ │   ├heights
│ │ │ │   ├molecule_statistics
│ │ │ │   │ ├circular
│ │ │ │   │ ├contour_length
│ │ │ │   │ ├end_to_end_distance
│ │ │ │   │ ├topology
│ │ │ │   │ └topology_flip
│ │ │ │   ├ordered_coords
│ │ │ │   ├splined_coords
│ │ │ │   ├topology
│ │ │ │   └topology_flip
│ │ │ └molecule_statistics
│ │ │   └0
│ │ │     ├circular
│ │ │     ├contour_length
│ │ │     ├end_to_end_distance
│ │ │     ├topology
│ │ │     └topology_flip
│ │ ├padding
│ │ ├pixel_to_nm_scaling
│ │ ├skeleton
│ │ ├stats
│ │ │ └1
│ │ │   └0
│ │ │     ├area
│ │ │     ├area_cartesian_bbox
│ │ │     ├aspect_ratio
│ │ │     ├centre_x
│ │ │     ├centre_y
│ │ │     ├height_max
│ │ │     ├height_mean
│ │ │     ├height_median
│ │ │     ├height_min
│ │ │     ├max_feret
│ │ │     ├mean_crossing_confidence
│ │ │     ├min_crossing_confidence
│ │ │     ├min_feret
│ │ │     ├num_crossings
│ │ │     ├num_mols
│ │ │     ├radius_max
│ │ │     ├radius_mean
│ │ │     ├radius_median
│ │ │     ├radius_min
│ │ │     ├smallest_bounding_area
│ │ │     ├smallest_bounding_length
│ │ │     ├smallest_bounding_width
│ │ │     ├volume
│ │ │     └writhe_string
│ │ ├threshold_method
│ │ └thresholds
│ │   ├above
│ │   └below
│ └ ... <other GrainCrop>
├image
├image_original
├image_statistics
│ ├grains
│ ├grains_per_m2
│ ├image
│ ├image_area_m2
│ ├image_area_px2
│ ├image_size_x_m
│ ├image_size_x_px
│ ├image_size_y_m
│ ├image_size_y_px
│ └rms_roughness
├img_path
├pixel_to_nm_scaling
└topostats_version

This nesting structure is retained when writing to .topostats a custom HDF5 format. The Python tool h5glance can be used to show the nested structure.

The top level of nesting reflects the TopoStats object itself.

h5glance output/processed/minicircle_small.topostats --depth 1

With a --depth 2 we can see the second level of nesting.

Individual items can be viewed with h5glance by specifying the path through the nesting structure as an argument at the command line.

h5glance output/processed/minicircle_small.topostats filename
h5glance output/processed/minicircle_small.topostats grain_crop/0/

Extending

Typically the unit of analysis is a GrainCrop so let's assume we are adding a new attribute GrainCrop.new_feature to the GrainCrop class and it is an object of class type NewFeature. We define NewFeature in the classes.py module.

import numpy.typing as npt
from pydantic import ConfigDict
from pydantic.dataclasses import dataclass


@dataclass(
    repr=True,
    eq=True,
    config=ConfigDict(arbitrary_types_allowed=True, validate_assignment=True),
    validate_on_init=True,
)
class NewFeature:
    """
    A class that adds a new feature.
    """

    left: bool | None
    right: bool | None
    data: dict[int, npt.NDArray | None]

This is to be an attribute of GrainCrop so we add a new attribute to the __init___ definition of GrainCrop in classes.py (note that GrainCrop is an exception as it is not a Pydantic dataclass). Don't forget to include a description of the attribute to both the class docstring and the __init__ docstring otherwise the Numpydoc validation will fail on commits and/or pull requests.

class GrainCrop:
    def __init__(self, new_feature: NewFeature | None = None):
        """
        Parameters
        ----------
        new_feature : NewFeature
            A ``NewFeature``.
        """

    self.new_feature = new_feature

`str` method

The __str__ method provides a convenient method to represent the class when called. If you would like some of the attributes you have defined shown then you should add them here. An example is shown below.

def __str__(self) -> str:
    """
    Readable attributes.

    Returns
    -------
    str
        Set of formatted statistics on matched branches.
    """
    return (
        f"left  : {self.left}\n" f"right : {self.right}\n" f"data  :\n\n {self.data}\n"
    )

`eq` method

Adding the equality dunder method means the object can easily be compared to another of the same type and tested for equality.

def __eq__(self, other: object) -> bool:
    """
    Check equality of ``NewFeature`` object against another.

    Parameters
    ----------
    other : object
        Other object to be compared.

    Returns
    -------
    bool
        ``True`` if the objects are equal, ``False`` otherwise.
    """
    if not isinstance(other, NewFeature):
        return False
    return (
        self.left == other.left
        and self.right == other.right
        and self.data == other.data
    )

IO

Output

TopoStats classes are saved to HDF5 .topostats files and in order to include any newly defined class in these files for subsequent processing the io.dict_to_hdf5() function needs to have the class adding to its list of supported objects and a clause that recurrsively calls io.dict_to_hdf5() with the class converted to a dictionary as the item argument.

if isinstance(
    item,
    (
        list
        | str
        | int
        | float
        | np.ndarray
        | Path
        | dict
        | GrainCrop
        | GrainCropsDirection
        | ImageGrainCrops
        | Node
        | OrderedTrace
        | DisorderedTrace
        | MatchedBranch
        | Molecule
        | NewFeature
    ),
):  # noqa: UP038
    # Lists need to be converted to numpy arrays
    if isinstance(item, list):
        LOGGER.debug(f"[dict_to_hdf5] {key} is of type : {type(item)}")
        item = np.array(item)
        open_hdf5_file[group_path + key] = item
    elif isinstance(item, NewFeature):
        logger.debug(f"[dict_to_hdf5] {key} is of type : {type(item)}")
        dict_to_hdf5(
            open_hdf5_file, group_path + key + "/", item.fancy_new_class_to_dict()
        )

Input

In order to work with the modular design of TopoStats we need to be able to import .topostats files which are in HDF5 and convert them to TopoStats classes with all of the nested features. HDF5 files are read by AFMReader which returns plain dictionaries. These are converted to TopoStats using the io.dict_to_topostats() function.

Where you add the new class depends on what existing Class it is an attribute of. In this example it is an attribute of GrainCrop and so it should be added within the loop that iterates over the crops dictionary that has been read. Not all attributes will be present so we use an if ... else None single line construct to unpack the components of the nested dictionary to the attribute only if the key is present.

for grain, crop in dictionary["image_grain_crops"][direction]["crops"].items():
    image = crop["image"] if "image" in crop.keys() else None
    ...
    new_feature = (
        NewFeature(**crop["new_feature"]) if "new_feature" in crop.keys() else None
    )

Classes

Class Overview

TopoStats

GrainCrop

DisorderedTrace

Node

OrderedTrace

MatchedBranch

UnmatchedBranch

Molecule