Classes

TopoStats uses its own Pydantic data classes to store the original data, configuration settings and derived datasets. These are described in detail in the API. Using Pydantic means that data validation is done automatically, for example checking that an attribute which is meant to be an integer is an integer, or that an attribute which is meant to be a Numpy array actually is one. This makes code development much easier because the automated validation avoids the errors that can arise when data of the wrong type is passed into a class, a risk that is inherent in dynamically typed languages such as Python.

This page aims to...

  • Give an overview of the classes and their attributes.
  • Detail how to add classes and the additional changes that are required.
  • Show how to load .topostats files and reconstruct the classes.

Class Overview

| Class | Description |
| --- | --- |
| TopoStats | Top-level object, all other classes are attributes of this class. |
| GrainCrop | A region of an image that contains a "grain" (typically a molecule of some description). |
| DisorderedTrace | Outline of a grain, but without ordering of coordinates. |
| Node | A junction in a grain where one or more molecules overlap. |
| OrderedTrace | Outline of a grain, with coordinates ordered. |
| MatchedBranch | |
| UnmatchedBranch | |
| Molecule | A sub-unit of a grain after de-tangling overlapping molecules. |

The attributes of each class are described below. The type noted for each attribute tells you what kind of value it should hold. In many cases an attribute can also be None, which is a convenience that allows intermediary objects to be constructed. For example, when a TopoStats object is first created after reading an image from disk there are no grain_crops to include because flattening and grain detection have not yet been performed.

TopoStats

| Attribute | Type | Description |
| --- | --- | --- |
| grain_crops | dict[int, GrainCrop] | A dictionary of GrainCrop objects detected within the image. |
| filename | str | The filename the image was loaded from. |
| pixel_to_nm_scaling | float | Pixel to nanometre scaling, typically derived from the image itself. |
| img_path | str | Original path to image. |
| image | npt.NDArray | Flattened image (post Filter()). |
| image_original | npt.NDArray | Original image. |
| full_mask_tensor | npt.NDArray | Tensor mask for the full image. |
| topostats_version | str | TopoStats version the image was last processed with. |
| config | dict[str, Any] | Configuration used when processing the grain. |

GrainCrop

| Attribute | Type | Description |
| --- | --- | --- |
| image | npt.NDArray[np.float32] | 2-D Numpy array of the cropped image. |
| mask | npt.NDArray[np.bool_] | 3-D Numpy tensor of the cropped mask. |
| padding | int | Padding added to the bounding box of the grain during cropping. |
| bbox | tuple[int, int, int, int] | Bounding box of the crop including padding. |
| pixel_to_nm_scaling | float | Pixel to nanometre scaling factor for the crop. |
| thresholds | float | Thresholds used to find the grain. |
| filename | str | Filename of the image from which the crop was taken. |
| skeleton | npt.NDArray[np.bool_] | 3-D Numpy tensor of the skeletonised mask. |
| height_profiles | dict[int, dict[int, npt.NDArray[np.float32]]] | Nested dictionary of Numpy arrays holding the height profiles. |
| stats | dict[int, dict[int, Any]] | Dictionary of grain statistics. |
| disordered_trace | DisorderedTrace | A disordered trace for the grain. |
| nodes | dict[str, Node] | Dictionary of grain nodes. |
| ordered_trace | OrderedTrace | An ordered trace for the grain. |
| threshold_method | str | Threshold method used to find grains. |

DisorderedTrace

| Attribute | Type | Description |
| --- | --- | --- |
| images | dict[str, npt.NDArray] | Dictionary of images generated during disordered tracing, should include "pruned_skeleton". |
| grain_endpoints | int | Number of grain endpoints. |
| grain_junctions | int | Number of grain junctions. |
| total_branch_length | float | Total branch length in nanometres. |
| grain_width_mean | float | Mean grain width in nanometres. |

Node

| Attribute | Type | Description |
| --- | --- | --- |
| error | bool | Whether an error occurred calculating statistics for this node. |
| pixel_to_nm_scaling | np.float64 | Pixel to nanometre scaling. |
| branch_stats | dict[int, MatchedBranch] | Dictionary of branch statistics. |
| unmatched_branch_stats | dict[int, UnmatchedBranch] | Dictionary of unmatched branch statistics. |
| node_coords | npt.NDArray[np.int32] | Numpy array of node coordinates. |
| confidence | np.float64 | Confidence in ???. |
| reduced_node_area | ??? | Reduced node area. |
| node_area_skeleton | npt.NDArray[np.int32] | Numpy array of skeleton. |
| node_branch_mask | npt.NDArray[np.int32] | Numpy array of branch mask. |
| node_avg_mask | npt.NDArray[np.int32] | Numpy array of averaged mask. |

OrderedTrace

| Attribute | Type | Description |
| --- | --- | --- |
| molecule_data | dict[int, Molecule] | Dictionary of ordered trace data for individual molecules within the grain indexed by molecule number. |
| tracing_stats | dict | Tracing statistics. |
| grain_molstats | Any | Grain molecule statistics. |
| molecules | int | Number of molecules within the grain. |
| writhe | str | The writhe sign, can be either +, - or 0 for positive, negative or no writhe. |
| pixel_to_nm_scaling | np.float64 | Pixel to nm scaling. |
| images | dict[str, npt.NDArray] | Dictionary of diagnostic images for debugging. |
| error | bool | Errors encountered? |

MatchedBranch

| Attribute | Type | Description |
| --- | --- | --- |
| ordered_coords | npt.NDArray[np.int32] | Numpy array of ordered coordinates. |
| heights | npt.NDArray[np.number] | Numpy array of heights. |
| distances | npt.NDArray[np.number] | Numpy array of distances. |
| fwhm | float | Full-width half maximum. |
| fwhm_half_maxs | list[float] | Half-maximums from a matched branch. |
| fwhm_peaks | list[float] | Peaks from a matched branch. |
| angles | float | Angle between branches. |

UnmatchedBranch

| Attribute | Type | Description |
| --- | --- | --- |
| angles | float | Angle between branches. |

Molecule

Class for Molecules identified during ordered tracing.

| Attribute | Type | Description |
| --- | --- | --- |
| circular | str, bool | Whether the molecule is circular or linear. |
| topology | str | Topological classification of the molecule. |
| topology_flip | Any | Unknown? |
| ordered_coords | npt.NDArray | Ordered coordinates for the molecule. |
| splined_coords | npt.NDArray | Smoothed (aka splined) coordinates for the molecule. |
| contour_length | float | Length of the molecule. |
| end_to_end_distance | float | Distance between ends of molecule. Will be 0.0 for circular molecules which don't have ends. |
| heights | npt.NDArray | Height along molecule. |
| distances | npt.NDArray | Distance between points on the molecule. |
| curvature_stats | npt.NDArray, optional | Angle changes along molecule. NB - These are always positive due to use of np.abs() during calculation. |
| bbox | tuple[int, int, int, int] | Bounding box. |

Hierarchical Structure

The top-level object is always the TopoStats class and the remaining objects are nested within it. The key attribute is TopoStats.grain_crops, which holds a dictionary of GrainCrop objects. Each of these in turn holds various attributes, often dictionaries, of other classes such as DisorderedTrace, Node and OrderedTrace, with Molecule nested within OrderedTrace.

The structure of an image with a single grain and all attributes is shown below.

output/processed/minicircle.topostats
├config
  ...
├filename
├full_image_plots
 ├all_molecules
 ├branch_indexes
 ├branch_types
 ├connected_nodes
 ├convolved_skeletons
 ├node_centres
 ├ordered_traces
 ├over_under
 ├pruned_skeleton
 ├skeleton
 ├smoothed_mask
 └trace_segments
├full_mask_tensor
├grain_crops
 ├0
  ├bbox
  ├convolved_skeleton
  ├disordered_trace
   ├grain_endpoints
   ├grain_junctions
   ├grain_width_mean
   ├images
    ├branch_indexes
    ├branch_types
    ├grain
    ├image
    ├pruned_skeleton
    ├skeleton
    └smoothed_mask
   ├stats
    ├0
     ├basename
     ├branch_distance
     ├branch_type
     ├connected_segments
     ├image
     ├mean_pixel_value
     ├median_value
     ├middle_value
     ├min_value
     └stdev_pixel_value
    ├1
     ├basename
     ├branch_distance
     ├branch_type
     ├connected_segments
     ├image
     ├mean_pixel_value
     ├median_value
     ├middle_value
     ├min_value
     └stdev_pixel_value
    └2
      ├basename
      ├branch_distance
      ├branch_type
      ├connected_segments
      ├image
      ├mean_pixel_value
      ├median_value
      ├middle_value
      ├min_value
      └stdev_pixel_value
   └total_branch_length
  ├filename
  ├height_profiles
   └1
     └0
  ├image
  ├mask
  ├nodes
   └0
     ├branch_stats
      ├0
       ├angles
       ├branch_statistics
        ├basename
        ├fwhm
        ├fwhm_half_maxs
        ├fwhm_peaks
        └image
       ├distances
       ├fwhm
       ├fwhm_half_maxs
       ├fwhm_peaks
       ├heights
       └ordered_coords
      └1
        ├angles
        ├branch_statistics
         ├basename
         ├fwhm
         ├fwhm_half_maxs
         ├fwhm_peaks
         └image
        ├distances
        ├fwhm
        ├fwhm_half_maxs
        ├fwhm_peaks
        ├heights
        └ordered_coords
     ├confidence
     ├error
     ├node_area_skeleton
     ├node_avg_mask
     ├node_branch_mask
     ├node_coords
     ├pixel_to_nm_scaling
     ├unmatched_branch_stats
      ├0
       └angles
      ├1
       └angles
      ├2
       └angles
      └3
        └angles
     └writhe
  ├ordered_trace
   ├images
    ├all_molecules
    ├ordered_traces
    ├over_under
    └trace_segments
   ├molecule_data
    └0
      ├bbox
      ├circular
      ├contour_length
      ├curvature_stats
      ├distances
      ├end_to_end_distance
      ├heights
      ├molecule_statistics
       ├circular
       ├contour_length
       ├end_to_end_distance
       ├topology
       └topology_flip
      ├ordered_coords
      ├splined_coords
      ├topology
      └topology_flip
   └molecule_statistics
     └0
       ├circular
       ├contour_length
       ├end_to_end_distance
       ├topology
       └topology_flip
  ├padding
  ├pixel_to_nm_scaling
  ├skeleton
  ├stats
   └1
     └0
       ├area
       ├area_cartesian_bbox
       ├aspect_ratio
       ├centre_x
       ├centre_y
       ├height_max
       ├height_mean
       ├height_median
       ├height_min
       ├max_feret
       ├mean_crossing_confidence
       ├min_crossing_confidence
       ├min_feret
       ├num_crossings
       ├num_mols
       ├radius_max
       ├radius_mean
       ├radius_median
       ├radius_min
       ├smallest_bounding_area
       ├smallest_bounding_length
       ├smallest_bounding_width
       ├volume
       └writhe_string
  ├threshold_method
  └thresholds
    ├above
    └below
  ... <other GrainCrop>
├image
├image_original
├image_statistics
 ├grains
 ├grains_per_m2
 ├image
 ├image_area_m2
 ├image_area_px2
 ├image_size_x_m
 ├image_size_x_px
 ├image_size_y_m
 ├image_size_y_px
 └rms_roughness
├img_path
├pixel_to_nm_scaling
└topostats_version
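
A layout like the one above can be produced from any nested dictionary. The sketch below uses only the standard library; render_tree() is a hypothetical helper and the small nested dictionary is made-up illustration data, neither is part of TopoStats.

```python
def render_tree(node: dict, indent: int = 0) -> list[str]:
    """Return one line per key, indenting one space per level of nesting."""
    lines = []
    keys = list(node)
    for position, key in enumerate(keys):
        # The last key at each level is drawn with a closing connector.
        connector = "└" if position == len(keys) - 1 else "├"
        lines.append(" " * indent + f"{connector}{key}")
        # Recurse into nested dictionaries; other values are leaves.
        if isinstance(node[key], dict):
            lines.extend(render_tree(node[key], indent + 1))
    return lines


# Made-up data mimicking a small part of the structure shown above.
nested = {
    "filename": "minicircle",
    "grain_crops": {0: {"bbox": (0, 0, 10, 10), "padding": 1}},
    "pixel_to_nm_scaling": 0.5,
}
tree = render_tree(nested)
print("\n".join(tree))
```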

This nesting structure is retained when writing to .topostats, a custom HDF5 format. The Python tool h5glance can be used to show the nested structure.

The top level of nesting reflects the TopoStats object itself.

h5glance output/processed/minicircle_small.topostats --depth 1

With --depth 2 we can see the second level of nesting.

Individual items can be viewed with h5glance by specifying the path through the nesting structure as an argument at the command line.

h5glance output/processed/minicircle_small.topostats filename
h5glance output/processed/minicircle_small.topostats grain_crops/0/

Extending

Typically the unit of analysis is a GrainCrop so let's assume we are adding a new attribute GrainCrop.new_feature to the GrainCrop class and it is an object of class type NewFeature. We define NewFeature in the classes.py module.

import numpy.typing as npt
from pydantic import ConfigDict
from pydantic.dataclasses import dataclass


@dataclass(
    repr=True,
    eq=True,
    config=ConfigDict(arbitrary_types_allowed=True, validate_assignment=True),
    validate_on_init=True,
)
class NewFeature:
    """
    A class that adds a new feature.
    """

    left: bool | None
    right: bool | None
    data: dict[int, npt.NDArray | None]

This is to be an attribute of GrainCrop so we add a new attribute to the __init__ definition of GrainCrop in classes.py (note that GrainCrop is an exception as it is not a Pydantic dataclass). Don't forget to include a description of the attribute in both the class docstring and the __init__ docstring, otherwise the Numpydoc validation will fail on commits and/or pull requests.

class GrainCrop:
    def __init__(self, new_feature: NewFeature | None = None):
        """
        Parameters
        ----------
        new_feature : NewFeature | None
            A ``NewFeature``.
        """
        # Store the new attribute on the instance (this must be inside __init__)
        self.new_feature = new_feature

__str__ method

The __str__ method provides a convenient, human-readable representation of the class, used for example when an object is printed. If you would like some of the attributes you have defined to be shown then you should add them here. An example is shown below.

def __str__(self) -> str:
    """
    Readable attributes.

    Returns
    -------
    str
        Formatted string of the ``NewFeature`` attributes.
    """
    return (
        f"left  : {self.left}\n" f"right : {self.right}\n" f"data  :\n\n {self.data}\n"
    )

__eq__ method

Adding the equality dunder method means the object can easily be compared to another of the same type and tested for equality.

def __eq__(self, other: object) -> bool:
    """
    Check equality of ``NewFeature`` object against another.

    Parameters
    ----------
    other : object
        Other object to be compared.

    Returns
    -------
    bool
        ``True`` if the objects are equal, ``False`` otherwise.
    """
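    if not isinstance(other, NewFeature):
        return False
    # ``data`` holds Numpy arrays, so a plain ``==`` on the dictionaries can raise
    # ``ValueError``; compare the keys first, then the values with ``np.array_equal()``
    # (this assumes ``import numpy as np`` at the top of ``classes.py``).
    return (
        self.left == other.left
        and self.right == other.right
        and self.data.keys() == other.data.keys()
        and all(np.array_equal(self.data[key], other.data[key]) for key in self.data)
    )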
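
One caveat when an attribute such as data holds Numpy arrays: comparing two dictionaries of distinct multi-element arrays with a plain == raises a ValueError, because the truth value of an element-wise array comparison is ambiguous. The sketch below illustrates this and one way to compare such dictionaries. data_equal() and values_equal() are hypothetical helpers, not TopoStats functions, and the arrays are made-up values.

```python
import numpy as np


def values_equal(left, right) -> bool:
    """Compare two values that are each either None or a Numpy array."""
    if left is None or right is None:
        return left is None and right is None
    return bool(np.array_equal(left, right))


def data_equal(left: dict, right: dict) -> bool:
    """Compare two dictionaries whose values are Numpy arrays or None."""
    if left.keys() != right.keys():
        return False
    return all(values_equal(left[key], right[key]) for key in left)


a = {0: np.array([1.0, 2.0]), 1: None}
b = {0: np.array([1.0, 2.0]), 1: None}
c = {0: np.array([1.0, 3.0]), 1: None}

# A plain dictionary comparison fails for distinct multi-element arrays.
try:
    a == b
    plain_comparison_failed = False
except ValueError:
    plain_comparison_failed = True

print(plain_comparison_failed, data_equal(a, b), data_equal(a, c))
```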

IO

Output

TopoStats classes are saved to HDF5 .topostats files. To include a newly defined class in these files for subsequent processing, the io.dict_to_hdf5() function needs to have the class added to its list of supported objects, along with a clause that recursively calls io.dict_to_hdf5() with the class converted to a dictionary as the item argument.

if isinstance(
    item,
    (
        list
        | str
        | int
        | float
        | np.ndarray
        | Path
        | dict
        | GrainCrop
        | GrainCropsDirection
        | ImageGrainCrops
        | Node
        | OrderedTrace
        | DisorderedTrace
        | MatchedBranch
        | Molecule
        | NewFeature
    ),
):  # noqa: UP038
    # Lists need to be converted to numpy arrays
    if isinstance(item, list):
        LOGGER.debug(f"[dict_to_hdf5] {key} is of type : {type(item)}")
        item = np.array(item)
        open_hdf5_file[group_path + key] = item
    elif isinstance(item, NewFeature):
        LOGGER.debug(f"[dict_to_hdf5] {key} is of type : {type(item)}")
        dict_to_hdf5(
            open_hdf5_file, group_path + key + "/", item.fancy_new_class_to_dict()
        )

Input

In order to work with the modular design of TopoStats we need to be able to import .topostats files which are in HDF5 and convert them to TopoStats classes with all of the nested features. HDF5 files are read by AFMReader which returns plain dictionaries. These are converted to TopoStats using the io.dict_to_topostats() function.

Where you add the new class depends on which existing class it is an attribute of. In this example it is an attribute of GrainCrop, so it should be added within the loop that iterates over the crops dictionary that has been read. Not all attributes will be present, so we use a single-line if ... else None construct to unpack the components of the nested dictionary to the attribute only if the key is present.

for grain, crop in dictionary["image_grain_crops"][direction]["crops"].items():
    image = crop["image"] if "image" in crop.keys() else None
    ...
    new_feature = (
        NewFeature(**crop["new_feature"]) if "new_feature" in crop.keys() else None
    )