Classes
TopoStats uses its own Pydantic Data Classes to store the original data, configuration settings and derived datasets. These are described in detail in the API. Using Pydantic means that data validation, checking that an attribute which is meant to be an integer is an integer, or that an attribute that is meant to be a Numpy array actually is, is done automatically. This makes code development much easier as the automated validation avoids errors that can arise when passing data of the wrong "type" into dataclasses that is inherent in dynamically typed languages such as Python.
This page aims to...
- Give an overview of the classes and their attributes.
- Details how to go about adding classes and all of the additional changes that are required.
- How to load
.topostatsfiles and reconstruct the classes.
Class Overview
| Class | Description |
|---|---|
TopoStats |
Top-level object, all other classes are attributes of this class. |
GrainCrop |
A region of an image that contains a "grain" (typically a molecule of some description). |
DisorderedTrace |
Outline of a grain, but without ordering of coordinates. |
Node |
A junction in a grain where one or more molecules overlap. |
OrderedTrace |
Outline of a grain, with coordinates ordered. |
MatchedBranch |
|
UnmatchedBranch |
|
Molecule |
A sub-unit of a grain after de-tangling overlapping molecules. |
The attributes of each class are described below. The type of an attribute is noted which informs you what the value
should be. In many cases the attributes can also be None which is for convenience so that intermediary objects can be
constructed, for example when first creating a TopoStats object after reading it from disk there will be no
grain_crops to include because the flattening and grain detection has not been performed.
TopoStats
| Attribute | Type | Description |
|---|---|---|
grain_crops |
dict[int, GrainCrop] |
A dictionary of GrainCrop objects detected within the image. |
filename |
str |
The filename the image was loaded from. |
pixel_to_nm_scaling |
str |
Pixel to nanometre scaling, typically derived from the image itself. |
img_path |
str |
Original path to image. |
image |
npt.NDArray |
Flattened image (post Filter()). |
image_original |
npt.NDArray |
Original image. |
full_mask_tensor |
npt.NDArray |
Tensor mask for the full image. |
topostats_version |
str |
TopoStats version the image was last processed with. |
config |
dict[str, Any] |
Configuration used when processing the grain. |
GrainCrop
| Attribute | Type | Description |
|---|---|---|
image |
npt.NDArray[np.float32] |
2-D Numpy array of the cropped image. |
mask |
npt.NDArray[np.bool_] |
3-D Numpy tensor of the cropped mask. |
padding |
int |
Padding added to the bounding box of the grain during cropping. |
bbox |
tuple[int, int, int, int] |
Bounding box of the crop including padding. |
pixel_to_nm_scaling |
float |
Pixel to nanometre scaling factor for the crop. |
thresholds |
float |
Thresholds used to find the grain. |
filename |
str |
Filename of the image from which the crop was taken. |
skeleton |
npt.NDArray[np.bool_] |
3-D Numpy tensor of the skeletonised mask. |
height_profiles |
dict[int, [int, npt.NDArray[np.float32]]] |
3-D Numpy tensor of the height profiles. |
stats |
dict[int, dict[int, Any]] |
Dictionary of grain statistics. |
disordered_trace |
DisorderedTrace |
A disordered trace for the grain. |
nodes |
dict[str, Nodes] |
Dictionary of grain nodes. |
ordered_trace |
OrderedTrace |
An ordered trace for the grain. |
threshold_method |
str |
Threshold method used to find grains.` |
DisorderedTrace
| Attribute | Type | Description |
|---|---|---|
images |
dict[str: npt.NDArray] |
Dictionary of images generated during disordered tracing, should include ''pruned_skeleton''. |
grain_endpoints |
int |
Number of Grain endpoints. |
grain_junctions |
int |
Number of Grain junctions. |
total_branch_length |
float |
Total branch length in nanometres. |
grain_width_mean |
float |
Mean grain width in nanometres. |
Node
| Attribute | Type | Description |
|---|---|---|
error |
bool |
Whether an error occurred calculating statistics for this node. |
pixel_to_nm_scaling |
np.float64 |
Pixel to nanometre scaling. |
branch_stats |
dict[int, MatchedBranch] |
Dictionary of branch statistics. |
unmatched_branch_stats |
dict[int, UnMatchedBranch] |
Dictionary of unmatched branch statistics. |
node_coords |
npt.NDArray[np.int32] |
Numpy array of node coordinates. |
confidence |
np.float64 |
Confidence in ???. |
reduced_node_area |
??? |
Reduced node area. |
node_area_skeleton |
npt.NDArray[np.int32] |
Numpy array of skeleton. |
node_branch_mask |
npt.NDArray[np.int32] |
Numpy array of branch mask. |
node_avg_mask |
npt.NDArray[np.int32] |
Numpy array of averaged mask.` |
OrderedTrace
| Attribute | Type | Description |
|---|---|---|
molecule_data |
dict[int, Molecule] |
Dictionary of ordered trace data for individual molecules within the grain indexed by molecule number. |
tracing_stats |
dict |
Tracing statistics. |
grain_molstats |
Any |
Grain molecule statistics. |
molecules |
int |
Number of molecules within the grain. |
writhe |
str |
The writhe sign, can be either +, - or 0 for positive, negative or no writhe. |
pixel_to_nm_scaling |
np.float64 |
Pixel to nm scaling. |
images |
dict[str, npt.NDArray] |
Dictionary of diagnostic images for debugging. |
error |
bool |
Errors encountered? |
MatchedBranch
| Attribute | Type | Description |
|---|---|---|
ordered_coords |
npt.NDArray[np.int32] |
Numpy array of ordered coordinates. |
heights |
npt.NDArray[np.number] |
Numpy array of heights. |
distances |
npt.NDArray[np.number] |
Numpy array of distances. |
fwhm |
float |
Full-width half maximum. |
fwhm_half_maxs |
list[float] |
Half-maximums from a matched branch. |
fwhm_peaks |
list[float] |
Peaks from a matched branch. |
angles |
float |
Angle between branches.` |
UnmatchedBranch
| Attribute | Type | Description |
|---|---|---|
angles |
float |
Angle between branches. |
Molecule
Class for Molecules identified during ordered tracing.
| Attribute | Type | Description |
|---|---|---|
circular |
str, bool |
Whether the molecule is circular or linear. |
topology |
str |
Topological classification of the molecule. |
topology_flip |
Any |
Unknown? |
ordered_coords |
npt.NDArray |
Ordered coordinates for the molecule. |
splined_coords |
npt.NDArray |
Smoothed (aka splined) coordinates for the molecule. |
contour_length |
float |
Length of the molecule. |
end_to_end_distance |
float |
Distance between ends of molecule. Will be 0.0 for circular molecules which don't have ends. |
heights |
npt.NDArray |
Height along molecule. |
distances |
npt.NDArray |
Distance between points on the molecule. |
curvature_stats |
npt.NDArray, optional |
Angle changes along molecule. NB - These are always positive due to use of np.abs() during calculation. |
bbox |
tuple[int, int, int, int] |
Bounding box. |
Hierarchical Structure
The top-level object is always the TopoStats class, the remaining objects are nested within. The key attribute being
TopoStats.grain_crops which holds a dictionary of GrainCrops, each of which in turn holds various attributes, often
dictionaries, of other attributes such as DisorderedTrace, Nodes, OrderedTrace and in turn Molecule nested
within.
The structure of an image with a single grain and all attributes is shown below.
output/processed/minicircle.topostats
├config
│ └ ...
├filename
├full_image_plots
| └
│ ├all_molecules
│ ├branch_indexes
│ ├branch_types
│ ├connected_nodes
│ ├convolved_skeletons
│ ├node_centres
│ ├ordered_traces
│ ├over_under
│ ├pruned_skeleton
│ ├skeleton
│ ├smoothed_mask
│ └trace_segments
├full_mask_tensor
├grain_crops
│ ├0
│ │ ├bbox
│ │ ├convolved_skeleton
│ │ ├disordered_trace
│ │ │ ├grain_endpoints
│ │ │ ├grain_junctions
│ │ │ ├grain_width_mean
│ │ │ ├images
│ │ │ │ ├branch_indexes
│ │ │ │ ├branch_types
│ │ │ │ ├grain
│ │ │ │ ├image
│ │ │ │ ├pruned_skeleton
│ │ │ │ ├skeleton
│ │ │ │ └smoothed_mask
│ │ │ ├stats
│ │ │ │ ├0
│ │ │ │ │ ├basename
│ │ │ │ │ ├branch_distance
│ │ │ │ │ ├branch_type
│ │ │ │ │ ├connected_segments
│ │ │ │ │ ├image
│ │ │ │ │ ├mean_pixel_value
│ │ │ │ │ ├median_value
│ │ │ │ │ ├middle_value
│ │ │ │ │ ├min_value
│ │ │ │ │ └stdev_pixel_value
│ │ │ │ ├1
│ │ │ │ │ ├basename
│ │ │ │ │ ├branch_distance
│ │ │ │ │ ├branch_type
│ │ │ │ │ ├connected_segments
│ │ │ │ │ ├image
│ │ │ │ │ ├mean_pixel_value
│ │ │ │ │ ├median_value
│ │ │ │ │ ├middle_value
│ │ │ │ │ ├min_value
│ │ │ │ │ └stdev_pixel_value
│ │ │ │ └2
│ │ │ │ ├basename
│ │ │ │ ├branch_distance
│ │ │ │ ├branch_type
│ │ │ │ ├connected_segments
│ │ │ │ ├image
│ │ │ │ ├mean_pixel_value
│ │ │ │ ├median_value
│ │ │ │ ├middle_value
│ │ │ │ ├min_value
│ │ │ │ └stdev_pixel_value
│ │ │ └total_branch_length
│ │ ├filename
│ │ ├height_profiles
│ │ │ └1
│ │ │ └0
│ │ ├image
│ │ ├mask
│ │ ├nodes
│ │ │ └0
│ │ │ ├branch_stats
│ │ │ │ ├0
│ │ │ │ │ ├angles
│ │ │ │ │ ├branch_statistics
│ │ │ │ │ │ ├basename
│ │ │ │ │ │ ├fwhm
│ │ │ │ │ │ ├fwhm_half_maxs
│ │ │ │ │ │ ├fwhm_peaks
│ │ │ │ │ │ └image
│ │ │ │ │ ├distances
│ │ │ │ │ ├fwhm
│ │ │ │ │ ├fwhm_half_maxs
│ │ │ │ │ ├fwhm_peaks
│ │ │ │ │ ├heights
│ │ │ │ │ └ordered_coords
│ │ │ │ └1
│ │ │ │ ├angles
│ │ │ │ ├branch_statistics
│ │ │ │ │ ├basename
│ │ │ │ │ ├fwhm
│ │ │ │ │ ├fwhm_half_maxs
│ │ │ │ │ ├fwhm_peaks
│ │ │ │ │ └image
│ │ │ │ ├distances
│ │ │ │ ├fwhm
│ │ │ │ ├fwhm_half_maxs
│ │ │ │ ├fwhm_peaks
│ │ │ │ ├heights
│ │ │ │ └ordered_coords
│ │ │ ├confidence
│ │ │ ├error
│ │ │ ├node_area_skeleton
│ │ │ ├node_avg_mask
│ │ │ ├node_branch_mask
│ │ │ ├node_coords
│ │ │ ├pixel_to_nm_scaling
│ │ │ ├unmatched_branch_stats
│ │ │ │ ├0
│ │ │ │ │ └angles
│ │ │ │ ├1
│ │ │ │ │ └angles
│ │ │ │ ├2
│ │ │ │ │ └angles
│ │ │ │ └3
│ │ │ │ └angles
│ │ │ └writhe
│ │ ├ordered_trace
│ │ │ ├images
│ │ │ │ ├all_molecules
│ │ │ │ ├ordered_traces
│ │ │ │ ├over_under
│ │ │ │ └trace_segments
│ │ │ ├molecule_data
│ │ │ │ └0
│ │ │ │ ├bbox
│ │ │ │ ├circular
│ │ │ │ ├contour_length
│ │ │ │ ├curvature_stats
│ │ │ │ ├distances
│ │ │ │ ├end_to_end_distance
│ │ │ │ ├heights
│ │ │ │ ├molecule_statistics
│ │ │ │ │ ├circular
│ │ │ │ │ ├contour_length
│ │ │ │ │ ├end_to_end_distance
│ │ │ │ │ ├topology
│ │ │ │ │ └topology_flip
│ │ │ │ ├ordered_coords
│ │ │ │ ├splined_coords
│ │ │ │ ├topology
│ │ │ │ └topology_flip
│ │ │ └molecule_statistics
│ │ │ └0
│ │ │ ├circular
│ │ │ ├contour_length
│ │ │ ├end_to_end_distance
│ │ │ ├topology
│ │ │ └topology_flip
│ │ ├padding
│ │ ├pixel_to_nm_scaling
│ │ ├skeleton
│ │ ├stats
│ │ │ └1
│ │ │ └0
│ │ │ ├area
│ │ │ ├area_cartesian_bbox
│ │ │ ├aspect_ratio
│ │ │ ├centre_x
│ │ │ ├centre_y
│ │ │ ├height_max
│ │ │ ├height_mean
│ │ │ ├height_median
│ │ │ ├height_min
│ │ │ ├max_feret
│ │ │ ├mean_crossing_confidence
│ │ │ ├min_crossing_confidence
│ │ │ ├min_feret
│ │ │ ├num_crossings
│ │ │ ├num_mols
│ │ │ ├radius_max
│ │ │ ├radius_mean
│ │ │ ├radius_median
│ │ │ ├radius_min
│ │ │ ├smallest_bounding_area
│ │ │ ├smallest_bounding_length
│ │ │ ├smallest_bounding_width
│ │ │ ├volume
│ │ │ └writhe_string
│ │ ├threshold_method
│ │ └thresholds
│ │ ├above
│ │ └below
│ └ ... <other GrainCrop>
├image
├image_original
├image_statistics
│ ├grains
│ ├grains_per_m2
│ ├image
│ ├image_area_m2
│ ├image_area_px2
│ ├image_size_x_m
│ ├image_size_x_px
│ ├image_size_y_m
│ ├image_size_y_px
│ └rms_roughness
├img_path
├pixel_to_nm_scaling
└topostats_version
This nesting structure is retained when writing to .topostats a custom HDF5 format. The Python tool
h5glance can be used to show the nested structure.
The top level of nesting reflects the TopoStats object itself.
With a --depth 2 we can see the second level of nesting.
Individual items can be viewed with h5glance by specifying the path through the nesting structure as an argument at
the command line.
h5glance output/processed/minicircle_small.topostats filename
h5glance output/processed/minicircle_small.topostats grain_crop/0/
Extending
Typically the unit of analysis is a GrainCrop so let's assume we are adding a new attribute GrainCrop.new_feature to
the GrainCrop class and it is an object of class type NewFeature. We define NewFeature in the classes.py module.
import numpy.typing as npt
from pydantic import ConfigDict
from pydantic.dataclasses import dataclass
@dataclass(
repr=True,
eq=True,
config=ConfigDict(arbitrary_types_allowed=True, validate_assignment=True),
validate_on_init=True,
)
class NewFeature:
"""
A class that adds a new feature.
"""
left: bool | None
right: bool | None
data: dict[int, npt.NDArray | None]
This is to be an attribute of GrainCrop so we add a new attribute to the __init___ definition of GrainCrop in
classes.py (note that GrainCrop is an exception as it is not a Pydantic dataclass). Don't forget to include a
description of the attribute to both the class docstring and the __init__ docstring otherwise the Numpydoc validation
will fail on commits and/or pull requests.
class GrainCrop:
def __init__(self, new_feature: NewFeature | None = None):
"""
Parameters
----------
new_feature : NewFeature
A ``NewFeature``.
"""
self.new_feature = new_feature
__str__ method
The __str__ method provides a convenient method to represent the class when called. If you would like some of the
attributes you have defined shown then you should add them here. An example is shown below.
def __str__(self) -> str:
"""
Readable attributes.
Returns
-------
str
Set of formatted statistics on matched branches.
"""
return (
f"left : {self.left}\n" f"right : {self.right}\n" f"data :\n\n {self.data}\n"
)
__eq__ method
Adding the equality dunder method means the object can easily be compared to another of the same type and tested for equality.
def __eq__(self, other: object) -> bool:
"""
Check equality of ``NewFeature`` object against another.
Parameters
----------
other : object
Other object to be compared.
Returns
-------
bool
``True`` if the objects are equal, ``False`` otherwise.
"""
if not isinstance(other, NewFeature):
return False
return (
self.left == other.left
and self.right == other.right
and self.data == other.data
)
IO
Output
TopoStats classes are saved to HDF5 .topostats files and in order to include any newly defined class in these files
for subsequent processing the io.dict_to_hdf5() function needs to have the class adding to its list of supported
objects and a clause that recurrsively calls io.dict_to_hdf5() with the class converted to a dictionary as the item
argument.
if isinstance(
item,
(
list
| str
| int
| float
| np.ndarray
| Path
| dict
| GrainCrop
| GrainCropsDirection
| ImageGrainCrops
| Node
| OrderedTrace
| DisorderedTrace
| MatchedBranch
| Molecule
| NewFeature
),
): # noqa: UP038
# Lists need to be converted to numpy arrays
if isinstance(item, list):
LOGGER.debug(f"[dict_to_hdf5] {key} is of type : {type(item)}")
item = np.array(item)
open_hdf5_file[group_path + key] = item
elif isinstance(item, NewFeature):
logger.debug(f"[dict_to_hdf5] {key} is of type : {type(item)}")
dict_to_hdf5(
open_hdf5_file, group_path + key + "/", item.fancy_new_class_to_dict()
)
Input
In order to work with the modular design of TopoStats we need to be able to import .topostats files which are in HDF5
and convert them to TopoStats classes with all of the nested features. HDF5 files are read by AFMReader
which returns plain dictionaries. These are converted to TopoStats using the io.dict_to_topostats() function.
Where you add the new class depends on what existing Class it is an attribute of. In this example it is an attribute of
GrainCrop and so it should be added within the loop that iterates over the crops dictionary that has been read. Not
all attributes will be present so we use an if ... else None single line construct to unpack the components of the
nested dictionary to the attribute only if the key is present.