topostats.tracing.disordered_tracing

Generates disordered traces (pruned skeletons) and metrics.

Attributes

Classes

disorderedTrace

Calculate disordered traces for a DNA molecule and calculate statistics from those traces.

Functions

trace_image_disordered(→ tuple[dict, pandas.DataFrame, ...)

Processor function for tracing image.

compile_skan_stats(→ pandas.DataFrame)

Obtain and add more stats to the resultant Skan dataframe.

segment_heights(→ numpy.typing.NDArray)

Obtain an ordered list of heights from the skan defined skeleton segment.

segment_middles(→ float)

Obtain the pixel value in the middle of the ordered segment.

find_connections(→ str)

Compile the neighbouring branch indexes of the row.

prep_arrays(→ tuple[dict[int, numpy.typing.NDArray], ...)

Take an image and labelled mask and crop individual grains and original heights into dictionaries.

grain_anchor(→ list)

Extract anchor (min_row, min_col) from labelled regions and align individual traces over the original image.

disordered_trace_grain(→ dict)

Trace an individual grain.

get_skan_image(→ numpy.typing.NDArray)

Label each branch with its Skan branch type label.

crop_array(→ numpy.typing.NDArray)

Crop an array.

pad_bounding_box(→ list)

Pad coordinates; if they extend beyond the image boundaries, stop at the boundary.

Module Contents

topostats.tracing.disordered_tracing.LOGGER

class topostats.tracing.disordered_tracing.disorderedTrace(image: numpy.typing.NDArray, mask: numpy.typing.NDArray, filename: str, pixel_to_nm_scaling: float, min_skeleton_size: int = 10, mask_smoothing_params: dict | None = None, skeletonisation_params: dict | None = None, pruning_params: dict | None = None, n_grain: int = None)

Calculate disordered traces for a DNA molecule and calculate statistics from those traces.

Parameters:
  • image (npt.NDArray) – Cropped image, typically padded beyond the bounding box.

  • mask (npt.NDArray) – Labelled mask for the grain, typically padded beyond the bounding box.

  • filename (str) – Filename being processed.

  • pixel_to_nm_scaling (float) – Pixel to nm scaling.

  • min_skeleton_size (int) – Minimum skeleton size below which tracing statistics are not calculated.

  • mask_smoothing_params (dict) – Dictionary of parameters to smooth the grain mask for better quality skeletonisation results. Contains a gaussian ‘sigma’ and number of dilation iterations.

  • skeletonisation_params (dict) – Skeletonisation parameters, including the method of skeletonisation to use. ‘topostats’ is the original TopoStats method; three methods from scikit-image are also available: ‘zhang’, ‘lee’ and ‘thin’.

  • pruning_params (dict) – Dictionary of pruning parameters. Contains ‘method’, ‘max_length’, ‘height_threshold’, ‘method_values’ and ‘method_outlier’.

  • n_grain (int) – Grain number being processed (only used in logging).

image
mask
filename
pixel_to_nm_scaling
min_skeleton_size
mask_smoothing_params
skeletonisation_params
pruning_params
n_grain
smoothed_mask
skeleton
pruned_skeleton
disordered_trace = None

trace_dna()

Perform the DNA skeletonisation and cleaning pipeline.

re_add_holes(orig_mask: numpy.typing.NDArray, smoothed_mask: numpy.typing.NDArray, holearea_min_max: tuple[float | int | None] = (2, None)) → numpy.typing.NDArray

Restore holes in masks that were occluded by dilation.

Gaussian and dilation smoothing can close holes in the original mask. This function finds those holes (assuming the general background is the first region found, which holds because of the padding) and adds them back into the smoothed mask. When paired with smooth_mask, the net effect is to smooth only the outer edge of the mask.

Parameters:
  • orig_mask (npt.NDArray) – Original mask.

  • smoothed_mask (npt.NDArray) – Original mask but with inner and outer edges smoothed. The smoothing operation may have closed up important holes in the mask.

  • holearea_min_max (tuple[float | int | None]) – Tuple of minimum and maximum hole area (in nanometers) to replace from the original mask into the smoothed mask.

Returns:

Smoothed mask with holes restored.

Return type:

npt.NDArray
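The hole-restoring idea can be illustrated with a minimal NumPy sketch. It is not the TopoStats implementation: the names `find_holes` and `restore_holes` are hypothetical, the flood fill assumes the padded border belongs to the general background, and the `holearea_min_max` area filter is deliberately left out.

```python
import numpy as np


def find_holes(mask: np.ndarray) -> np.ndarray:
    """Return a boolean array marking background pixels enclosed by the mask."""
    background = mask == 0
    # Seed a flood fill with the background pixels that lie on the array border.
    reached = np.zeros_like(background)
    reached[0, :] = background[0, :]
    reached[-1, :] = background[-1, :]
    reached[:, 0] = background[:, 0]
    reached[:, -1] = background[:, -1]
    while True:
        grown = reached.copy()
        # Spread to the four direct neighbours, staying inside the background.
        grown[1:, :] |= reached[:-1, :]
        grown[:-1, :] |= reached[1:, :]
        grown[:, 1:] |= reached[:, :-1]
        grown[:, :-1] |= reached[:, 1:]
        grown &= background
        if np.array_equal(grown, reached):
            break
        reached = grown
    # Background pixels the fill never reached are enclosed holes.
    return background & ~reached


def restore_holes(orig_mask: np.ndarray, smoothed_mask: np.ndarray) -> np.ndarray:
    """Copy the holes of ``orig_mask`` back into a copy of ``smoothed_mask``."""
    restored = smoothed_mask.copy()
    restored[find_holes(orig_mask)] = 0
    return restored
```

Because the fill starts from the border, a grain that touches the edge of the crop could have its holes misclassified, which is one reason the padding assumption matters.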

static remove_touching_edge(skeleton: numpy.typing.NDArray) → numpy.typing.NDArray

Remove any skeleton points touching the border (to prevent errors later).

Parameters:

skeleton (npt.NDArray) – Binary array in which connected clusters of 1s are set to 0 if they touch the edge of the array.

Returns:

Skeleton without points touching the border.

Return type:

npt.NDArray
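As a rough illustration of removing border-touching clusters, the sketch below grows a selection outwards from the border pixels through 8-connected neighbours. The function name `drop_border_clusters` is hypothetical and the method is an assumption for illustration only; scikit-image's `clear_border` performs a comparable operation.

```python
import numpy as np


def drop_border_clusters(skeleton: np.ndarray) -> np.ndarray:
    """Zero every 8-connected cluster of non-zero pixels that touches the array edge."""
    foreground = skeleton.astype(bool)
    border = np.zeros_like(foreground)
    border[0, :] = border[-1, :] = True
    border[:, 0] = border[:, -1] = True
    rows, cols = foreground.shape
    # Start from foreground pixels on the border and grow through neighbours.
    touching = foreground & border
    while True:
        padded = np.pad(touching, 1)
        grown = touching.copy()
        # The nine shifted views cover each pixel and its eight neighbours.
        for d_row in (0, 1, 2):
            for d_col in (0, 1, 2):
                grown |= padded[d_row:d_row + rows, d_col:d_col + cols]
        grown &= foreground
        if np.array_equal(grown, touching):
            break
        touching = grown
    cleaned = skeleton.copy()
    cleaned[touching] = 0
    return cleaned
```

The input array is left unmodified; only the returned copy has the border-connected clusters cleared.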

smooth_mask(grain: numpy.typing.NDArray, dilation_iterations: int = 2, gaussian_sigma: float | int = 2, holearea_min_max: tuple[int | float | None] = (0, None)) → numpy.typing.NDArray

Smooth a grain mask based on the lower number of binary pixels added from dilation or gaussian.

This guards against Gaussian smoothing being so aggressive that it covers or creates gaps in the mask.

Parameters:
  • grain (npt.NDArray) – Numpy array of the grain mask.

  • dilation_iterations (int) – Number of times to dilate the grain to smooth it. Default is 2.

  • gaussian_sigma (float | int) – Gaussian sigma value to smooth the grains after an Otsu threshold. Default is 2.

  • holearea_min_max (tuple[float | int | None]) – Tuple of minimum and maximum hole area (in nanometers) to replace from the original mask into the smoothed mask.

Returns:

Numpy array of smoothed image.

Return type:

npt.NDArray
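The selection rule described above, keeping whichever smoothing adds fewer pixels, might look like the following sketch. It assumes the dilated and Gaussian-smoothed candidates have already been computed and binarised; the name `pick_gentler_smoothing` and the tie-break in favour of dilation are assumptions, not documented behaviour.

```python
import numpy as np


def pick_gentler_smoothing(
    grain: np.ndarray, dilated: np.ndarray, gaussian: np.ndarray
) -> np.ndarray:
    """Return whichever candidate mask adds fewer pixels to ``grain``."""
    original_pixels = np.count_nonzero(grain)
    added_by_dilation = np.count_nonzero(dilated) - original_pixels
    added_by_gaussian = np.count_nonzero(gaussian) - original_pixels
    # Assumption: ties are resolved in favour of the dilated mask.
    return dilated if added_by_dilation <= added_by_gaussian else gaussian
```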

topostats.tracing.disordered_tracing.trace_image_disordered(image: numpy.typing.NDArray, grains_mask: numpy.typing.NDArray, filename: str, pixel_to_nm_scaling: float, min_skeleton_size: int, mask_smoothing_params: dict, skeletonisation_params: dict, pruning_params: dict, pad_width: int = 1) → tuple[dict, pandas.DataFrame, dict, pandas.DataFrame]

Processor function for tracing image.

Parameters:
  • image (npt.NDArray) – Full image as Numpy Array.

  • grains_mask (npt.NDArray) – Full-image mask of labelled grains.

  • filename (str) – File being processed.

  • pixel_to_nm_scaling (float) – Pixel to nm scaling.

  • min_skeleton_size (int) – Minimum size of grain in pixels after skeletonisation.

  • mask_smoothing_params (dict) – Dictionary of parameters to smooth the grain mask for better quality skeletonisation results. Contains a gaussian ‘sigma’ and number of dilation iterations.

  • skeletonisation_params (dict) – Dictionary of options for skeletonisation, options are ‘zhang’ (scikit-image) / ‘lee’ (scikit-image) / ‘thin’ (scikit-image) or ‘topostats’ (original TopoStats method).

  • pruning_params (dict) – Dictionary of options for pruning.

  • pad_width (int) – Padding to the cropped image mask.

Returns:

Binary and integer labeled cropped and full-image masks from skeletonising and pruning the grains in the image.

Return type:

tuple[dict, pd.DataFrame, dict, pd.DataFrame]

topostats.tracing.disordered_tracing.compile_skan_stats(skan_df: pandas.DataFrame, skan_skeleton: skan.Skeleton, image: numpy.typing.NDArray, filename: str, grain_number: int) → pandas.DataFrame

Obtain and add more stats to the resultant Skan dataframe.

Parameters:
  • skan_df (pd.DataFrame) – The statistics DataFrame produced by Skan’s summarize function.

  • skan_skeleton (skan.Skeleton) – The graphical representation of the skeleton produced by Skan.

  • image (npt.NDArray) – The image the skeleton was produced from.

  • filename (str) – Name of the file being processed.

  • grain_number (int) – The number of the grain being processed.

Returns:

A dataframe containing the filename, grain_number, branch-distance, branch-type, connected_segments, mean-pixel-value, stdev-pixel-value, min-value, median-value, and mid-value.

Return type:

pd.DataFrame

topostats.tracing.disordered_tracing.segment_heights(row: pandas.Series, skan_skeleton: skan.Skeleton, image: numpy.typing.NDArray) → numpy.typing.NDArray

Obtain an ordered list of heights from the skan defined skeleton segment.

Parameters:
  • row (pd.Series) – A row from the Skan summarize dataframe.

  • skan_skeleton (skan.Skeleton) – The graphical representation of the skeleton produced by Skan.

  • image (npt.NDArray) – The image the skeleton was produced from.

Returns:

Heights along the segment, naturally ordered by Skan.

Return type:

npt.NDArray
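Reading heights along a segment amounts to indexing the image with the ordered (row, column) coordinates of the path. The sketch below assumes such coordinates are already available as an array (for instance from a Skan skeleton) and uses hypothetical helper names; the summary keys are illustrative and are not the DataFrame column names.

```python
import numpy as np


def heights_along_path(image: np.ndarray, path_coords) -> np.ndarray:
    """Return image values at the ordered (row, col) coordinates of a segment."""
    path_coords = np.asarray(path_coords, dtype=int)
    return image[path_coords[:, 0], path_coords[:, 1]]


def summarise_heights(heights: np.ndarray) -> dict:
    """Summarise heights; population standard deviation (ddof=0) is an assumption."""
    return {
        "mean": float(np.mean(heights)),
        "stdev": float(np.std(heights)),
        "min": float(np.min(heights)),
        "median": float(np.median(heights)),
    }
```

The order of the returned heights follows the order of the coordinates, so any downstream "middle of the segment" calculation depends on the path being ordered.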

topostats.tracing.disordered_tracing.segment_middles(row: pandas.Series, skan_skeleton: skan.csr.Skeleton, image: numpy.typing.NDArray) → float

Obtain the pixel value in the middle of the ordered segment.

Parameters:
  • row (pd.Series) – A row from the Skan summarize dataframe.

  • skan_skeleton (skan.csr.Skeleton) – The graphical representation of the skeleton produced by Skan.

  • image (npt.NDArray) – The image the skeleton was produced from.

Returns:

The single or mean pixel value corresponding to the middle coordinate(s) of the segment.

Return type:

float
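The "single or mean" behaviour can be sketched in a few lines: a segment with an odd number of pixels has one middle value, while an even-length segment has two, which are averaged. The function name `middle_value` is hypothetical.

```python
def middle_value(heights) -> float:
    """Return the middle height, averaging the two central values for even lengths."""
    n_points = len(heights)
    if n_points == 0:
        raise ValueError("segment has no heights")
    middle = n_points // 2
    if n_points % 2 == 1:
        # Odd length: a single central element exists.
        return float(heights[middle])
    # Even length: take the mean of the two central elements.
    return float((heights[middle - 1] + heights[middle]) / 2)
```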

topostats.tracing.disordered_tracing.find_connections(row: pandas.Series, skan_df: pandas.DataFrame) → str

Compile the neighbouring branch indexes of the row.

Parameters:
  • row (pd.Series) – A row from the Skan summarize dataframe.

  • skan_df (pd.DataFrame) – The statistics DataFrame produced by Skan’s summarize function.

Returns:

A string representation of a list of matching row indices where the node src and dst columns match those of the row. A string is needed for CSV compatibility since CSV files can’t hold lists.

Return type:

str
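To illustrate the idea without depending on pandas or Skan, the sketch below represents each branch as a plain dictionary with hypothetical `src` and `dst` keys standing in for the node columns of the Skan summary. Two branches count as connected when they share a node, and the result is returned as a string so it survives a round trip through CSV.

```python
def neighbouring_branches(index: int, branches: list) -> str:
    """Return a string listing indices of branches sharing a node with ``index``."""
    nodes = {branches[index]["src"], branches[index]["dst"]}
    connected = [
        other
        for other, branch in enumerate(branches)
        # Skip the branch itself; keep any branch with a node in common.
        if other != index and nodes & {branch["src"], branch["dst"]}
    ]
    return str(connected)
```

When the list is needed again after reading the CSV, the standard library's `ast.literal_eval` can convert the string back into a Python list.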

topostats.tracing.disordered_tracing.prep_arrays(image: numpy.typing.NDArray, labelled_grains_mask: numpy.typing.NDArray, pad_width: int) → tuple[dict[int, numpy.typing.NDArray], dict[int, numpy.typing.NDArray]]

Take an image and labelled mask and crop individual grains and original heights into dictionaries.

A second padding is applied after cropping so that “edge cases”, where grains lie close to the bounding box edges, are traced correctly. This is accounted for when aligning traces to the whole image mask.

Parameters:
  • image (npt.NDArray) – Gaussian filtered image. Typically filtered_image.images[“gaussian_filtered”].

  • labelled_grains_mask (npt.NDArray) – 2D Numpy array of labelled grain masks, with each mask comprised solely of a unique non-zero integer. Typically this will be output from ‘grains.directions[<direction>][“labelled_region_02”]’.

  • pad_width (int) – Cells by which to pad cropped regions by.

Returns:

A tuple of three dictionaries: the cropped images, cropped masks and bounding boxes.

Return type:

Tuple
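The second padding step can be pictured as follows. This is a simplified sketch that assumes a bounding box is already known (the real function derives the boxes itself) and that the cropped mask should be binarised; the name `crop_and_pad` is hypothetical.

```python
import numpy as np


def crop_and_pad(image: np.ndarray, labelled_mask: np.ndarray, bbox: tuple, pad_width: int):
    """Crop image and mask to ``bbox`` and surround both crops with a zero border."""
    min_row, min_col, max_row, max_col = bbox
    window = (slice(min_row, max_row), slice(min_col, max_col))
    # np.pad defaults to constant zero padding on every side.
    cropped_image = np.pad(image[window], pad_width)
    cropped_mask = np.pad(labelled_mask[window], pad_width)
    # Replace any non-zero label with 1 to obtain a binary mask.
    cropped_mask = np.where(cropped_mask == 0, 0, 1)
    return cropped_image, cropped_mask
```

The zero border guarantees that no grain pixel sits on the edge of its crop, at the cost of an offset of `pad_width` that must be subtracted when mapping traces back onto the full image.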

topostats.tracing.disordered_tracing.grain_anchor(array_shape: tuple, bounding_box: list, pad_width: int) → list

Extract anchor (min_row, min_col) from labelled regions and align individual traces over the original image.

Parameters:
  • array_shape (tuple) – Shape of original array.

  • bounding_box (list) – A list of region properties returned by ‘skimage.measure.regionprops()’.

  • pad_width (int) – Padding for image.

Returns:

A list of tuples of the min_row, min_col of each bounding box.

Return type:

list(Tuple)

topostats.tracing.disordered_tracing.disordered_trace_grain(cropped_image: numpy.typing.NDArray, cropped_mask: numpy.typing.NDArray, pixel_to_nm_scaling: float, mask_smoothing_params: dict, skeletonisation_params: dict, pruning_params: dict, filename: str = None, min_skeleton_size: int = 10, n_grain: int = None) → dict

Trace an individual grain.

Tracing involves multiple steps…

  1. Skeletonisation

  2. Pruning of side branches (artefacts from skeletonisation).

  3. Ordering of the skeleton.

Parameters:
  • cropped_image (npt.NDArray) – Cropped array from the original image defined as the bounding box from the labelled mask.

  • cropped_mask (npt.NDArray) – Cropped array from the labelled image defined as the bounding box from the labelled mask. This should have been converted to a binary mask.

  • pixel_to_nm_scaling (float) – Pixel to nm scaling.

  • mask_smoothing_params (dict) – Dictionary of parameters to smooth the grain mask for better quality skeletonisation results. Contains a gaussian ‘sigma’ and number of dilation iterations.

  • skeletonisation_params (dict) – Dictionary of skeletonisation parameters, options are ‘zhang’ (scikit-image) / ‘lee’ (scikit-image) / ‘thin’ (scikit-image) or ‘topostats’ (original TopoStats method).

  • pruning_params (dict) – Dictionary of pruning parameters.

  • filename (str) – File being processed.

  • min_skeleton_size (int) – Minimum size of grain in pixels after skeletonisation.

  • n_grain (int) – Grain number being processed.

Returns:

Dictionary of the contour length, whether the image is circular or linear, the end-to-end distance and an array of coordinates.

Return type:

dict

topostats.tracing.disordered_tracing.get_skan_image(original_image: numpy.typing.NDArray, pruned_skeleton: numpy.typing.NDArray, skan_column: str) → numpy.typing.NDArray

Label each branch with its Skan branch type label.

Branch types (+1 compared to Skan docs) are defined as:

  • 1 = Endpoint-to-endpoint (isolated branch)

  • 2 = Junction-to-endpoint

  • 3 = Junction-to-junction

  • 4 = Isolated cycle

Parameters:
  • original_image (npt.NDArray) – Height image from which the pruned skeleton is derived.

  • pruned_skeleton (npt.NDArray) – Single pixel thick skeleton mask.

  • skan_column (str) – A column from Skan’s summarize function to colour the branch segments with.

Returns:

2D array where the background is 0 and skeleton branches are labelled with their Skan branch type.

Return type:

npt.NDArray
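The +1 offset exists so that label 0 can be reserved for the background. A minimal sketch of painting branches onto an empty array is shown below; it assumes the branch coordinates and Skan branch types (0 to 3) are already available, and the name `label_branches` is hypothetical.

```python
import numpy as np


def label_branches(shape: tuple, branch_paths: list, skan_branch_types: list) -> np.ndarray:
    """Paint each branch with its Skan branch type plus one; background stays 0."""
    labelled = np.zeros(shape, dtype=int)
    for coords, branch_type in zip(branch_paths, skan_branch_types):
        coords = np.asarray(coords, dtype=int)
        # Shift by one so that Skan's type 0 does not collide with the background.
        labelled[coords[:, 0], coords[:, 1]] = branch_type + 1
    return labelled
```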

topostats.tracing.disordered_tracing.crop_array(array: numpy.typing.NDArray, bounding_box: tuple, pad_width: int = 0) → numpy.typing.NDArray

Crop an array.

Ideally we pad the array that is being cropped so that we have heights outside of the grain’s bounding box. However, if a grain is near the edge of the image scan, this results in requesting indexes outside of the existing image, in which case we pad as much of the image as possible.

Parameters:
  • array (npt.NDArray) – 2D Numpy array to be cropped.

  • bounding_box (Tuple) – Tuple of coordinates to crop, should be of form (min_row, min_col, max_row, max_col).

  • pad_width (int) – Padding to apply to bounding box.

Returns:

Cropped array.

Return type:

npt.NDArray
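Clamping the padded bounding box to the image limits is one way to obtain "as much padding as possible". The sketch below assumes a box of the form (min_row, min_col, max_row, max_col) with exclusive upper bounds; the name `crop_with_padding` is hypothetical and this is not the library's implementation.

```python
import numpy as np


def crop_with_padding(array: np.ndarray, bounding_box: tuple, pad_width: int = 0) -> np.ndarray:
    """Crop ``array`` to the padded bounding box, stopping at the array edges."""
    min_row, min_col, max_row, max_col = bounding_box
    n_rows, n_cols = array.shape
    # Clamp each padded coordinate so it never leaves the array.
    row_start = max(min_row - pad_width, 0)
    col_start = max(min_col - pad_width, 0)
    row_stop = min(max_row + pad_width, n_rows)
    col_stop = min(max_col + pad_width, n_cols)
    return array[row_start:row_stop, col_start:col_stop]
```

A grain in the middle of the image receives the full padding on every side, whereas one near an edge receives less on that side, so crops of equally sized grains can differ in shape.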

topostats.tracing.disordered_tracing.pad_bounding_box(array_shape: tuple, bounding_box: list, pad_width: int) → list

Pad coordinates; if they extend beyond the image boundaries, stop at the boundary.

Parameters:
  • array_shape (tuple) – Shape of original image (row, columns).

  • bounding_box (list) – List of coordinates ‘min_row’, ‘min_col’, ‘max_row’, ‘max_col’.

  • pad_width (int) – Cells to pad arrays by.

Returns:

List of padded coordinates.

Return type:

list