topostats.tracing.dnatracing#

Perform DNA Tracing.

Attributes#

Classes#

dnaTrace

Calculates traces for a DNA molecule and calculates statistics from those traces.

Functions#

trace_image(→ dict)

Processor function for tracing image.

round_splined_traces(→ dict)

Round a Dict of floating point coordinates to integer floating point coordinates.

trim_array(→ numpy.typing.NDArray)

Trim an array by the specified pad_width.

adjust_coordinates(→ numpy.typing.NDArray)

Adjust coordinates of a trace by the pad_width.

trace_mask(→ numpy.typing.NDArray)

Place the traced skeletons into an array of the original image for plotting/overlaying.

prep_arrays(→ tuple[dict[int, numpy.typing.NDArray], ...)

Take an image and labelled mask and crops individual grains and original heights to a list.

grain_anchor(→ list)

Extract anchor (min_row, min_col) from labelled regions and align individual traces over the original image.

trace_grain(→ dict)

Trace an individual grain.

crop_array(→ numpy.typing.NDArray)

Crop an array.

pad_bounding_box(→ list)

Pad coordinates, if they extend beyond image boundaries stop at boundary.

Module Contents#

topostats.tracing.dnatracing.LOGGER#
class topostats.tracing.dnatracing.dnaTrace(image: numpy.typing.NDArray, grain: numpy.typing.NDArray, filename: str, pixel_to_nm_scaling: float, min_skeleton_size: int = 10, convert_nm_to_m: bool = True, skeletonisation_method: str = 'topostats', n_grain: int = None, spline_step_size: float = 7e-09, spline_linear_smoothing: float = 5.0, spline_circular_smoothing: float = 0.0, spline_quiet: bool = True, spline_degree: int = 3)[source]#

Calculates traces for a DNA molecule and calculates statistics from those traces.

2023-06-09 : This class has undergone some refactoring so that it works with a single grain. The trace_grain() helper function runs the class and returns the expected statistics whilst the trace_image() function handles processing all detected grains within an image. The original methods of skeletonisation are available along with additional methods from scikit-image.

Some bugs have been identified and corrected see commits for further details…

236750b2 2a79c4ff

Parameters:
  • image (npt.NDArray) – Cropped image, typically padded beyond the bounding box.

  • grain (npt.NDArray) – Labelled mask for the grain, typically padded beyond the bounding box.

  • filename (str) – Filename being processed.

  • pixel_to_nm_scaling (float) – Pixel to nm scaling.

  • min_skeleton_size (int) – Minimum skeleton size below which tracing statistics are not calculated.

  • convert_nm_to_m (bool) – Convert nanometers to metres.

  • skeletonisation_method (str) – Method of skeletonisation to use ‘topostats’ is the original TopoStats method. Three methods from scikit-image are available ‘zhang’, ‘lee’ and ‘thin’.

  • n_grain (int) – Grain number being processed (only used in logging).

  • spline_step_size (float) – Step size for spline evaluation in metres.

  • spline_linear_smoothing (float) – Smoothness of linear splines.

  • spline_circular_smoothing (float) – Smoothness of circular splines.

  • spline_quiet (bool) – Suppresses scipy splining warnings.

  • spline_degree (int) – Degree of the spline.

image#
grain#
filename#
pixel_to_nm_scaling#
min_skeleton_size = 10#
skeletonisation_method = 'topostats'#
n_grain = None#
number_of_rows#
number_of_columns#
sigma#
gauss_image = None#
disordered_trace = None#
ordered_trace = None#
fitted_trace = None#
splined_trace = None#
contour_length#
end_to_end_distance#
mol_is_circular#
curvature#
spline_step_size: float = 7e-09#
spline_linear_smoothing: float = 5.0#
spline_circular_smoothing: float = 0.0#
spline_quiet: bool = True#
spline_degree: int = 3#
neighbours = 5#
ordered_trace_heights = None#
ordered_trace_cumulative_distances = None#
trace_dna()[source]#

Perform DNA tracing.

gaussian_filter(**kwargs) numpy.array[source]#

Apply Gaussian filter.

Parameters:

**kwargs – Arguments passed to ‘skimage.filter.gaussian(**kwargs)’.

get_ordered_trace_heights() None[source]#

Derive the pixel heights from the ordered trace self.ordered_trace list.

Gets the heights of each pixel in the ordered trace from the gaussian filtered image. The pixel coordinates for the ordered trace are stored in the ordered trace list as part of the class.

get_ordered_trace_cumulative_distances() None[source]#

Calculate the cumulative distances of each pixel in the self.ordered_trace list.

static coord_dist(coordinates: numpy.typing.NDArray, px_to_nm: float) numpy.typing.NDArray[source]#

Calculate the cumulative real distances between each pixel in a trace.

Take a Nx2 numpy array of (grid adjacent) coordinates and produce a list of cumulative distances in nanometres, travelling from pixel to pixel. 1D example: coordinates: [0, 0], [0, 1], [1, 1], [2, 2] cumulative distances: [0, 1, 2, 3.4142]. Counts diagonal connections as 1.4142 distance. Converts distances from pixels to nanometres using px_to_nm scaling factor. Note that the pixels have to be adjacent.

Parameters:
  • coordinates (npt.NDArray) – A Nx2 integer array of coordinates of the pixels of a trace from a binary trace image.

  • px_to_nm (float) – Pixel to nanometre scaling factor to allow for real length measurements of distances rather than pixels.

Returns:

Numpy array of length N containing the cumulative sum of distances (0 at the first entry, full molecule length at the last entry).

Return type:

npt.NDArray

get_disordered_trace() None[source]#

Create a skeleton for each of the grains in the image.

Uses my own skeletonisation function from tracingfuncs module. I (Joe) will eventually get round to editing this function to try to reduce the branching and to try to better trace from looped molecules.

linear_or_circular(traces) None[source]#

Determine whether molecule is circular or linear based on the local environment of each pixel from the trace.

This function is sensitive to branches from the skeleton so might need to implement a function to remove them.

Parameters:

traces (npt.NDArray) – The array of coordinates to be assessed.

get_ordered_traces()[source]#

Order a trace.

get_fitted_traces()[source]#

Create trace coordinates that are adjusted to lie along the highest points of each traced molecule.

static remove_duplicate_consecutive_tuples(tuple_list: list[tuple | numpy.typing.NDArray]) list[tuple][source]#

Remove duplicate consecutive tuples from a list.

Parameters:

tuple_list (list[tuple | npt.NDArray]) – List of tuples or numpy ndarrays to remove consecutive duplicates from.

Returns:

List of tuples with consecutive duplicates removed.

Return type:

list[Tuple]

Examples

For the list of tuples [(1, 2), (1, 2), (1, 2), (2, 3), (2, 3), (3, 4)], this function will return [(1, 2), (2, 3), (3, 4)]

get_splined_traces() None[source]#

Get a splined version of the fitted trace - useful for finding the radius of gyration etc.

This function actually calculates the average of several splines which is important for getting a good fit on the lower resolution data.

show_traces()[source]#

Plot traces.

saveTraceFigures(filename: str | pathlib.Path, channel_name: str, vmaxval: float | int, vminval: float | int, output_dir: str | pathlib.Path = None) None[source]#

Save the traces.

Parameters:
  • filename (str | Path) – Filename being processed.

  • channel_name (str) – Channel.

  • vmaxval (float | int) – Maximum value for height.

  • vminval (float | int) – Minimum value for height.

  • output_dir (str | Path) – Output directory.

_checkForSaveDirectory(filename: str, new_output_dir: str) str[source]#

Create output directory and updates filename to account for this.

Parameters:
  • filename (str) – Filename.

  • new_output_dir (str) – Target directory.

Returns:

Updated output directory.

Return type:

str

find_curvature()[source]#

Calculate curvature of the molecule.

saveCurvature() None[source]#

Save curvature statistics.

plotCurvature(dna_num: int) None[source]#

Plot the curvature of the chosen molecule as a function of the contour length (in metres).

Parameters:

dna_num (int) – Molecule to plot, used for indexing.

measure_contour_length() None[source]#

Contour lengthof the splined trace taking into account whether the molecule is circular or linear.

Contour length units are nm.

measure_end_to_end_distance()[source]#

Calculate the Euclidean distance between the start and end of linear molecules.

The hypotenuse is calculated between the start ([0,0], [0,1]) and end ([-1,0], [-1,1]) of linear molecules. If the molecule is circular then the distance is set to zero (0).

topostats.tracing.dnatracing.trace_image(image: numpy.typing.NDArray, grains_mask: numpy.typing.NDArray, filename: str, pixel_to_nm_scaling: float, min_skeleton_size: int, skeletonisation_method: str, spline_step_size: float = 7e-09, spline_linear_smoothing: float = 5.0, spline_circular_smoothing: float = 0.0, pad_width: int = 1, cores: int = 1) dict[source]#

Processor function for tracing image.

Parameters:
  • image (npt.NDArray) – Full image as Numpy Array.

  • grains_mask (npt.NDArray) – Full image as Grains that are labelled.

  • filename (str) – File being processed.

  • pixel_to_nm_scaling (float) – Pixel to nm scaling.

  • min_skeleton_size (int) – Minimum size of grain in pixels after skeletonisation.

  • skeletonisation_method (str) – Method of skeletonisation, options are ‘zhang’ (scikit-image) / ‘lee’ (scikit-image) / ‘thin’ (scikitimage) or ‘topostats’ (original TopoStats method).

  • spline_step_size (float) – Step size for spline evaluation in metres.

  • spline_linear_smoothing (float) – Smoothness of linear splines.

  • spline_circular_smoothing (float) – Smoothness of circular splines.

  • pad_width (int) – Number of cells to pad arrays by, required to handle instances where grains touch the bounding box edges.

  • cores (int) – Number of cores to process with.

Returns:

Statistics from skeletonising and tracing the grains in the image.

Return type:

dict

topostats.tracing.dnatracing.round_splined_traces(splined_traces: dict) dict[source]#

Round a Dict of floating point coordinates to integer floating point coordinates.

Parameters:

splined_traces (dict) – Floating point coordinates to be rounded.

Returns:

Dictionary of rounded integer coordinates.

Return type:

dict

topostats.tracing.dnatracing.trim_array(array: numpy.typing.NDArray, pad_width: int) numpy.typing.NDArray[source]#

Trim an array by the specified pad_width.

Removes a border from an array. Typically this is the second padding that is added to the image/masks for edge cases that are near image borders and means traces will be correctly aligned as a mask for the original image.

Parameters:
  • array (npt.NDArray) – Numpy array to be trimmed.

  • pad_width (int) – Padding to be removed.

Returns:

Trimmed array.

Return type:

npt.NDArray

topostats.tracing.dnatracing.adjust_coordinates(coordinates: numpy.typing.NDArray, pad_width: int) numpy.typing.NDArray[source]#

Adjust coordinates of a trace by the pad_width.

A second padding is made to allow for grains that are “edge cases” and close to the bounding box edge. This adds the pad_width to the cropped grain array. In order to realign the trace with the original image we need to remove this padding so that when the coordinates are combined with the “grain_anchor”, which isn’t padded twice, the coordinates correctly align with the original image.

Parameters:
  • coordinates (npt.NDArray) – An array of trace coordinates (typically ordered).

  • pad_width (int) – The amount of padding used.

Returns:

Array of trace coordinates adjusted for secondary padding.

Return type:

npt.NDArray

topostats.tracing.dnatracing.trace_mask(grain_anchors: list[numpy.typing.NDArray], ordered_traces: dict[str, numpy.typing.NDArray], image_shape: tuple, pad_width: int) numpy.typing.NDArray[source]#

Place the traced skeletons into an array of the original image for plotting/overlaying.

Adjusts the coordinates back to the original position based on each grains anchor coordinates of the padded bounding box. Adjustments are made for the secondary padding that is made.

Parameters:
  • grain_anchors (List[npt.NDArray]) – List of grain anchors for the padded bounding box.

  • ordered_traces (Dict[npt.NDArray]) – Coordinates for each grain trace. Dict of coordinates for each grains trace.

  • image_shape (tuple) – Shape of original image.

  • pad_width (int) – The amount of padding used on the image.

Returns:

Mask of traces for all grains that can be overlaid on original image.

Return type:

npt.NDArray

topostats.tracing.dnatracing.prep_arrays(image: numpy.typing.NDArray, labelled_grains_mask: numpy.typing.NDArray, pad_width: int) tuple[dict[int, numpy.typing.NDArray], dict[int, numpy.typing.NDArray]][source]#

Take an image and labelled mask and crops individual grains and original heights to a list.

A second padding is made after cropping to ensure for “edge cases” where grains are close to bounding box edges that they are traced correctly. This is accounted for when aligning traces to the whole image mask.

Parameters:
  • image (npt.NDArray) – Gaussian filtered image. Typically filtered_image.images[“gaussian_filtered”].

  • labelled_grains_mask (npt.NDArray) – 2D Numpy array of labelled grain masks, with each mask being comprised solely of unique integer (not zero). Typically this will be output from ‘grains.directions[<direction>[“labelled_region_02]’.

  • pad_width (int) – Cells by which to pad cropped regions by.

Returns:

Returns a tuple of two dictionaries, each consisting of cropped arrays.

Return type:

Tuple

topostats.tracing.dnatracing.grain_anchor(array_shape: tuple, bounding_box: list, pad_width: int) list[source]#

Extract anchor (min_row, min_col) from labelled regions and align individual traces over the original image.

Parameters:
  • array_shape (tuple) – Shape of original array.

  • bounding_box (list) – A list of region properties returned by ‘skimage.measure.regionprops()’.

  • pad_width (int) – Padding for image.

Returns:

A list of tuples of the min_row, min_col of each bounding box.

Return type:

list(Tuple)

topostats.tracing.dnatracing.trace_grain(cropped_image: numpy.typing.NDArray, cropped_mask: numpy.typing.NDArray, pixel_to_nm_scaling: float, filename: str = None, min_skeleton_size: int = 10, skeletonisation_method: str = 'topostats', spline_step_size: float = 7e-09, spline_linear_smoothing: float = 5.0, spline_circular_smoothing: float = 0.0, n_grain: int = None) dict[source]#

Trace an individual grain.

Tracing involves multiple steps…

  1. Skeletonisation

  2. Pruning of side branch artefacts from skeletonisation.

  3. Ordering of the skeleton.

  4. Determination of molecule shape.

  5. Jiggling/Fitting

  6. Splining to improve resolution of image.

Parameters:
  • cropped_image (npt.NDArray) – Cropped array from the original image defined as the bounding box from the labelled mask.

  • cropped_mask (npt.NDArray) – Cropped array from the labelled image defined as the bounding box from the labelled mask. This should have been converted to a binary mask.

  • pixel_to_nm_scaling (float) – Pixel to nm scaling.

  • filename (str) – File being processed.

  • min_skeleton_size (int) – Minimum size of grain in pixels after skeletonisation.

  • skeletonisation_method (str) – Method of skeletonisation, options are ‘zhang’ (scikit-image) / ‘lee’ (scikit-image) / ‘thin’ (scikitimage) or ‘topostats’ (original TopoStats method).

  • spline_step_size (float) – Step size for spline evaluation in metres.

  • spline_linear_smoothing (float) – Smoothness of linear splines.

  • spline_circular_smoothing (float) – Smoothness of circular splines.

  • n_grain (int) – Grain number being processed.

Returns:

Dictionary of the contour length, whether the image is circular or linear, the end-to-end distance and an array of coordinates.

Return type:

dict

topostats.tracing.dnatracing.crop_array(array: numpy.typing.NDArray, bounding_box: tuple, pad_width: int = 0) numpy.typing.NDArray[source]#

Crop an array.

Ideally we pad the array that is being cropped so that we have heights outside of the grains bounding box. However, in some cases, if an grain is near the edge of the image scan this results in requesting indexes outside of the existing image. In which case we get as much of the image padded as possible.

Parameters:
  • array (npt.NDArray) – 2D Numpy array to be cropped.

  • bounding_box (Tuple) – Tuple of coordinates to crop, should be of form (min_row, min_col, max_row, max_col).

  • pad_width (int) – Padding to apply to bounding box.

Returns:

Cropped array.

Return type:

npt.NDArray()

topostats.tracing.dnatracing.pad_bounding_box(array_shape: tuple, bounding_box: list, pad_width: int) list[source]#

Pad coordinates, if they extend beyond image boundaries stop at boundary.

Parameters:
  • array_shape (tuple) – Shape of original image.

  • bounding_box (list) – List of coordinates ‘min_row’, ‘min_col’, ‘max_row’, ‘max_col’.

  • pad_width (int) – Cells to pad arrays by.

Returns:

List of padded coordinates.

Return type:

list