topostats.tracing.dnatracing#

Perform DNA Tracing

Attributes#

Classes#

dnaTrace

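This class gets all the useful functions from the old tracing code and staples them together to create an object that contains the traces for each DNA molecule in an image and functions to calculate stats from those traces.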

Functions#

trace_image(→ Dict)

Processor function for tracing image.

round_splined_traces(splined_traces)

Round a list of floating point coordinates to integer coordinates.

trim_array(→ numpy.ndarray)

Trim an array by the specified pad_width.

adjust_coordinates(→ numpy.ndarray)

Adjust coordinates of a trace by the pad_width.

trace_mask(→ numpy.ndarray)

Place the traced skeletons into an array of the original image for plotting/overlaying.

prep_arrays(→ Tuple[list, list])

Takes an image and labelled mask and crops individual grains and original heights to a list.

grain_anchor(→ list)

Extract the anchor (min_row, min_col) from all labelled regions which is used to align individual traces over the original image.

trace_grain(→ Dict)

Trace an individual grain.

crop_array(→ numpy.ndarray)

Crop an array.

pad_bounding_box(→ list)

Pad coordinates; if they extend beyond image boundaries, stop at the boundary.

Module Contents#

topostats.tracing.dnatracing.LOGGER#
class topostats.tracing.dnatracing.dnaTrace(image: numpy.ndarray, grain: numpy.ndarray, filename: str, pixel_to_nm_scaling: float, min_skeleton_size: int = 10, convert_nm_to_m: bool = True, skeletonisation_method: str = 'topostats', n_grain: int = None, spline_step_size: float = 7e-09, spline_linear_smoothing: float = 5.0, spline_circular_smoothing: float = 0.0, spline_quiet: bool = True, spline_degree: int = 3)#

This class gets all the useful functions from the old tracing code and staples them together to create an object that contains the traces for each DNA molecule in an image and functions to calculate stats from those traces.

The traces are stored in dictionaries labelled by their Gwyddion-defined grain number and are represented as numpy arrays.

The object also keeps track of the skeletonised plots and other intermediates in case these are useful for other things in the future.

2023-06-09 : This class has undergone some refactoring so that it works with a single grain. The trace_grain() helper function runs the class and returns the expected statistics whilst the trace_image() function handles processing all detected grains within an image. The original methods of skeletonisation are available along with additional methods from scikit-image.

Some bugs have been identified and corrected, see commits for further details…

236750b2 2a79c4ff

image#
grain#
filename#
pixel_to_nm_scaling#
min_skeleton_size#
skeletonisation_method#
n_grain#
number_of_rows#
number_of_columns#
sigma#
gauss_image = None#
disordered_trace = None#
ordered_trace = None#
fitted_trace = None#
splined_trace = None#
contour_length#
end_to_end_distance#
mol_is_circular#
curvature#
spline_step_size: float#
spline_linear_smoothing: float#
spline_circular_smoothing: float#
spline_quiet: bool#
spline_degree: int#
neighbours = 5#
trace_dna()#

Perform DNA tracing.

gaussian_filter(**kwargs) numpy.array#

Apply Gaussian filter

get_disordered_trace()#

Create a skeleton for each of the grains in the image.

Uses my own skeletonisation function from the tracingfuncs module. I will eventually get round to editing this function to try to reduce the branching and to better trace looped molecules.

linear_or_circular(traces)#

Determines whether each molecule is circular or linear based on the local environment of each pixel from the trace.

This function is sensitive to branches from the skeleton, so it might be necessary to implement a function to remove them.
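
One way to read "local environment of each pixel" is as a count of 8-connected neighbours: an open (linear) trace has end points with a single neighbour, whereas a closed loop has none. The sketch below illustrates that idea only; the helper name `is_circular` is hypothetical and this is not necessarily how `linear_or_circular()` is implemented.

```python
def is_circular(trace):
    """Return True when no pixel in the trace has exactly one 8-connected neighbour."""
    # Assumption: the trace is an iterable of integer (row, col) pixel coordinates.
    points = {(int(row), int(col)) for row, col in trace}
    offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]
    for row, col in points:
        neighbours = sum((row + dr, col + dc) in points for dr, dc in offsets)
        if neighbours == 1:
            return False  # an end point implies an open (linear) trace
    return True


line = [(0, 0), (0, 1), (0, 2)]
ring = [(0, 1), (1, 0), (1, 2), (2, 1)]
print(is_circular(line), is_circular(ring))  # False True
```

Under this reading a side branch on a loop adds an end point, so a circular molecule with an unpruned branch would be classed as linear, which is consistent with the sensitivity to branches noted above.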

get_ordered_traces()#
get_fitted_traces()#

Create trace coordinates (for each identified molecule) that are adjusted to lie along the highest points of each traced molecule

static remove_duplicate_consecutive_tuples(tuple_list: list[tuple | numpy.ndarray]) list[tuple]#

Remove duplicate consecutive tuples from a list.

For example, for the list of tuples [(1, 2), (1, 2), (1, 2), (2, 3), (2, 3), (3, 4)], this function will return [(1, 2), (2, 3), (3, 4)].

Parameters:

tuple_list (list[Union[tuple, np.ndarray]]) – List of tuples or numpy ndarrays to remove consecutive duplicates from.

Returns:

List of tuples with consecutive duplicates removed.

Return type:

list[Tuple]
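
The behaviour described above can be sketched in a few lines. The function name `drop_consecutive_duplicates` is hypothetical and the code is an illustration of the technique, not the library's own implementation.

```python
import numpy as np


def drop_consecutive_duplicates(points):
    """Return the points as tuples, dropping any entry equal to its predecessor."""
    result = []
    for point in points:
        # np.asarray(...).tolist() handles both tuples and 1D numpy arrays
        as_tuple = tuple(np.asarray(point).tolist())
        if not result or result[-1] != as_tuple:
            result.append(as_tuple)
    return result


print(drop_consecutive_duplicates([(1, 2), (1, 2), (1, 2), (2, 3), (2, 3), (3, 4)]))
# [(1, 2), (2, 3), (3, 4)]
```

Only consecutive repeats are removed, so a coordinate that reappears later in the list is kept.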

get_splined_traces() None#

Gets a splined version of the fitted trace, which is useful for finding the radius of gyration etc.

This function actually calculates the average of several splines, which is important for getting a good fit on the lower-resolution data.

show_traces()#
saveTraceFigures(filename: str | pathlib.Path, channel_name: str, vmaxval, vminval, output_dir: str | pathlib.Path = None)#
_checkForSaveDirectory(filename, new_output_dir)#
find_curvature()#
saveCurvature()#
plotCurvature(dna_num)#

Plot the curvature of the chosen molecule as a function of the contour length (in metres)

measure_contour_length()#

Measures the contour length for each of the splined traces, taking into account whether the molecule is circular or linear.

Contour length units are nm.
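
As a rough illustration of how circularity changes the measurement, the sketch below sums the lengths of consecutive segments and, for a circular molecule, adds the segment that closes the loop. The name `contour_length` and the `pixel_to_nm_scaling` argument are assumptions made for this example; the method's actual arithmetic may differ.

```python
import numpy as np


def contour_length(trace, circular, pixel_to_nm_scaling=1.0):
    """Sum segment lengths along a trace, closing the loop if it is circular."""
    trace = np.asarray(trace, dtype=float)
    if circular:
        # Append the first point so the closing segment is included
        trace = np.vstack([trace, trace[:1]])
    steps = np.diff(trace, axis=0)
    return float(np.hypot(steps[:, 0], steps[:, 1]).sum() * pixel_to_nm_scaling)


corner = [(0, 0), (0, 3), (4, 3)]
print(contour_length(corner, circular=False))  # 7.0
print(contour_length(corner, circular=True))   # 12.0
```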

measure_end_to_end_distance()#

Calculate the Euclidean distance between the start and end of linear molecules. The hypotenuse is calculated between the start ([0,0], [0,1]) and end ([-1,0], [-1,1]) of linear molecules. If the molecule is circular then the distance is set to zero (0).
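
A minimal sketch of that calculation follows, assuming the trace is an N x 2 array of coordinates. The function name `end_to_end_distance` and the scaling argument are hypothetical and used for illustration only.

```python
import numpy as np


def end_to_end_distance(trace, circular, pixel_to_nm_scaling=1.0):
    """Distance between the first and last points of a linear trace, 0 if circular."""
    if circular:
        return 0.0
    trace = np.asarray(trace, dtype=float)
    d_row = trace[0, 0] - trace[-1, 0]
    d_col = trace[0, 1] - trace[-1, 1]
    return float(np.hypot(d_row, d_col) * pixel_to_nm_scaling)


print(end_to_end_distance([(0, 0), (1, 5), (3, 4)], circular=False))  # 5.0
```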

topostats.tracing.dnatracing.trace_image(image: numpy.ndarray, grains_mask: numpy.ndarray, filename: str, pixel_to_nm_scaling: float, min_skeleton_size: int, skeletonisation_method: str, spline_step_size: float = 7e-09, spline_linear_smoothing: float = 5.0, spline_circular_smoothing: float = 0.0, pad_width: int = 1, cores: int = 1) Dict#

Processor function for tracing image.

Parameters:
  • image (np.ndarray) – Full image as Numpy Array.

  • grains_mask (np.ndarray) – Mask of the full image in which the grains are labelled.

  • filename (str) – File being processed

  • pixel_to_nm_scaling (float) – Pixel to nm scaling.

  • min_skeleton_size (int) – Minimum size of grain in pixels after skeletonisation.

  • skeletonisation_method (str) – Method of skeletonisation, options are ‘zhang’ (scikit-image) / ‘lee’ (scikit-image) / ‘thin’ (scikit-image) or ‘topostats’ (original TopoStats method)

  • spline_step_size (float) – Step size for spline evaluation in metres. Defaults to 7e-9.

  • spline_circular_smoothing (float) – Smoothness of circular splines. Defaults to 0.0.

  • spline_linear_smoothing (float) – Smoothness of linear splines. Defaults to 5.0.

  • pad_width (int) – Number of cells to pad arrays by, required to handle instances where grains touch the bounding box edges.

  • cores (int) – Number of cores to process with.

Returns:

Statistics from skeletonising and tracing the grains in the image.

Return type:

Dict

topostats.tracing.dnatracing.round_splined_traces(splined_traces: list)#

Round a list of floating point coordinates to integer coordinates. Note that if a trace has failed and is None, it will be skipped, so the indexes of the returned list will NOT match those of the input list.

Parameters:

splined_traces (list) – List of floating point coordinates, or Nones

Returns:

rounded_splined_traces – List of integer coordinates, without Nones

Return type:

list
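
The index caveat is easiest to see in a small example. The sketch below uses a hypothetical helper `round_traces` and assumes each trace is a numpy array of coordinates; it illustrates the described behaviour rather than reproducing the library code.

```python
import numpy as np


def round_traces(traces):
    """Round each trace to integer coordinates, skipping traces that are None."""
    rounded = []
    for trace in traces:
        if trace is None:  # a failed trace is dropped, shifting later indexes
            continue
        rounded.append(np.round(np.asarray(trace)).astype(int))
    return rounded


traces = [np.array([[0.2, 1.7]]), None, np.array([[2.6, 3.1]])]
print(len(traces), len(round_traces(traces)))  # 3 2
```

Here the third input trace ends up at index 1 of the output, so callers that need to pair results with grain numbers have to track the skipped entries themselves.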

topostats.tracing.dnatracing.trim_array(array: numpy.ndarray, pad_width: int) numpy.ndarray#

Trim an array by the specified pad_width.

Removes a border from an array. Typically this is the second padding that is added to the image/masks for edge cases near image borders; removing it means traces will be correctly aligned as a mask for the original image.

Parameters:
  • array (np.ndarray) – Numpy array to be trimmed.

  • pad_width (int) – Padding to be removed.

Returns:

Trimmed array

Return type:

np.ndarray
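
Removing a uniform border from a 2D array amounts to a slice. The sketch below, with the hypothetical name `trim`, shows the idea and guards the `pad_width == 0` case, where a naive `array[0:-0]` slice would return an empty array.

```python
import numpy as np


def trim(array, pad_width):
    """Remove a border of pad_width cells from every side of a 2D array."""
    if pad_width == 0:
        return array  # nothing to remove; avoids the empty slice array[0:-0]
    return array[pad_width:-pad_width, pad_width:-pad_width]


padded = np.pad(np.ones((2, 3)), 2)  # shape (6, 7) with a zero border
print(trim(padded, 2).shape)  # (2, 3)
```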

topostats.tracing.dnatracing.adjust_coordinates(coordinates: numpy.ndarray, pad_width: int) numpy.ndarray#

Adjust coordinates of a trace by the pad_width.

A second padding is made to allow for grains that are “edge cases” and close to the bounding box edge. This adds the pad_width to the cropped grain array. In order to realign the trace with the original image we need to remove this padding so that when the coordinates are combined with the “grain_anchor”, which isn’t padded twice, the coordinates correctly align with the original image.

Parameters:
  • coordinates (np.ndarray) – An array of trace coordinates (typically ordered).

  • pad_width (int) – The amount of padding used.

Returns:

Array of trace coordinates adjusted for secondary padding.

Return type:

np.ndarray
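
In practice the adjustment described above is a shift of every coordinate by the padding. The following is a minimal sketch under that assumption; the name `remove_padding_offset` is hypothetical.

```python
import numpy as np


def remove_padding_offset(coordinates, pad_width):
    """Shift trace coordinates back by the secondary padding."""
    return np.asarray(coordinates) - pad_width


trace = np.array([[1, 1], [2, 3]])
print(remove_padding_offset(trace, 1).tolist())  # [[0, 0], [1, 2]]
```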

topostats.tracing.dnatracing.trace_mask(grain_anchors: List[numpy.ndarray], ordered_traces: List[numpy.ndarray], image_shape: tuple, pad_width: int) numpy.ndarray#

Place the traced skeletons into an array of the original image for plotting/overlaying.

Adjusts the coordinates back to the original position based on each grain's anchor coordinates of the padded bounding box. The secondary padding is also accounted for.

Parameters:
  • grain_anchors (List[np.ndarray]) – List of grain anchors for the padded bounding box.

  • ordered_traces (List[np.ndarray]) – List of coordinates for each grain's trace.

  • image_shape (tuple) – Shape of original image.

  • pad_width (int) – The amount of padding used on the image.

Returns:

Mask of traces for all grains that can be overlaid on original image.

Return type:

np.ndarray
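
The sketch below shows one way the anchor and padding adjustments could combine to build such a mask. The helper name `overlay_traces`, the handling of `None` traces, and the discarding of out-of-bounds points are assumptions made for illustration, not a description of the library's code.

```python
import numpy as np


def overlay_traces(grain_anchors, ordered_traces, image_shape, pad_width):
    """Mark every trace coordinate in a zero array with the shape of the image."""
    mask = np.zeros(image_shape, dtype=int)
    for anchor, trace in zip(grain_anchors, ordered_traces):
        if trace is None:  # assumed: failed traces are stored as None
            continue
        # Undo the secondary padding, then move to the grain's position in the image
        coords = np.asarray(trace) - pad_width + np.asarray(anchor)
        inside = (
            (coords[:, 0] >= 0) & (coords[:, 0] < image_shape[0])
            & (coords[:, 1] >= 0) & (coords[:, 1] < image_shape[1])
        )
        coords = coords[inside]  # drop any points that fall outside the image
        mask[coords[:, 0], coords[:, 1]] = 1
    return mask


mask = overlay_traces([(2, 3)], [np.array([[1, 1], [2, 2]])], (6, 6), pad_width=1)
print(np.argwhere(mask).tolist())  # [[2, 3], [3, 4]]
```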

topostats.tracing.dnatracing.prep_arrays(image: numpy.ndarray, labelled_grains_mask: numpy.ndarray, pad_width: int) Tuple[list, list]#

Takes an image and labelled mask and crops individual grains and original heights to a list.

A second padding is applied after cropping so that “edge cases”, where grains are close to bounding box edges, are traced correctly. This is accounted for when aligning traces to the whole image mask.

Parameters:
  • image (np.ndarray) – Gaussian filtered image. Typically filtered_image.images[“gaussian_filtered”].

  • labelled_grains_mask (np.ndarray) – 2D Numpy array of labelled grain masks, with each mask being comprised solely of a unique integer (not zero). Typically this will be output from grains.directions[<direction>]["labelled_region_02"].

  • pad_width (int) – Cells by which to pad cropped regions by.

Returns:

Returns a tuple of two lists, each consisting of cropped arrays.

Return type:

Tuple
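
To make the two rounds of padding concrete, here is a numpy-only sketch. It assumes the first padding widens the bounding box (clipped at the image edges) and the second is a zero border added with `np.pad`; the helper name `crop_grains` is hypothetical and the real function may locate bounding boxes differently.

```python
import numpy as np


def crop_grains(image, labelled_mask, pad_width):
    """Crop each labelled grain (and its heights) and pad every crop a second time."""
    cropped_images, cropped_masks = [], []
    n_rows, n_cols = labelled_mask.shape
    for label in np.unique(labelled_mask):
        if label == 0:  # zero is background, not a grain
            continue
        rows, cols = np.nonzero(labelled_mask == label)
        # First padding: widen the bounding box, stopping at the image edges
        r0, r1 = max(rows.min() - pad_width, 0), min(rows.max() + 1 + pad_width, n_rows)
        c0, c1 = max(cols.min() - pad_width, 0), min(cols.max() + 1 + pad_width, n_cols)
        grain_mask = (labelled_mask[r0:r1, c0:c1] == label).astype(int)
        # Second padding: add a zero border around the cropped arrays
        cropped_images.append(np.pad(image[r0:r1, c0:c1], pad_width))
        cropped_masks.append(np.pad(grain_mask, pad_width))
    return cropped_images, cropped_masks


image = np.arange(36, dtype=float).reshape(6, 6)
mask = np.zeros((6, 6), dtype=int)
mask[2:4, 2:4] = 1  # a grain in the middle of the image
mask[0, 5] = 2      # a grain touching the image corner
images, masks = crop_grains(image, mask, pad_width=1)
print([m.shape for m in masks])  # [(6, 6), (4, 4)]
```

The corner grain yields a smaller crop because its bounding box cannot be widened past the image edge, which is the situation the second padding is meant to cover.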

topostats.tracing.dnatracing.grain_anchor(array_shape: tuple, bounding_box: list, pad_width: int) list#

Extract the anchor (min_row, min_col) from all labelled regions which is used to align individual traces over the original image.

Parameters:
  • array_shape (tuple) – Shape of original array.

  • bounding_box (list) – A list of region properties returned by skimage.measure.regionprops()

  • pad_width (int) – Padding for image.

Returns:

A list of tuples of the min_row, min_col of each bounding box.

Return type:

List(Tuple)

topostats.tracing.dnatracing.trace_grain(cropped_image: numpy.ndarray, cropped_mask: numpy.ndarray, pixel_to_nm_scaling: float, filename: str = None, min_skeleton_size: int = 10, skeletonisation_method: str = 'topostats', spline_step_size: float = 7e-09, spline_linear_smoothing: float = 5.0, spline_circular_smoothing: float = 0.0, n_grain: int = None) Dict#

Trace an individual grain.

Tracing involves multiple steps…

  1. Skeletonisation

  2. Pruning of side branch artefacts from skeletonisation.

  3. Ordering of the skeleton.

  4. Determination of molecule shape.

  5. Jiggling/Fitting

  6. Splining to improve resolution of image.

Parameters:
  • cropped_image (np.ndarray) – Cropped array from the original image defined as the bounding box from the labelled mask.

  • cropped_mask (np.ndarray) – Cropped array from the labelled image defined as the bounding box from the labelled mask. This should have been converted to a binary mask.

  • filename (str) – File being processed

  • pixel_to_nm_scaling (float) – Pixel to nm scaling.

  • min_skeleton_size (int) – Minimum size of grain in pixels after skeletonisation.

  • skeletonisation_method (str) – Method of skeletonisation, options are ‘zhang’ (scikit-image) / ‘lee’ (scikit-image) / ‘thin’ (scikit-image) or ‘topostats’ (original TopoStats method)

  • spline_step_size (float) – Step size for spline evaluation in metres. Defaults to 7e-9.

  • spline_circular_smoothing (float) – Smoothness of circular splines. Defaults to 0.0.

  • spline_linear_smoothing (float) – Smoothness of linear splines. Defaults to 5.0.

  • n_grain (int) – Grain number being processed.

Returns:

Dictionary of the contour length, whether the molecule is circular or linear, the end-to-end distance and an array of coordinates.

Return type:

Dict

topostats.tracing.dnatracing.crop_array(array: numpy.ndarray, bounding_box: tuple, pad_width: int = 0) numpy.ndarray#

Crop an array.

Ideally we pad the array that is being cropped so that we have heights outside of the grain's bounding box. However, in some cases, if a grain is near the edge of the image scan, this results in requesting indexes outside of the existing image. In that case we get as much of the image padded as possible.

Parameters:
  • array (np.ndarray) – 2D Numpy array to be cropped.

  • bounding_box (Tuple) – Tuple of coordinates to crop, should be of form (min_row, min_col, max_row, max_col).

  • pad_width (int) – Padding to apply to bounding box.

Returns:

Cropped array

Return type:

np.ndarray
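
A minimal sketch of padded cropping with clipping at the image edges is shown below. It assumes the bounding box is half-open, i.e. max_row and max_col are exclusive as in skimage.measure.regionprops(); the function name `crop` is hypothetical.

```python
import numpy as np


def crop(array, bounding_box, pad_width=0):
    """Crop a 2D array to a padded bounding box, clipped to the array bounds."""
    min_row, min_col, max_row, max_col = bounding_box
    r0 = max(min_row - pad_width, 0)
    c0 = max(min_col - pad_width, 0)
    r1 = min(max_row + pad_width, array.shape[0])
    c1 = min(max_col + pad_width, array.shape[1])
    return array[r0:r1, c0:c1]


array = np.arange(25).reshape(5, 5)
print(crop(array, (1, 1, 3, 3), pad_width=1).shape)  # (4, 4)
print(crop(array, (0, 0, 2, 2), pad_width=2).shape)  # (4, 4), padding clipped at the top-left
```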

topostats.tracing.dnatracing.pad_bounding_box(array_shape: tuple, bounding_box: list, pad_width: int) list#

Pad coordinates; if they extend beyond image boundaries, stop at the boundary.

Parameters:
  • array_shape (tuple) – Shape of original image

  • bounding_box (list) – List of coordinates min_row, min_col, max_row, max_col

  • pad_width (int) – Cells to pad arrays by.

Returns:

List of padded coordinates

Return type:

list
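
The clamping described above can be sketched as follows. The name `pad_box` is hypothetical, and the example assumes that max_row and max_col are exclusive upper bounds, so they are clamped to the array shape.

```python
def pad_box(array_shape, bounding_box, pad_width):
    """Expand a bounding box by pad_width, clamping it to the array bounds."""
    min_row, min_col, max_row, max_col = bounding_box
    return [
        max(min_row - pad_width, 0),
        max(min_col - pad_width, 0),
        min(max_row + pad_width, array_shape[0]),
        min(max_col + pad_width, array_shape[1]),
    ]


print(pad_box((10, 10), [4, 4, 6, 6], 1))  # [3, 3, 7, 7]
print(pad_box((10, 10), [1, 2, 8, 9], 3))  # [0, 0, 10, 10]
```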