topostats.processing#

Functions for processing data.

Attributes#

Classes#

Filters

Class for filtering scans.

Grains

Find grains in an image.

GrainStats

Class for calculating grain stats.

Images

Plots image arrays.

Functions#

get_out_path(→ pathlib.Path)

Add the image path relative to the base directory to the output directory.

save_topostats_file(→ None)

Save a topostats dictionary object to a .topostats (hdf5 format) file.

setup_logger(→ logging.Logger)

Logger setup.

add_pixel_to_nm_to_plotting_config(→ dict)

Add the pixel to nanometre scaling factor to plotting configs.

image_statistics(→ pandas.DataFrame)

Calculate statistics pertaining to the whole image.

trace_image(→ dict)

Processor function for tracing image.

create_empty_dataframe(→ pandas.DataFrame)

Create an empty data frame for returning when no results are found.

run_filters(→ numpy.ndarray | None)

Filter and flatten an image. Optionally plots the results, returning the flattened image.

run_grains(image, pixel_to_nm_scaling, filename, ...)

Identify grains (molecules) and optionally plot the results.

run_grainstats(image, pixel_to_nm_scaling, ...)

Calculate grain statistics for an image and optionally plot the results.

run_dnatracing(image, grain_masks, ...[, results_df])

Trace DNA molecule for the supplied grains adding results to statistics data frames and optionally plot results.

get_out_paths(image_path, base_dir, output_dir, ...)

Determine components of output paths for a given image and plotting config.

process_scan(→ tuple[dict, pandas.DataFrame, dict])

Process a single image, filtering, finding grains and calculating their statistics.

check_run_steps(→ None)

Check options for running steps (Filter, Grain, Grainstats and DNA tracing) are logically consistent.

completion_message(→ None)

Print a completion message summarising images processed.

Module Contents#

topostats.processing.__version__#
class topostats.processing.Filters(image: numpy.typing.NDArray, filename: str, pixel_to_nm_scaling: float, row_alignment_quantile: float = 0.5, threshold_method: str = 'otsu', otsu_threshold_multiplier: float = 1.7, threshold_std_dev: dict = None, threshold_absolute: dict = None, gaussian_size: float = None, gaussian_mode: str = 'nearest', remove_scars: dict = None)[source]#

Class for filtering scans.

Parameters:
  • image (npt.NDArray) – The raw image from the Atomic Force Microscopy machine.

  • filename (str) – The filename (used in logging only).

  • pixel_to_nm_scaling (float) – Value for converting pixels to nanometers.

  • row_alignment_quantile (float) – Quantile (0.0 to 1.0) used to determine the average background of the image. Lower values may improve flattening of large features.

  • threshold_method (str) – Method for thresholding, default ‘otsu’, valid options ‘otsu’, ‘std_dev’ and ‘absolute’.

  • otsu_threshold_multiplier (float) – Value for scaling the derived Otsu threshold.

  • threshold_std_dev (dict) – If using the ‘std_dev’ threshold method. Dictionary that contains above and below threshold values for the number of standard deviations from the mean to threshold.

  • threshold_absolute (dict) – If using the ‘absolute’ threshold method. Dictionary that contains above and below absolute threshold values for flattening.

  • gaussian_size (float) – Size of the Gaussian filter applied to the image (see gaussian_filter).

  • gaussian_mode (str) – Method passed to ‘skimage.filters.gaussian(mode = gaussian_mode)’.

  • remove_scars (dict) – Dictionary containing configuration parameters for the scar removal function.

median_flatten(image: numpy.typing.NDArray, mask: numpy.typing.NDArray = None, row_alignment_quantile: float = 0.5) numpy.typing.NDArray[source]#

Flatten images using median differences.

Flatten the rows of an image, aligning the rows and centering the median around zero. When used with a mask, this has the effect of centering the background data on zero.

Note this function does not handle scars.

Parameters:
  • image (npt.NDArray) – 2-D image of the data to align the rows of.

  • mask (npt.NDArray) – Boolean array of points to mask (ignore).

  • row_alignment_quantile (float) – Quantile (in the range 0.0 to 1.0) used for defining the average background.

Returns:

Copy of the input image with rows aligned.

Return type:

npt.NDArray
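
The row alignment described above can be illustrated with a minimal NumPy sketch. It is not the library's implementation: the function name `median_flatten_sketch` is hypothetical, and treating masked pixels as NaN before taking the quantile is an assumption made for the example.

```python
import numpy as np


def median_flatten_sketch(image, mask=None, row_alignment_quantile=0.5):
    """Align rows by subtracting each row's background quantile (illustrative only)."""
    image = np.asarray(image, dtype=float)
    # Assumption: masked pixels are ignored by turning them into NaN.
    background = image.copy()
    if mask is not None:
        background[np.asarray(mask, dtype=bool)] = np.nan
    # One offset per row: the chosen quantile of the unmasked pixels in that row.
    offsets = np.nanquantile(background, row_alignment_quantile, axis=1)
    # Fully masked rows yield NaN; leave such rows unchanged.
    offsets = np.nan_to_num(offsets, nan=0.0)
    return image - offsets[:, np.newaxis]


rows = np.array([[1.0, 1.0, 9.0, 1.0], [3.0, 3.0, 3.0, 11.0]])
flat = median_flatten_sketch(rows)
```

With the default quantile of 0.5 each row's median ends up at zero, while the tall feature in each row keeps its height relative to the background.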

remove_tilt(image: numpy.typing.NDArray, mask: numpy.typing.NDArray = None) numpy.typing.NDArray[source]#

Remove the planar tilt from an image (linear in 2D spaces).

Uses a linear fit of the medians of the rows and columns to determine the linear slants in x and y directions and then subtracts the fit from the columns.

Parameters:
  • image (npt.NDArray) – 2-D image of the data to remove the planar tilt from.

  • mask (npt.NDArray) – Boolean array of points to mask (ignore).

Returns:

Numpy array of image with tilt removed.

Return type:

npt.NDArray
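
A rough sketch of the idea, assuming no mask and using `numpy.polyfit` for the linear fits. The function name `remove_tilt_sketch` is hypothetical and the library's own fitting details may differ.

```python
import numpy as np


def remove_tilt_sketch(image):
    """Subtract slopes estimated from linear fits to the row and column medians."""
    image = np.asarray(image, dtype=float).copy()
    n_rows, n_cols = image.shape
    x = np.arange(n_cols)
    y = np.arange(n_rows)
    # Slope along x: fit a line to the median of each column.
    slope_x, _ = np.polyfit(x, np.median(image, axis=0), 1)
    # Slope along y: fit a line to the median of each row.
    slope_y, _ = np.polyfit(y, np.median(image, axis=1), 1)
    # Remove only the slopes; any constant offset is left for a later step.
    image -= slope_x * x[np.newaxis, :]
    image -= slope_y * y[:, np.newaxis]
    return image


yy, xx = np.mgrid[0:5, 0:6]
tilted = 2.0 + 0.5 * xx - 0.25 * yy
levelled = remove_tilt_sketch(tilted)
```

For this synthetic plane the result is a constant image of height 2.0, since only the slopes are removed.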

remove_nonlinear_polynomial(image: numpy.typing.NDArray, mask: numpy.typing.NDArray | None = None) numpy.typing.NDArray[source]#

Fit and remove a “saddle” shaped nonlinear polynomial from the image.

Removes “saddles” of the form a + b * x * y - c * x - d * y from the supplied image. AFM images sometimes contain a “saddle”-shaped trend in their background; to remove it, a nonlinear polynomial of x and y is fitted and the fit is subtracted from the image.

If these trends are not removed, then the image will not flatten properly and will leave opposite diagonal corners raised or lowered.

Parameters:
  • image (npt.NDArray) – 2-D numpy height-map array of floats with a polynomial trend to remove.

  • mask (npt.NDArray, optional) – 2-D Numpy boolean array used to mask any points in the image that are deemed not to be part of the height-map’s background data.

Returns:

Image with the polynomial trend subtracted.

Return type:

npt.NDArray
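
As an illustration of fitting and subtracting such a trend, the sketch below uses ordinary least squares on the terms 1, x*y, x and y. This is an assumption made for the example: the library may use a different fitting routine, and the name `remove_saddle_sketch` is hypothetical. The signs of c and d are absorbed into the fitted coefficients, and no mask is applied.

```python
import numpy as np


def remove_saddle_sketch(image):
    """Fit a + b*x*y + c'*x + d'*y by least squares and subtract the fit."""
    image = np.asarray(image, dtype=float)
    rows, cols = np.indices(image.shape)
    x = cols.ravel().astype(float)
    y = rows.ravel().astype(float)
    # Columns of the design matrix correspond to the terms 1, x*y, x and y.
    design = np.column_stack([np.ones_like(x), x * y, x, y])
    coefficients, *_ = np.linalg.lstsq(design, image.ravel(), rcond=None)
    trend = (design @ coefficients).reshape(image.shape)
    return image - trend


yy, xx = np.mgrid[0:6, 0:7]
saddle = 1.0 + 0.1 * xx * yy - 0.3 * xx - 0.2 * yy
residual = remove_saddle_sketch(saddle)
```

Because the synthetic image is exactly of the saddle form, the residual is zero everywhere up to floating-point error.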

remove_quadratic(image: numpy.typing.NDArray, mask: numpy.typing.NDArray = None) numpy.typing.NDArray[source]#

Remove the quadratic bowing that can be seen in some large-scale AFM images.

Uses a simple quadratic fit on the medians of the columns of the image and then subtracts the calculated quadratic from the columns.

Parameters:
  • image (npt.NDArray) – 2-D image of the data to remove the quadratic from.

  • mask (npt.NDArray) – Boolean array of points to mask (ignore).

Returns:

Image with the quadratic bowing removed.

Return type:

npt.NDArray

static calc_diff(array: numpy.typing.NDArray) numpy.typing.NDArray[source]#

Calculate the difference between the last and first rows of a 2-D array.

Parameters:

array (npt.NDArray) – A Numpy array.

Returns:

An array of the difference between the last and first rows of an array.

Return type:

npt.NDArray

calc_gradient(array: numpy.typing.NDArray, shape: int) numpy.typing.NDArray[source]#

Calculate the gradient of an array.

Parameters:
  • array (npt.NDArray) – Array for gradient to be calculated.

  • shape (int) – Shape of the array.

Returns:

Gradient across the array.

Return type:

npt.NDArray

average_background(image: numpy.typing.NDArray, mask: numpy.typing.NDArray = None) numpy.typing.NDArray[source]#

Zero the background by subtracting the non-masked mean from all pixels.

Parameters:
  • image (npt.NDArray) – Numpy array representing the image.

  • mask (npt.NDArray) – Mask of the array, should have the same dimensions as image.

Returns:

Numpy array of image zero averaged.

Return type:

npt.NDArray

gaussian_filter(image: numpy.typing.NDArray, **kwargs) numpy.typing.NDArray[source]#

Apply Gaussian filter to an image.

Parameters:
  • image (npt.NDArray) – Numpy array representing the image.

  • **kwargs – Keyword arguments passed on to the skimage.filters.gaussian() function.

Returns:

Numpy array that represents the image after Gaussian filtering.

Return type:

npt.NDArray

filter_image() None[source]#

Process a single image by running the flattening and filtering steps.

Returns:

Does not return anything.

Return type:

None

Examples

from topostats.io import LoadScan
from topostats.processing import Filters

# load_scan is assumed to be an existing LoadScan instance.
filters = Filters(image=load_scan.image,
                  pixel_to_nm_scaling=load_scan.pixel_to_nm_scaling,
                  filename=load_scan.filename,
                  threshold_method='otsu')
filters.filter_image()

class topostats.processing.Grains(image: numpy.typing.NDArray, filename: str, pixel_to_nm_scaling: float, unet_config: dict[str, str | int | float | tuple[int | None, int, int, int] | None] | None = None, threshold_method: str | None = None, otsu_threshold_multiplier: float | None = None, threshold_std_dev: dict | None = None, threshold_absolute: dict | None = None, absolute_area_threshold: dict | None = None, direction: str | None = None, smallest_grain_size_nm2: float | None = None, remove_edge_intersecting_grains: bool = True)[source]#

Find grains in an image.

Parameters:
  • image (npt.NDArray) – 2-D Numpy array of image.

  • filename (str) – File being processed (used in logging).

  • pixel_to_nm_scaling (float) – Scaling of pixels to nanometres.

  • unet_config (dict[str, str | int | float | tuple[int | None, int, int, int] | None]) –

    Configuration for the UNet model.

    model_path: str

    Path to the UNet model.

    grain_crop_padding: int

    Padding to add to the bounding box of the grain before cropping.

    upper_norm_bound: float

    Upper bound for normalising the image.

    lower_norm_bound: float

    Lower bound for normalising the image.

  • threshold_method (str) – Method for determining threshold to mask values, default is ‘otsu’.

  • otsu_threshold_multiplier (float) – Factor by which the Otsu threshold is to be scaled prior to masking.

  • threshold_std_dev (dict) – Dictionary of ‘below’ and ‘above’ factors by which standard deviation is multiplied to derive the threshold if threshold_method is ‘std_dev’.

  • threshold_absolute (dict) – Dictionary of absolute ‘below’ and ‘above’ thresholds for grain finding.

  • absolute_area_threshold (dict) – Dictionary of above and below grain’s area thresholds.

  • direction (str) – Direction for which grains are to be detected, valid values are ‘above’, ‘below’ and ‘both’.

  • smallest_grain_size_nm2 (float) – Size, in nanometres squared, of the smallest grain to keep. Smaller objects are treated as noise and removed.

  • remove_edge_intersecting_grains (bool) – Whether or not to remove grains that intersect the edge of the image.

tidy_border(image: numpy.typing.NDArray, **kwargs) numpy.typing.NDArray[source]#

Remove grains touching the border.

Parameters:
  • image (npt.NDArray) – 2-D Numpy array representing the image.

  • **kwargs – Arguments passed to ‘skimage.segmentation.clear_border(**kwargs)’.

Returns:

2-D Numpy array of image without objects touching the border.

Return type:

npt.NDArray

static label_regions(image: numpy.typing.NDArray, background: int = 0) numpy.typing.NDArray[source]#

Label regions.

This method is used twice, once prior to removal of small regions and again afterwards which is why an image must be supplied rather than using ‘self’.

Parameters:
  • image (npt.NDArray) – 2-D Numpy array of image.

  • background (int) – Value used to indicate background of image. Default = 0.

Returns:

2-D Numpy array of image with regions numbered.

Return type:

npt.NDArray

calc_minimum_grain_size(image: numpy.typing.NDArray) float[source]#

Calculate the minimum grain size in pixels squared.

Very small objects are first removed via thresholding before calculating the lower extreme.

Parameters:

image (npt.NDArray) – 2-D Numpy image from which to calculate the minimum grain size.

Returns:

Minimum grain size in pixels squared. If there are no areas a value of -1 is returned.

Return type:

float

remove_noise(image: numpy.typing.NDArray, **kwargs) numpy.typing.NDArray[source]#

Remove noise, i.e. objects smaller than ‘smallest_grain_size_nm2’.

This ensures that the smallest objects ~1px are removed regardless of the size distribution of the grains.

Parameters:
  • image (npt.NDArray) – 2-D Numpy array to be cleaned.

  • **kwargs – Arguments passed to ‘skimage.morphology.remove_small_objects(**kwargs)’.

Returns:

2-D Numpy array of image with objects < smallest_grain_size_nm2 removed.

Return type:

npt.NDArray

remove_small_objects(image: numpy.array, **kwargs) numpy.typing.NDArray[source]#

Remove small objects from the input image.

The threshold is determined by the minimum grain size, in pixels squared, from the class's initialisation.

Parameters:
  • image (np.array) – 2-D Numpy array to remove small objects from.

  • **kwargs – Arguments passed to ‘skimage.morphology.remove_small_objects(**kwargs)’.

Returns:

2-D Numpy array of image with objects < minimum_grain_size removed.

Return type:

npt.NDArray

area_thresholding(image: numpy.typing.NDArray, area_thresholds: tuple) numpy.typing.NDArray[source]#

Remove objects larger and smaller than the specified thresholds.

Parameters:
  • image (npt.NDArray) – Image array where the background == 0 and grains are labelled as integers >0.

  • area_thresholds (tuple) – List of area thresholds (in nanometres squared, not pixels squared), first is the lower limit for size, second is the upper.

Returns:

Array with small and large objects removed.

Return type:

npt.NDArray
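
The conversion between pixel counts and areas in nanometres squared is the step most easily got wrong, so a small sketch may help. It assumes each pixel covers `pixel_to_nm_scaling ** 2` square nanometres; the function name and the explicit `pixel_to_nm_scaling` argument are hypothetical (the method itself takes only the image and the thresholds).

```python
import numpy as np


def area_thresholding_sketch(labelled, area_thresholds, pixel_to_nm_scaling):
    """Zero out labelled regions whose area in nm^2 lies outside the thresholds."""
    labelled = np.asarray(labelled).copy()
    lower_nm2, upper_nm2 = area_thresholds
    # Assumption: one pixel covers pixel_to_nm_scaling ** 2 square nanometres.
    pixel_area_nm2 = pixel_to_nm_scaling**2
    labels, counts = np.unique(labelled[labelled > 0], return_counts=True)
    for label, count in zip(labels, counts):
        area_nm2 = count * pixel_area_nm2
        too_small = lower_nm2 is not None and area_nm2 < lower_nm2
        too_large = upper_nm2 is not None and area_nm2 > upper_nm2
        if too_small or too_large:
            labelled[labelled == label] = 0
    return labelled


labelled_grains = np.array([[1, 1, 0, 2], [1, 1, 0, 0], [0, 0, 3, 3]])
# With 2 nm per pixel the areas are 16, 4 and 8 nm^2 for labels 1, 2 and 3.
kept = area_thresholding_sketch(labelled_grains, (6, 12), pixel_to_nm_scaling=2.0)
```

Only label 3 survives here, because its area of 8 nm² is the only one between the lower limit of 6 nm² and the upper limit of 12 nm².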

colour_regions(image: numpy.typing.NDArray, **kwargs) numpy.typing.NDArray[source]#

Colour the regions.

Parameters:
  • image (npt.NDArray) – 2-D array of labelled regions to be coloured.

  • **kwargs – Arguments passed to ‘skimage.color.label2rgb(**kwargs)’.

Returns:

Numpy array of image with objects coloured.

Return type:

np.array

static get_region_properties(image: numpy.array, **kwargs) list[source]#

Extract the properties of each region.

Parameters:
  • image (np.array) – Numpy array representing image.

  • **kwargs – Arguments passed to ‘skimage.measure.regionprops(**kwargs)’.

Returns:

List of region property objects.

Return type:

list

get_bounding_boxes(direction: str) dict[source]#

Derive a list of bounding boxes for each region from the derived region_properties.

Parameters:

direction (str) – Direction of threshold for which bounding boxes are being calculated.

Returns:

Dictionary of bounding boxes indexed by region area.

Return type:

dict

find_grains()[source]#

Find grains.

static improve_grain_segmentation_unet(filename: str, direction: str, unet_config: dict[str, str | int | float | tuple[int | None, int, int, int] | None], image: numpy.typing.NDArray, labelled_grain_regions: numpy.typing.NDArray) tuple[numpy.typing.NDArray, numpy.typing.NDArray][source]#

Use a UNet model to re-segment existing grains to improve their accuracy.

Parameters:
  • filename (str) – File being processed (used in logging).

  • direction (str) – Direction of threshold for which bounding boxes are being calculated.

  • unet_config (dict[str, str | int | float | tuple[int | None, int, int, int] | None]) –

    Configuration for the UNet model.

    model_path: str

    Path to the UNet model.

    grain_crop_padding: int

    Padding to add to the bounding box of the grain before cropping.

    upper_norm_bound: float

    Upper bound for normalising the image.

    lower_norm_bound: float

    Lower bound for normalising the image.

  • image (npt.NDArray) – 2-D Numpy array of image.

  • labelled_grain_regions (npt.NDArray) – 2-D Numpy array of labelled grain regions.

Returns:

  • npt.NDArray – NxNxC Numpy array of the UNet mask.

  • npt.NDArray – NxNxC Numpy array of the labelled regions from the UNet mask.

static keep_largest_labelled_region(labelled_image: numpy.typing.NDArray[numpy.int32]) numpy.typing.NDArray[numpy.bool_][source]#

Keep only the largest region in a labelled image.

Parameters:

labelled_image (npt.NDArray) – 2-D Numpy array of labelled regions.

Returns:

2-D Numpy boolean array of labelled regions with only the largest region.

Return type:

npt.NDArray
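
One possible way to do this with NumPy is to count pixels per label and keep the most frequent non-background label. The sketch below assumes a non-empty image with 0 as background and non-negative integer labels; the name `keep_largest_region_sketch` is hypothetical and the library may implement this differently.

```python
import numpy as np


def keep_largest_region_sketch(labelled_image):
    """Return a boolean mask of the largest labelled region (background is 0)."""
    labelled_image = np.asarray(labelled_image)
    # Count pixels per label; index 0 is the background and is ignored.
    sizes = np.bincount(labelled_image.ravel())
    sizes[0] = 0
    if sizes.max() == 0:
        # No labelled regions at all: return an empty mask.
        return np.zeros(labelled_image.shape, dtype=bool)
    return labelled_image == sizes.argmax()


regions = np.array([[1, 0, 2], [0, 2, 2]])
largest = keep_largest_region_sketch(regions)
```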

class topostats.processing.GrainStats(data: numpy.typing.NDArray, labelled_data: numpy.typing.NDArray, pixel_to_nanometre_scaling: float, direction: str, base_output_dir: str | pathlib.Path, image_name: str = None, edge_detection_method: str = 'binary_erosion', extract_height_profile: bool = False, cropped_size: float = -1, plot_opts: dict = None, metre_scaling_factor: float = 1e-09)[source]#

Class for calculating grain stats.

Parameters:
  • data (npt.NDArray) – 2D Numpy array containing the flattened AFM image. Data in this 2D array is floating point.

  • labelled_data (npt.NDArray) – 2D Numpy array containing all the grain masks in the image. Data in this 2D array is boolean.

  • pixel_to_nanometre_scaling (float) – Floating point value that defines the scaling factor between nanometres and pixels.

  • direction (str) – Direction for which grains have been detected (“above” or “below”).

  • base_output_dir (Path) – Path to the folder that will store the grain stats output images and data.

  • image_name (str) – The name of the file being processed.

  • edge_detection_method (str) – Method used for detecting the edges of grain masks before calculating statistics on them. Do not change unless you know exactly what this is doing. Options: “binary_erosion”, “canny”.

  • extract_height_profile (bool) – Extract the height profile.

  • cropped_size (float) – Length of square side (in nm) to crop grains to.

  • plot_opts (dict) – Plotting options dictionary for the cropped grains.

  • metre_scaling_factor (float) – Multiplier to convert the current length scale to metres. Default: 1e-9 for the usual AFM length scale of nanometres.

static get_angle(point_1: tuple, point_2: tuple) float[source]#

Calculate the angle in radians between two points.

Parameters:
  • point_1 (tuple) – Coordinate vectors for the first point to find the angle between.

  • point_2 (tuple) – Coordinate vectors for the second point to find the angle between.

Returns:

The angle in radians between the two input vectors.

Return type:

float

static is_clockwise(p_1: tuple, p_2: tuple, p_3: tuple) bool[source]#

Determine if three points make a clockwise or counter-clockwise turn.

Parameters:
  • p_1 (tuple) – First point to be used to calculate turn.

  • p_2 (tuple) – Second point to be used to calculate turn.

  • p_3 (tuple) – Third point to be used to calculate turn.

Returns:

Indicator of whether turn is clockwise.

Return type:

bool
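
A common way to decide the direction of a turn is the sign of a 2-D cross product. The sketch below is illustrative only: the name `is_clockwise_sketch`, the sign convention (y axis pointing up) and the treatment of collinear points as "not clockwise" are all assumptions rather than documented behaviour.

```python
def is_clockwise_sketch(p_1, p_2, p_3):
    """Return True when the turn p_1 -> p_2 -> p_3 is clockwise (y axis pointing up)."""
    # z-component of the cross product of (p_2 - p_1) and (p_3 - p_1).
    cross = (p_2[0] - p_1[0]) * (p_3[1] - p_1[1]) - (p_2[1] - p_1[1]) * (p_3[0] - p_1[0])
    # Negative means a right-hand (clockwise) turn; zero means the points are collinear.
    return cross < 0


left_turn = is_clockwise_sketch((0, 0), (1, 0), (1, 1))
right_turn = is_clockwise_sketch((0, 0), (1, 0), (1, -1))
```

In image coordinates, where the row index increases downwards, the visual sense of the turn is mirrored.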

calculate_stats()[source]#

Calculate the stats of grains in the labelled image.

Returns:

Consists of a pd.DataFrame containing all the grain stats that have been calculated for the labelled image and a list of dictionaries containing grain data to be plotted.

Return type:

tuple

static calculate_points(grain_mask: numpy.typing.NDArray) list[source]#

Convert a 2D boolean array to a list of coordinates.

Parameters:

grain_mask (npt.NDArray) – A 2D numpy array image of a grain. Data in the array must be boolean.

Returns:

A python list containing the coordinates of the pixels in the grain.

Return type:

list

static calculate_edges(grain_mask: numpy.typing.NDArray, edge_detection_method: str) list[source]#

Convert 2D boolean array to list of the coordinates of the edges of the grain.

Parameters:
  • grain_mask (npt.NDArray) – A 2D numpy array image of a grain. Data in the array must be boolean.

  • edge_detection_method (str) – Method used for detecting the edges of grain masks before calculating statistics on them. Do not change unless you know exactly what this is doing. Options: “binary_erosion”, “canny”.

Returns:

List containing the coordinates of the edges of the grain.

Return type:

list

calculate_radius_stats(edges: list, points: list) tuple[float][source]#

Calculate the radius of grains.

The radius in this context is the distance from the centroid to points on the edge of the grain.

Parameters:
  • edges (list) – A 2D python list containing the coordinates of the edges of a grain.

  • points (list) – A 2D python list containing the coordinates of the points in a grain.

Returns:

A tuple of the minimum, maximum, mean and median radius of the grain.

Return type:

tuple[float]
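
The definition of radius used here can be sketched with the standard library alone. The function name `radius_stats_sketch` is hypothetical, and computing the centroid as the mean of all grain pixels is an assumption consistent with the parameter descriptions above.

```python
import math
import statistics


def radius_stats_sketch(edges, points):
    """Return (min, max, mean, median) distance from the centroid to the edge pixels."""
    # Centroid: mean of the coordinates of every pixel in the grain.
    centroid_x = statistics.fmean(p[0] for p in points)
    centroid_y = statistics.fmean(p[1] for p in points)
    # Radius of each edge pixel: Euclidean distance to the centroid.
    radii = [math.hypot(e[0] - centroid_x, e[1] - centroid_y) for e in edges]
    return min(radii), max(radii), statistics.fmean(radii), statistics.median(radii)


# A 3x3 square grain whose edge is every pixel except the centre.
square = [(x, y) for x in range(3) for y in range(3)]
border = [p for p in square if p != (1, 1)]
radius_min, radius_max, radius_mean, radius_median = radius_stats_sketch(border, square)
```

For this grain the four edge midpoints lie at distance 1 from the centroid and the four corners at distance √2.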

static _calculate_centroid(points: numpy.array) tuple[source]#

Calculate the centroid of a bounding box.

Parameters:

points (list) – A 2D python list containing the coordinates of the points in a grain.

Returns:

The coordinates of the centroid.

Return type:

tuple

static _calculate_displacement(edges: numpy.typing.NDArray, centroid: tuple) numpy.typing.NDArray[source]#

Calculate the displacement between the edges and centroid.

Parameters:
  • edges (npt.NDArray) – Coordinates of the edge points.

  • centroid (tuple) – Coordinates of the centroid.

Returns:

Array of displacements.

Return type:

npt.NDArray

static _calculate_radius(displacements: list[list]) numpy.typing.NDArray[source]#

Calculate the radius of each point from the centroid.

Parameters:

displacements (List[list]) – A list of displacements.

Returns:

Array of radii of each point from the centroid.

Return type:

npt.NDArray

convex_hull(edges: list, base_output_dir: pathlib.Path, debug: bool = False) tuple[list, list, list][source]#

Calculate a grain’s convex hull.

Based on the Graham Scan algorithm, which should ideally scale in time as O(n log(n)).

Parameters:
  • edges (list) – A python list containing the coordinates of the edges of the grain.

  • base_output_dir (Path) – Directory to save output to.

  • debug (bool) – Default false. If true, debug information will be displayed to the terminal and plots for the convex hulls and edges will be saved.

Returns:

A hull (list) of the coordinates of each point on the hull. Hull indices providing a way to find the points from the hull inside the edge list that was passed. Simplices (list) of tuples, each representing a simplex of the convex hull; these are sorted in counter-clockwise order.

Return type:

tuple[list, list, list]

calculate_squared_distance(point_2: tuple, point_1: tuple = None) float[source]#

Calculate the squared distance between two points.

Used for distance sorting purposes and therefore does not perform a square root in the interests of efficiency.

Parameters:
  • point_2 (tuple) – The point to find the squared distance to.

  • point_1 (tuple) – Optional - defaults to the starting point defined in the graham_scan() function. The point to find the squared distance from.

Returns:

The squared distance between the two points.

Return type:

float

sort_points(points: list) list[source]#

Sort points in counter-clockwise order of angle made with the starting point.

Parameters:

points (list) – A python list of the coordinates to sort.

Returns:

Points (coordinates) sorted counter-clockwise.

Return type:

list

get_start_point(edges: numpy.typing.NDArray) None[source]#

Determine the index of the bottom most point of the hull when sorted by x-position.

Parameters:

edges (npt.NDArray) – Array of coordinates.

graham_scan(edges: list) tuple[list, list, list][source]#

Construct the convex hull using the Graham Scan algorithm.

Ideally this algorithm will take O( n * log(n) ) time.

Parameters:

edges (list) – A python list of coordinates that make up the edges of the grain.

Returns:

A hull (list) of the coordinates of each point on the hull. Hull indices providing a way to find the points from the hull inside the edge list that was passed. Simplices (list) of tuples, each representing a simplex of the convex hull; these are sorted in counter-clockwise order.

Return type:

tuple[list, list, list]
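
For readers unfamiliar with the algorithm, here is a compact, self-contained Graham scan. It is a generic textbook sketch, not the method's actual code: the helper names are hypothetical, it returns only the hull (not the hull indices or simplices), and it drops collinear points from the hull.

```python
from functools import cmp_to_key


def cross(origin, a, b):
    """z-component of (a - origin) x (b - origin); positive for a counter-clockwise turn."""
    return (a[0] - origin[0]) * (b[1] - origin[1]) - (a[1] - origin[1]) * (b[0] - origin[0])


def squared_distance(a, b):
    """Squared Euclidean distance; no square root is needed for sorting."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def graham_scan_sketch(points):
    """Return the convex hull of 2-D points in counter-clockwise order."""
    unique = list(set(points))
    if len(unique) < 3:
        return sorted(unique)
    # Start from the lowest point (ties broken by the smaller x).
    start = min(unique, key=lambda p: (p[1], p[0]))

    def by_angle(a, b):
        # Smaller angle around the start point first; nearer point first when collinear.
        turn = cross(start, a, b)
        if turn != 0:
            return -1 if turn > 0 else 1
        return squared_distance(start, a) - squared_distance(start, b)

    rest = sorted((p for p in unique if p != start), key=cmp_to_key(by_angle))
    hull = [start]
    for point in rest:
        # Pop points that would make a clockwise or straight turn.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], point) <= 0:
            hull.pop()
        hull.append(point)
    return hull


grain_edges = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1), (1, 0), (0, 1)]
hull = graham_scan_sketch(grain_edges)
```

The sort dominates the running time, which is where the O(n log(n)) behaviour comes from; the scan itself pushes and pops each point at most once.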

static plot(edges: list, convex_hull: list = None, file_path: pathlib.Path = None) None[source]#

Plot and save the coordinates of the edges in the grain and optionally the hull.

Parameters:
  • edges (list) – A list of points to be plotted.

  • convex_hull (list) – Optional argument. A list of points that form the convex hull. Will be plotted with the coordinates if provided.

  • file_path (Path) – Path of the file to save the plot as.

calculate_aspect_ratio(edges: list, hull_simplices: numpy.typing.NDArray, path: pathlib.Path, debug: bool = False) tuple[source]#

Calculate the width, length and aspect ratio of the smallest bounding rectangle of a grain.

Parameters:
  • edges (list) – A python list of coordinates of the edge of the grain.

  • hull_simplices (npt.NDArray) – A 2D numpy array of simplices that the hull is comprised of.

  • path (Path) – Path to the save folder for the grain.

  • debug (bool) – If true, various plots will be saved for diagnostic purposes.

Returns:

The smallest_bounding_width (float) in pixels (not nanometres) of the smallest bounding rectangle for the grain. The smallest_bounding_length (float) in pixels (not nanometres) of the smallest bounding rectangle for the grain. And the aspect_ratio (float), the width divided by the length of the smallest bounding rectangle for the grain. It will always be greater than or equal to 1.

Return type:

tuple

static find_cartesian_extremes(rotated_points: numpy.typing.NDArray) dict[source]#

Find the limits of x and y of rotated points.

Parameters:

rotated_points (npt.NDArray) – 2-D array of rotated points.

Returns:

Dictionary of the x and y min and max.

Return type:

dict

static get_shift(coords: numpy.typing.NDArray, shape: numpy.typing.NDArray) int[source]#

Obtain the coordinate shift to reflect the cropped image box for molecules near the edges of the image.

Parameters:
  • coords (npt.NDArray) – Value representing integer coordinates which may be outside of the image.

  • shape (npt.NDArray) – Array of the shape of an image.

Returns:

Max value of the shift to reflect the cropped region so it stays within the image.

Return type:

np.int64

get_cropped_region(image: numpy.typing.NDArray, length: int, centre: numpy.typing.NDArray) numpy.typing.NDArray[source]#

Crop the image with respect to a given pixel length around the centre coordinates.

Parameters:
  • image (npt.NDArray) – The image array.

  • length (int) – The length (in pixels) of the resultant cropped image.

  • centre (npt.NDArray) – The centre of the object to crop.

Returns:

Cropped array of the image.

Return type:

npt.NDArray

topostats.processing.get_out_path(image_path: str | pathlib.Path = None, base_dir: str | pathlib.Path = None, output_dir: str | pathlib.Path = None) pathlib.Path[source]#

Add the image path relative to the base directory to the output directory.

Parameters:
  • image_path (Path) – The path of the current image.

  • base_dir (Path) – Directory to recursively search for files.

  • output_dir (Path) – The output directory specified in the configuration file.

Returns:

The output path that mirrors the input path structure.

Return type:

Path
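
The path mirroring can be sketched with `pathlib`. This is only an approximation under stated assumptions: the name `get_out_path_sketch` is hypothetical, the image path is assumed to lie inside `base_dir`, and dropping the file extension from the result is an assumption about how the output folder is named.

```python
from pathlib import Path


def get_out_path_sketch(image_path, base_dir, output_dir):
    """Mirror image_path's location under base_dir inside output_dir (illustrative only)."""
    image_path, base_dir, output_dir = Path(image_path), Path(base_dir), Path(output_dir)
    # Path of the image relative to the directory that was searched.
    relative = image_path.relative_to(base_dir)
    # Assumption: the extension is dropped so the result can serve as a per-image folder.
    return output_dir / relative.with_suffix("")


out_path = get_out_path_sketch("data/sub/scan.spm", "data", "out")
```

Note that `Path.relative_to` raises `ValueError` when the image is not located under `base_dir`, so a caller would need to handle that case.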

topostats.processing.save_topostats_file(output_dir: pathlib.Path, filename: str, topostats_object: dict) None[source]#

Save a topostats dictionary object to a .topostats (hdf5 format) file.

Parameters:
  • output_dir (Path) – Directory to save the .topostats file in.

  • filename (str) – File name of the .topostats file.

  • topostats_object (dict) – Dictionary of the topostats data to save. Must include a flattened image and pixel to nanometre scaling factor. May also include grain masks.

topostats.processing.LOGGER_NAME = 'topostats'#
topostats.processing.setup_logger(log_name: str = LOGGER_NAME) logging.Logger[source]#

Logger setup.

The logger for the module is initialised when the module is loaded (as this function is called from __init__.py). This creates two stream handlers, one for general output and one for errors, which are formatted differently (the error formatter carries more information). To use it in modules, import ‘LOGGER_NAME’ and create a logger as shown in the Examples; it will inherit the formatting and the direction of messages to the correct stream.

Parameters:

log_name (str) – Name under which logging information occurs.

Returns:

Logger object.

Return type:

logging.Logger

Examples

To use the logger in (sub-)modules, add the following.

import logging
from topostats.logs.logs import LOGGER_NAME

LOGGER = logging.getLogger(LOGGER_NAME)

LOGGER.info('This is a log message.')
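
The two-handler arrangement described above might look roughly like the following standard-library sketch. The logger name `topostats_sketch`, the format strings and the choice of stdout and stderr are assumptions made for illustration, not the library's actual configuration.

```python
import logging
import sys

SKETCH_LOGGER_NAME = "topostats_sketch"  # hypothetical name used only in this example


def setup_logger_sketch(log_name=SKETCH_LOGGER_NAME):
    """Create a logger with separate handlers for general output and for errors."""
    logger = logging.getLogger(log_name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    if logger.handlers:
        return logger  # already configured, avoid adding duplicate handlers

    # General output: records below ERROR go to stdout with a short format.
    out_handler = logging.StreamHandler(sys.stdout)
    out_handler.addFilter(lambda record: record.levelno < logging.ERROR)
    out_handler.setFormatter(logging.Formatter("[%(asctime)s] <%(name)s> %(levelname)s: %(message)s"))

    # Errors: go to stderr with a more detailed format.
    err_handler = logging.StreamHandler(sys.stderr)
    err_handler.setLevel(logging.ERROR)
    err_handler.setFormatter(
        logging.Formatter("[%(asctime)s] <%(name)s> %(levelname)s (%(filename)s:%(lineno)d): %(message)s")
    )

    logger.addHandler(out_handler)
    logger.addHandler(err_handler)
    return logger


logger = setup_logger_sketch()
# Any module asking for the same name receives the already configured logger.
module_logger = logging.getLogger(SKETCH_LOGGER_NAME)
module_logger.info("This is a log message.")
```

Because `logging.getLogger` returns the same object for the same name, modules do not need to repeat the handler setup.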

class topostats.processing.Images(data: numpy.array, output_dir: str | pathlib.Path, filename: str, style: str | pathlib.Path = None, pixel_to_nm_scaling: float = 1.0, masked_array: numpy.array = None, title: str = None, image_type: str = 'non-binary', image_set: str = 'core', core_set: bool = False, pixel_interpolation: str | None = None, cmap: str | None = None, mask_cmap: str = 'jet_r', region_properties: dict = None, zrange: list = None, colorbar: bool = True, axes: bool = True, num_ticks: tuple[int | None] = (None, None), save: bool = True, savefig_format: str | None = None, histogram_log_axis: bool = True, histogram_bins: int | None = None, savefig_dpi: str | float | None = None)[source]#

Plots image arrays.

Parameters:
  • data (np.array) – Numpy array to plot.

  • output_dir (Union[str, Path]) – Output directory to save the file to.

  • filename (Union[str, Path]) – Filename to save image as.

  • style (str | Path) – Filename of matplotlibrc parameters.

  • pixel_to_nm_scaling (float) – The scaling factor showing the real length of 1 pixel, in nm.

  • masked_array (npt.NDArray) – Optional mask array to overlay onto an image.

  • title (str) – Title for plot.

  • image_type (str) – The image data type - binary or non-binary.

  • image_set (str) – The set of images to process - core or all.

  • core_set (bool) – Flag to identify image as part of the core image set or not.

  • pixel_interpolation (str | None) – Interpolation to use (default: None).

  • cmap (str) – Colour map to use (default ‘nanoscope’, ‘afmhot’ also available).

  • mask_cmap (str) – Colour map to use for the secondary (masked) data (default ‘jet_r’, ‘blu’ provides more contrast).

  • region_properties (dict) – Dictionary of region properties, adds bounding boxes if specified.

  • zrange (list) – Lower and upper bound to clip core images to.

  • colorbar (bool) – Optionally add a colorbar to plots, default is True.

  • axes (bool) – Optionally add/remove axes from the image.

  • num_ticks (tuple[int | None]) – The number of x and y ticks to display on the image.

  • save (bool) – Whether to save the image.

  • savefig_format (str) – Format to save the image as.

  • histogram_log_axis (bool) – Optionally use a logarithmic y axis for the histogram plots.

  • histogram_bins (int) – Number of bins for histograms to use.

  • savefig_dpi (str | float | None) – The resolution of the saved plot (default ‘figure’).

plot_histogram_and_save() tuple | None[source]#

Plot and save a histogram of the height map.

Returns:

Matplotlib.pyplot figure object and Matplotlib.pyplot axes object.

Return type:

tuple | None

plot_and_save()[source]#

Plot and save the image.

Returns:

Matplotlib.pyplot figure object and Matplotlib.pyplot axes object.

Return type:

tuple

save_figure()[source]#

Save figures as plt.savefig objects.

Returns:

Matplotlib.pyplot figure object and Matplotlib.pyplot axes object.

Return type:

tuple

topostats.processing.add_pixel_to_nm_to_plotting_config(plotting_config: dict, pixel_to_nm_scaling: float) dict[source]#

Add the pixel to nanometre scaling factor to plotting configs.

Ensures plots are in nanometres and not pixels.

Parameters:
  • plotting_config (dict) – TopoStats plotting configuration dictionary.

  • pixel_to_nm_scaling (float) – Pixel to nanometre scaling factor for the image.

Returns:

Updated plotting config with the pixel to nanometre scaling factor applied to all the image configurations.

Return type:

dict

topostats.processing.image_statistics(image: numpy.ndarray, filename: str, pixel_to_nm_scaling: float, results_df: pandas.DataFrame) pandas.DataFrame[source]#

Calculate statistics pertaining to the whole image.

Calculates the size of the image in pixels and metres, the root-mean-squared roughness and the grains per metre squared.

Parameters:
  • image (np.ndarray) – Numpy 2D image array of the image to calculate stats for.

  • filename (str) – The name of the file being processed.

  • pixel_to_nm_scaling (float) – Float of the scaling factor between pixels and nanometres.

  • results_df (pd.DataFrame) – Pandas DataFrame of statistics pertaining to individual grains including from grainstats and dna tracing.

Returns:

DataFrame of image statistics.

Return type:

pd.DataFrame
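The quantities listed above can be sketched with plain Python. This is an illustrative calculation only: it assumes the scaling factor is expressed in nanometres per pixel (as the parameter name suggests), uses a nested list in place of the NumPy array, and takes the grain count directly rather than deriving it from `results_df`. The helper name `whole_image_stats` and the dictionary keys are hypothetical.

```python
import math


def whole_image_stats(image: list[list[float]], pixel_to_nm_scaling: float, n_grains: int) -> dict:
    """Compute whole-image statistics for a 2D height map given as nested lists."""
    rows, cols = len(image), len(image[0])
    # Assumption: the scaling factor is nanometres per pixel, so multiply, then convert nm to m.
    size_x_m = cols * pixel_to_nm_scaling * 1e-9
    size_y_m = rows * pixel_to_nm_scaling * 1e-9
    heights = [z for row in image for z in row]
    # Root-mean-squared roughness: square root of the mean of the squared heights.
    rms_roughness = math.sqrt(sum(z * z for z in heights) / len(heights))
    return {
        "image_size_x_px": cols,
        "image_size_y_px": rows,
        "image_size_x_m": size_x_m,
        "image_size_y_m": size_y_m,
        "rms_roughness": rms_roughness,
        "grains_per_m2": n_grains / (size_x_m * size_y_m),
    }


stats = whole_image_stats([[3.0, 4.0], [3.0, 4.0]], pixel_to_nm_scaling=2.0, n_grains=4)
print(stats)
```

The roughness is reported in whatever height units the image uses; only the lateral sizes are converted to metres here.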

topostats.processing.trace_image(image: numpy.typing.NDArray, grains_mask: numpy.typing.NDArray, filename: str, pixel_to_nm_scaling: float, min_skeleton_size: int, skeletonisation_method: str, spline_step_size: float = 7e-09, spline_linear_smoothing: float = 5.0, spline_circular_smoothing: float = 0.0, pad_width: int = 1, cores: int = 1) dict[source]#

Processor function for tracing image.

Parameters:
  • image (npt.NDArray) – Full image as Numpy Array.

  • grains_mask (npt.NDArray) – Labelled grains mask covering the full image.

  • filename (str) – File being processed.

  • pixel_to_nm_scaling (float) – Pixel to nm scaling.

  • min_skeleton_size (int) – Minimum size of grain in pixels after skeletonisation.

  • skeletonisation_method (str) – Method of skeletonisation, options are ‘zhang’ (scikit-image) / ‘lee’ (scikit-image) / ‘thin’ (scikit-image) or ‘topostats’ (original TopoStats method).

  • spline_step_size (float) – Step size for spline evaluation in metres.

  • spline_linear_smoothing (float) – Smoothness of linear splines.

  • spline_circular_smoothing (float) – Smoothness of circular splines.

  • pad_width (int) – Number of cells to pad arrays by, required to handle instances where grains touch the bounding box edges.

  • cores (int) – Number of cores to process with.

Returns:

Statistics from skeletonising and tracing the grains in the image.

Return type:

dict
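The role of `pad_width` can be illustrated with a small sketch: a grain whose mask touches the edge of its bounding box gains a border of background cells after padding. This is not the library's code; it uses nested lists and a hypothetical helper `pad_mask`, whereas with NumPy arrays `numpy.pad` does the equivalent job.

```python
def pad_mask(mask: list[list[int]], pad_width: int = 1) -> list[list[int]]:
    """Surround a 2D mask with ``pad_width`` rows and columns of zeros."""
    padded_cols = len(mask[0]) + 2 * pad_width
    # Rows of pure background added above and below the original mask.
    top = [[0] * padded_cols for _ in range(pad_width)]
    bottom = [[0] * padded_cols for _ in range(pad_width)]
    # Original rows with background cells added to the left and right.
    middle = [[0] * pad_width + list(row) + [0] * pad_width for row in mask]
    return top + middle + bottom


# A grain that fills its bounding box, so every foreground cell touches an edge.
cropped_grain = [[1, 1], [1, 1]]
padded_grain = pad_mask(cropped_grain, pad_width=1)
for row in padded_grain:
    print(row)
```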

topostats.processing.create_empty_dataframe(columns: set = ALL_STATISTICS_COLUMNS, index: str = 'molecule_number') pandas.DataFrame[source]#

Create an empty data frame for returning when no results are found.

Parameters:
  • columns (set) – Columns of the empty dataframe.

  • index (str) – Column to set as index of empty dataframe.

Returns:

Empty Pandas DataFrame.

Return type:

pd.DataFrame

topostats.processing.LOGGER#
topostats.processing.run_filters(unprocessed_image: numpy.ndarray, pixel_to_nm_scaling: float, filename: str, filter_out_path: pathlib.Path, core_out_path: pathlib.Path, filter_config: dict, plotting_config: dict) numpy.ndarray | None[source]#

Filter and flatten an image. Optionally plots the results, returning the flattened image.

Parameters:
  • unprocessed_image (np.ndarray) – Image to be flattened.

  • pixel_to_nm_scaling (float) – Scaling factor for converting pixel length scales to nanometres, i.e. the number of pixels per nanometre.

  • filename (str) – File name for the image.

  • filter_out_path (Path) – Output directory for step-by-step flattening plots.

  • core_out_path (Path) – General output directory for outputs such as the flattened image.

  • filter_config (dict) – Dictionary of configuration for the Filters class to use when initialised.

  • plotting_config (dict) – Dictionary of configuration for plotting output images.

Returns:

Either a numpy array of the flattened image, or None if an error occurs or flattening is disabled in the configuration.

Return type:

Union[np.ndarray, None]

topostats.processing.run_grains(image: numpy.ndarray, pixel_to_nm_scaling: float, filename: str, grain_out_path: pathlib.Path, core_out_path: pathlib.Path, plotting_config: dict, grains_config: dict)[source]#

Identify grains (molecules) and optionally plots the results.

Parameters:
  • image (np.ndarray) – 2d numpy array image to find grains in.

  • pixel_to_nm_scaling (float) – Scaling factor for converting pixel length scales to nanometres, i.e. the number of pixels per nanometre.

  • filename (str) – Name of file being processed (used in logging).

  • grain_out_path (Path) – Output path for step-by-step grain finding plots.

  • core_out_path (Path) – General output directory for outputs such as the flattened image with grain masks overlaid.

  • plotting_config (dict) – Dictionary of configuration for plotting images.

  • grains_config (dict) – Dictionary of configuration for the Grains class to use when initialised.

Returns:

Either None if an error occurs or grain finding is disabled, or a dictionary with keys “above” and/or “below” containing binary masks depicting where grains have been detected.

Return type:

Union[dict, None]

topostats.processing.run_grainstats(image: numpy.ndarray, pixel_to_nm_scaling: float, grain_masks: dict, filename: str, grainstats_config: dict, plotting_config: dict, grain_out_path: pathlib.Path)[source]#

Calculate grain statistics for an image and optionally plots the results.

Parameters:
  • image (np.ndarray) – 2D numpy array image for grain statistics calculations.

  • pixel_to_nm_scaling (float) – Scaling factor for converting pixel length scales to nanometres, i.e. the number of pixels per nanometre.

  • grain_masks (dict) – Dictionary of grain masks, keys “above” or “below” with values of 2d numpy boolean arrays indicating the pixels that have been masked as grains.

  • filename (str) – Name of the image.

  • grainstats_config (dict) – Dictionary of configuration for the GrainStats class to be used when initialised.

  • plotting_config (dict) – Dictionary of configuration for plotting images.

  • grain_out_path (Path) – Directory to save optional grain statistics visual information to.

Returns:

A pandas DataFrame containing the statistics for each grain. The index is the filename and grain number.

Return type:

pd.DataFrame

topostats.processing.run_dnatracing(image: numpy.ndarray, grain_masks: dict, pixel_to_nm_scaling: float, image_path: pathlib.Path, filename: str, core_out_path: pathlib.Path, grain_out_path: pathlib.Path, dnatracing_config: dict, plotting_config: dict, results_df: pandas.DataFrame = None)[source]#

Trace DNA molecule for the supplied grains adding results to statistics data frames and optionally plot results.

Parameters:
  • image (np.ndarray) – Image containing the DNA to pass to the dna tracing function.

  • grain_masks (dict) – Dictionary of grain masks, keys “above” or “below” with values of 2d numpy boolean arrays indicating the pixels that have been masked as grains.

  • pixel_to_nm_scaling (float) – Scaling factor for converting pixel length scales to nanometres, i.e. the number of pixels per nanometre.

  • image_path (Path) – Path to the image file. Used for DataFrame indexing.

  • filename (str) – Name of the image.

  • core_out_path (Path) – General output directory for outputs such as the grain statistics DataFrame.

  • grain_out_path (Path) – Directory to save optional dna tracing visual information to.

  • dnatracing_config (dict) – Dictionary configuration for the dna tracing function.

  • plotting_config (dict) – Dictionary configuration for plotting images.

  • results_df (pd.DataFrame) – Pandas DataFrame containing grain statistics.

Returns:

Pandas DataFrame containing grain statistics and dna tracing statistics. Keys are file path and molecule number.

Return type:

pd.DataFrame

topostats.processing.get_out_paths(image_path: pathlib.Path, base_dir: pathlib.Path, output_dir: pathlib.Path, filename: str, plotting_config: dict)[source]#

Determine components of output paths for a given image and plotting config.

Parameters:
  • image_path (Path) – Path of the image being processed.

  • base_dir (Path) – Path of the data folder.

  • output_dir (Path) – Base output directory for output data.

  • filename (str) – Name of the image being processed.

  • plotting_config (dict) – Dictionary of configuration for plotting images.

Returns:

Core output path for general file outputs, filter output path for flattening related files and grain output path for grain finding related files.

Return type:

tuple
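One way the three output paths might be composed is sketched below. The directory names `processed`, `filters` and `grains`, and the helper name `sketch_out_paths`, are assumptions for illustration; the influence of `plotting_config` on the result is omitted. `PurePosixPath` is used so the example does not touch the filesystem.

```python
from pathlib import PurePosixPath


def sketch_out_paths(
    image_path: PurePosixPath,
    base_dir: PurePosixPath,
    output_dir: PurePosixPath,
    filename: str,
) -> tuple[PurePosixPath, PurePosixPath, PurePosixPath]:
    """Mirror the image's location under base_dir inside output_dir."""
    # Directory of the image relative to the data folder, e.g. "day1".
    relative_parent = image_path.relative_to(base_dir).parent
    # Assumed layout: general outputs in "processed", per-image plots in sub-directories.
    core_out_path = output_dir / relative_parent / "processed"
    filter_out_path = core_out_path / filename / "filters"
    grain_out_path = core_out_path / filename / "grains"
    return core_out_path, filter_out_path, grain_out_path


core, filters, grains = sketch_out_paths(
    image_path=PurePosixPath("/data/day1/minicircle.spm"),
    base_dir=PurePosixPath("/data"),
    output_dir=PurePosixPath("/results"),
    filename="minicircle",
)
print(core, filters, grains, sep="\n")
```

Mirroring the input directory structure in the output directory keeps results from images with the same file name in different folders from colliding.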

topostats.processing.process_scan(topostats_object: dict, base_dir: str | pathlib.Path, filter_config: dict, grains_config: dict, grainstats_config: dict, dnatracing_config: dict, plotting_config: dict, output_dir: str | pathlib.Path = 'output') tuple[dict, pandas.DataFrame, dict][source]#

Process a single image, filtering, finding grains and calculating their statistics.

Parameters:
  • topostats_object (dict[str, Union[np.ndarray, Path, float]]) – A dictionary with keys ‘image’, ‘img_path’ and ‘px_2_nm’ containing a file or frame’s image, its path and its pixel to nanometre scaling value.

  • base_dir (Union[str, Path]) – Directory to recursively search for files. If not specified, the current directory is scanned.

  • filter_config (dict) – Dictionary of configuration options for running the Filter stage.

  • grains_config (dict) – Dictionary of configuration options for running the Grain detection stage.

  • grainstats_config (dict) – Dictionary of configuration options for running the Grain Statistics stage.

  • dnatracing_config (dict) – Dictionary of configuration options for running the DNA Tracing stage.

  • plotting_config (dict) – Dictionary of configuration options for plotting figures.

  • output_dir (Union[str, Path]) – Directory to save output to. It is created if it does not exist; if it already exists, existing output may be overwritten.

Returns:

TopoStats dictionary object, DataFrame containing grain statistics and dna tracing statistics, and dictionary containing general image statistics.

Return type:

tuple[dict, pd.DataFrame, dict]

topostats.processing.check_run_steps(filter_run: bool, grains_run: bool, grainstats_run: bool, dnatracing_run: bool) None[source]#

Check options for running steps (Filter, Grain, Grainstats and DNA tracing) are logically consistent.

This checks that earlier steps required are enabled.

Parameters:
  • filter_run (bool) – Flag for running Filtering.

  • grains_run (bool) – Flag for running Grains.

  • grainstats_run (bool) – Flag for running GrainStats.

  • dnatracing_run (bool) – Flag for running DNA Tracing.
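The consistency rule can be sketched as follows, assuming the four stages form a strictly linear chain in which every enabled stage needs all earlier stages enabled. The helper `find_inconsistent_steps` is hypothetical and returns the problems as a list of messages so they are easy to inspect; the documented function returns None.

```python
def find_inconsistent_steps(
    filter_run: bool, grains_run: bool, grainstats_run: bool, dnatracing_run: bool
) -> list[str]:
    """List every enabled step that depends on a disabled earlier step."""
    # Steps in the order they are assumed to run.
    steps = [
        ("filter", filter_run),
        ("grains", grains_run),
        ("grainstats", grainstats_run),
        ("dnatracing", dnatracing_run),
    ]
    problems = []
    for position, (name, enabled) in enumerate(steps):
        if not enabled:
            continue
        # An enabled step needs every step before it to be enabled too.
        for earlier_name, earlier_enabled in steps[:position]:
            if not earlier_enabled:
                problems.append(f"{name} is enabled but requires {earlier_name}, which is disabled")
    return problems


print(find_inconsistent_steps(True, True, True, True))
print(find_inconsistent_steps(True, False, True, False))
```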

topostats.processing.completion_message(config: dict, img_files: list, summary_config: dict, images_processed: int) None[source]#

Print a completion message summarising images processed.

Parameters:
  • config (dict) – Configuration dictionary.

  • img_files (list) – List of found image paths.

  • summary_config (dict) – Configuration for plotting summary statistics.

  • images_processed (int) – Number of images processed.