topostats.processing

Functions for processing data.

Attributes

Functions

run_filters(→ numpy.typing.NDArray | None)

Filter and flatten an image. Optionally plots the results, returning the flattened image.

run_grains(→ dict | None)

Identify grains (molecules) and optionally plot the results.

run_grainstats(image, pixel_to_nm_scaling, ...)

Calculate grain statistics for an image and optionally plot the results.

run_disordered_tracing(→ dict)

Skeletonise and prune grains, adding results to statistics data frames and optionally plot results.

run_nodestats(→ tuple[dict, pandas.DataFrame])

Analyse crossing points in grains, adding results to statistics data frames and optionally plot results.

run_ordered_tracing(→ tuple)

Order coordinates of traces, adding results to statistics data frames and optionally plot results.

run_splining(→ tuple)

Smooth the ordered trace coordinates, adding results to statistics data frames and optionally plot results.

run_curvature_stats(→ dict | None)

Calculate curvature statistics for the traced DNA molecules.

get_out_paths(image_path, base_dir, output_dir, ...)

Determine components of output paths for a given image and plotting config.

process_scan(→ tuple[dict, pandas.DataFrame, dict])

Process a single image, filtering, finding grains and calculating their statistics.

check_run_steps(→ None)

Check options for running steps (Filter, Grain, Grainstats and DNA tracing) are logically consistent.

completion_message(→ None)

Print a completion message summarising images processed.

Module Contents

topostats.processing.LOGGER
topostats.processing.run_filters(unprocessed_image: numpy.typing.NDArray, pixel_to_nm_scaling: float, filename: str, filter_out_path: pathlib.Path, core_out_path: pathlib.Path, filter_config: dict, plotting_config: dict) → numpy.typing.NDArray | None

Filter and flatten an image. Optionally plots the results, returning the flattened image.

Parameters:
  • unprocessed_image (npt.NDArray) – Image to be flattened.

  • pixel_to_nm_scaling (float) – Scaling factor for converting pixel length scales to nanometres, i.e. the number of pixels per nanometre.

  • filename (str) – File name for the image.

  • filter_out_path (Path) – Output directory for step-by-step flattening plots.

  • core_out_path (Path) – General output directory for outputs such as the flattened image.

  • filter_config (dict) – Dictionary of configuration for the Filters class to use when initialised.

  • plotting_config (dict) – Dictionary of configuration for plotting output images.

Returns:

Either a numpy array of the flattened image, or None if an error occurs or flattening is disabled in the configuration.

Return type:

npt.NDArray | None

topostats.processing.run_grains(image: numpy.typing.NDArray, pixel_to_nm_scaling: float, filename: str, grain_out_path: pathlib.Path, core_out_path: pathlib.Path, plotting_config: dict, grains_config: dict) → dict | None

Identify grains (molecules) and optionally plot the results.

Parameters:
  • image (npt.NDArray) – 2D numpy array image to find grains in.

  • pixel_to_nm_scaling (float) – Scaling factor for converting pixel length scales to nanometres, i.e. the number of pixels per nanometre.

  • filename (str) – Name of file being processed (used in logging).

  • grain_out_path (Path) – Output path for step-by-step grain finding plots.

  • core_out_path (Path) – General output directory for outputs such as the flattened image with grain masks overlaid.

  • plotting_config (dict) – Dictionary of configuration for plotting images.

  • grains_config (dict) – Dictionary of configuration for the Grains class to use when initialised.

Returns:

Either None (on error or when grain finding is disabled) or a dictionary with keys “above” and/or “below” containing binary masks depicting where grains have been detected.

Return type:

dict | None
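
Because the return value may be None, calling code usually needs a guard before touching the masks. The following is a minimal sketch of such a guard, not part of TopoStats: `count_masked_pixels` is a hypothetical helper, and nested lists stand in for the NumPy mask arrays so the example runs without extra dependencies.

```python
from typing import Optional


def count_masked_pixels(grain_masks: Optional[dict]) -> dict:
    """Count masked pixels per direction (hypothetical helper, not TopoStats API)."""
    # run_grains is documented to return None on error or when disabled.
    if grain_masks is None:
        return {}
    counts = {}
    # Only "above" and/or "below" are documented keys; either may be absent.
    for direction in ("above", "below"):
        mask = grain_masks.get(direction)
        if mask is not None:
            counts[direction] = sum(1 for row in mask for value in row if value)
    return counts


# Stand-in for a run_grains result: nested lists instead of NumPy arrays.
example_masks = {"above": [[0, 1, 1], [0, 0, 1]]}
print(count_masked_pixels(example_masks))
print(count_masked_pixels(None))
```

Returning an empty dictionary for None keeps downstream loops simple, at the cost of hiding why no grains were found; a caller that needs to distinguish the two cases should check for None explicitly.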

topostats.processing.run_grainstats(image: numpy.typing.NDArray, pixel_to_nm_scaling: float, grain_masks: dict, filename: str, basename: pathlib.Path, grainstats_config: dict, plotting_config: dict, grain_out_path: pathlib.Path)

Calculate grain statistics for an image and optionally plot the results.

Parameters:
  • image (npt.NDArray) – 2D numpy array image for grain statistics calculations.

  • pixel_to_nm_scaling (float) – Scaling factor for converting pixel length scales to nanometres, i.e. the number of pixels per nanometre.

  • grain_masks (dict) – Dictionary of grain masks, keys “above” or “below” with values of 2D numpy boolean arrays indicating the pixels that have been masked as grains.

  • filename (str) – Name of the image.

  • basename (Path) – Path to directory containing the image.

  • grainstats_config (dict) – Dictionary of configuration for the GrainStats class to be used when initialised.

  • plotting_config (dict) – Dictionary of configuration for plotting images.

  • grain_out_path (Path) – Directory to save optional grain statistics visual information to.

Returns:

A pandas DataFrame containing the statistics for each grain. The index is the filename and grain number.

Return type:

pd.DataFrame

topostats.processing.run_disordered_tracing(image: numpy.typing.NDArray, grain_masks: dict, pixel_to_nm_scaling: float, filename: str, basename: str, core_out_path: pathlib.Path, tracing_out_path: pathlib.Path, disordered_tracing_config: dict, plotting_config: dict, grainstats_df: pandas.DataFrame = None) → dict

Skeletonise and prune grains, adding results to statistics data frames and optionally plot results.

Parameters:
  • image (npt.NDArray) – Image containing the grains to pass to the tracing function.

  • grain_masks (dict) – Dictionary of grain masks, keys “above” or “below” with values of 2D numpy boolean arrays indicating the pixels that have been masked as grains.

  • pixel_to_nm_scaling (float) – Scaling factor for converting pixel length scales to nanometres, i.e. the number of pixels per nanometre (nm).

  • filename (str) – Name of the image.

  • basename (Path) – Path to directory containing the image.

  • core_out_path (Path) – Path to save the core disordered trace image to.

  • tracing_out_path (Path) – Path to save the optional, diagnostic disordered trace images to.

  • disordered_tracing_config (dict) – Dictionary configuration for obtaining a disordered trace representation of the grains.

  • plotting_config (dict) – Dictionary configuration for plotting images.

  • grainstats_df (pd.DataFrame | None) – The grain statistics dataframe to be added to. This optional argument defaults to None in which case an empty grainstats dataframe is created.

Returns:

Dictionary of “grain_<index>” keys and Nx2 coordinate arrays of the disordered grain trace.

Return type:

dict
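
The documented return shape (keys of the form “grain_<index>”, each holding an Nx2 coordinate array) can be consumed as sketched below. This is an illustration under assumptions: `summarise_traces` is a hypothetical helper, and lists of coordinate pairs stand in for the NumPy arrays.

```python
def summarise_traces(traces: dict) -> dict:
    """Map each grain index to its number of trace coordinates (hypothetical helper)."""
    summary = {}
    for key, coords in traces.items():
        # Keys are documented as "grain_<index>"; split off the numeric index.
        index = int(key.split("_", 1)[1])
        # Each entry holds N coordinate pairs, so len() gives N.
        summary[index] = len(coords)
    return summary


# Stand-in data: lists of (row, column) pairs instead of Nx2 NumPy arrays.
example_traces = {
    "grain_0": [(0, 0), (0, 1), (1, 2)],
    "grain_1": [(5, 5), (6, 5)],
}
print(summarise_traces(example_traces))
```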

topostats.processing.run_nodestats(image: numpy.typing.NDArray, disordered_tracing_data: dict, pixel_to_nm_scaling: float, filename: str, core_out_path: pathlib.Path, tracing_out_path: pathlib.Path, nodestats_config: dict, plotting_config: dict, grainstats_df: pandas.DataFrame = None) → tuple[dict, pandas.DataFrame]

Analyse crossing points in grains, adding results to statistics data frames and optionally plot results.

Parameters:
  • image (npt.NDArray) – Image containing the DNA to pass to the tracing function.

  • disordered_tracing_data (dict) – Dictionary of skeletonised and pruned grain masks. Result from “run_disordered_tracing”.

  • pixel_to_nm_scaling (float) – Scaling factor for converting pixel length scales to nanometres, i.e. the number of pixels per nanometre (nm).

  • filename (str) – Name of the image.

  • core_out_path (Path) – Path to save the core NodeStats image to.

  • tracing_out_path (Path) – Path to save optional, diagnostic NodeStats images to.

  • nodestats_config (dict) – Dictionary configuration for analysing the crossing points.

  • plotting_config (dict) – Dictionary configuration for plotting images.

  • grainstats_df (pd.DataFrame | None) – The grain statistics dataframe to be added to. This optional argument defaults to None in which case an empty grainstats dataframe is created.

Returns:

A NodeStats analysis dictionary and grainstats metrics dataframe.

Return type:

tuple[dict, pd.DataFrame]

topostats.processing.run_ordered_tracing(image: numpy.typing.NDArray, disordered_tracing_data: dict, nodestats_data: dict, filename: str, basename: pathlib.Path, core_out_path: pathlib.Path, tracing_out_path: pathlib.Path, ordered_tracing_config: dict, plotting_config: dict, grainstats_df: pandas.DataFrame = None) → tuple

Order coordinates of traces, adding results to statistics data frames and optionally plot results.

Parameters:
  • image (npt.NDArray) – Image containing the DNA to pass to the tracing function.

  • disordered_tracing_data (dict) – Dictionary of skeletonised and pruned grain masks. Result from “run_disordered_tracing”.

  • nodestats_data (dict) – Dictionary of images and statistics from the NodeStats analysis. Result from “run_nodestats”.

  • filename (str) – Name of the image.

  • basename (Path) – The path of the file’s parent directory.

  • core_out_path (Path) – Path to save the core ordered tracing image to.

  • tracing_out_path (Path) – Path to save optional, diagnostic ordered trace images to.

  • ordered_tracing_config (dict) – Dictionary configuration for obtaining an ordered trace representation of the skeletons.

  • plotting_config (dict) – Dictionary configuration for plotting images.

  • grainstats_df (pd.DataFrame | None) – The grain statistics dataframe to be added to. This optional argument defaults to None in which case an empty grainstats dataframe is created.

Returns:

An ordered tracing analysis dictionary and grainstats metrics dataframe.

Return type:

tuple[dict, pd.DataFrame]

topostats.processing.run_splining(image: numpy.typing.NDArray, ordered_tracing_data: dict, pixel_to_nm_scaling: float, filename: str, core_out_path: pathlib.Path, splining_config: dict, plotting_config: dict, grainstats_df: pandas.DataFrame = None, molstats_df: pandas.DataFrame = None) → tuple

Smooth the ordered trace coordinates, adding results to statistics data frames and optionally plot results.

Parameters:
  • image (npt.NDArray) – Image containing the DNA to pass to the tracing function.

  • ordered_tracing_data (dict) – Dictionary of ordered coordinates. Result from “run_ordered_tracing”.

  • pixel_to_nm_scaling (float) – Scaling factor for converting pixel length scales to nanometres, i.e. the number of pixels per nanometre (nm).

  • filename (str) – Name of the image.

  • core_out_path (Path) – Path to save the core splining image to.

  • splining_config (dict) – Dictionary configuration for smoothing (splining) the ordered traces.

  • plotting_config (dict) – Dictionary configuration for plotting images.

  • grainstats_df (pd.DataFrame | None) – The grain statistics dataframe to be added to. This optional argument defaults to None in which case an empty grainstats dataframe is created.

  • molstats_df (pd.DataFrame | None) – The molecule statistics dataframe to be added to. This optional argument defaults to None in which case an empty molecule statistics dataframe is created.

Returns:

A smooth curve analysis dictionary and grainstats metrics dataframe.

Return type:

tuple[dict, pd.DataFrame]

topostats.processing.run_curvature_stats(image: numpy.ndarray, cropped_image_data: dict, grain_trace_data: dict, pixel_to_nm_scaling: float, filename: str, core_out_path: pathlib.Path, tracing_out_path: pathlib.Path, curvature_config: dict, plotting_config: dict) → dict | None

Calculate curvature statistics for the traced DNA molecules.

Currently only works on simple traces, not branched traces.

Parameters:
  • image (np.ndarray) – AFM image, for plotting purposes.

  • cropped_image_data (dict) – Dictionary containing cropped images.

  • grain_trace_data (dict) – Dictionary of grain trace data.

  • pixel_to_nm_scaling (float) – Scaling factor for converting pixel length scales to nanometres, i.e. the number of pixels per nanometre.

  • filename (str) – Name of the image.

  • core_out_path (Path) – Path to save the core curvature image to.

  • tracing_out_path (Path) – Path to save the optional, diagnostic curvature images to.

  • curvature_config (dict) – Dictionary of configuration for running the curvature stats.

  • plotting_config (dict) – Dictionary of configuration for plotting images.

Returns:

Dictionary containing curvature statistics.

Return type:

dict | None

topostats.processing.get_out_paths(image_path: pathlib.Path, base_dir: pathlib.Path, output_dir: pathlib.Path, filename: str, plotting_config: dict)

Determine components of output paths for a given image and plotting config.

Parameters:
  • image_path (Path) – Path of the image being processed.

  • base_dir (Path) – Path of the data folder.

  • output_dir (Path) – Base output directory for output data.

  • filename (str) – Name of the image being processed.

  • plotting_config (dict) – Dictionary of configuration for plotting images.

Returns:

Core output path for general file outputs, filter output path for flattening-related files, and grain output path for grain-finding-related files.

Return type:

tuple
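
One plausible way to derive such path components is to mirror the image's location relative to the data folder underneath the output directory. The sketch below illustrates that idea only; `sketch_out_paths` and the directory names "processed", "filters" and "grains" are assumptions for illustration and may differ from what TopoStats actually produces. The `plotting_config` argument is omitted because the source does not say how it influences the paths.

```python
from pathlib import Path


def sketch_out_paths(image_path: Path, base_dir: Path, output_dir: Path, filename: str) -> tuple:
    """Hypothetical layout: mirror the image's location under output_dir."""
    # Position of the image's parent directory relative to the data folder.
    relative_parent = image_path.parent.relative_to(base_dir)
    # Directory names below are assumed for illustration only.
    core_out_path = output_dir / relative_parent / "processed"
    filter_out_path = core_out_path / filename / "filters"
    grain_out_path = core_out_path / filename / "grains"
    return core_out_path, filter_out_path, grain_out_path


core, filters, grains = sketch_out_paths(
    image_path=Path("data/sample_a/scan01.spm"),
    base_dir=Path("data"),
    output_dir=Path("output"),
    filename="scan01",
)
print(core)
print(filters)
print(grains)
```

Note that `Path.relative_to` raises `ValueError` when `image_path` does not live under `base_dir`, so a caller following this pattern may want to handle that case.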

topostats.processing.process_scan(topostats_object: dict, base_dir: str | pathlib.Path, filter_config: dict, grains_config: dict, grainstats_config: dict, disordered_tracing_config: dict, nodestats_config: dict, ordered_tracing_config: dict, splining_config: dict, curvature_config: dict, plotting_config: dict, output_dir: str | pathlib.Path = 'output') → tuple[dict, pandas.DataFrame, dict]

Process a single image, filtering, finding grains and calculating their statistics.

Parameters:
  • topostats_object (dict[str, Union[npt.NDArray, Path, float]]) – A dictionary with keys ‘image’, ‘img_path’ and ‘pixel_to_nm_scaling’ containing a file or frame’s image, its path and its pixel to nanometre scaling value.

  • base_dir (str | Path) – Directory to recursively search for files; if not specified, the current directory is scanned.

  • filter_config (dict) – Dictionary of configuration options for running the Filter stage.

  • grains_config (dict) – Dictionary of configuration options for running the Grain detection stage.

  • grainstats_config (dict) – Dictionary of configuration options for running the Grain Statistics stage.

  • disordered_tracing_config (dict) – Dictionary configuration for obtaining a disordered trace representation of the grains.

  • nodestats_config (dict) – Dictionary of configuration options for running the NodeStats stage.

  • ordered_tracing_config (dict) – Dictionary configuration for obtaining an ordered trace representation of the skeletons.

  • splining_config (dict) – Dictionary of configuration options for running the splining stage.

  • curvature_config (dict) – Dictionary of configuration options for running the curvature stats stage.

  • plotting_config (dict) – Dictionary of configuration options for plotting figures.

  • output_dir (str | Path) – Directory to save output to; it will be created if it does not exist. If it already exists, existing output may be overwritten.

Returns:

TopoStats dictionary object, DataFrame containing grain statistics and DNA tracing statistics, and dictionary containing general image statistics.

Return type:

tuple[dict, pd.DataFrame, dict]
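
Since `topostats_object` is a plain dictionary with three documented keys, it can be worth checking those keys before starting a long processing run. The sketch below shows one way to do so; `missing_keys` is a hypothetical helper rather than TopoStats API, and a nested list stands in for the NumPy image array.

```python
from pathlib import Path

# Keys documented for the topostats_object argument of process_scan.
REQUIRED_KEYS = ("image", "img_path", "pixel_to_nm_scaling")


def missing_keys(topostats_object: dict) -> list:
    """Return the documented keys absent from the object (hypothetical helper)."""
    return [key for key in REQUIRED_KEYS if key not in topostats_object]


# Stand-in object: a nested list replaces the NumPy image array.
example_object = {
    "image": [[0.0, 0.1], [0.2, 0.3]],
    "img_path": Path("data/sample_a/scan01.spm"),
    "pixel_to_nm_scaling": 0.5,
}
print(missing_keys(example_object))
print(missing_keys({"image": [[0.0]]}))
```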

topostats.processing.check_run_steps(filter_run: bool, grains_run: bool, grainstats_run: bool, disordered_tracing_run: bool, nodestats_run: bool, ordered_tracing_run: bool, splining_run: bool) → None

Check options for running steps (Filter, Grain, Grainstats and DNA tracing) are logically consistent.

This checks that the earlier steps each enabled step depends on are themselves enabled.

Parameters:
  • filter_run (bool) – Flag for running Filtering.

  • grains_run (bool) – Flag for running Grains.

  • grainstats_run (bool) – Flag for running GrainStats.

  • disordered_tracing_run (bool) – Flag for running Disordered Tracing.

  • nodestats_run (bool) – Flag for running NodeStats.

  • ordered_tracing_run (bool) – Flag for running Ordered Tracing.

  • splining_run (bool) – Flag for running Splining.
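
The kind of consistency check described here can be sketched as follows. This is an illustration under a stated assumption, namely that the steps form a single linear chain in the order of the parameters above, so every enabled step needs all earlier steps enabled; the real function may apply different rules and reports problems in its own way (it returns None). `find_inconsistencies` and `STEP_ORDER` are hypothetical names.

```python
# Assumed linear dependency chain, following the parameter order above.
STEP_ORDER = (
    "filter",
    "grains",
    "grainstats",
    "disordered_tracing",
    "nodestats",
    "ordered_tracing",
    "splining",
)


def find_inconsistencies(**run_flags: bool) -> list:
    """Return (enabled_step, disabled_prerequisite) pairs (hypothetical helper)."""
    problems = []
    for position, step in enumerate(STEP_ORDER):
        # Steps that are disabled (or not mentioned) impose no requirements.
        if not run_flags.get(step, False):
            continue
        # Every earlier step in the chain must be enabled too.
        for earlier in STEP_ORDER[:position]:
            if not run_flags.get(earlier, False):
                problems.append((step, earlier))
    return problems


print(find_inconsistencies(filter=True, grains=False, grainstats=True))
print(find_inconsistencies(**{step: True for step in STEP_ORDER}))
```

Returning the offending pairs instead of raising immediately lets a caller report every inconsistency in one pass.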

topostats.processing.completion_message(config: dict, img_files: list, summary_config: dict, images_processed: int) → None

Print a completion message summarising images processed.

Parameters:
  • config (dict) – Configuration dictionary.

  • img_files (list) – List of found image paths.

  • summary_config (dict) – Configuration for plotting summary statistics.

  • images_processed (int) – Number of images processed.
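
A summary of this kind can be built from the list of found files and the integer `images_processed` argument. The sketch below is illustrative only: `sketch_completion_message` is a hypothetical function and its wording is not the message TopoStats prints.

```python
def sketch_completion_message(img_files: list, images_processed: int) -> str:
    """Build a short summary string (hypothetical, not the TopoStats format)."""
    # The number of files found comes from the list of image paths.
    found = len(img_files)
    return f"Processed {images_processed} of {found} image(s) found."


example_files = ["scan01.spm", "scan02.spm", "scan03.spm"]
print(sketch_completion_message(example_files, 2))
```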