topostats.run_topostats#

Run TopoStats.

This provides an entry point for running TopoStats as a command line programme.

Attributes#

Classes#

LoadScans

Load the image and image parameters from a file path.

Functions#

find_files(→ list)

Recursively scan the specified directory for images with the given file extension.

read_yaml(→ dict)

Read a YAML file.

save_folder_grainstats(→ None)

Save a data frame of grain and tracing statistics at the folder level.

write_yaml(→ None)

Write a configuration (stored as a dictionary) to a YAML file.

toposum(→ dict)

Process plotting and summarisation of data.

check_run_steps(→ None)

Check options for running steps (Filter, Grain, Grainstats and DNA tracing) are logically consistent.

completion_message(→ None)

Print a completion message summarising images processed.

process_scan(→ tuple[dict, pandas.DataFrame, dict])

Process a single image, filtering, finding grains and calculating their statistics.

update_config(→ dict)

Update the configuration with any arguments.

update_plotting_config(→ dict)

Update the plotting config for each of the plots in plot_dict.

validate_config(→ None)

Validate configuration.

run_topostats(→ None)

Find and process all files.

Module Contents#

class topostats.run_topostats.LoadScans(img_paths: list[str | pathlib.Path], channel: str)[source]#

Load the image and image parameters from a file path.

Parameters:
  • img_paths (list[str | Path]) – Paths to valid AFM scans to load.

  • channel (str) – Image channel to extract from the scan.

load_spm() tuple[numpy.typing.NDArray, float][source]#

Extract image and pixel to nm scaling from the Bruker .spm file.

Returns:

A tuple containing the image and its pixel to nanometre scaling value.

Return type:

tuple[npt.NDArray, float]

_spm_pixel_to_nm_scaling(channel_data: pySPM.SPM.SPM_image) float[source]#

Extract pixel to nm scaling from the SPM image metadata.

Parameters:

channel_data (pySPM.SPM.SPM_image) – Channel data from PySPM.

Returns:

Pixel to nm scaling factor.

Return type:

float

load_topostats() tuple[numpy.typing.NDArray, float][source]#

Load a .topostats file (hdf5 format).

Loads and extracts the image, pixel to nanometre scaling factor and any grain masks.

Note that grain masks are stored in self.grain_masks rather than returned, so that the return value matches that of the other file loading functions.

Returns:

A tuple containing the image and its pixel to nanometre scaling value.

Return type:

tuple[npt.NDArray, float]

load_asd() tuple[numpy.typing.NDArray, float][source]#

Extract image and pixel to nm scaling from .asd files.

Returns:

A tuple containing the image and its pixel to nanometre scaling value.

Return type:

tuple[npt.NDArray, float]

load_ibw() tuple[numpy.typing.NDArray, float][source]#

Load image from Asylum Research (Igor) .ibw files.

Returns:

A tuple containing the image and its pixel to nanometre scaling value.

Return type:

tuple[npt.NDArray, float]

_ibw_pixel_to_nm_scaling(scan: dict) float[source]#

Extract pixel to nm scaling from the IBW image metadata.

Parameters:

scan (dict) – The loaded binary wave object.

Returns:

A value corresponding to the real length of a single pixel.

Return type:

float

load_jpk() tuple[numpy.typing.NDArray, float][source]#

Load image from JPK Instruments .jpk files.

Returns:

A tuple containing the image and its pixel to nanometre scaling value.

Return type:

tuple[npt.NDArray, float]

static _jpk_pixel_to_nm_scaling(tiff_page: tifffile.tifffile.TiffPage) float[source]#

Extract pixel to nm scaling from the JPK image metadata.

Parameters:

tiff_page (tifffile.tifffile.TiffPage) – An image file directory (IFD) of .jpk files.

Returns:

A value corresponding to the real length of a single pixel.

Return type:

float

static _gwy_read_object(open_file: io.TextIOWrapper, data_dict: dict) None[source]#

Parse and extract data from a .gwy file object, starting at the current open file read position.

Parameters:
  • open_file (io.TextIOWrapper) – An open file object.

  • data_dict (dict) – Dictionary of .gwy file image properties.

static _gwy_read_component(open_file: io.TextIOWrapper, initial_byte_pos: int, data_dict: dict) int[source]#

Parse and extract data from a .gwy file object, starting at the current open file read position.

Parameters:
  • open_file (io.TextIOWrapper) – An open file object.

  • initial_byte_pos (int) – Initial position in the file, in bytes.

  • data_dict (dict) – Dictionary of .gwy file image properties.

Returns:

Size of the component in bytes.

Return type:

int

static _gwy_print_dict(gwy_file_dict: dict, pre_string: str) None[source]#

Recursively print nested dictionary.

Can be used to find labels and values of objects / components in the .gwy file.

Parameters:
  • gwy_file_dict (dict) – Dictionary of the nested object / component structure of a .gwy file.

  • pre_string (str) – Prefix to use when printing string.
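
The recursive printing described above can be sketched as follows. This is only an illustration under stated assumptions, not the TopoStats implementation: the helper name format_nested_dict and the example keys are hypothetical, and the sketch collects lines in a list before printing so that the result is easy to inspect.

```python
from __future__ import annotations


def format_nested_dict(nested: dict, pre_string: str = "") -> list[str]:
    """Return one line per key, indenting nested dictionaries (hypothetical helper)."""
    lines = []
    for key, value in nested.items():
        if isinstance(value, dict):
            lines.append(f"{pre_string}{key}")
            # Recurse with a longer prefix so that children appear indented.
            lines.extend(format_nested_dict(value, pre_string + "  "))
        else:
            lines.append(f"{pre_string}{key}: {value}")
    return lines


# Hypothetical structure; the keys in a real .gwy file will differ.
example = {"container": {"title": "Height", "data": {"xres": 512, "yres": 512}}}
lines = format_nested_dict(example)
print("\n".join(lines))
```

Growing the prefix on each recursive call is what makes the nesting depth visible in the output.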

static _gwy_print_dict_wrapper(gwy_file_dict: dict) None[source]#

Print dictionaries.

This is a wrapper for the _gwy_print_dict() method.

Parameters:

gwy_file_dict (dict) – Dictionary of the nested object / component structure of a .gwy file.

static _gwy_get_channels(gwy_file_structure: dict) dict[source]#

Extract a list of channels and their corresponding dictionary key ids from the .gwy file dictionary.

Parameters:

gwy_file_structure (dict) – Dictionary of the nested object / component structure of a .gwy file, where the keys are object names and the values are dictionaries of the object’s components.

Returns:

Dictionary where the keys are the channel names and the values are the dictionary key ids.

Return type:

dict

Examples

# Using a loaded dictionary generated from a .gwy file:
LoadScans._gwy_get_channels(gwy_file_structure=loaded_gwy_file_dictionary)

load_gwy() tuple[numpy.typing.NDArray, float][source]#

Extract image and pixel to nm scaling from the Gwyddion .gwy file.

Returns:

A tuple containing the image and its pixel to nanometre scaling value.

Return type:

tuple[npt.NDArray, float]

get_data() None[source]#

Extract image, filepath and pixel to nm scaling value, and append these to the img_dict object.

_check_image_size_and_add_to_dict(image: numpy.typing.NDArray, filename: str) None[source]#

Check the image is above a minimum size in both dimensions.

Images that do not meet the minimum size are not included for processing.

Parameters:
  • image (npt.NDArray) – An array of the extracted AFM image.

  • filename (str) – The name of the file.
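
A minimal sketch of the size check described above is given below. The threshold value, the helper name meets_minimum_size and the use of a bare shape tuple in place of an image array are all assumptions made for illustration; the sketch also assumes that an image exactly at the threshold is accepted, which the description does not specify.

```python
from __future__ import annotations

# Hypothetical threshold in pixels; not the value used by TopoStats.
MINIMUM_IMAGE_SIZE = 10


def meets_minimum_size(shape: tuple[int, int], minimum: int = MINIMUM_IMAGE_SIZE) -> bool:
    """Return True when both image dimensions are at least ``minimum`` pixels."""
    rows, cols = shape
    return rows >= minimum and cols >= minimum


# Only images that pass the check are stored for later processing.
img_dict = {}
for filename, shape in {"scan_a": (512, 512), "scan_b": (512, 4)}.items():
    if meets_minimum_size(shape):
        img_dict[filename] = {"shape": shape}
```

Here scan_b is dropped because one of its dimensions falls below the assumed threshold.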

add_to_dict(image: numpy.typing.NDArray, filename: str) None[source]#

Add an image and metadata to the img_dict dictionary under the key filename.

Adds the image and associated metadata, such as any grain masks and the pixel to nanometre scaling factor, to the img_dict dictionary, which stores the image information for processing.

Parameters:
  • image (npt.NDArray) – An array of the extracted AFM image.

  • filename (str) – The name of the file.

topostats.run_topostats.find_files(base_dir: str | pathlib.Path = None, file_ext: str = '.spm') list[source]#

Recursively scan the specified directory for images with the given file extension.

Parameters:
  • base_dir (Union[str, Path]) – Directory to recursively search for files. If not specified, the current directory is scanned.

  • file_ext (str) – File extension to search for.

Returns:

List of files found with the extension in the given directory.

Return type:

list
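
As a rough illustration of the recursive scan described above, the sketch below uses pathlib.Path.rglob. It is not the TopoStats implementation: the helper name find_files_sketch is hypothetical, and the fallback to the current directory simply follows the base_dir description.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def find_files_sketch(base_dir: str | Path | None = None, file_ext: str = ".spm") -> list[Path]:
    """Recursively collect files ending in ``file_ext`` (hypothetical helper)."""
    # Fall back to the current directory when no base directory is given.
    root = Path("./") if base_dir is None else Path(base_dir)
    # rglob walks every sub-directory under root.
    return sorted(path for path in root.rglob(f"*{file_ext}") if path.is_file())


# Small demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "nested").mkdir()
    (Path(tmp) / "a.spm").touch()
    (Path(tmp) / "nested" / "b.spm").touch()
    (Path(tmp) / "nested" / "notes.txt").touch()
    found = [path.name for path in find_files_sketch(tmp, ".spm")]
```

Only the two .spm files are collected; notes.txt is ignored because its extension does not match.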

topostats.run_topostats.read_yaml(filename: str | pathlib.Path) dict[source]#

Read a YAML file.

Parameters:

filename (Union[str, Path]) – YAML file to read.

Returns:

Dictionary of the file.

Return type:

dict

topostats.run_topostats.save_folder_grainstats(output_dir: str | pathlib.Path, base_dir: str | pathlib.Path, all_stats_df: pandas.DataFrame) None[source]#

Save a data frame of grain and tracing statistics at the folder level.

Parameters:
  • output_dir (Union[str, Path]) – Path of the output directory head.

  • base_dir (Union[str, Path]) – Path of the base directory where files were found.

  • all_stats_df (pd.DataFrame) – The dataframe containing all sample statistics run.

Returns:

This only saves the dataframes and does not retain them.

Return type:

None

topostats.run_topostats.write_yaml(config: dict, output_dir: str | pathlib.Path, config_file: str = 'config.yaml', header_message: str = None) None[source]#

Write a configuration (stored as a dictionary) to a YAML file.

Parameters:
  • config (dict) – Configuration dictionary.

  • output_dir (Union[str, Path]) – Directory in which to save the YAML file (named ‘config.yaml’ unless config_file is given).

  • config_file (str) – Filename to write to.

  • header_message (str) – String to write to the header message of the YAML file.

topostats.run_topostats.LOGGER_NAME = 'topostats'#
topostats.run_topostats.toposum(config: dict) dict[source]#

Process plotting and summarisation of data.

Parameters:

config (dict) – Dictionary of summarisation options.

Returns:

Dictionary of nested dictionaries. Each variable has its own dictionary with keys ‘dist’ and ‘violin’, which contain distribution-like plots and violin plots respectively (if the latter are required). Each ‘dist’ and ‘violin’ is itself a dictionary with two elements, ‘figures’ and ‘axes’, which correspond to the Matplotlib ‘fig’ and ‘ax’ for that plot.

Return type:

dict
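
The shape of the returned dictionary can be sketched as below. The variable name "area" is hypothetical, and placeholder strings stand in for the Matplotlib figure and axes objects so that the example runs without any plotting library.

```python
# Placeholder strings stand in for Matplotlib Figure and Axes objects.
plots = {
    "area": {  # hypothetical variable name
        "dist": {"figures": "<Figure>", "axes": "<Axes>"},
        "violin": {"figures": "<Figure>", "axes": "<Axes>"},
    },
}

# Walk the structure: variable -> plot type -> figure/axes pair.
summary = [
    (variable, plot_type, sorted(parts))
    for variable, plot_types in plots.items()
    for plot_type, parts in plot_types.items()
]
```

A caller interested in a single plot would index three levels deep, for example plots["area"]["violin"]["figures"].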

topostats.run_topostats.check_run_steps(filter_run: bool, grains_run: bool, grainstats_run: bool, dnatracing_run: bool) None[source]#

Check options for running steps (Filter, Grain, Grainstats and DNA tracing) are logically consistent.

This checks that earlier steps required are enabled.

Parameters:
  • filter_run (bool) – Flag for running Filtering.

  • grains_run (bool) – Flag for running Grains.

  • grainstats_run (bool) – Flag for running GrainStats.

  • dnatracing_run (bool) – Flag for running DNA Tracing.
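
One way to picture the consistency check is the sketch below. It assumes that each step depends only on the step immediately before it in the order Filter, Grains, GrainStats, DNA Tracing; the helper name missing_prerequisites is hypothetical, and unlike check_run_steps (which returns None) it returns the problems it finds so they can be inspected.

```python
from __future__ import annotations


def missing_prerequisites(
    filter_run: bool, grains_run: bool, grainstats_run: bool, dnatracing_run: bool
) -> list[str]:
    """List enabled steps whose preceding step is disabled (hypothetical helper)."""
    # Steps in pipeline order, each paired with its flag.
    steps = [
        ("filter", filter_run),
        ("grains", grains_run),
        ("grainstats", grainstats_run),
        ("dnatracing", dnatracing_run),
    ]
    problems = []
    # Compare every step with the one directly before it.
    for (previous, previous_on), (current, current_on) in zip(steps, steps[1:]):
        if current_on and not previous_on:
            problems.append(f"{current} requires {previous}")
    return problems


consistent = missing_prerequisites(True, True, True, False)
inconsistent = missing_prerequisites(True, False, True, False)
```

In the second call GrainStats is enabled while Grains is disabled, so one problem is reported.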

topostats.run_topostats.completion_message(config: dict, img_files: list, summary_config: dict, images_processed: int) None[source]#

Print a completion message summarising images processed.

Parameters:
  • config (dict) – Configuration dictionary.

  • img_files (list) – List of found image paths.

  • summary_config (dict) – Configuration for plotting summary statistics.

  • images_processed (int) – Number of images processed.

topostats.run_topostats.process_scan(topostats_object: dict, base_dir: str | pathlib.Path, filter_config: dict, grains_config: dict, grainstats_config: dict, dnatracing_config: dict, plotting_config: dict, output_dir: str | pathlib.Path = 'output') tuple[dict, pandas.DataFrame, dict][source]#

Process a single image, filtering, finding grains and calculating their statistics.

Parameters:
  • topostats_object (dict[str, Union[np.ndarray, Path, float]]) – A dictionary with keys ‘image’, ‘img_path’ and ‘px_2_nm’ containing a file or frame’s image, its path and its pixel to nanometre scaling value.

  • base_dir (Union[str, Path]) – Directory to recursively search for files. If not specified, the current directory is scanned.

  • filter_config (dict) – Dictionary of configuration options for running the Filter stage.

  • grains_config (dict) – Dictionary of configuration options for running the Grain detection stage.

  • grainstats_config (dict) – Dictionary of configuration options for running the Grain Statistics stage.

  • dnatracing_config (dict) – Dictionary of configuration options for running the DNA Tracing stage.

  • plotting_config (dict) – Dictionary of configuration options for plotting figures.

  • output_dir (Union[str, Path]) – Directory to save output to. It will be created if it does not exist; if it already exists, existing output may be overwritten.

Returns:

TopoStats dictionary object, DataFrame containing grain statistics and DNA tracing statistics, and dictionary containing general image statistics.

Return type:

tuple[dict, pd.DataFrame, dict]

topostats.run_topostats.update_config(config: dict, args: dict | argparse.Namespace) dict[source]#

Update the configuration with any arguments.

Parameters:
  • config (dict) – Dictionary of configuration (typically read from the YAML file specified with ‘-c/--config <filename>’).

  • args (dict | Namespace) – Command line arguments.

Returns:

Dictionary updated with command arguments.

Return type:

dict
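
The update can be pictured with the sketch below. It rests on two assumptions that the description above does not confirm: that only arguments which were actually supplied (not None) override the configuration, and that nested sections are searched for matching keys. The helper name update_config_sketch and the option names are hypothetical.

```python
from __future__ import annotations

import argparse


def update_config_sketch(config: dict, args: dict | argparse.Namespace) -> dict:
    """Overwrite matching config keys with supplied arguments (hypothetical helper)."""
    # Accept either a plain dict or an argparse.Namespace.
    arg_values = vars(args) if isinstance(args, argparse.Namespace) else args
    updated = {}
    for key, value in config.items():
        if isinstance(value, dict):
            # Recurse so that nested sections are updated too (an assumption).
            updated[key] = update_config_sketch(value, arg_values)
        elif arg_values.get(key) is not None:
            # An argument was supplied for this key, so it takes precedence.
            updated[key] = arg_values[key]
        else:
            updated[key] = value
    return updated


# Hypothetical option names, for illustration only.
config = {"cores": 2, "file_ext": ".spm", "output": {"output_dir": "output"}}
args = argparse.Namespace(cores=4, file_ext=None, output_dir="results")
new_config = update_config_sketch(config, args)
```

Because file_ext is None in the namespace, the value from the configuration file is kept, while cores and the nested output_dir are replaced.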

topostats.run_topostats.update_plotting_config(plotting_config: dict) dict[source]#

Update the plotting config for each of the plots in plot_dict.

Ensures that each entry has all the plotting configuration values that are needed.

Parameters:

plotting_config (dict) – Plotting configuration to be updated.

Returns:

Updated plotting configuration.

Return type:

dict
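
A minimal sketch of this kind of update is shown below, assuming that every top-level option other than plot_dict acts as a shared default and that per-plot options take precedence. Neither assumption is confirmed by the description above, and the helper name fill_plot_defaults and the option names are hypothetical.

```python
from __future__ import annotations


def fill_plot_defaults(plotting_config: dict) -> dict:
    """Copy shared options into every plot_dict entry that lacks them (hypothetical helper)."""
    # Everything except plot_dict is treated as a shared default.
    shared = {key: value for key, value in plotting_config.items() if key != "plot_dict"}
    plot_dict = {
        # Per-plot options come last so they win over the shared defaults.
        name: {**shared, **options}
        for name, options in plotting_config["plot_dict"].items()
    }
    return {**plotting_config, "plot_dict": plot_dict}


# Hypothetical option names, for illustration only.
plotting_config = {
    "cmap": "viridis",
    "dpi": 100,
    "plot_dict": {"raw": {"title": "Raw"}, "mask": {"title": "Mask", "cmap": "gray"}},
}
updated = fill_plot_defaults(plotting_config)
```

After the call every entry in plot_dict carries cmap and dpi, and the mask entry keeps its own cmap.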

topostats.run_topostats.DEFAULT_CONFIG_SCHEMA#
topostats.run_topostats.PLOTTING_SCHEMA#
topostats.run_topostats.SUMMARY_SCHEMA#
topostats.run_topostats.validate_config(config: dict, schema: schema.Schema, config_type: str) None[source]#

Validate configuration.

Parameters:
  • config (dict) – Config dictionary imported by read_yaml() and parsed through clean_config().

  • schema (Schema) – A schema against which the configuration is to be compared.

  • config_type (str) – Description of the configuration being validated.

topostats.run_topostats.LOGGER#
topostats.run_topostats.run_topostats(args: None = None) None[source]#

Find and process all files.

Parameters:

args (None) – Command line arguments; defaults to None.