topostats.io#

Functions for reading and writing data.

Attributes#

Classes#

LoadScans

Load the image and image parameters from a file path.

Functions#

merge_mappings(→ MutableMappingType)

Merge two mappings (dictionaries), with priority given to the second mapping.

dict_almost_equal(dict1, dict2[, abs_tol])

Recursively check if two dictionaries are almost equal with a given absolute tolerance.

read_yaml(→ dict)

Read a YAML file.

get_date_time(→ str)

Get a date and time for adding to generated files or logging.

write_yaml(→ None)

Write a configuration (stored as a dictionary) to a YAML file.

write_config_with_comments(→ None)

Write a sample configuration with in-line comments.

save_array(→ None)

Save a Numpy array to disk.

load_array(→ numpy.typing.NDArray)

Load a Numpy array from file.

path_to_str(→ dict)

Recursively traverse a dictionary and convert any Path() objects to strings for writing to YAML.

get_out_path(→ pathlib.Path)

Add the image path relative to the base directory to the output directory.

find_files(→ list)

Recursively scan the specified directory for images with the given file extension.

save_folder_grainstats(→ None)

Save a data frame of grain and tracing statistics at the folder level.

read_null_terminated_string(→ str)

Read bytes from the current position in an open binary file until the next null byte.

read_u32i(→ str)

Read an unsigned 32 bit integer from an open binary file (in little-endian form).

read_64d(→ str)

Read a 64-bit double from an open binary file.

read_char(→ str)

Read a character from an open binary file.

read_gwy_component_dtype(→ str)

Read the data type of a .gwy file component.

get_relative_paths(→ list[str])

Extract a list of relative paths, removing the common suffix.

convert_basename_to_relative_paths(df)

Convert paths in the 'basename' column of a dataframe to relative paths.

dict_to_hdf5(→ None)

Recursively save a dictionary to an open hdf5 file.

hdf5_to_dict(→ dict)

Read a dictionary from an open hdf5 file.

save_topostats_file(→ None)

Save a topostats dictionary object to a .topostats (hdf5 format) file.

save_pkl(→ None)

Pickle objects for working with later.

load_pkl(→ Any)

Load data from a pickle.

dict_to_json(→ None)

Write a dictionary to a JSON file at the specified location with the given name.

Module Contents#

topostats.io.LOGGER#
topostats.io.CONFIG_DOCUMENTATION_REFERENCE = Multiline-String#
"""# For more information on configuration and how to use it:
# https://afm-spm.github.io/TopoStats/main/configuration.html
"""
topostats.io.MutableMappingType#
topostats.io.merge_mappings(map1: MutableMappingType, map2: MutableMappingType) MutableMappingType[source]#

Merge two mappings (dictionaries), with priority given to the second mapping.

Note: Using a Mapping should make this robust to any mapping type, not just dictionaries. MutableMapping was needed as Mapping is not a mutable type, and this function needs to be able to change the dictionaries.

Parameters:
  • map1 (MutableMapping) – First mapping to merge, with secondary priority.

  • map2 (MutableMapping) – Second mapping to merge, with primary priority.

Returns:

Merged dictionary.

Return type:

dict
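
A minimal usage sketch (the configuration keys shown here are illustrative, not real TopoStats options):

from topostats.io import merge_mappings

defaults = {"cores": 2, "file_ext": ".spm", "quiet": False}
user = {"cores": 4, "quiet": True}

merged = merge_mappings(defaults, user)
# Values from the second mapping take priority where keys overlap:
# {'cores': 4, 'file_ext': '.spm', 'quiet': True}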

topostats.io.dict_almost_equal(dict1: dict, dict2: dict, abs_tol: float = 1e-09)[source]#

Recursively check if two dictionaries are almost equal with a given absolute tolerance.

Parameters:
  • dict1 (dict) – First dictionary to compare.

  • dict2 (dict) – Second dictionary to compare.

  • abs_tol (float) – Absolute tolerance to check for equality.

Returns:

True if the dictionaries are almost equal, False otherwise.

Return type:

bool
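
A minimal sketch comparing nested dictionaries of floats (the keys are illustrative):

from topostats.io import dict_almost_equal

expected = {"pixel_to_nm_scaling": 0.5, "stats": {"height": 2.0}}
measured = {"pixel_to_nm_scaling": 0.5, "stats": {"height": 2.0000000001}}

dict_almost_equal(expected, measured)                 # True within the default 1e-09
dict_almost_equal(expected, measured, abs_tol=1e-12)  # False, tolerance too tight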

topostats.io.read_yaml(filename: str | pathlib.Path) dict[source]#

Read a YAML file.

Parameters:

filename (Union[str, Path]) – YAML file to read.

Returns:

Dictionary of the file.

Return type:

Dict

topostats.io.get_date_time() str[source]#

Get a date and time for adding to generated files or logging.

Returns:

A string of the current date and time, formatted appropriately.

Return type:

str

topostats.io.write_yaml(config: dict, output_dir: str | pathlib.Path, config_file: str = 'config.yaml', header_message: str = None) None[source]#

Write a configuration (stored as a dictionary) to a YAML file.

Parameters:
  • config (dict) – Configuration dictionary.

  • output_dir (Union[str, Path]) – Path to save the dictionary to as a YAML file (it will be called ‘config.yaml’).

  • config_file (str) – Filename to write to.

  • header_message (str) – String to write to the header message of the YAML file.
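
A sketch of writing a configuration dictionary and reading it back with read_yaml() (the file name and keys are illustrative):

from pathlib import Path
from topostats.io import read_yaml, write_yaml

config = {"base_dir": "./data", "file_ext": ".spm", "cores": 4}

write_yaml(config, output_dir=Path("."), config_file="example_config.yaml",
           header_message="Example configuration")
# The header is written as YAML comments, so reading returns the original dictionary
reloaded = read_yaml(Path("example_config.yaml"))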

topostats.io.write_config_with_comments(args=None) None[source]#

Write a sample configuration with in-line comments.

This function is not designed to be used interactively, but it can be: call it without any arguments and it will write a configuration to ‘./config.yaml’.

Parameters:

args (Namespace) – A Namespace object parsed from argparse with values for ‘filename’.

topostats.io.save_array(array: numpy.typing.NDArray, outpath: pathlib.Path, filename: str, array_type: str) None[source]#

Save a Numpy array to disk.

Parameters:
  • array (npt.NDArray) – Numpy array to be saved.

  • outpath (Path) – Location array should be saved.

  • filename (str) – Filename of the current image from which the array is derived.

  • array_type (str) – Short string describing the array type, e.g. z_threshold. Ideally it should not contain periods or spaces (use underscores ‘_’ instead).

topostats.io.load_array(array_path: str | pathlib.Path) numpy.typing.NDArray[source]#

Load a Numpy array from file.

Should have been saved using save_array() or numpy.save().

Parameters:

array_path (Union[str, Path]) – Path to the Numpy array on disk.

Returns:

Returns the loaded Numpy array.

Return type:

npt.NDArray
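
A round-trip sketch (the directory and filenames are illustrative; load_array() accepts any array written by save_array() or numpy.save()):

import numpy as np
from pathlib import Path
from topostats.io import load_array, save_array

output_dir = Path("./output")
output_dir.mkdir(parents=True, exist_ok=True)

image = np.random.rand(64, 64)

# save_array derives the output filename from `filename` and `array_type`
save_array(image, outpath=output_dir, filename="sample_scan", array_type="z_threshold")

# Load back an array saved with numpy.save()
np.save(output_dir / "raw.npy", image)
reloaded = load_array(output_dir / "raw.npy")
assert np.array_equal(image, reloaded)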

topostats.io.path_to_str(config: dict) dict[source]#

Recursively traverse a dictionary and convert any Path() objects to strings for writing to YAML.

Parameters:

config (dict) – Dictionary to be converted.

Returns:

The same dictionary with any Path() objects converted to string.

Return type:

Dict

topostats.io.get_out_path(image_path: str | pathlib.Path = None, base_dir: str | pathlib.Path = None, output_dir: str | pathlib.Path = None) pathlib.Path[source]#

Add the image path relative to the base directory to the output directory.

Parameters:
  • image_path (Path) – The path of the current image.

  • base_dir (Path) – Directory to recursively search for files.

  • output_dir (Path) – The output directory specified in the configuration file.

Returns:

The output path that mirrors the input path structure.

Return type:

Path
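
For example (paths are illustrative):

from pathlib import Path
from topostats.io import get_out_path

out_path = get_out_path(
    image_path=Path("data/experiment_1/scan_01.spm"),
    base_dir=Path("data"),
    output_dir=Path("output"),
)
# The structure below base_dir is mirrored under output_dir,
# giving a path under output/experiment_1/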

topostats.io.find_files(base_dir: str | pathlib.Path = None, file_ext: str = '.spm') list[source]#

Recursively scan the specified directory for images with the given file extension.

Parameters:
  • base_dir (Union[str, Path]) – Directory to recursively search for files, if not specified the current directory is scanned.

  • file_ext (str) – File extension to search for.

Returns:

List of files found with the extension in the given directory.

Return type:

List
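
For example:

from pathlib import Path
from topostats.io import find_files

# Recursively collect every .spm image below ./data
spm_files = find_files(base_dir=Path("./data"), file_ext=".spm")
for image_path in spm_files:
    print(image_path)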

topostats.io.save_folder_grainstats(output_dir: str | pathlib.Path, base_dir: str | pathlib.Path, all_stats_df: pandas.DataFrame, stats_filename: str) None[source]#

Save a data frame of grain and tracing statistics at the folder level.

Parameters:
  • output_dir (Union[str, Path]) – Path of the output directory head.

  • base_dir (Union[str, Path]) – Path of the base directory where files were found.

  • all_stats_df (pd.DataFrame) – The dataframe containing all sample statistics run.

  • stats_filename (str) – The name of the type of statistics dataframe to be saved.

Returns:

This only saves the dataframes and does not retain them.

Return type:

None

topostats.io.read_null_terminated_string(open_file: io.TextIOWrapper, encoding: str = 'utf-8') str[source]#

Read bytes from the current position in an open binary file until the next null byte.

Parameters:
  • open_file (io.TextIOWrapper) – An open file object.

  • encoding (str) – Encoding to use when decoding the bytes.

Returns:

String of the ASCII decoded bytes before the next null byte.

Return type:

str

Examples

>>> with open("test.txt", "rb") as f:
...     print(read_null_terminated_string(f, encoding="utf-8"))
topostats.io.read_u32i(open_file: io.TextIOWrapper) str[source]#

Read an unsigned 32 bit integer from an open binary file (in little-endian form).

Parameters:

open_file (io.TextIOWrapper) – An open file object.

Returns:

Python integer type cast from the unsigned 32 bit integer.

Return type:

int

topostats.io.read_64d(open_file: io.TextIOWrapper) str[source]#

Read a 64-bit double from an open binary file.

Parameters:

open_file (io.TextIOWrapper) – An open file object.

Returns:

Python float type cast from the double.

Return type:

float

topostats.io.read_char(open_file: io.TextIOWrapper) str[source]#

Read a character from an open binary file.

Parameters:

open_file (io.TextIOWrapper) – An open file object.

Returns:

A string type cast from the decoded character.

Return type:

str
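
A sketch combining the low-level binary readers on a hand-built payload; the payload layout is purely illustrative, and little-endian packing is assumed for the 64-bit double:

import struct
from topostats.io import read_64d, read_char, read_null_terminated_string, read_u32i

# Write a small binary payload for illustration: a null-terminated name,
# an unsigned 32-bit integer, a 64-bit double and a single character.
with open("example.bin", "wb") as binary_out:
    binary_out.write(b"channel\x00")
    binary_out.write(struct.pack("<I", 512))
    binary_out.write(struct.pack("<d", 0.488))
    binary_out.write(b"d")

with open("example.bin", "rb") as binary_in:
    name = read_null_terminated_string(binary_in)  # "channel"
    pixels = read_u32i(binary_in)                  # 512
    scaling = read_64d(binary_in)                  # 0.488
    dtype = read_char(binary_in)                   # "d"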

topostats.io.read_gwy_component_dtype(open_file: io.TextIOWrapper) str[source]#

Read the data type of a .gwy file component.

Possible data types are as follows:

  • ‘b’: boolean

  • ‘c’: character

  • ‘i’: 32-bit integer

  • ‘q’: 64-bit integer

  • ‘d’: double

  • ‘s’: string

  • ‘o’: .gwy format object

Capitalised versions of some of these data types represent arrays of values of that data type. Arrays are stored as an unsigned 32 bit integer, describing the size of the array, followed by the unseparated array values:

  • ‘C’: array of characters

  • ‘I’: array of 32-bit integers

  • ‘Q’: array of 64-bit integers

  • ‘D’: array of doubles

  • ‘S’: array of strings

  • ‘O’: array of objects.

Parameters:

open_file (io.TextIOWrapper) – An open file object.

Returns:

Python string (one character long) of the data type of the component’s value.

Return type:

str

topostats.io.get_relative_paths(paths: list[pathlib.Path]) list[str][source]#

Extract a list of relative paths, removing the common suffix.

From a list of paths, create a list where each path is relative to all paths’ closest common parent. For example, [‘a/b/c’, ‘a/b/d’, ‘a/b/e/f’] would return [‘c’, ‘d’, ‘e/f’].

Parameters:

paths (list) – List of string or pathlib paths.

Returns:

List of string paths, relative to the common parent.

Return type:

list
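
The example from the description as runnable code:

from pathlib import Path
from topostats.io import get_relative_paths

paths = [Path("a/b/c"), Path("a/b/d"), Path("a/b/e/f")]
relative = get_relative_paths(paths)
# ['c', 'd', 'e/f'] - each entry is relative to the common parent 'a/b'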

topostats.io.convert_basename_to_relative_paths(df: pandas.DataFrame)[source]#

Convert paths in the ‘basename’ column of a dataframe to relative paths.

If the ‘basename’ column has the following paths: [‘/usr/topo/data/a/b’, ‘/usr/topo/data/c/d’], the output will be: [‘a/b’, ‘c/d’].

Parameters:

df (pd.DataFrame) – A pandas dataframe containing a column ‘basename’ which contains the paths indicating the locations of the image data files.

Returns:

A pandas dataframe where the ‘basename’ column has paths relative to a common parent.

Return type:

pd.DataFrame
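
The example from the description as runnable code:

import pandas as pd
from topostats.io import convert_basename_to_relative_paths

df = pd.DataFrame({"basename": ["/usr/topo/data/a/b", "/usr/topo/data/c/d"]})
df = convert_basename_to_relative_paths(df)
# The 'basename' column now reads ['a/b', 'c/d']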

class topostats.io.LoadScans(img_paths: list[str | pathlib.Path], channel: str)[source]#

Load the image and image parameters from a file path.

Parameters:
  • img_paths (list[str, Path]) – Paths to valid AFM scans to load.

  • channel (str) – Image channel to extract from the scan.

img_paths#
img_path = None#
channel#
channel_data = None#
filename = None#
image = None#
pixel_to_nm_scaling = None#
grain_masks#
grain_trace_data#
img_dict#
MINIMUM_IMAGE_SIZE = 10#
load_spm() tuple[numpy.typing.NDArray, float][source]#

Extract image and pixel to nm scaling from the Bruker .spm file.

Returns:

A tuple containing the image and its pixel to nanometre scaling value.

Return type:

tuple[npt.NDArray, float]

load_topostats() tuple[numpy.typing.NDArray, float][source]#

Load a .topostats file (hdf5 format).

Loads and extracts the image, pixel to nanometre scaling factor and any grain masks.

Note that grain masks are stored via self.grain_masks rather than returned due to how we extract information for all other file loading functions.

Returns:

A tuple containing the image and its pixel to nanometre scaling value.

Return type:

tuple[npt.NDArray, float]

load_asd() tuple[numpy.typing.NDArray, float][source]#

Extract image and pixel to nm scaling from .asd files.

Returns:

A tuple containing the image and its pixel to nanometre scaling value.

Return type:

tuple[npt.NDArray, float]

load_ibw() tuple[numpy.typing.NDArray, float][source]#

Load image from Asylum Research (Igor) .ibw files.

Returns:

A tuple containing the image and its pixel to nanometre scaling value.

Return type:

tuple[npt.NDArray, float]

load_jpk() tuple[numpy.typing.NDArray, float][source]#

Load image from JPK Instruments .jpk files.

Returns:

A tuple containing the image and its pixel to nanometre scaling value.

Return type:

tuple[npt.NDArray, float]

load_gwy() tuple[numpy.typing.NDArray, float][source]#

Extract image and pixel to nm scaling from the Gwyddion .gwy file.

Returns:

A tuple containing the image and its pixel to nanometre scaling value.

Return type:

tuple[npt.NDArray, float]

get_data() None[source]#

Extract image, filepath and pixel to nm scaling value, and append these to the img_dict object.
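
A typical loading workflow (a minimal sketch; the data directory and channel name are illustrative, and the exact keys stored in img_dict depend on the loader):

from pathlib import Path
from topostats.io import LoadScans, find_files

image_paths = find_files(base_dir=Path("./data"), file_ext=".spm")
scans = LoadScans(image_paths, channel="Height")

# Populates scans.img_dict with the image and its metadata for every file
# that passes the minimum size check.
scans.get_data()

for filename, image_data in scans.img_dict.items():
    print(filename, list(image_data.keys()))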

_check_image_size_and_add_to_dict(image: numpy.typing.NDArray, filename: str) None[source]#

Check the image is above a minimum size in both dimensions.

Images that do not meet the minimum size are not included for processing.

Parameters:
  • image (npt.NDArray) – An array of the extracted AFM image.

  • filename (str) – The name of the file.

add_to_dict(image: numpy.typing.NDArray, filename: str) None[source]#

Add an image and metadata to the img_dict dictionary under the key filename.

Adds the image and associated metadata, such as any grain masks and the pixel to nanometre scaling factor, to the img_dict dictionary, which is used to store the image information for processing.

Parameters:
  • image (npt.NDArray) – An array of the extracted AFM image.

  • filename (str) – The name of the file.

topostats.io.dict_to_hdf5(open_hdf5_file: h5py.File, group_path: str, dictionary: dict) None[source]#

Recursively save a dictionary to an open hdf5 file.

Parameters:
  • open_hdf5_file (h5py.File) – An open hdf5 file object.

  • group_path (str) – The path to the group in the hdf5 file to start saving data from.

  • dictionary (dict) – A dictionary of the data to save.

topostats.io.hdf5_to_dict(open_hdf5_file: h5py.File, group_path: str) dict[source]#

Read a dictionary from an open hdf5 file.

Parameters:
  • open_hdf5_file (h5py.File) – An open hdf5 file object.

  • group_path (str) – The path to the group in the hdf5 file to start reading data from.

Returns:

A dictionary of the hdf5 file data.

Return type:

dict
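
A round-trip sketch, assuming the root group ‘/’ is used as the starting path (the dictionary contents are illustrative):

import h5py
import numpy as np
from topostats.io import dict_to_hdf5, hdf5_to_dict

data = {"image": np.zeros((4, 4)), "pixel_to_nm_scaling": 0.5, "filename": "scan_01"}

# Write the dictionary under the root group of a new hdf5 file...
with h5py.File("example.hdf5", "w") as hdf5_file:
    dict_to_hdf5(open_hdf5_file=hdf5_file, group_path="/", dictionary=data)

# ...and read it back into a plain dictionary.
with h5py.File("example.hdf5", "r") as hdf5_file:
    reloaded = hdf5_to_dict(hdf5_file, group_path="/")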

topostats.io.save_topostats_file(output_dir: pathlib.Path, filename: str, topostats_object: dict) None[source]#

Save a topostats dictionary object to a .topostats (hdf5 format) file.

Parameters:
  • output_dir (Path) – Directory to save the .topostats file in.

  • filename (str) – File name of the .topostats file.

  • topostats_object (dict) – Dictionary of the topostats data to save. Must include a flattened image and pixel to nanometre scaling factor. May also include grain masks.

topostats.io.save_pkl(outfile: pathlib.Path, to_pkl: dict) None[source]#

Pickle objects for working with later.

Parameters:
  • outfile (Path) – Path and filename to save pickle to.

  • to_pkl (dict) – Object to be pickled.

topostats.io.load_pkl(infile: pathlib.Path) Any[source]#

Load data from a pickle.

Parameters:

infile (Path) – Path to a valid pickle.

Returns:

Dictionary of generated images.

Return type:

dict

Examples

from pathlib import Path

from topostats.io import load_pkl

pkl_path = "output/distribution_plots.pkl"
my_plots = load_pkl(pkl_path)
# Show the type of my_plots, which is a dictionary of nested dictionaries
type(my_plots)
# Show the keys at the various levels of nesting
my_plots.keys()
my_plots["area"].keys()
my_plots["area"]["dist"].keys()
# Get the figure and axis object for a given metric's distribution plot
figure, axis = my_plots["area"]["dist"].values()
# Get the figure and axis object for a given metric's violin plot
figure, axis = my_plots["area"]["violin"].values()

topostats.io.dict_to_json(data: dict, output_dir: str | pathlib.Path, filename: str | pathlib.Path, indent: int = 4) None[source]#

Write a dictionary to a JSON file at the specified location with the given name.

NB: The NumpyEncoder class is used as the default encoder to ensure Numpy dtypes are written as strings (they are not serialisable to JSON using the default JSONEncoder).

Parameters:
  • data (dict) – Data as a dictionary that is to be written to file.

  • output_dir (str | Path) – Directory the file is to be written to.

  • filename (str | Path) – Name of output file.

  • indent (int) – Spaces to indent JSON with, default is 4.
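
A minimal sketch (the keys and values are illustrative, and it is assumed here that the NumpyEncoder handles Numpy scalar values):

from pathlib import Path

import numpy as np
from topostats.io import dict_to_json

stats = {"grains_detected": np.int64(12), "mean_height": np.float64(1.87)}

# Numpy dtypes are converted by the NumpyEncoder when serialising
dict_to_json(data=stats, output_dir=Path("."), filename="stats.json", indent=2)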