topostats.io#

Functions for reading and writing data.

Attributes#

Classes#

LoadScans

Load the image and image parameters from a file path.

Functions#

read_yaml(→ Dict)

Read a YAML file.

write_yaml(→ None)

Write a configuration (stored as a dictionary) to a YAML file.

save_array(→ None)

Save a Numpy array to disk.

load_array(→ numpy.ndarray)

Load a Numpy array from file.

path_to_str(→ Dict)

Recursively traverse a dictionary and convert any Path() objects to strings for writing to YAML.

get_out_path(→ pathlib.Path)

Adds the image path relative to the base directory to the output directory.

find_files(→ List)

Recursively scan the specified directory for images with the given file extension.

save_folder_grainstats(→ None)

Saves a data frame of grain and tracing statistics at the folder level.

read_null_terminated_string(→ str)

Read an open file from the current position in the open binary file, until the next null value.

read_u32i(→ str)

Read an unsigned 32 bit integer from an open binary file (in little-endian form).

read_64d(→ str)

Read a 64-bit double from an open binary file.

read_char(→ str)

Read a character from an open binary file.

read_gwy_component_dtype(→ str)

Read the data type of a .gwy file component.

save_pkl(→ None)

Pickle objects for working with later.

load_pkl(→ Any)

Load data from a pickle.

Module Contents#

topostats.io.LOGGER#
topostats.io.read_yaml(filename: str | pathlib.Path) Dict#

Read a YAML file.

Parameters:

filename (Union[str, Path]) – YAML file to read.

Returns:

Dictionary of the file.

Return type:

Dict

topostats.io.write_yaml(config: dict, output_dir: str | pathlib.Path, config_file: str = 'config.yaml', header_message: str = None) None#

Write a configuration (stored as a dictionary) to a YAML file.

Parameters:
  • config (dict) – Configuration dictionary.

  • output_dir (Union[str, Path]) – Directory to save the YAML file in. The file name is taken from config_file (‘config.yaml’ by default).

  • config_file (str) – Filename to write to.

  • header_message (str) – String to write to the header message of the YAML file.

topostats.io.save_array(array: numpy.ndarray, outpath: pathlib.Path, filename: str, array_type: str) None#

Save a Numpy array to disk.

Parameters:
  • array (np.ndarray) – Numpy array to be saved.

  • outpath (Path) – Location where the array should be saved.

  • filename (str) – Filename of the current image from which the array is derived.

  • array_type (str) – Short string describing the array type, e.g. z_threshold. Ideally it should not contain periods or spaces (use underscores '_' instead).

topostats.io.load_array(array_path: str | pathlib.Path) numpy.ndarray#

Load a Numpy array from file.

Should have been saved using save_array() or numpy.save().

Parameters:

array_path (Union[str, Path]) – Path to the Numpy array on disk.

Returns:

Returns the loaded Numpy array.

Return type:

np.ndarray

topostats.io.path_to_str(config: dict) Dict#

Recursively traverse a dictionary and convert any Path() objects to strings for writing to YAML.

Parameters:

config (dict) – Dictionary to be converted.

Returns:

The same dictionary with any Path() objects converted to string.

Return type:

Dict
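The recursion described above can be sketched with the standard library alone. This is an illustrative assumption, not the TopoStats implementation: the name path_to_str_sketch is hypothetical, and this version returns a new dictionary rather than modifying the input.

```python
from pathlib import Path


def path_to_str_sketch(config: dict) -> dict:
    """Return a copy of ``config`` with every Path value converted to str.

    Hypothetical stand-in for topostats.io.path_to_str, for illustration only.
    """
    converted = {}
    for key, value in config.items():
        if isinstance(value, dict):
            # Descend into nested dictionaries.
            converted[key] = path_to_str_sketch(value)
        elif isinstance(value, Path):
            converted[key] = str(value)
        else:
            converted[key] = value
    return converted


example = {
    "base_dir": Path("data"),
    "plotting": {"output_dir": Path("output"), "dpi": 100},
}
print(path_to_str_sketch(example))
```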

topostats.io.get_out_path(image_path: str | pathlib.Path = None, base_dir: str | pathlib.Path = None, output_dir: str | pathlib.Path = None) pathlib.Path#

Adds the image path relative to the base directory to the output directory.

Parameters:
  • image_path (Path) – The path of the current image.

  • base_dir (Path) – Directory to recursively search for files.

  • output_dir (Path) – The output directory specified in the configuration file.

Returns:

The output path that mirrors the input path structure.

Return type:

Path
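One way to mirror the input structure is to take the image path relative to the base directory and append it to the output directory. The sketch below assumes that behaviour; the name get_out_path_sketch is hypothetical and the real function may treat the file name or suffix differently.

```python
from pathlib import Path


def get_out_path_sketch(image_path, base_dir, output_dir) -> Path:
    """Append the part of ``image_path`` below ``base_dir`` to ``output_dir``.

    Hypothetical illustration; raises ValueError if image_path is not under base_dir.
    """
    relative = Path(image_path).relative_to(Path(base_dir))
    return Path(output_dir) / relative


out_path = get_out_path_sketch(
    image_path="data/sample_a/scan_01.spm",
    base_dir="data",
    output_dir="output",
)
print(out_path)
```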

topostats.io.find_files(base_dir: str | pathlib.Path = None, file_ext: str = '.spm') List#

Recursively scan the specified directory for images with the given file extension.

Parameters:
  • base_dir (Union[str, Path]) – Directory to recursively search for files, if not specified the current directory is scanned.

  • file_ext (str) – File extension to search for.

Returns:

List of files found with the extension in the given directory.

Return type:

List
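A recursive search of this kind can be sketched with pathlib. The code below is an assumption about the general approach, not the library's code; find_files_sketch is a hypothetical name and the results are sorted only to make the output deterministic.

```python
import tempfile
from pathlib import Path


def find_files_sketch(base_dir=None, file_ext=".spm") -> list:
    """Recursively list files under ``base_dir`` that end with ``file_ext``.

    Hypothetical illustration; falls back to the current directory if base_dir is None.
    """
    base = Path("./") if base_dir is None else Path(base_dir)
    return sorted(base.rglob(f"*{file_ext}"))


# Build a small throwaway directory tree to search.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "nested").mkdir()
    (root / "a.spm").touch()
    (root / "nested" / "b.spm").touch()
    (root / "notes.txt").touch()
    found = [p.relative_to(root).as_posix() for p in find_files_sketch(root)]

print(found)
```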

topostats.io.save_folder_grainstats(output_dir: str | pathlib.Path, base_dir: str | pathlib.Path, all_stats_df: pandas.DataFrame) None#

Saves a data frame of grain and tracing statistics at the folder level.

Parameters:
  • output_dir (Union[str, Path]) – Path of the output directory head.

  • base_dir (Union[str, Path]) – Path of the base directory where files were found.

  • all_stats_df (pd.DataFrame) – The dataframe containing all sample statistics run.

Returns:

This only saves the dataframes and does not retain them.

Return type:

None

topostats.io.read_null_terminated_string(open_file: io.TextIOWrapper) str#

Read an open file from the current position in the open binary file, until the next null value.

Parameters:

open_file (io.TextIOWrapper) – An open file object.

Returns:

String of the ASCII decoded bytes before the next null byte.

Return type:

str
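Reading up to the next null byte can be sketched as a byte-by-byte loop. This is a minimal illustration under stated assumptions: the function name is hypothetical, an in-memory io.BytesIO stands in for the open binary file, and stopping quietly at end of file is a choice made here rather than documented behaviour.

```python
import io


def read_null_terminated_string_sketch(open_file) -> str:
    """Read bytes until a null byte (or end of file) and decode them as ASCII."""
    collected = bytearray()
    while True:
        byte = open_file.read(1)
        if byte in (b"", b"\x00"):
            # The null terminator is consumed but not returned.
            break
        collected += byte
    return collected.decode("ascii")


stream = io.BytesIO(b"GwyContainer\x00rest")
name = read_null_terminated_string_sketch(stream)
position = stream.tell()
print(name, position)
```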

topostats.io.read_u32i(open_file: io.TextIOWrapper) str#

Read an unsigned 32 bit integer from an open binary file (in little-endian form).

Parameters:

open_file (io.TextIOWrapper) – An open file object.

Returns:

Python integer type cast from the unsigned 32 bit integer.

Return type:

int
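A little-endian unsigned 32-bit read maps directly onto the standard struct module. The sketch below assumes that approach; read_u32i_sketch is a hypothetical name and io.BytesIO stands in for the open binary file.

```python
import io
import struct


def read_u32i_sketch(open_file) -> int:
    """Read four bytes and interpret them as a little-endian unsigned 32-bit integer."""
    return struct.unpack("<I", open_file.read(4))[0]


stream = io.BytesIO(struct.pack("<I", 4096) + b"trailing")
value = read_u32i_sketch(stream)
print(value)
```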

topostats.io.read_64d(open_file: io.TextIOWrapper) str#

Read a 64-bit double from an open binary file.

Parameters:

open_file – An open file object.

Returns:

Python float type cast from the double.

Return type:

float

topostats.io.read_char(open_file: io.TextIOWrapper) str#

Read a character from an open binary file.

Parameters:

open_file (io.TextIOWrapper) – An open file object.

Returns:

A string type cast from the decoded character.

Return type:

str
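The double and character readers above can be sketched in the same style. Both function names are hypothetical, and little-endian byte order for the double is an assumption made here, since the description above does not state it.

```python
import io
import struct


def read_64d_sketch(open_file) -> float:
    """Read eight bytes as a 64-bit double (little-endian assumed)."""
    return struct.unpack("<d", open_file.read(8))[0]


def read_char_sketch(open_file) -> str:
    """Read one byte and decode it as an ASCII character."""
    return open_file.read(1).decode("ascii")


stream = io.BytesIO(struct.pack("<d", 2.5) + b"x")
number = read_64d_sketch(stream)
character = read_char_sketch(stream)
print(number, character)
```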

topostats.io.read_gwy_component_dtype(open_file: io.TextIOWrapper) str#

Read the data type of a .gwy file component.

Possible data types are as follows:
  • ‘b’: boolean
  • ‘c’: character
  • ‘i’: 32-bit integer
  • ‘q’: 64-bit integer
  • ‘d’: double
  • ‘s’: string
  • ‘o’: .gwy format object

Capitalised versions of some of these data types represent arrays of values of that data type. Arrays are stored as an unsigned 32 bit integer, describing the size of the array, followed by the unseparated array values.
  • ‘C’: array of characters
  • ‘I’: array of 32-bit integers
  • ‘Q’: array of 64-bit integers
  • ‘D’: array of doubles
  • ‘S’: array of strings
  • ‘O’: array of objects

Parameters:

open_file (io.TextIOWrapper) – An open file object.

Returns:

Python string (one character long) of the data type of the component’s value.

Return type:

str
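The array layout described above (a 32-bit length followed by the packed values) can be sketched for the numeric array types. This is an illustration only: the helper names and the ARRAY_FORMATS mapping are hypothetical, little-endian order is assumed for the values, and the character, string and object arrays are left out.

```python
import io
import struct

# Map numeric .gwy array type codes to struct format characters (assumed mapping).
ARRAY_FORMATS = {"I": "i", "Q": "q", "D": "d"}


def read_dtype_sketch(open_file) -> str:
    """Read the one-character data type code of a component."""
    return open_file.read(1).decode("ascii")


def read_numeric_array_sketch(open_file, dtype: str) -> list:
    """Read a length-prefixed numeric array of the given type code."""
    (length,) = struct.unpack("<I", open_file.read(4))
    code = ARRAY_FORMATS[dtype]
    item_size = struct.calcsize(f"<{code}")
    raw = open_file.read(length * item_size)
    return list(struct.unpack(f"<{length}{code}", raw))


# A fake component: type code 'D', three values, then the values themselves.
payload = b"D" + struct.pack("<I", 3) + struct.pack("<3d", 1.0, 2.5, -4.0)
stream = io.BytesIO(payload)
dtype = read_dtype_sketch(stream)
values = read_numeric_array_sketch(stream, dtype)
print(dtype, values)
```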

class topostats.io.LoadScans(img_paths: list, channel: str)#

Load the image and image parameters from a file path.

img_paths#
img_path = None#
channel#
channel_data = None#
filename = None#
image = None#
pixel_to_nm_scaling = None#
img_dict#
MINIMUM_IMAGE_SIZE = 10#
load_spm() tuple#

Extract image and pixel to nm scaling from the Bruker .spm file.

Returns:

A tuple containing the image and its pixel to nanometre scaling value.

Return type:

tuple(np.ndarray, float)

_spm_pixel_to_nm_scaling(channel_data: pySPM.SPM.SPM_image) float#

Extract pixel to nm scaling from the SPM image metadata.

Parameters:

channel_data (pySPM.SPM.SPM_image) – Channel data from PySPM.

Returns:

Pixel to nm scaling factor.

Return type:

float

load_ibw() tuple#

Loads image from Asylum Research (Igor) .ibw files.

Returns:

A tuple containing the image and its pixel to nanometre scaling value.

Return type:

tuple(np.ndarray, float)

_ibw_pixel_to_nm_scaling(scan: dict) float#

Extract pixel to nm scaling from the IBW image metadata.

Parameters:

scan (dict) – The loaded binary wave object.

Returns:

A value corresponding to the real length of a single pixel.

Return type:

float

load_jpk() tuple#

Loads image from JPK Instruments .jpk files.

Returns:

A tuple containing the image and its pixel to nanometre scaling value.

Return type:

tuple(np.ndarray, float)

static _jpk_pixel_to_nm_scaling(tiff_page: tifffile.tifffile.TiffPage) float#

Extract pixel to nm scaling from the JPK image metadata.

Parameters:

tiff_page (tifffile.tifffile.TiffPage) – An image file directory (IFD) of .jpk files.

Returns:

A value corresponding to the real length of a single pixel.

Return type:

float

static _gwy_read_object(open_file: io.TextIOWrapper, data_dict: dict) None#

Parse and extract data from a .gwy file object, starting at the current open file read position.

Parameters:
  • open_file (io.TextIOWrapper) – An open file object.

  • data_dict (dict) – Dictionary of .gwy file image properties.

Return type:

None

static _gwy_read_component(open_file: io.TextIOWrapper, initial_byte_pos: int, data_dict: dict) int#

Parse and extract data from a .gwy file object, starting at the current open file read position.

Parameters:
  • open_file (io.TextIOWrapper) – An open file object.

  • data_dict (dict) – Dictionary of .gwy file image properties.

Returns:

Size of the component in bytes.

Return type:

int

static _gwy_print_dict(gwy_file_dict: dict, pre_string: str) None#

A developer function to print the nested object / component structure. Can be used to find labels and values of objects / components in the .gwy file.

Parameters:

gwy_file_dict (dict) – Dictionary of the nested object / component structure of a .gwy file.

static _gwy_print_dict_wrapper(gwy_file_dict: dict) None#

Wrapper for the _gwy_print_dict function.

Parameters:

gwy_file_dict (dict) – Dictionary of the nested object / component structure of a .gwy file.

load_gwy() tuple#

Extract image and pixel to nm scaling from the Gwyddion .gwy file.

Returns:

A tuple containing the image and its pixel to nanometre scaling value.

Return type:

tuple(np.ndarray, float)

get_data() None#

Method to extract image, filepath and pixel to nm scaling value, and append these to the img_dict object.

_check_image_size() None#

Check the image is above a minimum size in both dimensions.

Images that do not meet the minimum size are not included for processing.
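The size check can be sketched as a comparison of each dimension against the class constant. This is an assumption about the logic rather than the method itself: is_large_enough is a hypothetical helper, it takes a plain shape tuple instead of an image array, and treating a dimension equal to the minimum as acceptable is a guess.

```python
MINIMUM_IMAGE_SIZE = 10  # Mirrors LoadScans.MINIMUM_IMAGE_SIZE as listed above.


def is_large_enough(shape, minimum=MINIMUM_IMAGE_SIZE) -> bool:
    """Return True if every dimension in ``shape`` is at least ``minimum``."""
    return all(dimension >= minimum for dimension in shape)


print(is_large_enough((512, 512)))
print(is_large_enough((512, 4)))
```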

add_to_dic(filename: str, image: numpy.ndarray, img_path: pathlib.Path, px_2_nm: float) None#

Adds the image, image path and pixel to nanometre scaling value to the img_dict dictionary under the key filename.

Parameters:
  • filename (str) – The filename, ideally without an extension.

  • image (np.ndarray) – An array of the extracted AFM image.

  • img_path (Path) – The path to the AFM file (with a frame number if applicable).

  • px_2_nm (float) – The length of a pixel in nm.

topostats.io.save_pkl(outfile: pathlib.Path, to_pkl: dict) None#

Pickle objects for working with later.

Parameters:
  • outfile (Path) – Path and filename to save pickle to.

  • to_pkl (dict) – Object to be pickled.

Return type:

None
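A save and load round trip can be sketched with the standard pickle module. The two helpers below are hypothetical stand-ins and only assume the obvious approach; they are not the TopoStats functions, and the file name results.pkl is made up for the example.

```python
import pickle
import tempfile
from pathlib import Path


def save_pkl_sketch(outfile: Path, to_pkl: dict) -> None:
    """Pickle ``to_pkl`` to ``outfile``."""
    with outfile.open("wb") as handle:
        pickle.dump(to_pkl, handle)


def load_pkl_sketch(infile: Path):
    """Load and return the object pickled in ``infile``."""
    with infile.open("rb") as handle:
        return pickle.load(handle)


data = {"area": {"dist": [1.0, 2.0, 3.0]}}
with tempfile.TemporaryDirectory() as tmp:
    pkl_file = Path(tmp) / "results.pkl"
    save_pkl_sketch(pkl_file, data)
    restored = load_pkl_sketch(pkl_file)

print(restored == data)
```

Pickle files can execute arbitrary code when loaded, so only load pickles from sources you trust.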

topostats.io.load_pkl(infile: pathlib.Path) Any#

Load data from a pickle.

Parameters:

infile (Path) – Path to a valid pickle.

Returns:

Dictionary of generated images.

Return type:

dict

Example

from pathlib import Path

from topostats.io import load_pkl

pkl_path = Path("output/distribution_plots.pkl")
my_plots = load_pkl(pkl_path)
# Show the type of my_plots, which is a dictionary of nested dictionaries
type(my_plots)
# Show the keys at various levels of nesting
my_plots.keys()
my_plots["area"].keys()
my_plots["area"]["dist"].keys()
# Get the figure and axis object for a given metric's distribution plot
figure, axis = my_plots["area"]["dist"].values()
# Get the figure and axis object for a given metric's violin plot
figure, axis = my_plots["area"]["violin"].values()