topostats.plotting#

Plotting and summary of TopoStats output statistics.

Attributes#

Classes#

TopoSum

Class for summarising grain statistics in plots.

Functions#

read_yaml(→ dict)

Read a YAML file.

save_pkl(→ None)

Pickle objects for working with later.

write_yaml(→ None)

Write a configuration (stored as a dictionary) to a YAML file.

convert_basename_to_relative_paths(df)

Convert paths in the 'basename' column of a dataframe to relative paths.

update_config(→ dict)

Update the configuration with any arguments.

toposum(→ dict)

Process plotting and summarisation of data.

run_toposum(→ None)

Run Plotting.

Module Contents#

topostats.plotting.read_yaml(filename: str | pathlib.Path) dict[source]#

Read a YAML file.

Parameters:

filename (Union[str, Path]) – YAML file to read.

Returns:

Dictionary of the file.

Return type:

Dict

topostats.plotting.save_pkl(outfile: pathlib.Path, to_pkl: dict) None[source]#

Pickle objects for working with later.

Parameters:
  • outfile (Path) – Path and filename to save pickle to.

  • to_pkl (dict) – Object to be pickled.
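As a rough illustration of the pickling step, the sketch below writes a dictionary to disk and reads it back using only the standard library. The helper names save_pkl_sketch and load_pkl_sketch are hypothetical and are not part of topostats.plotting; the packaged function may differ in detail.

```python
import pickle
import tempfile
from pathlib import Path


def save_pkl_sketch(outfile: Path, to_pkl: dict) -> None:
    """Pickle ``to_pkl`` to ``outfile`` (hypothetical stand-in for save_pkl)."""
    with outfile.open("wb") as handle:
        pickle.dump(to_pkl, handle)


def load_pkl_sketch(infile: Path) -> dict:
    """Load a previously pickled object (hypothetical helper)."""
    with infile.open("rb") as handle:
        return pickle.load(handle)


# Round-trip a small dictionary through a temporary file.
with tempfile.TemporaryDirectory() as tmp_dir:
    pkl_path = Path(tmp_dir) / "plots.pkl"
    original = {"area": [1.0, 2.5], "note": "example"}
    save_pkl_sketch(pkl_path, original)
    restored = load_pkl_sketch(pkl_path)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.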

topostats.plotting.write_yaml(config: dict, output_dir: str | pathlib.Path, config_file: str = 'config.yaml', header_message: str = None) None[source]#

Write a configuration (stored as a dictionary) to a YAML file.

Parameters:
  • config (dict) – Configuration dictionary.

  • output_dir (Union[str, Path]) – Directory in which to save the YAML file (named ‘config.yaml’ by default; see config_file).

  • config_file (str) – Filename to write to.

  • header_message (str) – String to write to the header message of the YAML file.

topostats.plotting.convert_basename_to_relative_paths(df: pandas.DataFrame)[source]#

Convert paths in the ‘basename’ column of a dataframe to relative paths.

If the ‘basename’ column has the following paths: [‘/usr/topo/data/a/b’, ‘/usr/topo/data/c/d’], the output will be: [‘a/b’, ‘c/d’].

Parameters:

df (pd.DataFrame) – A pandas dataframe containing a column ‘basename’ which contains the paths indicating the locations of the image data files.

Returns:

A pandas dataframe where the ‘basename’ column has paths relative to a common parent.

Return type:

pd.DataFrame
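The path conversion described above can be sketched on a plain list of paths, without pandas. This is a minimal illustration that assumes POSIX-style paths; the function name to_relative_paths is hypothetical, and the packaged function operates on the ‘basename’ column of a dataframe and may handle edge cases differently.

```python
import posixpath
from pathlib import PurePosixPath


def to_relative_paths(paths: list[str]) -> list[str]:
    """Strip the longest common parent directory from each POSIX path."""
    # commonpath returns the deepest directory shared by every path.
    common_parent = posixpath.commonpath(paths)
    return [str(PurePosixPath(path).relative_to(common_parent)) for path in paths]


basenames = ["/usr/topo/data/a/b", "/usr/topo/data/c/d"]
relative = to_relative_paths(basenames)
print(relative)  # ['a/b', 'c/d']
```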

topostats.plotting.LOGGER_NAME = 'topostats'#
topostats.plotting.update_config(config: dict, args: dict | argparse.Namespace) dict[source]#

Update the configuration with any arguments.

Parameters:
  • config (dict) – Dictionary of configuration (typically read from YAML file specified with ‘-c/--config <filename>’).

  • args (dict | Namespace) – Command line arguments.

Returns:

Dictionary updated with command line arguments.

Return type:

dict
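The general pattern of overriding configuration values with command line arguments might look like the sketch below. It assumes that arguments set to None were not supplied and that only keys already present in the configuration are overridden; both are assumptions made for illustration, and update_config_sketch is a hypothetical name rather than the packaged function.

```python
from argparse import Namespace


def update_config_sketch(config: dict, args) -> dict:
    """Return a copy of ``config`` with values overridden by ``args``."""
    # Accept either an argparse.Namespace or a plain dictionary.
    arg_dict = vars(args) if isinstance(args, Namespace) else dict(args)
    updated = dict(config)  # shallow copy, the input is left untouched
    for key, value in arg_dict.items():
        # Assumption: None means "not supplied"; unknown keys are ignored.
        if key in updated and value is not None:
            updated[key] = value
    return updated


config = {"output_dir": "./output", "bins": 12, "kde": True}
args = Namespace(output_dir="./plots", bins=None, verbose=True)
new_config = update_config_sketch(config, args)
```

Here output_dir is replaced, bins keeps its configured value because the argument is None, and verbose is dropped because it is not a configuration key.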

topostats.plotting.LOGGER#
class topostats.plotting.TopoSum(df: pandas.DataFrame = None, base_dir: str | pathlib.Path = None, csv_file: str | pathlib.Path = None, stat_to_sum: str = None, molecule_id: str = 'molecule_number', image_id: str = 'image', hist: bool = True, stat: str = 'count', bins: int = 12, kde: bool = True, cut: float = 20, figsize: tuple = (16, 9), alpha: float = 0.5, palette: str = 'deep', savefig_format: str = 'png', output_dir: str | pathlib.Path = '.', var_to_label: dict = None, hue: str = 'basename')[source]#

Class for summarising grain statistics in plots.

Parameters:
  • df (pd.DataFrame) – Pandas data frame of data to be summarised.

  • base_dir (str | Path) – Base directory to which all paths are relative.

  • csv_file (str | Path) – CSV file of data to be summarised.

  • stat_to_sum (str) – Variable to summarise.

  • molecule_id (str) – Variable that uniquely identifies molecules.

  • image_id (str) – Variable that uniquely identifies images.

  • hist (bool) – Whether to plot histograms.

  • stat (str) – Statistic to plot on the histogram: ‘count’ (default) or ‘freq’.

  • bins (int) – Number of bins to plot.

  • kde (bool) – Whether to include a Kernel Density Estimate.

  • cut (float) – Cut point for KDE.

  • figsize (tuple) – Figure dimensions.

  • alpha (float) – Opacity to use in plots.

  • palette (str) – Seaborn colour palette to use.

  • savefig_format (str) – File type to save plots as: ‘png’ (default), ‘pdf’ or ‘svg’.

  • output_dir (str | Path) – Location to save plots to.

  • var_to_label (dict) – Dictionary mapping variable names to labels, used to add titles to plots automatically.

  • hue (str) – Dataframe column to group plots by.

_setup_figure()[source]#

Set up Matplotlib figure and axes.

Returns:

Matplotlib fig and ax objects.

Return type:

fig, ax

_outfile(plot_suffix: str) str[source]#

Generate the output file name with the appropriate suffix.

Parameters:

plot_suffix (str) – The suffix to append to the output file.

Returns:

Concatenated string of the outfile and plot_suffix.

Return type:

str

sns_plot() tuple[matplotlib.pyplot.Figure, matplotlib.pyplot.Axes] | None[source]#

Plot the distribution of one or more statistics as histograms, kernel density estimates, or both.

Uses base Seaborn.

Returns:

Tuple of Matplotlib figure and axes if plotting is successful, None otherwise.

Return type:

Optional[Tuple[plt.Figure, plt.Axes]]

sns_violinplot() None[source]#

Violin plot of data.

Returns:

Matplotlib fig and ax objects.

Return type:

fig, ax

static melt_data(df: pandas.DataFrame, stat_to_summarize: str, var_to_label: dict) pandas.DataFrame[source]#

Melt a dataframe into long format for plotting with Seaborn.

Parameters:
  • df (pd.DataFrame) – Statistics to melt.

  • stat_to_summarize (str) – Statistics to summarise.

  • var_to_label (dict) – Mapping of variable names to descriptions.

Returns:

Data in long-format with descriptive variable names.

Return type:

pd.DataFrame

set_xlim(percent: float = 0.1) None[source]#

Set the range of the x-axis.

Parameters:

percent (float) – Percentage of the observed range by which to extend the x-axis. Only used if supplied range is outside the observed values.
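One reading of the percent parameter is that the axis limits are padded by a fraction of the observed range on each side. The sketch below shows that arithmetic only; extended_xlim is a hypothetical helper, and the padding rule is an assumption rather than a description of the method's internals.

```python
def extended_xlim(values: list[float], percent: float = 0.1) -> tuple[float, float]:
    """Pad the observed range of ``values`` by ``percent`` of its width."""
    lowest, highest = min(values), max(values)
    # Assumption: the same padding is applied to both ends of the axis.
    padding = percent * (highest - lowest)
    return lowest - padding, highest + padding


limits = extended_xlim([0.0, 4.0, 10.0], percent=0.1)
print(limits)  # (-1.0, 11.0)
```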

set_palette()[source]#

Set the color palette.

save_plot(outfile: pathlib.Path) None[source]#

Save the plot to the output_dir.

Parameters:

outfile (Path) – Output file name to save figure to.

_set_label(var: str)[source]#

Get the label based on the column name(s).

Parameters:

var (str) – The variable for which a label is required.

topostats.plotting.toposum(config: dict) dict[source]#

Process plotting and summarisation of data.

Parameters:

config (dict) – Dictionary of summarisation options.

Returns:

Dictionary of nested dictionaries. Each variable has its own dictionary with keys ‘dist’ and ‘violin’, which contain distribution-like plots and violin plots respectively (if the latter are required). Each ‘dist’ and ‘violin’ is itself a dictionary with two elements, ‘figures’ and ‘axes’, which correspond to the Matplotlib ‘fig’ and ‘ax’ for that plot.

Return type:

dict
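The nested return structure described above can be walked with two loops. The sketch below uses placeholder strings in place of real Matplotlib objects, and the variable names ‘area’ and ‘contour_length’ are invented for illustration; only the ‘dist’, ‘violin’, ‘figures’ and ‘axes’ keys come from the description.

```python
def list_plots(results: dict) -> list[tuple[str, str]]:
    """Return (variable, plot type) pairs present in a toposum-style dictionary."""
    found = []
    for variable, plots in results.items():
        for plot_type in ("dist", "violin"):
            # 'violin' may be absent when violin plots were not requested.
            if plot_type in plots:
                found.append((variable, plot_type))
    return found


# Placeholder strings stand in for Matplotlib figure and axes objects.
results = {
    "area": {
        "dist": {"figures": "<fig>", "axes": "<ax>"},
        "violin": {"figures": "<fig>", "axes": "<ax>"},
    },
    "contour_length": {
        "dist": {"figures": "<fig>", "axes": "<ax>"},
    },
}
available = list_plots(results)
```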

topostats.plotting.run_toposum(args=None) None[source]#

Run Plotting.

Parameters:

args (None) – Arguments used to update the configuration.