Usage#

After having installed TopoStats you are ready to run it. For convenience TopoStats provides a command line interface run_topostats that will load a default configuration file and process all images with reasonable default configuration options.

However, because the location of your image files can not be known in advance you must make a copy of the default configuration and modify it to work with your files. This guide will hopefully take you through the process of running TopoStats and customising the configuration file with which it is run. If you encounter any problems please ask questions in the Discussions. If you think you have encountered a bug or have a feature suggestion please create an Issue.

Organising Scans#

You should place all files you wish to batch process in a single directory. They can be nested in separate folders as TopoStats will scan for all images within this directory but currently it will only process one scan type at a time (i.e. .spm or .jpk or .asd). This may change in the future.

Running TopoStats#

The default location that TopoStats looks for scans is the directory from which it is invoked. Once you start your shell/terminal you will therefore need to do two things.

  1. Navigate to the location of the scans you wish to process using cd /path/to/where/scans/are/located.

  2. Activate the virtual environment under which you installed TopoStats (refer to installed if unsure).

You can now run topostats by invoking run_topostats and you should start to see some output similar to that below.

cd /path/to/where/scans/are/located
run_topostats
[Tue, 15 Nov 2022 12:39:48] [INFO    ] [topostats] Configuration is valid.
[Tue, 15 Nov 2022 12:39:48] [INFO    ] [topostats] Plotting configuration is valid.
[Tue, 15 Nov 2022 12:39:48] [INFO    ] [topostats] Configuration file loaded from      : None
[Tue, 15 Nov 2022 12:39:48] [INFO    ] [topostats] Scanning for images in              : /home/neil/work/projects/topostats/TopoStats
[Tue, 15 Nov 2022 12:39:48] [INFO    ] [topostats] Output directory                    : output
[Tue, 15 Nov 2022 12:39:48] [INFO    ] [topostats] Looking for images with extension   : .spm
[Tue, 15 Nov 2022 12:39:48] [INFO    ] [topostats] Images with extension .spm in /home/neil/work/projects/topostats/TopoStats : 32
[Tue, 15 Nov 2022 12:39:48] [INFO    ] [topostats] Thresholding method (Filtering)     : std_dev
[Tue, 15 Nov 2022 12:39:48] [INFO    ] [topostats] Thresholding method (Grains)        : std_dev
...

Configuring TopoStats#

Configuration of TopoStats is done through a YAML file and a full description of the fields used can be found under the configuration section.

Here we will go through common changes that you are likely to want to make to the default configuration and how to make them.

Copying default_config.yaml#

If you have used Git to clone the TopoStats repository from GitHub the default configuration can be found in the sub-directory topostats/default_config.yaml. If you have installed TopoStats from PyPI then a sample configuration file can be downloaded from here (right-click on the link and select Save As to save the file to your computer).

Save or copy this file to the same directory all of your scan files are located and call it my_config.yaml.

cp /<path>/<to>/<where>/<topostats>/<is>/<cloned>/TopoStats/topostats/default_config.yaml /<where>/<scans>/<are>/my_config.yaml

Editing my_config.yaml#

IMPORTANT This file is an ASCII text file and you should use NotePad (Windows), TextEdit (OSX) or Nano/Emacs/Vim (GNU/Linux) or any other text editor. Do not use Microsoft Word or any other Word Processor to edit this file.

You can now start customising the configuration you are going to run TopoStats with. All fields have defaults but the ones you may want to change are….

  • base_dir (default: ./) the directory in which to search for scans. By default this is ./ which represents the directory from which run_topostats is called and it is good practice to have one configuration file per batch of scans that are being processed.

  • output_dir (default: output) the location where the output is saved, by default this is the directory output which will be created if it doesn’t exist. If you wish for the output to be somewhere else specify it here. If you want Processed directories to sit within the directories that images are found then simply set the output_dir to the same value as base_dir.

  • cores (default: 4) the number of parallel processes to run processing of all found images. Set this to a maximum of one less than the number of cores on your computers CPU. If unsure leave as is.

  • file_ext (default: .spm) the file extension of scans to search for within the current directory. The default is .spm but other file format support is in the pipeline.

  • plotting : image_set (default core) specifies which steps of the processing to plot images of. The value all gets images for all stages, `core** saves only a subset of images.

Most of the other configuration options can be left on their default values for now. Once you have made any changes save the file and return to your terminal.

Running TopoStats with my_config.yaml#

To use your new configuration file you need to inform run_topostats to use that file rather than the defaults, this is done using the --config my_config.yaml file.

NB this assumes that you are in the same directory as your scans where you have saved the my_config.yaml file that you edited. That doesn’t have to be the case but it makes life easier for if you are not familiar with absolute and relative paths.

run_topostats --config my_configy.yaml
[Tue, 15 Nov 2022 12:39:48] [INFO    ] [topostats] Configuration is valid.
[Tue, 15 Nov 2022 12:39:48] [INFO    ] [topostats] Plotting configuration is valid.
[Tue, 15 Nov 2022 12:39:48] [INFO    ] [topostats] Configuration file loaded from      : None
[Tue, 15 Nov 2022 12:39:48] [INFO    ] [topostats] Scanning for images in              : /home/neil/work/projects/topostats/TopoStats
[Tue, 15 Nov 2022 12:39:48] [INFO    ] [topostats] Output directory                    : output
[Tue, 15 Nov 2022 12:39:48] [INFO    ] [topostats] Looking for images with extension   : .spm
[Tue, 15 Nov 2022 12:39:48] [INFO    ] [topostats] Images with extension .spm in /home/neil/work/projects/topostats/TopoStats : 1
[Tue, 15 Nov 2022 12:39:48] [INFO    ] [topostats] Thresholding method (Filtering)     : std_dev
[Tue, 15 Nov 2022 12:39:48] [INFO    ] [topostats] Thresholding method (Grains)        : std_dev
...

On a successful completion you should see output similar to this at the bottom.

Processing images from tests, results are under output: 100%|XXXXXXXXXXXXXXXX| 1/1 [00:03<00:00,  3.60s/it][Tue, 15 Nov 2022 13:49:14] [INFO    ] [topostats] All statistics combined for 1 images(s) are saved to : output/all_statistics.csv}
[Tue, 15 Nov 2022 13:49:14] [INFO    ] [topostats] Unable to generate folderwise statistics as 'all_statis_df' is empty
[Tue, 15 Nov 2022 13:49:14] [INFO    ] [topostats] Writing configuration to : output/config.yaml

Output#

The output from running TopoStats is saved in the location defined in the configuration file by output_dir. The default is the directory output within the directory from which run_topostats is invoked unless it has been modified in a copy of the default configuration as described above.

At the top level of the output directory are two files config.yaml and all_statistics.csv

  • config.yaml : a copy of the configuration used to process the images.

  • all_statistics.csv : a Comma Separated Variable ASCII plain-text file of the grain and DNA tracing statistics.

The remaining directories of results is contingent on the structure of files within the base_dir that is specified in the configuration. If all files are in the top-level directory (i.e. no nesting) then you will have just a Processed directory. If there is a nested structure then there will be a Processed directory in each folder that an image with the specified file_ext has been found. This is perhaps best illustrated by way of example.

If you have the following three .spm files within your current directory, one at the top level, one under level1 and one under level1/a

[4.0K Nov 15 13:55]  .
|-- [4.0K Nov 15 13:54]  ./level1
|   |-- [4.0K Nov 15 13:54]  ./level1/a
|   |-- [ 32M Nov 15 13:54]  ./level1/a/minicircle.spm
|   |-- [ 32M Nov 15 13:54]  ./level1/minicircle.spm
|-- [ 32M Nov 15 13:54]  ./minicircle.spm

…then under output (the default foroutput_dir) you will see the following directory structure.

[4.0K Nov 15 14:06]  output
|-- [ 381 Nov 15 14:06]  output/all_statistics.csv
|-- [7.4K Nov 15 14:06]  output/config.yaml
|-- [4.0K Nov 15 14:06]  output/level1
|   |-- [4.0K Nov 15 14:06]  output/level1/a
|   |   |-- [4.0K Nov 15 14:06]  output/level1/a/Processed
|   |-- [4.0K Nov 15 14:06]  output/level1/Processed
|-- [4.0K Nov 15 14:06]  output/Processed

…where there is one Processed directory at the sub-directory level that each image was found.

NB If you want Processed directories to sit within the directories that images are found then simply set the output_dir to the same value as base_dir.

Within each Processed directory is a directory for each file found with the specified file_ext and within these are the resulting images from processing scans. If the plotting : image_set is core then there is a single image for each. If this option is all then there is also a sub-directory for each image found within which there are the directories filters, grains/lower and grains/upper which contain additional images from the processing stages and an accompanying histogram for each image showing the distribution of pixel heights for that image.