Logos

1 Introduction

Lawrence N Hudson - software engineer, Ben W Price - curator of small orders and Natalie Dale-Skey - curator of Hymenoptera

This workshop will give a comprehensive overview of the Inselect desktop software and its associated command-line tools.

1.1 Topics

This workshop will cover

1.2 Acknowledgements

Inselect has received support from

1.3 Preparation

  1. Download and run the appropriate installer for the latest release of Inselect (at time of writing, v0.1.33) from the releases tab
  2. If you do not already have one, download and install a good text editor. Two options that we use
  3. Download InselectWorkshopFiles.zip
  4. Optional background reading and viewing:

1.4 The problem

Natural history collections are vast and varied, which presents many challenges for digitisation.

At the Natural History Museum, London, there are an estimated 33 million insect specimens, housed in 130 thousand drawers:

Drawers of pinned beetles at Natural History Museum, London

1.5 Whole-drawer imaging

The challenge is to efficiently get a single image of each object along with its associated metadata

1.6 How Inselect can help

Inselect aims to solve some of the problems associated with whole-drawer imaging

2 Quick tour

Start Inselect. You should see a list of keyboard shortcuts.

Keyboard shortcuts

On Windows, many shortcuts are activated by holding down the CTRL key together with another key. On a Mac, the command key (⌘) is used instead of CTRL. This workbook uses CTRL - if you are on a Mac, press ⌘ wherever you see CTRL written.

There are two 'views' of an Inselect document.

2.1 The Boxes view

This holds a zoomable, low-resolution version of the whole-drawer image together with bounding boxes:

Boxes view

2.2 The Objects view

This shows a grid of icons, one icon for each bounding box:

Objects view

2.3 The toolbar and status bar

The toolbar offers functions relevant to the currently selected view. The status bar at the bottom shows some useful feedback.

The toolbar and status bar

2.4 The panel on the right-hand side

A 'minimap' thumbnail image indicates where the Boxes view is zoomed to:

Minimap

2.4.2 Metadata fields

Metadata templates give you control over the fields and validation:

Metadata

2.4.3 Information

Some information about the loaded document:

Information

2.5 File sizes

Many digitisation pipelines use TIFF images. Whole-drawer TIFF images can be extremely large - files of up to 800MB are common. The examples in this workbook use JPG files in order to minimise the amount of data that you need to download.

2.6 Words of warning

3 Worked example 1 - insect soup

Objective: to place a bounding box around each object in an image and export each image crop to its own JPG file.

This example will cover

3.1 Opening the file

1.InsectSoup/Img0920+LG+C2.jpg - an insect soup image of Diptera - true flies

Open the file using one of

Insect soup

3.2 Created files

Inselect has created two files

  1. Img0920+LG+C2.inselect

    This is a small file that will contain information about bounding boxes and their associated metadata.

  2. Img0920+LG+C2_thumbnail.jpg

3.3 Image handling

On the 'Boxes' view you can zoom in and out by

You can pan around the image by

3.4 Creating and editing bounding boxes

You can create boxes with

You can select boxes

You can move selected boxes using

3.5 Segmenting

Creating boxes by hand is silly - we want to minimise manual steps and get the computer to do the hard work for us.

Run Inselect's segmentation algorithm

Inselect will attempt to detect individual objects within the image and place a bounding box around each. It uses a general purpose algorithm that works well with many of the different specimen types that we tried.

Insect soup with bounding boxes

The segmentation does a reasonable job but is not perfect - some manual refinement is required.
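
This workbook does not describe how the segmentation algorithm works internally. As a rough intuition only, the Python sketch below shows one simple, well-known technique for turning a foreground/background mask into bounding boxes: flood-filling each connected blob of foreground pixels and recording its extent. It is not Inselect's algorithm; the `bounding_boxes` function and the tiny `MASK` grid are hypothetical and exist purely for illustration.

```python
from collections import deque


def bounding_boxes(mask):
    """Return (left, top, width, height) for each 4-connected blob of 1s."""
    height = len(mask)
    width = len(mask[0]) if mask else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            queue = deque([(x, y)])
            seen[y][x] = True
            min_x = max_x = x
            min_y = max_y = y
            while queue:
                cx, cy = queue.popleft()
                min_x, max_x = min(min_x, cx), max(max_x, cx)
                min_y, max_y = min(min_y, cy), max(max_y, cy)
                neighbours = ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1))
                for nx, ny in neighbours:
                    inside = 0 <= nx < width and 0 <= ny < height
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((min_x, min_y, max_x - min_x + 1, max_y - min_y + 1))
    return boxes


# Hypothetical mask: 1 marks a pixel that belongs to an object.
MASK = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
print(bounding_boxes(MASK))
```

With this technique, objects that touch or overlap form a single blob and therefore receive a single box, which is one reason why the output of any automatic approach needs checking by eye.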

3.6 Refining the results of segmentation

We will check and refine each of the bounding boxes created by the segmentation algorithm. We will also create any bounding boxes that are missing.

3.7 Delete unrequired boxes

3.8 Adjust borders of bounding boxes where they are too big or too small

You can adjust boxes using

3.9 Split apart boxes that encompass more than one object

This often happens when objects slightly overlap, e.g. insect wings. We could resize the large box and create new ones but this is unnecessary manual work.

Flies with overlapping wings

Run 'Subsegment box', either from the toolbar or with F6

Subsegmented flies

3.10 Export crops

Once you are happy with the bounding boxes, click on 'Save crops' in the 'Export' section of the toolbar.

Insect soup with fully refined bounding boxes

4 Worked example 2 - pinned insects

Objective: to configure and use Inselect metadata templates.

This example will cover

4.1 Preamble

Open 2.Metadata\Scopelodes_spp_Lim_14.inselect

Moths with bounding boxes

4.2 Metadata template in Inselect

You have complete control over metadata fields and validation through .inselect_template files, which are simple text files that you can edit using any good text editor.

Moths with metadata validation failures

Click on any of the bounding boxes. The Location and Taxonomy fields are both coloured pink, indicating a validation problem

Moths with single box selected

Moths with single box selected

Let's set the metadata for all boxes

Moths with all boxes valid

4.3 Metadata panel

The metadata panel on the right shows Simple Darwin Core fields

4.4 Creating and editing metadata templates

Template files are in a format called YAML (YAML Ain't Markup Language - http://yaml.org) - a structured text format. A reference and example template files are at https://github.com/NaturalHistoryMuseum/inselect-templates - open this page in a new browser tab and have a quick look through it.

Open limacodidae.inselect_template in your text editor:

Name: Limacodidae
Object label: '{Taxonomy}-{Location}-{ItemNumber}'
Fields:
    - Name: Taxonomy
      Mandatory: true
      Choices:
          - Anaxidia
          - Anepopsia
          - Apodecta
          - Birthamoides
          - Calcarifera
          - Chalcocelis
          - Comana
          - Comanula
          - Doratifera
          - Ecnomoctena
          - Elassoptila
          - Eloasa
          - Hedraea
          - Hydroclada
          - Lamprolepida
          - Limacochara
          - Mambara
          - Mecytha
          - Parasoidea
          - Praesusica
          - Pseudanapaea
          - Pygmaeomorpha
          - Scopelodes
          - Squamosa
          - Thosea
    - Name: Location
      Mandatory: true
      Choices:
          - Drawer 1
          - Drawer 2
          - Drawer 3
          - Drawer 4

When you come to create your own .inselect_template files, it is best to modify an existing template to suit your needs.
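
To make the roles of Mandatory, Choices and Object label concrete, here is a minimal Python sketch of how a template like the one above could be checked against the metadata of a single box. It is an illustration, not Inselect's own code: the `TEMPLATE` dictionary is an abbreviated, hand-written stand-in for the loaded YAML, the function names are hypothetical, and the sketch assumes that the Object label pattern is filled in the same way as a Python format string.

```python
# Hypothetical stand-in for limacodidae.inselect_template after loading.
# The Taxonomy choices are abbreviated to keep the example short.
TEMPLATE = {
    "Name": "Limacodidae",
    "Object label": "{Taxonomy}-{Location}-{ItemNumber}",
    "Fields": [
        {"Name": "Taxonomy", "Mandatory": True,
         "Choices": ["Anaxidia", "Scopelodes", "Thosea"]},
        {"Name": "Location", "Mandatory": True,
         "Choices": ["Drawer 1", "Drawer 2", "Drawer 3", "Drawer 4"]},
    ],
}


def validation_problems(template, metadata):
    """Return a list of human-readable problems for one box's metadata."""
    problems = []
    for field in template["Fields"]:
        name = field["Name"]
        value = metadata.get(name, "")
        if not value:
            # An empty value is only a problem for mandatory fields.
            if field.get("Mandatory", False):
                problems.append(f"{name}: mandatory field is empty")
            continue
        choices = field.get("Choices")
        if choices and value not in choices:
            problems.append(f"{name}: {value!r} is not one of the choices")
    return problems


def object_label(template, metadata, item_number):
    """Fill the 'Object label' pattern (assumed to be str.format-style)."""
    values = dict(metadata, ItemNumber=item_number)
    return template["Object label"].format(**values)


empty_box = {}
filled_box = {"Taxonomy": "Scopelodes", "Location": "Drawer 1"}

print(validation_problems(TEMPLATE, empty_box))   # two problems
print(validation_problems(TEMPLATE, filled_box))  # no problems
print(object_label(TEMPLATE, filled_box, 3))
```

A box with no metadata fails on both mandatory fields, which corresponds to the fields that Inselect highlights in pink.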

4.5 Editing the template

You will append a new, optional free-text field - Notes - to the template.

4.6 Export metadata and bounding boxes to a CSV file

Click 'Export CSV' in the toolbar and open the CSV file in Excel, OpenOffice or similar.

Columns are
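
If you would rather inspect the exported file with a script than with a spreadsheet, Python's standard `csv` module is one option. The sketch below is an assumption-laden example: the `EXAMPLE_CSV` text and its column names are made up for illustration and may not match what Inselect writes, and `summarise_csv` is a hypothetical helper.

```python
import csv
import io

# Hypothetical stand-in for an exported file; the column names are
# illustrative only and may differ from those in a real export.
EXAMPLE_CSV = (
    "ItemNumber,Taxonomy,Location\n"
    "1,Scopelodes,Drawer 1\n"
    "2,Scopelodes,Drawer 1\n"
)


def summarise_csv(stream):
    """Return (column names, number of data rows) for a CSV stream."""
    reader = csv.DictReader(stream)
    rows = list(reader)
    return reader.fieldnames, len(rows)


columns, row_count = summarise_csv(io.StringIO(EXAMPLE_CSV))
print(columns)
print(row_count)

# For a real export, open the file instead (the path is a placeholder and
# the encoding is an assumption):
# with open("exported.csv", newline="", encoding="utf-8") as stream:
#     print(summarise_csv(stream))
```

Printing the header row of a real export is a quick way to confirm which columns your own template produces.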

5 Worked example 3 - microscope slides

Our third example is another SatScan image, this time of microscope slides arranged in a template. Each of the slides contains a DataMatrix barcode and we will look at Inselect's barcode reader.

Objective: to read barcodes on microscope slides, rotate each slide to be in the correct orientation and to export cropped images.

This example will cover

5.1 Preamble

Microscope slides

5.2 Refine

There are 100 sockets but automatic segmentation has created 102 boxes. Some of the sockets do not contain slides but contain red markers that indicate the location and genus of the slides that follow:

Microscope slides

Once refined, you should have 95 bounding boxes:

Microscope slides

5.3 Metadata template

Open Templates/sialidae.inselect_template in your text editor

5.4 Setup barcode reading

Select 'Configure' from the 'Barcodes' section of the toolbar:

Barcode reading options

5.5 Read barcodes

Read barcodes with F7.

5.6 Other metadata

You will select the relevant groups of slides and set their values of location and genus.

Reminders

Once completed, all metadata should be valid with all boxes clear:

Completed slides

5.7 Sorting boxes

You can sort boxes either by rows or columns.

The selected sort option is applied when you segment an image.
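
This workbook does not say how Inselect decides which boxes belong to the same row or column. The Python sketch below shows one plausible way of ordering boxes by rows or by columns, so that the difference between the two options is easier to picture. Everything in it is an assumption made for illustration: the `Box` type, the `sort_boxes` function and, in particular, the rule that boxes share a row (or column) when their centres lie within half of the smallest box height (or width) of each other.

```python
from typing import NamedTuple


class Box(NamedTuple):
    left: float
    top: float
    width: float
    height: float


def sort_boxes(boxes, by="rows"):
    """Order boxes left-to-right within rows, or top-to-bottom within columns."""
    if by not in ("rows", "columns"):
        raise ValueError("by must be 'rows' or 'columns'")
    if not boxes:
        return []

    if by == "rows":
        major = lambda b: b.top + b.height / 2   # centre y picks the row
        minor = lambda b: b.left                 # left edge orders a row
        tolerance = min(b.height for b in boxes) / 2
    else:
        major = lambda b: b.left + b.width / 2   # centre x picks the column
        minor = lambda b: b.top                  # top edge orders a column
        tolerance = min(b.width for b in boxes) / 2

    # Group boxes whose centres are close to the first box of the group.
    groups = []
    for box in sorted(boxes, key=major):
        if groups and major(box) - major(groups[-1][0]) <= tolerance:
            groups[-1].append(box)
        else:
            groups.append([box])

    return [box for group in groups for box in sorted(group, key=minor)]


# Four hypothetical boxes in a slightly uneven 2 x 2 arrangement.
a = Box(0.1, 0.10, 0.2, 0.2)
b = Box(0.6, 0.12, 0.2, 0.2)
c = Box(0.1, 0.60, 0.2, 0.2)
d = Box(0.6, 0.58, 0.2, 0.2)

print(sort_boxes([d, b, c, a], by="rows") == [a, b, c, d])
print(sort_boxes([d, b, c, a], by="columns") == [a, c, b, d])
```

The tolerance matters because hand-placed or segmented boxes are rarely perfectly aligned; sorting on the raw top or left coordinate alone would interleave boxes from neighbouring rows or columns.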

5.8 Objects view

Switch to the Objects view.

Some relevant shortcuts

This view shows crops on a grid with a square for each bounding box, along with each box's number and object label:

Completed slides

Selection

5.9 Rotation

You will rotate each crop so that labels are in the correct orientation.

Rotated slides

5.10 Export crops

Click 'Save crops'

Open the directory containing the crops.

6 Worked example 4 - cookie cutter templates

The microscope slides are arranged on a 20 x 5 template. If you regularly deal with hundreds or thousands of scanned images that share an identical arrangement of objects, automatic segmentation is an imperfect solution because its results need to be checked and refined for every image.

Objective: to create and use cookie cutter templates.

This example will cover creating and applying a cookie cutter template.

With

Open Drawer_60b_61_62a.jpg. Inselect creates boxes using the cookie cutter.
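
The idea behind a cookie cutter can be sketched in a few lines of Python: a fixed set of boxes, stored relative to the image, is reused on every image that shares the same layout. This is an illustration only. It does not read or write Inselect's cookie cutter files, whose format is not covered here; the functions are hypothetical, the use of normalised 0-1 coordinates is an assumption, and reading '20 x 5' as 20 columns by 5 rows is a guess.

```python
def grid_boxes(columns, rows):
    """Return (left, top, width, height) boxes in normalised 0-1 coordinates."""
    cell_w = 1.0 / columns
    cell_h = 1.0 / rows
    return [
        (col * cell_w, row * cell_h, cell_w, cell_h)
        for row in range(rows)
        for col in range(columns)
    ]


def to_pixels(box, image_width, image_height):
    """Scale a normalised box to pixel coordinates for a particular image."""
    left, top, width, height = box
    return (
        round(left * image_width),
        round(top * image_height),
        round(width * image_width),
        round(height * image_height),
    )


# One box per socket in a hypothetical 20 x 5 slide template.
cutter = grid_boxes(columns=20, rows=5)
print(len(cutter))

# The same boxes can be applied to images of different pixel sizes.
print(to_pixels(cutter[0], 8000, 4000))
print(to_pixels(cutter[0], 4096, 2048))
```

Because the boxes are expressed relative to the image rather than in pixels, the same set can be stamped onto a thumbnail or onto a full-resolution scan without being redrawn.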

7 Command-line tools

Objective: to ingest the five example image files in 5.CommandLineTools and apply the cookie cutter that you previously created.

This example will provide an introduction to Inselect's command line tools.

7.1 Workflow

Inselect workflow

Each of the operations shown in blue has an associated command-line tool. You can pick and choose the relevant command-line tools together with cookie cutters and metadata templates to integrate Inselect into your existing workflows. Descriptions of each tool are below.

7.1.1 ingest

7.1.2 segment

7.1.3 read_barcodes

7.1.4 export_metadata

7.1.5 save_crops

7.2 Test that you can run tools

Start the Windows command prompt. The following code fragments assume that you installed Inselect to the default location of C:\Program Files\inselect. You should alter the paths as required, if you installed the program to a different directory.

Each tool supports the --help argument:

"C:\Program Files\inselect\ingest.exe" --help

You should see

usage: ingest.exe [-h] [-c COOKIE_CUTTER] [-w THUMBNAIL_WIDTH] [--debug] [-v]
                  inbox docs

Ingests images into Inselect

positional arguments:
  inbox                 Source directory containing scanned images
  docs                  Destination directory to which images will be moved
                        and in which Inselect documents will be created. Can
                        be the same as inbox.

optional arguments:
  -h, --help            show this help message and exit
  -c COOKIE_CUTTER, --cookie-cutter COOKIE_CUTTER
                        Path to a '.inselect_cookie_cutter' file that will be
                        applied to new Inselect documents
  -w THUMBNAIL_WIDTH, --thumbnail-width THUMBNAIL_WIDTH
                        The width of the thumbnail in pixels; defaults to 4096
  --debug
  -v, --version         show program's version number and exit

7.3 Ingest images

The 5.CommandLineTools directory contains five JPG files. Run

"C:\Program Files\inselect\ingest.exe" --thumbnail-width 8000 ^
    --cookie-cutter <path to the inselect_cookie_cutter file> ^
    <path to the 5.CommandLineTools directory> ^
    <path to the 5.CommandLineTools directory>
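
If you want to call ingest from a script rather than typing the command by hand, one option is Python's standard `subprocess` module. The sketch below builds the argument list from the options listed in the --help output and only prints it; it is a hedged example, not part of Inselect. The `ingest_command` and `run` helpers are hypothetical, and every path apart from the default install location is a placeholder that you would replace with your own.

```python
import subprocess

# Default install location given in this workbook; adjust if yours differs.
INGEST = r"C:\Program Files\inselect\ingest.exe"


def ingest_command(inbox, docs, cookie_cutter=None, thumbnail_width=None):
    """Build the argument list for ingest.exe from its documented options."""
    command = [INGEST]
    if thumbnail_width is not None:
        command += ["--thumbnail-width", str(thumbnail_width)]
    if cookie_cutter is not None:
        command += ["--cookie-cutter", str(cookie_cutter)]
    command += [str(inbox), str(docs)]
    return command


def run(command):
    """Run the tool; check=True raises an error if it reports failure."""
    return subprocess.run(command, check=True)


# Placeholder paths: replace these with your own locations.
command = ingest_command(
    inbox=r"C:\InselectWorkshopFiles\5.CommandLineTools",
    docs=r"C:\InselectWorkshopFiles\5.CommandLineTools",
    cookie_cutter=r"C:\InselectWorkshopFiles\slides.inselect_cookie_cutter",
    thumbnail_width=8000,
)
print(command)

# Uncomment on a machine where Inselect is installed:
# run(command)
```

Passing the arguments as a list means that the space in 'Program Files' needs no special quoting, which is a common stumbling block when typing the command at the prompt.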

7.4 Roundup
