Image Segmentation Tutorial

This was originally material for a presentation and blog post. You can get the slides online.

Let us imagine you are trying to compare two image segmentation algorithms based on human-segmented images. This is a completely real-world example as it was one of the projects where I first used jug [1].

The example depends on mahotas for image processing and on milk for the comparison measure.

We are going to build this up piece by piece.

First a few imports:

  import mahotas as mh
  from jug import TaskGenerator
  from glob import glob

Here, we test two thresholding-based segmentation methods, called method1 and method2. Both (i) read the image, (ii) blur it with a Gaussian, and (iii) threshold it [2]:

  @TaskGenerator
  def method1(image):
      # Read the image and keep only the first channel
      image = mh.imread(image)[:, :, 0]
      image = mh.gaussian_filter(image, 2)
      # Threshold at the mean intensity
      binimage = (image > image.mean())
      # Label the connected components
      labeled, _ = mh.label(binimage)
      return labeled

  @TaskGenerator
  def method2(image):
      image = mh.imread(image)[:, :, 0]
      image = mh.gaussian_filter(image, 4)
      # Rescale to integer values so that Otsu thresholding can be applied
      image = mh.stretch(image)
      binimage = (image > mh.otsu(image))
      labeled, _ = mh.label(binimage)
      return labeled

We need a way to compare each result against the human-segmented reference. We will use the Adjusted Rand Index [3]:

  @TaskGenerator
  def compare(labeled, ref):
      from milk.measures.cluster_agreement import rand_arand_jaccard
      ref = mh.imread(ref)
      return rand_arand_jaccard(labeled.ravel(), ref.ravel())[1]
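
To make the measure less of a black box, here is a minimal pure-Python sketch of the usual pair-counting formula for the adjusted Rand index. It is not milk's implementation; the function name adjusted_rand_index and the choice to return 1.0 in degenerate cases are assumptions made for this illustration. Like compare, it works on flat sequences of labels, which is why the arrays are passed through ravel() above.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index of two flat label sequences of equal length."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have the same length")
    n = len(labels_a)

    # Count elements per (label_a, label_b) combination and per label.
    pair_counts = Counter(zip(labels_a, labels_b))
    a_counts = Counter(labels_a)
    b_counts = Counter(labels_b)

    # Number of element pairs grouped together in both, in a, and in b.
    together_both = sum(comb(c, 2) for c in pair_counts.values())
    together_a = sum(comb(c, 2) for c in a_counts.values())
    together_b = sum(comb(c, 2) for c in b_counts.values())

    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # assumption: fewer than two elements counts as agreement
    expected = together_a * together_b / total_pairs
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        # Degenerate case, e.g. a single label everywhere in both inputs.
        return 1.0
    return (together_both - expected) / (maximum - expected)


same_up_to_renaming = adjusted_rand_index([0, 0, 1, 1], [5, 5, 7, 7])
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

Two partitions that differ only in how the regions are named score 1.0, while partitions that agree less than chance would predict score below zero. This is what makes the index usable here: the label numbers produced by mh.label do not need to match those in the reference.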

Running over all the images looks exactly like Python:

  results = []
  for im in glob('images/*.jpg'):
      m1 = method1(im)
      m2 = method2(im)
      ref = im.replace('images', 'references').replace('jpg', 'png')
      v1 = compare(m1, ref)
      v2 = compare(m2, ref)
      results.append((v1, v2))

But how do we get the results out?

A simple solution is to write a function which writes to an output file:

  @TaskGenerator
  def print_results(results):
      import numpy as np
      r1, r2 = np.mean(results, 0)
      with open('output.txt', 'w') as out:
          out.write('Result method1: {}\nResult method2: {}\n'.format(r1, r2))

  print_results(results)
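
The call np.mean(results, 0) averages over the first axis, so it returns one mean per method rather than one per image. The following standard-library sketch performs the same reduction on made-up scores; the numbers are hypothetical and stand in for the values that print_results receives.

```python
from statistics import mean

# Hypothetical (method1, method2) scores for two images.
results = [(1.0, 0.5), (0.5, 0.25)]

# zip(*results) regroups the pairs into one sequence per method.
r1, r2 = (mean(column) for column in zip(*results))
print('Result method1: {}\nResult method2: {}'.format(r1, r2))
```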

§

Except for the TaskGenerator decorator, this would be a pure Python file!

With TaskGenerator, we get jugginess!

We can call:

  jug execute &
  jug execute &
  jug execute &
  jug execute &

to get 4 processes going at once.
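
If you prefer to start the workers from a script and wait for all of them to finish, a small launcher along these lines might do. It is only a sketch: launch_workers and wait_for are hypothetical helper names, and the default command assumes that jug is on the PATH and that the script is run from the directory containing the jugfile, as with the shell commands above.

```python
import subprocess
import sys


def launch_workers(n, command=("jug", "execute")):
    """Start n copies of command and return their Popen handles."""
    return [subprocess.Popen(list(command)) for _ in range(n)]


def wait_for(workers):
    """Block until every worker has exited and return the exit codes."""
    return [worker.wait() for worker in workers]
```

Calling wait_for(launch_workers(4)) would then correspond to the four shell commands, with the difference that the call only returns once every process has exited. The launcher does not hand out work itself; each jug process decides on its own which tasks to run.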

§

Note also the line:

  print_results(results)

results is a list of pairs of Task objects. Passing it as an argument is how you define a dependency: jug picks up that to call print_results it needs all the values in results, so it runs the tasks that produce them first.
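
The idea of recording calls and resolving them later can be illustrated with a toy model. This is not how jug is implemented; LazyTask, lazy and resolve are made-up names for this sketch, which only shows how a task nested inside a list of pairs can be discovered as a dependency and evaluated before the function that needs it.

```python
class LazyTask:
    """A toy stand-in for a task: a function plus its unevaluated arguments."""

    def __init__(self, func, args):
        self.func = func
        self.args = args
        self._done = False
        self._value = None

    def dependencies(self):
        """Yield every LazyTask in the arguments, including nested ones."""
        stack = list(self.args)
        while stack:
            item = stack.pop()
            if isinstance(item, LazyTask):
                yield item
            elif isinstance(item, (list, tuple)):
                stack.extend(item)

    def value(self):
        """Evaluate the arguments first, then run the function once."""
        if not self._done:
            self._value = self.func(*[resolve(a) for a in self.args])
            self._done = True
        return self._value


def resolve(obj):
    """Replace LazyTask objects by their values, recursing into lists/tuples."""
    if isinstance(obj, LazyTask):
        return obj.value()
    if isinstance(obj, (list, tuple)):
        return type(obj)(resolve(o) for o in obj)
    return obj


def lazy(func):
    """Decorator: calling the function records a LazyTask instead of running it."""
    def make_task(*args):
        return LazyTask(func, args)
    return make_task


@lazy
def score(x):
    return x * 2


@lazy
def summarize(results):
    return sum(a + b for a, b in results)


results = [(score(1), score(2)), (score(3), score(4))]
final = summarize(results)         # nothing has been computed yet
deps = list(final.dependencies())  # the four score tasks
total = final.value()              # runs the scores, then summarize
```

In this toy version everything runs in a single process and nothing is saved to disk, so it captures only the dependency idea, not the parallel execution described above.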

Easy as Py.

§

The full script above, including data, is available from GitHub.

[1] The code in that repository still uses a pretty old version of jug; this was 2009, after all. TaskGenerator had not been invented yet.
[2] This is for demonstration purposes; the paper had better methods, of course.
[3] Again, you can do better than Adjusted Rand, as we show in the paper; but this is a demo. This way, we can just call a function in milk.