Image Segmentation Tutorial
This was originally material for a presentation and blog post. You can get the slides online.
Imagine you are comparing two image segmentation algorithms against human-segmented reference images. This is a real-world example: it was one of the projects where I first used jug [1].
The code depends on mahotas for image processing and, for the comparison step, on milk.
We are going to build this up piece by piece.
First a few imports:
import mahotas as mh
from jug import TaskGenerator
from glob import glob
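Here, we test two thresholding-based segmentation methods, called ``method1`` and ``method2``. Both (i) read the image, (ii) blur it with a Gaussian filter, (iii) threshold it, and (iv) label the connected components of the result [2]. ``method1`` thresholds at the mean pixel value; ``method2`` blurs more strongly, stretches the image, and thresholds with Otsu's method: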
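@TaskGenerator
def method1(image):
    # Read the image and keep only the first colour channel
    image = mh.imread(image)[:, :, 0]
    # Blur with a Gaussian filter (sigma = 2)
    image = mh.gaussian_filter(image, 2)
    # Threshold at the mean pixel value
    binimage = (image > image.mean())
    # Label connected components (mh.label also returns the object count)
    labeled, _ = mh.label(binimage)
    return labeled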
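@TaskGenerator
def method2(image):
    # Read the image and keep only the first colour channel
    image = mh.imread(image)[:, :, 0]
    # Blur more strongly (sigma = 4)
    image = mh.gaussian_filter(image, 4)
    # Rescale to 0-255 as uint8, the integer type mh.otsu expects
    image = mh.stretch(image)
    # Threshold with Otsu's method
    binimage = (image > mh.otsu(image))
    labeled, _ = mh.label(binimage)
    return labeled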
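``mh.otsu`` picks the threshold automatically. As a rough illustration of what Otsu's method does, here is a pure-Python sketch, assuming 8-bit integer pixel values (which is what ``mh.stretch`` produces): it tries every candidate threshold and keeps the one that maximises the between-class variance of the two resulting groups. ``otsu_threshold`` is a hypothetical helper, not mahotas code, and its tie-breaking may differ from ``mh.otsu``.

```python
def otsu_threshold(pixels):
    """Return t such that `p > t` separates the pixels into two classes.

    Hypothetical helper: a from-scratch sketch of Otsu's method for integer
    values in 0..255, not the mahotas implementation.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    sum_all = sum(value * count for value, count in enumerate(hist))

    best_t, best_between = 0, -1.0
    weight_bg = 0   # number of pixels <= t
    sum_bg = 0      # sum of pixel values <= t
    for t in range(256):
        weight_bg += hist[t]
        if weight_bg == 0:
            continue            # no background pixels yet
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break               # no foreground pixels left
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # Between-class variance, up to the constant factor 1 / total**2
        between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between > best_between:
            best_t, best_between = t, between
    return best_t


pixels = [12, 15, 10, 200, 210, 205]
t = otsu_threshold(pixels)
binary = [p > t for p in pixels]   # same comparison as in method2
print(t, binary)  # 15 [False, False, False, True, True, True]
```

The strict ``>`` keeps the first threshold that reaches the maximum, so any value between the two clusters would split them equally well; real images rarely have such clean gaps.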
We need a way to compare these. We will use the Adjusted Rand Index [3]:
@TaskGenerator
def compare(labeled, ref):
    from milk.measures.cluster_agreement import rand_arand_jaccard
    ref = mh.imread(ref)
    # rand_arand_jaccard returns (Rand, Adjusted Rand, Jaccard); keep the Adjusted Rand Index
    return rand_arand_jaccard(labeled.ravel(), ref.ravel())[1]
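To show what the Adjusted Rand Index measures, here is a pure-Python sketch using the standard pair-counting formula on the contingency table of the two labelings. ``adjusted_rand_index`` is a hypothetical name, not milk's API; in ``compare`` it would take the same flattened arrays (``labeled.ravel()`` and ``ref.ravel()``) that the milk call receives.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two flat labelings of the same items.

    Hypothetical helper: a sketch of the pair-counting formula, not milk code.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0
    # Contingency table: items in cluster i of A and cluster j of B
    contingency = Counter(zip(labels_a, labels_b))
    sum_ij = sum(comb(c, 2) for c in contingency.values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    # Index expected by chance for random labelings with these cluster sizes
    expected = sum_a * sum_b / total_pairs
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # only happens when both partitions are identical
    return (sum_ij - expected) / (max_index - expected)


# Label numbers differ, but the partitions are identical
print(adjusted_rand_index([0, 0, 1, 1], [7, 7, 3, 3]))  # 1.0
print(adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2]))  # about 0.571
```

Because the index only looks at which items share a cluster, it ignores the actual label numbers. That is why the arbitrary component numbers produced by ``mh.label`` can be compared directly against the reference images.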
Looping over all the images looks exactly like plain Python:
results = []
for im in glob('images/*.jpg'):
    m1 = method1(im)
    m2 = method2(im)
    ref = im.replace('images', 'references').replace('jpg', 'png')
    v1 = compare(m1, ref)
    v2 = compare(m2, ref)
    results.append((v1, v2))
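One caveat about this loop: ``str.replace`` replaces every occurrence, so an image named, say, ``images/my_jpg_scan.jpg`` would be mapped to ``references/my_png_scan.png``. If your file names might contain those substrings, a small ``pathlib`` helper that changes only the directory name and the suffix is safer. ``reference_for`` is a hypothetical name; the ``images/`` and ``references/`` layout is the one the loop already assumes.

```python
from pathlib import PurePath


def reference_for(image_path):
    """Map <dir>/images/<name>.jpg to <dir>/references/<name>.png.

    Hypothetical helper; assumes the image sits inside a directory
    (otherwise with_name raises ValueError).
    """
    path = PurePath(image_path)
    # Rename only the parent directory and the file suffix; the stem is untouched
    ref_dir = path.parent.with_name('references')
    return str(ref_dir / path.with_suffix('.png').name)


print(reference_for('images/my_jpg_scan.jpg'))  # references/my_jpg_scan.png
```

Inside the loop, ``ref = reference_for(im)`` would then replace the chained ``replace`` calls.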
But how do we get the results out?
A simple solution is to write a function which writes to an output file:
@TaskGenerator
def print_results(results):
    import numpy as np
    # Average over images: r1 is the mean score of method1, r2 of method2
    r1, r2 = np.mean(results, 0)
    with open('output.txt', 'w') as out:
        out.write('Result method1: {}\nResult method2: {}\n'.format(r1, r2))

print_results(results)
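``np.mean(results, 0)`` averages down the first axis, that is, over images, giving one mean score per method. The same computation in plain Python shows why unpacking into ``r1, r2`` works: by the time ``print_results`` runs, ``results`` holds one ``(v1, v2)`` row of numbers per image. ``column_means`` is a hypothetical helper and the scores below are made up for illustration.

```python
def column_means(rows):
    """Mean of each column of equal-length rows, like np.mean(rows, 0)."""
    if not rows:
        raise ValueError("need at least one row")
    # zip(*rows) regroups the rows into columns: all v1 values, then all v2 values
    return tuple(sum(column) / len(rows) for column in zip(*rows))


# Made-up (method1, method2) scores for two images, for illustration only
r1, r2 = column_means([(0.25, 0.5), (0.75, 1.0)])
print(r1, r2)  # 0.5 0.75
```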
§
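Except for the ``TaskGenerator`` decorator, this would be a pure Python file! With ``TaskGenerator``, we get jugginess: each decorated call returns a ``Task`` instead of running immediately, so jug can schedule it, save its result, and run independent tasks in parallel.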
Assuming the script is saved as ``jugfile.py`` (the name jug looks for by default), we can run:
jug execute &
jug execute &
jug execute &
jug execute &
to get four processes working at once. Jug coordinates them so that each task is run by only one process.
§
Note also the line:
print_results(results)
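``results`` is a list of ``(v1, v2)`` pairs of ``Task`` objects. Passing it to ``print_results`` is how you define a dependency: jug picks up that, to call ``print_results``, it needs all the values in ``results``, and behaves accordingly, running ``print_results`` only after those tasks have finished.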
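Jug's real machinery (hashing tasks, a persistent result store, locking so several ``jug execute`` processes can cooperate) is far more involved, but the dependency idea itself fits in a few lines. The sketch below is a toy under stated assumptions: ``ToyTask``, ``resolve`` and ``toy_task_generator`` are hypothetical names, not jug's API, and everything runs in one process with results kept in memory.

```python
import functools


class ToyTask:
    """A deferred call: a function plus arguments that may contain other ToyTasks."""

    def __init__(self, func, args):
        self.func = func
        self.args = args
        self._done = False
        self._value = None

    def run(self):
        # Compute dependencies first, then this call; cache the result in memory
        if not self._done:
            self._value = self.func(*[resolve(arg) for arg in self.args])
            self._done = True
        return self._value


def resolve(value):
    """Replace ToyTasks, also inside lists and tuples, by their results."""
    if isinstance(value, ToyTask):
        return value.run()
    if isinstance(value, (list, tuple)):
        return type(value)(resolve(item) for item in value)
    return value


def toy_task_generator(func):
    """Hypothetical stand-in for a TaskGenerator-style decorator."""
    @functools.wraps(func)
    def build(*args):
        # Calling the decorated function only records the call
        return ToyTask(func, args)
    return build


calls = []

@toy_task_generator
def square(x):
    calls.append(x)
    return x * x

@toy_task_generator
def total(pairs):
    return sum(a + b for a, b in pairs)

# Same shape as `results` above: a list of pairs of tasks
pairs = [(square(i), square(i + 1)) for i in range(3)]
result = total(pairs)
assert calls == []      # nothing has been computed yet
print(result.run())     # 19
```

As with ``print_results(results)``, ``total`` only runs once every task nested inside its argument has produced a value. Jug additionally persists results, so rerunning the script skips tasks that have already finished.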
Easy as Py.
§
The full script above, including data, is available from GitHub.
[1] The code in that repository still uses a pretty old version of jug (this was 2009, after all); ``TaskGenerator`` had not been invented yet.
[2] This is for demonstration purposes; the paper had better methods, of course.
[3] Again, you can do better than Adjusted Rand, as we show in the paper; but this is a demo, and this way we can just call a function in milk.