The Image Policing Power of the Crowd, or, Let them See Kelp!
A number of you have noticed that there are a lot of images that are either of land, rivers, lakes, or are just plain bad images, cut off in some places. What gives? I want my kelp!
Well, so do we.
When we get images from Landsat, they’re huge! About 40,000 km² (200 km x 200 km). And it’s incredibly difficult to see the kelp at that scale. So, we need to chop the images into bite-sized pieces for your delectation. The Zooniverse team has put together a great algorithm to do this. Here it is, as relayed by Chris Snyder:
1) We start with a raw Landsat image that you are familiar with.
2) That image is chunked into a bunch of small squares.
Next, we run a geospatial query over the images using coastline data (http://openstreetmapdata.com/data/coastlines). We use the PostGIS extension of PostgreSQL to accomplish this.
3) Regions that intersect with the coastline shapefile are selected. The result looks something like this: http://i.imgur.com/0DS1BNU.jpg
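For the curious, here is a minimal sketch of the chunk-and-select idea in plain Python. It is not the Zooniverse code (that runs as a PostGIS query); it swaps in a hand-rolled line-versus-box test, and the function names, the tile size, and the toy coastline are all made up for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def make_tiles(width: float, height: float, tile: float) -> List[Box]:
    """Split a width x height scene into square tiles, row by row.

    Tiles on the far edges are clipped to the scene boundary.
    """
    boxes = []
    y = 0.0
    while y < height:
        x = 0.0
        while x < width:
            boxes.append((x, y, min(x + tile, width), min(y + tile, height)))
            x += tile
        y += tile
    return boxes


def segment_hits_box(p: Point, q: Point, box: Box) -> bool:
    """Liang-Barsky clipping test: does segment p-q touch the box?

    Touching an edge or corner counts as a hit.
    """
    xmin, ymin, xmax, ymax = box
    t0, t1 = 0.0, 1.0  # portion of the segment still inside the box
    for d, lo, hi in (
        (q[0] - p[0], xmin - p[0], xmax - p[0]),
        (q[1] - p[1], ymin - p[1], ymax - p[1]),
    ):
        if d == 0:
            # Segment is parallel to this axis: it must start inside the slab.
            if lo > 0 or hi < 0:
                return False
        else:
            ta, tb = lo / d, hi / d
            if ta > tb:
                ta, tb = tb, ta
            t0, t1 = max(t0, ta), min(t1, tb)
            if t0 > t1:
                return False
    return True


def coastal_tiles(tiles: List[Box], coastline: List[Point]) -> List[Box]:
    """Keep only the tiles crossed by at least one coastline segment."""
    segments = list(zip(coastline, coastline[1:]))
    return [b for b in tiles if any(segment_hits_box(p, q, b) for p, q in segments)]


# Toy example: a 4 x 4 scene cut into unit tiles, with a straight
# "coastline" running along the bottom row.
tiles = make_tiles(4, 4, 1)
selected = coastal_tiles(tiles, [(0.5, 0.5), (3.5, 0.5)])
print(len(tiles), "tiles,", len(selected), "on the coast")
```

A real coastline has many thousands of segments, which is exactly why a spatial database with an index is the sensible tool for the actual job.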
One problem – if their algorithm is too aggressive, legitimate pieces of coastline to search get dropped, and we lose precious data.
So what to do?
The simple answer is to add squares *next* to areas selected as coastline. This results in a lot of cruft – image edges that look weird, areas of all land, etc. It’s a bummer. However, this is where the power of the crowd comes in. When 3 people submit an image and don’t mark any kelp on it, it is kicked out of the system. That threshold of 3 may be tinkered with in the future.
So, see an all-land image? Click next. See an all-sea image? Click next. See a cutoff edge? Click next.
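As a rough illustration of the "add the squares next door" step, here is a small Python sketch. It assumes tiles are addressed by (row, column) on a regular grid and that "next to" means all eight surrounding squares; neither detail is confirmed by the Zooniverse team, and the function name is invented.

```python
from typing import Set, Tuple

Cell = Tuple[int, int]  # (row, column) of a tile in the grid


def add_neighbours(selected: Set[Cell], n_rows: int, n_cols: int) -> Set[Cell]:
    """Return the selected tiles plus every tile touching one of them.

    Assumption: the 8 surrounding tiles count as neighbours. Tiles that
    would fall outside the scene are skipped.
    """
    padded = set(selected)
    for r, c in selected:
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    padded.add((rr, cc))
    return padded


# A single coastal tile in the corner of a 3 x 3 grid pulls in its
# three in-bounds neighbours.
print(sorted(add_neighbours({(0, 0)}, 3, 3)))
```

Padding like this is what rescues coastline that the intersection step misses, at the cost of the extra all-land and all-sea squares.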
If two other people also click next, no one will ever see the image again. Then we can focus down on those images where at least 1 in 3 people have seen some glimmer of kelp – even if mistakenly – so that even more folk can look at it and lasso some kelp. This means that bad images will be quickly kicked out so that we can all spend our time with kelp. Because, frankly, who wouldn’t want to just spend all of their time with kelp?
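The kick-out rule is simple enough to sketch in a few lines of Python. This is one reading of the rule as described here, not the actual Zooniverse code: it assumes an image is retired when its first 3 classifications all have zero kelp marks, and the names `BLANK_LIMIT` and `is_retired` are made up.

```python
from typing import List

BLANK_LIMIT = 3  # current threshold from the post; may change in future


def is_retired(kelp_marks: List[int], blank_limit: int = BLANK_LIMIT) -> bool:
    """Decide whether an image should be kicked out of the system.

    kelp_marks holds, in order of submission, how many kelp patches each
    person circled on the image. Assumption: the image is retired only if
    the first `blank_limit` people all circled nothing.
    """
    first = kelp_marks[:blank_limit]
    return len(first) == blank_limit and all(n == 0 for n in first)


print(is_retired([0, 0, 0]))  # three people clicked next
print(is_retired([0, 2, 0]))  # one person saw a glimmer of kelp
print(is_retired([0, 0]))     # still waiting on a third opinion
```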
Are you doing this with Nereocystis kelp forests too?
A lot of the areas in the Northern California images also contain Nereocystis and mixtures of both. We’re *really* curious as to whether we can pick it up well from space (we’re going to be releasing a lot of additional Nereo imagery later in the year). We know that it’s likely difficult, as Nereocystis canopy visibility is dependent on the tides in a way that Macrocystis is not. BUT – we just don’t know! For example, I just classified an image from near Fort Ross in N. Sonoma county. It had some beds, but not as much as I would think given that I know there are some vast Nereo beds there. But, is that because there was none, or because it was hard to see? It’s only once we build up a vast database of these images that we’ll be able to say. Also later in the year, we’ll be bringing in areas that have Ecklonia maxima (S. Africa) and Eualaria (Alaska). These will again be experiments to test the reliability of visualizing these known canopy forming kelps.
Heck – once we get some good data, we’re gonna need to get in the water and validate these classifications! Now *that* is going to be fun!
Hi, I’m writing a brief paper (for a postgraduate course) about this project. I’d love to have some information.
Here I read “2) That image is chunked into a bunch of small squares.” I’d like to know the dimensions of these squares.
I know every image is classified by different users to cross-validate their entries (I hope this is the proper phrase!) I wonder how many entries are needed.
Thank you for your kind answers. And thank you for your amazing job!