Frankly, I'm not very impressed. This is a very well-known problem in image processing: people usually call it background subtraction, and the more general form is image segmentation.
There are many well-established algorithms in this field. A Google search scoped to the major research conferences (ICCV, ICIP, SIGGRAPH, etc.) will give you the latest and greatest of them. You'll also find good image segmentation work if you limit the search to csail.mit.edu.
If you want your uploader to help you, you can also go for one of the supervised image segmentation algorithms. Otherwise you'll need an unsupervised algorithm.
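For a sense of what "unsupervised" means here, a toy sketch (my own, purely illustrative): cluster pixel colors with k-means and treat the clusters as segments. Real unsupervised methods are far more sophisticated, but the interface is the same — image in, label map out.

```python
import numpy as np

def kmeans_segment(img, k=2, iters=10):
    """Toy unsupervised segmentation: k-means clustering of pixel colors.

    img: (H, W, 3) array; returns an (H, W) integer label map.
    """
    pixels = img.reshape(-1, 3).astype(float)
    # Deterministic init: spread the initial centers across the pixel list.
    centers = pixels[np.linspace(0, len(pixels) - 1, k).astype(int)].copy()
    for _ in range(iters):
        # Assign each pixel to its nearest center (Euclidean in color space).
        dists = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its cluster.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = pixels[labels == j].mean(axis=0)
    return labels.reshape(img.shape[:2])
```

A supervised variant would instead take user hints (scribbles, a bounding box) as extra input — which is exactly what asking the uploader for help buys you.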
The really crazy thing is that they seem to be reinventing the wheel just so they can lean on the optimized code a graphics library provides. I think that's flawed thinking to begin with. In my experience with graphics programming, a reasonably optimized direct implementation tends to beat the hell out of a nine-step filter chain, no matter how good your graphics library. Even if they used the same algorithm, a direct implementation could condense steps 2, 3, and 4 into a single convolution [edit: whoops, no you can't; Sobel has a non-convolution step, the gradient-magnitude computation].
But more importantly, they've tied their hands by limiting themselves to a tiny set of operations. Combing the computer vision literature would have been a better use of time than trying to chain filters together.
In fact, some background subtraction algorithms effectively do as an intermediate step what Lyst wanted as an end result.
"Image segmentation" is usually the term for doing this with still images, and OpenCV provides a couple of functions for it. They're not perfect, but they're probably more effective than what this article describes.
Okay, this is image segmentation that attempts to estimate and remove the 'background' segments using some domain-specific assumptions. In common language, this is generally referred to as 'background subtraction'.
I won't debate the common-language use of 'background subtraction', since I haven't discussed it with laypeople. What I can say is that if you were trying to implement what the article implements and went looking for relevant literature, searching for 'background subtraction' would not turn up anything useful.
> The really crazy thing is that they seem to be reinventing the wheel so that they can lean on the optimized code a graphics library provides. I think that's flawed thinking to begin with.
Actually I disagree with you, even though what you say is correct in principle. On any given day, the vast majority of my workload revolves around working with other people’s code. So I may spend 7 hours trying to get a new library to compile or figure out why I’m getting an exception or how to get out of dependency hell and only 1 hour getting “real work done”.
For me, it’s exhausting to find a new library or API that loosely does what I need, only to find that I have to install a new language, new framework, new compiler, or even new package manager to use it. Developers have a tendency to copy other developers (even when the “normal” way of doing things is not ideal), so many libraries have no binary that I can test, and in fact no example of usage or up-to-date documentation.
Then there are subtleties with new libraries such as speed or memory usage. So perhaps a library that does exactly what you want runs at 1 frame every N seconds while the highly optimized function in a mainstream graphics library runs at many hundreds or even thousands of frames per second by utilizing concurrency or the graphics card.
So in fact when it’s all said and done, I tend to think more in transformations. I ask myself if I can express a solution slightly differently if it allows me to use an existing tool, then encapsulate it in a black box that has the same inputs and outputs as my ideal solution. Then my frustration is that other developers don’t seem to think this way. “Make one tool that does one thing well” has become the mantra and driven us into this fragmented ecosystem.
This is just an aside, but: The only truly general purpose language that has a syntax that doesn’t make me want to club myself over the head is probably MATLAB, but unfortunately their licenses are too expensive for me. So I have high hopes for Octave, and after that, maybe NumPy. So maybe one point we could agree on is that we shouldn’t need a graphics “library” in the first place. If we had a good mainstream concurrent language, then many of these algorithms become one paragraph code snippets and run with speed comparable to C or Java.
Edit: I wanted to give a concrete example. Low-level languages like C are usually 10 or 100 times more verbose than they need to be, because they leverage the wrong metaphors. Notice how compactly concepts like image compression can be expressed with the right ones:
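As one illustration (my own sketch, not necessarily the example the author had in mind): a rank-k approximation of a grayscale image via truncated SVD is a couple of lines in an array language.

```python
import numpy as np

def compress(img, k):
    """Rank-k approximation of a grayscale image via truncated SVD."""
    U, s, Vt = np.linalg.svd(img, full_matrices=False)
    # Keep the k largest singular values/vectors and multiply back out.
    return (U[:, :k] * s[:k]) @ Vt[:k]
```

Storage drops from H*W values to roughly k*(H + W + 1); the equivalent hand-rolled C would be dozens of lines of loops and memory management.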
Yes, reinventing the wheel is okay when you have a simple problem. But the article describes a complicated problem, and the presented solution delivers pretty poor results. In this specific case, I am pretty sure that looking for an existing solution would be the wiser choice. I saw two separate solutions to this problem at a recent graphics conference alone.
(Tip: if you have trouble compiling something you found on Github, contact the author and offer $200 to walk you through the installation. Might save a lot of frustration)
I agree that using optimized libraries is the better choice when those libraries do something close to what you are trying to do, but this algorithm is Rube Goldberg-esque.
Also, OpenCV includes functions for image segmentation. If they couldn't use that, it would have been nice for the article to at least touch on why.
I think that's being a bit unkind; it's simply clear that these people are not experienced with computer vision or image processing in general. They approached the problem from their own domain and found a solution that worked for them. If I had gotten to the point of wanting a domain-specific image segmentation heuristic, I would certainly build it up from a series of filters and image morphology steps. And if the performance (both accuracy and speed) was satisfactory, I don't see why you would invest in further optimization at that point.

Also, from experience I know that implementing algorithms from papers is a slow and often painful process, as the original authors generally don't publish reference code, and when they do it's probably in Matlab. Without someone adept at image processing, their odds of success would likely be low.
> Also, OpenCV includes functions for image segmentation. If they couldn't use that, it would have been nice for the article to at least touch on why.
I agree that a quick "Here's what we tried that's readily available" would have been good for others to learn from, because for many the general solutions would be sufficient.
Everything I can quickly think of in OpenCV would tend to include too much background without user interaction, unless similar transformations were performed anyway. So I assume that, if they considered it, that's why they passed on OpenCV.
You should give Node.js and npm a shot. It's one of the best examples of practical modularization. It's very easy and quick to test a new package, and while the documentation is often rough, the main readme on GitHub almost always has an example to show what you want to do. (Obviously Node.js is only suited for certain applications, and it's not great with graphics in particular).
I work in the same market as Lyst, and can say for a fact that the image variations we get from clients, even from renowned fashion houses, are huge. By variations I mean compression, quality, subject clarity, and the list just goes on...
A good algorithm would surely solve part of the problem, but given the imagery's non-uniform patterns, a great deal of resources (engineering skills + computing power) would likely be required. Given Lyst's recent VC rounds, maybe they can afford that, but I surely cannot.
Funny - I'm in the same domain and I have found that product images are generally high quality and somewhat uniform. You can usually find the product placed front and centre and with few distractions (Most images have either white or light grey background).
Renowned fashion houses overall do a great job, except when they add gradients or shadows to mask stuff. But the bulk of the problem, in my case, comes from large department stores that blend furniture or accessories into fashion photoshoots, making the images complex and distracting from the main subject.
The reason given is that the algorithm (Sobel) looks for light-to-dark transitions. There are two such transitions in the positive image: one from the background to the edge, and another from the edge into the object. Negating the object leaves only one light-to-dark gradient on the edge of the object.
Yeah, but that reason is nonsense. The Sobel operator approximates the image gradient via convolution and then takes its magnitude. It doesn't "look for" anything.
And since it's the magnitude of the gradient that Sobel computes, Sobel(Invert(img)) is mathematically equivalent to Sobel(img). The invert step is essentially an expensive no-op.
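That's easy to verify numerically (a sketch using SciPy's Sobel filter; any implementation of the operator would do):

```python
import numpy as np
from scipy import ndimage

def sobel_magnitude(img):
    """Edge map: magnitude of the Sobel gradient approximation."""
    gx = ndimage.sobel(img, axis=1)  # horizontal derivative
    gy = ndimage.sobel(img, axis=0)  # vertical derivative
    return np.hypot(gx, gy)

# Inverting the image flips the sign of gx and gy, but the magnitude
# is unchanged -- so the two edge maps are identical.
img = np.random.default_rng(0).random((32, 32))
assert np.allclose(sobel_magnitude(img), sobel_magnitude(1.0 - img))
```

The derivative kernels sum to zero, so the constant offset from inversion contributes nothing; only the sign flips, and the magnitude discards it.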
I don't think background subtraction means what you think it does. Background subtraction refers to subtracting neighboring frames of a video sequence in order to find moving objects. So yes, segmentation is a more general and difficult problem than background subtraction, but the latter is in no way relevant to the task described in the article.
Image segmentation is exactly what they are trying to do. You can quibble about the exact definition of "background subtraction", but it doesn't change the fact that they are reinventing the wheel over a solved problem.
The corpus of product images at Lyst seems to be quite diverse. How many of those algorithms are good at very generalised segmentation, i.e. equally good at segmenting boat shoes on decking as, say, a tank in a field? What are your favourite papers on the subject?
One of the coolest approaches I've seen does some cool inference on fully connected Conditional Random Fields via high-dimensional filtering. Amazing results.
Given that the authors also want to detect what is in the image, this might be helpful for them:
http://people.csail.mit.edu/mrub/papers/ObjectDiscovery-cvpr...
This guy is also doing some great work in that field:
http://www.engr.uconn.edu/~cmli/