Advanced Graphics 2009

For Monday, February 23rd, the topic will be "abstraction of photographs" (or more generally, creating abstracted images). This topic is a bit less technically dense than the prior ones.

The required reading is:

  • Doug DeCarlo, Anthony Santella. Stylization and Abstraction of Photographs. In SIGGRAPH 2002, pp. 769-776.

The web page for the project is: http://www.cs.rutgers.edu/~decarlo/abstract.html, where you can find this paper and several others.

In reading and thinking about this paper, I want you to consider what this work suggests if you don't have access to an eye tracker.

Instead of scouring this paper for the details, I'd rather that you look over other papers that do similar things (in terms of their artistic effects):

  • Real-Time Video Abstraction
  • Shape-Simplifying Image Abstraction

The latter one is described in a way that makes the math sound hard, but it's actually pretty simple to implement once you see what is going on.

Please post a comment or question before class (before noon on the day of class), and then at least one comment to the discussion afterwards.

February 16, 2009 by gleicher
An initial posting

To seed the conversation.

February 19, 2009 by tgrim
Without an eye tracker...

...DeCarlo's paper seems to convey that we won't be able to derive which parts of an image are "meaningful," since that currently (and maybe forever) requires a human to define meaning.

At the same time, the Real-Time Video Abstraction paper showed that the problem could be attacked mathematically, and that human trials could "prove" how effective the stylized image really was.

It appears that both methods agree on the necessity for a human as a verifiable source of "meaning" in the abstracted images.

February 21, 2009 by zietlow
Initial reading

Other than my usual "how does this crazy math work" question, I was also wondering if an eye-tracker was really worth using, because I thought the Real-Time Video Abstraction results looked better and were rendered much faster, whereas in this paper it seems like the eye-tracker only aids in washing out anything in the photograph that isn't blatantly stared at (O_O).

For instance, in the example photo of a woman with a guitar, she is presented well, but the interesting background is completely washed away, and the result looks kinda ugly. Also, strangely, in their car example, the tree in the top-right corner only disappears when they use the eye-tracker, and without the tree, I find the picture less interesting.

I thought Real-Time Video Abstraction was a better paper because 1) I subjectively liked their results better 2) they were in real-time, unlike the first paper's 3) they didn't need to use an eye-tracker 4) their examples seemed more informative (the diagram of their algorithm) and honest (failure case) 5) they simply had more examples 6) the video was very entertaining, and watching their algorithm work in real-time was impressive. The Photographs paper is a few years older, but still.

February 21, 2009 by yuzhen
Interesting topic

Image and video abstraction / simplification / stylization is an interesting topic.

The results of real-time video abstraction impress me most, since temporal coherence is always a difficult and important problem in video processing. The pseudo-quantization used in the paper spreads feature changes over a large area, which makes them less noticeable. I'm glad to see that the results are suited to small displays.
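The pseudo-quantization idea can be sketched as a soft staircase: hard quantization snaps luminance to bin centers with a full-bin jump at each boundary, so a pixel drifting slowly across a boundary "pops", while a smooth ramp across the boundary makes the change gradual. This is a minimal sketch assuming a tanh-shaped step centered on the nearest bin boundary, which is how I understand the Real-Time Video Abstraction paper's version; `soft_quantize`, `n_bins`, `phi`, and the 0-100 luminance range are illustrative choices, and the paper's adaptation of the step sharpness to local image content is left out.

```python
import math

def soft_quantize(lum, n_bins=8, phi=0.5, lum_max=100.0):
    """Soft (pseudo-)quantization of one luminance value in [0, lum_max].

    Hard quantization maps every value in a bin to the bin center, with a
    jump of one bin width at each boundary. Here that jump is replaced by a
    tanh ramp centered on the nearest bin boundary; a larger `phi` gives a
    steeper ramp (closer to hard quantization).
    """
    dq = lum_max / n_bins                  # bin width
    q_nearest = round(lum / dq) * dq       # nearest bin *boundary*
    # Far from the boundary tanh saturates at +/-1, so the output settles on
    # a bin center (q_nearest +/- dq/2); near the boundary it ramps smoothly.
    return q_nearest + 0.5 * dq * math.tanh(phi * (lum - q_nearest))


# Usage: a slow luminance ramp becomes a staircase with smooth risers.
ramp = [soft_quantize(x * 0.5) for x in range(201)]   # inputs 0.0 .. 100.0
```

With `phi` large the output approaches ordinary hard quantization; with `phi` small the steps wash out entirely, so it acts as a single knob between "posterized" and "unchanged".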

The idea of hierarchical image representation in "stylization and abstraction of photographs" is useful. In this paper, I don't understand how the frontier regions got smoothed.
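To make the hierarchical representation concrete, here is a minimal sketch of pruning a region tree against eye-tracking fixations. It is not the paper's perceptual model; the linear `min_visible_size` rule, its constants, and the `Region`/`prune` names are hypothetical stand-ins for "small regions are only resolvable near where someone looked". A region is split into its children only when every child would be visible from the nearest fixation; otherwise it is kept (and later drawn) as one flat region.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Region:
    cx: float                 # centroid x, in pixels
    cy: float                 # centroid y, in pixels
    size: float               # rough diameter, in pixels
    children: list = field(default_factory=list)   # finer sub-regions

def min_visible_size(dist, base=2.0, slope=0.05):
    """Hypothetical acuity rule: the smallest resolvable size grows
    linearly with distance (pixels) from the nearest fixation."""
    return base + slope * dist

def nearest_fixation_dist(region, fixations):
    return min(math.hypot(region.cx - fx, region.cy - fy) for fx, fy in fixations)

def prune(region, fixations):
    """Return the leaves of the pruned tree: a region is refined into its
    children only if every child would be visible from the nearest fixation."""
    if not region.children:
        return [region]
    visible = all(
        c.size >= min_visible_size(nearest_fixation_dist(c, fixations))
        for c in region.children
    )
    if not visible:
        return [region]          # too fine to see here: keep as one flat region
    leaves = []
    for c in region.children:
        leaves.extend(prune(c, fixations))
    return leaves

# Usage: one fixation near region a, none near region b.
a = Region(10, 10, 40, [Region(5, 5, 5), Region(15, 15, 5)])
b = Region(90, 90, 40, [Region(85, 85, 5), Region(95, 95, 5)])
root = Region(50, 50, 100, [a, b])
leaves = prune(root, fixations=[(10, 10)])   # a is split, b stays whole
```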

February 21, 2009 by elisabet

I personally liked the results of the eye-tracking paper best, even though it did abstract away most of the backgrounds. The others looked good, too, but that was my favorite.
It's interesting how many different ways people attempt to determine which parts of an image are important: by tracking eye movements, by contrast, or by curvature. I suppose someone without an eye tracker could use the math to determine what's meaningful in an image, but using human input seems a more intuitive way to determine what's important to a human viewer. An explicit importance map might achieve similar results, but seems like a lot more work.

February 22, 2009 by rosin
Lacking an eye tracker

Real-Time Video Abstraction is the paper I mentioned in a previous reading blog that shows the advantage of abstract images in recognition tasks.

A few things come to mind that might eliminate the need for an eye tracker. The first is blur detection; it seems obvious that if any areas of the image are out of focus, they will be less important and more detail can be sacrificed there. Another possible approach is tracking motion across frames; areas that move are probably the main subject of a video sequence. Of course, that's potentially misleading, since if the camera is tracking a person or object, then it's likely that the object stays in about the same place in each frame while everything around it moves. It's disappointing to see that the video abstraction paper doesn't attempt to use this information, though. Does anyone know whether there's been much work trying to identify the main subject(s) of a video sequence, as distinct from the background or irrelevant "extras"?
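The blur-detection idea could start from a standard focus measure: the variance of the Laplacian, which is high in sharp, textured regions and low in defocused ones. Below is a minimal per-block sketch on a grayscale image stored as a list of rows; the block size and the `focus_map` name are illustrative, and a real system would still need to tell "blurry" apart from "simply flat", since both give a low Laplacian variance.

```python
def laplacian(img):
    """4-neighbour discrete Laplacian of a grayscale image (list of rows);
    border pixels are left at 0 for simplicity."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = (img[y - 1][x] + img[y + 1][x] + img[y][x - 1]
                         + img[y][x + 1] - 4 * img[y][x])
    return out

def focus_map(img, block=4):
    """Variance of the Laplacian in each block x block tile: higher = sharper."""
    lap = laplacian(img)
    h, w = len(img), len(img[0])
    result = []
    for by in range(0, h, block):
        row = []
        for bx in range(0, w, block):
            vals = [lap[y][x] for y in range(by, min(by + block, h))
                              for x in range(bx, min(bx + block, w))]
            mean = sum(vals) / len(vals)
            row.append(sum((v - mean) ** 2 for v in vals) / len(vals))
        result.append(row)
    return result

# Usage: left half is a fine checkerboard (sharp detail), right half is flat.
img = [[(255 if (x + y) % 2 else 0) if x < 4 else 128 for x in range(8)]
       for y in range(8)]
fm = focus_map(img, block=4)    # 2x2 map; the left column scores far higher
```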

February 22, 2009 by bmsmith

I thought the results in the DeCarlo paper were more visually compelling than the Real-Time Video Abstraction paper's results, to be honest. However, I feel like the eye tracking idea is a little gimmicky. If the important part of an image is what people naturally notice and focus on, why do any more work to highlight it? On the other hand, if you want to draw attention to something people don't normally notice, then using eye tracking data to guide the drawing isn't a good idea. In Figure 7 of the DeCarlo paper I actually much preferred the finer scale result using no meaningful abstraction. In the result that uses fixation data, they de-emphasized the background, I guess so I don't have to, but I do this naturally in my head anyway.

It would have been interesting in the DeCarlo paper if they had shown the eye tracking data so we could get a better sense of what people's eyes were doing when they viewed the images. Also, it would be interesting to see a comparison between the results generated using eye tracking data from different people.
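One common way to show eye-tracking data, and to compare viewers, is a fixation density map: place a Gaussian at each fixation and sum them. This is a minimal sketch; the `sigma` value (roughly the size of the foveal region, in pixels), the function names, and the use of Pearson correlation as a similarity score between two viewers are assumptions for illustration, not anything from the paper.

```python
import math

def fixation_map(fixations, width, height, sigma=10.0):
    """Sum of isotropic Gaussians, one per (x, y) fixation, normalized to max 1."""
    m = [[0.0] * width for _ in range(height)]
    for fx, fy in fixations:
        for y in range(height):
            for x in range(width):
                d2 = (x - fx) ** 2 + (y - fy) ** 2
                m[y][x] += math.exp(-d2 / (2 * sigma * sigma))
    peak = max(max(row) for row in m)
    return [[v / peak for v in row] for row in m] if peak > 0 else m

def correlation(a, b):
    """Pearson correlation between two equally sized maps (1 = same pattern)."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

# Usage: three hypothetical viewers of a 64x48 image.
viewer_a = fixation_map([(20, 20), (24, 22)], 64, 48)
viewer_b = fixation_map([(22, 21)], 64, 48)
viewer_c = fixation_map([(55, 40)], 64, 48)
```

Viewers a and b looked at roughly the same spot, so their maps correlate strongly; viewer c looked elsewhere and scores much lower against a.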

It probably just comes down to taste, but I might appreciate the Real-Time... results more if they had made the quantized results crisper, with less blurring between the colors.

Upon skimming the Shape-preserving paper and getting the gist of it, I kind of want to implement it *right now*. Thankfully, Jake already did (awesome!) so I might just go play with the executable he posted on his project 1 blog. Hooray!

February 22, 2009 by mccardel
The Details Aren't Important!

I remember reading the DeCarlo paper a while ago for some reason, and after glancing at it then, I figured that there would have to be a way to determine what the "important" parts of an image were. Happily, the Real-Time Video Abstraction paper kinda shows that this is correct. It seems that there's some work to be done, since their method isn't nearly as good as the one outlined by DeCarlo, but it has the very nice property of not needing humans. One further step towards making humans obsolete, I suppose.

Regarding the Shape-preserving paper, it seems to me that they have gone "too far" with what they're doing, to the point where I don't "agree" with all their pictures. Take the lion picture, for example: after 60 iterations, it seems to me to have lost too much detail, and it's harder to differentiate depth. And at 20 iterations, it seems to have kept too much detail.

February 22, 2009 by finn

I agree with Brandon on this one. The eye-tracking data seemed like more of a novelty than anything, and I definitely preferred the results without data (well, I liked the title image). I also view the lack of real-time results as a pretty big detriment. I suppose if they wanted to use this for video, they could just hire a bunch of people to sit and look at every frame for days on end.
With regard to aesthetics, I thought the "Shape-Simplifying Image Abstraction" paper produced the most pleasing results.

February 23, 2009 by zoerb

It looks like with the eye tracker data, the image is much more focused on the main object of an image, as can be seen in the Real-Time Video Abstraction paper. Although the background is very degraded visually, it seems that that was the point of the DeCarlo algorithm. I think the images they produced were pretty good.

February 23, 2009 by yangk
Initial Read

The results seemed appealing to me. I found it interesting how they combined edge detection and image segmentation with the use of an eye tracker to determine what should be the main focus of the output abstract image. My final project for computer vision last semester was on image matting (separating an image into a foreground and background picture), so by using similar techniques, it could be possible to separate the object the user is looking at from its background. I'm not sure what this could be useful for, though.
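For reference, the compositing model that matting inverts treats each observed pixel color $I_p$ as a blend of a foreground color $F_p$ and a background color $B_p$ with opacity $\alpha_p$ (this is the standard formulation, not something taken from the papers above):

```latex
I_p = \alpha_p F_p + (1 - \alpha_p) B_p, \qquad \alpha_p \in [0, 1]
```

Recovering $\alpha_p$, $F_p$, and $B_p$ from $I_p$ alone is underdetermined (for RGB, three equations per pixel against seven unknowns), which is why matting methods rely on extra user input such as scribbles or a trimap.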

February 23, 2009 by aderhold

It seemed that the main benefit of the eye tracking was that it allowed the unimportant details to be essentially left out of the rendered drawing. In the picture of the car, the drawing that used the eye tracking data was less detailed over the entire picture than the drawing with constant (high) eccentricity, and especially so in the backgrounds. Comparing the two, I don't feel that the lack of background detail really detracts from the quality of the eye-tracking-enhanced drawing. On the other hand, I don't find that the additional detail in the background buildings really makes it any less obvious that the car is the main subject of the non-eye-tracking enhanced drawing.

When I was at Google this past summer, I was talking with one of the user interface designers about Google's use of eye tracking in UI testing. He said that at least in the area of computer interfaces, the eye follows only a couple of patterns when trying to understand a new interface. Any interface that puts critical elements outside of these set patterns tends to be less intuitive to new users in tests. However, these patterns are already known, so tracking eye movements in the tests does nothing but confirm what we already know about where the eye naturally goes to try to learn a new interface. I don't know if similar studies have been done for photographs in general, but if there are certain "rules" about where the eye is naturally drawn when looking at a photo, maybe it would be more productive to try to come up with some heuristics to predict that rather than relying on an actual human to look at each photograph to identify the important bits.

February 23, 2009 by alex
...

I can't see eye-tracking becoming a normal, regularly used method for stylizing photographs. It's an interesting experiment, but as many people have pointed out, the images that don't use it are at least AS visually compelling as the images using the tracker, if not more so. Throw in the fact that the other method (and others, I'm sure) can be done in real-time, and the practicality of the eye tracker just goes out the window. I don't think the benefit justifies the cost.

February 23, 2009 by mikola

It seems like a lot of people here are very critical of DeCarlo & Santella's work. However, I have much the opposite feeling. I really like DeCarlo's paper on the basis that it provides an empirical, scientific basis for what it is doing. Eye tracker measurements are a plausible way to figure out which features are visually most important. One question I have is, outside of eye-tracking, what other types of measurements can we take about perception?

In the other papers, I don't really get why what they are doing is particularly important. If you want to increase contrast while smoothing out contours in an image, the obvious approach is to just run it through a couple of bilateral / edge filters. The last paper is probably the most arbitrary of the bunch. I agree that their results look relatively good, but I wonder how much of this is just an artifact of careful data selection + tweaking. What is the significance of applying the mean curvature flow / shock propagation? Would other types of geometric flow work equally well? (e.g., Ricci flow, Gaussian curvature flow, bilateral filter, or even Laplacian smoothing?)
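Since the bilateral filter keeps coming up, a minimal 1D sketch may help: each sample is replaced by a weighted average of its neighbours, where the weight falls off both with spatial distance (`sigma_s`) and with difference in value (`sigma_r`), so smoothing does not cross strong edges. The 1D signal and parameter values are illustrative; for images the same two weights are applied over a 2D window.

```python
import math

def bilateral_1d(signal, sigma_s=2.0, sigma_r=10.0, radius=None):
    """Edge-preserving smoothing of a 1D list of numbers."""
    if radius is None:
        radius = int(3 * sigma_s)          # spatial Gaussian is negligible beyond ~3 sigma
    n = len(signal)
    out = []
    for i in range(n):
        num = den = 0.0
        for j in range(max(0, i - radius), min(n, i + radius + 1)):
            w_s = math.exp(-((i - j) ** 2) / (2 * sigma_s ** 2))                  # spatial weight
            w_r = math.exp(-((signal[i] - signal[j]) ** 2) / (2 * sigma_r ** 2))  # range weight
            num += w_s * w_r * signal[j]
            den += w_s * w_r
        out.append(num / den)
    return out

# Usage: a noisy step edge from ~0 to ~100.
noisy_step = [0, 3, -2, 1, -1, 2, 100, 97, 102, 99, 101, 98]
smoothed = bilateral_1d(noisy_step)
```

The noise on each side of the step is averaged away, but samples across the step get a range weight of essentially zero, so the edge stays sharp. A plain Gaussian (drop `w_r`) would smear it.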

February 23, 2009 by bfield
other ways of finding interesting regions

I was wondering if there might be a good way to use video to find interesting regions of a scene. Regions of an image that change between frames might indicate areas of interest.
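The simplest version of this idea is frame differencing: flag pixels whose intensity changes by more than a threshold between consecutive frames. A minimal sketch on grayscale frames stored as lists of rows; the threshold value and the function names are assumptions, and, as rosin noted about tracking shots, camera motion makes large parts of the frame "change", so this only works well with a static camera.

```python
def change_mask(prev, curr, threshold=20):
    """True where |curr - prev| exceeds threshold (grayscale frames, same size)."""
    return [[abs(c - p) > threshold for p, c in zip(prow, crow)]
            for prow, crow in zip(prev, curr)]

def changed_fraction(mask):
    """Fraction of pixels flagged as changed."""
    total = sum(len(row) for row in mask)
    return sum(v for row in mask for v in row) / total

def frame_with_blob(x0, w=6, h=4, bg=50, fg=200):
    """Test frame: flat background with a bright 2x2 blob at columns x0, x0+1."""
    f = [[bg] * w for _ in range(h)]
    for y in (1, 2):
        for x in (x0, x0 + 1):
            f[y][x] = fg
    return f

# Usage: the blob moves one pixel to the right between frames.
mask = change_mask(frame_with_blob(1), frame_with_blob(2))
```

Only the blob's trailing and leading columns are flagged; the overlapping column and the static background are not.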

February 23, 2009 by amoore

I think the eye tracking was a good idea for finding the details that an eye focuses on, but a bit impractical for common use. I much prefer the shape preserving image abstraction for this task. The tiger picture at the end is especially cool. Unfortunately for this one, you still need the user to find small round details like the eyes and mark them. I also skimmed the video abstraction paper, and kinda want to read that next. Is anyone doing any kind of video post-processing this time? Because something like that would make a cool project.

February 23, 2009 by yoh
alternative to eye tracking

i could be wrong, but i think my eyes move in predictable ways when i take in an image. i think my eyes focus closely on the "foreground" subjects, and briefly flicker over the background and anything else that "stands out" (colorful, textured, etc). it might be possible to train some sort of eye-movement simulator using machine learning techniques. if this worked at all, it would be more practical than hooking up human subjects to images to generate the abstraction.

February 23, 2009 by cory
Interesting but lacks meaningful computer vision

The DeCarlo paper was very interesting and created beautiful images, but it still requires a human element. I would be interested to see a fusion of the video abstraction and DeCarlo papers, where they examine the fixation points to try to determine what about them the human finds interesting, and from this, create a humanless system that tries to approximate the results.

I have to agree with Finn and the others; it seems more gimmicky... though perhaps it's less of a gimmick and more of a standard for comparing other algorithms?

February 23, 2009 by bmsmith
Post lecture: scale space is the cat's pajamas

I especially appreciated the portion of lecture about scale space. It's surprising how often it keeps coming up in computer vision and graphics papers that I read. In fact, the deblurring method that I'm working on implementing for project 1 uses this idea. Starting with a very small (down-sampled) version of the image, the algorithm deconvolves (deblurs) it and then uses the result to guide the deconvolution of the next higher resolution image. It also uses bilateral filtering to reduce ringing artifacts -- another thing Mike mentioned in lecture that keeps coming up again and again.
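The coarse-to-fine pattern described here can be sketched generically: build an image pyramid by repeatedly halving the resolution, solve at the coarsest level, then upsample each result to initialize the next finer level. This is a minimal sketch; 2x2 box averaging stands in for the usual Gaussian blur-and-subsample, and `solve_level` is a hypothetical per-level solver (for deblurring it would be the deconvolution step). The identity solver in the usage line only exercises the plumbing.

```python
def downsample(img):
    """Halve resolution by averaging 2x2 blocks (an odd trailing row/col is dropped)."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4
             for x in range(w)] for y in range(h)]

def upsample(img, h, w):
    """Nearest-neighbour upsampling to an h x w grid."""
    sh, sw = len(img), len(img[0])
    return [[img[y * sh // h][x * sw // w] for x in range(w)] for y in range(h)]

def build_pyramid(img, min_size=4):
    """List of levels; levels[0] is full resolution, the last is the coarsest."""
    levels = [img]
    while len(levels[-1]) // 2 >= min_size and len(levels[-1][0]) // 2 >= min_size:
        levels.append(downsample(levels[-1]))
    return levels

def coarse_to_fine(img, solve_level, min_size=4):
    """Run solve_level(level, init) from coarsest to finest, feeding each
    upsampled result in as the initial guess for the next level."""
    estimate = None
    for level in reversed(build_pyramid(img, min_size)):
        h, w = len(level), len(level[0])
        init = level if estimate is None else upsample(estimate, h, w)
        estimate = solve_level(level, init)
    return estimate

# Usage: a 16x16 gradient image with an identity "solver".
img = [[float(x + y) for x in range(16)] for y in range(16)]
result = coarse_to_fine(img, solve_level=lambda level, init: init)
```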

That's a little off topic maybe, but that's what I left class thinking about...

February 24, 2009 by mccardel
Post lecture thingy

In response to blayne, I'd think that areas of an image that don't move could also be important. Mike was talking about a scene from the Hitchcock movie where the only thing that matters is the keys on the table. Maybe something where you check "how much" something moves, in relation to everything else, could be useful: the more something moves, strictly compared to everything else that's moving, the less important it is. Granted, this has some fairly obvious counter-cases: in a video of a car race, the slow car would come out as more important, whereas you'd want the fast cars to be more important. I suppose another problem would be the backgrounds of a scene, which might not move much.

I found some cool videos when looking up NPR papers a few weeks ago...Let me see if I can find them.

This one has some cool videos. They were linked from the Lighting Projects Blog, and I'm just reposting it here for all those people who found it interesting.

February 24, 2009 by rosin
Eye tracking for retargeting

Professor Gleicher mentioned retargeting as a possible application for image abstraction. If his comment on DeCarlo's eye-tracking data (that it measures salience, not importance) is correct, there may be an easy fix. If I take a large image and scale it down for a much smaller screen, I'd be throwing away a lot of irrelevant details along with detail I'd want to keep. If I then get eye-tracking info for someone shown this new, smaller image, I think I'd expect them to concentrate on the areas where they expect important information to be. The difference between this and the original case is that in the original, whatever important information is present is probably easily visible. In the smaller, unclear image, the person might have to try harder to identify objects where important detail was lost, and probably won't pay any more attention to areas where things are still identifiable enough at the smaller scale.

The main idea would be, rather than get important info that drives an abstraction, produce an abstraction (scaled version, in this case) without regard to where the important information may have been, then use eye-tracking to model where information seems to have been lost. Another abstraction can then be produced which reveals more detail in those areas.

February 24, 2009 by finn

I like Jake's idea. The unimportant details in the original would be de-emphasized (read blurry), but not completely discarded. Perhaps that first guitar player example in the paper wouldn't look as weird; the whole right-hand side might look less like just a gray blob, but more like a gray-ish blob with some variation.
I don't think I have anything intelligent to contribute, so I'll go with the "What makes this a good paper?" angle. The paper presents a new (I assume it was new at the time) and interesting idea for determining the importance of particular details, though I agree that it really gets at salience instead. It does, however, produce decent results so that in itself probably makes it worth publishing.

February 24, 2009 by elisabet

Ah, most people have said what I was thinking, so I'll just add a couple little things that are slightly less on topic. First one is that I'm glad we talked about salience and the difference between that and importance because now a bunch of titles of the NPR stuff I've been reading for the project make more sense.

The other thing I was wondering about is what the effect of different groups of people interacting with the eye tracker would be. I started thinking about this when Cory and Jeff pointed out after class that there are like 5 fixation points on the girl with the guitar's legs. So it stands to reason that if you got a group of guys vs a group of girls vs a group of kids to look at a picture, you'd get different results with their abstraction algorithm. Maybe more on the psych side of things, but I thought it was interesting.

February 24, 2009 by tgrim
The salience issue is definitely an interesting one

It leads me to wonder if we can get to the point where a computer can generate its own importance and salience based upon object recognition software and artificial intelligence.

February 24, 2009 by bfield
random thoughts after lecture

After the discussion about saliency vs. importance, I had to wonder: wouldn't it just be about as easy to skip the eye-tracker and just have somebody mark important regions in the image?

I agree with Will's comment: it's probably not absolute motion in a scene that's interesting so much as relative motion.
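One way to make "relative motion" concrete: estimate the dominant (camera) motion as the median of per-block motion vectors, and score each block by how far its own vector deviates from it. This is a minimal sketch assuming block motion vectors are already available from some motion estimator; using the median as the camera-motion estimate is an assumption that only holds when the background covers most of the frame. In a tracking shot, the tracked subject then stands out even though it barely moves on screen, which addresses rosin's earlier concern.

```python
import math
from statistics import median

def relative_motion(vectors):
    """vectors: dict mapping block id -> (dx, dy) in pixels/frame.
    Returns block id -> magnitude of motion relative to the median motion."""
    mx = median(dx for dx, _ in vectors.values())
    my = median(dy for _, dy in vectors.values())
    return {k: math.hypot(dx - mx, dy - my) for k, (dx, dy) in vectors.items()}

# Usage: the camera pans to follow a runner, so the background slides left
# while the runner stays roughly still on screen.
vectors = {"sky": (-5, 0), "trees": (-5, 1), "road": (-6, 0),
           "fence": (-5, 0), "runner": (0, 0)}
scores = relative_motion(vectors)
```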

Extending Jake's idea, it might be interesting to see what happens when you combine the saliency map you get from the eye-tracking data with the seam-carving approach. You would basically be using the eye data to avoid removing parts of the image that were viewed the most.
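A combination along these lines might look like the following: compute an energy map as gradient magnitude plus a weighted eye-tracking saliency term, find the minimum-energy vertical seam by dynamic programming, and remove it. This is a minimal sketch; the `saliency_weight` parameter and simply adding the two terms are assumptions, and the saliency map would come from something like a per-pixel fixation density scaled to [0, 1].

```python
def energy(img, saliency, saliency_weight=100.0):
    """Gradient energy (|dx| + |dy|, clamped at borders) plus weighted saliency."""
    h, w = len(img), len(img[0])
    e = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            dy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            e[y][x] = abs(dx) + abs(dy) + saliency_weight * saliency[y][x]
    return e

def find_vertical_seam(e):
    """Dynamic programming: one column per row, minimizing total energy,
    with columns in adjacent rows differing by at most 1."""
    h, w = len(e), len(e[0])
    cost = [row[:] for row in e]
    back = [[0] * w for _ in range(h)]
    for y in range(1, h):
        for x in range(w):
            best = min(range(max(x - 1, 0), min(x + 2, w)), key=lambda k: cost[y - 1][k])
            back[y][x] = best
            cost[y][x] += cost[y - 1][best]
    seam = [min(range(w), key=lambda k: cost[h - 1][k])]
    for y in range(h - 1, 0, -1):
        seam.append(back[y][seam[-1]])
    return seam[::-1]            # seam[y] = column to remove from row y

def remove_seam(img, seam):
    return [row[:x] + row[x + 1:] for row, x in zip(img, seam)]

# Usage: a flat image where viewers fixated on the left three columns.
img = [[10] * 5 for _ in range(4)]
saliency = [[1.0, 1.0, 1.0, 0.0, 0.0] for _ in range(4)]
seam = find_vertical_seam(energy(img, saliency))
narrower = remove_seam(img, seam)
```

Because the image is flat, the gradient term is zero everywhere and the seam is steered entirely by the saliency term, landing in the unviewed columns.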

February 24, 2009 by zietlow
Post-lecture

On salience vs. importance, I feel like the intent of the image is too much of a factor for an eye-tracker to single-handedly determine what's important. I could see DeCarlo's algorithm being useful for more practical images, but the notion of art skews the idea of importance.

For instance, if any part of a Rembrandt painting was deemed unimportant and washed out based on DeCarlo's abstraction, I feel it would be a poor abstraction because even if, say, the center of the painting is the most salient, the background details are certainly important too (not just from an artist's perspective, but also perhaps from a historian's). Betsy makes a good point about the eye-tracker too...I'd like to see their results for the cover of Cosmopolitan or something. It would be hilarious if someone's face was completely abstracted away.

February 25, 2009 by zoerb
A recurring thought

I keep wondering what methods like these would be used for. There seems to be an emphasis on conveying only important information in an artful way, but going along with what David said, a piece of art with the background washed out would essentially be a waste. Why not just make it half the size and throw out the background? I might see the point of de-emphasizing the unimportant parts of a real picture (maybe a person glances at it and sees the foreground object without being distracted by the background), but why the NPR?

Page last modified on February 25, 2009, at 12:11 AM