[4eyes] EGSR post-mortem

Sat Jul 13 15:30:57 PDT 2019

Hi everyone,

We just had the Eurographics Symposium on Rendering (EGSR) this past week,
so here are some of the things that I learned while I was there that
might be of interest to the rendering folks at UCSB.  Lingqi was also there
and he can probably add some other things that he thought were interesting.

(Sorry to spam everyone, but we don’t have a list dedicated to everyone who
does rendering projects across our labs.  Maybe we should either create a
mailing list or start a Slack channel focused on rendering. Maybe one of
the students can help with that process.)

*SAMPLING*
There were several good papers on sampling that we should probably read and
go through. Specifically, the paper by Heitz and Belcour got the “best
paper” award, and the paper by Jarosz et al. also got some attention.
Wojciech Jarosz posted his talk here so you can watch it:
https://cs.dartmouth.edu/~wjarosz/publications/jarosz19orthogonal-slides.mp4

Let's read these and some of the other papers of interest in the rendering
discussion group.

*PATH GUIDING*
It seems path guiding is a very hot area of research and many people are
pursuing various aspects of it.  Some people feel like this will be the key
for the next big rendering algorithm that could potentially transform the
field. Many people that I talked to are looking into ideas that are similar
to the ones that we have either done, or are currently trying to do.

@Steve: I think that our last project with Pixar was precisely starting to
drill at the entrance of this potential goldmine. It is most unfortunate
that we couldn’t get it to the level needed for acceptance at a proper
venue, but hopefully we can get it out soon and release code that fixes
the obvious limitations of the current method (e.g., properly sample
the hemisphere, handle specular/glossy surfaces by properly accounting for
the BRDF in the importance sampling process, etc.) so that this work can
start to get traction. It seems many people are interested in these ideas!

Of course, the main problem with this algorithm (only guiding the first
bounce) is an artifact of the decision to represent the incident radiance
with a hemisphere map like we did, and I don't think we are going to be
able to fix that in this version of the algorithm. Hopefully we will be
able to address this in Chris's project.

@Chris: we need to push hard on your project because there is a sense
among researchers in rendering that we are drilling near a large goldmine
and you may have some serious competition (e.g., Pascal Grittmann, Jaroslav
Krivanek, and others) who seem to be examining quite similar ideas. This is
a good sign, because it means that we are absolutely on the right track,
but now it’s a race to the top to see who gets there first!

The last time I was this excited about the potential of a rendering project
was in 2008-2009 when Soheil and I started working on the first general MC
denoisers, and then again in 2012-2013 when we started thinking about using
ML for denoising for the first time. In both cases, I felt like we were on
the brink of doing something that would change the way the rendering
community does things, and in both these previous cases these are things
that are now gaining huge traction in research and industry.

I think this has the potential to be the next big thing, but we have to
move very, very fast and make sure we pull out all the stops by
demonstrating our algorithms on complex scenes. Some of the things we need
to be able to do are:

- Compare successfully (and convincingly) against standard path tracing,
bidirectional path tracing, VCM, etc., and of course the current slew of
path guiding methods. However, since many of these path guiding methods do
online learning, they should be no problem to beat. We also need to find
scenes where neither unidirectional nor bidirectional path tracing has any
hope of working at all. We can discuss in more detail.

- Handle complex materials, not just basic diffuse and specular, but
hopefully more complex BRDFs, since a large share of the rendering time
in production goes into evaluating shaders.  We need to make sure that our
method accounts for these BRDFs properly when computing the PDFs for
sampling so that it can handle arbitrary materials.

- Keep memory in check. Although this isn’t a major issue, we need to make
sure that our memory usage doesn’t balloon from our method, since
production scenes usually have to deal with a large amount of memory for geometry,
buffers, etc.  I think your approach of culling old passes of vertices is
not a bad idea, but we should also explore things like deleting vertices
that are not recently used (kind of like in caching). We should also
explore the full trade off between memory and quality, to see how much
better things can get if we decide to keep every vertex (although that
would consume a large amount of memory).

- Demonstrate good performance on sufficiently complex scenes. See below.

- Subsurface scattering/volumetric transport (?) Folks were concerned about
how some bidirectional or path guiding approaches might work in these kinds
of situations. It would be interesting to explore in more detail. This
could be the subject of a follow-on paper, as it seems to me that trying to
handle this in one submission could be too much.

- Some people seemed to be skeptical about running a neural network at
every bounce to do anything, say reconstruct the sampling PDF.  They
thought it would be too slow. I still think that is the way we should do it
for the first version of the paper, say the SIGGRAPH paper, but then we can
come up with a "simplified" version that we can submit to, say, EGSR, that
does some quick ad hoc thing and hopefully shows comparable results.
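
As a rough illustration of the "delete vertices that are not recently used" idea from the memory bullet above, here is a minimal Python sketch of a least-recently-used (LRU) vertex store. The class name `VertexCache`, the string vertex ids, and the tuple payloads are all made up for this example; they are not taken from Chris's code, and a real version would store whatever per-vertex data the method actually needs.

```python
from collections import OrderedDict


class VertexCache:
    """Keeps at most `capacity` path vertices, evicting the least recently used.

    Hypothetical helper for illustration only.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self._store = OrderedDict()  # vertex_id -> vertex payload

    def put(self, vertex_id, vertex):
        # Insert or refresh a vertex, then evict from the stale end if over budget.
        self._store[vertex_id] = vertex
        self._store.move_to_end(vertex_id)
        while len(self._store) > self.capacity:
            self._store.popitem(last=False)

    def get(self, vertex_id):
        # A lookup counts as a "use", so the vertex moves to the fresh end.
        vertex = self._store.get(vertex_id)
        if vertex is not None:
            self._store.move_to_end(vertex_id)
        return vertex

    def __len__(self):
        return len(self._store)


cache = VertexCache(capacity=2)
cache.put("v0", (0.0, 0.0, 0.0))
cache.put("v1", (1.0, 0.0, 0.0))
cache.get("v0")                   # v0 is now the most recently used
cache.put("v2", (0.0, 1.0, 0.0))  # over budget: evicts v1, the stalest entry
```

Sweeping `capacity` from small values up to "keep everything" would be one simple way to chart the memory versus quality trade-off mentioned above.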

We can discuss this in more detail when we meet face to face next week.

*KEYNOTES*

We had three good keynote speakers. Here is a summary of the thesis of each talk:

Jaakko Lehtinen (Aalto University/NVIDIA): Machine learning models do not
provide semantically meaningful abstractions that allow them to leverage
human-understandable constructs, such as Newton's equations of motion
or Kajiya's rendering equation. How do we build systems that can bridge
this gap?  This was not necessarily new; we have talked about this issue
ourselves in the ML discussion group and I think it is an interesting area
of research. Unfortunately, there were no concrete details about how to do
this in this talk.

Marcos Fajardo (Solid Angle): Lots of pretty pictures/short films and a
history of the development of the Arnold path-tracing renderer. He also made
some interesting comments about how only unidirectional path tracing is
used in industry, and about keeping things efficient (such as only loading
things once). One of the things he was the most proud of was the fact that
he doesn’t use pre-processing in Arnold, like photon mapping, which he
didn't seem to think was a good thing (not only because of bias
issues). So @Chris, we should consider tracing the eye rays first and then
starting the iteration, so that there isn't any pre-processing. Let's think
about what difference that makes.

Ali Eslami (Google DeepMind): He presented his work in Science where he
trained a machine learning system that takes in a set of images and creates a
scene representation as a latent vector (encoding) and then another machine
learning system takes this and a camera pose and creates an image
(decoder). The latter is essentially learning the rendering process, and
can do occlusions, etc. It can also do some very cool "image math", such as
image of blue sphere minus image of red sphere plus image of red triangle =
image of blue triangle. We've discussed this paper in our group before, but we
should probably schedule some time to look at it in more detail. It's very
"far out" kind of research, but it's also very exciting! In particular, I
have some interesting ideas based on this kind of work that I think could
significantly impact rendering. If anyone is interested in using Machine
Learning to directly learn the rendering process, please talk to me. I'm
looking for someone to work on this!

*INDUSTRY PERSPECTIVE*
I also spent some time at the conference talking with
folks in production (e.g., Luca Fascione, Johannes Hanika at Weta, Marcos
Fajardo at Solid Angle) trying to understand their problems better, what
things academic researchers like us should focus on, and how the algorithms
we develop can be made more portable to industry.  We talked about various
things:

*Unidirectional Path Tracing*
It seems that everyone in the community currently uses unidirectional path
tracing for the bulk of their rendering. Of course, every single system
uses some kind of MC denoising method (it’s good to see that our work can
have significant impact!) but surprisingly they seldom use bidirectional
path tracing. Specifically, they listed several problems with bidirectional
path tracing:

   - Significant increase in computation. Say you have 8 bounces each for the
   eye and light paths. You have to compute the PDFs for all possible
   interconnections (8x8 = 64 possible complete paths between them), which
   involves computing a lot of complex BRDFs and evaluating everything. Very
   expensive, especially if you take into account the following bullet
   points...
   - Kinds of scenes. Most of the scenes they work with in production are
   open, not closed boxes like the Cornell box. Furthermore, the positions
   of the lights often lead to paths that head away from the eye. For
   example, a common shot in production is a closeup of a character's
   face lit by a rim light from behind, like this: [image: image.png]

This is a very common shot in production, since cinematographers like to
light their characters from behind because it is more dramatic. So this
means that the light paths will hit the back of the character's head, and
(most often) bounce off into open space or hit parts of the scene that have
nothing to do with any eye path. So bidirectional path tracing doesn't buy
you anything, and it only eats up your time trying to make connections with
light vertices that have bounced far behind the occluding character and
will never be seen by the camera.

   - Their scenes have a LARGE number of lights. Thousands. Many of those
   are nowhere near the camera, nor do they have some path to reach the
   camera.  So when you start a light path in bidirectional path tracing, you
   have to pick a random light and then a random point on that light and trace
   the path, only to find that most likely it never connects with the eye path
   at all.
   - Fireflies. Bidirectional path tracing produces fireflies that
   unidirectional path tracing often does not, because the light source is
   never seen. Although there are methods to deal with them (e.g., clamping
   their values during denoising), apparently they are annoying to deal with.
   Furthermore, the noise patterns from unidirectional path tracing are easier
   to handle, which is why they prefer it.

For these reasons, most production rendering systems rely mostly on
unidirectional path tracing. Some systems include switches that allow them
to turn on bidirectional path-tracing in certain scenes, but more often
than not unidirectional path tracing is used even though it cannot handle
light transport in complex scenes.
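
To make the 8x8 = 64 figure from the first bullet concrete, here is a small Python sketch that enumerates the vertex-to-vertex connections between one eye subpath and one light subpath. The function name `connection_strategies` is made up for illustration, and the sketch only counts explicit connections; it does not evaluate any BRDFs or PDFs, which is where the real cost lies.

```python
def connection_strategies(eye_vertices, light_vertices):
    """List every (light vertex, eye vertex) index pair that could be joined
    by a connection edge to form a complete path.

    Hypothetical helper: indices are 1-based positions along each subpath.
    """
    return [(s, t)
            for s in range(1, light_vertices + 1)
            for t in range(1, eye_vertices + 1)]


pairs = connection_strategies(eye_vertices=8, light_vertices=8)
print(len(pairs))  # 64

# The number of connections grows quadratically with subpath length.
for n in (2, 4, 8, 16):
    print(n, len(connection_strategies(n, n)))
```

Each of those pairs needs a visibility test plus BRDF and PDF evaluations at both endpoints, so doubling the path length roughly quadruples the connection work per pixel sample.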

However, @Chris, your project will DIRECTLY address many of these problems
by using the eye vertices to guide the light transport paths.  This is why
it is so exciting: it will combine the benefits of bidirectional path
tracing and those of the simpler, unidirectional path tracing. It could be
a game changer.

*Scenes to Render*
On a related note, I asked the folks in production about the kinds of
scenes we use in academia and how they compare against the scenes used in
production. What kinds of scenes should we be using in order to demonstrate
that our algorithms would work in practical scenes that are used in
production? Are we “overfitting” the algorithms we are working on to a
bunch of scenes that are not useful in practice?  In short, mostly
yes! There are a few reasons for this:

   - Light oversimplification. Our scenes usually have a few lights, and
   maybe a single environment light. Most production shots have hundreds (if
   not thousands) of lights. For example, at Weta their area light sources are
   tessellated and have different colors per "cell", so a single area light
   source can easily be considered thousands of smaller lights of different
   colors. To deal with these complex lights, most production rendering
   systems build complex light hierarchies and use them for sampling.  It is
   unclear whether they take visibility into account or not, or only the
   intensity of each light source. We should manually (or rather procedurally)
   modify our scenes to have these kinds of light sources to emulate more
   realistic scenes.
   - Geometry. Our scenes usually have much simpler geometry than the
   billions of polygons found in typical production scenes. For this reason,
   Disney has released the Moana island scene (
   https://www.technology.disneyanimation.com/islandscene) and we should
   probably think about using it in our experiments more often. We should also
   build our own scenes that stress our systems.
   - Shaders. Simply using "diffuse" and "specular" as our material
   properties is FAR removed from the state-of-the-art in practice. If we want
   our work to gain the respect we want (in particular for some of our methods
   that are either computing the PDF for importance sampling, or modeling the
   "phase" function of a voxel of geometry), we need to test with more
   realistic, complex shaders.
   - I think rather than rendering empty room environments, we would do
   ourselves a huge favor by putting characters in these same environments and
   lighting them properly (see pic above) in order to get scenes that behave
   much closer to what you see in production environments.
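
For the many-lights point above, the simplest baseline is to pick a light with probability proportional to its emitted power. The Python sketch below does this with a flat CDF; it is NOT a light hierarchy and it ignores visibility and distance entirely, so it is only an assumption about the "intensity only" variant mentioned above, not what any production renderer actually does. The function names and the example powers are made up.

```python
import bisect
import itertools


def build_light_cdf(powers):
    """Build a cumulative distribution over lights, proportional to power."""
    total = sum(powers)
    if total <= 0:
        raise ValueError("at least one light must have positive power")
    cdf = list(itertools.accumulate(p / total for p in powers))
    cdf[-1] = 1.0  # guard against floating-point drift in the last entry
    return cdf


def sample_light(cdf, u):
    """Pick a light index from a uniform random number u in [0, 1).

    Returns (index, probability) so the caller can divide the light's
    contribution by the probability of having picked it.
    """
    index = min(bisect.bisect_right(cdf, u), len(cdf) - 1)
    previous = cdf[index - 1] if index > 0 else 0.0
    return index, cdf[index] - previous


# Hypothetical per-cell powers of one tessellated area light.
cell_powers = [1.0, 1.0, 2.0]
cdf = build_light_cdf(cell_powers)
print(sample_light(cdf, 0.10))  # (0, 0.25)
print(sample_light(cdf, 0.60))  # (2, 0.5)
```

Procedurally splitting our existing area lights into many such cells with different powers would be one cheap way to emulate the thousands-of-lights setting in our test scenes.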

*Bottlenecks*
I asked a lot about performance of their systems, and got a rough breakdown
of render times for a typical 10 hour render (which is not uncommon in
production):

30% tracing rays [3 hrs]
30% evaluating shaders [3 hrs]
10% picking light sources [1 hr]
20% Texturing, other memory access [2 hrs]
10% everything else (e.g., writing buffers, file I/O, scene loading) [1 hr]

Or something like that. Note that many of the renderers do some kind of
deferred shading where they output the texture coordinate lookups and then
do the memory fetch all at once at the end to reduce memory bandwidth.

@Chris, your project would at least help address 40% of this workload
(tracing rays and picking light sources), but as a group we should also
think about ways of evaluating shaders more efficiently. Perhaps we should
revisit an approach that @Steve explored in one of his internships at
Pixar, which tried to use machine learning to simplify the
shading/texturing process. I still think there is value here, perhaps by
encoding the shader in a latent vector and using that at run-time.  We
should discuss at some point.
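
To get a feel for what the breakdown above implies, here is a back-of-the-envelope Python sketch in the spirit of Amdahl's law. The percentages are the rough ones quoted above; the speedup factors are purely hypothetical, and treating each stage as independently "speedable" is a simplification.

```python
TOTAL_HOURS = 10.0

# Rough fractions of render time, as quoted in the breakdown above.
BREAKDOWN = {
    "tracing rays": 0.30,
    "evaluating shaders": 0.30,
    "picking light sources": 0.10,
    "texturing / memory access": 0.20,
    "everything else": 0.10,
}


def render_hours(speedups, total_hours=TOTAL_HOURS):
    """Estimate render time if some stages run `speedups[stage]` times faster.

    Stages not listed in `speedups` are left unchanged (factor 1.0).
    """
    return total_hours * sum(
        fraction / speedups.get(stage, 1.0)
        for stage, fraction in BREAKDOWN.items()
    )


# Hypothetical: a 2x improvement on the 40% covered by rays and light picking.
twice = render_hours({"tracing rays": 2.0, "picking light sources": 2.0})

# Limiting case: those two stages become free.
best = render_hours({"tracing rays": float("inf"),
                     "picking light sources": float("inf")})

print(round(twice, 2), round(best, 2))
```

Even in the limiting case the render still takes about 6 of the original 10 hours, which is why it is worth also thinking about the shading and texturing cost.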

*How they get the low variances*
Most production shots are rendered at 128-256 spp, and some even at
64 spp (Marcos played a nice short movie that he claimed was rendered at
64 spp!).  Of course, most of them are denoised, but even without denoising
they still exhibit an extremely low amount of variance for an MC rendering
system. I probed them quite a bit about what kinds of things they found to
have the biggest impact on variance reduction.  Here is what they said:

   - Direct lighting! The biggest source of noise is the direct lighting.
   They spend quite a bit of time (10% according to the breakdown above)
   building the light hierarchy and figuring out which of the light sources
   they need to sample from to ensure that they are mostly sampling the
   brightest sources in the scene.
   - Good sampling patterns.  In his talk, Marcos admitted that they could
   not use QMC patterns because of patent reasons but they found good sampling
   patterns that worked very well.
   - Multiple Importance Sampling (MIS). This is one of the keys to getting
   reduced variance, but getting good weighting schemes is tricky. If @Chris's
   method is successful, how much would we need this?
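
As a reminder of what the MIS weighting in the last bullet looks like, here is a minimal Python sketch of Veach's balance and power heuristics for two sampling strategies (say, light sampling and BRDF sampling). The function names and the example PDF values are made up for illustration; which weighting schemes the production renderers actually use was not discussed.

```python
def balance_heuristic(pdf_a, pdf_b):
    """MIS weight for a sample drawn from strategy A (balance heuristic)."""
    total = pdf_a + pdf_b
    return pdf_a / total if total > 0.0 else 0.0


def power_heuristic(pdf_a, pdf_b, beta=2.0):
    """MIS weight for a sample drawn from strategy A (power heuristic)."""
    a = pdf_a ** beta
    b = pdf_b ** beta
    total = a + b
    return a / total if total > 0.0 else 0.0


# Hypothetical PDFs of the same direction under two strategies.
pdf_light, pdf_brdf = 1.0, 3.0
print(balance_heuristic(pdf_light, pdf_brdf))  # 0.25
print(power_heuristic(pdf_light, pdf_brdf))    # 0.1
```

For either heuristic, the weight for a sample from A and the weight for the same direction sampled from B sum to one, which is what keeps the combined estimator unbiased.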

*They don't render shots*
Production rendering systems don't usually render entire shots, just
individual frames. I pressed them on this point and they said that they
don't have a specific reason for doing this, but that's the way they
typically do it. Weta folks did admit they had a secret project they
couldn't talk about where they are working on rendering a shot all
together. Personally, I think we should move towards rendering of entire shots; it
just makes sense. For example, in your project @Chris we can leverage
vertex information in neighboring frames to further inform the process, not
just the ones that are neighboring in space. I think this should really
make a huge difference, because if you think about it, in a scene where the
camera is moving but the scene is not, the light transport will not change
from frame to frame, so information from one frame will be completely
useful in another. When scene objects start moving around this breaks down
a bit, but still there will be considerable coherence. So it will still be
tremendously useful.

Overall, I feel that THIS IS AN UNDEREXPLORED ASPECT OF RENDERING, one that I
have been clamoring to see someone work on for the past 10 years. So far,
however, except for a little bit of temporal coherence in the MC denoising
paper, no one has really taken me up on the challenge and tackled this head on.
I think this would make a huge difference.

In any event, this is all I have time to core dump for now. We can discuss
more at a meeting if folks want to discuss it further.

Best,

-Pradeep

---
Pradeep Sen
Professor
UCSB MIRAGE Lab
Dept. of Electrical & Computer Engineering
University of California, Santa Barbara
Santa Barbara, CA 93106-9560
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 510840 bytes
Desc: not available
URL: <https://lists.cs.ucsb.edu/pipermail/ilab-users/attachments/20190713/d3a466fa/attachment-0002.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 459808 bytes
Desc: not available
URL: <https://lists.cs.ucsb.edu/pipermail/ilab-users/attachments/20190713/d3a466fa/attachment-0003.png>