This article is an augmented architecture description for the Perceptuum 3 renderer. It outlines the ethos and techniques, and gives simplified views of all the models of the system (as prescribed by the Unified Software Development Process). (3000 words)
In the Unified Software Development Process[1.1], an ‘architecture description’ is a view of the whole system structure with the important characteristics made more visible by leaving details aside. It is composed of views of each of the models of the system, based on architecturally significant use cases (the most important functionality and risks). Production of this document marks the end of the elaboration phase of this project.
The currently dominant approach to global illumination architecture is based on the Monte Carlo ray-tracing method. Although it has strengths in generality and theoretical physical fidelity, its weaknesses show in practical use. Its stochastic nature produces persistent noise, and the high dimensionality of the domain makes convergence prohibitively slow. Its incoherent computation is also poorly matched to common specialised hardware.
The requirements of users are principally aesthetics and speed, and these may be better served by a different architecture with a different emphasis. The initial ethos directing the design is:
That is: deliberately try to simulate less completely, and orient towards the strongest computing resource. From these follow some primary internal aims:
And these can be more broadly guided by an external product aim:
The core algorithm can be characterised by the term: projection-interpolation light gathering.
To elaborate: the inward light at any point is determined by projection, that is, by passing scene objects through a transform and rasterization pipeline. Most points do not have their inward light fully determined; instead it is approximated by interpolating between points that are. The overall direction of the algorithm is to start at the eye and move outward, gathering light, rather than to start at the light sources and spread light. To summarise:
This is a re-use and adaptation of the algorithm devised by Ward[2.2] and used in the ‘Radiance’ renderer. The modifications are to avoid ray tracing mostly and Monte Carlo completely, replacing them with projection, and to augment the core with separate feature sub-algorithms for caustics, volume scattering, motion blur, etc.
If the number of gatherings is of the order of 1000, each has a resolution of about 1000 points, and the projected geometry is much simplified, then the total effort approximates that of a projective rendering of a single whole-screen image. Such a rendering can be done in the order of a few seconds.
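As a rough check of that estimate: 1000 gatherings of 1000 points each is about 10^6 projected samples, which is roughly the pixel count of a single 1000 by 1000 projective render.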
Projection is faster and more robust than Monte Carlo ray tracing. It exploits coherence, and makes fuller use of the available scene data. Where Monte Carlo requires practically unlimited time to find small features, projection can capture them easily in an accumulation buffer.
Projection can be directly accelerated with specialised graphics hardware, and much of its cost can even be taken off the CPU by running the GPU in parallel with it.
Separating feature techniques allows more specialised approximations, which are faster and more robust, whereas a generalised Monte Carlo approach becomes intractable as dimensions multiply.
Avoiding Monte Carlo removes otherwise ineradicable noise from images. Aliasing can be suppressed by controlling scene level-of-detail, and by other established projection techniques.
Eliminating Monte Carlo allows simplification of BRDF/shader handling, since probability distributions are not needed.
Global light gathering calculates the relevant illumination more simply and automatically, whereas general photon mapping (global light spreading) needs special restriction of its domain to what is visible to the viewer.
The first operation is projection of the scene, as a triangle mesh, into pixels containing the IDs of the visible triangles. Each pixel can then be visited and a ray from the camera intersected, using the ID and perhaps other geometric data. Alternatively, a triangle list could be extracted from the pixels, and each triangle then rasterized more incrementally.
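A minimal sketch of the ID-projection step is given below. It assumes an OpenGL 1.1 context with the camera transform already set; the mesh layout and the 24-bit ID packing are illustrative assumptions rather than the actual interfaces.

   #include <GL/gl.h>
   #include <vector>

   // Render each triangle flat-shaded with its ID packed into the RGB channels,
   // then read the framebuffer back to recover the visible-triangle ID per pixel.
   void projectTriangleIds( const std::vector<float>& vertices,   // xyz, 3 vertices per triangle
                            int width, int height,
                            std::vector<unsigned int>& idsOut )
   {
      glDisable( GL_LIGHTING );
      glDisable( GL_DITHER );
      glEnable( GL_DEPTH_TEST );
      glClear( GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT );

      glBegin( GL_TRIANGLES );
      for( unsigned int t = 0;  t < vertices.size() / 9;  ++t )
      {
         // pack a 24-bit triangle ID into the color (ID 0 means background)
         const unsigned int id = t + 1;
         glColor3ub( GLubyte(id & 0xFF), GLubyte((id >> 8) & 0xFF), GLubyte((id >> 16) & 0xFF) );
         glVertex3fv( &vertices[ t * 9     ] );
         glVertex3fv( &vertices[ t * 9 + 3 ] );
         glVertex3fv( &vertices[ t * 9 + 6 ] );
      }
      glEnd();

      // read back and unpack the ID for every pixel
      std::vector<GLubyte> pixels( width * height * 3 );
      glPixelStorei( GL_PACK_ALIGNMENT, 1 );
      glReadPixels( 0, 0, width, height, GL_RGB, GL_UNSIGNED_BYTE, &pixels[0] );
      idsOut.resize( width * height );
      for( int p = 0;  p < width * height;  ++p )
      {
         idsOut[p] = pixels[p*3] | (pixels[p*3+1] << 8) | (pixels[p*3+2] << 16);
      }
   }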
The surface point determined for each pixel is then shaded. Its position is sent to the illumination package to get all the incident light, and its emission is read directly. Both are then put through the (light-)interaction package to calculate the outward light towards the eye.
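In outline, the per-pixel flow might read as below. The class names echo the design model later in this article, but the types and signatures here are assumptions for illustration only.

   #include <vector>

   // Illustrative sketch: Spectrum, Ray, IncidentLight and the method signatures
   // are assumed here; only the class names come from the design model.
   Spectrum getPixelRadiance( int triangleId, const Ray& eyeRay,
                              const Scene& scene, LightTransport& lightTransport )
   {
      // resolve the surface point hit by this pixel's eye ray
      const SurfacePoint point = scene.getSurfacePoint( triangleId, eyeRay );

      // incident light, as direction/radiance samples, from the illumination package
      const std::vector<IncidentLight> inward = lightTransport.getIllumination( point );

      // emission read directly from the surface
      Spectrum outward = point.getEmission( -eyeRay.direction );

      // interaction: scale each incident sample by the BIDF and accumulate
      const Bidf& bidf = point.getBidf();
      for( std::vector<IncidentLight>::const_iterator i = inward.begin();  i != inward.end();  ++i )
      {
         outward += i->radiance * bidf.getScalingGeneral( i->direction, -eyeRay.direction )
                                * i->solidAngleWeight;
      }

      return outward;
   }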
The global gather algorithm is separated into two levels; the first level, nearest the eye, is made more sophisticated. The incoming light determination is separated into four parts: gather, ray trace, photon map, and translucency map. They are combined by simple addition, which means their light paths must not overlap.
The gather is itself separated into two parts: indirect and direct. The direct part gathers only emissive surfaces, at just below screen resolution, so that shadows are sharp. The indirect part gathers a shadowed render using nominated and brightest emitters, which provides an extra level of light bounce.
These less important contributions can be evaluated very much as in the original Ward global gather. The trees can start at low resolution on the image and, with a high error allowance, propagate by ray tracing. The gather projection at each node could be done with shadows, or with emitters only; the tradeoff needs experimentation. Either way, the accumulation can probably be simplified to pure diffuse.
All light gatherings are stored. A straightforward explicit representation would require a lot of memory, though not an impractical amount. Reducing the size would benefit performance, so some simple compression may be worthwhile. Conversion to spherical wavelets seems more appropriate than spherical harmonics, since the data would be closer to piecewise linear.
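For a sense of scale, re-using the earlier figures as assumptions: 1000 gatherings of 1000 points, each point holding 3 spectral channels as 4-byte floats, is about 1000 x 1000 x 3 x 4 bytes, or roughly 12 MB before any compression.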
The gather could be a uniform hemi-cube (or whole cube). More efficient may be a single surface-parallel plane, since for diffuse reflection low-angle light is a small contributor. Perhaps best would be to orient and concentrate a single plane perpendicular to the BIDF ‘peak’ (if such analysis can be done), supplemented with a surface-parallel plane. A basic requirement is that the resolution be greater than the Nyquist limit for the BIDF (excluding perfect specular).
Density of gathering points is set by the Ward error term. Interpolation between points is done with Ward gradients[2.3].
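For reference, the weighting in Ward's scheme takes roughly this form (notation simplified; see [2.2] and [2.3] for the exact terms and the gradient extension):

   w_i(p, n) = \frac{1}{\lVert p - p_i \rVert / R_i + \sqrt{1 - n \cdot n_i}}

   E(p, n) \approx \frac{\sum_i w_i \, [\, E_i + (p - p_i) \cdot \nabla_t E_i + (n_i \times n) \cdot \nabla_r E_i \,]}{\sum_i w_i}

where R_i is the harmonic mean distance to the surfaces visible from gather point i, and \nabla_t, \nabla_r are the translational and rotational gradients. A new gather point is created wherever no stored point's weight exceeds the reciprocal of the allowed error.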
Since projection is used ‘internally’ rather than for the final image, various approximations can be made. Models can have lower levels of detail, no textures, and simple shaders. Specular reflection is approximated with glossy; glossy transmission is approximated with specular. No ray tracing, photon mapping, or translucency mapping is used.
So that light paths can be separated for different calculation, projection must parameterise: shadowed renders, transparency (per item), and primary/secondary emitters (per item). Projection also needs to be able to produce just triangle IDs.
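A small parameter block along these lines could carry those switches; the names are placeholders rather than the actual interface.

   #include <map>

   // Hypothetical parameter block for a single internal projection.
   struct ProjectionParams
   {
      bool idsOnly;            // output triangle IDs instead of shaded color
      bool shadowed;           // render with shadowing from the nominated emitters
      bool primaryEmitters;    // include items flagged as primary emitters
      bool secondaryEmitters;  // include items flagged as secondary emitters

      // per-item transparency overrides, keyed by item ID
      std::map<int, float> transparency;
   };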
Antialiasing and fog can be done straightforwardly. Transparency and shadows require a bit more work.
The restriction of OpenGL 1.1 to 8 bits per color channel means special work will be needed to handle bright emitters like the sun and sky. A higher level scene-graph component wrapper will be needed to maintain performance for large scenes.
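One way to work within the 8-bit limit, sketched below under assumed interfaces, is to render bright emitters scaled down by a known factor and multiply that factor back in when accumulating the read-back pixels into a floating-point buffer.

   #include <GL/gl.h>
   #include <cstddef>
   #include <vector>

   // Read back one 8-bit render pass and accumulate it into a float buffer
   // (sized width * height * 3), undoing the scale the pass was rendered with,
   // e.g. a sun pass rendered at 1/1000 of its true intensity.
   void accumulatePass( float passScale, int width, int height, std::vector<float>& accumRgb )
   {
      std::vector<GLubyte> pixels( width * height * 3 );
      glPixelStorei( GL_PACK_ALIGNMENT, 1 );
      glReadPixels( 0, 0, width, height, GL_RGB, GL_UNSIGNED_BYTE, &pixels[0] );

      for( std::size_t i = 0;  i < pixels.size();  ++i )
      {
         accumRgb[i] += (float( pixels[i] ) / 255.0f) * passScale;
      }
   }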
At the top level extra features are enabled: ray tracing is used to follow perfect specular interaction only, and propagates in a tree for both reflection and transmission. Photon mapping is used only for nominated objects, and built by gathering at points on the object surface, and traced with perfect specular only. Translucency mapping is used only for nominated objects.
The BIDF class is fully general, and mostly follows its mathematical form: returning a scaling (for each spectral channel) given an inward and an outward direction. But it must separate perfect specular interaction, by returning a scaling (of a perfect inward ray) for a given outward direction. The BIDF must also provide a rough approximation of itself in traditional form: diffuse and specular weightings and a shininess value.
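An interface along these lines would fit that description. The method names echo those used in the design-model realization below, but the parameter types and exact signatures are assumptions.

   // Sketch of the BIDF interface; Spectrum and Vector3 are assumed helper types.
   class Bidf
   {
   public:
      virtual ~Bidf() {}

      // general form: scaling per spectral channel for an inward/outward direction pair
      virtual Spectrum getScalingGeneral( const Vector3& inward, const Vector3& outward ) const = 0;

      // perfect specular part, kept separate: scaling of the perfect inward ray for a
      // given outward direction (also writes the inward direction it implies)
      virtual Spectrum getScalingSpecular( const Vector3& outward, Vector3& inwardPerfect ) const = 0;

      virtual bool isTransmissive() const = 0;

      // rough traditional approximation, for the simplified internal projections
      virtual void getApproximation( Spectrum& diffuse, Spectrum& specular, float& shininess ) const = 0;
   };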
Some built-in BIDFs are provided: perfect diffuse, Fresnel specular, Ward (and probably others). But plug-ins are also supported, specified in the model file with lists of tagged parameters/textures.
Depth of field can be done with the image convolution technique, probably with a limit on lens size to constrain the filter size. The weakness is with reflections, which would be blurred according to the focus of the reflecting surface rather than of the reflected objects themselves. But the fault would be noticeable only when the reflective surface is in focus.
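The per-pixel filter size would follow from the thin-lens circle of confusion; a small sketch, with the lens parameters and the clamping behaviour as assumptions:

   #include <algorithm>
   #include <cmath>

   // Approximate circle-of-confusion diameter (in image-plane units) for one pixel,
   // from the thin-lens model: aperture diameter, focal length, focus distance,
   // and the pixel's depth (assumed greater than zero) from the z buffer.
   float circleOfConfusion( float aperture, float focalLength, float focusDistance,
                            float pixelDepth, float maxDiameter )
   {
      const float c = aperture
                    * (focalLength / (focusDistance - focalLength))
                    * (std::fabs( pixelDepth - focusDistance ) / pixelDepth);
      return std::min( c, maxDiameter );   // clamp so the convolution filter stays bounded
   }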
The moving objects case can be handled by separating rendering into two passes, and treating indirect and direct illumination differently. The first pass renders all static objects with indirect gather, static photon map and translucency map illumination. All indirect gather is done with the scene fixed in mid-time position. The second pass renders iteratively: each step accumulates direct gather and ray tracing illumination, and accumulates renders of moving objects, each on a separate sub-image. When iterations complete, everything is merged and composited.
The moving camera case can be handled by including motion vectors with the pixels, then filtering the normally rendered image.
To combine depth of field and both kinds of motion blur, follow the moving objects process and apply depth of field to each sub-image, then apply the moving camera vector-filtering at the end.
This image refinement can be done post-render, by conventional distribution ray tracing techniques. High contrast pixels can be fully sampled in both dimensions of time and lens position.
The scene is a tree of two principal node types: objects and instances, both inheriting a common interface. An object contains a shape definition, but no transform. An instance references objects or instances, each with a transform. The basic object/instance common interface defines both projectable and ray traceable capabilities. An object need not be a triangle mesh internally, but must be able to generate one, preferably at different levels of detail.
Every triangle is uniquely identifiable. All scene tree nodes note how many triangles they contain: leaf objects know their triangle counts, and instances sum their sub-part counts. So a path from the root can either determine the number of a given triangle, or find the triangle with a given number. Numberings can be stored in separate trees for different levels of detail.
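A sketch of navigating that numbering is given below. The node interface shown is an assumption consistent with the description, not the actual classes.

   #include <stdexcept>
   #include <utility>

   // Assumed minimal node interface: leaf objects hold triangles directly,
   // instances delegate to sub-parts, and every node knows its total triangle count.
   class SceneNode
   {
   public:
      virtual ~SceneNode() {}
      virtual unsigned int     getTriangleCount() const = 0;
      virtual unsigned int     getSubNodeCount()  const = 0;
      virtual const SceneNode& getSubNode( unsigned int i ) const = 0;
      virtual bool             isLeaf() const = 0;
   };

   // Find the leaf object containing triangle number 'index', and the local index
   // within it, by descending from the root and subtracting skipped sub-part counts.
   std::pair<const SceneNode*, unsigned int> findTriangle( const SceneNode& node, unsigned int index )
   {
      if( node.isLeaf() )
      {
         return std::make_pair( &node, index );
      }
      for( unsigned int i = 0;  i < node.getSubNodeCount();  ++i )
      {
         const unsigned int count = node.getSubNode( i ).getTriangleCount();
         if( index < count )
         {
            return findTriangle( node.getSubNode( i ), index );
         }
         index -= count;
      }
      throw std::out_of_range( "triangle index exceeds scene triangle count" );
   }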
To spatially index the scene, each instance has an octree holding the triangles of sub-objects and the bounds of sub-instances.
The basic format is the OpenEXR high dynamic range image. A tonemapping, gamma and color transform can be used to produce RGB images in PNG format. Supplementary buffers containing z or other geometry, and alpha can be included.
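A simple global operator would suffice for the PNG conversion; the sketch below uses a basic compressive curve and a 2.2 gamma purely as placeholder choices, not the project's chosen transform.

   #include <algorithm>
   #include <cmath>

   // Map one linear HDR channel value to an 8-bit output value.
   // The global curve (l / (1 + l)) and gamma 2.2 are placeholder choices.
   unsigned char tonemapChannel( float linear, float exposure )
   {
      const float scaled     = linear * exposure;
      const float compressed = scaled / (1.0f + scaled);             // simple global tone curve
      const float gammaed    = std::pow( compressed, 1.0f / 2.2f );  // display gamma
      return (unsigned char)( std::min( 1.0f, gammaed ) * 255.0f + 0.5f );
   }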
Fully exploiting the GPU and multiple CPUs is not straightforward. Having the GPU work in parallel with the CPU is awkward because the GPU is used in the middle of a pipeline, and substantial computation depends on the data it produces. Having multiple CPUs work in parallel is awkward because substantial computation is routed through a single GPU.
The GPU can be wrapped by a thread running a queue. Multiple CPU threads can then submit requests and look up results without blocking their execution too much.
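A minimal sketch of that wrapper, using standard C++ threading as a stand-in for whatever the project actually adopts (e.g. boost threads); the request type is a placeholder.

   #include <condition_variable>
   #include <deque>
   #include <functional>
   #include <mutex>
   #include <thread>
   #include <utility>

   // One thread owns the GPU context; other threads enqueue work without blocking on it.
   class GpuQueue
   {
   public:
      GpuQueue() : stop_( false ), worker_( &GpuQueue::run, this ) {}

      ~GpuQueue()
      {
         { std::lock_guard<std::mutex> lock( mutex_ ); stop_ = true; }
         condition_.notify_all();
         worker_.join();
      }

      // called from CPU worker threads; the job should record its own result
      void submit( std::function<void()> gpuJob )
      {
         { std::lock_guard<std::mutex> lock( mutex_ ); jobs_.push_back( std::move( gpuJob ) ); }
         condition_.notify_one();
      }

   private:
      void run()
      {
         for( ;; )
         {
            std::function<void()> job;
            {
               std::unique_lock<std::mutex> lock( mutex_ );
               condition_.wait( lock, [this]{ return stop_ || !jobs_.empty(); } );
               if( stop_ && jobs_.empty() ) { return; }
               job = std::move( jobs_.front() );
               jobs_.pop_front();
            }
            job();   // all GL calls happen on this one thread
         }
      }

      std::mutex                           mutex_;
      std::condition_variable              condition_;
      std::deque< std::function<void()> >  jobs_;
      bool                                 stop_;
      std::thread                          worker_;
   };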
Illumination pipelines (incorporating GPU work) can be split into two: a ‘seed’ pipeline doing all computation leading up to submitting GPU requests, and a ‘harvest’ pipeline doing all computation on the GPU results. Since multiple pipeline instances would be required across the rasterization, both CPU and GPU parallelization are possible, as long as each individual pipeline completes its seed before starting its harvest. However, experimentation with prototypes is needed to see whether this refactoring is really worthwhile.
A system's external behaviour is described by a set of use cases. Each use case is a sequence of actions that provides the user with a result of value. Together with some supplementary requirements, this constitutes the requirements model.
The external behaviour has a very simple structure, the complexity being contained in the algorithms.
actor – user
actor – user
features:
actor – user
features:
Use of OpenGL 1.1, and maybe higher level scene-graph library.
Reuse of components and libraries: Perceptuum2, Radiance, X3D, OpenEXR, libpng, boost, stlport, cppunit.
Portability to different compiler/OS/hardware.
This has the purpose of refining the use cases in more detail, and making an initial allocation of the behaviour of the system to a set of objects. The perspective is from the outside, leaving implementation considerations aside.
The broad structure is an ‘open’ hierarchy: higher packages use/depend on any lower ones. There are three divisions: project specific, general graphics, and non-application specific general.
Some key classes can be found and their basic relationships sketched.
classes listed in package groups
rasterization
illumination
interaction
modelling
imaging
This defines the static structure of the system as subsystems, classes, and interfaces; and realizes the use-cases as collaborations among those elements.
Further classes are added to the packages.
classes listed in package groups, with architecturally significant members in bold
rasterization
illumination
interaction
modelling
imaging
graphics
platform
general
The interfaces are divided into methods, each with informal parameter lists.
interfaces of important classes
Camera
PixelCalculator
Projector
LightTransport
WardTransport
RayTracer
PhotonMap
TranslucencyMap
SurfacePoint
Bidf
Scene
ObjectRenderable
ImageHdr
The main render use case can be realized as a high level pseudo-code sequence, each line being an operation description or method call.
Camera::render
   pre render
      build photon map
      build translucency map
      enumerate scene [using view]
   main render
      Projector::construct for id production
      Projector::project
      start illumination cache
         loop thru pixels following low res grids
            at each point and ray trace node
               call LightTransport to make illumination
      loop thru pixels
         PixelCalculator::getPixel
            get SurfacePoint from Scene
            Bidf::isTransmissive
            LightTransport::getIllumination
               WardTransport::getIllumination
               if Bidf::getScalingSpecular larger than zero
                  RayTracer::getIllumination
               PhotonMap::getIllumination
               TranslucencyMap::getIllumination
            SurfacePoint::getEmission
            calculate overall light interaction equation
               Bidf::getScalingGeneral
               Bidf::getScalingSpecular
            Image::setPixel
      apply depth of field
   post render
      adaptive supersample
         pick high-contrast pixels
         loop thru timesteps
            loop thru lens points
               trace into pixels
set time point to mid
render static objects
step through time
   render top layer background accumulation, and each moving object on a separate sub image accumulation
render, including pixel vectors
convolve image according to vectors
Rather than distribute across computational nodes, this divides the system into separate executable programs of particular types.
Command-line programs:
Optional dynamic-link plugins can be added:
This contains general implementation notes and guides.
Component/libraries reused:
Code details:
Though testing is usually not part of the architecture for USDP, an integrated XP[1.3] approach to testing makes it so.
The overall strategy is that testing will be mostly unit testing: the complex, numerical nature of the algorithms demands detailed probing in several places. System testing will be by viewing specially constructed test scenes.
Construction order: