Russian3DScanner (R3DS) is a company that develops tools for creating digital characters from 3D scans. Its Wrap software can automatically adapt an existing topology to a 3D scan of a person. R3DS has now taken this a step further with a recently released product called Wrap4D, which is designed to process 4D sequences of scans.
4D performance tracking has attracted a lot of attention in recent years. More and more studios are now viewing it as a source of training data for machine learning and a way to take the quality of facial animation to a new level.
4D capture has long been available only as a service provided by a few vendors. Wrap4D gives a much larger number of studios access to this technology: it is a stand-alone application that users can install and run in their own production pipelines.
A 4D sequence captured by Infinite Realities using AEONx Motion Scanning & Wrap4D.
What is wrapping?
When an artist wants to create a photorealistic digital character based on a real actor, they usually start with a scan. While a 3D scan captures the exact geometry of the face, the raw scan typically consists of millions of triangles and contains noise, artifacts and missing parts. To turn a 3D scan into an animated character, it must be converted to a low-poly mesh with a clean, animation-friendly topology.
Traditionally, this means performing the retopology manually. That works well for a single character. However, if a studio needs to scan 10 characters, it would ideally want to wrap all of the scans into the same topology.
R3DS Wrap offers an automatic solution to this problem. The process is known as wrapping: it takes a base mesh with the desired topology and uses an optimization approach to automatically fit that base mesh to the surface of the scan.
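The general idea of wrapping as an optimization can be sketched in a few lines. This is a toy illustration, not R3DS's actual algorithm: each base-mesh vertex is pulled toward its nearest point on the scan (data term) while staying close to the average of its mesh neighbors (a simple smoothness term), and the two terms are iterated to convergence.

```python
import numpy as np

def wrap_step(base_verts, scan_points, neighbors, alpha=0.5, beta=0.5):
    """One iteration of a toy wrapping optimization: pull each base vertex
    toward its nearest scan point (data term) while keeping it near the
    average of its mesh neighbors (smoothness/regularization term)."""
    # Data term: nearest scan point for every base vertex (brute force).
    d = np.linalg.norm(base_verts[:, None, :] - scan_points[None, :, :], axis=2)
    nearest = scan_points[np.argmin(d, axis=1)]
    # Smoothness term: uniform Laplacian (average of neighboring vertices).
    smooth = np.array([base_verts[n].mean(axis=0) for n in neighbors])
    return alpha * nearest + beta * smooth

# Toy example: a tiny 3-vertex "mesh" wrapped onto scan points lying at y=1.
base = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
scan = np.array([[0.0, 1.0, 0.0], [1.0, 1.0, 0.0], [2.0, 1.0, 0.0]])
nbrs = [[1], [0, 2], [1]]
for _ in range(20):
    base = wrap_step(base, scan, nbrs)
print(np.round(base[:, 1], 2))  # y-coordinates converge onto the scan at y=1
```

Real wrapping solvers use far more sophisticated energies (surface distance, bending, user-specified point correspondences), but the structure of "data term plus regularization, minimized iteratively" is the same.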
How wrapping works
Although wrapping can be done fully automatically, an artist can control the fitting process if necessary. For any point on the base mesh, a user can explicitly specify a corresponding point on the surface of the scan.
In practice, artists often need to scan not just one but dozens of an actor's expressions. While the wrapping method works well for a neutral expression, processing the other expressions requires a different approach. In the case of non-neutral expressions, the method should also take the textures into account. Each vertex on the neutral mesh corresponds to a skin feature or pore on the actor's face. When determining the correct position for that vertex on a target scan, it is important to find exactly the same skin pore on each scan. Ideally, this process should be repeated for each of the thousands of vertices in the mesh, which is virtually impossible to do manually. This process is often referred to as "solving correspondence".
To solve this problem, R3DS implemented its own OpticalFlowWrapping method. Using a set of virtual cameras, the neutral mesh and the target scan are rendered from different angles. The optical flow between the images is computed for each camera, determining where each pixel should move. The decisions of all cameras are then combined into a global solution.
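The multi-camera combination step can be illustrated with a toy example (this is a simplified sketch, not Wrap4D's actual solver): each virtual camera observes only a 2D image-plane displacement of a surface point, but stacking every camera's 2D flow constraint and solving a least-squares system recovers one consistent 3D displacement.

```python
import numpy as np

# Simple orthographic "cameras": each row is an image-plane axis in world
# space, so projecting a 3D motion d through P gives that camera's 2D flow.
cams = [
    np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]),  # front view (xy plane)
    np.array([[0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]),  # side view (zy plane)
]
true_motion = np.array([0.2, -0.1, 0.3])
flows = [P @ true_motion for P in cams]  # per-camera 2D optical-flow vectors

# Stack all cameras' 2D constraints and solve for one global 3D displacement.
A = np.vstack(cams)
b = np.concatenate(flows)
d, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.round(d, 3))  # recovers [0.2, -0.1, 0.3]
```

A single camera alone cannot resolve motion along its viewing direction; combining views from different angles makes the system well-posed, which is exactly why multiple virtual cameras are used.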
After fitting the base mesh to all expressions with pixel-level accuracy, it is possible to blend not only between the expressions but also between their textures, creating realistic blood-flow and wrinkle effects. Without this precise alignment, the skin texture appears to smear or drift across the face.
This is how OpticalFlowWrapping works
The Wrapping and OpticalFlowWrapping tools are currently used by many studios to create digital characters for games and films. For each scan, they reduce the retopology time from several hours to a few minutes. However, Wrap is not limited to working with 3D scans. For example, it is often used to fit a desired topology onto a high-resolution digital sculpture, or to quickly transfer skin micro-detail maps to a new character.
4D performance capture
“With the release of the OpticalFlowWrapping tool, we realized that we could extend this approach to processing 4D sequences made up of thousands of frames. This research later led to a new product called Wrap4D,” says Andrey Krovopuskov, CEO of R3DS.
Unlike traditional performance-tracking methods that rely on tracked markers and contours, 4D capture gets information about where every millimeter of an actor's skin is moving. It can be thought of as high-speed 3D scanning at video frame rates. Some methods explicitly compute a 3D scan for each frame, while others recover the shape information implicitly.
It's worth noting that 4D capturing isn't a new idea. One of the earliest examples of 4D capture was the Universal Capture System developed for The Matrix Reloaded. (UCAP would win a sci-tech Oscar in 2015).
As the quality bar for digital doubles continues to rise, 4D capture has become widespread in high-end VFX production. With the growing demand for believable digital characters, more studios in the game and film industries are also considering building their own 4D capture rigs.
Today most VFX and game studios have access to multi-camera 3D scanning rigs. A traditional photogrammetry rig is an array of DSLR cameras arranged around the actor's head, synchronized with a set of flashes or polarized light. The rig simultaneously takes a set of photos of the actor's face from different angles, which are then processed in photogrammetry software to produce a textured 3D mesh.
4D capture (or videogrammetry) rigs use a similar idea, except that they can capture image sequences in sync at a high frame rate (usually 30-60 FPS). This small difference leads to big changes in the hardware.
The first problem is camera synchronization. Each frame of a sequence should be recorded by all cameras simultaneously. Unfortunately, most DSLR cameras used for 3D scanning cannot guarantee such precise synchronization. A common practice is to use machine vision cameras. These small devices are specifically designed for computer vision tasks and can stream synchronized image sequences, usually in an uncompressed format. Recording high-frame-rate image streams from dozens of cameras requires very fast drives and large storage capacities.
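To get a feel for why storage is a bottleneck, a back-of-the-envelope calculation is enough. The rig parameters below are taken from the article (R3DS's 18-camera, 3 MP, 60 FPS prototype mentioned later); the assumption of 1 byte per pixel (8-bit raw Bayer data) is mine, and 10/12-bit or debayered RGB streams would be proportionally larger.

```python
def raw_data_rate_gb_per_s(cameras, megapixels, fps, bytes_per_pixel=1):
    """Uncompressed data rate of a synchronized multi-camera rig.

    bytes_per_pixel=1 assumes 8-bit raw Bayer frames (an assumption);
    higher bit depths or debayered RGB multiply the rate accordingly."""
    return cameras * megapixels * 1e6 * fps * bytes_per_pixel / 1e9

# R3DS's prototype rig from the article: 18 cameras, 3 MP each, 60 FPS.
rate = raw_data_rate_gb_per_s(cameras=18, megapixels=3, fps=60)
print(f"{rate:.2f} GB/s, {rate * 60:.0f} GB per minute of capture")
```

Even this modest rig produces over 3 GB of raw data per second, which explains the need for dedicated high-speed storage.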
The second problem is the lighting. To produce high-quality scans, it is important to minimize motion blur and maximize depth of field. This can be achieved with a shorter exposure time and a smaller aperture, but it also means that less light reaches the camera sensor. In practice, exposure times of around 1-2 ms require extremely bright lighting. With constant light sources, the light can be so intense that it becomes very uncomfortable for the actor and harms the performance. A common solution is to use flash lighting. The flash sources are synchronized with the camera shutters and emit light only while the shutter is open. For example, if you record 1 second of animation at 25 FPS with a 1 ms exposure, the flashes need to emit for only 25 ms, compared to 1000 ms for constant lighting. That means 40 times less light energy is required. However, flash lighting comes at a cost: it can only be used efficiently with global-shutter camera sensors, which are more expensive than rolling-shutter sensors.
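The arithmetic in that example generalizes to any frame rate and exposure time; a two-line helper makes the trade-off explicit (a simple model that ignores flash rise/fall times):

```python
def strobe_light_fraction(fps, exposure_ms):
    """Fraction of each second the flashes must emit when synchronized to
    the shutter; constant lighting corresponds to a fraction of 1.0."""
    return fps * exposure_ms / 1000.0

# The article's example: 25 FPS at a 1 ms exposure.
frac = strobe_light_fraction(fps=25, exposure_ms=1.0)
print(f"flashes on {frac * 1000:.0f} ms per second "
      f"-> {1 / frac:.0f}x less light energy than constant lighting")
```

At 60 FPS and 2 ms exposure the fraction rises to 0.12, still roughly an 8x saving over constant light.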
While R3DS was working on Wrap4D, “we used our custom 4D capture rig based on 18 machine vision cameras that capture 3 MP images at 60 FPS. Shortly after the first tests, we switched to the far superior data provided by our friends at Infinite Realities,” commented Krovopuskov.
Infinite-Realities is a UK-based scanning company that has built a state-of-the-art 4D capture rig with more than 50 cameras, using IDA-Tronic's DarkStar lighting system. The rig generates more than 5 TB of data per minute and delivers extremely high-quality 3D scans. Company co-founder Lee Perry-Smith was delighted with R3DS's early results and provided a sample 4D sequence that is bundled with Wrap4D, so new users can experiment and dive in immediately.
The AEONx Motion Scanning System developed by Infinite-Realities
Although 4D capture rigs are more expensive and difficult to build, more and more studios are developing such rigs to meet their needs.
“The next question after building a 4D capture rig is how to process the captured data. There are only a few vendors in the market that offer 4D capture and processing, and they mainly offer it as a service. Today I see more clients looking at 4D capture as a source of training data for machine learning algorithms, or as a way to get an unlimited number of in-between blendshapes for rigging. It is important that they can generate as much data as they need without paying for every second of processing. That is why we created Wrap4D,” says Krovopuskov.
As expected, wrapping a 4D sequence is very different from wrapping a single expression. Working with a single expression allows you to manually specify point correspondences and clean up the mesh if something goes wrong. With 4D processing, you have thousands of frames, so manual cleanup can be extremely time-consuming. The goal of the R3DS team was to develop a method that would produce clean results without the need for manual editing.
“Wrap4D takes a sequence of scans as input and produces a sequence of meshes with the desired topology as output,” adds Krovopuskov. “Unlike frame-to-frame tracking approaches, we process each frame independently, so we can compute all frames in parallel. This offers great scalability, as we can process a single sequence on multiple computers.”
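Because each frame depends only on the neutral base mesh and that frame's scan, the fan-out is embarrassingly parallel. A minimal sketch (the `wrap_frame` function and file names are placeholders, not Wrap4D's API):

```python
from concurrent.futures import ThreadPoolExecutor

def wrap_frame(frame_id):
    # Placeholder for the real per-frame work: each frame needs only the
    # neutral base mesh and that frame's scan, never the previous frame's
    # result, so there is no serial dependency between frames.
    return f"mesh_{frame_id:04d}.obj"

# Independent frames can be fanned out to any number of local workers --
# or, as in the article, split across multiple machines.
with ThreadPoolExecutor(max_workers=4) as pool:
    meshes = list(pool.map(wrap_frame, range(8)))
print(meshes[3])  # -> mesh_0003.obj
```

For CPU-heavy per-frame work you would use a process pool or a render-farm scheduler instead of threads, but the scheduling structure is the same.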
A character that was created by Infinite Realities with a 4D recorded performance and Wrap4D.
Processing each frame independently prevents errors from accumulating from one frame to the next. It also means you can quickly preview the result at any frame in the sequence without having to compute all the preceding frames.
On the other hand, fitting frames independently is harder than tracking them frame by frame, because the difference between the current frame and the neutral mesh is much larger than the difference between neighboring frames.
To make this method controllable and robust, R3DS uses three additional data sources to support the algorithm:
- Marker tracking. Markers help with extreme facial expressions, where optical flow can fail due to blood-flow and wrinkle effects.
- Lip and eyelid detection. You can train a personalized contour detector by providing it with a set of training frames. The detected contours help the algorithm deal with noisy eyelashes and the inner parts of the lips.
- Reference blendshapes. These are a set of meshes that help the algorithm understand how different parts of the base mesh can deform. You can use a personalized set of the actor's FACS shapes, or just a small set of generic blendshapes.
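One common way to combine cues like these is to make each data source contribute a residual to a single weighted fitting objective. The sketch below is a hypothetical illustration of that pattern (the terms and weights are made up, not Wrap4D's actual formulation):

```python
import numpy as np

def total_energy(verts, flow_targets, marker_idx, marker_targets,
                 blendshape_fit, w_flow=1.0, w_marker=10.0, w_prior=0.1):
    """Weighted sum of per-source residuals; a solver would minimize this
    over the vertex positions. Markers are weighted heavily because they
    are sparse but reliable; the blendshape prior is a gentle regularizer."""
    e_flow = np.sum((verts - flow_targets) ** 2)                  # optical flow
    e_marker = np.sum((verts[marker_idx] - marker_targets) ** 2)  # markers
    e_prior = np.sum((verts - blendshape_fit) ** 2)               # blendshape prior
    return w_flow * e_flow + w_marker * e_marker + w_prior * e_prior

# Tiny 4-vertex example: flow wants the vertices at 1.0, one tracked
# marker agrees, and the blendshape prior gently pulls toward 0.5.
verts = np.zeros((4, 3))
e = total_energy(verts,
                 flow_targets=np.ones((4, 3)),
                 marker_idx=[0],
                 marker_targets=np.ones((1, 3)),
                 blendshape_fit=np.full((4, 3), 0.5))
print(round(e, 2))  # 1.0*12 + 10.0*3 + 0.1*3 -> 42.3
```

Tuning the weights shifts how strongly each cue constrains the fit, which is how such a method stays robust when any single cue (e.g. optical flow on a wrinkled cheek) becomes unreliable.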
“The tracking steps are done in our new software called R3DS Track. The idea is to bring all of the labor-intensive work into 2D space, so that you don't have to clean up meshes in 3D, which takes much more experience and time. The tracked data is then passed to Wrap4D, and the rest of the process is fully automated,” explains Krovopuskov. “We tried to make Wrap4D flexible and open. The pipeline can work with different kinds of input, whether 360-degree scans or ear-to-ear scans are captured. It can easily be adapted to different input qualities, for example whether the scans are captured with a static rig of dozens of cameras or with just a few helmet-mounted cameras.”
“I was overjoyed after sending an early beta of Wrap4D to Infinite-Realities. The guys saw the software for the first time and came back to me two days later with a fully processed sequence,” he adds.
An example of a 4D sequence captured by Infinite Realities and processed with Wrap4D.
Even before Wrap4D was officially released, it was used in a number of productions. One of the first projects was the real-time short film The Heretic, created in 2019 by the Unity demo team.
It was on this project that the Unity team decided to use 4D facial capture for the first time. All sequences were captured by Infinite Realities and processed by R3DS. The film contains eight shots with 4D capture. The R3DS team was responsible for mesh tracking, head stabilization, noise filtering, and refinement of the eyelids and lips. The last two shots consisted of more than 1,000 frames and were processed by R3DS in just one day.
The resulting meshes were further refined by the Unity team, which then added shading, grooming and micro details. A Snappers rig was used alongside the 4D performances to control facial expressions as needed. Unity released the character asset from the film so it can be explored in detail and used as a reference by artists.
“We really enjoyed working on this project and it gave us a lot of ideas that were later turned into tools for Wrap4D,” said Krovopuskov.
Gawain, the main character from The Heretic Short
“I hope that Wrap4D will help more studios bring the facial animation in their games, films and cinematics to a level of quality comparable to Hollywood blockbusters. It is amazing to see how you can create extremely complex facial animation in a matter of days without being a rigger or an animator. I'm really excited to see how far we can go with 4D data to improve head-mounted camera capture methods and facial retargeting,” Krovopuskov concludes.
Andrey Krovopuskov, CEO of R3DS (black T-shirt), and the Russian3DScanner team.
Note: The R3DS team has a virtual booth at SIGGRAPH 2020.