US11704853B2 - Techniques for feature-based neural rendering - Google Patents
Techniques for feature-based neural rendering
- Publication number
- US11704853B2 (application US16/511,961)
- Authority
- US
- United States
- Prior art keywords
- machine learning
- learning model
- computer
- character
- implemented method
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
- 238000009877 rendering Methods 0.000 title claims abstract description 87
- 238000000034 method Methods 0.000 title claims abstract description 85
- 230000001537 neural effect Effects 0.000 title description 2
- 238000010801 machine learning Methods 0.000 claims abstract description 136
- 238000012549 training Methods 0.000 claims description 61
- 238000013528 artificial neural network Methods 0.000 claims description 5
- 230000003190 augmentative effect Effects 0.000 claims description 5
- 230000006978 adaptation Effects 0.000 abstract description 3
- 230000015654 memory Effects 0.000 description 20
- 238000010586 diagram Methods 0.000 description 14
- 230000033001 locomotion Effects 0.000 description 13
- 230000000875 corresponding effect Effects 0.000 description 10
- 238000012545 processing Methods 0.000 description 8
- 230000006870 function Effects 0.000 description 7
- 239000000203 mixture Substances 0.000 description 7
- 230000008569 process Effects 0.000 description 7
- 238000013459 approach Methods 0.000 description 6
- 238000004590 computer program Methods 0.000 description 6
- 230000008901 benefit Effects 0.000 description 5
- 238000013527 convolutional neural network Methods 0.000 description 4
- 230000003287 optical effect Effects 0.000 description 4
- 230000000007 visual effect Effects 0.000 description 4
- 238000013507 mapping Methods 0.000 description 3
- 238000012986 modification Methods 0.000 description 3
- 230000004048 modification Effects 0.000 description 3
- 239000013598 vector Substances 0.000 description 3
- 230000001276 controlling effect Effects 0.000 description 2
- 230000003247 decreasing effect Effects 0.000 description 2
- 230000003993 interaction Effects 0.000 description 2
- 239000000047 product Substances 0.000 description 2
- 238000005070 sampling Methods 0.000 description 2
- 230000004913 activation Effects 0.000 description 1
- 238000003491 array Methods 0.000 description 1
- 238000004891 communication Methods 0.000 description 1
- 230000002596 correlated effect Effects 0.000 description 1
- 238000011156 evaluation Methods 0.000 description 1
- 210000003414 extremity Anatomy 0.000 description 1
- 239000004973 liquid crystal related substance Substances 0.000 description 1
- 239000000463 material Substances 0.000 description 1
- 238000002156 mixing Methods 0.000 description 1
- 239000013307 optical fiber Substances 0.000 description 1
- 239000004065 semiconductor Substances 0.000 description 1
- 239000013589 supplement Substances 0.000 description 1
- 230000009466 transformation Effects 0.000 description 1
- 230000001131 transforming effect Effects 0.000 description 1
- 238000013519 translation Methods 0.000 description 1
- 230000014616 translation Effects 0.000 description 1
- 238000012800 visualization Methods 0.000 description 1
- 210000003857 wrist joint Anatomy 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T13/00—Animation
- G06T13/20—3D [Three Dimensional] animation
- G06T13/40—3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/15—Correlation function computation including computation of convolution operations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T19/00—Manipulating 3D models or images for computer graphics
- G06T19/006—Mixed reality
Definitions
- Embodiments of the present disclosure relate generally to image rendering and, more specifically, to techniques for feature-based neural rendering.
- Feature animation films generally include high-definition, high-fidelity characters. Typically, such characters are animated using high-resolution models and textures as well as complex proprietary rigs and deformation algorithms.
- Previsualization is the visualization of scenes prior to final animation or filming. For example, motion capture with a single camera may be employed to visualize a character's movement in the early stages of story authoring and storyboarding.
- Rendering engines used in real-time applications typically support only linear blend skinning and blend shapes, not the proprietary rigs and deformation algorithms used to render feature animation films. Further, real-time rendering engines may require lower-resolution models and textures.
- One embodiment of the present application sets forth a computer-implemented method for rendering an image.
- the method includes determining pose information for a first character based on a control signal, and processing the pose information using a trained machine learning model to generate a rendering of the first character.
- Another embodiment of the present application sets forth a computer-implemented method for training a machine learning model.
- the method includes receiving training data that includes a plurality of rendered images and an associated set of control points for each rendered image.
- the method further includes training the machine learning model based on a perceptual loss between one or more images generated by the machine learning model and one or more associated rendered images included in the training data.
- other embodiments of the present disclosure include, without limitation, a computer-readable medium including instructions for performing one or more aspects of the disclosed techniques as well as a computing device for performing one or more aspects of the disclosed techniques.
- At least one technical advantage of the disclosed techniques relative to the prior art is that, in the disclosed techniques, a machine learning model is implemented that translates control points to two-dimensional (2D) rendered images, without requiring full resolution geometry or proprietary rigs or deformers.
- computer graphics (CG) characters, including high-resolution characters traditionally limited to feature animation films, can be controlled or puppeteered using 2D (or 3D) control points, such as a skeleton.
- Examples of real-time applications of techniques disclosed herein include (1) previs, in which, e.g., motion capture data from a single camera can be fed into a machine learning model to generate renderings of a character; and (2) computer-based games.
- a perceptual loss for training the machine learning model is disclosed that converges successfully more often than traditional discriminators used in adversarial learning.
- a common interface is disclosed that permits different sources of motion to be transformed to the common interface and input into a machine learning model that renders 3D characters.
- FIG. 1 illustrates a system configured to implement one or more aspects of various embodiments.
- FIG. 2 illustrates an exemplary architecture of a machine learning model, according to various embodiments.
- FIG. 3 illustrates an approach for generating a training data set and then training a machine learning model, according to various embodiments.
- FIG. 4 illustrates an approach for rendering a character using a trained machine learning model, according to various embodiments.
- FIG. 5 sets forth a flow diagram of method steps for generating a training data set including rendered characters and associated pose information, according to various embodiments.
- FIG. 6 sets forth a flow diagram of method steps for training a machine learning model to render a character based on pose information, according to various embodiments.
- FIG. 7 sets forth a flow diagram of method steps for rendering a character using a trained machine learning model, according to various embodiments.
- FIG. 1 illustrates a system 100 configured to implement one or more aspects of various embodiments.
- the system 100 includes a machine learning server 110 , a data store 120 , and a computing device 140 in communication over a network 130 , which may be a wide area network (WAN) such as the Internet, a local area network (LAN), or any other suitable network.
- a data generating application 116 (“data generator”) executes on a processor 112 of the machine learning server 110 and is stored in a memory 114 of the machine learning server 110 .
- the processor 112 receives user input from input devices, such as a keyboard or a mouse.
- the processor 112 is the master processor of the machine learning server 110 , controlling and coordinating operations of other system components.
- the processor 112 may issue commands that control the operation of a GPU that incorporates circuitry optimized for graphics and video processing, including, for example, video output circuitry.
- the GPU may deliver pixels to a display device that may be any conventional cathode ray tube, liquid crystal display, light-emitting diode display, or the like.
- a system memory 114 of the machine learning server 110 stores content, such as software applications and data, for use by the CPU 112 and the GPU.
- the system memory 114 may be any type of memory capable of storing data and software applications, such as a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash ROM), or any suitable combination of the foregoing.
- a storage (not shown) may supplement or replace the system memory 114.
- the storage may include any number and type of external memories that are accessible to the CPU 112 and/or the GPU.
- the storage may include a Secure Digital Card, an external Flash memory, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
- the machine learning server 110 shown herein is illustrative, and variations and modifications are possible.
- the number of CPUs 112, the number of GPUs, the number of system memories 114, and the number of applications included in the system memory 114 may be modified as desired.
- the connection topology between the various units in FIG. 1 may be modified as desired.
- any combination of the CPU 112 , the system memory 114 , and a GPU may be replaced with any type of virtual computing system, distributed computing system, or cloud computing environment, such as a public or a hybrid cloud.
- the data generator is configured to generate training data based on a three-dimensional (3D) model and animation data.
- the data generator 116 may be any suitable renderer or software toolset that renders the 3D model in various poses based on the animation data. Examples of renderers include the RenderMan® and Hyperion renderers.
- the rendered images may depict a character in poses corresponding to poses of a two-dimensional (2D) skeleton or other control points, and the data generator 116 may generate multiple renderings of the character in different poses and views.
- a control point, which is also sometimes referred to as a "handle," is a position that can be controlled to update the pose of a character.
- a skeleton is one example of a set of control points, in which the position and rotation angles of various joints in the skeleton may be adjusted or manipulated to achieve a desired character pose.
- the data generator 116 saves the images it renders, as well as related data such as masks, normal maps, and depth maps generated along with the rendered images and 2D skeleton pose information associated with the rendered images, to use as training data.
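As an illustration of the kind of control-point and per-pose data described above, the following is a minimal Python sketch of one possible representation; the class and field names (`ControlPoint`, `PoseSample`, and so on) are hypothetical and not taken from the patent.

```python
from dataclasses import dataclass
from typing import List, Optional
import numpy as np

@dataclass
class ControlPoint:
    """A single controllable handle, e.g., a skeleton joint."""
    name: str
    position: np.ndarray      # 3D position, shape (3,)
    orientation: np.ndarray   # normalized quaternion (w, x, y, z), shape (4,)

@dataclass
class PoseSample:
    """Data a data generator might save for one rendered pose."""
    control_points_3d: List[ControlPoint]
    positions_2d: np.ndarray            # projected 2D joint positions, shape (J, 2)
    rendered_image: np.ndarray          # H x W x 3 RGB rendering of the character
    mask: np.ndarray                    # H x W, 1 for character pixels, 0 for background
    normal_map: np.ndarray              # H x W x 3 surface normals
    depth_map: Optional[np.ndarray] = None  # optional H x W depth map
```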
- a model training application 118 (“model trainer”) that also resides in the memory 114 and executes on the processor 112 trains a machine learning model that takes as input 2D (or 3D) pose information, such as a rendering of control points (e.g., a skeleton), and outputs a corresponding rendering of the character, as well as a mask and normal map, and optionally a depth map.
- the architecture of the machine learning model and techniques for training the same are discussed in greater detail below.
- Training data and/or trained machine learning models may be stored in the data store 120 .
- the data store 120 may include any storage device or devices, such as fixed disc drive(s), flash drive(s), optical storage, network attached storage (NAS), and/or a storage area network (SAN).
- the machine learning server 110 may include the data store 120 .
- the data store 120 may include one or more databases.
- system 100 may include a database management system (DBMS) for accessing and storing data in the data store 120 .
- Trained machine learning models may be deployed to applications that render images of characters using such machine learning models.
- a rendering application 146 is stored in a memory 144 , and executes on a processor 142 , of the computing device 140 .
- Components of the computing device 140 including the memory 144 and processor 142 may be similar to corresponding components of the machine learning server 110 and will not be described in detail herein for conciseness.
- the rendering application 146 may receive a control signal, such as a joystick signal or a video, that controls control points such as a 2D skeleton.
- the rendering application 146 is configured to (optionally) transform such a control signal to the format of a common interface that the rendering application 146 feeds to the trained machine learning model, which in turn outputs a rendering of a character based on the input.
- the machine learning model may also output a mask and a normal map (and optionally a depth map), which may be used to compose the rendering of the character into a scene.
- a computer graphics (CG) character can be controlled or puppeteered using a 2D (or alternatively, a 3D) skeleton.
- the number of machine learning servers and application servers may be modified as desired. Further, the functionality included in any of the applications may be divided across any number of applications or other software that are stored and executed on any number of devices located in any number of physical locations.
- FIG. 2 illustrates an exemplary architecture of a machine learning model 200 , according to various embodiments. Although a particular architecture of the machine learning model 200 is shown for illustrative purposes, it should be understood that, in other embodiments, any technically feasible machine learning model may be trained and used to render images depicting characters.
- the machine learning model 200 receives a rendering of a 2D skeleton 202 and associated 3D information 204 as inputs.
- the rendering of the 2D skeleton 202 and the 3D information 204 are shown as examples, in other embodiments any suitable 2D or 3D control points and associated 3D information may be taken as input, and the input may further be defined by a common interface, as discussed in greater detail below.
- the machine learning model 200 is configured to translate the rendering of the 2D skeleton 202 (or other 2D or 3D control points) into a rendered image 240 depicting a character in the same “pose” as the 2D skeleton, as well as an associated mask 242 and normal map 244 (and optionally a depth map), which are discussed in greater detail below.
- the machine learning model 200 is a modification of a 2D U-Net architecture 201 with skip connections that incorporates 3D information 204 , when such information is available.
- U-Net is an encoder-decoder architecture traditionally used for image translation tasks.
- Experience has shown that using the rendering of the 2D skeleton 202 alone, without the 3D information 204 , admits ambiguities, as the same 2D skeleton can correspond to multiple 3D skeletons. Such ambiguities can, in turn, cause visual artifacts in the rendered image 240 , as the machine learning model 200 attempts to “average” the different 3D possibilities. Incorporating the 3D information 204 can solve this problem.
- the 3D information 204 that the machine learning model 200 receives may include volumes of occupancy, slices of positions, orientations, and/or depth, etc. Ideally, the 3D information 204 should include position and orientation information. As discussed in greater detail below, the machine learning model 200 may also be trained differently from the traditional U-Net encoder-decoder, using a perceptual loss between a generated image and a ground truth image rather than the traditional discriminator used in adversarial learning.
- the 2D U-Net architecture 201 includes a number of decreasing blocks of encoding, including blocks 214 and 216.
- the blocks of encoding are blocks of convolutions, each of which reduces the image size by a factor of, e.g., 2, with the blocks creating a set of various versions of an input image as the image is transformed.
- the versions of the input are also referred to herein as “features.”
- a skip connection, such as the skip connections 221a and 221b, is linked to the decoding layers, which permits the reconstruction by the decoding layers to benefit from processed information from the encoding.
- the encoding ultimately produces a sequence of 1×1, i.e., scalar, features 222.
- Such a sequence of 1×1 features 222 may then be reconstructed by the decoding layers, which as shown include a number of blocks of decoding, including blocks 226 and 230.
- the decoding may reuse the information from the skip connections to help in the reconstruction process.
- the sequence of 1×1 features 222 from the bottleneck passes through successive deconvolutions that expand the resolution of the features from, e.g., 1×1, to 2×2, to 4×4, etc.
- the features are further concatenated with the features from the encoding process received via the skip connections. Doing so re-uses some of the features that may be required to know, e.g., the orientations of limbs, etc.
- the U-Net architecture 201 in the machine learning model 200 could include eight decreasing blocks of encoding, each of which includes a 4×4 convolution with stride 2 followed by a 3×3 convolution with stride 1, and further followed by a non-linear activation function. Encoding begins with 64 convolutions and increases to 512 as the filter size is reduced. As described, a skip connection may also be linked to the decoding layers after each such encoding block, and the result of encoding in this case may be a sequence of 1×1 features of length 512.
- the U-Net architecture 201 in the machine learning model 200 may include eight layers of encoding from a 256×256 resolution rendering of the 2D skeleton 202 to the sequence of 1×1 features 222, and a further eight layers that decode the 1×1 features 222 back to the 256×256 rendered image 240, the mask 242, and the normal map 244.
- the eight layers of the encoder may be: C64-C128-C256-C512-C512-C512-C512-C512
- the eight layers of the decoder may be: C512-C512-C512-C512-C512-C256-C128-C64.
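The following is a minimal PyTorch sketch of a 2D encoder-decoder with skip connections following the C64-...-C512 channel progression described above. It is an approximation under stated assumptions, not the patented implementation: each block here uses only the strided 4×4 convolution (the additional 3×3 stride-1 convolution per block is omitted for brevity), and the activation choices are assumptions.

```python
import torch
import torch.nn as nn

def enc_block(in_ch, out_ch):
    # One "decreasing block of encoding": a strided 4x4 convolution halves the resolution.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
        nn.LeakyReLU(0.2, inplace=True),
    )

def dec_block(in_ch, out_ch):
    # One decoding block: a 4x4 deconvolution doubles the resolution.
    return nn.Sequential(
        nn.ConvTranspose2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
        nn.ReLU(inplace=True),
    )

class UNet2D(nn.Module):
    """Eight encoding and eight decoding blocks with skip connections."""
    def __init__(self, in_ch=3, out_ch=7):  # out_ch: e.g., 3 RGB + 1 mask + 3 normals
        super().__init__()
        enc_chs = [64, 128, 256, 512, 512, 512, 512, 512]
        dec_chs = [512, 512, 512, 512, 256, 128, 64, out_ch]
        self.encoders = nn.ModuleList()
        prev = in_ch
        for ch in enc_chs:
            self.encoders.append(enc_block(prev, ch))
            prev = ch
        self.decoders = nn.ModuleList()
        for i, ch in enumerate(dec_chs):
            # After the first decoder, inputs include concatenated skip features.
            in_c = prev if i == 0 else prev + enc_chs[-(i + 1)]
            if i == len(dec_chs) - 1:
                # Final block maps to the output channels; in practice a sigmoid or
                # tanh output nonlinearity might be used here.
                self.decoders.append(nn.ConvTranspose2d(in_c, ch, 4, 2, 1))
            else:
                self.decoders.append(dec_block(in_c, ch))
            prev = ch

    def forward(self, x):
        skips = []
        for enc in self.encoders:
            x = enc(x)
            skips.append(x)
        for i, dec in enumerate(self.decoders):
            if i > 0:
                x = torch.cat([x, skips[-(i + 1)]], dim=1)  # skip connection
            x = dec(x)
        return x  # split into rendered image, mask, and normal map downstream
```

For a 256×256 skeleton rendering, `UNet2D()(torch.zeros(1, 3, 256, 256))` yields a 1×7×256×256 tensor whose channels can be split into the rendered image, mask, and normal map.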
- the 2D U-Net architecture 201 is adapted in embodiments to account for the 3D information 204 , which as described may include, e.g., volumes of occupancy, slices of positions, orientations, and/or depth, etc.
- the 3D information 204 could include volume of occupancy, with volumes occupied by a character represented by 1 and the remaining volumes represented by 0.
- the 3D information 204 could include multiple slices indicating the x, y, and z components of each joint of the 2D skeleton.
- the 3D information 204 could include a depth map indicating the depth of every pixel in the rendering of the 2D skeleton 202 .
- the 3D information 204 could include slices that provide 3D orientation information.
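To make the 3D information formats above concrete, the following Python snippet builds two of the possible encodings from a set of 3D joint positions. Both encodings are assumptions consistent with the description (the patent does not prescribe these exact functions), and `joints_3d` is assumed to be normalized to [0, 1].

```python
import numpy as np

def occupancy_volume(joints_3d, res=64, radius=2):
    """Binary occupancy volume: 1 near the character's joints, 0 elsewhere (a crude
    stand-in for the occupancy of the full character volume)."""
    vol = np.zeros((res, res, res), dtype=np.float32)
    idx = np.clip((joints_3d * (res - 1)).round().astype(int), 0, res - 1)
    for x, y, z in idx:
        vol[max(0, x - radius):x + radius + 1,
            max(0, y - radius):y + radius + 1,
            max(0, z - radius):z + radius + 1] = 1.0
    return vol

def joint_coordinate_slices(joints_3d, res=256):
    """One constant-valued H x W slice per joint and per coordinate (x, y, z),
    giving 3 * J extra input channels that carry the joints' 3D components."""
    num_joints = joints_3d.shape[0]
    slices = np.zeros((3 * num_joints, res, res), dtype=np.float32)
    for j in range(num_joints):
        for c in range(3):
            slices[3 * j + c] = joints_3d[j, c]
    return slices
```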
- the 2D U-Net architecture 201 may be informed (i.e., augmented), via skip connections, by 3D processed features.
- the 3D information may be reduced along the x and y dimensions, which also reduces the 3D information in the z dimension, by a number of encoder blocks, such as encoder blocks 206 and 208. That is, as the x-y image dimensions are reduced by the 3D encoder blocks, the depth information is also reduced. For example, the depth slices may be reduced by 2 every time the encoding reduces the x, y dimensions by half. The result of the encoding is 1×1×1 features 210 at the bottleneck.
- the processed features are concatenated with outputs of corresponding encoding blocks of the 2D U-Net architecture 201 at, e.g., 216 and 220 . That is, the 2D U-Net architecture 201 is augmented by the 3D processed features via skip connections that concatenate the features.
- the 2D skeleton 202 and 3D information 204 pass through separate convolution layers, but skip connections are used to concatenate the 3D and 2D features. Doing so may help in the encoding and decoding process, as the 3D information may help remove ambiguities that could otherwise cause artifacts in the final rendering if only 2D information were used.
- the processed features are further passed down to the reconstruction units (e.g., the blocks 226 and 230 ) to be concatenated with other features to provide additional 3D-related features to aid the reconstruction.
- 3D volumes or information do not need to be reconstructed, as the machine learning model 200 may only reconstruct the 2D rendered image 240 , mask 242 , normal map 244 , etc.
- the 3D information 204 is 3D volumetric input in the form of a volumetric occupancy map of 256³, or multiple 256×256 images, which may be, e.g., slices indicating the scalar occupancy of the joints, slices indicating the x, y, and z components of each joint of a 3D skeleton, slices that provide 3D orientation, or any other suitable 3D information, as described above.
- the encoding blocks 206 , 208 , etc. may include volumetric convolutional filters that encode and reduce in all three dimensions, yielding arrays of volumetric features.
- the first encoding block 206 may be a C³64 volumetric convolution encoding block that produces 128³×64 features
- the second encoding block 208 may be a C³128 volumetric encoding block that produces 64³×128 features
- the volumetric convolution filters may include the following volumetric convolutions, denoted by C³: C³64-C³128-C³256-C³512-C³512-C³512-C³512-C³512.
- each of these volumetric convolutions reduces all dimensions (x, y, and z) by 2, proceeding all the way down to a 1×1×1×512 (i.e., 1³×512) sequence of features.
- the features output by the volumetric convolutions may be concatenated with feature outputs of corresponding encoding modules (e.g., the encoding blocks 214, 216, etc.) of the 2D U-Net architecture 201 at symmetric resolutions (e.g., 128³×64 with corresponding 128²×64).
- Some embodiments may include skip connections to the last reconstruction layers where the final rendering is decoded.
- the 1³×512 sequence of features (corresponding to the sequence of 1×1×1 features 210) that results from encoding the 3D information may be concatenated with a 1×1×512 (i.e., 1²×512) sequence of features (corresponding to the sequence of 1×1 features 222) generated by the encoder of the 2D U-Net architecture 201 during input skeleton image encoding, producing a 1³×1024 sequence of features as the output of the encoding.
- decoding blocks may apply successive deconvolutions to the encoded 1³×1024 sequence of features, while reusing information from skip connections to help in the reconstruction process, as described above.
- volumetric features may be concatenated with planar features during the decoding.
- the decoding block 230 may be a deconvolution filter that yields 128²×64 features that are concatenated with 128³×64 volumetric features from the encoding block 206, yielding 128²×8256 features.
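As a sanity check on the dimensions quoted above, the snippet below shows one way to combine 128³×64 volumetric features with 128²×64 planar features so that the channel count comes out to 8256: fold the depth axis into the channel axis and concatenate. Whether the patented model combines the features exactly this way is an assumption; the shapes are simply chosen to match the text.

```python
import torch

planar = torch.randn(1, 64, 128, 128)           # 128^2 x 64 planar decoder features
volumetric = torch.randn(1, 64, 128, 128, 128)  # 128^3 x 64 volumetric encoder features

# Fold the depth axis into channels: 64 channels * 128 depth slices = 8192 channels,
# plus the 64 planar channels, gives 8256 channels at 128 x 128 resolution.
b, c, d, h, w = volumetric.shape
flattened = volumetric.reshape(b, c * d, h, w)
combined = torch.cat([planar, flattened], dim=1)
print(combined.shape)  # torch.Size([1, 8256, 128, 128])
```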
- the 3D volumes or information are not reconstructed in some embodiments. Rather, the decoding may only reconstruct the 2D rendered image, mask, normal maps, and (optionally) depth.
- the machine learning model 200 outputs the rendered image 240 depicting a character, from which the rendering of a 2D skeleton 202 was translated, as well as the associated mask 242 and normal map 244 (as well as an optional depth map).
- the mask 242 indicates whether pixels of the rendered image 240 belong to a background or to the character depicted therein.
- the mask 242 could include pixels whose values are either 0, indicating the background, or 1, indicating the character.
- the rendering application 146 may use the mask 242 to overlay the character depicted in the rendered image 240 onto different backgrounds.
- the normal map 244 indicates surface normals in the rendered image 240 .
- the normal map 244 could include a respective vector for each pixel of the character indicating a surface normal direction. It should be understood that the rendering application 146 may use such surface normals to re-light the character depicted in the rendered image 240 in different environments.
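A minimal sketch of how a rendering application might use the mask and normal map, assuming the mask is in [0, 1] and the normal map stores unit normals per pixel; the simple Lambertian shading model here is an illustrative choice, not mandated by the text.

```python
import numpy as np

def composite(render, mask, background):
    """Overlay the rendered character (H x W x 3) onto a new background using the mask."""
    alpha = mask[..., None]                       # H x W x 1
    return alpha * render + (1.0 - alpha) * background

def relight(render, normal_map, light_dir, ambient=0.3):
    """Simple re-lighting: scale the rendering by the dot product of the per-pixel
    surface normals with a new light direction."""
    light = np.asarray(light_dir, dtype=np.float32)
    light = light / np.linalg.norm(light)
    shading = np.clip((normal_map * light).sum(axis=-1), 0.0, 1.0)  # H x W
    return render * (ambient + (1.0 - ambient) * shading)[..., None]

# e.g.: scene = composite(relight(image, normals, light_dir=(0.0, 1.0, 1.0)), mask, background)
```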
- the machine learning model 200 may also output depths of pixels in the rendered image 240 in a depth map. For example, in the context of games, depth information may be used to determine collisions (e.g., 2D collisions) between the rendered character and other visible objects in a 3D scene.
- by predicting these outputs together, synergies within the network may be created, permitting more accurate predictions of the rendered image 240, the mask 242, the normal map 244, and/or the depth map.
- FIG. 3 illustrates an approach for generating a training data set and then training a machine learning model, such as the machine learning model 200 described above with respect to FIG. 2 , according to various embodiments.
- the data generator 116 receives as inputs a 3D model 310 , which is associated with 3D control points, and a collection of 3D motions 320 .
- the 3D model 310 could be a high-resolution model used in feature animation films.
- the 3D model 310 does not need to be such a high-resolution model.
- the data generator 116 combines the 3D model 310 and 3D motions 320 by rendering the character represented by the 3D model 310 in different views and poses, as the 3D model 310 is animated according to the 3D motions 320 .
- the data generator 116 may be any suitable renderer, or software toolset, capable of performing such rendering.
- the data generator 116 outputs the rendered images 330 , as well as associated masks 370 and normal maps 380 .
- depth information such as depth maps, may also be output and saved.
- the training data set should include extreme cases and a large variety of poses that covers well the space of poses.
- when the 3D model 310 is posed using the 3D motions 320, the associated control points are also deformed, as the control points may be parameterized by the surface mesh of the 3D model 310. At runtime, such posing may produce 3D poses in the proportions of a user, as discussed in greater detail below.
- the data generator 116 also saves the 3D control points 340 after such a deformation, as well as projected 2D positions 350 of those control points 340 and joint orientations 360 .
- the data generator 116 may go through a database of 3D poses to deform and render the character, while saving the 3D control points 340 and the 2D projected positions 350 .
- the masks 370 , normal maps 380 , and depth information may be saved as well, which can all be learned by a machine learning model and predicted as a function of the control points.
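A sketch of the data-generation loop described above. Every API used here (`rig`, `renderer`, `camera`, and their methods) is a hypothetical placeholder standing in for whatever renderer or software toolset (e.g., RenderMan or Hyperion) and character rig are actually used.

```python
import pickle

def generate_training_data(rig, renderer, camera, pose_database, out_path):
    samples = []
    for pose in pose_database:                 # e.g., poses sampled from the 3D motions
        rig.apply_pose(pose)                   # deforms the mesh and its control points together
        frame = renderer.render(rig, camera)   # hypothetical: returns image, mask, normals, depth
        control_points_3d = rig.control_point_positions()
        samples.append({
            "image": frame.image,
            "mask": frame.mask,
            "normal_map": frame.normals,
            "depth_map": frame.depth,
            "control_points_3d": control_points_3d,
            "positions_2d": camera.project(control_points_3d),
            "joint_orientations": rig.joint_orientations(),
        })
    with open(out_path, "wb") as f:
        pickle.dump(samples, f)
```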
- the joint orientations 360 are rotational values.
- a wrist joint may store the orientation of the hand, which may be represented as, e.g., angles, matrices (normalized directional vectors), or normalized quaternions.
- slices may be output by the data generator 116 , with each slice being an angle component of the orientation.
- the 3D character mesh of the 3D model 310 may be used to parameterize the position and orientation of control points.
- a common interface including 2D or 3D control points, or a skeleton, may be defined.
- Such a common interface is used to control the trained machine learning model, and various control signals (e.g., a 2D skeleton generated by a pose predictor based on a video, a joystick signal, etc.) may be transformed to the common interface and input into the machine learning model.
- a weighted average of binding triangle meshes is assumed.
- a common interface may be defined as a set of control parameters that parameterize the shape of the character.
- the control parameters in a common interface may include 2D control points, but may also include 3D orientation points (with 3 positions and 3 angles). Further, the control points may be dense (e.g., a mesh) or sparse (e.g., a skeleton).
- the common interface may include, e.g., 2D points, 3D points, or a skeleton.
- image-based pose predictors can be more successfully trained with skeleton data that may include points more strongly correlated to body pixels in an image.
- the first step of the data generation process may include defining the common interface. For example, the 3D skeleton of a character may need to be parameterized by the shape of the character, such that labeled data can be produced indicating those proportions.
- the model trainer 118 takes as inputs the rendered images 330 , 3D control points 340 , projected 2D positions 350 , joint orientations 360 , masks 370 , and normal maps 380 .
- the model trainer 118 may also take as inputs depth maps. Using such inputs as a set of training data, the model trainer 118 learns a mapping between control points and rendered images of the character. The mapping is shown as a trained machine learning model 390 , and such a mapping allows the image-based 3D character to be parameterized by the control points.
- the trained model 390 could have the architecture of the machine learning model 200 described above with respect to FIG. 2 .
- the model trainer 118 trains the machine learning model 390 using adversarial learning and a perceptual loss between images generated by the machine learning model 390 and ground truth images (e.g., the rendered images 330). This is in contrast to the traditional discriminator used in adversarial learning to train traditional U-Net architectures, which experience has shown can have difficulty converging successfully.
- the perceptual loss in some embodiments may be defined based on a number of layers of a pre-trained deep neural network that is trained for classification.
- the pre-trained network is used to transform the predicted and ground truth images, with the model trainer 118 essentially attempting to make the predicted and ground truth images close to one another in the “eyes” of the pre-trained network whose layers are used to filter those images.
- confining the loss to the lower-resolution filtered images may help achieve convergence during training.
- the perceptual loss could be the L1 norm between VGG(M*I) computed for the predicted image and for the ground truth image, where M is the mask, I is the image, VGG denotes the first five layers of a pre-trained VGG (Visual Geometry Group) convolutional neural network, and the L1 norm (also sometimes referred to as the Manhattan distance or taxicab norm) between vectors is defined as the sum of the lengths of the projections of the line segment between the points onto the coordinate axes.
- although the VGG convolutional neural network is used herein as an illustrative example, alternative embodiments may employ one or more layers of other convolutional neural networks or machine learning models.
- the model trainer 118 may train the machine learning model 390 using a loss that is simply the L1 norm between the prediction and ground truth for the normal map and mask that the machine learning model 390 is also trained to output.
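A minimal PyTorch sketch of such losses, assuming images are (B, 3, H, W) tensors and masks are (B, 1, H, W). The choice of VGG16 and of `features[:5]` as "the first five layers" is an interpretation, and ImageNet normalization of the inputs is omitted for brevity.

```python
import torch.nn.functional as F
from torchvision import models

# A fixed, frozen VGG feature extractor used only as a perceptual filter bank.
_vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features[:5].eval()
for p in _vgg.parameters():
    p.requires_grad_(False)

def perceptual_loss(pred_img, gt_img, mask):
    """L1 distance between VGG features of the masked predicted and ground truth images,
    i.e., || VGG(M * I_pred) - VGG(M * I_gt) ||_1."""
    return F.l1_loss(_vgg(mask * pred_img), _vgg(mask * gt_img))

def auxiliary_loss(pred_mask, gt_mask, pred_normals, gt_normals):
    """Plain L1 losses for the mask and normal map outputs."""
    return F.l1_loss(pred_mask, gt_mask) + F.l1_loss(pred_normals, gt_normals)
```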
- the training process may use a subset of the training data to train the machine learning model 390 , which is then evaluated using another subset of the training data. For example, a majority of the training data may be used to train the machine learning model, and the remainder of the training data used to evaluate the trained model. Evaluation of trained machine learning models may include validating that the trained models perform sufficiently well (e.g., less than some desired error).
- FIG. 4 illustrates an approach for rendering a character using the trained machine learning model 390 , according to embodiments.
- the rendering application 146 or a user may control a 3D character using a 2D skeleton or other control points.
- the rendering application 146 is configured to feed, into the machine learning model 390 , a set of such control points, shown as a rendered skeleton 430 derived from an image 420 , and associated 3D information 410 .
- the rendering application 146 may first convert a received control signal to a common interface and input the converted data into the machine learning model 390 .
- the rendering application 146 could determine a 2D or 3D skeleton from a video using a well-known pose prediction technique. Then, the rendering application 146 could re-target the 3D skeleton into the common 3D skeleton by copying joint angles to the common interface, which is then fed into the machine learning model 390 . In the case of 2D skeletons, heuristics based on body proportions may adjust the user's skeleton to the proportions of the common interface, which may then be fed into the machine learning model 390 .
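One possible body-proportion heuristic of the kind mentioned above, sketched in Python: keep each bone's direction from the user's 2D skeleton but rescale its length to the canonical bone length of the common interface. The joint ordering, `parents` array, and `target_bone_lengths` are assumed inputs, not part of the patent.

```python
import numpy as np

def _depth(j, parents):
    d = 0
    while parents[j] >= 0:
        j = parents[j]
        d += 1
    return d

def retarget_2d(user_joints, parents, target_bone_lengths):
    """Adjust a user's 2D skeleton (J x 2) to the proportions of the common interface.
    parents[j] is the parent joint index of joint j, or -1 for the root."""
    out = np.array(user_joints, dtype=np.float32)
    # Visit joints parent-first so each parent is already retargeted.
    for j in sorted(range(len(parents)), key=lambda k: _depth(k, parents)):
        p = parents[j]
        if p < 0:
            continue
        bone = user_joints[j] - user_joints[p]
        direction = bone / (np.linalg.norm(bone) + 1e-8)
        out[j] = out[p] + direction * target_bone_lengths[j]
    return out
```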
- the machine learning model 390 outputs a rendered image, shown as the rendering of the 3D character 440 , along with a mask 442 and a normal map 444 , which the machine learning model 390 generates based on the skeleton 430 and the associated 3D information 410 .
- the machine learning model 390 may also (optionally) output a depth map. Rendered images output by the machine learning model 390 may differ from the rendered images 330 used during training in some cases. Typically, if new data points are in between training data points on a manifold, then a trained machine learning model such as the machine learning model 390 may be able to generalize to the new data points.
- otherwise, the machine learning model may be unable to extrapolate.
- the training data set should include extreme cases and a large variety of poses that covers well the space of poses. Even in the worst case, the machine learning model should be able to find a rendering close by, i.e., a nearest neighbor if the machine learning model is unable to generalize.
- the rendering application 146 has used the mask 442 to compose the rendered character into a scene in a rendering 450 .
- the rendered character could be added to an augmented reality (AR) environment.
- the rendering application 146 may perform some re-lighting by sampling the normal map 444 and computing a product of the sampled normal map with light directions in the new environment.
- the machine learning model 390 may also output depth, and the rendering application 146 could determine collisions between the rendered character and other objects based on such depth when producing the rendering 450.
- the machine learning model 390 may render the character with occlusions to support visual interaction with scene objects in a game. As a result, 2D single camera motion capture can be used to produce the rendering 450 of the character overlaid in the scene.
- FIG. 5 sets forth a flow diagram of method steps for generating a training data set including rendered characters and associated pose information, according to various embodiments.
- the method steps are described in conjunction with the system of FIG. 1 , persons of ordinary skill in the art will understand that any system configured to perform the method steps, in any order, is within the scope of the present disclosure.
- a method 500 begins at step 510 , where the data generator 116 receives a 3D model and animation data.
- the 3D model 310 could be a high-resolution model used in a feature animation film.
- the data generator 116 poses the 3D model based on the animation data, and then, at step 530 , the data generator 116 renders the posed 3D model. Any suitable 3D rendering technique may be employed to render the posed model.
- control points associated with the 3D model are also deformed when the 3D model is posed, as the control points may be parameterized by the surface mesh of the 3D model.
- the data generator 116 saves, for each pose of the 3D model, rendered image(s), deformed 3D control points, a 2D projection of the control points, orientations of joints, a mask, and a normal map.
- the data generator 116 may render the character represented by the 3D model 310 in different views and poses, and the data generator 116 may save such renderings along with other information typically generated by renderers, such as a mask and normal map, as well as the 3D (and projected 2D) control points, and orientation of joints, that are deformed along with the posed 3D model.
- a depth map may also be generated and saved in some embodiments.
- FIG. 6 sets forth a flow diagram of method steps for training a machine learning model to render a character based on pose information.
- a method 600 begins at step 610 , where the model trainer 118 receives a training data set.
- the training data may include data output by the data generator 116 , including a character rendered in different poses and views, together with a mask of the character and control point (e.g., 2D skeleton pose) information.
- the data generator 116 may generate rendered images of the character and associated deformed 3D control points, 2D projections of control points, orientations of joints, masks, normal maps, and (optionally) depth maps in some embodiments.
- the model trainer 118 trains a machine learning model based on a perceptual loss between images that are generated by the machine learning model and ground truth images in the training data set.
- the model trainer 118 may feed predicted and ground truth images into a pre-trained deep neural network and compute the perceptual loss as an L1 norm between features output by a number of layers of the pre-trained network.
- the model trainer 118 may train the machine learning model using a loss that is simply the L1 norm between the prediction and ground truth for a normal map and a mask (and an optional depth map) that the machine learning model is also trained to output.
- FIG. 7 sets forth a flow diagram of method steps for rendering a character, according to various embodiments. Although the method steps are described in conjunction with the system of FIG. 1 , persons of ordinary skill in the art will understand that any system configured to perform the method steps, in any order, is within the scope of the present disclosure.
- a method 700 begins at step 710 , where the rendering application 146 receives a control signal.
- any technically feasible control signal may be received, such as a video including frames from which a posed skeleton may be extracted, a signal from a joystick used to control a skeleton, etc.
- a user may perform in front of a camera, and estimates could be made of 2D and/or 3D skeletons from a video captured by the camera.
- in the context of a game, the game engine may control the 3D or 2D skeleton by blending animation clips.
- for example, there may be a predefined 3D animation clip for walking forward and another clip for walking to the right, but to turn at a rate in between, such as between a full right turn and walking forward, the game engine may blend (interpolate) the forward and right-turn clips rather than storing large numbers of animation clips for each possible turning direction. That is, the rendering application 146 may blend and mix animation clips to span a larger range of possible motions with fewer clips.
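A minimal sketch of such clip blending; real game engines typically interpolate joint rotations with quaternion slerp, so the per-angle linear interpolation here is a simplification under stated assumptions.

```python
import numpy as np

def blend_clips(clip_a, clip_b, weight):
    """Blend two animation clips frame by frame. Each clip is an array of shape
    (num_frames, num_joints, 3) of per-joint rotation angles; weight in [0, 1]
    interpolates from clip_a (0.0) toward clip_b (1.0)."""
    n = min(len(clip_a), len(clip_b))
    return (1.0 - weight) * clip_a[:n] + weight * clip_b[:n]

# e.g., a turn halfway between "walk forward" and "turn right":
# blended = blend_clips(walk_forward, turn_right, weight=0.5)
```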
- the rendering application 146 determines 2D control points based on the control signal. As described, determining the 2D control points may include transforming the control signal into a common interface for controlling the character using predefined transformation functions. Returning to the example of 3D skeleton poses in the previs case, the rendering application 146 could determine the 2D control points for input into a trained machine learning model by, e.g., rendering the 3D skeleton to a 2D image. In alternative embodiments, the machine learning model may be trained to take as input a 3D skeleton, in which case the 3D skeleton would not need to be projected to 2D.
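A sketch of one way to turn a 3D skeleton into the kind of 2D control-point rendering the model consumes: project the joints with a pinhole camera and rasterize the bones into an image. The camera intrinsics and the `bones` connectivity list are assumed inputs, and a production pipeline would likely draw anti-aliased, thicker limbs.

```python
import numpy as np

def project_to_2d(joints_3d, intrinsics):
    """Pinhole projection of 3D joints (in camera coordinates, J x 3) to 2D pixels."""
    uvw = joints_3d @ intrinsics.T          # (J, 3)
    return uvw[:, :2] / uvw[:, 2:3]         # perspective divide by depth

def rasterize_skeleton(joints_2d, bones, res=256, samples=64):
    """Draw a simple skeleton image by sampling points along each bone segment.
    bones is a list of (parent_index, child_index) pairs."""
    img = np.zeros((res, res), dtype=np.float32)
    for a, b in bones:
        for t in np.linspace(0.0, 1.0, samples):
            x, y = (1.0 - t) * joints_2d[a] + t * joints_2d[b]
            xi, yi = int(round(x)), int(round(y))
            if 0 <= xi < res and 0 <= yi < res:
                img[yi, xi] = 1.0
    return img
```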
- the rendering application 146 processes the 2D control points using a trained machine learning model to generate a rendering of a character, a mask, and a normal map.
- a machine learning model, such as the adaptation of the U-Net architecture discussed above with respect to FIG. 2, may be trained to output such a rendering, mask, and normal map.
- the machine learning model may also output a depth map.
- the rendering application 146 (optionally) composes the character into a scene.
- the rendering application 146 could multiply the mask with the rendered image and place the result in different backgrounds, such as in an AR environment.
- the rendering application 146 may perform re-lighting by sampling the normal map and computing a product of the sampled normal map with light directions in a new environment.
- the rendering application 146 could determine collisions between the rendered character and other objects based on depth output by the machine learning model.
- the machine learning model may also render the character with occlusions to support visual interaction with scene objects.
- in sum, the disclosed techniques employ a machine learning model that maps control data, such as renderings of skeletons, and associated 3D information to 2D renderings of a character.
- the machine learning model may be an adaptation of the U-Net architecture that accounts for 3D information and is trained using a perceptual loss between images generated by the machine learning model and ground truth images. Once trained, the machine learning model may be used to animate a character, such as in the context of previs or a video game, based on control of associated control points.
- At least one technical advantage of the disclosed techniques relative to the prior art is that, in the disclosed techniques, a machine learning model is implemented that translates control points to 2D rendered images, without requiring full resolution geometry or proprietary rigs or deformers.
- computer graphics (CG) characters, including high-resolution characters traditionally limited to feature animation films, can be controlled or puppeteered using 2D (or 3D) control points, such as a skeleton.
- Examples of real-time applications of techniques disclosed herein include (1) previs, in which, e.g., motion capture data from a single camera can be fed into a machine learning model to generate renderings of a character; and (2) computer-based games.
- a perceptual loss for training the machine learning model is disclosed that converges successfully more often than traditional discriminators used in adversarial learning.
- a common interface is disclosed that permits different sources of motion to be transformed to the common interface and input into a machine learning model that renders 3D characters.
- a computer-implemented method for rendering an image that includes at least one character comprises: determining pose information for a first character based on a control signal; and processing the pose information using a trained machine learning model to generate a rendering of the first character.
- determining the pose information includes rendering a skeleton.
- processing the pose information further comprises generating at least one of a mask, a normal map, and a depth map associated with the rendering of the first character.
- the control signal comprises a joystick signal or a video signal.
- a computer-implemented method for training a machine learning model comprises: receiving training data that includes a plurality of rendered images and an associated set of control points for each rendered image; and training the machine learning model based on a perceptual loss between one or more images generated by the machine learning model and one or more associated rendered images included in the training data.
- each of the associated sets of control points includes a respective rendering of a skeleton.
- the perceptual loss is defined as an L1 norm C(M*I) between the images generated by the machine learning model and the corresponding rendered images in the training data, wherein M is a mask, I is an image, and C is a plurality of layers of a pre-trained convolutional neural network.
- training the machine learning model comprises performing one or more adversarial learning operations.
- training the machine learning model is further based on losses defined as L1 norms between normal maps and masks generated by the machine learning model and normal maps and masks included in the training data.
- training data is generated by: receiving a three-dimensional (3D) model and animation data; posing the 3D model based on the animation data; and rendering the posed 3D model.
- a computer-readable storage medium including instructions that, when executed by a processing unit, cause the processing unit to train a machine learning model by performing steps comprising: receiving training data that includes a plurality of rendered images and an associated set of control points for each rendered image; and training the machine learning model based on a perceptual loss between one or more images generated by the machine learning model and one or more associated rendered images included in the training data.
- each of the associated sets of control points includes a respective rendering of a skeleton.
- aspects of the present embodiments may be embodied as a system, method or computer program product. Accordingly, aspects of the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “module” or “system.” Furthermore, aspects of the present disclosure may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.
- the computer readable medium may be a computer readable signal medium or a computer readable storage medium.
- a computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing.
- a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
- each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s).
- the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Computing Systems (AREA)
- Mathematical Optimization (AREA)
- Artificial Intelligence (AREA)
- Pure & Applied Mathematics (AREA)
- Evolutionary Computation (AREA)
- Mathematical Analysis (AREA)
- Computational Mathematics (AREA)
- Biomedical Technology (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Algebra (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Databases & Information Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Molecular Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Medical Informatics (AREA)
- Computer Graphics (AREA)
- Computer Hardware Design (AREA)
- Processing Or Creating Images (AREA)
Abstract
Description
Claims (19)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/511,961 US11704853B2 (en) | 2019-07-15 | 2019-07-15 | Techniques for feature-based neural rendering |
EP20185752.1A EP3767592A1 (en) | 2019-07-15 | 2020-07-14 | Techniques for feature-based neural rendering |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/511,961 US11704853B2 (en) | 2019-07-15 | 2019-07-15 | Techniques for feature-based neural rendering |
Publications (2)
Publication Number | Publication Date |
---|---|
US20210019928A1 US20210019928A1 (en) | 2021-01-21 |
US11704853B2 true US11704853B2 (en) | 2023-07-18 |
Family
ID=71620173
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/511,961 Active US11704853B2 (en) | 2019-07-15 | 2019-07-15 | Techniques for feature-based neural rendering |
Country Status (2)
Country | Link |
---|---|
US (1) | US11704853B2 (en) |
EP (1) | EP3767592A1 (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3671660A1 (en) * | 2018-12-20 | 2020-06-24 | Dassault Systèmes | Designing a 3d modeled object via user-interaction |
CN115298708A (en) * | 2020-03-30 | 2022-11-04 | 上海科技大学 | Multi-view Neural Human Rendering |
CN112907421B (en) * | 2021-02-24 | 2023-08-01 | 中煤科工集团重庆智慧城市科技研究院有限公司 | Business scene acquisition system and method based on spatial analysis |
CN113570673B (en) * | 2021-09-24 | 2021-12-17 | 北京影创信息科技有限公司 | Rendering method of three-dimensional human body and object and application method thereof |
US12147496B1 (en) * | 2021-11-03 | 2024-11-19 | Amazon Technologies, Inc. | Automatic generation of training data for instance segmentation algorithms |
CN115205707B (en) * | 2022-09-13 | 2022-12-23 | 阿里巴巴(中国)有限公司 | Sample image generation method, storage medium, and electronic device |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050203380A1 (en) * | 2004-02-17 | 2005-09-15 | Frank Sauer | System and method for augmented reality navigation in a medical intervention procedure |
US20120223940A1 (en) * | 2011-03-01 | 2012-09-06 | Disney Enterprises, Inc. | Sprite strip renderer |
US20130342527A1 (en) * | 2012-06-21 | 2013-12-26 | Microsoft Corporation | Avatar construction using depth camera |
US20150206341A1 (en) * | 2014-01-23 | 2015-07-23 | Max-Planck-Gesellschaft Zur Foerderung Der Wissenschaften E.V. | Method for providing a three dimensional body model |
US20190130562A1 (en) * | 2017-11-02 | 2019-05-02 | Siemens Healthcare Gmbh | 3D Anisotropic Hybrid Network: Transferring Convolutional Features from 2D Images to 3D Anisotropic Volumes |
WO2019090213A1 (en) | 2017-11-03 | 2019-05-09 | Siemens Aktiengesellschaft | Segmenting and denoising depth images for recognition applications using generative adversarial neural networks |
-
2019
- 2019-07-15 US US16/511,961 patent/US11704853B2/en active Active
-
2020
- 2020-07-14 EP EP20185752.1A patent/EP3767592A1/en active Pending
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050203380A1 (en) * | 2004-02-17 | 2005-09-15 | Frank Sauer | System and method for augmented reality navigation in a medical intervention procedure |
US20120223940A1 (en) * | 2011-03-01 | 2012-09-06 | Disney Enterprises, Inc. | Sprite strip renderer |
US20130342527A1 (en) * | 2012-06-21 | 2013-12-26 | Microsoft Corporation | Avatar construction using depth camera |
US20150206341A1 (en) * | 2014-01-23 | 2015-07-23 | Max-Planck-Gesellschaft Zur Foerderung Der Wissenschaften E.V. | Method for providing a three dimensional body model |
US20190130562A1 (en) * | 2017-11-02 | 2019-05-02 | Siemens Healthcare Gmbh | 3D Anisotropic Hybrid Network: Transferring Convolutional Features from 2D Images to 3D Anisotropic Volumes |
WO2019090213A1 (en) | 2017-11-03 | 2019-05-09 | Siemens Aktiengesellschaft | Segmenting and denoising depth images for recognition applications using generative adversarial neural networks |
Non-Patent Citations (14)
Title |
---|
Chan, Caroline et al., "Everybody Dance Now", Cornell University, Aug. 22, 2018, pp. 1-12. https://arxiv.org/abs/1808.07371. |
Extended European Search Report for application No. 20185752.1 dated Dec. 21, 2020. |
G\"UL VAROL; JAVIER ROMERO; XAVIER MARTIN; NAUREEN MAHMOOD; MICHAEL J. BLACK; IVAN LAPTEV; CORDELIA SCHMID: "Learning from Synthetic Humans", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 5 January 2017 (2017-01-05), 201 Olin Library Cornell University Ithaca, NY 14853 , XP081305231, DOI: 10.1109/CVPR.2017.492 |
Hudon et al., "Deep Normal Estimation for Automatic Shading of Hand-Drawn Characters", Advances in Databases and Information Systems; Lecture Notes in Computer Science, Springer International Publishing, Cham, XP047501164, ISBN: 978-3-319-10403-4, Jan. 23, 2019, pp. 246-262. |
Liu et al., "Neural Animation and Reenactment of Human Actor Videos", arxiv:1809.03658, XP081081387, vol. 1, No. 1, Article 282, Sep. 11, 2018, pp. 282:1-282:13. |
Nestmeyer et al., "Structural Decompositions for End-to-End Relighting", arxiv.org, arxiv:1906.03355, XP081374721, Jun. 8, 2019, 17 pages. |
Partial European Search Report for application No. 20185752.1 dated Oct. 12, 2020. |
Sengupta et al., "SfSNet: Learning Shape, Reflectance and Illuminance of Faces ‘in the Wild’", 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, XP033473546, DOI: 10.1109/CVPR.2018.00659, Jun. 18, 2018, pp. 6296-6305. |
Varol et al., "Learning from Synthetic Humans", XP081305231, DOI: 10.1109/CVPR.2017.492, arxiv:1701.01370, Jan. 5, 2017, pp. 1-10. |
Zakharov et al., "Textured Neural Avatars" arXiv, XP055731671, Retrieved from the Internet: URL:https://www.researchgate.net/profile/Karim Iskakov/publication/333259973 Texture d Neural Avatars/links/5ce8f16092851c4eabb-c5576/Textured-Neural-Avatars.pdf, May 21, 2019, pp. 1-12. |
Also Published As
Publication number | Publication date |
---|---|
EP3767592A1 (en) | 2021-01-20 |
US20210019928A1 (en) | 2021-01-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11704853B2 (en) | Techniques for feature-based neural rendering | |
US11308676B2 (en) | Single image-based real-time body animation | |
US11398059B2 (en) | Processing 3D video content | |
US11257276B2 (en) | Appearance synthesis of digital faces | |
CN111986307A (en) | 3D object reconstruction using photometric grid representation | |
CN114863038B (en) | Real-time dynamic free visual angle synthesis method and device based on explicit geometric deformation | |
CN100530243C (en) | Shot rendering method and apparatus | |
CN116071278A (en) | UAV aerial image synthesis method, system, computer equipment and storage medium | |
CN115298708A (en) | Multi-view Neural Human Rendering | |
WO2023015409A1 (en) | Object pose detection method and apparatus, computer device, and storage medium | |
CN115428027A (en) | Neural opaque point cloud | |
CN116228962A (en) | Large scene neuroview synthesis | |
KR20230036543A (en) | Method and apparatus for reconstructing 3d scene with monocular rgb image based on deep learning | |
US11217002B2 (en) | Method for efficiently computing and specifying level sets for use in computer simulations, computer graphics and other purposes | |
CN117853686A (en) | Free text guided arbitrary track three-dimensional scene construction and roaming video generation method and system | |
Eisert et al. | Volumetric video–acquisition, interaction, streaming and rendering | |
Liao et al. | Self-supervised random mask attention GAN in tackling pose-invariant face recognition | |
US20230360327A1 (en) | Generating three-dimensional representations for digital objects utilizing mesh-based thin volumes | |
US11995749B2 (en) | Rig-space neural rendering of digital assets | |
KR101566459B1 (en) | Concave surface modeling in image-based visual hull | |
CN117911609A (en) | A three-dimensional hand modeling method based on neural radiation field | |
KR102594258B1 (en) | Method and apparatus for virtually moving real object in augmetnted reality | |
WO2023217867A1 (en) | Variable resolution variable frame rate video coding using neural networks | |
Huynh et al. | A framework for cost-effective communication system for 3D data streaming and real-time 3D reconstruction | |
CN119006663B (en) | Digital human head portrait generation method based on real-time audio drive |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
FEPP | Fee payment procedure |
Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
AS | Assignment |
Owner name: ETH ZUERICH, SWITZERLAND Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:BORER, DOMINIK TOBIAS;GUAY, MARTIN;BUHMANN, JAKOB JOACHIM;AND OTHERS;SIGNING DATES FROM 20190704 TO 20190715;REEL/FRAME:050043/0950 Owner name: DISNEY ENTERPRISES, INC., CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:THE WALT DISNEY COMPANY (SWITZERLAND) GMBH;REEL/FRAME:050043/0978 Effective date: 20190715 Owner name: THE WALT DISNEY COMPANY (SWITZERLAND) GMBH, SWITZERLAND Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:BORER, DOMINIK TOBIAS;GUAY, MARTIN;BUHMANN, JAKOB JOACHIM;AND OTHERS;SIGNING DATES FROM 20190704 TO 20190715;REEL/FRAME:050043/0950 |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: RESPONSE AFTER FINAL ACTION FORWARDED TO EXAMINER |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: ADVISORY ACTION MAILED |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: NON FINAL ACTION MAILED |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: FINAL REJECTION MAILED |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: RESPONSE AFTER FINAL ACTION FORWARDED TO EXAMINER |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: NON FINAL ACTION MAILED |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: FINAL REJECTION MAILED |
|
STCV | Information on status: appeal procedure |
Free format text: NOTICE OF APPEAL FILED |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |