What if you could manipulate the facial features of a historical figure, a politician, or a CEO realistically and convincingly using nothing but a webcam and an illustrated or photographic still image? MarioNETte, a tool recently developed by researchers at Seoul-based Hyperconnect, does just that, thanks in part to cutting-edge machine learning techniques. The researchers claim it outperforms all baselines even where there’s a “significant” mismatch between the face to be manipulated and the person doing the manipulating.
MarioNETte is technically a face reenactment tool: it synthesizes a face animated by the movements of another person (the “driver”) while preserving the appearance of the face being manipulated (the “target”). The idea isn’t new, but previous approaches either (1) required a few minutes of training data and could only reenact predefined targets, or (2) distorted the target’s features when dealing with large poses. MarioNETte advances the state of the art by incorporating three novel components: an image attention block, target feature alignment, and a landmark transformer.
The attention block allows the model to attend to relevant positions of mapped physical features, while the target feature alignment mitigates artifacts, warping, and distortion. The landmark transformer, for its part, adapts the geometry of the driver’s poses to that of the target without the need for labeled data, unlike approaches that require human-annotated examples.
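To make the idea of "attending to relevant positions" more concrete, here is a minimal sketch of generic scaled dot-product attention in plain Python. It is not MarioNETte's actual image attention block, whose internals aren't described here. It assumes, purely for illustration, that each position in the driver's feature map issues a query, that each position in the target's feature map offers a key and a value, and that the output is a weighted blend of target values. The function and variable names are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(driver_queries, target_keys, target_values):
    """For each driver position, blend target values by query-key similarity.

    Illustrative assumption: queries come from the driver, keys and values
    come from the target. This is generic attention, not the paper's block.
    """
    dim = len(target_keys[0])
    blended = []
    for query in driver_queries:
        # Similarity of this driver position to every target position.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in target_keys
        ]
        weights = softmax(scores)
        # Weighted sum of target values, one output channel at a time.
        width = len(target_values[0])
        blended.append([
            sum(w * value[c] for w, value in zip(weights, target_values))
            for c in range(width)
        ])
    return blended


# Toy data: two target positions, one driver position.
target_keys = [[4.0, 0.0], [0.0, 4.0]]
target_values = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
driver_queries = [[4.0, 0.0]]  # closely matches the first target position

result = attend(driver_queries, target_keys, target_values)
```

In this toy example the single driver query lines up with the first target key, so the output is drawn almost entirely from the first target value. That is the sense in which an attention mechanism lets a model pick out the positions most relevant to a given pose.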