A new machine learning technique developed at Georgia Tech may soon give budding fashionistas and other designers the freedom to create realistic, high-resolution visual content without relying on complicated 3-D rendering programs.
TextureGAN is the first deep image synthesis method that can realistically spread multiple textures across an object. With this new approach, users drag one or more texture patches onto a sketch (say, of a handbag or a skirt) and the network texturizes the sketch to accurately account for 3-D surfaces and lighting.
Prior to this work, producing realistic images of this kind could be tedious and time-consuming, particularly for those with limited experience. And, according to the researchers, existing machine learning-based methods are not particularly good at generating high-resolution texture details.
Using a neural network to improve results
“The ‘texture fill’ operation is difficult for a deep network to learn because it not only has to propagate the color, but also has to learn how to synthesize the structure of texture across 3-D shapes,” said Wenqi Xian, computer science (CS) major and co-lead developer.
The researchers initially trained a type of neural network called a conditional generative adversarial network (GAN) on sketches and textures extracted from thousands of ground-truth photographs. In this approach, a generator neural network creates images, and a discriminator neural network then judges whether each one looks real or generated. As the two networks train against each other, each improves at its own task, which leads to more realistic outputs.
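The tug-of-war between the two networks can be made concrete with a little arithmetic. The sketch below uses the standard textbook GAN losses, which are not necessarily the exact objective used in TextureGAN. The function names and the example scores are hypothetical, and each score is assumed to be a probability strictly between 0 and 1.

```python
import math


def discriminator_loss(score_real, score_fake):
    """Binary cross-entropy loss for the discriminator.

    score_real: discriminator's probability that a real photo is real.
    score_fake: discriminator's probability that a generated image is real.
    Both are assumed to lie strictly between 0 and 1.
    """
    return -(math.log(score_real) + math.log(1.0 - score_fake))


def generator_loss(score_fake):
    """Non-saturating generator loss: small when the discriminator is fooled."""
    return -math.log(score_fake)


# Hypothetical scores: a discriminator that tells real from generated apart...
confident = discriminator_loss(score_real=0.9, score_fake=0.1)
# ...versus one that is largely fooled by the generator's output.
fooled = discriminator_loss(score_real=0.9, score_fake=0.8)

print(f"discriminator loss when confident: {confident:.3f}")
print(f"discriminator loss when fooled:    {fooled:.3f}")
print(f"generator loss when it fools D:    {generator_loss(0.8):.3f}")
print(f"generator loss when D catches it:  {generator_loss(0.1):.3f}")
```

The two losses pull in opposite directions: whatever lowers the generator's loss raises the discriminator's, which is what pushes both networks to improve.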
To ensure that the results look as realistic as possible, researchers fine-tuned the new system to minimize pixel-to-pixel style differences between generated images and training data. But the results were not quite what the team had expected.
Producing more realistic images
“We realized that we needed a stronger constraint to preserve high-level texture in our outputs,” said Georgia Tech CS Ph.D. student Patsorn Sangkloy. “That’s when we developed an additional discriminator network that we trained on a separate texture dataset. Its only job is to be presented with two samples and ask ‘are these the same or not?’”
With its sole focus on a single question, this type of discriminator is much harder to fool. This, in turn, leads the generator to produce images that are not only realistic, but also true to the texture patch the user placed onto the sketch.
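A "same or not" discriminator needs labeled pairs to learn from. The article does not describe how those pairs are built, so the following is only a sketch under assumptions: strings stand in for image crops, the `sample_pair` helper and the toy dataset are hypothetical, and positive and negative pairs are drawn with equal probability.

```python
import random


def sample_pair(crops_by_texture, rng):
    """Return (crop_a, crop_b, label), where label 1 means "same texture".

    Assumes at least two textures and at least two crops per texture.
    """
    names = sorted(crops_by_texture)
    if rng.random() < 0.5:
        # Positive pair: two different crops of the same texture.
        name = rng.choice(names)
        crop_a, crop_b = rng.sample(crops_by_texture[name], 2)
        return crop_a, crop_b, 1
    # Negative pair: one crop from each of two different textures.
    name_a, name_b = rng.sample(names, 2)
    crop_a = rng.choice(crops_by_texture[name_a])
    crop_b = rng.choice(crops_by_texture[name_b])
    return crop_a, crop_b, 0


# Toy stand-in for a texture dataset (hypothetical names, not real data).
crops_by_texture = {
    "leather": ["leather_0", "leather_1", "leather_2"],
    "denim": ["denim_0", "denim_1"],
    "floral": ["floral_0", "floral_1"],
}

rng = random.Random(0)  # Seeded so the run is repeatable.
pairs = [sample_pair(crops_by_texture, rng) for _ in range(8)]
for crop_a, crop_b, label in pairs:
    print(crop_a, crop_b, "same" if label else "different")
```

In a real system each crop would be an image patch, and the discriminator would be trained to predict the label from the two patches alone.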
The work was presented in June at the Conference on Computer Vision and Pattern Recognition (CVPR) 2018, held in Salt Lake City, and is funded through National Science Foundation award 1561968. School of Interactive Computing Associate Professor James Hays advises Xian and Sangkloy. Georgia Tech is collaborating on this research with Adobe Research, University of California at Berkeley, and Argo AI.