That, very roughly speaking, is how the new technology operates. It provides a simplified map that lets the program decide which changes to make without processing all the data in the original image. Evaluating the less-complicated version of the image drastically reduces the time and energy required to edit the photograph, which, on a hand-held device, means more capacity for other tasks and a quicker turnaround before you can enjoy your images.
“The key here is that we predict transformations to the pixels, instead of predicting pixels,” said Michael Gharbi of MIT, who worked with Google during an internship at the company to develop the final version. “If we train our method to recognize faces and retouch the skin differently from, say, a landscape or a tree, then if we show the network enough examples of how the image should be transformed, it will be able to learn this retouch.”
The low-resolution image is used like a simplified model to predict what kinds of changes should be made to the full-resolution picture. Performing an analogous level of processing on a full-resolution image would take up to a minute after the picture is taken, Gharbi said, whereas this new method can be accomplished effectively in real time.
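The workflow described above can be illustrated with a minimal NumPy sketch. This is not the team's actual architecture (which learns its predictor with a neural network); here the "network" is a hypothetical hand-coded stand-in that brightens dark regions, just to show why the approach is cheap: the per-pixel transforms are predicted only on the small copy, and applying them at full resolution is a simple multiply-add per pixel.

```python
import numpy as np

def downsample(img, factor):
    """Block-mean downsample an (H, W, 3) image by an integer factor."""
    h, w, c = img.shape
    return img.reshape(h // factor, factor, w // factor, factor, c).mean(axis=(1, 3))

def fake_network(lowres):
    """Stand-in for the learned predictor: returns, for each low-res
    pixel, a 3x4 affine color transform (3x3 matrix plus offset column).
    This toy version just brightens dark regions more than bright ones."""
    h, w, _ = lowres.shape
    coeffs = np.zeros((h, w, 3, 4))
    luma = lowres.mean(axis=2)           # per-pixel brightness of the small copy
    gain = 1.0 + 0.5 * (1.0 - luma)      # darker pixel -> larger gain
    for ch in range(3):
        coeffs[:, :, ch, ch] = gain      # diagonal scaling, no cross-channel mixing
    return coeffs

def apply_coeffs(full_res, coeffs, factor):
    """Nearest-neighbor upsample the transforms, then apply one affine
    color transform per full-resolution pixel."""
    up = coeffs.repeat(factor, axis=0).repeat(factor, axis=1)
    ones = np.ones(full_res.shape[:2] + (1,))
    homog = np.concatenate([full_res, ones], axis=2)   # (H, W, 4) homogeneous colors
    return np.einsum('hwij,hwj->hwi', up, homog)

factor = 8
full = np.random.rand(64, 64, 3)     # stand-in for a full-resolution camera frame
low = downsample(full, factor)       # cheap, small copy
coeffs = fake_network(low)           # all the expensive prediction happens here
out = np.clip(apply_coeffs(full, coeffs, factor), 0.0, 1.0)
print(out.shape)                     # same shape as the input frame
```

The key property this sketch shares with the real system is that the expensive step scales with the low-resolution size, not the full-resolution one: the full image is only touched by the final, trivially parallel per-pixel transform.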
During the machine-learning sessions, the system was trained by examining raw and retouched photographs to predict color transformations, and then taught to express those transformations in mathematical language. The training data came from a database of 5,000 photographs taken with SLR cameras, each of which had been retouched five times in the photo-adjustment software Adobe Lightroom by different human photographers, so that the system could study different editing approaches.
“Our architecture learns to make local, global, and content-dependent decisions to approximate the desired image transformation,” the team wrote in the abstract of a paper that they will present this week at the SIGGRAPH digital graphics conference. “Our algorithm processes high-resolution images on a smartphone in milliseconds, provides a real-time viewfinder at 1080p resolution, and matches the quality of state-of-the-art approximation techniques on a large class of image operators.”
Gharbi and MIT professor of electrical engineering and computer science Frédo Durand collaborated on the project with Jiawen Chen, Jon Barron, and Sam Hasinoff of Google.
Google reached out to Gharbi after he wrote a paper on a similar technique a few years ago, and offered him the internship.
“This technology has the potential to be very useful for real-time image enhancement on mobile platforms,” Google’s Jon Barron told MIT News. “Using machine learning for computational photography is an exciting prospect but is limited by the severe computational and power constraints of mobile phones. This paper may provide us with a way to sidestep these issues and produce new, compelling, real-time photographic experiences without draining your battery or giving you a laggy viewfinder experience.”