Google has released to researchers and developers a machine-learning-based method for hand tracking on mobile devices, something Google Research engineers Valentin Bazarevsky and Fan Zhang call "a new approach to hand perception."
First presented at CVPR 2019 in June, Google's on-device, real-time hand tracking method is now available to developers, implemented in MediaPipe, an open-source cross-platform framework for building pipelines that process perceptual data such as video and audio.
The approach uses machine learning to provide high-fidelity hand and finger tracking, inferring 21 3D keypoints of a hand from just a single frame.
"While current approaches rely mainly on powerful desktop environments for inference, our method achieves real-time performance on a mobile phone, and even scales to multiple hands," Bazarevsky and Zhang said in a blog post.
Google Research hopes its hand tracking method will spark "creative use cases, stimulating new applications and new research avenues" in the community.
Bazarevsky and Zhang explained that three models work together in their hand tracking method: a palm detector model (called BlazePalm), a hand landmark model that returns high-fidelity 3D hand keypoints, and a gesture recognizer that classifies keypoint configurations into a discrete set of gestures.
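To make the three-stage structure concrete, here is a minimal sketch of how such a pipeline could compose, written as plain Python stubs. This is not Google's implementation; all function names and return values here are illustrative assumptions standing in for the real neural-network models.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class HandLandmarks:
    # 21 (x, y, z) keypoints per hand, as in the hand model described above
    points: List[Tuple[float, float, float]]

def detect_palm(frame) -> Optional[Tuple[int, int, int, int]]:
    """Stage 1: a palm detector (BlazePalm in Google's pipeline) returns
    a bounding box for the palm, or None if no hand is present."""
    # Placeholder: a real detector runs a neural network on the frame.
    return (0, 0, 64, 64) if frame is not None else None

def predict_landmarks(frame, box: Tuple[int, int, int, int]) -> HandLandmarks:
    """Stage 2: a landmark model regresses 21 3D keypoints inside the box."""
    # Placeholder: return a dummy, evenly spaced set of keypoints.
    x0, y0, w, h = box
    return HandLandmarks([(x0 + w * i / 20, float(y0), 0.0) for i in range(21)])

def classify_gesture(landmarks: HandLandmarks) -> str:
    """Stage 3: map the keypoint configuration to a discrete gesture."""
    # Placeholder rule; a real recognizer inspects joint angles.
    return "open_palm" if len(landmarks.points) == 21 else "unknown"

def track_hand(frame) -> Optional[str]:
    """Run the three stages in sequence on one video frame."""
    box = detect_palm(frame)
    if box is None:
        return None
    return classify_gesture(predict_landmarks(frame, box))
```

One design point the pipeline structure buys in practice: because the landmark model only runs inside the detector's crop, the expensive detection stage can be skipped on subsequent frames when tracking succeeds.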
Here are a few highlights, distilled from the full blog post:
The BlazePalm technique achieves an average precision of 95.7% in palm detection, the researchers claim.
The model learns consistent and robust internal hand-pose representations, even for partially visible and self-occluded hands.
The existing pipeline supports counting gestures from several cultures, e.g. American, European, and Chinese, and various hand signs including thumbs-up, closed fist, "OK", "Rock", and "Spiderman".
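The blog post describes gesture recognition as deriving the state of each finger (extended or bent) from the keypoints, then matching the set of states against known gestures. A hedged sketch of that idea, assuming the 21-point index layout used by MediaPipe (0 = wrist, fingertips at 4, 8, 12, 16, 20) and a simple distance heuristic rather than the joint-angle logic of the real pipeline:

```python
import math

# Fingertip and middle-joint indices in the 21-point hand model
# (indices follow the MediaPipe keypoint layout; 0 is the wrist).
FINGERS = {
    "thumb":  (4, 3),
    "index":  (8, 6),
    "middle": (12, 10),
    "ring":   (16, 14),
    "pinky":  (20, 18),
}

def _dist(a, b):
    # Compare in the image plane only; ignore depth for this heuristic.
    return math.dist(a[:2], b[:2])

def finger_states(points):
    """Map each finger name to True if extended.
    Heuristic assumption: a finger counts as extended when its tip is
    farther from the wrist than its middle joint."""
    wrist = points[0]
    return {name: _dist(points[tip], wrist) > _dist(points[mid], wrist)
            for name, (tip, mid) in FINGERS.items()}

def classify(points):
    """Match the set of extended fingers against a few known gestures."""
    extended = {f for f, on in finger_states(points).items() if on}
    if not extended:
        return "fist"
    if extended == {"thumb"}:
        return "thumbs_up"
    if extended == {"index", "pinky"}:
        return "rock"
    if extended == {"thumb", "index", "pinky"}:
        return "spiderman"
    return "unknown"
```

For example, a synthetic set of keypoints in which only the thumb's tip lies farther from the wrist than its middle joint would classify as `thumbs_up`.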
Google is open-sourcing the hand tracking and gesture recognition pipeline within the MediaPipe framework, accompanied by relevant usage scenarios and end-to-end source code.
Looking ahead, Bazarevsky and Zhang said Google Research plans to continue its hand tracking work with stronger and more stable tracking, and also hopes to increase the number of gestures that can be detected reliably. In addition, they hope to support dynamic gestures, which could benefit machine-learning-based sign language translation and fluid hand-gesture control.
Not only that, but more reliable on-device hand tracking is a must for AR headsets to move forward; as long as headsets rely on outward-facing cameras to see the world, understanding that world will remain a problem for machine learning to solve.