According to the documentation, the system analyzes one or more videos of a user's hand as they perform various actions. Each video is processed to extract hand landmark data, which includes 21 hand-knuckle coordinates. Google states that "the videos are never associated with a user's identity and are deleted after the verification process." Reportedly, audio is never recorded either.
Source: 80.lv
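
To make the described data flow more concrete, here is a minimal Python sketch of what "21 hand-knuckle coordinates per frame" could look like as a data structure. It is not Google's implementation. The names `HandLandmarks`, `extract_landmarks`, and `fake_detector` are hypothetical, the (x, y, z) layout of each coordinate is an assumption, and the detector is a stand-in that returns made-up points rather than a real hand-tracking model.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

NUM_LANDMARKS = 21  # number of hand-knuckle coordinates cited in the text

Point = Tuple[float, float, float]  # assumed (x, y, z) layout, not confirmed by the source


@dataclass(frozen=True)
class HandLandmarks:
    """Landmarks extracted from a single video frame (hypothetical type)."""
    points: Tuple[Point, ...]

    def __post_init__(self) -> None:
        # Reject anything that does not carry exactly 21 coordinates.
        if len(self.points) != NUM_LANDMARKS:
            raise ValueError(
                f"expected {NUM_LANDMARKS} landmarks, got {len(self.points)}"
            )


def extract_landmarks(
    frames: Sequence[object],
    detector: Callable[[object], Sequence[Point]],
) -> List[HandLandmarks]:
    """Run the detector on every frame and keep only the landmark data."""
    return [HandLandmarks(tuple(detector(frame))) for frame in frames]


def fake_detector(frame: object) -> List[Point]:
    """Stand-in for a real hand-tracking model: returns 21 placeholder points."""
    return [(i / 20.0, 0.5, 0.0) for i in range(NUM_LANDMARKS)]


# Usage: process two placeholder "frames", then drop the frames themselves,
# mirroring the claim that videos are deleted after processing.
frames = ["frame-0", "frame-1"]
landmarks = extract_landmarks(frames, fake_detector)
del frames
```

In this sketch only the landmark coordinates outlive the processing step, and nothing in `HandLandmarks` refers to a user identity or to audio.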