How Standard Is FaceTime on the iPhone? Packet Capture Verification

Apple CEO Steve Jobs announced FaceTime video conferencing for the iPhone 4 during his keynote at the Apple Worldwide Developers Conference (WWDC) in June. FaceTime takes advantage of new frameworks in iOS 4 as well as the new hardware capabilities of the iPhone 4, including the front-facing camera, the high-resolution Retina display, and the increased speed of the A4 processor. Jobs stated that FaceTime is based on existing standards and that FaceTime itself would be published as an open standard. Packet captures of FaceTime sessions give a clearer picture of which standards Apple employs and how Apple implements them.

Jobs’ demonstration showed a seamless video conferencing experience that could be initiated directly as a video chat or by upgrading a traditional voice call to video. FaceTime currently operates only over a WiFi connection and only on an iPhone 4, not on earlier devices. Jobs said that Apple was working on carrier agreements to allow FaceTime to work over a 3G connection. You can read a transcript of the 2010 WWDC Keynote at Macworld, view a gallery of WWDC 2010 keynote images at The Mac Observer, or watch the official video of the Apple WWDC 2010 Keynote Address.

Jobs stated that FaceTime was based on the H.264, AAC, SIP, STUN, TURN, ICE, RTP, and SRTP standards. Stephen Strowes has a nice description of the standards and how they interact in his post iPhone4, Facetime, and open standards. Even though Jobs explicitly listed the standards on a slide during the presentation, I could find no official mention of them on the Apple web site, nor any record of FaceTime being submitted to a standards body. Apple will certainly publish all the details in time; however, I wanted to see what I could verify now.

I assumed that observing a FaceTime session with a packet sniffer would provide all the information needed. Unfortunately, my iPhone 3GS cannot run FaceTime, so I looked for others who had analyzed packet captures of FaceTime sessions on an iPhone 4.

Arjun Roychowdhury and FryGuy both posted quick analyses on June 25th. Both primarily looked at the voice portion of the call setup. In Facetime on Iphone 4: Vanilla unencrypted STUN and SIP, Roychowdhury used Wireshark to find that Apple implemented the voice setup portion using standard SIP mechanisms. He posted further clarifications in the comments. FryGuy published similar findings in iPhone 4 and FaceTime Packet Capture using a Cisco ASA capture filter.
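Because the STUN and SIP traffic was reported as unencrypted, it can in principle be recognized directly from the raw UDP payloads. The following is a minimal Python sketch of that idea, not a reconstruction of either author's method. It assumes the STUN messages use the RFC 5389 header format with its fixed magic cookie (older RFC 3489 STUN has no cookie and would not be recognized), and the function name `classify_udp_payload` and the list of SIP methods are illustrative choices of mine.

```python
import struct

# Fixed value carried in bytes 4-7 of every RFC 5389 STUN message.
STUN_MAGIC_COOKIE = 0x2112A442

# A few common SIP methods; this list is illustrative, not exhaustive.
SIP_METHODS = (b"INVITE", b"ACK", b"BYE", b"CANCEL",
               b"OPTIONS", b"REGISTER", b"MESSAGE")


def classify_udp_payload(payload: bytes) -> str:
    """Return 'stun', 'sip', or 'unknown' for a raw UDP payload."""
    # STUN: 20-byte header, top two bits of the first byte are zero,
    # the message length is a multiple of 4, and the cookie must match.
    if len(payload) >= 20:
        msg_type, msg_len, cookie = struct.unpack("!HHI", payload[:8])
        if (msg_type & 0xC000) == 0 and msg_len % 4 == 0 \
                and cookie == STUN_MAGIC_COOKIE:
            return "stun"

    # SIP is plain text. A response starts with "SIP/2.0 <code> <reason>",
    # a request with "<METHOD> <uri> SIP/2.0".
    first_line = payload.split(b"\r\n", 1)[0]
    if first_line.startswith(b"SIP/2.0 "):
        return "sip"
    parts = first_line.split(b" ")
    if len(parts) == 3 and parts[0] in SIP_METHODS and parts[2] == b"SIP/2.0":
        return "sip"

    return "unknown"
```

In practice the payloads would come from a capture file opened in a tool such as Wireshark; the sketch only shows why unencrypted STUN and SIP are easy to spot.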

Joshua Wright’s ongoing series in the Packetstan blog is far and away the most detailed analysis of the FaceTime protocol. Wright nicely describes his use of Wireshark, videosnarf, and openssl so that others can replicate his experiments. In Face Time (part 1: Introduction), Wright provides a quick characterization of a FaceTime session, noting which traffic is delivered over TCP vs. UDP and which portions are encrypted. In Face Time (part 2: SIP and Data Streams), he dissects the SIP portion of the session with Wireshark and uses videosnarf to analyze the RTP media streams. Wright found that FaceTime extends SIP MESSAGE authentication in a non-standard way and that neither the audio nor the video portions of the FaceTime sessions are encrypted. Finally, in Face Time (part 3: Call Connection Initialization), Wright finds that FaceTime authentication uses Jabber/XMPP with SSL on TCP port 5223, connecting to a Jabber server at Apple with client certificates. The certificate-based authentication means that Apple will be able to control which devices can connect to its servers. Wright speculates that the certificate could be extracted from a jailbroken iPhone and used with other clients. Joshua’s own blog, Will Hack For SUSHI, is sporadic, but excellent.
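To make the RTP part of that analysis more concrete, here is a minimal Python sketch that decodes the 12-byte fixed RTP header defined in RFC 3550. It is not Wright's tooling, and the names `RtpHeader` and `parse_rtp_header` are my own. Note that parsing the header does not by itself show whether the media is encrypted, since SRTP also leaves the RTP header in the clear; that conclusion came from Wright being able to decode the media streams themselves.

```python
import struct
from typing import NamedTuple, Optional


class RtpHeader(NamedTuple):
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> Optional[RtpHeader]:
    """Decode the fixed RTP header, or return None if it does not look like RTP."""
    if len(packet) < 12:
        return None
    b0, b1, sequence, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    version = b0 >> 6
    if version != 2:  # RFC 3550 defines version 2
        return None
    return RtpHeader(
        version=version,
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        csrc_count=b0 & 0x0F,
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence=sequence,
        timestamp=timestamp,
        ssrc=ssrc,
    )
```

Grouping packets by `ssrc` and ordering them by `sequence` is the usual first step before handing the payloads to a decoder for the negotiated codec.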
