iChat AV Bittage

Back in 'Deconstructing H.264/AVC' I spent a bunch of time going through MPEG-4 and leading into the upcoming H.264/AVC codec. I got a lot of feedback, questions mostly, as people seem to be under the impression that I may or may not read the comments. :)

I've also been greasing the wheels of whispers with alcoholic beverages, trying to garner more info about what we'll be seeing with Tiger. Keep in mind that this is pieced together from bits I've been able to gather here and there, which I've then tried to form into a coherent picture. It's amazing what you can infer from scribbling on napkins while someone is using the loo.

So take the following answers to your questions with a salt mine, but it should be reasonable.

Why didn't Apple support MPEG-4 ASP (Advanced Simple Profile)?

In the post I basically said I didn't know why, just that for whatever reason they chose not to. I still don't know for sure why, but more than one person pointed to the Quicktime architecture as a reason.

The idea seems to be that Quicktime has gotten incredibly crufty over the ages and hasn't had a serious rewrite with more capabilities in mind; hence it has real problems even supporting B-frames. There's a lot more to this, and while I'm aware of much of the cruftiness in general, and that Quicktime is fast becoming a choke-point for more advanced functionality, I have neither the background to comment on it sanely nor the time to dig into it at the moment.

So I'm not going to go into great detail just yet... much greasing of the whispers to do there. But basically, deep and wide changes to Quicktime have to happen for it to handle H.264/AVC in a sane way. It's good to remember that Quicktime is very much Carbon-based, and Apple has to contend not only with the OS X version; the fact that it's cross-platform makes things even more complicated.

Whether this means that the changes will bring with them MPEG-4 ASP support is unknown at this point, but it would really make my day. More coming on that after I get some other upcoming things off my plate.

What kind of hardware will I need for H.264/AVC?

I answered this one as best I could in the comments of that original post, so I'm going to stick to that for now. Beyond the examples I gave there, and the ones further on regarding iChat AV, much is going to depend on what you're really trying to do with the codec.

Will I really get the quality improvements with H.264/AVC?

I thought this was covered back in the original post, but for a little more detail: yes, but you basically have a choice between much better quality and larger frame sizes.

As mentioned in the original post, with the new codec you can get roughly CIF-sized video at the same bitrate needed for QCIF with H.263. So if you choose to stick with QCIF-sized windows, quality will be greatly improved. Alternatively, you could use larger CIF-sized windows and still use the same bandwidth.

The rumblings I'm hearing say it will allow CIF to be sent at 10 fps over a 56 kbps connection, which would be outstanding.
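To get a feel for how tight "CIF at 10 fps over 56 kbps" really is, here's a back-of-envelope sketch. It assumes the full 56 kbps goes to video (a real modem link won't give you that) and that raw frames are 8-bit YUV 4:2:0, i.e. 1.5 bytes per pixel; neither number comes from Apple.

```python
# Back-of-envelope budget for CIF video at 10 fps over a 56 kbps link.
# Assumptions (not from iChat): the whole 56 kbps is available for video,
# and uncompressed frames are 8-bit YUV 4:2:0 (1.5 bytes per pixel).

CIF = (352, 288)
QCIF = (176, 144)


def raw_frame_bits(width: int, height: int, bytes_per_pixel: float = 1.5) -> float:
    """Size of one uncompressed frame in bits."""
    return width * height * bytes_per_pixel * 8


def bits_per_frame(bitrate_bps: float, fps: float) -> float:
    """Average number of compressed bits each frame gets on the link."""
    return bitrate_bps / fps


def compression_ratio(width: int, height: int, bitrate_bps: float, fps: float) -> float:
    """How hard the encoder has to squeeze a frame to fit the link."""
    return raw_frame_bits(width, height) / bits_per_frame(bitrate_bps, fps)


if __name__ == "__main__":
    budget = bits_per_frame(56_000, 10)
    print(f"{budget:.0f} bits (~{budget / 8:.0f} bytes) per frame")
    print(f"CIF needs about {compression_ratio(*CIF, 56_000, 10):.0f}:1")
    print(f"QCIF needs about {compression_ratio(*QCIF, 56_000, 10):.0f}:1")
```

That works out to 5600 bits (about 700 bytes) per frame, roughly 217:1 compression for CIF versus 54:1 for QCIF. CIF has four times the pixels of QCIF, so at the same bitrate the encoder has to be about four times as efficient, which is the gap H.264/AVC is supposed to close.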

Do you really think we'll see the next version of iChat AV released before Tiger?

I do, simply because it's a nice extra revenue stream for Apple. Some people will want the improved iChat AV but, for whatever reason, won't be able to upgrade to 10.4 for a while; they'll pay the ~$30 and then probably upgrade to 10.4 at some point in the future. Same thing for Quicktime Pro.

It's just some nice extra cash, kind of like George Lucas releasing edition after edition of his movies, and releasing it as a beta gives them a free round of targeted bug fixing. This somewhat depends on when Tiger actually ships; if 10.4 ships very early, it may not make sense to ship them separately.

Apple has stated that it'll ship in the first half of 2005. Half a year is a pretty darn broad time frame, but this basically means the latest we'll be seeing it is at the 2005 WWDC. Sensical.

However, I'm hearing rumblings that the internal schedules have the release date pegged much earlier: the first quarter of 2005, possibly even the end of January 2005. But it's important to remember that internal schedules are just that, internal, and represent what they're shooting for. If something big comes up, there goes that.

If I were a betting man, even assuming everything goes great, I'd say we might see 10.4 hit a final candidate around the beginning of 1Q 2005, go through lots of testing, and ship early in 2Q 2005... which would let them throw some sales figures on the screen at WWDC.

But in no way am I ruling WWDC out; in all honesty, given how long it's going to be until 10.4 is out the door, I'd much rather they test and improve it longer than strictly necessary before throwing it in the box, and save us a 10.4.x update.

Why are you limited to 10 users in an audio conference?

Unknown; this seems to be an entirely artificial constraint. From what's been whispered to me, a dual 867 MHz G4 or a 1.6 GHz G5 should be able to handle well over 10 users, since audio uses Qualcomm's PureVoice codec; in fact, that hardware should be able to handle 20+.

I'm sure there are reasons for this, but they aren't readily apparent given the technology and may well be business-related.

Will audio conferences require improved hardware?

There's a difference between participating in a conference and being the end-point for a conference. We can dub this 'the mixer machine'.

The way this works is that if 5 people are in an audio chat, they all connect to a master machine, using SIP for call setup and RTP/RTCP for the media streams. That end-point machine takes all the inbound streams, decodes them, remixes them into one stream, and sends that out to all of the others.

Obviously the machine doing the mixing is going to have to do a hell of a lot more work than the clients, or peers, which are just taking the combined feed. But I think the key question is going to be whether you can do it comfortably.
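The "remix into one stream" step is conceptually simple. Here's a minimal sketch of just that step, assuming the inbound streams have already been decoded into equal-length frames of signed 16-bit PCM samples; the codec and RTP handling are left out entirely, and this is not Apple's code.

```python
from typing import List, Sequence

INT16_MIN, INT16_MAX = -32768, 32767


def mix_frames(frames: Sequence[Sequence[int]]) -> List[int]:
    """Mix equal-length frames of signed 16-bit samples into one frame.

    Samples at the same position are summed, then clamped to the 16-bit
    range so loud moments saturate instead of wrapping around into noise.
    """
    if not frames:
        return []
    length = len(frames[0])
    if any(len(f) != length for f in frames):
        raise ValueError("all frames must have the same number of samples")
    mixed = []
    for i in range(length):
        total = sum(f[i] for f in frames)
        mixed.append(max(INT16_MIN, min(INT16_MAX, total)))
    return mixed


# Two participants; the last sample would overflow, so it gets clamped.
print(mix_frames([[1000, -2000, 30000], [500, -500, 10000]]))  # [1500, -2500, 32767]
```

This produces one combined mix, as described above. A common refinement in conferencing mixers is to send each listener a mix that leaves out their own voice, at the cost of one encode per participant instead of one overall.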

If you're able to do two-person audio connections with iChat AV at the moment with no problem, I wouldn't worry about being able to handle most conferences if you have the bandwidth. If you're tight on resources to begin with, you're prolly going to be looking hard at upgrading your RAM and CPU.

It's worth noting that at the moment, if the machine mixing the conference drops out, there goes the whole conference. I'm also getting the impression that there's a lot of room for improvement here in the future, as you're going to need to know which machine is the fastest and has the best bandwidth... the software isn't smart enough to negotiate that for you.
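Since the software won't pick the mixer for you, here's a purely hypothetical sketch of the kind of choice you'd be making by hand. The `Peer` fields, the `cpu_score` rating, and the "upload first, CPU second" ordering are all my assumptions, not anything iChat does.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Peer:
    name: str
    cpu_score: float    # hypothetical relative CPU rating; higher is faster
    upstream_kbps: int  # measured upload bandwidth


def pick_mixer(peers: Sequence[Peer], min_upstream_kbps: int = 0) -> Optional[Peer]:
    """Choose the peer best suited to act as the mixer, or None if nobody qualifies.

    Peers below min_upstream_kbps are ruled out. Among the rest, upload
    bandwidth wins first (the mixer has to send the combined feed back out
    to everyone), and CPU breaks ties.
    """
    candidates = [p for p in peers if p.upstream_kbps >= min_upstream_kbps]
    if not candidates:
        return None
    return max(candidates, key=lambda p: (p.upstream_kbps, p.cpu_score))


peers = [Peer("dialup", 1.0, 33), Peer("dsl-g4", 1.0, 256), Peer("dsl-g5", 2.0, 256)]
print(pick_mixer(peers).name)  # dsl-g5
```

Ranking upload ahead of CPU is a judgment call; on a slow machine with a fat pipe you could just as easily argue the other way.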

Will I be able to video conference with more than four users?

I don't believe so, at least not for a while. If I'm right about how they're doing this (and I could be wrong), there's a reason why a CIF-sized frame is being used for video conferencing.

As a quick recap:

  • QCIF = 176x144 pixels
  • CIF = 352x288 pixels

Basically, if this works the way I think it does, it'll be very much like the audio example I gave above. Each peer computer sends a QCIF-sized frame to the computer dedicated to mixing, which then builds a CIF-sized frame from them and sends it on to the peers, who then split the frame back out into the separate pictures and composite them into that nifty screen you can see on Apple's iChat AV page.

I could be off on the sizes they're using, but I don't think so, judging from what they're showing. And if you'll notice, four QCIF frames fit exactly into one CIF frame (a 2x2 grid), which I believe is where your four-user limit comes from. This could theoretically be bumped up in the future, but it's going to be a while.
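Here's a minimal sketch of the 2x2 packing I'm guessing at: four QCIF pictures tiled into one CIF picture, and cut back apart on the receiving end. It works on raw luma samples as plain lists of ints (chroma planes and the actual encode/decode are left out), and the tile order is my assumption.

```python
from typing import List, Tuple

Frame = List[List[int]]  # rows of 8-bit luma samples; chroma planes left out for brevity

QCIF_W, QCIF_H = 176, 144
CIF_W, CIF_H = 2 * QCIF_W, 2 * QCIF_H  # 352x288, exactly a 2x2 grid of QCIF tiles


def _tile_origin(index: int) -> Tuple[int, int]:
    """Top-left (row, column) of slot 0..3, filled left-to-right, top-to-bottom."""
    return (index // 2) * QCIF_H, (index % 2) * QCIF_W


def compose(tiles: List[Frame]) -> Frame:
    """Mixer side: pack up to four QCIF frames into one CIF frame."""
    if len(tiles) > 4:
        raise ValueError("a CIF frame only has room for four QCIF tiles")
    cif = [[0] * CIF_W for _ in range(CIF_H)]  # unused slots stay zero-filled
    for index, tile in enumerate(tiles):
        if len(tile) != QCIF_H or any(len(row) != QCIF_W for row in tile):
            raise ValueError("every tile must be 176x144")
        top, left = _tile_origin(index)
        for y, row in enumerate(tile):
            cif[top + y][left:left + QCIF_W] = row
    return cif


def split(cif: Frame) -> List[Frame]:
    """Peer side: cut a CIF frame back into its four QCIF tiles."""
    tiles = []
    for index in range(4):
        top, left = _tile_origin(index)
        tiles.append([row[left:left + QCIF_W] for row in cif[top:top + QCIF_H]])
    return tiles
```

Note there's simply no fifth slot; squeezing in more users would mean either a bigger composite frame or smaller tiles.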

What kind of bandwidth will I need for conferencing?

This is worth splitting out into audio and video components since, while you can do both at the same time, their requirements are going to be very different.

Additionally it might help clear up some of the above, as I'm sure some of you are going "Eh? Why would they do that instead of..." and there really is a method to Apple's madness here... assuming I'm correct.

  • Audio conferencing
    This is going to be relatively unchanged from prior versions, with the exception of the mixing end-point, which is going to need broadband. Depending on the quality, peer machines should be able to be on 56k, but for the mixing machine you're going to want something faster.

    Remember, one machine acts as a mixer: it takes upstream feeds from all of the others and feeds them back out as one combined stream. It pretty much has to work this way; while it introduces some delay, everyone is on the same delay.

    It's good to remember that a typical broadband setup has download speed an order of magnitude faster than its upload, so the mixer can feed out the combined, re-compressed signal within its upstream bandwidth while pulling down the other feeds easily.

  • Video conferencing
    If you hadn't gathered from the prior post, H.264/AVC will allow CIF-sized frames to be sent at the same bitrate you'd previously need for sending QCIF using H.263. As mentioned, the 'mixer' computer will, I believe, send out a combined composite CIF-sized frame to the peer machines which then break the image up into their individual squares.

    This is actually pretty efficient if you go back over how H.264/AVC is able to treat a frame, and the client machine would simply have to take the composite CIF frame, split it into its QCIF pieces, and go.

    This again means that you're going to want one machine with more bandwidth than the others acting as the mixer, but for the peer machines, if you're able to do H.263 at a reasonable speed with your bandwidth, you should be happy with conferencing.
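To see why the mixer is the machine that needs the better connection, here's a rough bandwidth calculator for this star layout. It assumes each peer sends one stream to the mixer and the mixer unicasts one combined stream back to each peer; the per-stream rates you plug in are placeholders, not iChat's actual numbers.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    mixer_down_kbps: float
    mixer_up_kbps: float
    peer_down_kbps: float
    peer_up_kbps: float


def conference_budget(participants: int, stream_kbps: float, mixed_kbps: float) -> Budget:
    """Bandwidth for a conference where one participant also acts as the mixer.

    Assumes every other peer sends one stream of stream_kbps to the mixer, and
    the mixer unicasts one combined stream of mixed_kbps back to each of them.
    The mixer's own local feed never touches the network.
    """
    if participants < 2:
        raise ValueError("a conference needs at least two participants")
    peers = participants - 1
    return Budget(
        mixer_down_kbps=peers * stream_kbps,
        mixer_up_kbps=peers * mixed_kbps,
        peer_down_kbps=mixed_kbps,
        peer_up_kbps=stream_kbps,
    )


# Four-way conference with a hypothetical 40 kbps per stream in each direction.
print(conference_budget(4, 40, 40))
```

Each peer only ever handles one stream in each direction, but the mixer's numbers grow with every participant added, in both directions under this unicast assumption. On an asymmetric broadband line, the mixer's upstream figure is the one to check against your connection.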

Will I be able to display the feeds in individual windows?

Unknown. I'm going to have to assume that Apple has some pretty tight code compositing that big window, and several windows might be more expensive. But I really don't know.

What kind of hardware will I need for actually conferencing?

This was somewhat covered above, but to go into a little more detail: for audio conferencing I wouldn't worry about it except for the mixing end-point. On a G4 with enough RAM you should be fine, and for the peer machines this should be unchanged.

However, for video things are going to be quite a bit steeper, and again it's worth separating the peer machines from the one dedicated to mixing. For peer machines, remember that H.264/AVC is about twice as expensive to decode as MPEG-4 at the same bitrate, so yes, you're going to notice a difference over H.263, and the receiving machine still has to split the composite frame out and re-composite it onto the pretty new window.

If you have a G4 and notice no problems now, your machine might feel a bit sluggish if you're trying to do other things while acting as a peer. God help you if you're on a G3.

For the mixer machine things are going to be steeper still; to conference four people you're going to need at least the equivalent of a 1.6GHz G5.

Apple is including Jabber support; does that mean they are throwing their weight behind XMPP and not SIP?

They're sticking with SIP. The Jabber side of things deserves more investigation and is something I'm greasing at the moment, but they're assuredly sticking with SIP.

Will iChat AV stay peer-to-peer, or move server-side?

The peer-to-peer nature of iChat is something that can throw people, so it's worth going into a bit; well, it originally confused me as well. While iChat AV is peer-to-peer, it's sort of a hybrid.

For connecting, most services use an MCU (multipoint control unit) approach. This sounds scary, but an MCU is basically the moderator of the conversation, acting as an intermediary between two or more clients. This isn't a bad thing at all; it generally provides a higher QoS (quality of service) and other extended capabilities, even if it happens to sound way less cool than P2P.

Think of this as a man-in-the-middle approach: the clients each connect to the server, which establishes what each client is capable of and can make certain things, like getting around NAT firewalls, much, much easier. It also makes the MCU the central point of the conference, so it's aware when a client drops out; the flip side is that it's also the central point of failure.
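As a rough illustration of "establishes what each client is capable of", here's a hypothetical sketch of an MCU finding the codecs every client can handle. The client names and codec labels are made up, and real SIP endpoints negotiate this through SDP offer/answer rather than anything this simple.

```python
from typing import Dict, List, Sequence


def negotiate_codecs(clients: Dict[str, Sequence[str]]) -> List[str]:
    """Return the codecs every client supports, in the first client's preference order.

    clients maps a client name to its supported codecs, most preferred first.
    """
    if not clients:
        return []
    lists = list(clients.values())
    common = set(lists[0])
    for supported in lists[1:]:
        common &= set(supported)  # keep only what everyone so far can handle
    return [codec for codec in dict.fromkeys(lists[0]) if codec in common]


print(negotiate_codecs({
    "alice": ["H.264", "H.263"],
    "bob": ["H.263"],  # an older client drags everyone down to H.263
    "carol": ["H.263", "H.264"],
}))  # ['H.263']
```

The point is that a central box can see everyone's capabilities at once, whereas pure peers have to work this out pairwise.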

If you're still having problems, Mac OS X 10.4 Server will let you create your own Jabber-based MCU for your chatting if you so choose, meaning that your own server box becomes the MCU for iChat, not the AIM network. Unfortunately, there is an operational cost to having an MCU; Yahoo, AOL and others have a lot of servers dedicated to providing that functionality to their clients. When someone chats, or you're watching someone's webcam, it's not 'free'.

There are other benefits, such as not needing one of the peers to be a piece of uber-hardware capable of handling the mixing, and the endpoints don't all have to be able to talk to each other. This may seem odd, but it's entirely possible for two computers not to be able to see each other while their users can still communicate through an MCU.

Peer-to-peer works differently: the clients establish a connection between themselves and negotiate how things are going to go down. There are benefits to this, such as not being beholden to the MCU's capabilities, but also drawbacks, like, well, connecting; peer-to-peer services just hate things like routers with NAT enabled.

You can often get around this by forwarding specific ports to a static IP. Unfortunately, you're lucky if you can get a normal user to open specific firewall ports without hurting themselves, let alone set up a static IP, log into their router, and forward ports.

I'm fairly sure iChat AV gets around this by using something similar to STUN (Simple Traversal of UDP through NAT). The idea is that you have something like a STUN server sitting somewhere on the public internet, and the clients query it. Long story short, if your router supports 'consistent translation' and your firewall ports are open, some magic happens at the STUN server and each client learns where the other SIP client is and which ports it should send data to.

If it doesn't support consistent translation, or it's a blue moon, you're going to have major problems unless you're able to forward those ports. This is why you'll periodically see flame wars on mailing lists where person A says "I can't get iChat to connect using NAT" and person B says "You're an idiot, cus I can". And yes, I really wish Apple had saved some mailing list bandwidth by including something in iChat to check the router's capabilities, but hey, it's become less of an issue, as just about every router you can pick up now supports it; Apple isn't the only one who needs this functionality.

So yes, there is a bit of MCU magic going on to get around NAT'd networks. Once the clients are able to reach each other, however, they communicate directly, bypassing the server, so iChat is P2P. I've heard rumblings that the server connection for getting around NAT might be going away, making iChat AV fully P2P, but to be honest I can't for the life of me imagine how that would be accomplished without doing something new.
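For the curious, here's roughly what "the clients query it" looks like on the wire. This is a sketch of a STUN Binding Request and of pulling your public address out of the reply, using the message format from RFC 5389 (the successor to RFC 3489, which used a 128-bit transaction ID and no magic cookie). I'm not claiming iChat sends exactly this; it's just the standard STUN exchange.

```python
import os
import socket
import struct
from typing import Optional, Tuple

MAGIC_COOKIE = 0x2112A442
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
ATTR_MAPPED_ADDRESS = 0x0001
ATTR_XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request() -> Tuple[bytes, bytes]:
    """Return (packet, transaction_id) for a bare Binding Request with no attributes."""
    transaction_id = os.urandom(12)
    header = struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE)
    return header + transaction_id, transaction_id


def parse_binding_response(data: bytes, transaction_id: bytes) -> Optional[Tuple[str, int]]:
    """Return the public (IPv4, port) the server saw us coming from, or None."""
    if len(data) < 20:
        return None
    msg_type, msg_len, cookie = struct.unpack("!HHI", data[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE or data[8:20] != transaction_id:
        return None
    offset, end = 20, min(len(data), 20 + msg_len)
    while offset + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", data[offset:offset + 4])
        value = data[offset + 4:offset + 4 + attr_len]
        # Family byte 0x01 means IPv4; the first matching address attribute wins.
        if attr_type in (ATTR_MAPPED_ADDRESS, ATTR_XOR_MAPPED_ADDRESS) and len(value) >= 8 and value[1] == 0x01:
            (port,) = struct.unpack("!H", value[2:4])
            (addr,) = struct.unpack("!I", value[4:8])
            if attr_type == ATTR_XOR_MAPPED_ADDRESS:
                port ^= MAGIC_COOKIE >> 16  # XOR'd so NATs can't "helpfully" rewrite it
                addr ^= MAGIC_COOKIE
            return socket.inet_ntoa(struct.pack("!I", addr)), port
        offset += 4 + ((attr_len + 3) & ~3)  # attribute values are padded to 4 bytes
    return None
```

To use it, you'd send the packet from a UDP socket with `sendto()` to a STUN server (port 3478 is the standard one), read the reply with `recvfrom()`, and hand it to `parse_binding_response()`. The address you get back is what you'd tell the other SIP client to send to.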
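The router check I'm wishing for isn't rocket science. Here's a minimal sketch, assuming you've already sent a STUN Binding Request from the same local UDP socket to two STUN servers at different IP addresses and recorded the public address each one reported. If the router reuses the same public address and port for both destinations, that's consistent translation; if it hands out a different mapping per destination (what RFC 3489 calls a symmetric NAT), the other side can't predict where to send. The function and its labels are hypothetical.

```python
from typing import Tuple

Address = Tuple[str, int]  # (IP, port)


def classify_nat(local: Address, seen_by_a: Address, seen_by_b: Address) -> str:
    """Classify a router from what two different STUN servers saw for one local socket."""
    if seen_by_a == local and seen_by_b == local:
        return "no NAT"  # the machine already has a public address
    if seen_by_a == seen_by_b:
        return "consistent translation"  # same public mapping reused for every destination
    return "inconsistent translation"  # mapping changes per destination; forward ports instead


print(classify_nat(("10.0.1.2", 5060), ("203.0.113.5", 40000), ("203.0.113.5", 40000)))
# consistent translation
```

A real checker would also want to test whether unsolicited inbound packets get through, but the mapping test alone would have settled a lot of those mailing list arguments.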

Oddly enough, I've also heard rumblings of the other sort; that iChat AV could well move away from P2P and tie into AOL's MCUs. More on this in a future post.

Posted by drunkenbatman
    August 22, 2004, at 05:58 PM

