In the past few years a whole range of visual effects has been standardized. Future websites can render pretty much anything using bitmap canvases, display 3D content using CSS 3D Transforms or WebGL, and even implement entire key-frame based animations using nothing but CSS. Combined with specifications like the Application Cache and Local Storage, “HTML5” enables a whole new range of web-based applications.
Unfortunately, now that almost everything can be visualized on your monitor, the inability to synthesize, process, and analyse audio streams is becoming more and more obvious. While Flash provides fairly extensive APIs for working with sound, a native (and preferably more extensive) API for synthesizing, processing, and analysing any audio source would be much more convenient. That’s why the W3C Audio Incubator Group was founded!
Don’t get too excited just yet: while an initial draft has been published by Google’s Chris Rogers, you shouldn’t expect the API to be finished within the year. The initial version received lots of input from six Apple engineers: Maciej Stachowiak, Eric Carlson, Chris Marrin, Jer Noble, Sam Weinig and Simon Fraser, and now frequently gets updated based on feedback received via the mailing list. The draft specifies various features for the API: spatialized audio, a convolution engine, real-time frequency analysis, biquad filters and sample-accurate scheduled sound playback. Wait, spatialized what?
The reason why it doesn’t exist already
The complexity involved with synthesizing, processing, and analysing audio is one of the key reasons why it doesn’t exist already. Most audio today has a sampling rate of just over 44 thousand samples per second; DVD and Blu-ray audio tracks can go as high as 192 thousand samples per second. Multiply that by the number of sound channels, add the decoding required to make sense of the file in the first place, and you can imagine the amount of work that goes into translating an MP3 file into waves our ears can interpret.
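To get a feeling for the numbers involved, here is a back-of-the-envelope calculation; the channel count and bit depth are illustrative assumptions, not something any specification prescribes:

```js
// Rough data rate of uncompressed, CD-quality audio.
var sampleRate     = 44100; // samples per second, per channel
var channels       = 2;     // stereo (assumed)
var bytesPerSample = 2;     // 16-bit samples (assumed)

var bytesPerSecond = sampleRate * channels * bytesPerSample;
console.log(bytesPerSecond);      // 176,400 bytes every single second
console.log(bytesPerSecond * 60); // over 10 MB per minute, all of which
                                  // must be decoded, processed and
                                  // buffered without missing a beat
```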
Of course, part of this process is handled by hardware, like converting the digital stream to an analog signal. However, applying effects to an audio stream happens entirely in software where each sample gets processed. In situations where effects are applied and the processed sound is played back almost simultaneously, you can imagine how critical things like buffering and timing are.
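To make the timing constraint concrete: the size of a processing buffer directly determines the delay between processing and playback. A quick calculation, with an illustrative buffer size:

```js
// Latency introduced by a single processing buffer.
var sampleRate = 44100; // samples per second
var bufferSize = 4096;  // frames per processing quantum (illustrative)

var latency = bufferSize / sampleRate;
console.log(latency * 1000); // ~93 ms; if processing a buffer ever takes
                             // longer than this, the listener hears a glitch
```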
Another problem is JavaScript performance. While scripting engines have become far more powerful in the last few years, they can still be on the order of twenty times slower than well-optimized native code. And compared to native code that uses one of the SSE instruction sets, which give your processor highly optimized instructions for exactly this kind of audio-related math, today’s scripting engines have an even longer way to go.
Native processing to the rescue: just create an API
Performance can be improved by moving most of the processing away from JavaScript. With the Application Programming Interface (API) the Audio Incubator Group will likely be proposing, your script gains the ability to “describe” what you want to happen, rather than performing the processing itself. That said, work is being done to add an interface allowing direct JavaScript processing to the API. Such an interface could be used to prototype audio processing algorithms and to create educational demos, something which was already possible using Adobe Flash and Mozilla’s Audio Data API.
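What such direct processing could look like is sketched below. This is a hypothetical example based on the node and callback names in the current draft; nothing here is a stable, shipping API yet:

```js
// Hypothetical sketch of direct JavaScript processing, using the
// draft's naming; these names may well change before the API settles.
var context = new AudioContext();
var processor = context.createJavaScriptNode(4096); // buffer size in frames

processor.onaudioprocess = function (event) {
  var output = event.outputBuffer.getChannelData(0);
  for (var i = 0; i < output.length; i++) {
    output[i] = Math.random() * 2 - 1; // fill the buffer with white noise
  }
};

processor.connect(context.destination);
```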
The idea is simple: the “base” is an AudioContext interface which manages the connections between the different audio nodes. The context contains a Destination Node by default, which represents the output device on your computer. This could be your speakers, your headphones or, perhaps in the future, even a file on your hard drive.
Of course, there have to be audio sources as well. There are various kinds of sources: MediaElementAudioSourceNode for <audio> and <video> tags, and AudioBufferSourceNode for other kinds of input, like MP3 files requested via XHR. Other types are yet to be defined, but a source node like DeviceElementSourceNode, which could be used to process microphone input via the <device> element, isn’t unthinkable.
Between audio sources and destinations, there can be other types of nodes to perform various kinds of manipulations. The specification currently defines the following interfaces:
AudioGainNode Allows you to change the volume of the audio.
AudioPannerNode Positions and spatializes audio in 3D space.
BiquadFilterNode Adds lowpass, highpass, and other common types of filters to the audio.
ChorusNode Adds a chorus effect to the audio.
ConvolverNode Adds effects to the audio, such as imitating the sound of a concert hall.
DelayNode Applies dynamically adjustable delays to an AudioNode.
WaveShaperNode Adds non-linear waveshaping effects, such as distortion.
These nodes form the foundation of many of the features currently available in audio systems, but the specification is still far from finished and more types of nodes may be added. For analysis there is the RealtimeAnalyserNode, which allows you to analyse a node’s audio in real time. This could be used, for example, to display the tones output by a stream.
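Put together, setting up a simple processing graph could look something like the sketch below. The factory method names follow the current draft and may still change; myDecodedBuffer stands in for audio data you have already loaded:

```js
// A minimal node graph, assuming the draft's factory names.
var context = new AudioContext();

var source = context.createBufferSource(); // an AudioBufferSourceNode
source.buffer = myDecodedBuffer;           // e.g. an MP3 fetched via XHR

var gain = context.createGainNode();       // an AudioGainNode
gain.gain.value = 0.5;                     // play at half volume

// source -> gain -> speakers
source.connect(gain);
gain.connect(context.destination);

source.noteOn(0); // start playback immediately
```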
An example: dynamically changing the language of a video
Currently there is no clean way to switch between alternative audio streams for an HTML5 <video> element. The Audio API is ideal for such a purpose. As long as you keep a number of things in mind, like fragmenting the audio into smaller files to speed up the (initial) loading, it won’t be hard to create a language switcher:
Create an AudioContext,
Get the audio sources from the <video> element using a MediaElementAudioSourceNode,
Decrease the volume of the video using an AudioGainNode,
Get the new audio stream by requesting the MP3 via XHR and putting it in an AudioBufferSourceNode,
Combine the two using the Dynamics Compressor (DynamicsProcessorNode),
Play the audio stream.
This can be demonstrated using the following diagram:
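In script, the same pipeline could look roughly like the sketch below. The factory method names are assumptions derived from the draft’s node names, and the MP3 URL is purely illustrative:

```js
// Hedged sketch of the language switcher described above.
var context = new AudioContext();

// 1. Tap the <video> element's built-in audio track.
var video = document.querySelector('video');
var original = context.createMediaElementSource(video);

// 2. Silence the original language through a gain node.
var originalGain = context.createGainNode();
originalGain.gain.value = 0;

// 3. Fetch the replacement track via XHR and play it from a buffer.
var replacement = context.createBufferSource();
var xhr = new XMLHttpRequest();
xhr.open('GET', 'audio/alternative-language.mp3', true); // illustrative URL
xhr.responseType = 'arraybuffer';
xhr.onload = function () {
  replacement.buffer = context.createBuffer(xhr.response, false);
  replacement.noteOn(0);
};
xhr.send();

// 4. Mix both signals through the dynamics compressor and play.
var compressor = context.createDynamicsCompressor();
original.connect(originalGain);
originalGain.connect(compressor);
replacement.connect(compressor);
compressor.connect(context.destination);
```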
These same techniques could be used to dynamically control background sounds for clips, or to create timed effects for games using an arbitrary number of output channels (which could be 2 for stereo, 5.1 for surround or even more!). Of course, more everyday use-cases can be thought of as well: a beep when you click on a button, sounds when interactive validation in a form fails, or a music player featuring cross-over effects.
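The simplest of those use-cases, the beep, could even be synthesized entirely in script. A hedged sketch, once more assuming the draft’s names:

```js
// Synthesize and play a short 440 Hz beep, assuming the draft's API.
var context = new AudioContext();
var duration = 0.2; // seconds
var frames = Math.floor(context.sampleRate * duration);

var buffer = context.createBuffer(1, frames, context.sampleRate);
var samples = buffer.getChannelData(0);
for (var i = 0; i < frames; i++) {
  samples[i] = Math.sin(2 * Math.PI * 440 * i / context.sampleRate);
}

var beep = context.createBufferSource();
beep.buffer = buffer;
beep.connect(context.destination);
beep.noteOn(0); // play right away
```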
A number of examples demonstrating the capabilities of the Web Audio API are available as well, but keep in mind that you have to build WebKit yourself to run them. They do show the JavaScript code involved, however!
I’m really interested in the progress of the Audio Incubator Group and see quite a few benefits in being able to synthesize, process, and analyse audio through JavaScript. I’ve signed up for their mailing list and am following the prototypes in Gecko and WebKit. Are you interested too? Consider following @AudioXG on Twitter or subscribing to the public-xg-audio mailing list at the W3C: lots of cool things are yet to be invented!
Thanks and credits to Chris Rogers and Koen ten Berg for their technical input and feedback!
Today, exactly 217 days after the first Internet Explorer 9 announcement, Microsoft has released the third Developer Preview of the latest version of their browser. One of the most eagerly awaited, previously unannounced features this preview brings is the addition of the HTML5 Canvas Element. Defined in section 4.8.11 of the HTML5 specification, already implemented in all other major browsers, and now coming to Microsoft’s Internet Explorer 9: 2D Canvas is a go!
The history of <canvas>
The first signs of the (then still proprietary) element were committed to the WebKit source tree by Richard Williamson on the 25th of May, 2004. Apple’s idea came down to exposing Mac OS X’s Quartz drawing system to JavaScript and HTML in order to make it easier to write graphical widgets for the Apple Dashboard. As both products share the rendering engine, the element consequently became available in the Safari browser as well.
In July of that year Dave Hyatt announced the new element on the Surfin’ Safari blog. This immediately stirred up a lot of controversy, of which Eric Meyer’s post is a clear example: “What the bleeding hell?!?” In defense, Hyatt elaborated on Apple’s rationale for including the proprietary features and promised to submit a proposal to the WHATWG lists; however, it never came. Ian Hickson therefore, despite his opinion on how Apple had handled the new elements, reverse-engineered a draft based on the available source code.
A few years earlier, in late October 2001, Joe Hewitt had opened bug 102285 in Mozilla’s bug tracker. Sharing the same name and rationale, his proposal was to add a custom painting control to Mozilla’s XML User Interface Language. Interestingly enough, Brendan Eich, creator of the JavaScript language, tore down the idea as something for rendering fanboys. The patches were never used, and inclusion in official builds was unlikely, as Eric Murphy stated in the discussion.
On the first day of April in 2005, Mozilla’s Vladimir Vukicevic uploaded a patch featuring basic canvas functionality, which opened the road for further work in Firefox. While this first implementation only worked on Linux, due to different color formats on Windows and Mac OS X, the late-November release of their “Deer Park” project, known as Firefox 1.5, featured a cross-platform implementation of canvas.
Opera introduced the <canvas> element in mid-2006 with their Opera 9 release, in quite a humble way (can you spot it without searching?). This meant that all major browsers, with the exception of Internet Explorer, implemented the element natively. It didn’t mean the element was unusable in Microsoft’s browser, though, as Google’s ExCanvas and Mozilla’s IECanvas projects brought limited support to it.
The long, legally fraught path to standardization
The path to proper standardization wasn’t very smooth. It began with the lack of a proper proposal from Apple’s side, resulting in the initial specification being based on the reverse-engineering work of Ian “Hixie” Hickson, editor of the HTML5 specification. In 2005, Jayant Sai brought up an initial idea for drawing text on a canvas, which was later formalized into a decent proposal by Stefan Haustein.
Things got rougher still. After Mozilla Firefox and Opera had implemented the element, Apple’s Senior Patent Counsel Helene Plotka Workman sent a message to the WHATWG and Ian Hickson stating that Apple believed it held intellectual property rights over the canvas element, and would only consider releasing those rights if the Web Applications draft became a formal draft standard at the W3C.
While the rationale behind Apple’s message was unclear, its timing was interesting: it came exactly one week before the W3C re-launched the HTML Working Group. Less than half a year later, in February 2008, the first draft of the HTML5 specification was published as a W3C Working Draft. On the 18th of June that year, Apple disclosed patent 11/144384 for use by the HTML5 specification. The same patent has been disclosed in six other jurisdictions, enabling the WHATWG to continue including <canvas>.
Going 3-dimensional with WebGL
More recently, on December 10 of last year, Mozilla’s Arun Ranganathan announced the first draft of the WebGL specification. While you would expect the specification to be hosted by either the WHATWG or the W3C, since it defines a context for the HTML5 Canvas Element, WebGL is hosted at Khronos. This can be explained by the fact that the specification was originally intended as a simple binding of OpenGL ES 2.0 to JavaScript, and the Khronos consortium already hosted the OpenGL ES specs.
WebGL is the second context that can be used with the <canvas> element. As said before, it is based on the OpenGL ES 2.0 specification and provides a JavaScript interface for 3D graphics. The specification evolved out of an experiment by Mozilla’s Vladimir Vukicevic. He first demoed the possibilities in his “Web Graphics: Canvas, SVG, and more” talk at XTech 2006, and later announced it as the “moz-glweb20” context. Opera published their opera-3d context late 2007, but decided to add abstraction in order to leave the door open for implementations based on, for example, Direct3D.
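Obtaining such a context follows the same pattern as the 2D canvas. A minimal sketch, assuming the experimental context name today’s nightly builds respond to, and a hypothetical canvas element id:

```js
// Request a WebGL context from a <canvas> element; the context name
// is still experimental and may differ per build.
var canvas = document.getElementById('scene'); // assumed element id
var gl = canvas.getContext('experimental-webgl');

if (gl) {
  gl.clearColor(0.0, 0.0, 0.0, 1.0); // opaque black
  gl.clear(gl.COLOR_BUFFER_BIT);     // clear the drawing buffer
} else {
  alert('This browser build does not support WebGL yet.');
}
```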
WebGL is a specification in which all browser vendors, with the exception of Microsoft, have participated. This shows: nightly builds of Firefox, Google Chrome and Safari already contain implementations of WebGL. While Opera actively participated in the discussions, they have yet to release a public build containing the 3D context. Nokia, meanwhile, has announced WebGL support in a new firmware version for their Nokia N900 phone.
Of course, Google hasn’t been silent either. In March they announced the ANGLE project, which basically translates OpenGL calls to their DirectX equivalents. Two weeks later, on April Fools’ Day this year, Googler Chris Ramsdale announced a WebGL port of the Quake II game engine.
No really, Thank You, Microsoft!
Even before the Internet Explorer 9 announcement, Microsoft tried to move the Canvas 2D context into its own specification module. There was no word about <canvas> in the first two Developer Previews, and the company’s position could best be described as vague. In May, Microsoft Evangelist Giorgio Sardo publicly stated that he would like the element to be included, but also added that the company was in no way committed to Canvas.
More recently, at the SVG Working Group meeting earlier this month, Internet Explorer’s General Manager Dean Hachamovitch stated that the team wouldn’t be talking about implementing Canvas at that point. However, he added the following: “all your graphics needs will be taken care of, and I’m smiling broadly.” Today they finally confirmed the suspicion a lot of web developers have had for months: the 2D canvas has been included in Internet Explorer 9.
Google’s Vangelis Kokkevis, former lead developer of the O3D project, has enabled support for the “--enable-accelerated-compositing” flag in the Chromium nightlies. By supplying it to the browser, the so-called “fast path” for rendering gets enabled in WebKit. This path is responsible for accelerating a number of performance-critical features in the engine, such as CSS 3D Transforms, video decoding and various components of the WebGL Canvas. While the software implementation landed back in March, this change allows you to use it as well.

Milestone 6, builds of which frequently get pushed to the dev-channel, already mentioned plans for supporting CSS 3D Transforms. These transformations were introduced by Apple about a year ago and can now be found in their own W3C Working Draft. About a month ago the Qt WebKit port announced support for the draft, and the nightly Chromium builds introduced it yesterday. Mozilla’s stance on the specification has yet to be defined, and there is no word from Internet Explorer or Opera either.
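Once the flag is enabled, any element carrying a 3D transform gets promoted to a GPU-composited layer. A small sketch of applying such a transform from script; the element id is an illustrative assumption:

```js
// Apply a CSS 3D Transform from script; with
// --enable-accelerated-compositing, WebKit composites
// this element on the GPU.
var box = document.getElementById('box'); // assumed element

// The parent needs perspective for the rotation to appear 3D.
box.parentNode.style.webkitPerspective = '800px';

box.style.webkitTransform = 'rotateY(45deg) translateZ(100px)';
```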
Why would I want to use 3D in my webpages?
You don’t. No really, I truthfully hope we’re not going to see entire websites created in 3D for a while to come. A large part of the websites out there today are horrible in terms of usability; perspectives and animations really aren’t going to improve that. The real use-cases can be found in examples such as Charles Ying’s Snow Stack: eye-candy and graphics are becoming more important in applications, and going 3D is a logical next step.
Is the implementation finished?
No, far from it. Currently the only supported platform is Windows, using OpenGL drivers. In the coming weeks broader Windows support will be added by including Google’s ANGLE project, which was announced in March. Simply put, ANGLE is a bridge between OpenGL and DirectX, enabling a much larger share of Windows users to benefit from GPU acceleration. Support for Linux and Mac OS X is on its way, but isn’t yet stable enough to be included. Finally, be aware that when you enable accelerated compositing, video and WebGL will be disabled.
As for the current implementation, it’s quite rough. Putting a perspective on the <body> tag crashes the renderer; on any other element my scrollbar turns blue, and artifacts aren’t rare either. Still, the results look great and smooth, and barely take any CPU time. Compositing gets triggered by various animation effects, such as transparency and transforms, by usage of 3D Transforms, and by iframes under certain conditions. Some of the most important things still to come include Safe Browsing, which will make sure that effects such as 3D Transforms and WebGL cannot lock up your browser, fast paths for accelerated canvas and video elements, and support for Linux and Mac OS X.
Why doesn’t Chrome render a full page using the GPU?
Google’s position on full-page GPU rendering, such as Internet Explorer 9 and Mozilla Firefox are capable of doing, isn’t entirely clear yet. Keep in mind that GPU rendering isn’t everything: Opera comes incredibly close to Microsoft’s performance in the Flying Images demonstration without any GPU acceleration at all, and a large part of Chrome’s performance on that page can be attributed to its high-quality image scaling algorithms.
As Pete LePage, an Internet Explorer Program Manager, already noted: browser speed isn’t all about JavaScript. The same can be said about hardware acceleration: while it can provide significant performance improvements, other components such as the DOM, styling and images need to do their work before anything can be rendered in the first place.
One of the subjects Microsoft is giving a lot of attention in Internet Explorer 9 is graphics: extensive SVG implementation plans, various graphics-related CSS specifications, and GPU acceleration available by default. This isn’t entirely surprising: since Microsoft owns both the operating system and the graphics libraries used for the acceleration, and only has to support a limited number of configurations, implementation is a lot easier than creating a cross-platform implementation on top of third-party software.
Nevertheless, they are doing quite a good job at implementing the specifications, and with a lot of sense for the finer details. A good example can be found in their rendering of borders. One of the new CSS properties Internet Explorer 9 introduces is border-radius, as defined in the Backgrounds and Borders specification. While that specification still isn’t entirely clear on how to render mixed border-style connections, Microsoft’s latest implementation looks smooth and well-adapted.
Scalable Vector Graphics
SVG is becoming a widely implemented specification for scalable graphics. With a file format based on XML, confirmed support for in-line HTML rendering in three major browsers, a mature specification and lots of support from professional graphics editing software such as Adobe Illustrator, it’s destined to gain strongly in popularity in the near future.
Internet Explorer 9 will support the entire SVG 1.1 specification, with the exception of Fonts, Filters and SMIL. Microsoft believes that web fonts are already well served, mainly because the W3C Fonts group is making vast progress in standardizing a common format. Filters will not be included because Microsoft isn’t convinced they will see much use: Internet Explorer has supported various filters since version 4, yet barely anyone has used them. Finally, SMIL will be omitted because the company believes there would be too many different ways of handling animation. A fair point, as CSS 3 alone introduces two additional ways of adding animation to your webpage.
Per-pixel rendering using <canvas>
While the Internet Explorer team is making a lot of progress with the development of Internet Explorer 9, not only on the technical side but also through a development process that is more open than was the case with previous releases, their position on the HTML5 canvas element is simply vague. In early May, Giorgio Sardo, an Internet Explorer evangelist, already indicated that he would like to have support for the element in IE9, but the company has neither confirmed nor denied inclusion of the element in the new browser. Just yesterday, however, Dean Hachamovitch hinted that Microsoft still has some tricks up its sleeve: “We’re not talking at this point about whether we’re supporting canvas or not, but I’m smiling broadly. All your graphics needs will be taken care of, and I’m smiling broadly.”
My guess, something I have been saying ever since the release of the first Internet Explorer 9 Platform Preview, is that they certainly will be implementing canvas. I mostly base this on their attention to the graphical aspects of IE9, as well as on the comments made by various Microsoft employees. As Mozilla’s Brendan Eich already said about the subject: “Canvas is pretty small. It’s like your postscript level to 2D graphics.” I’m assuming that Microsoft knows their own APIs by heart, so implementing the 2D canvas standard shouldn’t be a problem at all. In fact, given the power of the Direct3D API, I’d say even WebGL is a likelihood. Microsoft is smiling broadly; let’s hope they allow us to do the same.
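The per-pixel access the heading above refers to is a large part of what makes the 2D context attractive. A minimal sketch of reading and writing raw pixels through the standardized API; the canvas id is an illustrative assumption:

```js
// Invert the colors of a canvas, pixel by pixel, via the 2D context.
var canvas = document.getElementById('surface'); // assumed element id
var ctx = canvas.getContext('2d');

var image = ctx.getImageData(0, 0, canvas.width, canvas.height);
var pixels = image.data; // RGBA values, one byte per component

for (var i = 0; i < pixels.length; i += 4) {
  pixels[i]     = 255 - pixels[i];     // red
  pixels[i + 1] = 255 - pixels[i + 1]; // green
  pixels[i + 2] = 255 - pixels[i + 2]; // blue
  // pixels[i + 3] is alpha; leave it untouched
}

ctx.putImageData(image, 0, 0);
```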