qooxdoo on a diet

You may have noticed that we’ve been busy working on a big topic recently: Improving UI responsiveness by reducing the amount of DOM elements created for each desktop widget.

Those additional elements were necessary in older browsers to allow for fully-featured rich user interfaces. Now we can take advantage of native features in modern browsers while trying to implement graceful degradation in older browsers and discontinuing support for some legacy browsers.

Every desktop widget now consists of just one content element. Before, each widget consisted of at least two (often even 3-5) DOM elements and their corresponding JavaScript object representations. As a result, the element and object count is reduced significantly, by a ratio of roughly 3:1.

3D DOM View: master vs. diet

qooxdoo Widget Browser in Firefox's 3D DOM view: master branch (left) vs. diet branch

This work happens in a feature branch appropriately named 'diet', which we now consider ready for public testing. We've prepared a document explaining the changes in detail, which also includes a migration guide for anyone willing to test the modifications. If you don't use your own custom themes, layouts or widgets, it should be particularly easy to build your app against the diet branch. Give us feedback via the usual channels, e.g. the mailing list or Bugzilla.

If you don’t have the time to migrate your applications, you can still help us out by testing the framework’s demo applications in your favorite (modern) browser.

We’re planning to merge the diet branch into master in about two weeks. Anyone currently using the master branch is strongly encouraged to test their application using diet before this time to prevent any unpleasant surprises.

Since these are major modifications, including some API changes, they will be published with the next major release, qooxdoo 3.0, this summer.

Generator: Cache Invalidation – Revisited

The compile cache is one of the main means of improving the generator's build performance. Among other things, such as optimized versions of qooxdoo classes, it contains information about the dependency relations between those classes.

Dependency analysis is also one of the most time-consuming activities of the generator. That means: the more of this information you can re-use, the faster the generator will run. To come up with this information, the generator basically parses class code into a syntax tree, then traverses that tree to find symbols that refer to other classes (or to other symbols in general). For each symbol it finds, it has to decide whether that symbol is available at run time. Built-in JavaScript symbols like Date or RegExp are known to be provided automatically by the JavaScript run time, so the generator doesn't need to care about them. References to other qooxdoo classes, however, are recorded. This information is then used, for example, to derive the complete list of classes that need to go into the current build so that all dependencies are satisfied.
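
As a rough illustration of that classification step, here is a deliberately simplified sketch in JavaScript (the generator itself is written in Python and works on a full syntax tree; the names below are made up):

// Symbols the JavaScript run time provides by itself:
var BUILTINS = { Date: true, RegExp: true, Math: true, Array: true };

// Given the symbols found while traversing a class's code,
// keep only those that have to be satisfied by other classes.
function classifySymbols(symbols) {
  var deps = [];
  for (var i = 0; i < symbols.length; i++) {
    var topLevel = symbols[i].split(".")[0];
    if (!BUILTINS[topLevel]) {
      deps.push(symbols[i]);  // recorded as a dependency of the class
    }
  }
  return deps;
}

classifySymbols(["Date", "qx.core.Object", "foo.Baz"]);
// -> ["qx.core.Object", "foo.Baz"]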

Another important aspect of the dependency information is whether a particular symbol is used by a given class at load time (when the class code is parsed and evaluated by the interpreter) or at run time (when instance methods are executed). This influences the ordering of classes computed by the generator, which is eventually burned into the loader script. Classes that a particular class needs at load time have to be loaded ahead of the requiring class. (This is usually not true for run-time dependencies, which just have to be loaded into the interpreter eventually.)

Now, determining whether a given dependency is needed at load time or at run time can be a tricky question. There are obvious examples. A class referred to in the extend entry of the class map signifies the current class's super-class; it has to be known to the interpreter by the time the extend key is evaluated.

qx.Class.define("foo.Bar", {
  extend : qx.core.Object,
  ...
});

Here, qx.core.Object needs to be loaded prior to foo.Bar. In contrast, a method of another class that is called from one of the current class's instance methods only needs to be available at run time. It is not needed at load time: at that point the method body is parsed, but the references inside it are not yet evaluated.

  baz : function() {
    var a = new foo.Baz();  // <- calling another class' constructor
    ...
  }

Caching Dependencies

As with every cache implementation, a crucial aspect is cache invalidation: knowing when a particular item in the cache is out of date and needs to be re-calculated. Using an outdated cache object causes wrong results; invalidating a cache object that is still fresh causes unnecessary re-calculation and hurts performance. This is particularly challenging when we look at transitive load dependencies of a class, where some dependencies have to be explored recursively. Here is a simple example:

qx.Class.define("foo.Bar", {
  ...
  statics : {
    bez : new foo.Baz(),
    ...
  }
  ...

A static member of the class is initialized with an instance of another class, so this other class becomes a load-time requirement of the current class. Fair enough. But the story doesn't end there: foo.Baz has to be available when the JS interpreter comes to parsing foo.Bar. As foo.Baz has been loaded before, its constructor method can be called. But what about the dependencies of this constructor method? Those dependencies were considered run-time dependencies when foo.Baz was analysed. Now, for the foo.Bar class, they suddenly become load-time requirements.
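
To make that concrete, suppose foo.Baz looked like this (a hypothetical class, purely for illustration):

qx.Class.define("foo.Baz", {
  extend : qx.core.Object,

  construct : function() {
    this.base(arguments);
    // For foo.Baz itself this is only a run-time dependency ...
    this.__helper = new foo.Qux();
  }
});

Because foo.Bar invokes this constructor while its own class map is being evaluated, foo.Qux now has to be loaded before foo.Bar as well, even though foo.Baz alone never needed it at load time.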

This means that the generator has to follow those recursive dependencies during the analysis of foo.Bar. This recursive analysis is particularly expensive. Once completed, the result is cached. The next time an application needs to be built with this class, those dependencies are retrieved from the cache. But wait – are they still fresh? The foo.Bar class might not have changed on disk, but what about the other classes, those discovered while analysing the dependencies of the static bez member? A change in any of them might result in new or altered dependencies for our foo.Bar class. So to validate the cached result, the generator needs to go through the classes involved in the recursive dependencies and check whether they have changed.

Checking the status of each class is straightforward: its last-modified time stamp from the file system is compared to the time stamp of the cached information. In the first implementation this check went directly to the file system, using an operating system stat() call. But with an increasing number of classes in an app, and an increasing number of detected recursive dependencies, the number of stat() calls increased as well. This began to have a significant impact on generator performance.
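
Conceptually, the naive check looked something like this (a sketch in JavaScript for readability; the generator itself is written in Python, and the names here are invented):

var fs = require("fs");

// One file-system stat per class, every time a cached
// dependency entry is validated.
function isStale(classPath, cachedAt) {
  var mtime = fs.statSync(classPath).mtime.getTime();
  return mtime > cachedAt;  // file changed after the cache entry was written
}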

Checking Freshness

After some head-scratching I decided to be content with a comparison time taken at generator start-up. I reasoned that it is rather unlikely that somebody changes source files midway through a generator run, expects the changes to be picked up immediately, and expects everything to remain consistent :-) . So time stamps taken at start-up seemed good enough for freshness checking. The first solution was to compare against the cumulative last-modified time stamp of each used library (the time stamp of the most recently changed file among all files in the library). This has to be calculated anyway, as – you guessed it – the result of library scans (which classes are in it, which resources, …) is also cached.

This had a very nice effect on building the Demobrowser, with its hundreds of demo apps and a lot of cache checking when building each one of them. But the downside also showed: for iterative development, when you changed just one file in a library, all contained classes were suddenly considered newer, basically invalidating any cached information relating to any of them. That was grossly overshooting; it was like using no cache at all, and everything was re-calculated on the next generator run.

So the best solution so far is to record the last-modified time stamps of all classes when scanning a library, and then to compare the cache against each recorded class time stamp individually. This continues the reasoning that generator start-up times are good enough, without penalizing iterative development.
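
Sketched in the same style as above (again with invented names), the approach boils down to this:

var fs = require("fs");

// Done once per generator run, while scanning a library:
function recordTimestamps(classPaths) {
  var mtimes = {};
  for (var i = 0; i < classPaths.length; i++) {
    mtimes[classPaths[i]] = fs.statSync(classPaths[i]).mtime.getTime();
  }
  return mtimes;
}

// A cached dependency result is only reused if none of the classes
// it involves is newer than the cache entry itself:
function cacheIsFresh(cacheEntry, mtimes) {
  for (var i = 0; i < cacheEntry.classes.length; i++) {
    if (mtimes[cacheEntry.classes[i]] > cacheEntry.writtenAt) {
      return false;
    }
  }
  return true;
}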

JavaScript Array Performance Oddities (Characteristics)

Update: We’ve added Opera 10.50 tests and rerun all tests on a MacBook Pro. Further links to the code and the running tests have been added.

When working with a programming language we often make assumptions about the performance of certain language constructs. However, as good engineers we know that whenever we make performance assumptions without measurement we drown a kitten.

When working with JavaScript arrays two independent assumptions are prevalent:

  1. They behave like C/Java arrays. Iteration over the contents and accessing data by index is fast – deleting entries and extending the size is expensive.
  2. They behave like JavaScript objects. Iteration over the contents and accessing data by index is slow – deleting entries and extending the size is fast.

It turns out that, depending on the context, both assumptions can be true or false. More on this later.

JS Arrays

Arrays are one of JavaScript's core data structures. However, arrays in JavaScript are a totally different beast than arrays in most other languages. In JavaScript, arrays are a special case of objects. In fact they inherit from Object.prototype, which also explains why typeof([]) == "object". The keys of the object are non-negative integers, and in addition the length property is always updated to contain the largest index + 1. This supports the second assumption (array == object).
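
A quick console session illustrates this:

var ar = [];
typeof ar;             // "object"
ar instanceof Object;  // true - arrays sit on the normal object prototype chain

ar[4] = "x";           // write to an arbitrary index ...
ar.length;             // 5 - length is kept at largest index + 1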

Our Use Case

In our use case we wanted to improve our custom event layer. We store the list of event listeners in an array like this:

listeners.push({
  id: id,
  callback: callback,
  context: self
});

Working with the array is quite efficient, but removing items by id is a linear operation. Since arrays are just special cases of objects, we thought that rewriting the code to use objects should speed up deletions while maintaining all other performance characteristics:

listeners[id] = {
  callback: callback,
  context: self
}
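
For comparison, this is roughly what removal by id looks like in the two representations (a simplified sketch, not our actual event layer code):

// Array representation: removing by id is a linear search plus splice().
function removeFromArray(listeners, id) {
  for (var i = 0; i < listeners.length; i++) {
    if (listeners[i].id === id) {
      listeners.splice(i, 1);
      return;
    }
  }
}

// Object representation: removing by id is a single delete.
function removeFromMap(listeners, id) {
  delete listeners[id];
}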

The somewhat unexpected result was that while deletions became faster the overall performance became much worse.

Array vs. Object

So while arrays and objects are conceptually almost the same, most JavaScript engines treat them very differently.

var ar = [];
for (var i = 0; i < 500000; i++) {
  ar[i] = i;
}

var map = {};
for (var i = 0; i < 500000; i++) {
  map[i] = i;
}

These two loops are almost identical but the first one can be up to 200 times faster (Firefox 3.6) than the second one.

So we know that arrays are treated differently by the JavaScript engines, but under which circumstances do the optimizations kick in? To test this, we kept the first loop but added one of the following statements before the loop:

  1. ar[10] = 10; // use the array as a small sparse array
  2. ar[50000000] = 10; // use the array as a huge sparse array
  3. ar.monkey = 10; // treat the array as an object

In addition, we tested the same code but initialized the array backwards (counting down from 500,000), the first time without any further modifications and the second time initializing the array with new Array(500000) instead of []. The constructor parameter to Array is supposed to give the JavaScript engine a hint about the expected array size. Here is the outcome of the test runs across various browsers:
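
The variations, sketched here for clarity (the exact test code is linked below), looked roughly like this:

// Backwards initialization:
var ar = [];
for (var i = 500000 - 1; i >= 0; i--) {
  ar[i] = i;
}

// Backwards initialization with a size hint:
var ar2 = new Array(500000);
for (var i = 500000 - 1; i >= 0; i--) {
  ar2[i] = i;
}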

Array initialization

The test code is available on GitHub. Click here to run the tests locally.

Analysis

The first observation is that there are huge differences between the browsers. The second observation is that depending on the context arrays are either extremely fast or as slow as objects.

  • WebKit: In WebKit there is a huge difference between objects and arrays. Furthermore, it is the only engine that can optimize the huge sparse array case. On the other hand, WebKit doesn't like the reverse case at all. To me this looks like a bug in the implementation.
  • Chrome 5 (V8): The V8 JavaScript engine of Chrome does an amazing job. With the exception of the huge sparse array case, all tests took almost the same time. From a performance point of view there is almost no difference between objects and arrays.
  • Firefox: We tested two Firefox versions: Firefox 3.0 with the old SpiderMonkey engine and Firefox 3.6 with the tracing JIT TraceMonkey. The characteristics of both are very similar, with Firefox 3.6 being a lot faster. If the array is used purely as an array, Firefox 3.6 is the absolute winner. In all other cases performance degrades to the much slower object case.
  • Opera: Since Opera 10.50 has a completely new JavaScript engine, we tested both the Opera 10.50 beta and the current stable version 10.10. The two versions show very different performance characteristics, which illustrates how much the engine has changed. In Opera 10.10 there is almost no difference between objects and arrays, which indicates that it has no special optimizations for arrays; the overall performance is still pretty decent. Opera 10.50 is a huge improvement: its performance is similar to V8, and the common tasks are even faster than on V8. Like V8 it has a peak in the huge sparse array test.

qooxdoo strong in Taskspeed benchmark

Together with the release of Dojo 1.3, information was disclosed about Taskspeed, a new JavaScript library benchmark. Taskspeed was created and is maintained by Dojo's project lead Peter Higgins and is based on the well-known Slickspeed benchmark for CSS selector engines. Taskspeed compares the DOM manipulation performance of various JavaScript frameworks. The first published results see Dojo as a clear winner, leaving MooTools, jQuery and Prototype behind. It's interesting to note that the respective framework developers/authors supplied the tests for their own library, so the code should reflect the best practice and coding style of each library.

Taskspeed Firefox 3 results

Taskspeed IE7 results

Of course, we wanted to know how qooxdoo compares in this "framework-neutral" test suite. qooxdoo's own DOM layer is not that old: it was released in August 2008 with qooxdoo 0.8, by splitting all cross-browser code out of the widget system. The last missing pieces were a CSS selector engine and a nice way to work with collections of elements; both features were added in the qooxdoo 0.8.2 release that shipped only four weeks ago. While we were quite confident, we did not really know what to expect. But after complementing Taskspeed with the qooxdoo tests and looking at the results, we were stunned. The results indicate that qooxdoo is on par with the fastest framework (Dojo 1.3) in most browsers except IE. On IE, qooxdoo is by far the fastest framework.

Across browsers and frameworks, qooxdoo gained the highest ranks on all versions of IE (i.e. 6, 7 and 8), and made its lowest mark by coming out third on Firefox 3.0. This exceptional IE performance also leads to the best overall score. The IE results are a big surprise, and we'll try to investigate what we do differently (better) than all the other JavaScript libraries.

As always, performance tests should be taken with a grain of salt. It's hard to judge whether all implementations are really equivalent. For example, in the jQuery tests John Resig implemented all tests in a pure jQuery way. There are obvious optimizations he consciously omitted, but the code apparently reflects the genuine jQuery coding style. There is no official qooxdoo way to work with the DOM yet, so we modeled our tests closely after the Dojo and jQuery tests. Please note that the current tests don't run on a vanilla qooxdoo 0.8.2, but on the latest trunk. That's because the tests exposed two bugs in the framework, which had to be fixed. Most importantly, however, no code was changed in the qooxdoo repository to make it perform better in the Taskspeed test suite.

Analysis

So why does qooxdoo perform so well compared to other frameworks?

  1. We build a GUI toolkit on top: Our requirements for a DOM layer are different from those of pure DOM toolkits like jQuery, Prototype or MooTools. We build a whole GUI toolkit on top of the DOM layer, and DOM performance is the weakest link in the chain: if DOM operations are slow, the whole widget stack is slow. For this reason we try to keep our DOM layer as thin and as fast as possible. We do just enough to cover all cross-browser issues, but not much more. In qooxdoo 0.8 we use it mostly as an internal API to build our widgets, so we had no incentive to add any syntactic sugar or convenience API that could have slowed things down.
  2. Sizzle: For the GUI toolkit we have no use for a CSS selector engine. For this reason we were probably the last JavaScript framework to add this feature to its feature list. We always intended to adopt an existing engine, and John Resig's excellent Sizzle engine was a perfect match. Since Taskspeed requires the use of CSS selectors in a couple of tests, our good performance is in part due to Sizzle's good performance. But since the latest jQuery 1.3.2 uses an identical Sizzle implementation and is still rather slow in Taskspeed, there is obviously much more to it than pure selector speed.
  3. Collection API: This is the final link, which connects Sizzle with qooxdoo's procedural DOM API. The idea behind the Collection API is to convert the result of a Sizzle query into a collection instance, which provides all the chaining goodness of jQuery. The methods offered by the Collection API match most of the jQuery API and map directly onto our existing DOM layer. One trick used there is to subclass the native JavaScript Array. This way we can conveniently access the matched elements using array accessors, and we can leverage native Array methods like filter, indexOf or map where available; a simplified sketch of this pattern follows below.
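
Here is the simplified sketch of the "subclass the native Array" pattern referenced in point 3 above. It only illustrates the idea and is not qooxdoo's actual Collection implementation:

function Collection(elements) {
  // copy the matched elements into this array-like object;
  // push() is inherited from Array via the prototype chain
  for (var i = 0; i < elements.length; i++) {
    this.push(elements[i]);
  }
}

// Inheriting from an Array instance gives us numeric indexing, length
// bookkeeping and (where supported) native filter/indexOf/map for free.
Collection.prototype = new Array();

// A chainable convenience method mapping onto a lower-level DOM operation:
Collection.prototype.setStyle = function(name, value) {
  for (var i = 0; i < this.length; i++) {
    this[i].style[name] = value;
  }
  return this;  // return the collection itself, jQuery-style
};

// Hypothetical usage:
var headings = new Collection(document.getElementsByTagName("h1"));
headings.setStyle("color", "red").setStyle("margin", "0");

Because the instances behave like real arrays, a Sizzle result can be wrapped once and then indexed, iterated and chained without further copying.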

We plan to position qooxdoo’s DOM level API as a standalone part of the qooxdoo framework. These results show that we have a good base and encourage us to move forward in this direction.

Please help the Taskspeed team to collect more data by running the tests yourself and submitting the results when asked for in a confirmation box.