The compile cache is one of the main means to improve build performance of the generator. Among other things, like optimized versions of qooxdoo classes, it contains information about the dependency relations between them.
Dependency analysis is also one of the most time-consuming activities of the generator. That means: The more of this information you can re-use, the faster the generator will run. To come up with this information, the generator basically parses class code into a syntax tree, then traverses that tree to find symbols that refer to other classes (or to other symbols in general). For each symbol it finds, it has to decide whether that symbol will be available at run time. Built-in JavaScript symbols like Date or RegExp are known to be provided automatically by the JavaScript run time, so there is nothing the generator needs to care about. References to other qooxdoo classes, on the other hand, are recorded. This information is then used, for example, to derive the complete list of classes that need to go into the current build, so all dependencies are satisfied.
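To make the traversal step a bit more concrete, here is a minimal sketch. It is not the generator's actual code: the tree shape, the `BUILTINS` list, and the helper names `resolveClass` and `collectDependencies` are all assumptions made up for illustration, and the tree is written by hand instead of coming from a real parser.

```javascript
// Names the JavaScript run time provides; a real list would be much longer.
const BUILTINS = new Set(["Date", "RegExp", "Math", "Object", "Array"]);

// Hypothetical, hand-written tree standing in for the parsed form of:
//   new Date(); new foo.Baz(); qx.log.Logger.debug(...);
const tree = {
  type: "block",
  children: [
    { type: "new", children: [{ type: "symbol", name: "Date" }] },
    { type: "new", children: [{ type: "symbol", name: "foo.Baz" }] },
    { type: "call", children: [{ type: "symbol", name: "qx.log.Logger.debug" }] },
  ],
};

// Hypothetical set of class names known from scanning the libraries.
const knownClasses = new Set(["foo.Baz", "qx.log.Logger"]);

// Map a dotted symbol to the longest known class name that prefixes it.
function resolveClass(symbol, known) {
  const parts = symbol.split(".");
  for (let n = parts.length; n > 0; n--) {
    const candidate = parts.slice(0, n).join(".");
    if (known.has(candidate)) return candidate;
  }
  return null;
}

// Walk the tree, skip built-ins, and record references to known classes.
function collectDependencies(node, known, found = new Set()) {
  if (node.type === "symbol") {
    const root = node.name.split(".")[0];
    if (!BUILTINS.has(root)) {
      const cls = resolveClass(node.name, known);
      if (cls !== null) found.add(cls);
    }
  }
  for (const child of node.children || []) {
    collectDependencies(child, known, found);
  }
  return found;
}

const found = collectDependencies(tree, knownClasses);
console.log([...found]);
```

In this toy input, Date is dropped as a built-in, while foo.Baz and qx.log.Logger are recorded as dependencies.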
Another important aspect of the dependency information is whether a particular symbol is used by a given class at load time (when the class code is parsed and evaluated in the interpreter) or at run time (when instance methods are executed). This influences the ordering of classes, which is done by the generator and eventually burned into the loader script. Classes that are needed by any particular class at load time need to be loaded ahead of the requiring class. This is usually not true for run time dependencies, which just have to be loaded into the interpreter eventually.
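The ordering constraint can be illustrated with a small sketch. This is a plain depth-first topological sort over load-time edges, not necessarily how the generator sorts classes; the `deps` table and the `loadOrder` function are hypothetical names chosen for the example.

```javascript
// Hypothetical dependency table: for each class, the classes it needs
// at load time and at run time.
const deps = {
  "foo.Bar": { load: ["qx.core.Object"], run: ["foo.Baz"] },
  "foo.Baz": { load: ["qx.core.Object"], run: ["foo.Bar"] },
  "qx.core.Object": { load: [], run: [] },
};

// Depth-first topological sort that follows load-time edges only.
// Run-time edges are ignored for ordering purposes.
function loadOrder(table) {
  const order = [];
  const state = new Map(); // class name -> "visiting" | "done"

  function visit(name) {
    if (state.get(name) === "done") return;
    if (state.get(name) === "visiting") {
      throw new Error("cyclic load-time dependency at " + name);
    }
    state.set(name, "visiting");
    for (const dep of table[name].load) visit(dep);
    state.set(name, "done");
    order.push(name); // emitted only after all its load-time deps
  }

  Object.keys(table).forEach(visit);
  return order;
}

const order = loadOrder(deps);
console.log(order);
```

Note that foo.Bar and foo.Baz refer to each other at run time without causing a problem: only a cycle among load-time dependencies would make a valid order impossible.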
Now, determining whether a given dependency is needed at load time or run time can be a tricky question. There are obvious examples. A class referred to in the extend entry of the class map signifies the current class’s super-class; it has to be known to the interpreter ahead of the moment when the extend key is parsed.
qx.Class.define("foo.Bar", {
  extend : qx.core.Object,
  ...
});
Here, qx.core.Object needs to be loaded prior to foo.Bar. By contrast, a method of another class that is called in one of the current class’s instance methods only needs to be available at run time. It is not necessary at loading time of the class, as the method body is parsed, but its references are not yet evaluated.
baz : function() {
  var a = new foo.Baz(); // <- calling another class' constructor
  ...
}
Caching Dependencies
As with every cache implementation, a crucial aspect is that of cache invalidation: knowing when any particular item in the cache is out-of-date and needs to be re-calculated. Using an outdated cache object causes wrong results; invalidating a cache object that is still fresh causes unnecessary re-calculation and hurts performance. This is particularly challenging when we look at transitive load dependencies of a class. Some dependencies of a class have to be explored recursively. Here is a simple example:
qx.Class.define("foo.Bar", {
  ...
  statics : {
    bez : new foo.Baz(),
    ...
  }
  ...
A static member of the class is initialized with an instance of another class. So this other class becomes a load time requirement of the current class. Fair enough. But the story doesn’t end here. foo.Baz has to be available when the JS interpreter comes to parsing foo.Bar. As foo.Baz has been loaded before, its constructor method can be called. But what about the dependencies of this constructor method?! Those dependencies have been considered run time dependencies when foo.Baz was analysed. Now for the foo.Bar class, those suddenly become load requirements.
This means that the generator has to follow those recursive dependencies during analysis of foo.Bar. This recursive analysis is particularly expensive. Once completed, the result is cached. The next time an application needs to be built with this class, those dependencies are retrieved from the cache. But wait: are they still fresh? The foo.Bar class might not have changed on disk, but what about the other classes, those discovered while analysing the dependencies of the static bez member?! A change in any of them might result in new or altered dependencies coming in for our foo.Bar class. So to validate the cached result, the generator needs to go through the required classes of the recursive dependencies and check if they have changed.
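Here is a rough sketch of that idea, under simplifying assumptions. The `classes` table, the `calledAtLoad` field, and both functions are hypothetical; in particular, the sketch treats every run time dependency of a class that is called at load time as reachable, which is coarser than analysing the one constructor that is actually invoked.

```javascript
// Hypothetical per-class analysis results.
//   load:         direct load-time dependencies
//   calledAtLoad: classes whose code is executed while this class loads
//   run:          run-time dependencies of the class's own code
const classes = {
  "foo.Bar": { load: ["foo.Baz"], calledAtLoad: ["foo.Baz"], run: [] },
  "foo.Baz": { load: [], calledAtLoad: [], run: ["foo.Util"] },
  "foo.Util": { load: [], calledAtLoad: [], run: ["foo.Log"] },
  "foo.Log": { load: [], calledAtLoad: [], run: [] },
};

// Promote the run-time dependencies of everything executed at load time
// to load requirements, following them recursively.
function loadRequirements(name, table) {
  const required = new Set(table[name].load);
  const pending = [...table[name].calledAtLoad];
  const seen = new Set();
  while (pending.length > 0) {
    const called = pending.pop();
    if (seen.has(called)) continue;
    seen.add(called);
    for (const dep of table[called].run) {
      required.add(dep);
      pending.push(dep); // code run at load time may call into dep too
    }
  }
  return required;
}

// A cached result is only fresh if none of the classes that were
// consulted to compute it has been modified after it was cached.
function isCacheEntryFresh(entry, mtimes) {
  return entry.consulted.every((cls) => mtimes[cls] <= entry.cachedAt);
}

const requirements = loadRequirements("foo.Bar", classes);
const entry = {
  consulted: ["foo.Bar", ...requirements],
  cachedAt: 2000,
};

const unchanged = { "foo.Bar": 1000, "foo.Baz": 1000, "foo.Util": 1000, "foo.Log": 1000 };
const utilEdited = { ...unchanged, "foo.Util": 3000 };

console.log([...requirements]);
console.log(isCacheEntryFresh(entry, unchanged), isCacheEntryFresh(entry, utilEdited));
```

The second check shows the point of the paragraph above: foo.Bar itself is untouched, yet its cached dependencies are stale because foo.Util, a class reached only through recursion, has changed.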
Checking the status of each class is straightforward. Its last-modified time stamp from the file system is compared to the time stamp of the cached information. In the first implementation this check went directly to the file system, using an operating system stat() call. But with an increasing number of classes in an app, and an increasing number of detected recursive dependencies, the number of stat() calls increased as well. This began to show a significant impact on generator performance.
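For illustration, a direct file system check might look roughly like the following. This sketch uses Node's fs module purely to stay in the language of the examples above; it says nothing about how the generator itself is implemented, and the file names, time stamps, and function names are invented for the demo.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

let statCalls = 0; // counts how often we hit the file system

// One stat call per class file, every time freshness is checked.
function isFreshOnDisk(filePath, cachedAtMs) {
  statCalls += 1;
  return fs.statSync(filePath).mtimeMs <= cachedAtMs;
}

function allFreshOnDisk(filePaths, cachedAtMs) {
  return filePaths.every((p) => isFreshOnDisk(p, cachedAtMs));
}

// Demo set-up: three placeholder class files with an old time stamp.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "cache-demo-"));
const files = ["Bar.js", "Baz.js", "Util.js"].map((n) => path.join(dir, n));
const old = new Date("2020-01-01T00:00:00Z");
for (const f of files) {
  fs.writeFileSync(f, "// placeholder class file\n");
  fs.utimesSync(f, old, old);
}

const cachedAtMs = new Date("2021-01-01T00:00:00Z").getTime();
const freshBefore = allFreshOnDisk(files, cachedAtMs);

// Simulate an edit to one file after the cache was written.
const newer = new Date("2022-01-01T00:00:00Z");
fs.utimesSync(files[1], newer, newer);
const freshAfter = allFreshOnDisk(files, cachedAtMs);

fs.rmSync(dir, { recursive: true, force: true });
console.log(freshBefore, freshAfter, statCalls);
```

Every cache lookup pays for one file system call per consulted class, which is exactly the cost that grows with the number of classes and recursive dependencies.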
Checking Freshness
After some head-scratching I decided to be content with a comparison time taken at generator start-up. I reasoned that it is rather unlikely that somebody changes source files midway through a generator run and expects the changes to be picked up immediately, with everything remaining consistent. So times taken at start-up seemed good enough for freshness checking. The first solution was to compare against the cumulative last-modified time stamp of any used library (the time stamp of the most recently changed file among all the files in the library). This has to be calculated anyway, as (you guessed it) the result of library scans (which classes are in it, which resources, …) is also cached.
This had a very nice effect on building the Demobrowser, with its hundreds of demo apps and a lot of cache checking when building every one of them. But the downside also showed. For iterative development, when you changed just one file in a library, all contained classes were suddenly considered newer, basically invalidating any cached information relating to any of them. That was grossly overshooting, and was like using no cache at all: everything was re-calculated with the next generator run.
So the best solution so far is to record the last-modified time stamps of all classes when scanning a library, and then compare the cache against each recorded class time stamp individually. This continues the reasoning that generator start-up times are good enough, without penalizing iterative development.
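The difference between the two strategies can be sketched in a few lines. The `scanned` snapshot, the numeric time stamps, and the function names below are assumptions for illustration only; the snapshot stands in for whatever the library scan records at start-up.

```javascript
// Hypothetical snapshot taken once while scanning a library at start-up:
// class name -> last-modified time stamp.
const scanned = {
  "foo.Bar": 1000,
  "foo.Baz": 1000,
  "foo.Util": 5000, // edited after the cache entry below was written
};

const cachedAt = 2000; // time stamp of the cached information

// Strategy 1: one cumulative time stamp for the whole library.
function libraryTimestamp(snapshot) {
  return Math.max(...Object.values(snapshot));
}

function freshByLibrary(snapshot, cachedAtTime) {
  return libraryTimestamp(snapshot) <= cachedAtTime;
}

// Strategy 2: compare against the recorded time stamp of each class
// that the cached result actually depends on.
function freshByClass(classNames, snapshot, cachedAtTime) {
  return classNames.every((name) => snapshot[name] <= cachedAtTime);
}

// A cached result that depends only on foo.Bar and foo.Baz.
const consulted = ["foo.Bar", "foo.Baz"];

const byLibrary = freshByLibrary(scanned, cachedAt);
const byClass = freshByClass(consulted, scanned, cachedAt);
const byClassWithUtil = freshByClass([...consulted, "foo.Util"], scanned, cachedAt);

console.log(byLibrary, byClass, byClassWithUtil);
```

With the library-wide time stamp, the single edit to foo.Util invalidates the entry even though it never consulted that class. The per-class comparison keeps the entry and only discards results that really depend on foo.Util, and neither strategy touches the file system again after the initial scan.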