Tuesday, November 27, 2012

Geotools integration methodology

The Munich codesprint took place some time ago, and we had not had the opportunity to devote more time to the Geotools integration since then. We have finally been able to work on it for a week.
As stated from the start, the aim of the 2012 codesprint in Munich was not to finish the Geotools integration but to get an idea of how to tackle the task and to estimate its cost (see our Release Planning for details on how we want to tie this into the CE release cycle).
First of all, for those who have not dealt with the gvSIG code base before: we have found that this task is not easy at all. The gvSIG 1.x branch (the one CE is based on) has hundreds of thousands of lines of code, organized in ~50 different projects that not only fail to separate the user interface from the model completely but are also tightly coupled with each other. Little or no documentation, duplicated code, meaningless variable names and 17,000 warnings make handling the gvSIG source code a really discouraging experience. If you're a developer, I'm sure you get the idea.
After trying different approaches and restarting from scratch each time, we have finally found a method that has the following advantages:
  • It provides value immediately. We'll be able to produce gvSIG versions working on Geotools very soon. Of course, the initial versions will lack a lot of functionality the current gvSIG CE version has, but:
  1. everybody will be able to see the progress of the integration
  2. at a certain point in time, some people may be content with the status of the integration and start using it
  • It defines several more or less precise steps to follow. This is helpful when dealing with a beast of so many lines of code. The method is explained further down.
  • It can be done in parallel by several people. This is very important, since this initiative has received some attention and I would say it is likely that someone will join and share the effort. The development will take place in a Git repository on GitHub, so that forks and merges can happen easily.
This is the first method we have considered good enough to make the integration succeed. Of course, suggestions are welcome.
Let's explain the method. In gvSIG, there is a plugin system called "andami" that hosts about 25 extensions, which provide the user interface. These extensions use several libraries to implement their functionality. Instead of modifying the existing code base, we have created one more extension project called "main" and one more library project called "core".
The main extension will receive every piece of functionality from the old codebase, one at a time. As the libraries used by the old code are not available in main, there will be some compilation errors. To solve these compilation errors, the code that uses the old libraries will be changed to use the core library, adding functionality to core where needed. Of course, core exposes the Geotools API, since that is the main aim of the integration.
So schematically:
  • Move all extensions to main one by one
  • For each extension:
    • Include all classes in the same extension that are necessary to compile. Code can be removed if:
      • It supports the old 0.3 version (the method name ends with 03)
      • It is not necessary right now and, if it turns out to be needed, we'll be forced to add it back later. For example:
        • Public methods that are not called in main -> Either they will be called by code that is moved to main later, and we'll recover the erased method, or they will never be called, so removing them was the right decision.
        • Whole classes -> At the end we can compare the old and new codebases and detect the classes that are missing.
    • Fix the compilation errors of code using old libraries (not accessible anymore from main). This might require adapting the code to the new core library and even adding more functionality to it. In case of new functionality:
      • Discuss the best way to implement it on the mailing list
      • Implement
      • Add Javadoc
      • Add unit tests 
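The final comparison of the old and new codebases mentioned above could be automated with a few lines of Java. The following is only a minimal sketch: MissingClassReport and its methods are hypothetical names, not part of gvSIG, and it assumes that a class keeps its simple name when it is moved to main (so two classes with the same name in different packages are not told apart).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of gvSIG: reports classes that exist in the
// old codebase but have no counterpart in the new one.
class MissingClassReport {

    // Collects the simple names of all .java files below a source root.
    // Assumption: one top-level class per file, named after the file.
    static Set<String> classNames(Path sourceRoot) {
        try (Stream<Path> files = Files.walk(sourceRoot)) {
            return files
                    .filter(Files::isRegularFile)
                    .map(p -> p.getFileName().toString())
                    .filter(name -> name.endsWith(".java"))
                    .map(name -> name.substring(0, name.length() - ".java".length()))
                    .collect(Collectors.toSet());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns, in alphabetical order, the names found in oldClasses
    // that do not appear in newClasses.
    static SortedSet<String> missing(Set<String> oldClasses, Set<String> newClasses) {
        SortedSet<String> result = new TreeSet<>(oldClasses);
        result.removeAll(newClasses);
        return result;
    }
}
```

A call such as MissingClassReport.missing(classNames(oldRoot), classNames(mainRoot)) would then list the classes that were removed along the way and never recovered, where oldRoot and mainRoot stand for whatever source directories are being compared.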
Some considerations when following the method:
  • Our first priority is integrating Geotools. Please don't try to fix the extension code in main. While following the previous method you will see some things that can easily be fixed. Please don't fix them. You'll realize that there are many of these, and fixing them will keep you from the real aim, which is the integration. Let's try to make only minimal changes to the code that goes to main.
  • Do not add TODOs. They are ignored. A statement such as assert false : "todo message"; is much more effective. If you don't want to implement a method, try to remove it (following the rules stated before).
  • If you add some comment related to the integration, include the text "gtintegration" in it, so that we can process it later.
  • Please try to understand core before making changes, and use the mailing list to agree on changes before starting to code.
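As an illustration of the two conventions above, here is a minimal sketch of a stub that uses assert false together with a "gtintegration" comment, followed by a small helper that finds the marked lines later. LegendPanel, loadSymbols and GtIntegrationMarkers are hypothetical names chosen for the example. Keep in mind that the JVM only evaluates assert statements when assertions are enabled, for example by starting it with the -ea flag.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class moved into "main"; the names are illustrative only.
class LegendPanel {

    // gtintegration: symbol loading still relies on the old library
    void loadSymbols() {
        // Fails loudly when assertions are enabled (java -ea),
        // whereas a TODO comment would go unnoticed.
        assert false : "symbol loading not ported to core yet";
    }
}

// Hypothetical helper that collects the marked comments for later processing.
class GtIntegrationMarkers {

    static final String MARKER = "gtintegration";

    // Returns the 1-based numbers of the lines that contain the marker.
    static List<Integer> find(List<String> sourceLines) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < sourceLines.size(); i++) {
            if (sourceLines.get(i).contains(MARKER)) {
                hits.add(i + 1);
            }
        }
        return hits;
    }
}
```

The helper works on lines that have already been read into memory, so it can be fed from any source, such as the files of the main project.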