Columbus results

for the Collective Demonstration of Reverse Engineering Tools

Version: September 20, 2001.

Rudolf Ferenc
University of Szeged
ferenc@cc.u-szeged.hu
Arpad Beszedes
University of Szeged
beszedes@cc.u-szeged.hu
Tibor Gyimothy
University of Szeged
gyimi@cc.u-szeged.hu

Section 1. Introduction

Columbus is a reverse engineering framework developed jointly by the Research Group on Artificial Intelligence in Szeged and the Software Technology Laboratory of Nokia Research Center. It can analyze large C/C++ projects and extract their UML class models as well as conventional call graphs. The main motivation for developing Columbus was to create a general framework that combines a number of reverse engineering tasks and provides a common interface for them. Columbus therefore supports project handling, data extraction, data representation, data storage, filtering and visualization.

Team members:

  • Rudolf Ferenc, MSc. - Developer of Columbus and the AST.
  • Arpad Beszedes, MSc. - Developer of CAN (C++ Analyzer).
  • Tibor Gyimothy, PhD. - Supervisor.

Section 2. Experience Report

Columbus analyzes preprocessed C++ source code (for non-preprocessed files it invokes an external preprocessor), so our first step was to obtain Borland C++ Builder 5. After successfully preprocessing Sortie, we began the analysis.

The first attempt did not bring the expected results, because Sortie's GUI uses the VCL (Visual Component Library), which relies heavily on Borland's C++ extensions (e.g. keywords like '__property' and '__published'). Columbus, however, handled "only" ANSI C++ and Microsoft's extensions, so we had to extend it to handle Borland's dialect.

In the meantime we put considerable effort into improving our C++ schema. We learned a lot from our joint paper with Susan Elliott Sim, Richard C. Holt and Rainer Koschke: "Towards a Standard Schema for C/C++" (to be presented at WCRE 2001). We also had productive discussions with Jürgen Ebert, Andreas Winter and Volker Riediger from the University of Koblenz-Landau. The resulting C++ schema seems to be a lot more useful than the old one (it will be documented and published shortly). We extended Columbus as well, so it can now export its AST into GXL according to our C++ schema.

As soon as the C++ analyzer was ready, we parsed the Sortie source code and exported the results into GXL. The result is a 66 MB file (2546 classes, 15235 functions and 14054 attributes) that contains all information, including the STL and VCL. (The file was validated against the GXL DTD.) Because it is very hard to deal with this amount of data, we filtered the AST to include only the classes from the Sortie source code. The result is a 3 MB file (69 classes), which is unfortunately not valid, because it contains references to types that come from the standard headers and have been filtered out.

Both files are attached (sortie-gxl-in-columbus-schema-full.zip and sortie-gxl-in-columbus-schema-filtered.zip).

Please note that Columbus does not yet deal with function bodies (statements and expressions), although this is ongoing work. The GXL export is also not complete: it does not export template parameter lists (all type references to template parameters point to a dummy typedef).
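
The validity problem of the filtered export can be illustrated with a minimal Python sketch. It assumes only the basic GXL element names (gxl, graph, node, edge with 'from' and 'to' attributes); the node ids, the "name" attribute and the sample content are hypothetical and are not taken from the Columbus schema. The sketch removes the nodes that are not kept by the filter and then lists the edges whose endpoints no longer exist, which is the kind of dangling reference that makes the filtered file fail validation.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written GXL fragment. The ids and the "name" attribute are
# illustrative assumptions, not part of the Columbus C++ schema.
SAMPLE_GXL = """<gxl>
  <graph id="demo" edgeids="true">
    <node id="c1"><attr name="name"><string>SortieClass</string></attr></node>
    <node id="c2"><attr name="name"><string>std::string</string></attr></node>
    <node id="a1"><attr name="name"><string>label</string></attr></node>
    <edge id="e1" from="c1" to="a1"/>
    <edge id="e2" from="a1" to="c2"/>
  </graph>
</gxl>"""


def filter_graph(graph, keep_ids):
    """Remove every node whose id is not in keep_ids; edges are left untouched."""
    for node in list(graph.findall("node")):
        if node.get("id") not in keep_ids:
            graph.remove(node)


def dangling_edges(graph):
    """Return the ids of edges whose 'from' or 'to' node is missing from the graph."""
    present = {node.get("id") for node in graph.findall("node")}
    return [
        edge.get("id")
        for edge in graph.findall("edge")
        if edge.get("from") not in present or edge.get("to") not in present
    ]


root = ET.fromstring(SAMPLE_GXL)
graph = root.find("graph")

before = dangling_edges(graph)               # full graph: every endpoint exists
filter_graph(graph, keep_ids={"c1", "a1"})   # drop the library type node c2
after = dangling_edges(graph)                # e2 now points to a missing node

print("dangling before filtering:", before)
print("dangling after filtering:", after)
```

One possible remedy, not implemented in Columbus as far as this report states, would be to keep empty stub nodes for the library types that are still referenced, so that every edge endpoint continues to resolve.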

Section 3. Collaboration Partners

Because we joined the collaborative demonstration rather late, we did not have collaboration partners. We are making our results available to everybody for further analyses.

Section 4. Solution to tasks

The task of Columbus is to parse the C++ source code and to produce input data for other tools (optionally filtering the data). Therefore, solutions for reengineering the Sortie system consist of combining Columbus with such tools (e.g. visualisers and remodularisers).

Participation at WCRE 2001

We will demonstrate Columbus at the Tools Fair at WCRE 2001.