Visualisation Research Group Dept. Computer Science, University of Durham Tel: +44 (0)191 374 2554 Fax: +44 (0)191 374 2560 C.R.Knight@durham.ac.uk |
1. IntroductionThe tool that has been used for this work is "GraphTool". This is a graph layout tool that is used internally in Computer Science at the University of Durham. It was (re)written in Java in 1999 by a VRG researcher who now works in industry. Much of this work has involved the incorporation of GXL into the tool so that it can read and display GXL graphs, but also generate GXL files from the graph in memory. The tool already supported three other file formats, all developed at Durham, and the last during the Java rewrite of the tool. This file format is quite verbose and some of the attributes of this format that dealt with graphics were also included into the GXL generation of GraphTool so that they can be used in reloading or in other analysis tools.
Main Functionality
Team She has had various coding experiences. During vacation work this involved the design and development of a works orders and tracking system for a building firm and an invoice and order reconciling software that took UNIX and HPX downloads (results of SQL) and then incorporated them with analysis for a printing firm. She has been using Java since late 1996 and has written two program comprehension parsers for the language. The first was based on LALR and the second on LL(k) technology. The latter of these works with the current release of the language and is the one that is considered "in-use" internally. The various visualisation demonstrators created over the past four years have been written in C and Java. Much of the summer was spent incorporating GXL into GraphTool. This required understanding the existing Java code (over 60 classes) and then making the necessary changes. Some prior user level experience with the tool helped. Having now been through the source code, much of the functionality is understood. 2. Experience ReportMost of the experience of this work to date (since June 2001) has been at a coding level by incorporating GXL into an existing tool and in the process creating code that can be used for (future) various visualisations. Whilst this is very useful longer term work it has meant that the real analysis of SORTIE has suffered. And because of the excessive amount of global conference attendance this summer by the single investigator!
Analysis Because GraphTool is a generic graph application then there is no particular layout or analysis incorporated into the tool for use when looking at software and systems. This meant that the GXL graphs loaded with all nodes defaulted to a position of 0,0. Automatic layout provided a start, but to get some of the better layouts (on the smaller graphs) then hand-layout was used. For the larger graphs it was decided that it was easier to leave the layout as suggested by the automatic algorithms, even if there were lots of crossed lines and so forth. Some of the graphs appeared to have lots of links to nodes that appeared to show that comments were present. However the comment itself was not contained within the GXL and thus not available from the graph. It is considered a possible clean-up analysis task to remove these from the various graphs (one GXL file per source file) but this would have to be done manually. Loading (and saving) graphs in GraphTool is dependent on their size as the number of nodes and edges to be processed obviously dictates the time taken. Some of the larger graphs also cause an out of memory error from the jpeg generation code (from within the library). Whether this is due to the use of com.sun.* classes (jpeg supported is apparently going to be standard API from JDK 1.4...) or for some other reason has not been fully investigated. There are a lot of type nodes in the GXL used for this analysis. Whilst the inmporting of the information makes this distinction from named nodes using colour, it may be that the removal of some of these (or a reduction in edges) may make for much clearer graphs. Because of the size of the SORTIE system (and therefore directly the graphs) and the non-specific nature of GraphTool the analysis is quite time consuming. However, these were both anticipated. It just goes to show that various aggregations and other visualisations are necessary for effectively looking at any software/system, and that multiple views in different styles (i.e. graphs with other things) can be of great use.
GXL Attributes For nodes these attributes are:
<attr name="nodeName">
<string>TComponent</string>
</attr>
<attr name="position">
<string>415 -24 0</string>
</attr>
<attr name="color">
<string>0 255 0</string>
</attr>
<attr name="borderColor">
<string>0 0 0</string>
</attr>
<attr name="textColor">
<string>0 0 0</string>
</attr>
Even though the co-ordinate system works in three integer dimensions it
was considered easier to store the three values in a space separated
string so that only one attribute was required. The same principle applies
to the RGB triplets used to represent the colours.
For edges the following attributes are stored:
<attr name="edgeName">
<string>dmmschema.xsd#Includes</string>
</attr>
<attr name="numArcs">
<int>1</int>
</attr>
<attr name="kink">
<string>-129 153</string>
</attr>
<attr name="color">
<string>0 0 0</string>
</attr>
<attr name="textColor">
<string>0 0 0</string>
</attr>
<attr name="fromColor">
<string>0 0 0</string>
</attr>
<attr name="toColor">
<string>0 0 0</string>
</attr>
<attr name="fromArrow">
<string>5 5</string>
</attr>
<attr name="toArrow">
<string>5 5</string>
</attr>
Again the color fields are RGB triplets. The arrow attributes are
concerned with the size of arrowheads and relates more to GraphTool
specifically than some of the other attributes. The same is true of the
docking values (no example seen). The number of arcs is a GraphTool
mechanism for reducing clutter. If there is an edge with the same
source and destination nodes and the same edgeName as any existing edge,
the edge itself is not added, but the number of arcs along the existing
edge is increased. The kink value is just x,y co-ordinates for putting a
bend into the edge when trying to achieve nicer layouts. By recording the
kink position this can be restored along with the saved node positions to
maintain the layout when a graph is reloaded.
3. CollaborationThere seem to be several tools that are peer to GraphTool based on the participant information on the website. Essentially any visual or visualisation styled tool is a peer. Ideally (from a VRG/Durham perspective) it would have been good to have the time to create much better visualisation capabilities but now there is the necessary parsing/loading code, saving code, data structures, etc. this will hopefully be a future enhancement. This should also include properly (re)implementing VIBRO to include the GXL support. Any tool that produces GXL (visual, data processing, or otherwise) can be considered to be a complementary tool for GraphTool. As discussed above, this work relied on the use of GXL facts about the SORTIE source code obtained from Sergey Marchenko and Tim Lethbridge. That is an evident benefit of having such a file format; it eases some of the issues of tool interoperability. 4. Solution to TasksSince the real analysis of SORTIE has been neglected due to time, very little has been discovered that would provide enough evidence for any recommendations. Mostly superficial information which many tools and analysis will also have discovered is the current main result. Over the next two weeks (between the version date at the top of this report and WCRE 2001) it is hoped more detailed information can be found, perhaps for a subset of the source files. This process will also help to illustrate where GraphTool can be of use for analysis and also to identify shortcomings which future visualisation work should seek to address.
An example of one of the graphs read into GraphTool from the
original GXL provided is included at this point for illustration of
the colours and layout (Figure 1). It is intentionally a small file.
It should make clearer the node/edge attributes discussed above. This
layout has also been preserved when resaving the GXL file.
Figure 1 - about.cpp from GXL files of prior analysis A design decision (based on the existing capabilities and structure of GraphTool) was made based on the GXL definition allowing multiple graphs in one file. This was that because SAX XML parsing was being used it was easier to parse the whole file and create each graph, and then ask the user which they wanted to view. GraphTool only supports viewing of one graph at any time. This worked well for smaller examples but did cause two problems when using the SORTIE analysis file that contained the GXL (graphs) for the entire software system:
nodeName (graphName) The second only occurs when processing very large files and Java reports an out of memory exception. This just causes GraphTool to stop loading the file and is reported in the console window of the application and on the command line (if one is visible). The GXL containing all of the SORTIE analysis is approximately 34Mb in size, and partly because of the object based data structures used by GraphTool (i.e. not as efficient as low level C) and the temporary storage necessary when analsysing a GXL file this fails to finish loading to provide the user with a choice of which graph to view. This experiment did however highlight the need for a "Parsing, please wait...!" dialogue boxfor for GraphTool! A different approach would be to merge multiple graphs in one GXL file (based on user preference or dialogue on load?), but to restore or maintain this level of separation when saving would require an extra field for each node/edge so that they could be saved in the same graph as they originated from. It may be that this alteration is necessary. Given the size of the data and the inherited (no OO pun intended) implementation of GraphTool this may not prevent the memory errors. This would have proved useful from an analysis perspective because none carried out so far has provided a high level view in which to begin to hang any detailed analysis off. An interesting outcome of trying to analyse SORTIE with GraphTool is that because the GXL came from elsewhere (and no source code has been seen) no assumptions can be made. Any analysis relies solely on the GXL provided and any tools which can utilise them. This has proved an interesting exercise from a program comprehension perspective because of the way in which it highlights what tool support is necessary. Much of GXL provided is detailed enough to give enough information for recommendations and so forth, but the tool support does not make this particularly easy.
Having spent sometime tying to clean up some randomly selected graphs
using GraphTool it became obvious that not only were different
visualisations required to complement the graph views, but also some
analysis aids at the graph level. When loading a GXL graph, GraphTool
uses two colours for the node. All nodes are given their type as a name
to start with. If any name is then provided as an attribute this replaces
the type. At this point, the colour of the node is then set to green.
The remaining nodes are left with the default node colour (specified by
user preferences in GraphTool (although on saving a GXL file with
graphical attributes intact this colour is saved for future use). This
was helpful when looking at the graphs because it actually showed
(unintentionally) where collapsing nodes and joining edges, or other
visualisations encoding the information, would be useful. A lot of the
clutter in the graphs stemmed from nodes (and hence edges) that essentially
provided some contains/declares information. Another slight drawback is
that the types of the nodes and edges were all prefixed with
Because many of the graphs, and certainly those of any realistic size, were very clutted when viewed this hindered any real analsyis judgements about the SORTIE system. The only removal of clutter that was made easy was node deletion which deleted edges. This then removed links that did and should have existed - thus making the understanding harder! The traditional phrase at this point is very appropriate; "watch this space...!" ;) |