Visualisation Research Group (VRG) Site

Graph Tool

Using GXL files and GraphTool visualisation for SORTIE analysis

Version: September 20th, 2001.

Dr. Claire Knight
Visualisation Research Group
Dept. Computer Science, University of Durham
Tel: +44 (0)191 374 2554
Fax: +44 (0)191 374 2560
C.R.Knight@durham.ac.uk

1. Introduction

The tool that has been used for this work is "GraphTool". This is a graph layout tool that is used internally in Computer Science at the University of Durham. It was (re)written in Java in 1999 by a VRG researcher who now works in industry.

Much of this work has involved the incorporation of GXL into the tool so that it can read and display GXL graphs, but also generate GXL files from the graph in memory. The tool already supported three other file formats, all developed at Durham, and the last during the Java rewrite of the tool. This file format is quite verbose and some of the attributes of this format that dealt with graphics were also included into the GXL generation of GraphTool so that they can be used in reloading or in other analysis tools.


Main Functionality
GraphTool supports or includes the following features:

  • Reads/Writes four file formats; .2dg, .gxl, .gin, .cll
  • Two layout automatic layout algorithms
  • User controlled dragging of nodes and edges
  • The use of colours for nodes and edges
  • The use of names on nodes and edges
  • Anonymous setting to obscure real names on graphs
  • Generation of postscript output of a graph
  • Generation of jpeg images of the graph
  • Various analysis assistance such as grouping of nodes (only supported by .2dg file format)
  • 100% view of a graph with a zoom/context window to also show the complete graph
GraphTool does not support particularly sophisticated layout automatic layout algorithms at this moment in time. Whilst that is a possible general area of development at Durham, the further focus of this research is to incorporate the GXL code developed and graph data structures into various visualisation tools within a framework. GraphTool itself will then need some modifications to fit into that framework.


Team
The only researcher at Durham who is working on this is Claire Knight. As well as working at Durham, this is where she received her degrees. A BSc (Hons) 1st in 1997 and a PhD in 2000.

She has had various coding experiences. During vacation work this involved the design and development of a works orders and tracking system for a building firm and an invoice and order reconciling software that took UNIX and HPX downloads (results of SQL) and then incorporated them with analysis for a printing firm.

She has been using Java since late 1996 and has written two program comprehension parsers for the language. The first was based on LALR and the second on LL(k) technology. The latter of these works with the current release of the language and is the one that is considered "in-use" internally. The various visualisation demonstrators created over the past four years have been written in C and Java.

Much of the summer was spent incorporating GXL into GraphTool. This required understanding the existing Java code (over 60 classes) and then making the necessary changes. Some prior user level experience with the tool helped. Having now been through the source code, much of the functionality is understood.



2. Experience Report

Most of the experience of this work to date (since June 2001) has been at a coding level by incorporating GXL into an existing tool and in the process creating code that can be used for (future) various visualisations. Whilst this is very useful longer term work it has meant that the real analysis of SORTIE has suffered. And because of the excessive amount of global conference attendance this summer by the single investigator!


Analysis
Obviously the format of the data being used affected the analysis approach. This research has been done without actually seeing the "real" SORTIE source code. Because of the time constraints and not having any general C or C++ analysis tools already in place in Durham the research has used the GXL output of the work by Sergey Marchenko from research by the KBRE Group. This consisted of higher level information about the methods and classes contained in the source code.

Because GraphTool is a generic graph application then there is no particular layout or analysis incorporated into the tool for use when looking at software and systems. This meant that the GXL graphs loaded with all nodes defaulted to a position of 0,0. Automatic layout provided a start, but to get some of the better layouts (on the smaller graphs) then hand-layout was used. For the larger graphs it was decided that it was easier to leave the layout as suggested by the automatic algorithms, even if there were lots of crossed lines and so forth.

Some of the graphs appeared to have lots of links to nodes that appeared to show that comments were present. However the comment itself was not contained within the GXL and thus not available from the graph. It is considered a possible clean-up analysis task to remove these from the various graphs (one GXL file per source file) but this would have to be done manually.

Loading (and saving) graphs in GraphTool is dependent on their size as the number of nodes and edges to be processed obviously dictates the time taken. Some of the larger graphs also cause an out of memory error from the jpeg generation code (from within the library). Whether this is due to the use of com.sun.* classes (jpeg supported is apparently going to be standard API from JDK 1.4...) or for some other reason has not been fully investigated.

There are a lot of type nodes in the GXL used for this analysis. Whilst the inmporting of the information makes this distinction from named nodes using colour, it may be that the removal of some of these (or a reduction in edges) may make for much clearer graphs.

Because of the size of the SORTIE system (and therefore directly the graphs) and the non-specific nature of GraphTool the analysis is quite time consuming. However, these were both anticipated. It just goes to show that various aggregations and other visualisations are necessary for effectively looking at any software/system, and that multiple views in different styles (i.e. graphs with other things) can be of great use.


GXL Attributes
Related to the analysis of the SORTIE GXL files is the use of various graphical attributes within GXL files by GraphTool. Some "known" attributes that are used internally by the application and also saved/loaded by the .2dg format have been utilised for GXL when saving the graphs which include graphical values in the node and edge attribute lists (usually stored as strings with specific names). For example positions in a 3D co-ord space. It is appreciated that GraphTool works on 2D basis (therefore always z = 0) but these same named attributes could be used by any GXL visualisation tool that chooses to recognise them. Also any GXL generated by another tool could have the same properties if they are generated read into GraphTool, although some adaptation to deal with new layout (for example) may be necessary.

For nodes these attributes are:

  • nodeName
  • position
  • color
  • borderColor
  • textColor
An example of this taken from a GXL file generated by GraphTool is:
<attr name="nodeName">
        <string>TComponent</string>
</attr>
<attr name="position">
        <string>415 -24 0</string>
</attr>
<attr name="color">
        <string>0 255 0</string>
</attr>
<attr name="borderColor">
        <string>0 0 0</string>
</attr>
<attr name="textColor">
        <string>0 0 0</string>
</attr>
	  
Even though the co-ordinate system works in three integer dimensions it was considered easier to store the three values in a space separated string so that only one attribute was required. The same principle applies to the RGB triplets used to represent the colours.

For edges the following attributes are stored:

  • edgeName
  • numArcs
  • kink
  • dockFrom
  • dockTo
  • color
  • textColor
  • fromColor
  • toColor
  • fromArrow
  • toArrow
An example of this taken from a GXL file generated by GraphTool is:
<attr name="edgeName">
        <string>dmmschema.xsd#Includes</string>
</attr>
<attr name="numArcs">
        <int>1</int>
</attr>
<attr name="kink">
        <string>-129 153</string>
</attr>
<attr name="color">
        <string>0 0 0</string>
</attr>
<attr name="textColor">
        <string>0 0 0</string>
</attr>
<attr name="fromColor">
        <string>0 0 0</string>
</attr>
<attr name="toColor">
        <string>0 0 0</string>
</attr>
<attr name="fromArrow">
        <string>5 5</string>
</attr>
<attr name="toArrow">
        <string>5 5</string>
</attr>
	  
Again the color fields are RGB triplets. The arrow attributes are concerned with the size of arrowheads and relates more to GraphTool specifically than some of the other attributes. The same is true of the docking values (no example seen). The number of arcs is a GraphTool mechanism for reducing clutter. If there is an edge with the same source and destination nodes and the same edgeName as any existing edge, the edge itself is not added, but the number of arcs along the existing edge is increased. The kink value is just x,y co-ordinates for putting a bend into the edge when trying to achieve nicer layouts. By recording the kink position this can be restored along with the saved node positions to maintain the layout when a graph is reloaded.



3. Collaboration

There seem to be several tools that are peer to GraphTool based on the participant information on the website. Essentially any visual or visualisation styled tool is a peer. Ideally (from a VRG/Durham perspective) it would have been good to have the time to create much better visualisation capabilities but now there is the necessary parsing/loading code, saving code, data structures, etc. this will hopefully be a future enhancement. This should also include properly (re)implementing VIBRO to include the GXL support.

Any tool that produces GXL (visual, data processing, or otherwise) can be considered to be a complementary tool for GraphTool. As discussed above, this work relied on the use of GXL facts about the SORTIE source code obtained from Sergey Marchenko and Tim Lethbridge. That is an evident benefit of having such a file format; it eases some of the issues of tool interoperability.



4. Solution to Tasks

Since the real analysis of SORTIE has been neglected due to time, very little has been discovered that would provide enough evidence for any recommendations. Mostly superficial information which many tools and analysis will also have discovered is the current main result. Over the next two weeks (between the version date at the top of this report and WCRE 2001) it is hoped more detailed information can be found, perhaps for a subset of the source files. This process will also help to illustrate where GraphTool can be of use for analysis and also to identify shortcomings which future visualisation work should seek to address.

An example of one of the graphs read into GraphTool from the original GXL provided is included at this point for illustration of the colours and layout (Figure 1). It is intentionally a small file. It should make clearer the node/edge attributes discussed above. This layout has also been preserved when resaving the GXL file.

Example GXL Graph from SORTIE
Figure 1 - about.cpp from GXL files of prior analysis

A design decision (based on the existing capabilities and structure of GraphTool) was made based on the GXL definition allowing multiple graphs in one file. This was that because SAX XML parsing was being used it was easier to parse the whole file and create each graph, and then ask the user which they wanted to view. GraphTool only supports viewing of one graph at any time. This worked well for smaller examples but did cause two problems when using the SORTIE analysis file that contained the GXL (graphs) for the entire software system:

  • Any cross graph edges are not dealt with
  • Memory problems
The first of these could be dealt with in an enhancement to the code. A probable future solution is to insert a duplicate node into the current graph to enable the edge to exist, with the proviso that the required node already exists in one of the previously parsed graphs from the current GXL file. This can be indicated to users of the graph through highlighting the node by setting its colour to red, and by appending the graph from which it originates to the node name, for example:
nodeName (graphName)
The second only occurs when processing very large files and Java reports an out of memory exception. This just causes GraphTool to stop loading the file and is reported in the console window of the application and on the command line (if one is visible). The GXL containing all of the SORTIE analysis is approximately 34Mb in size, and partly because of the object based data structures used by GraphTool (i.e. not as efficient as low level C) and the temporary storage necessary when analsysing a GXL file this fails to finish loading to provide the user with a choice of which graph to view. This experiment did however highlight the need for a "Parsing, please wait...!" dialogue boxfor for GraphTool!

A different approach would be to merge multiple graphs in one GXL file (based on user preference or dialogue on load?), but to restore or maintain this level of separation when saving would require an extra field for each node/edge so that they could be saved in the same graph as they originated from. It may be that this alteration is necessary. Given the size of the data and the inherited (no OO pun intended) implementation of GraphTool this may not prevent the memory errors. This would have proved useful from an analysis perspective because none carried out so far has provided a high level view in which to begin to hang any detailed analysis off.

An interesting outcome of trying to analyse SORTIE with GraphTool is that because the GXL came from elsewhere (and no source code has been seen) no assumptions can be made. Any analysis relies solely on the GXL provided and any tools which can utilise them. This has proved an interesting exercise from a program comprehension perspective because of the way in which it highlights what tool support is necessary. Much of GXL provided is detailed enough to give enough information for recommendations and so forth, but the tool support does not make this particularly easy.

Having spent sometime tying to clean up some randomly selected graphs using GraphTool it became obvious that not only were different visualisations required to complement the graph views, but also some analysis aids at the graph level. When loading a GXL graph, GraphTool uses two colours for the node. All nodes are given their type as a name to start with. If any name is then provided as an attribute this replaces the type. At this point, the colour of the node is then set to green. The remaining nodes are left with the default node colour (specified by user preferences in GraphTool (although on saving a GXL file with graphical attributes intact this colour is saved for future use). This was helpful when looking at the graphs because it actually showed (unintentionally) where collapsing nodes and joining edges, or other visualisations encoding the information, would be useful. A lot of the clutter in the graphs stemmed from nodes (and hence edges) that essentially provided some contains/declares information. Another slight drawback is that the types of the nodes and edges were all prefixed with
dmmSchema.xsd#
which was not available. This may provide some layout assumptions when using tools that are able to incorporate this xsd file. It highlights the need for specialist processing for some graphs which may only be available in software visualisation specific tools (rather than a general graph oriented tool), or that should be provided in general tools in an abstract manner. It is possible to select nodes by hand and delete them, but what would have have been preferable would have been to be able to do was to search for all (for example) nodes of dmmSchema.xsd#sourcePart and then to remove the node but merge the edges, not remove them as delete does. This could have been achieved with some additional analysis options on the various GraphTool menus. It also highlights the siutations where other visualisations may automatically provide some of that analysis through aggretion of information to generate a particular glyph with colours, sizing, position, and orientation based on connectivity and attributes of that information item.

Because many of the graphs, and certainly those of any realistic size, were very clutted when viewed this hindered any real analsyis judgements about the SORTIE system. The only removal of clutter that was made easy was node deletion which deleted edges. This then removed links that did and should have existed - thus making the understanding harder!

The traditional phrase at this point is very appropriate; "watch this space...!" ;)

To the Collaborative Demo Website