Simplified Syntax Trees

We have created a novel meta model for source code that can be used to capture self-contained snapshots of source code while preserving type information. Our design consists of three parts.

Naming Scheme for Code Elements. The central part of our meta model is a naming scheme for fully-qualified references to types and type elements, which allows us to refer unambiguously to source-code elements in object-oriented programs. For example, a method reference contains not only the method's simple name but also its declaring type, its parameter list (including parameter types and names), and so forth. Consider the method reference

[p:int] [a.b.c.MyType, MyProject].M([some.framework.IBla, fw, 1.2.3.4] foo)

which illustrates several points. First, members such as this method follow the structure [<return type>] [<declaring type>].<simple name>(<parameter list>). Applied to the example, this tells us that method M is declared in the "local" type a.b.c.MyType, MyProject. The type is local in the sense that it is defined in MyProject, a project in the local workspace, rather than in a referenced assembly. The method has a single parameter foo of type some.framework.IBla, fw, 1.2.3.4. IBla is an example of a "non-local" type: it is defined in the namespace some.framework of assembly fw, version 1.2.3.4. The method's return type illustrates another special case. Predefined types such as System.Int32, mscorlib, 4.0.0.0 are shortened to the keyword aliases used in C# code (e.g., int or bool) and prefixed with p:.
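Because every reference follows this fixed structure, it can be taken apart mechanically. The Python sketch below splits a method reference into its return type, declaring type, simple name, and parameters, and decomposes each type into namespace, simple name, assembly, and version. It is a hypothetical illustration, not the implementation behind our meta model: the names TypeName, MethodName, parse_type, and parse_method are made up here, generic types and brackets nested inside type names are not supported, and treating a type without a version as "local" is a simplifying assumption drawn from the example alone.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TypeName:
    namespace: str
    name: str
    assembly: Optional[str]  # None for predefined (p:) types
    version: Optional[str]   # assumption: None means the type comes from a workspace project
    predefined: bool = False

    @property
    def is_local(self) -> bool:
        # Assumption: a local type names its project but carries no assembly version.
        return not self.predefined and self.version is None


@dataclass
class Parameter:
    type: TypeName
    name: str


@dataclass
class MethodName:
    return_type: TypeName
    declaring_type: TypeName
    name: str
    parameters: List[Parameter]


def parse_type(text: str) -> TypeName:
    """Parse 'ns.Type, assembly[, version]' or a predefined alias such as 'p:int'."""
    text = text.strip()
    if text.startswith("p:"):
        return TypeName("", text[2:], None, None, predefined=True)
    parts = [p.strip() for p in text.split(",")]
    namespace, _, simple = parts[0].rpartition(".")
    assembly = parts[1] if len(parts) > 1 else None
    version = parts[2] if len(parts) > 2 else None
    return TypeName(namespace, simple, assembly, version)


def _read_bracket(s: str, i: int) -> Tuple[str, int]:
    """Return the text inside the '[...]' group starting at s[i], and the index just after it."""
    if s[i] != "[":
        raise ValueError(f"expected '[' at position {i} in {s!r}")
    depth = 0
    for j in range(i, len(s)):
        if s[j] == "[":
            depth += 1
        elif s[j] == "]":
            depth -= 1
            if depth == 0:
                return s[i + 1:j], j + 1
    raise ValueError(f"unbalanced brackets in {s!r}")


def parse_method(s: str) -> MethodName:
    """Parse '[<return type>] [<declaring type>].<simple name>(<parameter list>)'."""
    s = s.strip()
    ret, i = _read_bracket(s, 0)
    decl, i = _read_bracket(s, s.index("[", i))
    if s[i] != ".":
        raise ValueError(f"expected '.' after the declaring type in {s!r}")
    open_paren = s.index("(", i)
    name = s[i + 1:open_paren]
    params_text = s[open_paren + 1:s.rindex(")")]
    params: List[Parameter] = []
    j = 0
    while j < len(params_text):
        if params_text[j] in " ,":  # skip separators between parameters
            j += 1
            continue
        ptype, j = _read_bracket(params_text, j)
        end = params_text.find(",", j)  # parameter names contain no commas
        end = len(params_text) if end == -1 else end
        params.append(Parameter(parse_type(ptype), params_text[j:end].strip()))
        j = end
    return MethodName(parse_type(ret), parse_type(decl), name, params)


m = parse_method("[p:int] [a.b.c.MyType, MyProject].M([some.framework.IBla, fw, 1.2.3.4] foo)")
print(m.declaring_type.is_local, m.parameters[0].type.version)  # True 1.2.3.4
```

For the example reference, parse_method reports M as declared in the local type MyType of project MyProject, with one parameter foo of the non-local type IBla (assembly fw, version 1.2.3.4) and the predefined return type int.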

Type System. Recovering type information after the fact is expensive. We therefore provide a lightweight infrastructure for type resolution based on type shapes, an abstraction of the structure and hierarchy of a type. The type shape of any type can be looked up in our type table by its fully-qualified type name.
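To make the idea concrete, the Python sketch below models a type shape as a type's direct supertype, implemented interfaces, and member names, and stores shapes in a table keyed by fully-qualified type name. What a shape contains here is an assumption, not our actual data format, and TypeShape, TypeTable, and the example type a.b.c.Base are hypothetical. The point it illustrates is that, once shapes are recorded, questions such as subtyping reduce to dictionary lookups along the hierarchy.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TypeShape:
    """Hypothetical type shape: hierarchy and member names of a type, without code bodies."""
    type_name: str                    # fully-qualified type name, e.g. "a.b.c.MyType, MyProject"
    supertype: Optional[str] = None   # direct superclass, if any
    interfaces: List[str] = field(default_factory=list)
    members: List[str] = field(default_factory=list)  # fully-qualified member names


class TypeTable:
    """Maps fully-qualified type names to type shapes."""

    def __init__(self) -> None:
        self._shapes: Dict[str, TypeShape] = {}

    def add(self, shape: TypeShape) -> None:
        self._shapes[shape.type_name] = shape

    def lookup(self, type_name: str) -> Optional[TypeShape]:
        return self._shapes.get(type_name)

    def supertypes(self, type_name: str) -> List[str]:
        """All transitive supertypes and interfaces reachable through table lookups."""
        result: List[str] = []
        seen = {type_name}
        todo = [type_name]
        while todo:
            shape = self.lookup(todo.pop())
            if shape is None:  # no shape recorded: the hierarchy walk stops here
                continue
            parents = ([shape.supertype] if shape.supertype else []) + shape.interfaces
            for parent in parents:
                if parent not in seen:
                    seen.add(parent)
                    result.append(parent)
                    todo.append(parent)
        return result

    def is_subtype(self, type_name: str, candidate: str) -> bool:
        return candidate == type_name or candidate in self.supertypes(type_name)


table = TypeTable()
table.add(TypeShape("a.b.c.MyType, MyProject", supertype="a.b.c.Base, MyProject",
                    interfaces=["some.framework.IBla, fw, 1.2.3.4"]))
table.add(TypeShape("a.b.c.Base, MyProject"))
print(table.is_subtype("a.b.c.MyType, MyProject", "some.framework.IBla, fw, 1.2.3.4"))  # True
```

Types without a recorded shape (here IBla) simply end the walk, so the table degrades gracefully when only part of the hierarchy is known.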

Simplified Syntax Trees. At the lowest level, we capture a source-code representation of each type declaration in a simplified syntax tree. The tree is rooted at the type declaration and contains all declared members. We also store the contents of their bodies, down to the expression level.
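The following Python sketch suggests what such a tree can look like. The node set (TypeDecl, MethodDecl, VarDecl, Assign, Return, Constant, VarRef, Invocation) is a small hypothetical subset chosen for illustration, not our actual node catalog; methods are identified by names in the naming scheme described above. Because bodies are kept down to the expression level, an analysis such as collecting every method invoked in a type becomes a plain tree traversal.

```python
from dataclasses import dataclass, field
from typing import List, Union


# Expressions (hypothetical node set)
@dataclass
class Constant:
    value: str


@dataclass
class VarRef:
    name: str


@dataclass
class Invocation:
    method: str  # fully-qualified method name in the naming scheme
    args: List["Expr"] = field(default_factory=list)


Expr = Union[Constant, VarRef, Invocation]


# Statements (hypothetical node set)
@dataclass
class VarDecl:
    name: str
    type: str


@dataclass
class Assign:
    target: str
    value: Expr


@dataclass
class Return:
    value: Expr


Stmt = Union[VarDecl, Assign, Return]


# Declarations: the tree is rooted at a type declaration
@dataclass
class MethodDecl:
    name: str
    body: List[Stmt] = field(default_factory=list)


@dataclass
class TypeDecl:
    name: str
    methods: List[MethodDecl] = field(default_factory=list)


def invocations(expr: Expr) -> List[str]:
    """Collect invoked method names in an expression, including nested arguments."""
    if isinstance(expr, Invocation):
        found = [expr.method]
        for arg in expr.args:
            found.extend(invocations(arg))
        return found
    return []


def called_methods(t: TypeDecl) -> List[str]:
    """Collect every method invoked anywhere in the bodies of a type's methods."""
    calls: List[str] = []
    for method in t.methods:
        for stmt in method.body:
            if isinstance(stmt, (Assign, Return)):
                calls.extend(invocations(stmt.value))
    return calls


HELPER = "[p:int] [a.b.c.MyType, MyProject].Helper([some.framework.IBla, fw, 1.2.3.4] b)"
TWICE = "[p:int] [a.b.c.MyType, MyProject].Twice([p:int] v)"
INC = "[p:int] [a.b.c.MyType, MyProject].Inc([p:int] v)"

t = TypeDecl("a.b.c.MyType, MyProject", methods=[MethodDecl(
    "[p:int] [a.b.c.MyType, MyProject].M([some.framework.IBla, fw, 1.2.3.4] foo)",
    body=[
        VarDecl("x", "p:int"),
        Assign("x", Invocation(HELPER, [VarRef("foo")])),
        Return(Invocation(TWICE, [Invocation(INC, [VarRef("x")])])),
    ],
)])
print(len(called_methods(t)))  # 3
```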

Together, these three parts form the meta model that enabled the experiments and research presented in the application section.