Motivation

Nowadays, APIs are everywhere. Developers use libraries and frameworks every day, but learning a new API is a tedious task. And even a developer who knows how to use an API can hardly memorize all the relevant best practices, considering the sheer number of technologies integrated in a typical project.

We are working in the field of recommendation systems for software engineering. With our tools, we make it easier for developers to learn new APIs and to find their way through the API jungle. In this project, we focus on two major research directions.

Revolutionize the Evaluation

So far, most work in the area of recommendation systems for software engineering has been evaluated with artificial evaluation strategies. Some works are evaluated in experiments that involve humans, but these experiments do not create artifacts that could be reused to evaluate other recommendation systems. We want to change this!

Our core idea is to instrument the integrated development environment (IDE), and especially its code completion, to capture detailed information about how developers work. For example, we capture a fine-grained history of the files a developer worked on, together with information about the usage of the code completion tool, e.g., which proposals were looked at and which one was selected in the end.
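To make this concrete, the following Java sketch shows what a single captured completion event might contain. All names are hypothetical and chosen for illustration; they are not taken from the actual KaVE data model.

```java
import java.time.Instant;
import java.util.List;

// Hypothetical shape of one captured code-completion event.
// Field names are illustrative, not the actual KaVE data model.
public class CompletionEvent {
    Instant triggeredAt;            // when the completion popup was opened
    String file;                    // file the developer was editing
    String enclosingMethod;         // code context in which completion was triggered
    List<String> shownProposals;    // proposals displayed in the popup
    List<Integer> browsedIndices;   // proposals the developer looked at while scrolling
    Integer appliedIndex;           // proposal finally applied, or null if aborted
}
```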

The open research question we want to investigate is whether this usage data can be preserved in an evaluation dataset that is applicable to different kinds of recommenders. A related question is how the resulting evaluations compare to previous artificial ones.
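Under this idea, the recorded events could serve as a reusable benchmark: any recommender that can propose completions for a given context could be scored against the proposals developers actually applied. A minimal sketch of such a replay evaluation, building on the hypothetical CompletionEvent above:

```java
import java.util.List;

// Any recommender to be evaluated only needs to implement this interface.
interface Recommender {
    // Returns ranked proposals for the code context of the event.
    List<String> propose(CompletionEvent event);
}

class ReplayEvaluation {
    // Fraction of events in which the proposal the developer actually
    // applied appears among the recommender's top-k suggestions.
    static double topKHitRate(Recommender r, List<CompletionEvent> log, int k) {
        int hits = 0, total = 0;
        for (CompletionEvent e : log) {
            if (e.appliedIndex == null) continue; // aborted completions carry no ground truth
            String applied = e.shownProposals.get(e.appliedIndex);
            List<String> ranked = r.propose(e);
            if (ranked.subList(0, Math.min(k, ranked.size())).contains(applied)) hits++;
            total++;
        }
        return total == 0 ? 0.0 : (double) hits / total;
    }
}
```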

Include Experts

It is a challenge for developers to join an existing project: they are suddenly confronted with an established technology stack. Training them is usually exhausting and tedious, because they need to understand most of these frameworks before they can implement new tasks efficiently.

Every project wants to ensure a flow of knowledge from the experts to the novices, because an uneven distribution of knowledge among team members is undesirable. Therefore, experts on a framework or a project typically write documentation, examples, or tutorials that convey the necessary information about the correct usage of all technologies. Another option is to transfer knowledge directly by working on a task together with a novice. Both approaches decrease the start-up effort and help to mitigate problems. However, experts do not have unlimited availability, so both the created documentation and the time available for pair programming are naturally limited.

One approach to this issue is to support developers during their work, directly in their IDE, with tools that provide help. Such tools make training new developers easier, but experienced developers benefit as well: with proper support, developer efficiency can be increased, especially for recurring tasks. Tools like this already exist and prove the concept. They fall into two categories:

  • A classical approach is to provide rule sets or examples that are manually curated by experts.
  • Newer approaches analyze large numbers of examples with automated tools and, in the process, learn how to support developers.

However, both approaches have significant drawbacks when used separately: the usefulness of tools in the first category is usually limited by the availability of the experts, which results in limited scope or coverage and, at the same time, high maintenance costs. Tools in the second category usually gain their knowledge through statistical analysis and therefore also pick up anti-patterns that occur frequently in the analyzed dataset.

The KaVE project analyzes the feasibility of combining both categories in a novel approach. The idea is to combine their advantages while avoiding, or at least mitigating, their drawbacks. Experts should play an important role in the learning process and be able to directly influence the results. This should reduce both the error rates of automated approaches and the burden on the experts, so that work is done faster and with higher quality.