Introduction

In this notebook, I'll show you one way to connect JavaDoc comments with the Java nodes of jQAssistant's scan result. I'll walk through how I arrived at the solution, because I hope that you can then do a similar problem-solving analysis (i.e., XML importing and wrangling) on your own.

Context

In a blog post, Yoann Buch got me thinking about how to add comments to the class nodes that jQAssistant has already scanned into Neo4j. I thought it would be possible to do this with the Python library Pygments. I experimented with it a little, but it didn't go well. The main problem is the lack of structural information: a syntax highlighter like Pygments only yields a flat stream of tokens, so it can't tell which class or method a comment belongs to.

Idea

Then I thought: "Wait a minute, what about the JavaDoc that's generated as HTML?" And one step further: "If there's a generator for HTML, is there one for XML, too?" I googled "javadoc xml" and found it: https://github.com/MarkusBernhardt/xml-doclet, "A doclet to output javadoc as XML", which can be plugged into a Maven build.

So the journey began...

Implementation

Preparing the code

In this prototype, I work with the old jQAssistant version 1.1.3, because I haven't figured out yet how to read XML as described in Dirk's answer on StackOverflow with the newer versions.

I use a corresponding old version of jQAssistant's Spring Petclinic demo repository:

git checkout f5811bf2ed9c5369a749cb90ef9e7a261de03760 .

Getting JavaDoc as XML

To get all the JavaDoc from the source code, I simply add the doclet mentioned above (via the maven-javadoc-plugin) to the already existing plugins:

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-javadoc-plugin</artifactId>
    <version>2.10.4</version>
    <executions>
        <execution>
            <id>xml-doclet</id>
            <phase>process-resources</phase>
            <goals>
                <goal>javadoc</goal>
            </goals>
            <configuration>
                <doclet>com.github.markusbernhardt.xmldoclet.XmlDoclet</doclet>
                <additionalparam>-d ${project.build.directory} -filename javadoc.xml</additionalparam>
                <useStandardDocletOptions>false</useStandardDocletOptions>
                <docletArtifact>
                    <groupId>com.github.markusbernhardt</groupId>
                    <artifactId>xml-doclet</artifactId>
                    <version>1.0.5</version>
                </docletArtifact>
            </configuration>
        </execution>
    </executions>
</plugin>

Scanning the JavaDoc XML

In theory, I now just have to add this file to jQAssistant's scan configuration as a <scanInclude>:

<scanInclude>
    <path>${project.build.directory}/javadoc.xml</path>
    <scope>xml:document</scope>
</scanInclude>

I've added the <scope> to advise jQAssistant to scan the whole XML content (according to this StackOverflow answer).

Note: Unfortunately, I haven't got this working as of today, i.e., javadoc.xml doesn't appear in the database. So, after scanning the project, I scanned the XML file in the target folder manually with

jqassistant.sh scan -f xml:document::javadoc.xml

This workaround doesn't work with jQAssistant version 1.1.4+, which is why I'm using an old version. I'll update this notebook once I've got it working with the newest version.

First build

I just built the complete project:

mvn clean install

jQAssistant places some nice graphs into the Neo4j database:


In [3]:
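# lib.neo4jupyter_mod: local helper module for drawing query results in the notebook
import lib.neo4jupyter_mod as n4j
import py2neo

# connect to the Neo4j database holding jQAssistant's scan results (default connection settings)
graph = py2neo.Graph()

n4j.init_notebook_mode()
# draw the Pet class and up to 5 classes connected to it via DEPENDS_ON, labeled by name
n4j.draw(graph, 
         n='n:Class { name: "Pet"}',
         r="r:DEPENDS_ON", 
         m="m:Class", 
         options={"Class": "name"}, 
         limit=5)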



Additionally, the build writes an XML file named javadoc.xml with all the existing comments into the target folder:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<root>
    <package name="org.springframework.samples.petclinic">
        <class name="PetclinicInitializer"
               qualified="org.springframework.samples.petclinic.PetclinicInitializer"
               scope="public"
               abstract="false"
               error="false"
               exception="false"
               externalizable="false"
               included="true"
               serializable="false">
            <comment>In Servlet 3.0+ [..]</comment>
            <tag name="@author"
                 text="Antoine Rey"/>
            <class qualified="org.springframework.web.servlet.support.AbstractDispatcherServletInitializer"/>
            <constructor name="PetclinicInitializer"
                         signature="()"
                         qualified="org.springframework.samples.petclinic.PetclinicInitializer"
                         scope="public"
                         final="false"
                         included="true"
                         native="false"
                         synchronized="false"
                         static="false"
                         varArgs="false"/>
            <method name="onStartup"
                    signature="(javax.servlet.ServletContext)"
                    qualified="org.springframework.samples.petclinic.PetclinicInitializer.onStartup"
                    scope="public"
                    abstract="false"
                    final="false"
                    included="true"
                    native="false"
                    synchronized="false"
                    static="false"
                    varArgs="false">
                <parameter name="servletContext">
                    <type qualified="javax.servlet.ServletContext"/>
                </parameter>
                <return qualified="void"/>
                <exception qualified="javax.servlet.ServletException"/>
                <annotation name="Override"
                            qualified="java.lang.Override"/>
            </method>
...
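To see which parts of this structure the graph queries have to navigate, here is a minimal sketch with Python's standard xml.etree.ElementTree (not part of the original setup). It assumes the element names visible in the excerpt (class, interface, method, constructor, each with an optional direct <comment> child) and builds the same key as the queries further down: the qualified name, plus the signature for members. The method comment in the sample is hypothetical, since onStartup has none in the excerpt.

```python
import xml.etree.ElementTree as ET

# Trimmed sample in the shape of the javadoc.xml excerpt above.
# HYPOTHETICAL: the method comment is invented for illustration.
SAMPLE = """<root>
  <package name="org.springframework.samples.petclinic">
    <class name="PetclinicInitializer"
           qualified="org.springframework.samples.petclinic.PetclinicInitializer">
      <comment>In Servlet 3.0+ [..]</comment>
      <class qualified="org.springframework.web.servlet.support.AbstractDispatcherServletInitializer"/>
      <method name="onStartup"
              signature="(javax.servlet.ServletContext)"
              qualified="org.springframework.samples.petclinic.PetclinicInitializer.onStartup">
        <comment>Registers the dispatcher servlet.</comment>
        <return qualified="void"/>
      </method>
    </class>
  </package>
</root>"""

DOCUMENTED = {"class", "interface", "method", "constructor"}


def extract_comments(root):
    """Return {key: (element kind, comment text)} for every documented element."""
    comments = {}
    for element in root.iter():
        if element.tag not in DOCUMENTED:
            continue
        comment = element.find("comment")  # direct <comment> child only
        if comment is None:
            continue  # e.g. the superclass reference <class qualified="..."/>
        qualified = element.get("qualified")
        signature = element.get("signature")
        # classes are keyed by FQN, methods and constructors by FQN + signature
        key = qualified if signature is None else qualified + signature
        # itertext() concatenates all text inside <comment>
        comments[key] = (element.tag, "".join(comment.itertext()))
    return comments


for key, (kind, text) in extract_comments(ET.fromstring(SAMPLE)).items():
    print(kind, key, "->", text)
```

For the real file, `ET.parse("target/javadoc.xml").getroot()` would replace `ET.fromstring(SAMPLE)` (the path is an assumption based on the plugin configuration).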

I've already scanned this file, i.e., it's also contained in the database. Let's have a look at it!

The texts of the comments are stored in the Text nodes attached to the <comment> elements:


In [4]:
n4j.draw(graph, 
         n='n:Element { name: "comment"}',
         m='m:Text',
         options={"Element": "name", "Text": "value"}, 
         limit=10)



But one problem remains: the XML scanner scans all elements and places each of them into a separate node. So if we have some HTML formatting in the JavaDoc, like:

<comment>
    &lt;code&gt;Validator&lt;/code&gt; for &lt;code&gt;Pet&lt;/code&gt; forms.
    &lt;p&gt;
    We're not using Bean Validation annotations here because it is easier to define such validation rule in Java.
    &lt;/p&gt;
</comment>

This leads to an "interesting"-looking graph.

There's an easy solution for this, but let's start with some graph database action first and solve that problem later.
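Independently of how the fragments end up in the graph, the joined comment text still contains HTML markup, because the XML parser turns `&lt;code&gt;` back into `<code>`. One possible way to reduce such a comment to plain text is sketched here with Python's standard html.parser; this is an assumption for illustration, not necessarily the easy solution mentioned above, and the class name `TextOnly` is hypothetical.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collects the character data of an HTML fragment and drops all tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def comment_to_plain_text(raw):
    """Strip HTML tags from a JavaDoc comment and normalize its whitespace."""
    parser = TextOnly()
    parser.feed(raw)
    parser.close()  # flush any buffered text
    # collapse line breaks and source indentation into single spaces
    return " ".join("".join(parser.parts).split())


# the Validator comment from above, as it reads after XML parsing (&lt; -> <)
raw = """
    <code>Validator</code> for <code>Pet</code> forms.
    <p>
    We're not using Bean Validation annotations here because it is easier to define such validation rule in Java.
    </p>
"""
print(comment_to_plain_text(raw))
```

This keeps only the prose; if the `<code>` markers matter for later analysis, a variant could record the tag names in `handle_starttag` instead of discarding them.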

Data Wrangling with Neo4j


In [5]:
# draft query: per documented element, the declaring type's FQN,
# a jQAssistant-style signature, and the concatenated comment text
javadoc_query = """
MATCH 
  (element:Element)-[:HAS_ELEMENT]->(:Element {name: "comment"})-[:HAS_TEXT]->(doc_text:Text),
  (element)-[:HAS_ATTRIBUTE]->(building_block:Attribute {name: "qualified"})
// filter here: a WHERE after OPTIONAL MATCH would only constrain the optional pattern
WHERE element.name =~ "(method|class|interface|constructor)"
// separate OPTIONAL MATCHes: constructors have a signature, but no <return> element
OPTIONAL MATCH (element)-[:HAS_ATTRIBUTE]->(signature:Attribute {name: "signature"})
OPTIONAL MATCH (element)-[:HAS_ELEMENT]->(:Element {name: "return"})-[:HAS_ATTRIBUTE]->(return_type:Attribute {name: "qualified"})
WITH
  id(element) AS id,
  // signature of methods and constructors (NULL for classes and interfaces)
  CASE element.name
    WHEN "method"
    THEN return_type.value + " " + SPLIT(building_block.value, ".")[-1] + signature.value
    WHEN "constructor"
    THEN "void <init>" + signature.value
  END AS signature,
  // a method's qualified name ends with ".<methodName>": cut it off to get the declaring type
  CASE element.name
    WHEN "method"
    THEN SUBSTRING(building_block.value, 0, SIZE(building_block.value) - SIZE(SPLIT(building_block.value, ".")[-1]) - 1)
    ELSE building_block.value
  END AS type_fqn,
  // concatenate the text fragments of the comment
  reduce(s = "", x IN collect(doc_text) | s + x.value) AS comment
RETURN id, type_fqn, signature, comment
"""
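The two CASE expressions boil down to simple string manipulation. The following Python sketch mirrors them for the onStartup method from the XML excerpt. It assumes that jQAssistant stores method signatures in the form "returnType name(parameterTypes)", with constructors named `<init>`; the constructor branch appends the signature attribute, so constructors with parameters are covered as well. Both function names are hypothetical.

```python
from typing import Optional


def jqa_signature(kind: str, qualified: str, signature: Optional[str],
                  return_type: Optional[str]) -> Optional[str]:
    """Build a jQAssistant-style signature, like the first CASE expression."""
    if kind == "method":
        # qualified = "<type fqn>.<method name>"
        method_name = qualified.split(".")[-1]
        return f"{return_type} {method_name}{signature}"
    if kind == "constructor":
        # assumption: constructors appear as "void <init>(...)" in jQAssistant
        return f"void <init>{signature}"
    return None  # classes and interfaces have no signature


def declaring_type_fqn(kind: str, qualified: str) -> str:
    """Mirror of the type_fqn CASE expression."""
    if kind == "method":
        # cut off ".<method name>"
        return qualified.rsplit(".", 1)[0]
    # classes, interfaces and (in xml-doclet's output) constructors
    # already carry the type's FQN in their "qualified" attribute
    return qualified


method_fqn = "org.springframework.samples.petclinic.PetclinicInitializer.onStartup"
print(declaring_type_fqn("method", method_fqn))
print(jqa_signature("method", method_fqn, "(javax.servlet.ServletContext)", "void"))
```

This prints the type FQN `org.springframework.samples.petclinic.PetclinicInitializer` and the signature `void onStartup(javax.servlet.ServletContext)`.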

In [6]:
# note: despite its name, this query only reads the comment data; it creates no nodes
create_comment_nodes_for_types="""
MATCH 
  (element:Element)-[:HAS_ELEMENT]->(class_comment:Element {name: "comment"})-[:HAS_TEXT]->(doc_text:Text),
  (element)-[:HAS_ATTRIBUTE]->(qualified:Attribute {name: "qualified"})
// filter here: a WHERE after OPTIONAL MATCH would only constrain the optional pattern
WHERE element.name =~ "(method|class|interface|constructor)"
OPTIONAL MATCH
  (element)-[:HAS_ATTRIBUTE]->(signature:Attribute {name: "signature"})
WITH
  id(element) AS id,
  element.name AS type,
  // classes are keyed by FQN, methods and constructors by FQN + signature
  CASE WHEN signature.value IS NULL THEN qualified.value ELSE qualified.value + signature.value END AS key,
  reduce(s = "", x IN collect(doc_text) | s + x.value) AS text
RETURN id, type, key, text
"""
graph.data(create_comment_nodes_for_types)


Out[6]:
[]

In [7]:
create_comment_nodes_for_classes="""
MATCH 
  (package:Element {name: "package"})
  -[:HAS_ELEMENT]->
  (class:Element {name: "class"})
  -[:HAS_ATTRIBUTE]->
  (class_fqn:Attribute {name: "qualified"}),
  (class_comment:Element {name: "comment"})-[:HAS_TEXT]->(text:Text),
  // node variables must be wrapped in parentheses: (class), not class
  (class)-[:HAS_ELEMENT]->(class_comment)

// one JavaDoc node per class, with the comment's text fragments concatenated
WITH DISTINCT 
  class.name AS type_value, 
  class_fqn.value AS fqn_value, 
  reduce(s = "", x IN collect(text) | s + x.value) AS text_value
  
CREATE (javadoc:JavaDoc {comment: text_value, type: type_value, fqn: fqn_value})

RETURN COUNT(javadoc)
"""
graph.data(create_comment_nodes_for_classes)


Out[7]:
[{'COUNT(javadoc)': 0}]

In [8]:
create_relationship_query="""
MATCH (type:Type), (javadoc:JavaDoc)
WHERE type.fqn = javadoc.fqn
MERGE (javadoc)-[r:COMMENTS]->(type)
RETURN COUNT(r) as rels
"""
graph.data(create_relationship_query)


Out[8]:
[{'rels': 0}]

In [9]:
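delete_comments_query="""
MATCH (javadoc:JavaDoc)
// OPTIONAL MATCH so that JavaDoc nodes without a COMMENTS relationship are removed, too
OPTIONAL MATCH (javadoc)-[r:COMMENTS]->()
DELETE r, javadoc
RETURN COUNT(r), COUNT(javadoc)
"""
graph.data(delete_comments_query)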


Out[9]:
[{'COUNT(javadoc)': 0, 'COUNT(r)': 0}]