Association Rules Viewer

This free software was developed in collaboration with two students of the DESS IAGL in Lille: D. Delautre and S. Demay.

Software

Several representations of association rules exist in the literature. They aim to help the user exploit the results produced by rule mining algorithms, but most of them are available only in commercial data mining software.
Our software, "Association Rules Viewer", offers three representations. The idea is to compare rules according to different criteria: in our case the optimization criterion J1, but also other widely used measures such as support (supp), confidence (conf) and completeness (compl). The application is independent of the rule mining algorithm and takes an XML file as input.

Input Format

To use Association Rules Viewer, you must provide an XML file containing the association rules. To open a file, choose "Open" in the "File" menu.

The rules DTD

The XML file must use this DTD:
<?xml version="1.0" encoding="iso-8859-15" ?>
<!ELEMENT RULES (RULE*)>
<!ELEMENT RULE (LHS, RHS, CRITERE*, CARD?)>
<!ELEMENT LHS (ITEM*)>
<!ELEMENT RHS (ITEM*)>
<!ELEMENT CARD (ATTR)>
<!ELEMENT ATTR (VALUE+)>
<!ELEMENT VALUE (ATTR*, DIMENSION?)>
<!ATTLIST CRITERE name CDATA #REQUIRED value CDATA #REQUIRED>
<!ATTLIST ITEM name CDATA #REQUIRED value CDATA #REQUIRED>
<!ATTLIST ATTR name CDATA #REQUIRED>
<!ATTLIST VALUE value CDATA #REQUIRED>
<!ATTLIST DIMENSION confidence CDATA #REQUIRED frequency CDATA #REQUIRED>


You can find some XML examples in the "xml" directory inside your ARViewer directory.
The file consists of a root element RULES, which contains all the association rules you want to visualize with the software.
For each rule (element RULE) you must give the description of the rule itself and the values of the criteria associated with it.
If you want to use the third visualisation (Double Decker Plot), you must also provide the information specific to this visualisation in the CARD element, as described below.

Get the DTD directly

Description of the rules:

A rule is divided into two parts, the left-hand side and the right-hand side, represented by the elements LHS and RHS respectively. The LHS element can contain several ITEM elements, which represent the attributes in the condition of the rule. The RHS element can contain only one ITEM element. Each ITEM element has a name and a value. In the value you must specify whether the attribute is equal to, greater than, or less than the value by using the appropriate sign ('=', '<', '>', '<=', '>='). Because the file is XML, a literal '<' inside an attribute value must be written as &lt;. A complete rule is shown in the section "An example".
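
The structure described above can be produced programmatically. The following is a minimal sketch in Python using the standard library, not part of ARViewer itself. The helper name build_rule and the attribute "age" are made up for illustration, and the sketch assumes that the comparison sign is stored as the first characters of the value attribute, as the paragraph above describes.

```python
import xml.etree.ElementTree as ET


def build_rule(conditions, conclusion, criteria):
    """Return a RULE element.

    conditions: list of (name, value) pairs for the LHS items
    conclusion: a single (name, value) pair for the RHS item
    criteria:   dict mapping a criterion name to its numeric value
    Each value starts with its comparison sign, e.g. "=ab" or "<30".
    """
    rule = ET.Element("RULE")
    lhs = ET.SubElement(rule, "LHS")
    for name, value in conditions:
        ET.SubElement(lhs, "ITEM", name=name, value=value)
    rhs = ET.SubElement(rule, "RHS")
    ET.SubElement(rhs, "ITEM", name=conclusion[0], value=conclusion[1])
    # The DTD expects the CRITERE elements after LHS and RHS.
    for crit_name, crit_value in criteria.items():
        ET.SubElement(rule, "CRITERE", name=crit_name, value=str(crit_value))
    return rule


rules = ET.Element("RULES")
rules.append(build_rule(
    conditions=[("age", "<30"), ("SNP7", "=ab")],  # "age" is a made-up attribute
    conclusion=("Status", "=Affected"),
    criteria={"supp": 0.09, "conf": 0.9167},
))
xml_text = ET.tostring(rules, encoding="unicode")
print(xml_text)
```

ElementTree escapes the '<' sign when it serializes the attribute, so the generated file stays well-formed without any manual escaping.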

Criteria:

The values of the criteria for a rule are given by CRITERE elements. Each one has a name and a value.

Specific information for Double Decker Plot:

As said before, the information for the Double Decker Plot visualisation is stored in the element named CARD. The content of this element is the XML representation of a tree built from the different values of each attribute involved in the rule. For example, if a rule involves 3 attributes named attr1, attr2 and attr3, and each one has 2 possible values, val1 and val2, the tree has one level per attribute and one leaf for each combination of values. Each combination defines a new rule, for which we give its frequency and its confidence. The frequency is the percentage of records in the database that match the condition, and the confidence is the percentage of these records that also satisfy the conclusion.
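
As an illustration of the tree described above, here is a hedged Python sketch that builds a CARD element from a small, invented table of records. The function name build_card, the record layout and the sample data are all hypothetical. The sketch also assumes that DIMENSION elements appear only on the leaves and that percentages are written with one decimal; the DTD allows this layout but does not require it.

```python
import xml.etree.ElementTree as ET


def build_card(records, domains, target):
    """Return a CARD element with one tree level per condition attribute.

    records: list of dicts, one per database record
    domains: dict mapping each condition attribute to its possible values
    target:  (attribute, value) pair of the rule's conclusion
    """
    attributes = list(domains)
    target_name, target_value = target

    def add_level(parent, depth, path):
        name = attributes[depth]
        attr = ET.SubElement(parent, "ATTR", name=name)
        for val in domains[name]:
            node = ET.SubElement(attr, "VALUE", value=val)
            new_path = path + [(name, val)]
            if depth + 1 < len(attributes):
                add_level(node, depth + 1, new_path)
                continue
            # Leaf: records matching the whole condition, and those
            # among them that also satisfy the conclusion.
            matching = [r for r in records
                        if all(r[a] == v for a, v in new_path)]
            hits = [r for r in matching if r[target_name] == target_value]
            frequency = 100.0 * len(matching) / len(records)
            confidence = 100.0 * len(hits) / len(matching) if matching else 0.0
            ET.SubElement(node, "DIMENSION",
                          confidence=f"{confidence:.1f}",
                          frequency=f"{frequency:.1f}")

    card = ET.Element("CARD")
    add_level(card, 0, [])
    return card


# Invented sample data: four records, three binary attributes.
records = [
    {"attr1": "val1", "attr2": "val1", "attr3": "val1", "Status": "Affected"},
    {"attr1": "val1", "attr2": "val1", "attr3": "val1", "Status": "Healthy"},
    {"attr1": "val1", "attr2": "val2", "attr3": "val1", "Status": "Affected"},
    {"attr1": "val2", "attr2": "val2", "attr3": "val2", "Status": "Affected"},
]
domains = {name: ["val1", "val2"] for name in ("attr1", "attr2", "attr3")}
card = build_card(records, domains, target=("Status", "Affected"))
print(ET.tostring(card, encoding="unicode"))
```

With three attributes of two values each, the resulting tree has eight leaves, one per combination of values.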

An example

The rule IF SNP7=ab AND SNP16=aa AND SNP46=bb THEN Status=Affected is encoded in XML as:
<?xml version="1.0" encoding="ISO-8859-15"?>
<!DOCTYPE RULES SYSTEM "rules.dtd">
<RULES>
  <RULE>
    <LHS>
      <ITEM name="SNP7" value="=ab" />
      <ITEM name="SNP16" value="=aa" />
      <ITEM name="SNP46" value="=bb" />
    </LHS>
    <RHS>
      <ITEM name="Status" value="=Affected" />
    </RHS>
    <CRITERE name="J1" value="0.0366" />
    <CRITERE name="supp" value="0.09" />
    <CRITERE name="compl" value="0.25" />
    <CRITERE name="conf" value="0.9167" />
  </RULE>
</RULES>
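
To check a rules file before opening it in the viewer, it can be read back with a few lines of Python. This is only a sketch: the function names read_rules and as_text are hypothetical, the sign is assumed to be stored at the start of each value, and the standard-library parser used here checks well-formedness only, not validity against the DTD.

```python
import xml.etree.ElementTree as ET

# Same rule as in the example, without the XML declaration and DOCTYPE.
EXAMPLE = """<RULES>
  <RULE>
    <LHS>
      <ITEM name="SNP7" value="=ab" />
      <ITEM name="SNP16" value="=aa" />
      <ITEM name="SNP46" value="=bb" />
    </LHS>
    <RHS>
      <ITEM name="Status" value="=Affected" />
    </RHS>
    <CRITERE name="J1" value="0.0366" />
    <CRITERE name="supp" value="0.09" />
    <CRITERE name="compl" value="0.25" />
    <CRITERE name="conf" value="0.9167" />
  </RULE>
</RULES>"""


def read_rules(xml_text):
    """Parse a RULES document into a list of plain dictionaries."""
    root = ET.fromstring(xml_text)
    rules = []
    for rule in root.iter("RULE"):
        lhs = [(i.get("name"), i.get("value"))
               for i in rule.find("LHS").iter("ITEM")]
        rhs = [(i.get("name"), i.get("value"))
               for i in rule.find("RHS").iter("ITEM")]
        criteria = {c.get("name"): float(c.get("value"))
                    for c in rule.iter("CRITERE")}
        rules.append({"lhs": lhs, "rhs": rhs, "criteria": criteria})
    return rules


def as_text(rule):
    """Render a parsed rule in the IF ... THEN ... form used above."""
    condition = " AND ".join(name + value for name, value in rule["lhs"])
    conclusion = " AND ".join(name + value for name, value in rule["rhs"])
    return f"IF {condition} THEN {conclusion}"


parsed_rules = read_rules(EXAMPLE)
for parsed_rule in parsed_rules:
    print(as_text(parsed_rule), parsed_rule["criteria"])
```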
  
  

Interface

The selection pane is divided into three parts, one for each visualisation. It lets you select the rules you want to see in each visualisation. It also lets you sort the rules by different criteria: the number of attributes involved in the condition of the rule, the conclusion (a lexicographic sort), or any criterion specified in the input file.

Figure 1: Example of the selection interface.

As you can see in the screenshot, for the 3D Representation and the N Dimensional Line you can select several rules, and two buttons let you select or unselect all the rules. You can sort the rules by choosing the sort criteria under the list of rules. It is a combined sort: in case of a tie on the first criterion, the second is used to separate the rules. For the 3D Representation, you must also select the criteria you want to see in the visualisation, because it is limited to two criteria.
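
The combined sort can be pictured as a sort on a pair of keys. The sketch below is an illustration in Python, not the viewer's own code. The rule records and the function name combined_sort are invented, and ascending order is an assumption, since the text does not say in which direction the viewer sorts.

```python
def combined_sort(rules, first_key, second_key):
    """Sort by first_key; rules that tie on it are ordered by second_key."""
    return sorted(rules, key=lambda rule: (first_key(rule), second_key(rule)))


# Invented rules: number of attributes in the condition, and conclusion.
rules = [
    {"id": "R1", "n_attributes": 3, "conclusion": "Status=Affected"},
    {"id": "R2", "n_attributes": 2, "conclusion": "Status=Healthy"},
    {"id": "R3", "n_attributes": 2, "conclusion": "Status=Affected"},
    {"id": "R4", "n_attributes": 1, "conclusion": "Status=Healthy"},
]

ordered = combined_sort(
    rules,
    first_key=lambda rule: rule["n_attributes"],   # first sort
    second_key=lambda rule: rule["conclusion"],    # used only to break ties
)
print([rule["id"] for rule in ordered])
```

R2 and R3 tie on the number of attributes, so the lexicographic order of their conclusions decides which comes first.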

The Visualizations

The 3D Visualisation

We have based our 3D representation on the one presented by Wong, which can visualize "many-to-one" association rules. The rows of the matrix floor represent items and the columns represent item associations (rules). The green and red blocks of each column represent the Condition (or antecedent) and the Prediction (or consequent). The identities of the items are shown along the right side of the matrix. Because we want to evaluate a rule not only by its support and confidence, we add to the 3D representation the possibility of viewing the different quality measures computed by ASGARD. In the 3D representation, blue and cyan represent the two chosen criteria. Our display system supports several sort functions to improve the readability of the result, and the display can be zoomed and rotated with the mouse.
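
The layout of the matrix floor can be sketched as a small grid with one row per item and one column per rule. The Python code below is only an illustration of that layout, not of the 3D rendering. The function name matrix_floor is hypothetical, and treating an item as the attribute name joined with its value is an assumption.

```python
def matrix_floor(rules):
    """Return (items, grid) for a list of (condition items, prediction) rules.

    grid[row][col] is "C" if the item is in the condition of the rule,
    "P" if it is its prediction, and "." otherwise.
    """
    items = []
    for condition, prediction in rules:
        for item in condition + [prediction]:
            if item not in items:
                items.append(item)
    grid = [["." for _ in rules] for _ in items]
    for col, (condition, prediction) in enumerate(rules):
        for item in condition:
            grid[items.index(item)][col] = "C"
        grid[items.index(prediction)][col] = "P"
    return items, grid


# Two invented many-to-one rules sharing the same prediction.
rules = [
    (["SNP7=ab", "SNP16=aa"], "Status=Affected"),
    (["SNP7=ab"], "Status=Affected"),
]
items, grid = matrix_floor(rules)
for item, row in zip(items, grid):
    print(" ".join(row), item)  # item identities on the right side
```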


Figure 2: Example of 3D visualization of some rules discovered by ASGARD.

The line visualization

The second visualization makes it possible to compare several rules using all their criteria, for example those computed by ASGARD. Each line represents a rule and each criterion is a point on the X-axis. The value of the criterion is plotted on the Y-axis.

Figure 3: Example of line visualization of the different criteria.
If you have selected more than 14 rules, all the lines are drawn in black and the plot shows only the general behaviour of your rules.

Figure 3b: Example of line visualization of the different criteria for more than 14 rules.
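
The data behind this plot can be sketched as follows: each rule becomes a list of (x, y) points, where x is the position of the criterion on the X-axis and y is its value. This Python sketch is an assumption about how such data could be prepared, not the viewer's implementation. The name line_series and the colour labels are invented; only the threshold of 14 rules comes from the text.

```python
MAX_COLOURED_RULES = 14  # from the text: above 14 rules, every line is black


def line_series(rules, criteria):
    """Turn {rule label: {criterion: value}} into one polyline per rule."""
    coloured = len(rules) <= MAX_COLOURED_RULES
    series = []
    for index, (label, values) in enumerate(rules.items()):
        points = [(x, values[name]) for x, name in enumerate(criteria)]
        # "colour-N" is a placeholder for whatever palette is used.
        colour = f"colour-{index}" if coloured else "black"
        series.append({"label": label, "points": points, "colour": colour})
    return series


criteria = ["J1", "supp", "conf"]
rules = {
    "R1": {"J1": 0.0366, "supp": 0.09, "conf": 0.9167},
    "R2": {"J1": 0.0100, "supp": 0.20, "conf": 0.5000},  # invented values
}
for line in line_series(rules, criteria):
    print(line["label"], line["colour"], line["points"])
```
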
As in the 3D Representation, there is also a list of rules under the visualisation. You can select only one rule to visualize it on the Double Decker Plot.

The Double Decker Plot visualization

The third representation is a double-decker plot. It visualizes an association rule IF C THEN P by combining all the attributes involved in the left-hand side C as explanatory variables, drawing them within one mosaic plot, and showing the response P by highlighting the corresponding categories in a bar chart.
Figure 4: Example of Double Decker visualization of some rules discovered by ASGARD.

Interesting features

Printing

You can print a visualisation by choosing "Print" in the "File" menu. This opens a print dialog that depends on your printer. For the N Dimensional Line, the list of the rules represented is also printed so that you can see which line each rule corresponds to. This is done only if fewer than 15 rules are represented.

Saving

ARViewer also lets you save a visualisation as a JPEG image. To do so, choose "Save as" in the "File" menu, then select where you want to save your image. For the N Dimensional Line, the list of the rules represented is also drawn on the image so that you can see which line each rule corresponds to. This is done only if fewer than 15 rules are represented.