Leverage

The leverage of a rule is the number of additional cases covered by both the LHS and RHS above those expected if the LHS and RHS were independent of each other. This is a measure of the importance of the rule that reflects both the strength and the coverage of the rule.

The leverage of an itemset is the number of additional cases covered by the itemset above the maximum number expected if any two disjoint subsets of the itemset were independent of each other. It is determined by forming every split of the itemset into two non-empty subsets and calculating the expected coverage of the itemset under the assumption that the two subsets are independent. For each split, this expected coverage is the product of the coverages of the two subsets. The leverage is the coverage of the itemset minus the maximum of these expected values. The leverage of a one-element itemset is 0.0. The leverage of a two-element itemset is the same as the leverage of a rule with one element as the LHS and the other as the RHS. The leverage of a multi-element itemset can never be higher than the maximum leverage of a rule with one of the elements on the RHS and the remaining elements on the LHS.

Magnum Opus uses both proportional and count representations of leverage. The leverage count is the raw count of these cases. The leverage proportion is the leverage count divided by the total number of cases in the data.

For example, suppose that there are 1000 cases, the LHS of a rule covers 200 cases, the RHS covers 100 cases, and the RHS covers 50 of the cases covered by the LHS. The proportion of cases covered by both the LHS and RHS is 50/1000 = 0.05. The number of cases that would be expected to be covered by both the LHS and RHS if they were independent of each other is 200 * 100 / 1000 = 20. The leverage count is 50 minus 20 = 30. The leverage proportion equals the leverage count divided by 1000, which equals 0.03.
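The arithmetic in this example can be reproduced with a short Python sketch. The function name rule_leverage and its parameter names are illustrative assumptions and are not part of Magnum Opus; the sketch only restates the definitions given above.

```python
def rule_leverage(total_cases, lhs_count, rhs_count, both_count):
    """Return (leverage_count, leverage_proportion) for a rule.

    Hypothetical helper for illustration only.
    total_cases: number of cases in the data
    lhs_count:   cases covered by the LHS
    rhs_count:   cases covered by the RHS
    both_count:  cases covered by both the LHS and the RHS
    """
    # Cases expected to be covered by both sides if LHS and RHS were independent.
    expected_count = lhs_count * rhs_count / total_cases
    # Leverage count: observed joint coverage above the expected value.
    leverage_count = both_count - expected_count
    # Leverage proportion: leverage count relative to all cases.
    leverage_proportion = leverage_count / total_cases
    return leverage_count, leverage_proportion


# Figures from the example: 1000 cases, LHS covers 200, RHS covers 100, both cover 50.
count, proportion = rule_leverage(1000, 200, 100, 50)
print(count, proportion)
```

With the figures from the example, the expected count is 20, so the sketch yields a leverage count of 30 and a leverage proportion of 0.03, matching the text.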

For an example of itemset leverage, suppose that there are 1000 cases, A covers 500, B covers 400, C covers 300, A & B covers 333, A & C covers 300, B & C covers 300 and A & B & C covers 300. The itemset A & B & C can be divided into each of the following three partitions:

             A & B : C, expected coverage = 0.3330 × 0.3000 = 0.0999

             A & C : B, expected coverage = 0.3000 × 0.4000 = 0.1200

             B & C : A, expected coverage = 0.3000 × 0.5000 = 0.1500

The maximum expected coverage is 0.1500.  The actual coverage of A & B & C is 300 / 1000 = 0.3000.  The actual coverage minus the maximum expected coverage is 0.1500, and hence this is the leverage of the itemset.
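The itemset calculation can be sketched in Python along the same lines. This is a minimal illustration under stated assumptions: the function name itemset_leverage and the coverage table (a dictionary mapping each subset to the proportion of cases it covers) are hypothetical and do not describe how Magnum Opus computes the value internally.

```python
from itertools import combinations


def itemset_leverage(items, coverage):
    """Return the leverage proportion of an itemset.

    Hypothetical helper for illustration only.
    items:    iterable of item names
    coverage: dict mapping frozenset of items -> proportion of cases covered;
              it must contain an entry for every non-empty subset of items
    """
    items = frozenset(items)
    if len(items) < 2:
        # A one-element itemset has leverage 0.0 by definition.
        return 0.0
    ordered = sorted(items)
    first, rest = ordered[0], ordered[1:]
    max_expected = 0.0
    # Keep `first` in the left-hand subset so each two-way split is visited once.
    for size in range(len(rest)):
        for extra in combinations(rest, size):
            left = frozenset((first,) + extra)
            right = items - left
            # Expected coverage if the two subsets were independent.
            expected = coverage[left] * coverage[right]
            max_expected = max(max_expected, expected)
    return coverage[items] - max_expected


# Coverage proportions from the example (counts divided by 1000 cases).
coverage = {
    frozenset("A"): 500 / 1000,
    frozenset("B"): 400 / 1000,
    frozenset("C"): 300 / 1000,
    frozenset("AB"): 333 / 1000,
    frozenset("AC"): 300 / 1000,
    frozenset("BC"): 300 / 1000,
    frozenset("ABC"): 300 / 1000,
}
print(itemset_leverage("ABC", coverage))
```

For the example data the three splits give expected coverages of 0.0999, 0.1200 and 0.1500, so the sketch returns approximately 0.15, subject to floating-point rounding.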

 

© G I WEBB & ASSOCIATES 1999-2005 Last updated September 2005
