Data Mining with Cubist
Data mining is all about extracting patterns from an organization's
stored or warehoused data.
These patterns can be used to gain insight into aspects of the organization's
operations, and to predict outcomes for future situations as an
aid to decision-making.
Cubist builds rule-based predictive
models that output numeric values, complementing
See5/C5.0, which predicts categories.
For instance, See5/C5.0 might classify the percentage yield from some
process as "high", "medium", or "low", whereas Cubist
would output a number such as "7.3".
Cubist is a powerful tool for generating rule-based models
that balance the need for accurate prediction against the
requirements of intelligibility. Cubist models generally give better
results than those produced by simple techniques such as multivariate
linear regression, while also being easier to understand than neural networks.
Some important features:
Cubist has been designed to
analyze substantial databases containing
hundreds of thousands of records and
tens to thousands of numeric or nominal fields.
If you have used neural networks or similar modeling tools,
you'll be surprised by Cubist's speed!
(Cubist also takes advantage of processors with up to eight cores in
one or more CPUs (including Intel Hyper-Threading)
to speed up model-building.)
To maximize interpretability,
Cubist models are expressed as
collections of rules, where each rule has an associated multivariate
linear model. Whenever a situation matches a rule's conditions,
the associated model is used to calculate the predicted value.
Cubist is available for Windows XP/Vista/7/8 and Linux.
Cubist is easy to use and does not presume advanced knowledge
of Statistics or Machine Learning (although these don't hurt, either!).
RuleQuest provides C source code so that models
constructed by Cubist
can be embedded in your organization's own systems.
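To illustrate the idea of rules with attached linear models, here is a minimal Python sketch. It is not Cubist's own code or file format: the `Rule` class, the `predict` function, the attribute names `temperature` and `pressure`, and all coefficients are hypothetical. Averaging the outputs of all matching rules and falling back to a default value when no rule matches are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Case = Dict[str, float]  # one situation: attribute name -> numeric value


@dataclass
class Rule:
    """A set of conditions plus a multivariate linear model (illustrative only)."""
    conditions: List[Callable[[Case], bool]]
    intercept: float
    coefficients: Dict[str, float]

    def matches(self, case: Case) -> bool:
        # A rule applies only when every one of its conditions holds.
        return all(cond(case) for cond in self.conditions)

    def value(self, case: Case) -> float:
        # Linear model: intercept + sum of coefficient * attribute value.
        return self.intercept + sum(
            weight * case[name] for name, weight in self.coefficients.items()
        )


def predict(rules: List[Rule], case: Case, default: float) -> float:
    """Evaluate the linear model of each matching rule.

    Assumption for this sketch: outputs of all matching rules are averaged,
    and `default` is returned when no rule matches.
    """
    values = [rule.value(case) for rule in rules if rule.matches(case)]
    if not values:
        return default
    return sum(values) / len(values)


# Hypothetical two-rule model for the percentage yield of some process.
rules = [
    Rule(
        conditions=[lambda c: c["temperature"] <= 150],
        intercept=2.0,
        coefficients={"pressure": 0.25},
    ),
    Rule(
        conditions=[lambda c: c["temperature"] > 150],
        intercept=10.0,
        coefficients={"temperature": -0.02, "pressure": 0.5},
    ),
]

if __name__ == "__main__":
    case = {"temperature": 200.0, "pressure": 2.0}
    print(predict(rules, case, default=0.0))
```

Because each rule is just a readable condition paired with a linear formula, a model of this shape can be inspected by hand, which is the interpretability argument made above.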
If you would like to learn more about Cubist or try out a demonstration
version of the system, here are some useful links:
Source code for a single-threaded Linux version of Cubist is
available under the
Please see the
A few sample applications show
the sorts of results achievable with Cubist.
Links to several publications by Cubist users are available.
Tutorials describing and illustrating the use of Cubist
are available for the Windows and
Free demonstration versions (limited to small datasets) and the
public code to read and interpret Cubist models are
available from our downloads page.
If you have tried earlier versions of Cubist, here is a
precis of new features in Release 2.08.
© RULEQUEST RESEARCH 2012
Last updated July 2012