Data Mining with Cubist
Data mining is all about extracting patterns from an organization's
stored or warehoused data.
These patterns can be used to gain insight into aspects of the organization's
operations, and to predict outcomes for future situations as an
aid to decision-making.
Cubist builds rule-based predictive
models that output numeric values, complementing
See5/C5.0, which predicts categories.
For instance, See5/C5.0 might classify the percentage yield from some
process as "high", "medium", or "low", whereas Cubist
would output a number such as "7.3".
Cubist is a powerful tool for generating rule-based models
that balance the need for accurate prediction against the
requirements of intelligibility. Cubist models generally give better
results than those produced by simple techniques such as multivariate
linear regression,
while also being easier to understand than neural networks.
Some important features:
-
Cubist has been designed to
analyze substantial databases containing
hundreds of thousands of records and
tens to thousands of numeric or nominal fields.
If you have used neural networks or similar modeling tools,
you'll be surprised by Cubist's speed!
(Cubist also takes advantage of quad-core processors, systems with up to
four CPUs, or Intel Hyper-Threading to speed up model-building.)
-
To maximize interpretability,
Cubist models are expressed as
collections of rules, where each rule has an associated multivariate
linear model. Whenever a situation matches a rule's conditions,
the associated model is used to calculate the predicted value.
-
Cubist is available for Windows 2000/XP/Vista, Linux,
and Solaris.
-
Cubist is easy to use and does not presume advanced knowledge
of statistics or machine learning (although these don't hurt, either!).
-
RuleQuest provides C source code so that models
constructed by Cubist
can be embedded in your organization's own systems.
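To make the rule-plus-linear-model idea above concrete, here is a minimal Python sketch of how such a model could be evaluated. It is an illustration only, not Cubist's model file format or RuleQuest's C code. The field names (temperature, pressure), the rule thresholds, and the coefficients are invented, and two behaviours are assumptions made for this sketch rather than statements from this page: predictions are averaged when several rules match, and a caller-supplied default is returned when none does.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A case is one record: field name -> numeric value (hypothetical layout).
Case = Dict[str, float]


@dataclass
class Rule:
    """One rule: conditions that must all hold, plus a linear model."""
    conditions: List[Callable[[Case], bool]]
    intercept: float
    coefficients: Dict[str, float]

    def matches(self, case: Case) -> bool:
        # The rule applies only when every condition is satisfied.
        return all(cond(case) for cond in self.conditions)

    def predict(self, case: Case) -> float:
        # Multivariate linear model: intercept + sum of weight * field value.
        return self.intercept + sum(
            weight * case[name] for name, weight in self.coefficients.items()
        )


def predict(rules: List[Rule], case: Case, default: float) -> float:
    """Evaluate the linear model of each matching rule.

    Assumptions for this sketch: average the predictions when several
    rules match; fall back to `default` when no rule matches.
    """
    matched = [rule.predict(case) for rule in rules if rule.matches(case)]
    if not matched:
        return default
    return sum(matched) / len(matched)


# Invented rules predicting a percentage yield, for illustration only.
RULES = [
    Rule(
        conditions=[lambda c: c["temperature"] <= 50.0],
        intercept=2.0,
        coefficients={"temperature": 0.1},
    ),
    Rule(
        conditions=[lambda c: c["temperature"] > 50.0],
        intercept=10.0,
        coefficients={"temperature": -0.05, "pressure": 0.5},
    ),
]

if __name__ == "__main__":
    print(predict(RULES, {"temperature": 40.0, "pressure": 1.0}, default=0.0))
```

Because each rule pairs readable conditions with a small linear formula, a reader can trace any prediction back to the rule that produced it, which is the interpretability benefit described above.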
If you would like to learn more about Cubist or try out
the system, here are some useful links:
-
A few sample applications show
the sorts of results achievable with Cubist.
-
Links to several publications by Cubist users are also available.
-
Tutorials describing and illustrating the use of Cubist
are available for the Windows and
Unix versions.
-
Free demonstration versions (limited to small datasets) and the
public code to read and interpret Cubist models are
available from our downloads page.
-
An on-line form is provided for obtaining a
ten-day evaluation licence for Cubist.
-
If you have tried earlier versions of Cubist, here is a
precis of new features in Release 2.05.
© RULEQUEST RESEARCH 2008
Last updated March 2008