Data Mining with Cubist
Data mining is all about extracting patterns from an organization's stored or warehoused data. These patterns can be used to gain insight into aspects of the organization's operations, and to predict outcomes for future situations as an aid to decision-making.
Cubist builds rule-based predictive models that output numeric values, complementing See5/C5.0, which predicts categories. For instance, See5/C5.0 might classify the percentage yield from some process as "high", "medium", or "low", whereas Cubist would output a number such as "7.3".
Cubist is a powerful tool for generating rule-based models that balance the need for accurate prediction against the requirements of intelligibility. Cubist models generally give better results than those produced by simple techniques such as multivariate linear regression, while also being easier to understand than neural networks.
Some important features:
- Cubist has been designed to analyze substantial databases containing hundreds of thousands to millions of records and tens to thousands of numeric or nominal fields. If you have used neural networks or similar modeling tools, you'll be surprised by Cubist's speed! Cubist also takes advantage of processors with up to eight cores in one or more CPUs, including Intel Hyper-Threading, to speed up model-building.
- To maximize interpretability, Cubist models are expressed as collections of rules, where each rule has an associated multivariate linear model. Whenever a situation matches a rule's conditions, the associated model is used to calculate the predicted value.
- Cubist is available for Windows 8/10/11 and Linux.
- Cubist is easy to use and does not presume advanced knowledge of statistics or machine learning (although these don't hurt, either!).
- RuleQuest provides C source code so that models constructed by Cubist can be embedded in your organization's own systems.
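The rule-plus-linear-model structure described in the list above can be sketched in a few lines of Python. This is an illustration only, not Cubist's own code or model file format: the `Rule` class, the `predict` function, the attribute names, and the coefficients are all hypothetical. Averaging the values when several rules match, and falling back to a default value when none match, are assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Case = Dict[str, float]  # one record: attribute name -> numeric value


@dataclass
class Rule:
    """A rule: conditions plus an associated multivariate linear model."""
    condition: Callable[[Case], bool]   # True when the case matches the rule
    intercept: float                    # constant term of the linear model
    coefficients: Dict[str, float]      # attribute name -> weight

    def value(self, case: Case) -> float:
        # Evaluate the linear model: intercept + sum(weight * attribute)
        return self.intercept + sum(
            weight * case[name] for name, weight in self.coefficients.items()
        )


def predict(rules: List[Rule], case: Case, default: float) -> float:
    """Predict a number for a case from the rules it matches.

    Assumption of this sketch: values from all matching rules are
    averaged, and `default` is returned when no rule matches.
    """
    values = [rule.value(case) for rule in rules if rule.condition(case)]
    if not values:
        return default
    return sum(values) / len(values)


# Hypothetical two-rule model predicting a process yield.
rules = [
    Rule(lambda c: c["temperature"] <= 50.0, 2.0, {"pressure": 0.5}),
    Rule(lambda c: c["temperature"] > 50.0, 10.0,
         {"pressure": -0.25, "temperature": 0.1}),
]

low_temp = predict(rules, {"temperature": 40.0, "pressure": 8.0}, default=0.0)
high_temp = predict(rules, {"temperature": 60.0, "pressure": 8.0}, default=0.0)
print(low_temp, high_temp)
```

Each prediction can be traced back to the rule whose conditions matched and to the coefficients of its linear model, which is what makes a model of this shape easy to inspect.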
If you would like to learn more about Cubist or try out a demonstration version of the system, here are some useful links:
- Source code for a single-threaded Linux version of Cubist is available under the GNU GPL. Please see the downloads page.
- Links to several publications by Cubist users are available here.
- Tutorials describing and illustrating the use of Cubist are available for the Windows and Unix versions.
- Free demonstration versions (limited to small datasets) and the public code to read and interpret Cubist models are available from our downloads page.
- If you have tried earlier versions of Cubist, here is a precis of new features in Release 2.10.
© RULEQUEST RESEARCH 2020
Last updated April 2022