Data Mining Tools See5 and C5.0
Data mining is all about extracting patterns from an organization's
stored or warehoused data.
These patterns can be used to gain insight into aspects of the organization's
operations, and to predict outcomes for future situations as an
aid to decision-making.
Patterns often concern the categories to which
situations belong. For example, is a loan applicant creditworthy
or not? Will a certain segment of the population ignore a mailout or
respond to it? Will a process give high, medium, or low yield on
a batch of raw material?
See5 (Windows 2000/XP/Vista) and its Unix counterpart C5.0
are sophisticated data mining tools for
discovering patterns that delineate categories,
assembling them into classifiers,
and using them to make predictions.
Some important features:
-
See5/C5.0 has been designed to
analyze substantial databases containing
thousands to hundreds of thousands of records and
tens to hundreds of numeric, time, date, or nominal fields.
See5/C5.0 also takes advantage of processors with quad cores, up to
four CPUs, or Intel Hyper-Threading to speed up the analysis.
-
To maximize interpretability,
See5/C5.0 classifiers are expressed as
decision trees or sets of if-then rules,
forms that are generally easier to understand than neural networks.
-
See5/C5.0 is available for Windows 2000/XP/Vista and Linux.
-
See5/C5.0 is easy to use and does not presume any special knowledge
of Statistics or Machine Learning (although these don't hurt, either!).
-
RuleQuest provides C source code so that classifiers constructed
by See5/C5.0 can be embedded in your organization's own systems.
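To give a feel for why if-then rules are easy to read, here is a minimal Python sketch of a rule-based classifier for the loan applicant question raised earlier. It is an illustration only: the field names (`income`, `years_employed`, `missed_payments`), the thresholds, and the first-match strategy are all hypothetical assumptions, and the code does not reproduce the See5/C5.0 output format or the way See5/C5.0 applies its rulesets.

```python
from typing import Callable, NamedTuple


class Rule(NamedTuple):
    """One if-then rule: a condition on a record and the class it predicts."""
    condition: Callable[[dict], bool]
    label: str


# Hypothetical rules with made-up fields and thresholds (not See5/C5.0 output).
RULES = [
    # IF missed_payments > 2 THEN not creditworthy
    Rule(lambda r: r["missed_payments"] > 2, "not creditworthy"),
    # IF income >= 40000 AND years_employed >= 2 THEN creditworthy
    Rule(lambda r: r["income"] >= 40000 and r["years_employed"] >= 2,
         "creditworthy"),
]

# Class assigned when no rule applies (an assumption of this sketch).
DEFAULT_LABEL = "not creditworthy"


def classify(record: dict) -> str:
    """Return the label of the first rule whose condition holds."""
    for rule in RULES:
        if rule.condition(record):
            return rule.label
    return DEFAULT_LABEL


if __name__ == "__main__":
    applicant = {"income": 52000, "years_employed": 4, "missed_payments": 0}
    print(classify(applicant))
```

Each rule can be read and checked on its own by someone who knows the domain, which is the interpretability advantage the feature list points to.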
If you would like to learn more about See5/C5.0 or try out
the systems, here are some useful links:
-
You may have used an earlier system of mine called C4.5.
This comparison highlights
the advances embodied in See5/C5.0.
-
A few sample applications show
the sorts of results achievable with See5/C5.0.
-
Links to several publications by See5/C5.0 users are available
here.
-
Tutorials describing and illustrating the use of See5/C5.0
are available for the Windows and
Unix versions.
-
Free demonstration versions (limited to small datasets) and the
public code to read and interpret See5/C5.0 classifiers are
available from our downloads page.
-
An on-line form is provided for obtaining a
ten-day evaluation licence for See5 or C5.0.
-
If you have tried earlier versions of See5/C5.0, here is a
summary of new features in Release 2.06.
© RULEQUEST RESEARCH 2008
Last updated December 2008