Data Mining Tools See5 and C5.0
Data mining is all about extracting patterns from an organization's
stored or warehoused data.
These patterns can be used to gain insight into aspects of the organization's
operations, and to predict outcomes for future situations as an
aid to decision-making.
Patterns often concern the categories to which
situations belong. For example, is a loan applicant creditworthy
or not? Will a certain segment of the population ignore a mailout or
respond to it? Will a process give high, medium, or low yield on
a batch of raw material?
See5 (Windows XP/Vista/7/8) and its Unix counterpart C5.0 are sophisticated
data mining tools for discovering patterns that delineate categories,
assembling them into classifiers, and using them to make predictions.
Some important features:
- See5/C5.0 has been designed to analyze substantial databases containing thousands to millions of records and tens to hundreds of numeric, time, date, or nominal fields. See5/C5.0 also takes advantage of computers with up to eight cores in one or more CPUs (including Intel Hyper-Threading) to speed up the analysis.
- To maximize interpretability, See5/C5.0 classifiers are expressed as decision trees or sets of if-then rules, forms that are generally easier to understand than neural networks.
- See5/C5.0 is available for Windows XP/Vista/7/8 and Linux.
- See5/C5.0 is easy to use and does not presume any special knowledge of statistics or machine learning (although these don't hurt, either!).
- RuleQuest provides C source code so that classifiers constructed by See5/C5.0 can be embedded in your organization's own systems.
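To make the "decision trees or sets of if-then rules" point concrete, here is a minimal sketch in Python of the same small classifier written both ways, using the loan-applicant question from the introduction. Everything in it is hypothetical and chosen only for illustration: the field names, the thresholds, and the first-match rule ordering. It is not output from See5/C5.0 and does not use RuleQuest's code or file formats.

```python
from dataclasses import dataclass


@dataclass
class Applicant:
    # Hypothetical fields, invented for this illustration.
    income: float
    years_employed: float
    has_defaulted: bool


def classify_with_tree(a: Applicant) -> str:
    """Decision-tree form: nested tests, each path ends in a category."""
    if a.has_defaulted:
        return "not creditworthy"
    if a.income >= 40000:
        return "creditworthy"
    if a.years_employed >= 5:
        return "creditworthy"
    return "not creditworthy"


# If-then rule form: each entry is (condition, category).
# Assumption for this sketch: the first matching rule wins.
RULES = [
    (lambda a: a.has_defaulted, "not creditworthy"),
    (lambda a: a.income >= 40000, "creditworthy"),
    (lambda a: a.years_employed >= 5, "creditworthy"),
]
DEFAULT_CATEGORY = "not creditworthy"


def classify_with_rules(a: Applicant) -> str:
    """Rule-set form: return the category of the first rule that fires."""
    for condition, category in RULES:
        if condition(a):
            return category
    return DEFAULT_CATEGORY


if __name__ == "__main__":
    applicant = Applicant(income=25000, years_employed=6, has_defaulted=False)
    print(classify_with_tree(applicant))
    print(classify_with_rules(applicant))
```

Either form can be read and checked by someone who knows the business but not the algorithm, which is the interpretability argument made above.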
If you would like to learn more about See5/C5.0 or try out
the systems, here are some useful links:
- Source code for a single-threaded version of C5.0 (Linux) is available under the GNU GPL. Please see the downloads page.
- A few sample applications show the sorts of results achievable with See5/C5.0.
- Links to several publications by See5/C5.0 users are available here.
- Tutorials describing and illustrating the use of See5/C5.0 are available for the Windows and Unix versions.
- Free demonstration versions (limited to small datasets) and the public code to read and interpret See5/C5.0 classifiers are available from our downloads page.
- If you have tried earlier versions of See5/C5.0, here is a summary of new features in Release 2.10.
© RULEQUEST RESEARCH 2013 | Last updated March 2013