Release 2.02 also recovers all previous settings when an application is re-run. In Release 2.01, some of these settings reverted to their defaults.
cases file.
These errors are now reported via pop-up messages.
On an oceanographic application with 178,000 cases and 12 attributes, for instance, the model produced by Release 1.11 has 58 rules, Release 1.12's has 46, and Release 1.13's has only 38.
The .names file now has a facility to restrict
the attributes that can appear in models.
This allows
attributes to be used in formulas defining other attributes
but not directly in a model.
For example, suppose that the data contain two numeric attributes
A and B but background knowledge
suggests that only their difference is important.
It is now possible to define a new attribute
Diff := A - B.
without allowing A or B themselves
to appear in any model.
This same facility makes it much easier to experiment with restricted subsets of the attributes.
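
The effect of such a restriction can be pictured with a small Python sketch. This is only an illustration of the idea, not Cubist's own code or .names file syntax: the sample rows, the extra attribute C, and the helper names add_diff and model_inputs are all hypothetical.

```python
# Hypothetical sample data; attribute C is invented for the illustration.
rows = [
    {"A": 10.0, "B": 4.0, "C": 1.5},
    {"A": 7.5, "B": 9.0, "C": 2.0},
]


def add_diff(row):
    """Return a copy of the row with the derived attribute Diff = A - B."""
    out = dict(row)
    out["Diff"] = row["A"] - row["B"]
    return out


def model_inputs(row, excluded=("A", "B")):
    """Keep only the attributes that a model would be allowed to use."""
    return {name: value for name, value in row.items() if name not in excluded}


# A and B feed the formula for Diff but are dropped before modelling.
prepared = [model_inputs(add_diff(r)) for r in rows]
```

Here A and B contribute to Diff but never reach the modelling step themselves, which is the behaviour described above.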
time'
takes values in the form HH:MM:SS. As with dates, attributes
defined by formulas can subtract one time from another to give
an interval (in seconds).
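
As a rough illustration of the arithmetic involved, the following Python sketch converts HH:MM:SS values to seconds since midnight and subtracts them. The function names are hypothetical, and the range check (hours below 24) is an assumption rather than a statement about how Cubist validates times.

```python
def time_to_seconds(text):
    """Convert an HH:MM:SS string to seconds since midnight."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    # Assumed validity ranges; Cubist's own checks may differ.
    if not (0 <= hours < 24 and 0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError(f"not a valid HH:MM:SS time: {text!r}")
    return hours * 3600 + minutes * 60 + seconds


def interval_seconds(start, end):
    """Subtract one time from another, giving an interval in seconds."""
    return time_to_seconds(end) - time_to_seconds(start)
```

For example, interval_seconds("09:15:00", "10:45:30") gives 5430.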
Committee models are of most value in applications for which single Cubist models are already pretty accurate.
Dates can now be entered as either
YYYY/MM/DD or
YYYY-MM-DD.
.model files have been changed to
ASCII format, so that models generated on one machine type may be
deployed on machines of another type. The source code that
facilitates such deployment has also changed substantially.
To ease the changeover, Cubist and the new public code will still read model files generated by Release 1.07.
xval
script. The -X option invokes cross-validation
and specifies the number of folds.
The xval script is still used for multiple
cross-validations. However, the option +d that
preserves detailed outputs now saves one file for each
cross-validation rather than one file for each Cubist run.
YYYY/MM/DD
and can be used with implicitly defined attributes to determine,
for instance, the number of days between two dates or the day of the
week on which a date falls.
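
The two calculations mentioned here can be sketched in Python with the standard datetime module. This is a minimal illustration assuming dates written as YYYY/MM/DD; the function names are hypothetical and this is not the syntax used for attribute definitions in Cubist.

```python
from datetime import date

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def parse_date(text):
    """Parse a YYYY/MM/DD string into a date object."""
    year, month, day = (int(part) for part in text.split("/"))
    return date(year, month, day)


def days_between(first, second):
    """Number of days from the first date to the second."""
    return (parse_date(second) - parse_date(first)).days


def day_of_week(text):
    """Name of the weekday on which the date falls."""
    return DAY_NAMES[parse_date(text).weekday()]
```

For example, days_between("2010/01/01", "2010/02/01") gives 31, and day_of_week("2010/02/01") gives "Monday".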
Ordered discrete values are nominal values that have
a natural ordering, such as small, medium, large, XL, XXL.
When an attribute's discrete values are noted as ordered, Cubist
exploits this information to
test subranges of the values,
e.g. [large-XXL].
This tends to produce more compact models with higher predictive accuracy.
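
A subrange test of this kind can be sketched in Python by mapping each ordered value to its position. This only illustrates the concept; the names ORDER, RANK, and in_subrange are hypothetical, and the sketch says nothing about how Cubist chooses which subranges to test.

```python
# The ordered discrete values from the example above, smallest first.
ORDER = ["small", "medium", "large", "XL", "XXL"]
RANK = {value: position for position, value in enumerate(ORDER)}


def in_subrange(value, low, high):
    """True if value lies within the ordered subrange [low-high]."""
    return RANK[low] <= RANK[value] <= RANK[high]
```

With this ordering, in_subrange("XL", "large", "XXL") is true and in_subrange("medium", "large", "XXL") is false, so a single condition covers large, XL, and XXL without listing each value separately.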
A new button on the toolbar allows the previous output to be redisplayed.
Release 1.06a contains a new parameter that influences this tradeoff. When the brevity factor is set to a high value, Cubist will emphasize simplicity (usually at some expense to accuracy). Similarly, a low value puts a premium on accuracy, but may substantially increase model complexity. The choice is now yours!
| © RULEQUEST RESEARCH 2010 | Last updated February 2010 |