Notes from Previous Releases

Release 1.06

Faster!
Release 1.06 processes larger datasets noticeably faster than 1.05. A mid-sized application (100,000 cases, 15 attributes) should finish in about three-quarters of the time required by the previous release.

Bug fix
Release 1.05 introduced the re-use of a saved analysis to inspect new cases. Unfortunately, the results could be incorrect if the application used defined attributes with discrete values. This bug has been corrected in 1.06.


Release 1.05

Inspecting new data
The most important change in this release is the ability to save an analysis for later use. By default, GritBot writes a sift file that collects all the checks that were made while analyzing the data. The sift file allows GritBot to apply the same checks to new data in a cases file; checking new data in this way is much faster than analyzing it from scratch.

The sift file can be quite large, so an option is provided to prevent GritBot from saving this information.

Option to control the number of anomalies reported
Since large datasets (hundreds of thousands of cases) can produce many possible anomalies, a new option limits how many are reported. If this option is invoked, the actual number of potential anomalies found by GritBot is still shown, but only the most interesting are displayed.

Generating lists of possible anomalies
Another option generates a simple list of case numbers of possible anomalies found. The list can be used for follow-up actions such as separating possible anomalies from the rest of the data.
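
As an illustration of such a follow-up step, here is a minimal Python sketch that separates flagged cases from the rest. These notes do not describe the list format, so the sketch assumes that the case numbers are 1-based positions in the data and have already been read into a Python collection. The helper `split_cases` is hypothetical and not part of GritBot.

```python
from typing import Iterable, List, Tuple


def split_cases(
    data_lines: List[str], anomaly_case_numbers: Iterable[int]
) -> Tuple[List[str], List[str]]:
    """Split data lines into (possible anomalies, remaining cases).

    Assumption: case numbers are 1-based and follow the order of data_lines.
    """
    flagged = set(anomaly_case_numbers)
    anomalies: List[str] = []
    rest: List[str] = []
    for case_no, line in enumerate(data_lines, start=1):
        if case_no in flagged:
            anomalies.append(line)
        else:
            rest.append(line)
    return anomalies, rest
```

For example, `split_cases(["a", "b", "c", "d"], [2, 4])` returns `(["b", "d"], ["a", "c"])`.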


Release 1.04

New data type
Timestamps are read and written in the form YYYY-MM-DD HH:MM:SS using a 24-hour clock. (Recall that GritBot already has data types for times and for dates.) A timestamp is rounded to the nearest minute and implicitly defined attributes can be used to compute functions of timestamps such as the number of minutes between two of them.
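
The rounding and the minutes-between calculation can be sketched in Python as follows. This is only an illustration of the behaviour described above, not GritBot's implementation. The function names are hypothetical, and rounding a value of exactly 30 seconds upwards is an assumption, since the notes say only "nearest minute".

```python
from datetime import datetime, timedelta

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # YYYY-MM-DD HH:MM:SS, 24-hour clock


def round_to_minute(ts: datetime) -> datetime:
    """Round a timestamp to the nearest minute (assumption: :30 rounds up)."""
    floored = ts.replace(second=0, microsecond=0)
    if ts.second >= 30:
        floored += timedelta(minutes=1)
    return floored


def minutes_between(first: str, second: str) -> int:
    """Whole minutes from `first` to `second`, after rounding each to the minute."""
    t1 = round_to_minute(datetime.strptime(first, TIMESTAMP_FORMAT))
    t2 = round_to_minute(datetime.strptime(second, TIMESTAMP_FORMAT))
    return int((t2 - t1).total_seconds() // 60)
```

With these definitions, `minutes_between("2007-04-01 23:59:40", "2007-04-02 00:10:10")` gives 10: the first timestamp rounds up to midnight and the second rounds down to 00:10.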

Improved detection and screening of possible anomalies
Release 1.04 incorporates improved mechanisms for detecting potential anomalies and for filtering out those that may well be spurious. For example, Release 1.04 checks more carefully for inappropriate N/A ("not applicable") values in the data. The minimum number of cases in a subset that contains a potential anomaly has been increased to the larger of 35 or 0.5% of the data.
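
To make the new minimum concrete, the rule "the larger of 35 or 0.5% of the data" can be written as a short Python sketch. The function name is hypothetical, and rounding a fractional 0.5% up to the next whole case is an assumption, since the notes do not say how fractions are handled.

```python
def min_subset_size(n_cases: int) -> int:
    """Smallest subset that may contain a potential anomaly under the 1.04 rule.

    0.5% of the data is n_cases / 200; integer arithmetic is used to round up
    (an assumption) without floating-point error.
    """
    half_percent = (n_cases + 199) // 200
    return max(35, half_percent)
```

Under these assumptions the fixed floor of 35 applies up to 7,000 cases, and the 0.5% term takes over beyond that: `min_subset_size(1000)` is 35, while `min_subset_size(100000)` is 500.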

Faster checking
GritBot is now considerably faster for larger datasets containing tens of thousands of records. This has been achieved by the selective use of sampling to estimate important properties of subsets of cases.


Release 1.03

Selective checking
By default, GritBot examines the values of all attributes in its search for possible anomalies. Release 1.03 enables the user to indicate (in the .names file) the attributes that GritBot should examine. This can speed up checking and can also restrict the possible anomalies reported to those more likely to be of interest.

Time attributes
An attribute declared to be a "time" takes values in the form HH:MM:SS. As with dates, attributes defined by formulas can subtract one time from another to give an interval (in seconds).
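
The arithmetic behind such an interval can be sketched in Python. This illustrates the calculation only and is not GritBot's formula syntax. Both function names are hypothetical.

```python
def time_to_seconds(value: str) -> int:
    """Convert an HH:MM:SS time to seconds since midnight."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def interval_seconds(start: str, end: str) -> int:
    """Interval in seconds obtained by subtracting one time from another."""
    return time_to_seconds(end) - time_to_seconds(start)
```

For example, `interval_seconds("09:15:00", "10:00:30")` is 2730, that is, 45 minutes and 30 seconds.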

Faster checking
GritBot now processes large datasets much more quickly. For example, a dataset of almost a quarter of a million records with more than two hundred attributes takes Release 1.02 about 30 hours on a 500MHz PC; on the same machine Release 1.03 completes the job in four hours.

More effective checking
Some core heuristics of GritBot have been tuned to improve the detection of possible anomalies. At the same time, better "sanity checks" remove more potential false positives.


Release 1.02

More extensive filtering
GritBot finds anomalies by identifying subsets of cases in which one value for one case stands out. As a further filter, GritBot now checks that all other values for the highlighted case are compatible with the other values in the subset. The rationale is that two anomalous values in the same case might explain each other.

Batch mode
The Windows version now includes GritBotX, a batch mode version of GritBot.

Data formats
Two extensions of the data description language have been incorporated in Release 1.02.

Improved error messages
Errors in your .data or .names files are now pinned down to a specific line number in the relevant file. Of course, if your data never contain errors, you won't notice this ...

© RULEQUEST RESEARCH 2007 Last updated April 2007

