Notes from Previous Releases
- Modified heuristics
The heuristics that guide the search for anomalies have been polished,
as have the heuristics that are used to weed out false positives.
These changes principally affect larger datasets containing hundreds
of thousands of cases.
- (Slightly) improved checking of new cases
GritBot can save its analysis so that new cases from the same application
can be quickly screened for possible anomalies. Release 2.01 includes some
additional tests for irregularities that did not arise in the original data.
- Many of the computation-intensive aspects of GritBot are now multi-threaded, so GritBot can use up to four processors. As a result, GritBot 2.01 will run significantly faster on the newer dual-core and quad-core computers.
Release 1.06 processes larger datasets quite a bit faster than 1.05.
A mid-sized application (100,000 cases, 15 attributes) should
be finished in about three-quarters of the time required by the
- Bug fix
Release 1.05 introduced the re-use of a saved analysis to inspect new cases. Unfortunately, the results could be incorrect if the application used defined attributes with discrete values. This bug has been corrected in 1.06.
- Inspecting new data
The most important change in the new release is the ability to save
an analysis for later use. By default, GritBot writes a sift
file that collects all the checks that were made while analyzing
the data. The sift file allows GritBot to apply the same checks
to new data in a cases file; checking new data in this way is
much faster than analyzing it ab initio.
The sift file can be quite large, so an option is provided to prevent GritBot from saving this information.
- Option to control the number of anomalies reported
Since large datasets (hundreds of thousands of cases) can produce
many possible anomalies, a new option restricts
the number of them reported. If this option is invoked, the
actual number of potential anomalies found by GritBot is still
shown, but only the most interesting are displayed.
- Generating lists of possible anomalies
Another option generates a simple list of the case numbers of possible anomalies found. The list can be used for follow-up actions such as separating possible anomalies from the rest of the data.
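As an illustration of such a follow-up action, the sketch below splits a dataset into flagged and unflagged cases. It is not part of GritBot: the function name `split_cases` is hypothetical, and it assumes that case numbers start at 1 and follow the order of the cases in the data.

```python
def split_cases(cases, anomaly_numbers):
    """Separate cases whose numbers appear in the anomaly list from the rest.

    Assumption: case numbers are 1-based and follow the order of `cases`.
    """
    flagged = set(anomaly_numbers)
    suspects, rest = [], []
    for number, case in enumerate(cases, start=1):
        if number in flagged:
            suspects.append(case)
        else:
            rest.append(case)
    return suspects, rest


if __name__ == "__main__":
    suspects, rest = split_cases(["a", "b", "c", "d"], [2, 4])
    print(suspects, rest)
```

If the case numbers in the list turn out to use a different numbering, only the `start=1` argument needs to change.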
- New data type
Timestamps are read and written in the form YYYY-MM-DD HH:MM:SS using a 24-hour clock. (Recall that GritBot already has data types for times and for dates.) A timestamp is rounded to the nearest minute, and implicitly defined attributes can be used to compute functions of timestamps such as the number of minutes between two of them.
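The rounding and the minutes-between calculation can be sketched as follows. This is an illustration only, not GritBot's own code: the function names are hypothetical, and the sketch assumes that a timestamp with 30 or more seconds rounds up to the next minute.

```python
from datetime import datetime, timedelta

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # YYYY-MM-DD HH:MM:SS, 24-hour clock


def parse_timestamp(text):
    """Parse a timestamp and round it to the nearest minute."""
    stamp = datetime.strptime(text, TIMESTAMP_FORMAT)
    # Assumption: 30 seconds or more rounds up to the next minute.
    if stamp.second >= 30:
        stamp += timedelta(minutes=1)
    return stamp.replace(second=0)


def minutes_between(earlier, later):
    """Number of whole minutes from one rounded timestamp to another."""
    delta = parse_timestamp(later) - parse_timestamp(earlier)
    return int(delta.total_seconds() // 60)


if __name__ == "__main__":
    print(minutes_between("2015-09-01 23:59:45", "2015-09-02 00:10:10"))
```

Because both timestamps are rounded before subtraction, the difference is always a whole number of minutes.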
- Improved detection and screening of possible anomalies
Release 1.04 incorporates improved
mechanisms for detecting potential anomalies and for filtering
out those that may well be spurious.
For example, Release 1.04 checks more carefully for
inappropriate N/A ("not applicable") values in the data.
The minimum number of cases in a subset that contains a potential
anomaly has been increased to the larger of 35 or 0.5% of the data.
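The minimum subset size described above works out as in this small sketch. The function name is hypothetical, and rounding a fractional 0.5% up to the next whole case is an assumption; the notes do not say how fractions are handled.

```python
def min_subset_size(total_cases):
    """Larger of 35 cases or 0.5% of the data.

    Assumption: a fractional 0.5% is rounded up to the next whole case.
    """
    # 0.5% is 1/200; integer arithmetic avoids floating-point rounding.
    half_percent = (total_cases + 199) // 200
    return max(35, half_percent)


if __name__ == "__main__":
    for n in (1000, 7000, 100000):
        print(n, min_subset_size(n))
```

Under this reading, the fixed floor of 35 cases applies up to 7,000 cases, and the 0.5% rule takes over for larger datasets.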
- Faster checking
GritBot is now considerably faster for larger datasets containing tens of thousands of records. This has been achieved by the selective use of sampling to estimate important properties of subsets of cases.
- Selective checking
By default, GritBot examines the values of all attributes in
its search for possible anomalies.
Release 1.03 enables the user to indicate (in the .names file) the attributes that GritBot should examine. This can speed up checking and can also restrict the possible anomalies reported to those more likely to be of interest.
- Time attributes
An attribute declared to be a `time' takes values in the form HH:MM:SS. As with dates, attributes defined by formulas can subtract one time from another to give an interval (in seconds).
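The interval arithmetic for times can be sketched as below. This only illustrates the calculation and is not GritBot's formula syntax; the function names are hypothetical.

```python
def time_to_seconds(text):
    """Convert an HH:MM:SS value to seconds since midnight."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def interval_seconds(start, end):
    """Interval in seconds obtained by subtracting one time from another."""
    return time_to_seconds(end) - time_to_seconds(start)


if __name__ == "__main__":
    print(interval_seconds("09:15:30", "10:00:00"))
```

The result is negative when the second time is earlier than the first, since a time value alone carries no date.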
- Faster checking
GritBot now processes large datasets much more quickly. For
example, a dataset of almost a quarter of a million records with
more than two hundred attributes takes Release 1.02 about 30 hours
on a 500MHz PC; on the same machine
Release 1.03 completes the job in four hours.
- More effective checking
Some core heuristics of GritBot have been tuned to improve the detection of possible anomalies. At the same time, better `sanity checks' remove more potential false positives.
- More extensive filtering
GritBot finds anomalies by identifying subsets of cases in which
one value for one case stands out. As a further filter, GritBot
now checks that all other values for the highlighted case
are compatible with other values in the set.
The rationale for this is that two anomalous values might be
- Batch mode
The Windows version now includes GritBotX, a batch mode version
- Data formats
Two extensions of the data description language have been incorporated
in Release 1.02.
- A new value `N/A' can be used when the value of an attribute is not relevant to a case.
- Dates may now be written as either
- Improved error messages
Errors in your .names files are now pinned down to a specific line number in the relevant file. Of course, if your data never contain errors, you won't notice this ...
© RULEQUEST RESEARCH 2015. Last updated September 2015.