Checking Data Quality with GritBot
"Data mining" consists of a family of techniques for extracting valuable information from an organization's stored or warehoused data. Data mining methods search for patterns and can be compromised if the data contain corrupted values that obscure these patterns. As the saying goes, "Garbage in, garbage out."
GritBot is an automatic tool that tries to find anomalies in data as a precursor to data mining. It can be thought of as an autonomous data quality auditor that hunts for records having "surprising" values of nominal (discrete) and/or numeric (continuous) attributes.
Values need not stand out in the complete dataset -- GritBot searches for subsets of records in which the anomaly is apparent. In one of the sample applications referenced below, GritBot identifies the age of two women in their seventies as being anomalous. Such ages are not surprising in the whole population, but they certainly are in this case because the women are noted as being pregnant.
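The idea of an anomaly that only shows up inside a subset can be illustrated with a small sketch. This is not GritBot's actual method, which is not described here; it swaps in a simple robust outlier test (median and median absolute deviation) applied first to the whole dataset and then within each value of a nominal attribute. The records are synthetic, and the names `robust_outliers`, `subset_outliers`, and the 3.5 cutoff are assumptions chosen for illustration.

```python
from statistics import median


def robust_outliers(values, threshold=3.5):
    """Return the values whose robust z-score exceeds `threshold`.

    The score is |v - median| / (1.4826 * MAD), a common robust
    stand-in for a standard z-score. Illustrative only; this is an
    assumed test, not the one GritBot uses.
    """
    values = list(values)
    centre = median(values)
    mad = median([abs(v - centre) for v in values])
    if mad == 0:
        return []  # no spread to measure against
    scale = 1.4826 * mad
    return [v for v in values if abs(v - centre) / scale > threshold]


def subset_outliers(records, numeric, nominal, threshold=3.5):
    """Apply the outlier test separately within each value of `nominal`."""
    groups = {}
    for rec in records:
        groups.setdefault(rec[nominal], []).append(rec[numeric])
    flagged = {}
    for key, vals in groups.items():
        hits = robust_outliers(vals, threshold)
        if hits:
            flagged[key] = hits
    return flagged


# Synthetic data: a broad age range overall, a narrow one for pregnant
# women, plus two implausible records aged 72 and 75.
records = (
    [{"age": a, "pregnant": "no"} for a in range(18, 86)]
    + [{"age": a, "pregnant": "yes"} for a in range(20, 41)]
    + [{"age": 72, "pregnant": "yes"}, {"age": 75, "pregnant": "yes"}]
)

global_flags = robust_outliers([r["age"] for r in records])
subset_flags = subset_outliers(records, "age", "pregnant")

print(global_flags)  # [] : ages in the seventies are unremarkable overall
print(subset_flags)  # {'yes': [72, 75]} : but surprising among pregnant women
```

Checked against the whole dataset, neither age is flagged; checked within the pregnant subset, both are. A real tool would also have to decide which subsets are worth examining, which this sketch sidesteps by fixing the nominal attribute in advance.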
Some important features:
- GritBot has been designed to analyze substantial databases containing tens or hundreds of thousands of records and many numeric or nominal fields.
- Possibly anomalous values that GritBot identifies are reported, together with an explanation of why each value seems surprising.
- The patterns found by GritBot can be saved and used to check new data. Potential anomalies found in new data can differ from the types of anomalies originally identified.
- GritBot is virtually automatic -- the user does not need any knowledge of statistics or data analysis.
- GritBot is available for Windows 7/8/10 and Linux.
If you would like to learn more about GritBot, here are some useful links:
- Source code for a single-threaded Linux version of GritBot release 2.01 is available under the GNU GPL. Please see the downloads page.
- A few sample applications show off the capabilities of GritBot. You may be surprised at what it has uncovered in well-studied datasets!
- Tutorials on using GritBot are available for the Windows and Unix versions.
- If you have tried earlier versions of GritBot, here is a summary of new features in Release 2.02.
© RULEQUEST RESEARCH 2018. Last updated January 2018.