PAG-VIII: REPLICATOR NETWORK ARRAYS FOR AUTOMATED DATA ANALYSIS AND CLASSIFICATION

PAG-VIII   Plant & Animal Genome VIII Conference

Town & Country Hotel, San Diego, CA, January 9-12, 2000.



REPLICATOR NETWORK ARRAYS FOR AUTOMATED DATA ANALYSIS AND CLASSIFICATION

WASYL MALYJ, David P. Mannion

Computational Science & Advanced Technologies Group, Veterinary Genetics Laboratory, University of California, One Shields Avenue, Davis, CA 95616-8744

We are developing new informatic tools to automate the classification and analysis of large, complex datasets that have proven resistant to traditional artificial intelligence, neural network, and data mining approaches, and that therefore still require the attention of skilled human analysts. Our method can be implemented in unsupervised or supervised modes, as appropriate to the specific application; the supervised mode is discussed here. It requires human-validated, categorized reference datasets, which are presented to informatic components known as basis vector extractors. These extract a custom basis set for each individual data category. The basis sets are then used to build arrays of associated replicator networks, which we have dubbed Adaptive Focused Replicator Networks (AFRNs). Once extracted, the sets of basis vectors are frozen, so that each AFRN has its own custom-tailored basis vector set for its reference category. In production use, the array of AFRNs is presented with new data, and each AFRN attempts to reconstruct a faithful copy of the new exemplar using its own associated basis set. A higher-level informatic critic reviews the replication fidelity of the AFRN array and decides which AFRN has most faithfully reproduced the novel exemplar. If none of the AFRNs has performed adequately, or if multiple AFRNs have inappropriately performed well, the exemplar is tagged for human interpretation and possible inclusion in an auxiliary reference database that can itself be used to build additional AFRNs in the future. We demonstrate the method with two applications: Fisher's classic Iris dataset and exemplars from our laboratory's STR genotyping electropherogram database. Our pilot studies indicate that more than 90% of our STR electropherograms can be scored accurately by AFRN arrays, completely bypassing the manual double-checking step required by other scoring methods in current use.
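
The supervised workflow described above can be illustrated with a minimal sketch. The abstract does not specify how the basis vector extractors or the critic are implemented, so everything concrete here is an assumption made for illustration: the extractor is stood in for by a truncated SVD of each category's centered reference data, replication fidelity is measured as the Euclidean norm of the reconstruction residual, and the critic applies a single hypothetical acceptance threshold (`accept`). The function names and the parameter `k` are likewise hypothetical.

```python
import numpy as np

def extract_basis(reference, k):
    """Assumed extractor: top-k right singular vectors of one category's
    centered reference data. Returns (mean, basis); the result is treated
    as frozen once computed."""
    mean = reference.mean(axis=0)
    _, _, vt = np.linalg.svd(reference - mean, full_matrices=False)
    return mean, vt[:k]  # rows of vt are orthonormal

def build_array(reference_sets, k):
    """One replicator per category, each with its own basis set."""
    return {label: extract_basis(data, k) for label, data in reference_sets.items()}

def replication_error(x, mean, basis):
    """Project x onto the category's basis, reconstruct it, and return the
    norm of what the reconstruction failed to reproduce."""
    centered = x - mean
    reconstructed = basis.T @ (basis @ centered)
    return float(np.linalg.norm(centered - reconstructed))

def critic(x, array, accept):
    """Return the label of the single replicator whose error is within
    `accept`. Return None (tag for human review) if no replicator, or more
    than one, reproduces x that well."""
    errors = {label: replication_error(x, m, b) for label, (m, b) in array.items()}
    good = [label for label, err in errors.items() if err <= accept]
    return good[0] if len(good) == 1 else None
```

In this sketch, an exemplar for which `critic` returns `None` corresponds to one tagged for human interpretation; such exemplars could be collected into a separate reference set and passed to `build_array` later. A production critic would likely need category-specific thresholds rather than one global value.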

