Degree Name

Bachelor of Applied Sciences Honours


Faculty of Science and Technology

First Advisor

A. Watson

Second Advisor

T. Haines


The threat to computers worldwide from computer viruses is increasing as new viruses and variants proliferate. The availability of virus construction tools that facilitate 'customised' virus production, and the wider use of more sophisticated means of evading detection, such as encryption, polymorphic transformation and memory-resident 'stealth' techniques, compound this problem. Some viruses employ methods to guard against their own eradication from an infected computer, whilst others adopt measures to prevent disassembly of the virus for examination and analysis. Growth in computer numbers and connectivity provides an expanding pool of candidate hosts for infection. Standardised and flexible systems for classification and naming are needed to eliminate ambiguity and to promote effective identification of viruses. This study is an examination of one candidate classification method. A depth-mediated variation of monothetic analysis has been developed to classify a database of virus information stored in binary variables. The method trialled in this research is suitable for use, although generalised application of monothetic analysis is limited: only binary (Boolean) variables may be analysed, whilst some pertinent virus information may be of a numeric or descriptive type. Storing the virus information in a database allows for flexibility in both data volume (new virus reports) and virus characteristics (new variables). Items in both of these categories may be easily added to previously stored information. The data used for this study, however, although suitable as test data for the proposed classification technique, is inadequate for taxonomic classification purposes, being highly variable in format, content, and completeness. Several questions also arose regarding accuracy.
Such deficiencies were disregarded for the purpose of this study, as it was possible to verify in all cases that no current category of virus was missed (omission of which would have made the trial data incomplete). Secondary objectives for this study were the consideration of a suitable nomenclature, resolution methods for delimitation conflicts, and a classification encoding method. Currently, the name of a new virus frequently includes the name of the perceived parent virus. The solution to the problem of variations in naming will depend on whether this 'patronymic' system is continued. Increases in variability and identification problems caused by encryption, and particularly polymorphism, may make long-term continuation of this approach impractical. Mediation for delineation conflicts is met by the classification system itself, as the group into which a virus falls is determined by its possession of the requisite characteristics. An encoding method for virus classification details has been provided by the progressive building, during classification, of a node identifier for each virus record, which identifies the branch conditions applied in grouping that virus. This provides the variable names on which the virus has been grouped and, together with the values for each of the variables used, summarises the virus characteristics in terms of the classification variables and the depth to which classification has proceeded.
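
The general shape of a depth-limited monothetic classification with progressively built node identifiers can be illustrated with a minimal sketch. It is not the procedure developed in the thesis: the abstract does not state the division criterion, so the sketch assumes a simple stand-in (divide each group on the unused binary variable that splits it most evenly). The function names, the identifier format `variable=value/...`, and the sample records are all hypothetical.

```python
from typing import Dict, List, Optional

Record = Dict[str, bool]  # one virus record: variable name -> Boolean value


def pick_variable(records: List[Record], used: List[str]) -> Optional[str]:
    """Return the unused variable that splits the group most evenly, or None.

    Assumption: the 'most even split' rule is a stand-in criterion only.
    """
    best, best_gap = None, None
    for var in sorted(records[0]):
        if var in used:
            continue
        present = sum(1 for r in records if r[var])
        if present == 0 or present == len(records):
            continue  # this variable does not divide the group
        gap = abs(2 * present - len(records))
        if best_gap is None or gap < best_gap:
            best, best_gap = var, gap
    return best


def classify(viruses: Dict[str, Record], max_depth: int) -> Dict[str, str]:
    """Assign each virus a node identifier recording the branches taken."""
    node_ids: Dict[str, str] = {}

    def split(names: List[str], path: List[str], used: List[str]) -> None:
        var = None
        if len(path) < max_depth and len(names) > 1:
            var = pick_variable([viruses[n] for n in names], used)
        if var is None:
            # Leaf: depth limit reached, single record, or no dividing variable.
            for n in names:
                node_ids[n] = "/".join(path) or "root"
            return
        for value in (True, False):
            subset = [n for n in names if viruses[n][var] == value]
            split(subset, path + [f"{var}={int(value)}"], used + [var])

    split(sorted(viruses), [], [])
    return node_ids


# Hypothetical sample records, for illustration only.
sample = {
    "A": {"encrypted": True, "resident": True, "polymorphic": False},
    "B": {"encrypted": True, "resident": False, "polymorphic": True},
    "C": {"encrypted": False, "resident": True, "polymorphic": False},
    "D": {"encrypted": False, "resident": False, "polymorphic": False},
}

for name, node_id in sorted(classify(sample, max_depth=2).items()):
    print(name, node_id)
```

Each identifier lists the variables on which a record was grouped together with their values, and its length reflects the depth to which classification proceeded. Adding a new record or a new variable only requires extending the dictionary, which mirrors the flexibility the abstract attributes to database storage.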