Spider is an application written at Cornell University for the specific purpose of searching computers for sensitive data (see the official website at http://www.cit.cornell.edu/computer/security/tools/). Cornell has created versions for Windows (Windows 2000, XP, and Server 2003), Mac OS X, and Linux, but this documentation focuses on the Windows version.
Spider searches through the content of all files, looking for matches to the regular expressions it is given (see http://en.wikipedia.org/wiki/Regular_expressions for more information on regular expressions). This can be a very time-, disk-, and CPU-intensive activity (i.e. it might consume your computer's resources for several hours), so configuring Spider to run as efficiently as possible is important.
With the permission of Cornell, ITS has repackaged and redistributed Spider with a custom default configuration. This configuration includes a list of file types to skip (types that either would not contain sensitive data or that store data in a format Spider cannot interpret), some improved search parameters, and changes to the way log files are created. The ITS configuration is designed with end-users and “one-click” use in mind, but the packaged settings are also useful for IT administrators.
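As a rough illustration of the kind of scan described above (this is not Spider's actual code), the following Python sketch walks a directory tree, skips files by extension, and counts regular expression matches in the remaining files. The pattern, the skip list, and the function name scan_tree are all assumptions made for this example; they are not taken from Spider or from the ITS configuration.

```python
import os
import re

# Hypothetical pattern: U.S. Social Security numbers written as 123-45-6789.
SSN_PATTERN = re.compile(rb"\b\d{3}-\d{2}-\d{4}\b")

# Hypothetical skip list; the real ITS list is not reproduced here.
SKIP_EXTENSIONS = {".exe", ".dll", ".jpg", ".mp3"}


def scan_tree(root, pattern=SSN_PATTERN, skip=SKIP_EXTENSIONS):
    """Return {path: match_count} for files under root whose bytes match pattern."""
    hits = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if os.path.splitext(name)[1].lower() in skip:
                continue  # skipped file types are never opened
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    data = handle.read()  # whole file in memory: fine for a sketch
            except OSError:
                continue  # unreadable file (locked, missing permissions)
            count = len(pattern.findall(data))
            if count:
                hits[path] = count
    return hits
```

Calling scan_tree on a folder returns a dictionary that maps each matching file path to its number of matches. Even in this small sketch, every file that is not skipped is read in full, which suggests why a well-chosen skip list matters for run time.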
Background
Effectiveness
Installation
Running Spider
Using the graphical interface
Running from the command line
Configuration
Runtime configuration
File handling configuration
Regex (search) configuration
Logging configuration
Advanced configuration
Variables available within Spider
