Ten Forensics Toolkit

CS470 Final Report

Frederick J Polsky
University of Alaska Anchorage
18 November 2002


ABSTRACT


    Computer forensic packages provide investigators with a means of gathering digital evidence from computer systems using the legally established procedures of forensic science. These procedures include ensuring that evidence is not altered, conducting investigations in forensically sterile environments, documenting chains of custody, and logging investigative actions. Several proprietary forensics packages exist for Microsoft Windows based systems. Ten is a free-software forensics package for the GNU/Linux system. It provides a GUI front end to standard command-line utilities to assist the Anchorage Police Department in the forensic examination of GNU/Linux systems using the ext2 and derived filesystems. Ten will be released to the public under the terms of the GNU General Public License.


TABLE OF CONTENTS


  • Title
  • Abstract
    1. Introduction
      1. Definition
      2. Problem Statement
      3. The Project
      4. Standards
      5. Principals
    2. Procedures
      1. Suspect Computer Systems
      2. The Examination Process
    3. Software Design and Implementation
      1. Current Features
      2. Future Development
      3. Architecture
      4. Design
      5. Application and Window Management
      6. Signals and Slots
    4. Ten User Experience
      1. The Main Window
      2. New Case Creation
      3. Reopening of Existing Case
      4. Image Browser
      5. Hexadecimal File Viewer
      6. Disk Image Extraction
      7. Help
      8. Exiting Ten
    5. Data Structures
      1. Case Directory
      2. Case File
      3. Log File
      4. Image Files
    6. Development Process Commentary
    7. Conclusions
  • Summary
  • References
    1. INTRODUCTION
      1. DEFINITION

            Forensic science is the application of scientific methodologies to legal processes. Computer forensic science is the intersection of computer and forensic science; the preservation, identification, extraction, documentation and interpretation of computer data as evidence [KH2002]. Evidentiary data may be either direct evidence (e.g. illegal pornographic content or stolen credit information) or circumstantial evidence (e.g. a heated email exchange which may help establish intent in a premeditated murder case). The evidence must be gathered consistent with established legal procedures. A chain of custody (a log of all evidence handling) must be maintained. Evidence may not be altered or tampered with. Evidence must be accurately and completely described.

      2. PROBLEM STATEMENT

            Presently, the Anchorage Police Department has commercial software for forensic inspection of Microsoft Windows based systems: EnCase, from Guidance Software, Inc. Current versions of EnCase support UNIX and GNU/Linux filesystems; however, the department's copy is quite old and upgrades have not been budgeted. Therefore, the department presently has no means to inspect GNU/Linux systems except by direct use of command-line utilities, and its investigators lack the Unix or Linux expertise to do so.

      3. THE PROJECT

            This project is the design, implementation, and documentation of software to solve the stated problem. It was undertaken at the request of an Anchorage Police Department computer forensic investigator. It provides a graphical interface to standard system utilities. It ensures evidentiary data is preserved, extracted, and documented. Future enhancements will be made to automate evidence identification and cataloging. Interpretation will remain in the domain of trained investigators, legal analysts, judges and juries. The software is to be released under the terms of the GNU General Public License. Development will continue and the software will be the cornerstone of a forensic inspection operating system distribution and workstation.

      4. STANDARDS

            This software was developed entirely from computer forensic specifications published by recognized professional organizations, including the National Institute of Standards and Technology Computer Forensics Tool Testing Project and the International Association of Computer Investigative Specialists. To avoid intellectual property and look-and-feel concerns, no specific software package has been used as a model for this project.

      5. PRINCIPALS

            Frederick Polsky is the primary developer and project manager. There is presently one other active developer, Thomas Kircher. Field testing will be conducted by APD investigators Det. Sgt. Ross Plummer and Det. Glen Klinkhart. Development testing has been conducted by Frederick Polsky and Thomas Kircher. The software's copyright is assigned to Screaming Genius Meta Labs, a sole proprietorship licensed in the state of Alaska, owned by Polsky and named for Kircher.

    2. PROCEDURES
      1. SUSPECT COMPUTER SYSTEMS

            This software is designed for the inspection of suspect computer systems which have been seized and delivered to a forensic computer laboratory; it is not designed to conduct investigations on running production servers. The system will be handled in accordance with agency procedures regarding evidence collection and handling, then delivered to the forensic lab. The storage media will be removed from the suspect system and installed in a forensic analysis workstation for extraction and analysis using Ten. Legal procedures and issues regarding the handling of the physical evidence are beyond the scope of this project; an extensive discussion is available from the US Department of Justice Computer Crime and Intellectual Property Section's report Searching and Seizing Computers and Obtaining Electronic Evidence in Criminal Investigations.

      2. THE EXAMINATION PROCESS

            Once the suspect media is installed in the inspection workstation and the Ten software is invoked, forensic analysis of the suspect media may be performed. Within Ten, a new case is created or an existing case reopened. A bitstream image copy is made of the suspect media, and the integrity of the imaging process is verified using the SHA-1 message digest algorithm [NIST1995]. The image file is scanned for Linux filesystem partitions, and each located partition is mounted using the Linux loop device. A directory browser is invoked to allow the investigator to examine the contents of the filesystem. The browser displays file metadata, including inode change (ctime), modification and access times, and MIME type. A file may be viewed in a hex viewer. Log files are kept current during all activity, so it is unnecessary for the investigator to save the case manually. All case data is saved in a single folder on the investigation workstation.
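            The partition scan described above can be sketched in standard C++ by decoding the classic PC partition table (MBR) in the first 512-byte sector of the image and turning each Linux partition into a loop-mount command. This is a hypothetical illustration: Ten lists sfdisk(8) among the utilities it calls, so its actual scan likely goes through that tool; the helper names here are invented, extended partitions are ignored, and the read-only (ro) mount option is an added assumption.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record for one primary partition entry of a classic MBR.
struct PartitionEntry {
    int index;               // 1..4, matching partition numbers 1-4
    std::uint8_t type;       // partition type byte (0x83 = Linux native)
    std::uint32_t startLba;  // first sector of the partition
    std::uint32_t sectors;   // partition length in sectors
};

// Read a little-endian 32-bit value from the sector buffer.
static std::uint32_t le32(const std::uint8_t* p) {
    return std::uint32_t(p[0]) | (std::uint32_t(p[1]) << 8) |
           (std::uint32_t(p[2]) << 16) | (std::uint32_t(p[3]) << 24);
}

// Decode the four 16-byte primary entries starting at offset 446 and keep
// those marked as Linux native. Returns nothing if the 0x55AA boot
// signature at offsets 510-511 is missing.
std::vector<PartitionEntry> linuxPartitions(const std::array<std::uint8_t, 512>& mbr) {
    std::vector<PartitionEntry> found;
    if (mbr[510] != 0x55 || mbr[511] != 0xAA) return found;
    for (int i = 0; i < 4; ++i) {
        const std::uint8_t* e = mbr.data() + 446 + 16 * i;
        PartitionEntry p{i + 1, e[4], le32(e + 8), le32(e + 12)};
        if (p.type == 0x83 && p.sectors != 0) found.push_back(p);
    }
    return found;
}

// Build a mount(8) command attaching the partition through a loop device;
// offset= is given in bytes (start sector times 512).
std::string loopMountCommand(const std::string& image, const PartitionEntry& p,
                             const std::string& mountPoint) {
    const std::uint64_t offset = std::uint64_t(p.startLba) * 512;
    return "mount -o loop,ro,offset=" + std::to_string(offset) + " " + image + " " + mountPoint;
}
```

For a first partition starting at the traditional sector 63, the computed offset is 63 × 512 = 32256 bytes.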

    3. SOFTWARE DESIGN AND IMPLEMENTATION
      1. CURRENT FEATURES

        The Ten software currently performs the following functions:

        • Maintains case information
        • Maintains a chain of custody log for each case
        • Performs a bitstream copy of a hard disk to an image file
        • Verifies copy integrity using a message-digest algorithm
        • Mounts Linux filesystems contained within the image file
        • Provides a tree-view directory browser for the mounted filesystems
        • Identifies inode change (ctime), access and modification times for the files within the filesystems
        • Identifies the file type of each file within the filesystem by its magic number rather than its filename extension
        • Provides a hex viewer for viewing individual files

      2. FUTURE DEVELOPMENT

        Functions to be completed during future ongoing development include:

        • Providing file previews based on MIME type and invocation of external viewers
        • Providing keyword and regular expression searches for individual files and the entire filesystem
        • Support for other filesystems
        • Hashing of individual files and testing against a standard library of file hashes

      3. ARCHITECTURE

            The Ten software is written in the C++ programming language using the Qt/X11 Free Edition toolkit by Trolltech AG. Most program code was originally written by Polsky and Kircher. External code used (with modifications) in Ten includes freely distributable code from the Qt Example Programs (the help viewer and directory browser) and GPL-licensed code used in the hex viewer (derived from xhexscope, Goldseal Studios, Inc.). Additionally, code is automatically generated during the build process by the Qt Meta-Object Compiler. The standard system utilities called directly from Ten are dd(1), file(1), mount(8), sha1sum(1) and sfdisk(8). Support software used in the development of Ten includes the Concurrent Versions System (CVS) for version control and collaborative development, and the Doxygen documentation system for generating API documentation.
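            Because Ten's core work is delegated to these utilities, a front end needs a way to run a command and read its output. The following is a toolkit-neutral sketch using POSIX popen()/pclose(); it is an assumption for illustration, not Ten's code (a Qt application would more likely use Qt's own process class), and runCommand is a hypothetical name.

```cpp
#include <cassert>
#include <cstdio>
#include <stdexcept>
#include <string>

// Captured standard output plus the raw status returned by pclose().
struct CommandResult {
    std::string output;
    int status;
};

// Run a command through /bin/sh and collect everything it writes to stdout,
// e.g. the report line printed by file(1) or sha1sum(1).
CommandResult runCommand(const std::string& command) {
    FILE* pipe = popen(command.c_str(), "r");
    if (!pipe) throw std::runtime_error("popen failed: " + command);
    std::string output;
    char buffer[4096];
    std::size_t n;
    while ((n = std::fread(buffer, 1, sizeof buffer, pipe)) > 0)
        output.append(buffer, n);
    const int status = pclose(pipe);  // waits for the child to exit
    return {output, status};
}
```

Because popen() hands the string to the shell, any path taken from the user or from a suspect filesystem would need to be quoted or validated before being placed in the command.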

      4. DESIGN

            The program structure follows object-oriented design principles and the facilities provided by C++ and Qt. Each user dialog is a single class derived from a Qt widget. Where a particular functionality necessarily affects the behavior of a widget, that functionality is included in the widget's class. Container objects are created where multiple objects need to be accessed as a single unit. Each class is defined in its own C++ source and header file (except code derived from external sources). The naming convention for classes is fFooBar. The naming convention for member data elements is mFooBar. Member functions have no specific naming convention, except that accessors are prefixed with 'get'.
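            A minimal illustration of these naming conventions follows; the class fEvidenceItem and its members are hypothetical and do not appear in Ten.

```cpp
#include <cassert>
#include <string>

// Class name prefixed with 'f', member data with 'm', accessors with 'get'.
class fEvidenceItem {
public:
    fEvidenceItem(int itemNumber, const std::string& description)
        : mItemNumber(itemNumber), mDescription(description) {}

    int getItemNumber() const { return mItemNumber; }
    const std::string& getDescription() const { return mDescription; }

private:
    int mItemNumber;           // agency-assigned item number
    std::string mDescription;  // free-text description of the item
};
```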

      5. APPLICATION AND WINDOW MANAGEMENT

            Program execution is controlled by the QApplication class, an instance of which is created in main(). QApplication manages the event loop, application settings, command-line argument parsing and other features less relevant to the needs of Ten. The main program window, fMainWindow, is designated the main widget; thus, when the fMainWindow object is destroyed, the QApplication terminates.

      6. SIGNALS AND SLOTS

        Communication between classes (e.g. returning values entered by users in input forms) is performed using Qt's signal and slot mechanism, which serves the purpose of callbacks or events in other toolkits. Any class that inherits QObject and contains the Q_OBJECT macro may use the mechanism. A signal is declared as a member function in the class header, under the signals keyword, and is raised with the 'emit' keyword. A slot is an ordinary member function, declared as a slot in the class header. Signals and slots are connected to one another using the QObject::connect() method. When a signal is emitted, each connected slot is invoked and the argument values (if any) are passed from the signal to the slot. The supporting code is generated automatically during the Qt build process: for each class containing Q_OBJECT, the Qt Meta-Object Compiler (moc) produces a file named moc_foobar.cpp that implements the signals and the class's meta-object information. The convenience of the signal/slot mechanism comes at a small performance cost, generally unnoticeable to users.
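            The idea behind signals and slots can be approximated in standard C++ with a list of std::function callbacks. This sketch is only an analogy under that assumption: it involves no QObject, no Q_OBJECT macro and no moc, and the Signal class and emitSignal name are invented (the name avoids clashing with Qt's emit macro).

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Keeps a list of connected callables ("slots") and calls each one, in
// connection order, with the arguments supplied when the signal is emitted.
template <typename... Args>
class Signal {
public:
    using Slot = std::function<void(Args...)>;

    // Analogous to QObject::connect(): register a slot for later invocation.
    void connect(Slot slot) { mSlots.push_back(std::move(slot)); }

    // Analogous to 'emit': invoke every connected slot with the same values.
    void emitSignal(Args... args) const {
        for (const Slot& slot : mSlots) slot(args...);
    }

private:
    std::vector<Slot> mSlots;
};
```

Connecting a slot to a dialog's signal and emitting it with a case number and examiner name invokes the slot with exactly those values, which mirrors how a form returns user input to its caller.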

    4. TEN USER EXPERIENCE
      1. THE MAIN WINDOW

            The main window has the following menu options: File->New, Open, Quit; Actions->Extract; Options->Preferences; Help->Index, About. Until a case is open, Actions are disabled. Initially, the main window contains a blank application area and a status bar. The status bar indicates whether a case is open, or the status of various operations. At present only a single case may be open at a time; it is not yet known whether the ability to work on multiple cases simultaneously is desirable. When a case is opened, a directory browser and a log view window are added to the main window.

        Figure 1. Main window on initial invocation

      2. NEW CASE CREATION

            When a new case is created, a dialog asks the user to enter an agency-assigned case number, the name of the examiner, and general case-related comments. A save-file dialog then requests the name of the case folder; that folder is created, the case data is saved to a case file, and a log file is created recording the new case information. Actions are then enabled.

        Figure 2. New case information dialog

      3. REOPENING OF EXISTING CASE

            When a case is reopened, the logger records the reopening of the case, any saved images are mounted, and the directory view is invoked to browse the mounted images. Because images are mounted on the fly, Ten must currently be run as the superuser.

      4. IMAGE BROWSER

            The directory tree view provided by the fImageBrowser class displays filenames, MIME types, and inode change (ctime), modification, and access dates/times. MIME types are determined by invoking file(1), which identifies file types by checking for magic numbers. Presently this feature makes opening a directory containing a large number of files noticeably slow. Right-clicking a file currently invokes the hex viewer on the file; it will eventually present a context menu with options for hex viewing, searching, or opening with native viewers.

        Figure 3. Main window with directory tree view and log viewer
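            Identification by magic number can be illustrated with a small table of well-known file signatures checked against a file's first bytes. This is a hypothetical sketch, not a replacement for file(1), whose magic database is far larger; the table entries and the MIME strings returned are illustrative choices.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Match the leading bytes of a file against a few well-known signatures.
// The filename is never consulted.
std::string mimeFromMagic(const unsigned char* data, std::size_t size) {
    struct Magic { const char* bytes; std::size_t length; const char* mime; };
    static const Magic table[] = {
        {"\x7F" "ELF", 4, "application/x-executable"},
        {"\x89PNG\r\n\x1A\n", 8, "image/png"},
        {"\xFF\xD8\xFF", 3, "image/jpeg"},
        {"GIF8", 4, "image/gif"},
        {"%PDF-", 5, "application/pdf"},
        {"\x1F\x8B", 2, "application/x-gzip"},
    };
    for (const Magic& m : table)
        if (size >= m.length && std::memcmp(data, m.bytes, m.length) == 0)
            return m.mime;
    return "application/octet-stream";  // no known signature matched
}
```

Because only the content is examined, a JPEG renamed to letter.txt would still be reported as image/jpeg.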

      5. HEXADECIMAL FILE VIEWER

            The hex viewer displays a hexadecimal representation of the file along with displayable ASCII characters and file offsets. At present it attempts to load the entire file into a memory buffer, which is problematic for large files. Needed enhancements are to buffer the hex viewer I/O and to open the hex viewer at a given offset within a file (to be tied into a search feature).

        Figure 4. Hexadecimal file viewer
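            The buffered, offset-based viewing called for above can be sketched as a function that reads only a window of the file and formats it in the familiar offset / hex / ASCII layout. The function name and the 16-bytes-per-row layout are assumptions; the code is not derived from the xhexscope-based viewer in Ten.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Read at most 'length' bytes starting at 'offset' and format them as rows of
// an 8-digit hex offset, 16 hex byte columns, and printable ASCII ('.' for
// anything else). Only the requested window is held in memory.
std::string hexWindow(std::istream& in, std::streamoff offset, std::size_t length) {
    in.clear();  // reset EOF/fail state left by an earlier read
    in.seekg(offset);
    std::vector<char> buf(length);
    in.read(buf.data(), static_cast<std::streamsize>(length));
    const std::size_t got = static_cast<std::size_t>(in.gcount());

    std::string out;
    char cell[32];
    for (std::size_t row = 0; row < got; row += 16) {
        std::snprintf(cell, sizeof cell, "%08lx  ",
                      static_cast<unsigned long>(offset + static_cast<std::streamoff>(row)));
        out += cell;
        std::string ascii;
        for (std::size_t i = row; i < row + 16; ++i) {
            if (i < got) {
                const unsigned char c = static_cast<unsigned char>(buf[i]);
                std::snprintf(cell, sizeof cell, "%02x ", c);
                out += cell;
                ascii += (c >= 0x20 && c < 0x7F) ? static_cast<char>(c) : '.';
            } else {
                out += "   ";  // pad a short final row so the ASCII column lines up
            }
        }
        out += ' ';
        out += ascii;
        out += '\n';
    }
    return out;
}
```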

      6. DISK IMAGE EXTRACTION

            The Actions->Extract option invokes the image extraction utility. A process determines the available disk drives from which extraction is possible, and the user selects a drive by its device path. A necessary enhancement is to translate the device path into human-readable form. Once the device for extraction is selected, an information dialog gathers an item number and description. Once these are entered, a save dialog requests the filename for the image, and the image extraction process begins. The main window status bar then displays a progress indicator for the duration of the extraction and hashing processes.

        Figure 5. Device selection dialog


        Figure 6. Image information dialog
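            The device-path translation mentioned above could start from the Linux disk naming scheme, in which /dev/hda through /dev/hdd are the four IDE positions and /dev/sda, /dev/sdb, and so on are SCSI disks in detection order. A hypothetical helper covering only whole-disk IDE and SCSI names might look like this; any other path is returned unchanged.

```cpp
#include <cassert>
#include <string>

// Map a whole-disk device path to a description an investigator can match
// to the hardware in the workstation.
std::string describeDevice(const std::string& path) {
    const std::string prefix = "/dev/";
    if (path.compare(0, prefix.size(), prefix) != 0 || path.size() != prefix.size() + 3)
        return path;
    const std::string name = path.substr(prefix.size());
    const char letter = name[2];
    if (name.compare(0, 2, "hd") == 0 && letter >= 'a' && letter <= 'd') {
        static const char* ide[] = {"primary master", "primary slave",
                                    "secondary master", "secondary slave"};
        return std::string("IDE ") + ide[letter - 'a'] + " (" + path + ")";
    }
    if (name.compare(0, 2, "sd") == 0 && letter >= 'a' && letter <= 'z')
        return "SCSI disk " + std::to_string(letter - 'a' + 1) + " (" + path + ")";
    return path;
}
```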

      7. HELP

            The Help->Index option invokes an HTML-aware help viewer. It is invoked non-modally, so it may remain open while the application is in use. It loads an index.html file containing links to the user documentation, the API documentation, and the GNU GPL. The Help->About option displays the application version, authoring and license information.

        Figure 7. Help Viewer

      8. EXITING TEN

            The File->Quit option saves all current case state and closes the application. Closing the application window directly accomplishes the same thing. The closure of the application is logged if a case is open. If the application is terminated externally, such as with SIGINT, the closing of the application is not logged to the open case. This is presently an open bug.
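            One conventional way to address this bug, offered as an assumption rather than as Ten's design, is to have a signal handler do nothing but record the signal in a volatile std::sig_atomic_t flag, and let the normal event loop notice the flag and run the same close-and-log path as File->Quit. Writing to the log directly from the handler would not be async-signal-safe. The function names are hypothetical.

```cpp
#include <cassert>
#include <csignal>

// Written by the handler, read by the main loop.
static volatile std::sig_atomic_t gPendingSignal = 0;

// The handler only records which signal arrived.
extern "C" void recordShutdownSignal(int sig) { gPendingSignal = sig; }

// Call once at start-up, before the event loop runs.
void installShutdownHandlers() {
    std::signal(SIGINT, recordShutdownSignal);
    std::signal(SIGTERM, recordShutdownSignal);
}

// Poll from the event loop (for example on a timer). A non-zero result means
// the application should log the closure to the open case and then quit
// normally, as File->Quit does.
int pendingShutdownSignal() { return gPendingSignal; }
```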

    5. DATA STRUCTURES

      1. CASE DIRECTORY

            Data is stored in case directories; all the files associated with one case are stored in one directory. Presently, these are case files, log files, and image files. In the future, evidence files will also be stored to denote items marked as evidence. These files are collectively managed via the fCase class.

      2. CASE FILE

            The case file, handled by the fCaseFile class, contains case data and meta-data including case information and a list of extracted image files. The extension of the case file is .10C. The case file is stored using XML-style tags and attributes. The format, informally, is as follows:

             <CASE VERSION=(string) ID=(string)>
               <NUMBER>(string)</NUMBER>
               <EXAMINER ID=(int)>(string)</EXAMINER>
               <COMMENTS>(text)</COMMENTS>
               <DATA>
                 <IMAGE SRC=(string) ID=(string) HASH=(hex)>(string)</IMAGE>
                 ...
               </DATA>
             </CASE>
        
        Values:
          CASE
            VERSION: the Ten software version used to create the case
            ID: a unique case identifier (presently unused, and therefore set to the case number)
          NUMBER: the user- or agency-assigned case number
          EXAMINER: the lead examiner's name
            ID: reserved for future expansion
          COMMENTS: the case comments
          IMAGE: the description of the suspect media
            SRC: the fully-qualified path to an extracted image
            ID: reserved for future expansion
            HASH: the SHA-1 hash value of the disk image

            The case file is represented in memory as a tree via the Qt DOM classes. Class fCaseFile itself inherits QDomDocument (in addition to QObject, to enable signals and slots); individual tags, attributes and values are instances of the appropriate Qt DOM classes. This greatly simplifies processing of the case file, as the entire case information can be kept in memory and easily written out whenever a change is made or when the fCaseFile object is destroyed. The QDomDocument class provides a toString() method whose output is written to the appropriate text stream.
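            For readers without Qt, the same serialization can be sketched with plain string building standing in for QDomDocument::toString(). Assumptions: the struct and function names are hypothetical, the reserved EXAMINER ID is written as a placeholder 0, the case ID is set to the case number as the format description specifies, and attribute values are quoted and escaped so the output is well-formed XML.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Escape characters that are special in XML text and attribute values.
std::string xmlEscape(const std::string& s) {
    std::string out;
    for (char c : s) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;"; break;
            case '>': out += "&gt;"; break;
            case '"': out += "&quot;"; break;
            default: out += c;
        }
    }
    return out;
}

struct ImageRecord { std::string src, id, hash, description; };

struct CaseRecord {
    std::string version, number, examiner, comments;
    std::vector<ImageRecord> images;
};

// Emit the CASE / NUMBER / EXAMINER / COMMENTS / DATA / IMAGE structure.
std::string toCaseXml(const CaseRecord& c) {
    std::string x;
    x += "<CASE VERSION=\"" + xmlEscape(c.version) + "\" ID=\"" + xmlEscape(c.number) + "\">\n";
    x += "  <NUMBER>" + xmlEscape(c.number) + "</NUMBER>\n";
    x += "  <EXAMINER ID=\"0\">" + xmlEscape(c.examiner) + "</EXAMINER>\n";
    x += "  <COMMENTS>" + xmlEscape(c.comments) + "</COMMENTS>\n";
    x += "  <DATA>\n";
    for (const ImageRecord& img : c.images)
        x += "    <IMAGE SRC=\"" + xmlEscape(img.src) + "\" ID=\"" + xmlEscape(img.id) +
             "\" HASH=\"" + xmlEscape(img.hash) + "\">" + xmlEscape(img.description) + "</IMAGE>\n";
    x += "  </DATA>\n";
    x += "</CASE>\n";
    return x;
}
```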

      3. LOG FILE

            The log file is represented by the fLogFile class. It provides a method to automatically prepend a time/date stamp to a string, which may be a format string (a bug presently prevents more than one '%' argument from being parsed). The stamped line of text is then written to a plaintext log file. A likely desirable future enhancement is to write the log file as rich text or to provide pretty-printing facilities; this must be done in a manner which does not alter the authenticity of the log file. No built-in controls exist to maintain the validity of the log file; the log file will be subject to the same legal scrutiny as any other handwritten or computer-generated logs and reports, and will be sworn under penalty of perjury to be correct if entered into legal proceedings. Future enhancements, if necessary, may include digital signature capabilities and separate logs for individual sessions.
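            The stamping step can be sketched with a variadic function that forwards its arguments intact through a va_list to vsnprintf(), so any number of '%' conversions are formatted. The function name, the UTC time zone and the stamp layout are assumptions; passing the time in as a parameter keeps the output reproducible.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <ctime>
#include <string>
#include <vector>

// Return "<UTC date/time>  <formatted message>\n" for a printf-style format.
std::string stampLine(std::time_t when, const char* format, ...) {
    char stamp[32];
    std::strftime(stamp, sizeof stamp, "%Y-%m-%d %H:%M:%S UTC", std::gmtime(&when));

    va_list args;
    va_start(args, format);
    va_list sizing;
    va_copy(sizing, args);  // vsnprintf consumes a va_list, so measure with a copy
    const int needed = std::vsnprintf(nullptr, 0, format, sizing);
    va_end(sizing);

    std::vector<char> message(needed > 0 ? static_cast<std::size_t>(needed) + 1 : 1, '\0');
    std::vsnprintf(message.data(), message.size(), format, args);
    va_end(args);

    return std::string(stamp) + "  " + message.data() + "\n";
}
```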

      4. IMAGE FILES

            Bitstream image copies of suspect media are made using the standard utility dd(1). dd as an imaging tool has been extensively tested by the Computer Forensics Tool Testing project of the National Institute of Standards and Technology [NIJ2002]. The image is created by setting dd's input file (if=) to the device node of the suspect media and its output file (of=) to a regular file. Image file metadata is maintained by the fImageFile class.
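            At its core, a bitstream copy reads the source in fixed-size blocks and writes each block unchanged to the destination. The sketch below shows only that loop; it is a hypothetical illustration, not a substitute for dd(1), whose behavior (including on read errors) is what the cited testing covers. The default 4096-byte block size is an arbitrary choice.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Copy 'source' to 'dest' byte for byte, block by block; returns bytes copied.
std::size_t bitstreamCopy(const std::string& source, const std::string& dest,
                          std::size_t blockSize = 4096) {
    if (blockSize == 0) throw std::invalid_argument("blockSize must be positive");
    std::ifstream in(source, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open source: " + source);
    std::ofstream out(dest, std::ios::binary | std::ios::trunc);
    if (!out) throw std::runtime_error("cannot create image: " + dest);

    std::vector<char> block(blockSize);
    std::size_t total = 0;
    // read() fails on the final short block, but gcount() still reports the
    // bytes it delivered, so that partial block is written too.
    while (in.read(block.data(), static_cast<std::streamsize>(block.size())) || in.gcount() > 0) {
        const std::streamsize n = in.gcount();
        out.write(block.data(), n);
        total += static_cast<std::size_t>(n);
    }
    if (!out) throw std::runtime_error("write failed: " + dest);
    return total;
}
```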

            Once the disk image is extracted, sha1sum(1) is used to generate SHA-1 hash values for both the source device and the copied image. Identical SHA-1 hashes verify the correctness of the image copy: presently, the copy is accepted if the hash values match and rejected if they differ. Future enhancements would be to copy and hash individual blocks as well as entire devices, and to copy specific filesystem partitions from suspect devices. The SHA-1 hash algorithm is specified by the National Institute of Standards and Technology in [NIST1995] and is also published as RFC 3174. It accepts inputs of fewer than 2^64 bits (roughly 2 exabytes) and produces a 160-bit digest, giving 2^160 possible hash values. It is considered computationally infeasible to alter an input so that it produces the same hash value as the original.
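            The accept/reject decision can be sketched as a comparison of the digest fields of two sha1sum(1) output lines, which consist of 40 hexadecimal digits, whitespace, and the file name. The function names are hypothetical, and malformed output is deliberately treated as a failed verification rather than a match.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Return the 40-digit hex digest at the start of a sha1sum line, or "" if the
// line does not begin with one.
std::string digestField(const std::string& sha1sumLine) {
    const std::string hex = sha1sumLine.substr(0, sha1sumLine.find(' '));
    if (hex.size() != 40) return "";
    for (char c : hex)
        if (!std::isxdigit(static_cast<unsigned char>(c))) return "";
    return hex;
}

// Accept the image only if both lines carry a valid, identical digest.
bool imageVerified(const std::string& deviceLine, const std::string& imageLine) {
    const std::string a = digestField(deviceLine);
    const std::string b = digestField(imageLine);
    return !a.empty() && a == b;
}
```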

    6. DEVELOPMENT PROCESS COMMENTARY

          The computer science program during my lengthy undergraduate career required little work in software engineering or project management, so I cannot describe in those terms any particular "method" by which development proceeded. Indeed, it would more accurately be described as a "madness". The development process for this project is best described as postmodern [NB2002]. I had much to overcome in feelings of sheer dread and inadequacy in tackling my first "real" project of any significant complexity (in comparison to the toy programs assigned as homework, which in most cases I could hack out in no more than a night of work). Adding to the confusion was my scant GUI programming experience prior to this project. It took some time to find a combination of toolkit and language suited to the task at hand and with which I could work comfortably. Finally, despite the best efforts of employers and employer-hired training consultants, I have not been able to force upon myself the habit of time management. I estimate that I spent roughly 15% of the time on design, 15% on research, 20% on testing, 15% on documentation, 15% on infrastructure metawork, 10% learning tool usage, and perhaps 10% on actual coding. If I were to make a Gantt chart (and to be honest, I didn't know what one was until within the last year), it would be purely contrived. The CVS log will betray my bursts of productivity during the development process.

          Given a basic description of the needed features and conducting my own research on forensic software requirements, I identified those features which were necessary for the rudiments of a computer forensic software package. I identified system software which was either necessary or helpful to use, discussed design and implementation ideas with Mr. Kircher, located Qt classes and widgets to simplify the workload, searched for code online, debugged by compiler far too often, wasted time reading news sites, suffered anxiety attacks which blocked my ability to work, fielded interruptions to go fix a client's printer, and, somehow, ended up with a working program at the end of the cycle.

          Small pieces of functionality were crafted and debugged in a lab testing environment. When these were shown to be functional, coding would proceed on new functionality. This process of building the program resembled the way a maze is discovered in a game of Nethack. During the course of this process, loose ends in previously written functions would be tied up, and other desirable or undesirable features would be added or eliminated (these feature sets not being mutually exclusive). Due to the APD investigators' regular investigative workload beyond computer forensics, an opportunity for field testing has not yet arisen. The current state of the project has been demonstrated on a laptop, and the basic functionality was deemed acceptable.

          Further development on this project will occur once an opportunity for field testing arises and feedback is received. It is expected that future design and real-world testing will present many design and implementation issues which could not have been foreseen. Problems which are not apparent in reading the code will appear in re-reading, or re-re-reading, etc. Release as free software will enable many more developers to review the code allowing ample opportunity for enhancement, debugging and optimization. As well, outside code review will be useful as the program itself is inevitably challenged during the legal process.

    7. CONCLUSIONS

          I did gain real-world experience in design and programming in developing this project. I learned that the success or failure of a project has as much to do with the choice of tools and project as with the actual implementation itself. I believe the Ten software, documentation and presentation adequately satisfy the catalog requirements of CS470 to present a realistic system of moderate complexity. I believe my skills in research, software design and programming have been shown to be at the level expected of a technically competent computer science graduate. I believe I have applied technical, managerial, communications and interpersonal skills, where any exist, to this project as required by the current CS470 syllabus. I therefore propose that this project as delivered receive a passing grade.


    SUMMARY


        Ten is a package designed to assist computer forensic investigators in conducting inspections of GNU/Linux based systems. It is intended to be useful either to a trained computer forensic investigator with a limited knowledge of Linux command facilities, or to a UNIX/Linux expert with some knowledge of legal and forensic practices. The software will continue in development as it is used and tested by the Anchorage Police Department and the free software community. The software will serve as the basis for a customized OS distribution and forensic inspection workstation solution. This project fulfills, in the author's opinion, the requirements set forth by the UAA catalog description of the CS470 Applied Software Development Project.


    REFERENCES


    [KH2002]
    Kruse, W., Heiser, J.; Computer Forensics: Incident Response Essentials (Addison-Wesley, ISBN 0-201-70719-5).

    [NIST1995]
    National Institute of Standards and Technology. Secure Hash Algorithm. Federal Information Processing Standards Publication 180-1.

    [NIJ2002]
    National Institute of Justice. Test Results for Disk Imaging Tools: dd GNU fileutils 4.0.36, Provided with Red Hat Linux 7.1. NIJ Publication 196352.

    [NB2002]
    Noble, J., Biddle, R.; Notes on Postmodern Programming. Victoria University of Wellington, New Zealand. Technical Report CS-TR-02-9.


    Frederick J Polsky v1.0
    Last modified: Mon Nov 18 16:27:40 AKST 2002