Autopsy • 0xnhl

Autopsy is a premier open-source, graphical digital forensics platform that serves as the visual interface for The Sleuth Kit (TSK).
It is widely utilized by law enforcement, corporate incident response teams, and military investigators to analyze hard drives, smartphones, and memory dumps.
The Sleuth Kit is a C library and collection of open source command line tools for the forensic analysis of NTFS, FAT, EXT2FS, and FFS file systems.

Capabilities#

Automated Ingest Modules: These run in the background to extract user activity from web browsers, calculate file hashes, parse emails, and extract EXIF data from images.
Case Management: It allows investigators to organize data sources by hosts and persons, making it easier to manage large-scale multi-device investigations.
File System Analysis: It parses major file systems (NTFS, FAT, ExFAT, Ext2/3/4, HFS+) to locate hidden partitions, catalog files, and examine logical structures.
Timeline Analysis: The software generates advanced graphical event viewing interfaces, placing file modifications, web searches, and user activity into a chronological timeline to map out exactly what occurred on a system.
Autopsy is written in Java and stores case data in SQLite (single-user cases) or PostgreSQL (multi-user cases). Its architecture is highly modular: custom plugins can be written in Java or in Python (executed through Jython) to parse new file types or automate specific investigative tasks.
Usage
Windows: Autopsy is primarily distributed as a Windows installer and is designed to run efficiently on this platform.
Linux/UNIX: Autopsy and The Sleuth Kit are both open source and run on Linux and macOS, though Autopsy may require manual setup or a Java-based version on these systems.

Ingest Modules#

In Autopsy, Ingest Modules are background plug-ins that analyze the data in your image (disk, folder, etc.). They run in “pipelines,” meaning files are passed through these modules one by one (or in parallel threads) to extract metadata, text, and evidence.
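The pipeline idea can be illustrated with a minimal sketch. This is not Autopsy's actual module API: `FileRecord`, `size_module`, `empty_flag_module`, and `run_pipeline` are hypothetical names, and the sketch only shows each file being handed to every module in a fixed order, with later modules able to read what earlier ones recorded.

```python
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    """Hypothetical stand-in for a file pulled from a data source."""
    name: str
    data: bytes
    results: dict = field(default_factory=dict)


def size_module(record: FileRecord) -> None:
    # First module: record the file size.
    record.results["size"] = len(record.data)


def empty_flag_module(record: FileRecord) -> None:
    # Second module: relies on the result of the previous module.
    record.results["empty"] = record.results["size"] == 0


def run_pipeline(records, modules):
    """Pass every file through every module, in order."""
    for record in records:
        for module in modules:
            module(record)
    return records


files = run_pipeline(
    [FileRecord("notes.txt", b"abc"), FileRecord("blank.txt", b"")],
    [size_module, empty_flag_module],
)
```

Ordering matters in such a design: a module that depends on another module's output has to be placed after it.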

1. Recent Activity#

Description: Extracts user activity data, such as web browsing history, recent documents, and installed programs.
How It Works: This module relies heavily on specific file parsers and the RegRipper tool. It scans specific locations (like C:\Users\[User]\AppData or NTUSER.DAT) to parse:
- Registry Hives: Extracts USB device history, recent file lists (MRUs), and OS install dates.
- Browser History: Parses SQLite databases and index files from Chrome, Firefox, Edge, and Safari to reconstruct history, cookies, and downloads.
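As a rough illustration of the browser-history step, the sketch below reads a Chrome-style `urls` table and converts its timestamps. Chrome stores visit times as microseconds since 1601-01-01 UTC; the table here is a simplified, in-memory stand-in for a real History database, and `make_sample_db` and `read_history` are hypothetical helper names, not part of Autopsy.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Chrome/WebKit timestamps count microseconds from this epoch.
WEBKIT_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def webkit_to_datetime(microseconds: int) -> datetime:
    """Convert a WebKit timestamp to an aware UTC datetime."""
    return WEBKIT_EPOCH + timedelta(microseconds=microseconds)


def read_history(conn: sqlite3.Connection):
    """Return (url, title, visit time) tuples, oldest first."""
    rows = conn.execute(
        "SELECT url, title, last_visit_time FROM urls ORDER BY last_visit_time"
    )
    return [(url, title, webkit_to_datetime(ts)) for url, title, ts in rows]


def make_sample_db() -> sqlite3.Connection:
    """Build a tiny in-memory database with a simplified schema (assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE urls (url TEXT, title TEXT, last_visit_time INTEGER)")
    conn.execute(
        "INSERT INTO urls VALUES (?, ?, ?)",
        ("https://example.com/", "Example", 11644473600000000),
    )
    return conn


history = read_history(make_sample_db())
```

The sample timestamp `11644473600000000` corresponds to the Unix epoch, which makes the conversion easy to sanity-check.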

2. Hash Lookup#

Description: Identifies “Known” (safe/system) files and “Notable” (bad/evidence) files using hash values.
How It Works:
1. The module calculates the MD5 or SHA-256 hash of every file in the data source.
2. It compares this hash against configured databases (like the NIST NSRL for known software, or a custom “Bad Hash” set).
3. Result: If a match is found, the file is tagged. “Known” files can be hidden from view to reduce clutter, while “Notable” files alert the investigator immediately.
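The three steps above can be sketched in a few lines. This is a minimal illustration, not Autopsy's implementation: the hash sets are plain Python sets rather than NSRL or custom hash databases, and `classify` is a hypothetical helper name.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of the data as a lowercase hex string."""
    return hashlib.md5(data).hexdigest()


def classify(data: bytes, known_hashes: set, notable_hashes: set) -> str:
    """Tag a file as 'notable', 'known', or 'unknown' by hash lookup."""
    digest = md5_hex(data)
    if digest in notable_hashes:  # check the bad set first so alerts win
        return "notable"
    if digest in known_hashes:
        return "known"
    return "unknown"


# Illustrative hash sets built from sample content (assumption).
known = {md5_hex(b"standard system file")}
notable = {md5_hex(b"malicious payload")}

status = classify(b"malicious payload", known, notable)
```

Checking the notable set before the known set is a design choice in this sketch, so that a file present in both is still surfaced to the investigator.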

3. File Type Identification#

Description: Determines the true type of a file based on its binary signature (magic number), rather than its file extension.
How It Works: It uses the Apache Tika library. Instead of trusting that image.jpg is a JPEG, it reads the first few bytes of the file (the header) to verify the “magic bytes” (e.g., FF D8 for JPEG). It assigns a standard MIME type (e.g., image/jpeg) to the file in the database.
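A simplified version of signature-based detection is sketched below. It does not use Tika; the `SIGNATURES` table is a small hand-written sample of well-known magic numbers, and `sniff_mime` is a hypothetical helper name.

```python
# A few well-known file signatures and the MIME types they imply.
SIGNATURES = [
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"%PDF-", "application/pdf"),
    (b"PK\x03\x04", "application/zip"),
]


def sniff_mime(data: bytes, default: str = "application/octet-stream") -> str:
    """Guess a MIME type from the leading bytes, ignoring any file name."""
    for magic, mime in SIGNATURES:
        if data.startswith(magic):
            return mime
    return default


# A file named "notes.txt" would still be reported as a JPEG here,
# because only the content is inspected.
detected = sniff_mime(b"\xff\xd8\xff\xe0" + b"\x00" * 16)
```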

4. Extension Mismatch Detector#

Description: Flags files where the extension does not match the actual file type (e.g., a malware executable renamed to .txt to hide it).
How It Works: This module runs after the File Type Identification module. It compares the detected MIME type (from the previous module) against the file’s actual extension. If presentation.pdf is detected as application/x-executable, it flags it as a mismatch.
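The comparison can be sketched as a lookup from MIME type to the extensions considered acceptable for it. The `EXPECTED_EXTENSIONS` table below is illustrative only (an assumption, not Autopsy's actual configuration), and `is_mismatch` is a hypothetical helper that takes an already-detected MIME type as input.

```python
import os

# Illustrative mapping of MIME types to acceptable extensions (assumption).
EXPECTED_EXTENSIONS = {
    "image/jpeg": {".jpg", ".jpeg"},
    "application/pdf": {".pdf"},
    "application/x-executable": {"", ".bin"},
}


def is_mismatch(filename: str, detected_mime: str) -> bool:
    """Return True when the extension is not one expected for the MIME type."""
    extension = os.path.splitext(filename)[1].lower()
    allowed = EXPECTED_EXTENSIONS.get(detected_mime)
    if allowed is None:
        # No rule for this type, so the sketch does not flag it.
        return False
    return extension not in allowed


flagged = is_mismatch("presentation.pdf", "application/x-executable")
```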

5. Embedded File Extractor#

Description: Extracts files hidden inside other files, such as ZIP archives or images embedded in Word documents.
How It Works: It scans files to see if they are archive formats (ZIP, RAR, TAR, 7z) or compound documents (Doc, Docx, PPT). It unzips or otherwise extracts the internal contents and feeds those extracted files back into the beginning of the ingest pipeline so they can be analyzed by all other modules (hashed, keyword searched, etc.).
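The "feed extracted files back into the pipeline" behaviour can be sketched with a work queue. The example handles ZIP only, using Python's standard `zipfile` module; `make_zip` and `ingest` are hypothetical helpers, and real analysis of each file is replaced by simply recording its path.

```python
import io
import zipfile
from collections import deque


def make_zip(members: dict) -> bytes:
    """Build an in-memory ZIP archive from a {name: bytes} mapping."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, data in members.items():
            archive.writestr(name, data)
    return buffer.getvalue()


def ingest(name: str, data: bytes) -> list:
    """Process a file; anything extracted from an archive is queued again."""
    queue = deque([(name, data)])
    processed = []
    while queue:
        path, blob = queue.popleft()
        processed.append(path)  # stand-in for hashing, keyword search, etc.
        if zipfile.is_zipfile(io.BytesIO(blob)):
            with zipfile.ZipFile(io.BytesIO(blob)) as archive:
                for member in archive.namelist():
                    queue.append((f"{path}/{member}", archive.read(member)))
    return processed


inner = make_zip({"secret.txt": b"hidden"})
outer = make_zip({"readme.txt": b"hi", "inner.zip": inner})
seen = ingest("outer.zip", outer)
```

Because extracted members go back onto the same queue, a ZIP nested inside another ZIP is unpacked without any special handling.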

6. Picture Analyzer (formerly EXIF Parser)#

Description: Extracts metadata from image files, specifically focusing on geolocation and camera details.
How It Works: It parses the EXIF (Exchangeable Image File Format) headers in JPEG, TIFF, and other image formats. It extracts data such as the camera make and model, the date taken, and, most critically, GPS latitude and longitude coordinates.
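EXIF stores GPS coordinates as degrees, minutes, and seconds plus a hemisphere reference (N/S/E/W), so a conversion step is needed before the point can be plotted on a map. The sketch below shows only that conversion, assuming the values have already been read from the image; `dms_to_decimal` is a hypothetical helper name.

```python
def dms_to_decimal(degrees: float, minutes: float, seconds: float, ref: str) -> float:
    """Convert degrees/minutes/seconds and a hemisphere letter to decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # South latitudes and west longitudes are negative in decimal notation.
    return -value if ref.upper() in ("S", "W") else value


latitude = dms_to_decimal(40, 30, 0, "N")
longitude = dms_to_decimal(74, 0, 36, "W")
```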

7. Keyword Search#

Description: Indexes all text found in the image to allow for instant searching (Google-style) and runs automated searches for things like IP addresses, emails, and phone numbers.
How It Works:
1. Text Extraction: It uses Apache Tika to strip formatting from documents (PDF, Word, HTML) and extract raw text.
2. Indexing: This text is sent to an embedded Solr/Lucene server, which builds a searchable index.
3. Searching: It automatically runs “Ingest Lists” (preset lists of keywords or Regular Expressions) against the index to flag email addresses, URLs, or credit card numbers.
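The index-then-search flow can be sketched without Solr. The example below builds a tiny in-memory inverted index and runs two simplified regular expressions over already-extracted text; the patterns, `build_index`, and `search` are illustrative assumptions and are much less thorough than a production keyword list.

```python
import re
from collections import defaultdict

# Simplified patterns for illustration; real ingest lists are stricter.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def build_index(documents: dict) -> dict:
    """Map each lowercase token to the set of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            index[token].add(doc_id)
    return index


def search(index: dict, term: str) -> list:
    """Return the sorted documents containing the term (case-insensitive)."""
    return sorted(index.get(term.lower(), set()))


docs = {
    "a.txt": "Contact alice@example.com about the invoice",
    "b.log": "Login from 192.168.1.10 for invoice upload",
}
index = build_index(docs)
hits = search(index, "Invoice")
```

Building the index once is what makes later searches fast: a lookup touches only the entry for the term rather than rescanning every document.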

8. Email Parser#

Description: Identifies and extracts individual emails and attachments from email archives.
How It Works: It looks for archive files like MBOX, PST (Outlook), and OST. It parses the internal structure of these containers, extracts individual messages as separate artifacts, and extracts attachments (sending the attachments back through the pipeline for analysis).
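Splitting a message into headers, body, and attachments can be sketched with Python's standard `email` package. This handles a single RFC 5322 message only; PST and OST are proprietary container formats that the standard library does not read. `build_sample` and `extract` are hypothetical helper names.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def build_sample() -> bytes:
    """Create a small message with one binary attachment for demonstration."""
    msg = EmailMessage()
    msg["From"] = "alice@example.com"
    msg["To"] = "bob@example.com"
    msg["Subject"] = "Report"
    msg.set_content("See attached.")
    msg.add_attachment(
        b"\x00\x01payload",
        maintype="application",
        subtype="octet-stream",
        filename="data.bin",
    )
    return msg.as_bytes()


def extract(raw: bytes) -> dict:
    """Parse raw message bytes into headers, body text, and attachments."""
    msg = message_from_bytes(raw, policy=policy.default)
    body = msg.get_body(preferencelist=("plain",))
    return {
        "from": str(msg["From"]),
        "subject": str(msg["Subject"]),
        "body": body.get_content().strip() if body else "",
        # Each attachment could be queued for further analysis.
        "attachments": [
            (part.get_filename(), part.get_content())
            for part in msg.iter_attachments()
        ],
    }


parsed = extract(build_sample())
```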

9. Encryption Detection#

Description: Flags files that are password-protected or encrypted.
How It Works: It checks files for two criteria:
1. Entropy: It calculates the entropy (randomness) of the file data. Encrypted files have very high entropy (near-random noise).
2. Encryption Headers: It looks for specific file headers associated with BitLocker, TrueCrypt, VeraCrypt, or password-protected Office/PDF documents.
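The entropy criterion can be made concrete with Shannon entropy measured in bits per byte, which ranges from 0 (one repeated byte value) to 8 (all 256 byte values equally likely). The sketch below is a minimal illustration; the `7.5` threshold and the name `looks_encrypted` are assumptions, not Autopsy's actual settings.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of the data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_encrypted(data: bytes, threshold: float = 7.5) -> bool:
    """Flag data whose entropy exceeds an illustrative threshold (assumption)."""
    return shannon_entropy(data) > threshold


uniform = bytes(range(256)) * 4   # every byte value equally often
repetitive = b"A" * 1024          # a single repeated byte value
```

High entropy alone is not proof of encryption, since compressed data also scores close to 8 bits per byte; that is one reason to combine it with a header check.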

10. Interesting Files Identifier#

Description: A rules-based module that alerts you to files or directories that match a specific name or path.
How It Works: You configure “Interesting File Sets” (e.g., “Cloud Storage Apps”). The module checks every file path against these rules. If it sees Dropbox.exe or a folder named Bitcoin, it creates an alert in the “Interesting Items” section of the dashboard.
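Rule matching of this kind can be sketched as a comparison of each path component against configured names. The rule-set structure below (`file_names`, `dir_names`) is a hypothetical format chosen for the example and does not mirror Autopsy's configuration files.

```python
from pathlib import PureWindowsPath

# Hypothetical rule sets: names are compared case-insensitively.
RULE_SETS = {
    "Cloud Storage Apps": {"file_names": {"dropbox.exe"}},
    "Cryptocurrency": {"dir_names": {"bitcoin"}},
}


def match_rules(path: str, rule_sets: dict) -> list:
    """Return the names of the rule sets that the path triggers."""
    parts = [part.lower() for part in PureWindowsPath(path).parts]
    if not parts:
        return []
    file_name, directories = parts[-1], parts[:-1]
    hits = []
    for set_name, rules in rule_sets.items():
        name_hit = file_name in rules.get("file_names", set())
        dir_hit = any(d in rules.get("dir_names", set()) for d in directories)
        if name_hit or dir_hit:
            hits.append(set_name)
    return hits


alerts = match_rules(r"C:\Users\bob\AppData\Dropbox.exe", RULE_SETS)
```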

11. Central Repository (Correlation Engine)#

Description: Correlates data across different cases.
How It Works: It stores hashes and identifiers in a central database (SQLite or PostgreSQL) distinct from the current case database. If you see a file in “Case A,” and later that same file hash appears in “Case B,” this module alerts you that the artifact was “Previously Seen,” helping link suspects or devices.
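The cross-case lookup can be sketched with a single SQLite table shared between cases. The schema and the helper names `open_repository` and `record_and_correlate` are assumptions for illustration; Autopsy's real Central Repository schema is considerably richer.

```python
import sqlite3


def open_repository() -> sqlite3.Connection:
    """Create an in-memory stand-in for the shared correlation database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE seen (md5 TEXT, case_name TEXT, path TEXT)")
    return conn


def record_and_correlate(conn, case_name: str, path: str, md5: str) -> list:
    """Store the artifact and return other cases where the hash already appeared."""
    previous = [
        row[0]
        for row in conn.execute(
            "SELECT DISTINCT case_name FROM seen "
            "WHERE md5 = ? AND case_name != ? ORDER BY case_name",
            (md5, case_name),
        )
    ]
    conn.execute("INSERT INTO seen VALUES (?, ?, ?)", (md5, case_name, path))
    return previous


repo = open_repository()
first = record_and_correlate(repo, "Case A", "/evidence/x.bin", "abc123")
second = record_and_correlate(repo, "Case B", "/evidence/y.bin", "abc123")
```

Here `second` lists "Case A", which corresponds to the "Previously Seen" alert described above.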

12. PhotoRec Carver#

Description: Recovers deleted files from unallocated space (space on the drive not currently used by the active file system).
How It Works: It runs the separate open-source tool PhotoRec in the background. PhotoRec ignores the file system and looks at the raw data blocks, searching for headers and footers of known file types (like JPEGs or Docs). If it finds a contiguous block of data matching a file signature, it carves it out and adds it to the case.
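Header/footer carving can be illustrated on a raw byte buffer. The sketch below searches for the JPEG start marker (`FF D8 FF`) and end marker (`FF D9`) and cuts out everything in between; it is a deliberately naive illustration of the technique, and PhotoRec itself performs far more validation. `carve_jpegs` is a hypothetical helper name.

```python
JPEG_HEADER = b"\xff\xd8\xff"
JPEG_FOOTER = b"\xff\xd9"


def carve_jpegs(raw: bytes) -> list:
    """Return (offset, data) pairs for header-to-footer spans in the buffer."""
    carved = []
    position = 0
    while True:
        start = raw.find(JPEG_HEADER, position)
        if start == -1:
            break
        end = raw.find(JPEG_FOOTER, start + len(JPEG_HEADER))
        if end == -1:
            break  # header without a footer: give up in this sketch
        end += len(JPEG_FOOTER)
        carved.append((start, raw[start:end]))
        position = end  # continue scanning after the carved span
    return carved


# Simulated unallocated space containing two fake "images".
fake_image = JPEG_HEADER + b"\xe0fakeimagedata" + JPEG_FOOTER
unallocated = b"\x00" * 16 + fake_image + b"\x00" * 8 + fake_image + b"\x11" * 4
recovered = carve_jpegs(unallocated)
```

This approach only works when a file's blocks are contiguous, which matches the limitation noted above.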

13. Android Analyzer (aLEAPP)#

Description: Specialized parsing for Android device dumps.
How It Works: It looks for common Android structures (SQLite DBs, XML files) usually found in data/data or SD card backups. It extracts call logs, SMS, contacts, and Wi-Fi profiles. (Note: Newer versions often integrate aLEAPP, a Python-based parser, for this).
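Reading messages from an Android-style SQLite database might look like the sketch below. The `sms` table with `address`, `date`, and `body` columns is a simplified assumption about the layout, with `date` assumed to hold milliseconds since the Unix epoch; `make_sample_db` and `read_sms` are hypothetical helpers, and this is not how aLEAPP is implemented.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def millis_to_datetime(milliseconds: int) -> datetime:
    """Convert milliseconds since the Unix epoch to an aware UTC datetime."""
    return UNIX_EPOCH + timedelta(milliseconds=milliseconds)


def read_sms(conn: sqlite3.Connection) -> list:
    """Return (address, sent time, body) tuples, oldest first."""
    rows = conn.execute("SELECT address, date, body FROM sms ORDER BY date")
    return [(address, millis_to_datetime(ts), body) for address, ts, body in rows]


def make_sample_db() -> sqlite3.Connection:
    """Build an in-memory database with a simplified schema (assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sms (address TEXT, date INTEGER, body TEXT)")
    conn.execute("INSERT INTO sms VALUES (?, ?, ?)", ("+15550100", 86400000, "hello"))
    return conn


messages = read_sms(make_sample_db())
```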

14. iOS Analyzer (iLEAPP)#

Description: Specialized parsing for iOS (iPhone/iPad) device dumps.
How It Works: Similar to the Android analyzer, it looks for Property Lists (PLISTs) and SQLite databases specific to iOS (like the sms.db or CallHistory.storedata) to extract user data.
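Two building blocks for this kind of parsing are reading property lists and converting Apple timestamps, many of which count seconds from 2001-01-01 UTC. The sketch below uses Python's standard `plistlib`; the plist keys `DeviceName` and `LastBackup` are invented for the example, and this is not how iLEAPP is implemented.

```python
import plistlib
from datetime import datetime, timedelta, timezone

# Reference date used by many Apple timestamp fields.
APPLE_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)


def apple_to_datetime(seconds: float) -> datetime:
    """Convert seconds since 2001-01-01 UTC to an aware UTC datetime."""
    return APPLE_EPOCH + timedelta(seconds=seconds)


def parse_plist(raw: bytes) -> dict:
    """Parse an XML or binary property list into a Python dictionary."""
    return plistlib.loads(raw)


# Build a sample binary plist in memory; the keys are hypothetical.
sample = plistlib.dumps(
    {"DeviceName": "Example iPhone", "LastBackup": 86400},
    fmt=plistlib.FMT_BINARY,
)
settings = parse_plist(sample)
last_backup = apple_to_datetime(settings["LastBackup"])
```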

15. E01 Verifier#

Description: Verifies the integrity of the disk image if it is in the E01 (EnCase) format.
How It Works: It recalculates the hash of the image data as it is being read from the E01 file and compares it to the hash stored inside the E01 file at acquisition time. If they differ, the image has been corrupted or altered since it was acquired.
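The verify-while-reading idea can be sketched as chunked hashing of a stream followed by a comparison with the stored value. The example does not parse the E01 container (that would need a dedicated library); it assumes the data stream and the expected MD5 are already available, and `verify_stream` is a hypothetical helper name.

```python
import hashlib
import io


def verify_stream(stream, expected_md5: str, chunk_size: int = 1 << 20) -> bool:
    """Hash the stream chunk by chunk and compare with the stored digest."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest() == expected_md5.lower()


# Simulated image data and the hash recorded at acquisition time.
image_data = b"disk image bytes" * 1000
stored_hash = hashlib.md5(image_data).hexdigest()

intact = verify_stream(io.BytesIO(image_data), stored_hash, chunk_size=4096)
tampered = verify_stream(io.BytesIO(image_data + b"!"), stored_hash, chunk_size=4096)
```

Hashing in chunks keeps memory use constant, which matters when the image is hundreds of gigabytes.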

16. Drone Analyzer#

Description: Analyzes data from DJI drones.
How It Works: It parses specific DAT and TXT flight log files generated by DJI drones to extract flight paths (GPS), timestamps, and pilot inputs.

17. Plaso (Timeline)#

Description: Creates a “Super Timeline” of all events.
How It Works: This module runs the Python-based Plaso (log2timeline) tool. It parses virtually every file with a timestamp (logs, file system metadata, browser history, etc.) and outputs a single chronological list of every event that happened on the system.
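The core of a super timeline is merging events from unrelated sources into one list ordered by time. The sketch below shows only that merge step, with hand-written sample events; it does not call Plaso, and `build_timeline` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def utc(year, month, day, hour, minute):
    """Small helper to build aware UTC datetimes for the sample events."""
    return datetime(year, month, day, hour, minute, tzinfo=timezone.utc)


def build_timeline(*sources):
    """Merge (timestamp, source, description) events into chronological order."""
    events = [event for source in sources for event in source]
    return sorted(events, key=lambda event: event[0])


# Illustrative events from three different artifact types.
file_events = [(utc(2024, 1, 5, 10, 30), "filesystem", "report.docx modified")]
browser_events = [(utc(2024, 1, 5, 10, 12), "browser", "visited example.com")]
log_events = [(utc(2024, 1, 5, 10, 45), "eventlog", "USB device connected")]

timeline = build_timeline(file_events, browser_events, log_events)
```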