Recovery of Deleted Data and Associated Metadata from XFS and Btrfs Filesystems
Abstract
Background: Digital evidence has become increasingly crucial in forensic investigations. The recovery of deleted data from storage devices is essential for reconstructing timelines, identifying suspects, and uncovering critical information. Traditional file systems such as FAT and NTFS have been studied extensively, and tools for recovering deleted data from them are relatively mature. However, modern file systems such as XFS and Btrfs, designed for performance and reliability, employ complex data structures that pose significant challenges for data recovery. Forensic investigations often involve recovering various file types, including document files, log files, and system files. These files contain valuable information about system activities, user behaviour, and potential criminal activities. The ability to recover deleted files along with their complete metadata, such as creation, access, modification, and deletion timestamps, is crucial for establishing timelines and corroborating evidence.
Detailed Description: XFS and Btrfs file systems offer advanced features such as journaling, copy-on-write, and efficient data allocation. While these features enhance system performance and data integrity, they also complicate the recovery of deleted data. When a file is deleted in these file systems, the data itself is not immediately erased; instead, the file system marks the allocated blocks as free for reuse. This delayed overwriting presents an opportunity for recovery, but specialized techniques are required to extract and analyse the deleted data. Recovering accurate metadata associated with deleted files is equally challenging. Metadata is critical for establishing the context of the recovered data and determining its relevance to the investigation. Extracting metadata from XFS and Btrfs file systems requires a deep understanding of their internal structures and data allocation mechanisms.
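As a first step in working with these internal structures, a recovery tool has to decide which file system an image contains. The sketch below is a minimal illustration, not a complete detector: it checks the XFS superblock magic ("XFSB" at byte 0) and the Btrfs superblock magic ("_BHRfS_M" at offset 0x40 of the primary superblock, which starts at 64 KiB). The function name detect_filesystem is hypothetical, and the image is assumed to be a raw volume (not a whole disk with a partition table) already loaded into memory.

```python
XFS_MAGIC = b"XFSB"                    # first 4 bytes of the XFS primary superblock
BTRFS_MAGIC = b"_BHRfS_M"              # 8-byte magic inside the Btrfs superblock
BTRFS_MAGIC_OFFSET = 0x10000 + 0x40    # primary superblock at 64 KiB, magic at +0x40


def detect_filesystem(image: bytes) -> str:
    """Return 'xfs', 'btrfs', or 'unknown' for a raw volume image (hypothetical helper)."""
    if image[:4] == XFS_MAGIC:
        return "xfs"
    end = BTRFS_MAGIC_OFFSET + len(BTRFS_MAGIC)
    if image[BTRFS_MAGIC_OFFSET:end] == BTRFS_MAGIC:
        return "btrfs"
    return "unknown"
```

A real tool would read only the required sectors from the evidence image rather than loading it whole, and would go on to validate the superblock checksum before trusting any other field.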
Expected Solution: An ideal solution would be a comprehensive data recovery technique specifically designed for XFS and Btrfs file systems. The technique should be able to:
1. Efficiently recover deleted data: Develop algorithms and techniques to identify and extract deleted files from the complex data structures of XFS and Btrfs.
2. Support a wide range of file types: Recover text-based document formats (doc, docx, rtf, pdf, txt, odt, html, xml, ppt, odp, xls, ods, log, csv, tsv, conf, ini, cfg, etc.), archive files (zip, tar, rar, iso, rpm, deb, etc.), image-based document formats (jpg, jpeg, png, gif, tif, etc.), executable binaries (elf, so, a, exe, dll, bat, cmd), script files (ps, ps1, sh, bash, zsh, py, etc.), database files (db, etc.), and other relevant data formats.
3. Extract complete metadata: Recover accurate creation, access, modification, and deletion timestamps, file names, and other essential metadata associated with deleted files.
4. Provide a user-friendly interface: Offer an intuitive interface (GUI/CLI) for easily navigating recovered data and generating reports.
5. Ensure data integrity: Implement robust data validation and verification mechanisms to maintain the integrity of recovered data.
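Requirements 2 and 5 can be illustrated together with a small sketch: recovered content is classified by its leading signature bytes and fingerprinted with SHA-256 so that later copies can be verified against the hash. The signature table covers only a handful of the listed formats, and the names identify_type and describe_recovered are hypothetical, introduced for illustration only.

```python
import hashlib

# Leading-byte signatures for a few of the formats listed above.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"PK\x03\x04", "zip"),            # also the container for docx, odt, ods, odp
    (b"\xff\xd8\xff", "jpg"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"GIF8", "gif"),
    (b"\x7fELF", "elf"),
    (b"SQLite format 3\x00", "sqlite"),
]


def identify_type(data: bytes) -> str:
    """Return a format label based on the first bytes of the recovered content."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return "unknown"


def describe_recovered(data: bytes) -> dict:
    """Summarise a recovered file: detected type, size, and SHA-256 digest."""
    return {
        "type": identify_type(data),
        "size": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
    }
```

Plain-text formats such as txt, csv, log, and conf carry no signature, so they would need a different heuristic (for example, checking that the content decodes as text) or file-name metadata recovered from the file system itself.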
Existing System
To compensate for the aforementioned case, a file recovery method using the journal area of the XFS file system has been examined [8]. A deleted file is recovered by exploiting the fact that information such as its size and offset remains in the journal area of the XFS file system. However, the method is very difficult to apply because the experimental environment, which can affect the file system metadata, is not described in detail, and the current XFS on-disk format (v5) is newer than the version that was tested. UFS Explorer, a commercial tool, has been found unable to recover files smaller than 1 KB, even though it supports the recovery of files of 3 GB or more [9]. Accordingly, file system forensic researchers have taken advantage of the open-source nature of The Sleuth Kit (TSK) and conducted studies to overcome its limitations. In 2017 and 2018, the data structures of ZFS and Btrfs were analyzed, and a data extraction tool based on TSK was implemented [12,13]. Both papers proposed a data extraction model based on pool storage, which was also used in this work to extract data. However, that model did not properly follow the development framework of TSK and employed only certain of its functions, resulting in insufficient extensibility.
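To make the journal-based approach more concrete, the sketch below scans a buffer holding the XFS log area for log record headers. It relies on two details of the on-disk log format: records begin on 512-byte boundaries, and each record header starts with the big-endian magic number 0xFEEDBABE followed by the cycle, version, and length fields. This is only a hedged illustration and not the method of [8]; the function name find_log_records is hypothetical, and locating the log area within the image is assumed to have been done already.

```python
import struct

XLOG_HEADER_MAGIC = 0xFEEDBABE   # XLOG_HEADER_MAGIC_NUM in the XFS sources
BASIC_BLOCK = 512                # log records start on 512-byte boundaries


def find_log_records(log_area: bytes):
    """Yield (offset, cycle, version, length) for each candidate record header."""
    # Stop early enough that the four 32-bit fields always fit in the buffer.
    for off in range(0, len(log_area) - 15, BASIC_BLOCK):
        magic, cycle, version, length = struct.unpack_from(">IIII", log_area, off)
        if magic == XLOG_HEADER_MAGIC:
            yield off, cycle, version, length
```

Each hit is only a candidate: the log items that follow a header still have to be decoded before any size or offset information for a deleted file can be extracted.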
Disadvantages
Limited Metadata Recovery:
Metadata Complexity: XFS uses complex metadata structures, including inodes, allocation groups, and the log (journal). Recovering metadata accurately can be challenging.
Reliance on the Journal: XFS maintains a log (journal) of file operations that can be useful for recovery, but if the log has been overwritten or corrupted, recovery becomes considerably harder.
Complex Snapshot Mechanism:
Snapshot Management: Btrfs supports snapshots. Snapshots may retain deleted files or metadata, but managing them and recovering data from them can be complex.
Recovery Success Rate:
Partial Recovery: Depending on how much data has been overwritten and how the file system was used after deletion, recovery may be only partial, leaving gaps or corruption in the restored files.
Proposed System
File extraction and recovery tools developed for specific versions of a file system may not work correctly once the file system version has been updated, because the update changes the metadata structures on which extraction and recovery rely. In particular, TSK, a well-known file system forensic tool, does not consider the journal area in the Ext4 file system and has no function for recovering deleted files. In the case of the XFS file system, file extraction and recovery are not possible because TSK does not support it. This section presents an analysis of the Ext4 file system journal checksum v3 and the XFS file system v5 to identify the modified metadata structures, and proposes a file extraction and recovery framework based on TSK. TSK is well known to users as an open-source digital forensics toolkit and is easily extensible. The TSK-based framework architecture for recovering deleted files from an Ext4 file system and extracting files from an XFS file system is shown in Figure 1. The proposed framework operates on top of the file extraction and recovery command in TSK (i.e., tsk_recover) and does not affect other TSK functions.
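As an illustration of the kind of metadata structure that differs between XFS versions, the sketch below decodes a few fields of an on-disk XFS inode core. The byte offsets used here (timestamps at 32, 40, and 48, size at 56, and, for version 3 inodes used by XFS v5, creation time at 144 and inode number at 152) are taken from the published on-disk format and should be verified against the kernel headers before use. The sketch assumes the legacy timestamp encoding (32-bit seconds plus 32-bit nanoseconds, without the bigtime feature), and the function name parse_inode_core is hypothetical; it is not part of TSK or of the proposed framework.

```python
import struct
from datetime import datetime, timedelta, timezone

XFS_DINODE_MAGIC = 0x494E   # ASCII "IN"
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def _ts(buf: bytes, off: int) -> datetime:
    # Assumed legacy encoding: big-endian signed seconds, then nanoseconds.
    sec, nsec = struct.unpack_from(">iI", buf, off)
    return EPOCH + timedelta(seconds=sec, microseconds=nsec // 1000)


def parse_inode_core(buf: bytes, off: int = 0):
    """Decode selected inode core fields, or return None if no inode is found."""
    if len(buf) - off < 100:
        return None                      # too short for any inode core
    magic, mode, version = struct.unpack_from(">HHb", buf, off)
    if magic != XFS_DINODE_MAGIC:
        return None
    info = {
        "version": version,
        "mode": mode,
        "size": struct.unpack_from(">Q", buf, off + 56)[0],
        "atime": _ts(buf, off + 32),
        "mtime": _ts(buf, off + 40),
        "ctime": _ts(buf, off + 48),
    }
    # Version 3 inodes (XFS v5) extend the core to 176 bytes.
    if version >= 3 and len(buf) - off >= 176:
        info["crtime"] = _ts(buf, off + 144)
        info["ino"] = struct.unpack_from(">Q", buf, off + 152)[0]
    return info
```

Fields of a freed inode may have been cleared or reused by the file system, so decoded values are best treated as candidates to be corroborated with journal contents rather than as confirmed metadata.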
Advantages
Journaling and Logging:
Transactional Integrity: XFS uses a journaling mechanism that logs metadata changes before they are committed. This can be advantageous for recovery because the journal can provide a record of operations leading up to and beyond the point of deletion, potentially aiding in reconstructing lost files or metadata.
Snapshot and Subvolume Features:
Snapshot Recovery: Btrfs supports snapshots, which retain a consistent view of the file system at a particular point in time. This can be advantageous for recovery because a snapshot may still contain the deleted files or metadata, allowing straightforward restoration from the snapshot.
Advanced Features:
File System Design: Both XFS and Btrfs offer advanced features (such as journaling in XFS and snapshots in Btrfs) that can be leveraged during recovery to improve the chances of successfully restoring deleted data.
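Btrfs snapshots rest on copy-on-write: updated tree blocks are written to new locations, and every committed transaction carries a generation number that is recorded in the superblock. As a small, hedged illustration of where this state is anchored on disk, the sketch below reads the generation and root tree address from the primary superblock (64 KiB) and its mirrors (64 MiB and 256 GiB), then selects the newest copy. The function names are hypothetical, the image is assumed to be available as a bytes-like object, and checksum validation is omitted.

```python
import struct

BTRFS_MAGIC = b"_BHRfS_M"
# Byte offsets of the primary superblock and its two mirrors.
SUPERBLOCK_OFFSETS = (0x10000, 0x4000000, 0x4000000000)


def read_superblocks(image: bytes):
    """Return [(offset, generation, root_tree_addr)] for each superblock copy found."""
    found = []
    for off in SUPERBLOCK_OFFSETS:
        if off + 0x58 > len(image):
            continue                     # this mirror lies beyond the end of the image
        if image[off + 0x40:off + 0x48] != BTRFS_MAGIC:
            continue
        # Little-endian generation (+0x48) and root tree logical address (+0x50).
        generation, root = struct.unpack_from("<QQ", image, off + 0x48)
        found.append((off, generation, root))
    return found


def newest_superblock(image: bytes):
    """Pick the superblock copy with the highest generation, or None."""
    copies = read_superblocks(image)
    return max(copies, key=lambda c: c[1]) if copies else None
```

The root address is a logical address that still has to be mapped through the chunk tree before the tree of tree roots, and from it the snapshot and subvolume roots, can be read.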
