
How pagefile.sys forensics actually works

5/20/2026 · 3 min read

There is no documented binary format for pagefile.sys. Every tool that analyzes it — page_brute, bulk_extractor, X-Ways, Magnet Axiom, this one — falls back to the same family of techniques.

1. Per-page signature carving

The file is split into 4 KB pages and each page is matched against a catalog of file-format signatures. The most useful magic bytes for a pagefile:

Signature                   What it tells you
MZ + PE\0\0 at e_lfanew     An executable image was paged out
regf                        Registry hive base block (SECURITY, SAM, NTUSER.DAT)
hbin                        Registry hive bin, a chunk of a hive (4 KB or a multiple of it)
FILE / BAAD                 NTFS MFT record (FILE intact, BAAD damaged)
SQLite format 3             SQLite database header (browser history, mail clients)
<?xml                       XML; often Windows event log fragments, manifests, configs
ElfFile / ElfChnk           EVTX event log file header / chunk
SCCA (at offset 4)          Prefetch file
%PDF                        PDF document
PK\x03\x04                  ZIP container (also DOCX, XLSX)
\x89PNG / \xFF\xD8\xFF      PNG / JPEG

A page that matches a signature usually represents the start of a much larger artifact — but the rest of that artifact lives elsewhere in RAM (and may have been paged out to non-adjacent slots, or not paged out at all). So a single hit is a strong forensic lead, not a recoverable file.
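A minimal Python sketch of the carving loop, assuming 4 KB pages and checking each signature only at its expected offset from the start of a page. The names `SIGNATURES`, `classify_page` and `carve` are hypothetical, not the API of any tool listed above, and the catalog is a subset of the table. The PE check follows the `e_lfanew` pointer at 0x3C instead of trusting the two bytes `MZ` alone, which match by chance fairly often.

```python
import struct
from typing import BinaryIO, Iterator, Optional, Tuple

PAGE_SIZE = 4096  # assumption: 4 KB pages, as described in the text

# (label, offset of the magic inside the page, magic bytes); a subset of the table
SIGNATURES = [
    ("registry hive base block", 0, b"regf"),
    ("registry hive bin", 0, b"hbin"),
    ("NTFS MFT record", 0, b"FILE"),
    ("NTFS MFT record (damaged)", 0, b"BAAD"),
    ("SQLite database", 0, b"SQLite format 3\x00"),
    ("EVTX file header", 0, b"ElfFile\x00"),
    ("EVTX chunk", 0, b"ElfChnk\x00"),
    ("Prefetch file", 4, b"SCCA"),  # a 4-byte version field precedes SCCA
    ("PDF", 0, b"%PDF"),
    ("ZIP / OOXML", 0, b"PK\x03\x04"),
    ("PNG", 0, b"\x89PNG\r\n\x1a\n"),
    ("JPEG", 0, b"\xff\xd8\xff"),
    ("XML", 0, b"<?xml"),
]


def is_pe_header(page: bytes) -> bool:
    """True if the page starts with MZ and e_lfanew (at 0x3C) points at PE\\0\\0."""
    if len(page) < 0x40 or page[:2] != b"MZ":
        return False
    (e_lfanew,) = struct.unpack_from("<I", page, 0x3C)
    return e_lfanew + 4 <= len(page) and page[e_lfanew:e_lfanew + 4] == b"PE\x00\x00"


def classify_page(page: bytes) -> Optional[str]:
    """Return the label of the first matching signature, or None."""
    if is_pe_header(page):
        return "PE executable image"
    for label, offset, magic in SIGNATURES:
        if page[offset:offset + len(magic)] == magic:
            return label
    return None


def carve(f: BinaryIO) -> Iterator[Tuple[int, str]]:
    """Yield (absolute file offset, label) for every page that matches a signature."""
    page_index = 0
    while True:
        page = f.read(PAGE_SIZE)
        if not page:  # EOF
            break
        label = classify_page(page)
        if label is not None:
            yield page_index * PAGE_SIZE, label
        page_index += 1
```

On an acquired copy you would open the file with `open("pagefile.sys", "rb")` and iterate `carve(f)`. Each hit is only a lead, for the reason given above.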

2. String extraction

Most of the value in a pagefile is in its strings: fragments of text that software was working with when its memory got swapped out. Conceptually, extraction works like the GNU strings tool:

  • ASCII: contiguous runs of printable bytes (0x20–0x7E plus tab) of length ≥ 6.
  • UTF-16LE: contiguous runs of (printable, 0x00) pairs. Windows is Unicode-first internally, so the bulk of useful text is here.

Each emitted string is paired with its absolute file offset so you can correlate it with the page it came from.
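Both extraction rules can be sketched as byte-level regular expressions. This sketch assumes the whole input fits in memory; `extract_strings` and `MIN_LEN` are illustrative names. A real multi-gigabyte pagefile would be scanned in overlapping chunks, adding each chunk's base offset to the reported positions.

```python
import re
from typing import Iterator, Tuple

MIN_LEN = 6  # minimum run length, in characters

# ASCII: printable bytes 0x20-0x7E plus tab (0x09)
ASCII_RE = re.compile(rb"[\x09\x20-\x7e]{%d,}" % MIN_LEN)
# UTF-16LE: the same printable range, each byte followed by 0x00
UTF16_RE = re.compile(rb"(?:[\x09\x20-\x7e]\x00){%d,}" % MIN_LEN)


def extract_strings(data: bytes) -> Iterator[Tuple[int, str, str]]:
    """Yield (absolute offset, encoding, text) for every qualifying run."""
    for m in ASCII_RE.finditer(data):
        yield m.start(), "ascii", m.group().decode("ascii")
    for m in UTF16_RE.finditer(data):
        yield m.start(), "utf-16le", m.group().decode("utf-16le")
```

Because every other byte of UTF-16LE text is 0x00, the ASCII pattern never sees a run of 6 and the two passes do not double-report the same text.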

3. Regex artifact sweeps

Strings then feed a fixed catalog of regexes:

  • URLs: https?://…
  • E-mails, IPv4, IPv6
  • Windows paths (C:\…) and UNC paths (\\server\share\…)
  • Registry keys (HKLM\…, HKCU\…)
  • GUIDs (common around MSI installers, COM registrations, and event tracing)
  • Command-line indicators: cmd.exe, powershell.exe, mshta.exe, rundll32.exe, certutil.exe
  • Credentials: password=, Authorization: Bearer …, JWT-shaped tokens

These are the patterns that have historically yielded most of the actionable intelligence in pagefile triage.
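A hypothetical version of such a catalog is sketched below. These regexes are deliberately simple approximations written for this example, not the catalog of any tool named above. IPv6 is left out because a correct pattern is long, and a production sweep would add validation to cut false positives.

```python
import re
from typing import List, Tuple

# Illustrative patterns only (assumptions, not a tool's real catalog)
PATTERNS = {
    "url": re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE),
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ipv4": re.compile(
        r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
    ),
    "windows_path": re.compile(r"\b[A-Za-z]:\\[^\s\"'<>|*?]+"),
    "unc_path": re.compile(r"\\\\[\w.$-]+\\[^\s\"'<>|*?]+"),
    "registry_key": re.compile(
        r"\b(?:HKLM|HKCU|HKEY_LOCAL_MACHINE|HKEY_CURRENT_USER)\\[^\s\"']+"
    ),
    "guid": re.compile(r"\{?[0-9A-Fa-f]{8}-(?:[0-9A-Fa-f]{4}-){3}[0-9A-Fa-f]{12}\}?"),
    "lolbin": re.compile(r"\b(?:cmd|powershell|mshta|rundll32|certutil)\.exe\b", re.IGNORECASE),
    "password_kv": re.compile(r"password=[^\s&;\"']+", re.IGNORECASE),
    "bearer": re.compile(r"Authorization:\s*Bearer\s+\S+", re.IGNORECASE),
    # JWTs are three base64url segments; the header segment usually starts "eyJ"
    "jwt": re.compile(r"\beyJ[\w-]+\.[\w-]+\.[\w-]+"),
}


def sweep(text: str) -> List[Tuple[str, str]]:
    """Run every pattern over one extracted string; return (pattern name, match)."""
    hits = []
    for name, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            hits.append((name, m.group()))
    return hits
```

In practice this runs over each string from the extraction step, keeping the string's file offset alongside every hit.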

4. Statistical fallback

For pages that match no signature, basic statistics help you decide what kind of content each page held:

  • Shannon entropy above ~7.5 bits per byte → likely encrypted or compressed (text encodings such as Base64 top out at 6 bits per byte). On Windows 10+, this often means a page that CompressionStoreManager wrote with Xpress-Huffman before flushing to the pagefile.
  • High proportion of null bytes → unused / scrubbed slot.
  • High proportion of printable bytes → text-heavy content that didn't trip a signature.
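The three statistics are cheap to compute per page. In this sketch the 7.5 bits-per-byte cutoff comes from the text, while the 95% null and 80% printable thresholds in the hypothetical `guess_content` helper are assumptions chosen for illustration.

```python
import math
from collections import Counter
from typing import Dict


def shannon_entropy(page: bytes) -> float:
    """Shannon entropy in bits per byte, between 0.0 and 8.0."""
    if not page:
        return 0.0
    n = len(page)
    return -sum((c / n) * math.log2(c / n) for c in Counter(page).values())


def page_profile(page: bytes) -> Dict[str, float]:
    n = len(page) or 1
    # printable ASCII plus tab, LF and CR
    printable = sum(1 for b in page if 0x20 <= b <= 0x7E or b in (0x09, 0x0A, 0x0D))
    return {
        "entropy": shannon_entropy(page),
        "null_ratio": page.count(0) / n,
        "printable_ratio": printable / n,
    }


def guess_content(page: bytes) -> str:
    """Rough label for a page; the ratio thresholds are illustrative assumptions."""
    p = page_profile(page)
    if p["entropy"] > 7.5:
        return "likely encrypted or compressed"
    if p["null_ratio"] > 0.95:
        return "empty or scrubbed"
    if p["printable_ratio"] > 0.80:
        return "text-heavy"
    return "unclassified"
```

UTF-16LE text is roughly half null bytes, so it lands in "unclassified" here. A variant that also counts (printable, 0x00) pairs would catch it.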

What you cannot do

Standalone pagefile analysis cannot tell you which process owned a given page or which virtual address it lived at. That requires a corresponding RAM dump and parsing the Page Table Entries (PTEs) — which is what Volatility and MemProcFS do. The pagefile alone gives you content, not context.