There is no documented binary format for pagefile.sys. Every tool that
analyzes it — page_brute, bulk_extractor, X-Ways, Magnet Axiom, this one —
falls back to the same family of techniques.
1. Per-page signature carving
The file is split into 4 KB pages and each page is matched against a catalog of file-format signatures. The most useful magic bytes for a pagefile:
| Signature | What it tells you |
|---|---|
| `MZ … PE\0\0` | An executable image was paged out |
| `regf` | Registry hive base block (SECURITY, SAM, NTUSER.DAT) |
| `hbin` | Registry hive bin, a 4 KB chunk of a hive |
| `FILE` / `BAAD` | NTFS MFT record (active or damaged) |
| `SQLite format 3` | SQLite database header (browser history, mail clients) |
| `<?xml` | XML, often Windows event log fragments, manifests, configs |
| `ElfFile` / `ElfChnk` | EVTX event log file / chunk |
| `SCCA` | Prefetch file |
| `%PDF` | PDF document |
| `PK\x03\x04` | ZIP / DOCX / XLSX |
| `\x89PNG` / `\xFF\xD8\xFF` | PNG / JPEG |
A page that matches a signature usually represents the start of a much larger artifact — but the rest of that artifact lives elsewhere in RAM (and may have been paged out to non-adjacent slots, or not paged out at all). So a single hit is a strong forensic lead, not a recoverable file.
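The per-page matching described above can be sketched roughly as follows. This is a minimal illustration, not how any of the named tools implement it: `PAGE_START_SIGNATURES`, `looks_like_pe`, `classify_page`, and `scan_pagefile` are hypothetical names, the catalog is only a subset of the table, and it assumes a signature is only interesting when it sits at offset 0 of a page. Prefetch is left out because `SCCA` sits at offset 4 of a prefetch file, behind the version field, so it needs its own check.

```python
PAGE_SIZE = 4096

# (label, magic bytes) pairs checked at offset 0 of each page.
# Illustrative subset of the signature table, not a complete catalog.
PAGE_START_SIGNATURES = [
    ("registry hive base block", b"regf"),
    ("registry hive bin", b"hbin"),
    ("NTFS MFT record", b"FILE"),
    ("NTFS MFT record (damaged)", b"BAAD"),
    ("SQLite database", b"SQLite format 3\x00"),
    ("EVTX file header", b"ElfFile\x00"),
    ("EVTX chunk", b"ElfChnk\x00"),
    ("PDF", b"%PDF"),
    ("ZIP container", b"PK\x03\x04"),
    ("PNG", b"\x89PNG"),
    ("JPEG", b"\xff\xd8\xff"),
    ("XML", b"<?xml"),
]


def looks_like_pe(page: bytes) -> bool:
    """True if the page starts with MZ and PE\\0\\0 sits where e_lfanew (0x3C) points."""
    if len(page) < 0x40 or page[:2] != b"MZ":
        return False
    e_lfanew = int.from_bytes(page[0x3C:0x40], "little")
    # If e_lfanew points past the page, the slice is short and the test fails.
    return page[e_lfanew:e_lfanew + 4] == b"PE\x00\x00"


def classify_page(page: bytes):
    """Return a label for the first matching signature, or None."""
    if looks_like_pe(page):
        return "PE executable image"
    for label, magic in PAGE_START_SIGNATURES:
        if page.startswith(magic):
            return label
    return None


def scan_pagefile(path):
    """Yield (file_offset, label) for every page that starts with a known signature."""
    with open(path, "rb") as fh:
        offset = 0
        while True:
            page = fh.read(PAGE_SIZE)
            if not page:
                break
            label = classify_page(page)
            if label is not None:
                yield offset, label
            offset += len(page)
```

Each hit is an offset to inspect further, in line with the point above that a match is a lead rather than a recoverable file.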
2. String extraction
Most of the value in a pagefile is in its strings: fragments of text that
software was working with when it was paged out. Extraction works conceptually
like the GNU strings tool:
- ASCII: contiguous runs of printable bytes (0x20–0x7E plus tab) of length ≥ 6.
- UTF-16LE: contiguous runs of `(printable, 0x00)` pairs. Windows is Unicode-first internally, so the bulk of useful text is here.
Each emitted string is paired with its absolute file offset so you can correlate it with the page it came from.
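A minimal sketch of the two extraction rules, assuming the data fits in memory as one `bytes` object. `extract_strings`, `ASCII_RUN`, `UTF16_RUN`, and `MIN_LEN` are hypothetical names chosen for illustration. The UTF-16LE rule here only covers code points in the ASCII range, which is what the `(printable, 0x00)` pair definition describes.

```python
import re

MIN_LEN = 6

# Printable ASCII (0x20-0x7E) plus tab, at least MIN_LEN bytes long.
ASCII_RUN = re.compile(rb"[\x20-\x7e\t]{%d,}" % MIN_LEN)
# The same characters, each followed by a 0x00 byte (UTF-16LE for the ASCII range).
UTF16_RUN = re.compile(rb"(?:[\x20-\x7e\t]\x00){%d,}" % MIN_LEN)


def extract_strings(data: bytes, base_offset: int = 0):
    """Yield (absolute_offset, encoding, text) for each run found in data."""
    for m in ASCII_RUN.finditer(data):
        yield base_offset + m.start(), "ascii", m.group().decode("ascii")
    for m in UTF16_RUN.finditer(data):
        yield base_offset + m.start(), "utf-16le", m.group().decode("utf-16-le")
```

When a large file is read in chunks, passing the chunk's file position as `base_offset` keeps the reported offsets absolute. A string that straddles a chunk boundary would be split in two by this sketch.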
3. Regex artifact sweeps
Strings then feed a fixed catalog of regexes:
- URLs: `https?://…`
- E-mails, IPv4, IPv6
- Windows paths (`C:\…`) and UNC paths (`\\server\share\…`)
- Registry keys (`HKLM\…`, `HKCU\…`)
- GUIDs (a strong indicator near MSI installers, COM, event tracing)
- Command-line indicators: `cmd.exe`, `powershell.exe`, `mshta.exe`, `rundll32.exe`, `certutil.exe` …
- Credentials: `password=`, `Authorization: Bearer …`, JWT-shaped tokens.
These are the patterns that have historically yielded most of the actionable intelligence in pagefile triage.
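As an illustration of what such a catalog might look like, here is a hedged sketch. The patterns are deliberately simple approximations written for this example, not the expressions any particular tool ships. `ARTIFACT_PATTERNS` and `sweep` are hypothetical names, and IPv6 is omitted because a correct pattern for it is considerably longer.

```python
import re

# Octet 0-255 for the IPv4 pattern.
_OCTET = r"(?:25[0-5]|2[0-4]\d|1?\d?\d)"

# Illustrative patterns only; real catalogs are usually larger and stricter.
ARTIFACT_PATTERNS = {
    "url": re.compile(r"https?://[^\s\"'<>]+"),
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "ipv4": re.compile(r"\b(?:" + _OCTET + r"\.){3}" + _OCTET + r"\b"),
    "windows_path": re.compile(r"\b[A-Za-z]:\\[^\s\"<>|*?]+"),
    "unc_path": re.compile(r"\\\\[A-Za-z0-9._$-]+\\[^\s\"<>|*?]+"),
    "registry_key": re.compile(r"\bHK(?:LM|CU)\\[^\s\"]+"),
    "guid": re.compile(r"\b[0-9A-Fa-f]{8}-(?:[0-9A-Fa-f]{4}-){3}[0-9A-Fa-f]{12}\b"),
    "lolbin": re.compile(r"\b(?:cmd|powershell|mshta|rundll32|certutil)\.exe\b", re.IGNORECASE),
    "password_kv": re.compile(r"password=\S+", re.IGNORECASE),
    "bearer_token": re.compile(r"Authorization: Bearer [A-Za-z0-9._~+/=-]+"),
    "jwt": re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"),
}


def sweep(strings):
    """strings: iterable of (offset, text) pairs.

    Yields (offset, kind, matched_text), where offset is the start of the
    string that contained the match, not of the match itself.
    """
    for offset, text in strings:
        for kind, pattern in ARTIFACT_PATTERNS.items():
            for m in pattern.finditer(text):
                yield offset, kind, m.group()
```

Reporting the containing string's offset avoids byte/character confusion: for UTF-16LE strings a character index is not a byte index, so the match position would need doubling before it could be added to a file offset.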
4. Statistical fallback
For pages that match no signature, basic statistics help you decide what kind of content each one held:
- Shannon entropy above ~7.5 → likely encrypted, compressed, or encoded. On Windows 10+, this often means a page that `CompressionStoreManager` wrote with Xpress-Huffman before flushing to the pagefile.
- High proportion of null bytes → unused / scrubbed slot.
- High proportion of printable bytes → text-heavy content that didn't trip a signature.
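These three heuristics could be computed along the following lines. Shannon entropy over byte values is $H = -\sum_b p_b \log_2 p_b$, which ranges from 0 to 8 bits per byte. In this sketch only the 7.5 entropy cut-off comes from the text above. The 0.9 null-byte and 0.8 printable-byte thresholds, the order of the checks, and the names `shannon_entropy` and `describe_page` are assumptions made for illustration.

```python
import math
from collections import Counter


def shannon_entropy(page: bytes) -> float:
    """Entropy of the byte distribution in bits per byte (0.0 to 8.0)."""
    if not page:
        return 0.0
    n = len(page)
    return -sum((c / n) * math.log2(c / n) for c in Counter(page).values())


def describe_page(page: bytes,
                  entropy_threshold: float = 7.5,    # from the text above
                  null_threshold: float = 0.9,       # assumed cut-off
                  printable_threshold: float = 0.8   # assumed cut-off
                  ) -> str:
    """Coarse label for a page that matched no signature."""
    n = len(page)
    if n == 0:
        return "empty"
    null_ratio = page.count(0) / n
    printable_ratio = sum(1 for b in page if 0x20 <= b <= 0x7E or b == 0x09) / n
    if null_ratio >= null_threshold:
        return "mostly null"
    if shannon_entropy(page) >= entropy_threshold:
        return "high entropy"
    if printable_ratio >= printable_threshold:
        return "text-heavy"
    return "unclassified"
```

A page of ASCII-range UTF-16LE text is roughly half null bytes and half printable bytes, so with these thresholds it would come back as "unclassified". The string extraction pass is the better detector for that case.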
What you cannot do
Standalone pagefile analysis cannot tell you which process owned a given page or which virtual address it lived at. That requires a corresponding RAM dump and parsing the Page Table Entries (PTEs) — which is what Volatility and MemProcFS do. The pagefile alone gives you content, not context.