0xnhl

Author on the Run


ApoorvCTF Forensics Writeup - Keyboard Audio Leakage

Challenge summary

Description:
No time to explain! The organizers are after me — I stole the flag for you, by sneakily recording their keyboard.
I managed to capture their keyboard keypresses before the event— every key (qwertyuiopasdfghjklzxcvbnm) pressed 50 times—don’t ask how. Then, while they were uploading the real challenge flag to CTFd, I left a mic running and recorded every keystroke.
Now I’m on the run. If the organizers catch you with this, you never saw me. Good luck, and hurry!

We are given two WAV files:

  • Reference.wav (training capture)
  • flag.wav (the real typed message)

The story hints that the attacker recorded each key in qwertyuiopasdfghjklzxcvbnm 50 times, then recorded the organizer typing the flag.

Expected format: apoorvctf{decoded_text}


Objective

Recover the text typed in flag.wav using Reference.wav as labeled training audio.


Initial triage

I first verified basic metadata.

file "Reference.wav" "flag.wav"

Output:

Reference.wav: RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 44100 Hz
flag.wav:      RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 44100 Hz

Then I checked durations and audio parameters in Python:

import wave

for f in ["Reference.wav", "flag.wav"]:
    # The context manager closes the file once the header has been read.
    with wave.open(f, "rb") as w:
        print(
            f,
            "channels", w.getnchannels(),
            "rate", w.getframerate(),
            "width", w.getsampwidth(),  # bytes per sample
            "frames", w.getnframes(),
            "duration", w.getnframes() / w.getframerate(),  # seconds
        )

Observed:

  • Reference.wav is long (~304.6 s), consistent with many sample keypresses.
  • flag.wav is short (~12.25 s), consistent with a short typed message.
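
A quick pacing check on those durations, as a rough sketch. It assumes the 26 * 50 = 1300 reference presses and the 19 characters of the decode obtained later in this writeup are spread evenly over each file, which real typing only approximates. The constant names are mine.

```python
# Rough pacing check; durations are the approximate values observed above.
REF_DURATION_S = 304.6
FLAG_DURATION_S = 12.25
REF_PRESSES = 26 * 50   # 26 letters, 50 presses each
FLAG_PRESSES = 19       # length of the decoded message (assumes one onset per character)

ref_spacing = REF_DURATION_S / REF_PRESSES
flag_spacing = FLAG_DURATION_S / FLAG_PRESSES

print(f"reference: ~{ref_spacing:.3f} s between presses")
print(f"flag:      ~{flag_spacing:.3f} s between presses")
```

Under those assumptions the reference presses come roughly every 0.23 s and the flag presses roughly every 0.64 s, so a minimum inter-peak gap somewhat below 0.2 s should not merge genuine neighbouring presses.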

Attack plan

  1. Detect keypress onsets in both files using short-term energy.
  2. Build per-letter templates from the reference file.
  3. Classify each keypress in flag.wav by similarity to templates.
  4. Wrap decoded text as apoorvctf{...}.
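
Step 1 can be sketched roughly as below. This is a minimal pure-Python illustration of short-term-energy onset detection, not the solve.py listing: the function names, frame and hop sizes, and the relative threshold are assumptions, and only min_gap_s corresponds to a parameter mentioned in this writeup. The demo runs on a synthetic signal with three decaying clicks rather than on the challenge audio.

```python
import math

def short_term_energy(samples, frame_len, hop):
    """Sum of squared samples over sliding frames."""
    return [
        sum(x * x for x in samples[start:start + frame_len])
        for start in range(0, len(samples) - frame_len + 1, hop)
    ]

def detect_onsets(samples, rate, frame_s=0.01, hop_s=0.005,
                  threshold_ratio=0.2, min_gap_s=0.12):
    """Return onset times in seconds: energy peaks above a relative
    threshold, kept strongest-first and at least min_gap_s apart."""
    frame_len = max(1, round(frame_s * rate))
    hop = max(1, round(hop_s * rate))
    energy = short_term_energy(samples, frame_len, hop)
    if not energy or max(energy) == 0:
        return []
    threshold = threshold_ratio * max(energy)
    last = len(energy) - 1
    # Candidate frames are local maxima of the energy curve above the threshold.
    candidates = [
        i for i in range(len(energy))
        if energy[i] >= threshold
        and (i == 0 or energy[i] >= energy[i - 1])
        and (i == last or energy[i] > energy[i + 1])
    ]
    # Greedy selection: strongest candidates first, rejecting any that fall
    # within min_gap_s of an already accepted peak.
    min_gap_frames = min_gap_s * rate / hop
    chosen = []
    for i in sorted(candidates, key=lambda k: energy[k], reverse=True):
        if all(abs(i - j) >= min_gap_frames for j in chosen):
            chosen.append(i)
    return [i * hop / rate for i in sorted(chosen)]

# Synthetic demo: one second of silence with clicks at 0.2, 0.5 and 0.8 s.
rate = 8000
signal = [0.0] * rate
for t0 in (0.2, 0.5, 0.8):
    start = round(t0 * rate)
    for n in range(200):  # 25 ms decaying 1 kHz burst
        signal[start + n] = math.exp(-n / 40.0) * math.sin(2 * math.pi * 1000 * n / rate)

onsets = detect_onsets(signal, rate)
print(onsets)
```

Pure Python keeps the sketch dependency-free, but on a five-minute 44.1 kHz recording a vectorised implementation would be far more practical.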

Important assumption (from the prompt):

  • The 1300 reference keypresses (26 letters × 50 presses) come in blocks of 50 per letter, in the order given by the prompt's string:
    qwertyuiopasdfghjklzxcvbnm

So the labels are:

  • first 50 onsets -> q
  • next 50 -> w
  • and so on through the string, until the last 50 -> m
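
The block labelling, together with steps 2 and 3 of the plan, can be sketched as follows. This is an illustration under assumptions, not the original solver: each keypress is taken to be already reduced to a fixed-length feature vector, templates are plain per-letter averages, and cosine similarity stands in for whatever similarity measure the actual script used. All function names are hypothetical.

```python
import math

LETTERS = "qwertyuiopasdfghjklzxcvbnm"
PRESSES_PER_KEY = 50

def label_for_index(i, per_key=PRESSES_PER_KEY, letters=LETTERS):
    """Letter assigned to the i-th reference onset (0-based, time-ordered)."""
    return letters[i // per_key]

def build_templates(features, per_key=PRESSES_PER_KEY, letters=LETTERS):
    """Average each block of per_key feature vectors into one template per letter."""
    templates = {}
    for k, letter in enumerate(letters):
        block = features[k * per_key:(k + 1) * per_key]
        templates[letter] = [sum(col) / len(block) for col in zip(*block)]
    return templates

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def classify(vec, templates):
    """Return the letter whose template is most similar to vec."""
    return max(templates, key=lambda letter: cosine(vec, templates[letter]))

# Toy demo: two "letters" with two presses each and 2-D features.
toy_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
toy_templates = build_templates(toy_features, per_key=2, letters="qw")
print(classify([0.8, 0.2], toy_templates))
print(label_for_index(0), label_for_index(50), label_for_index(1299))
```

The labelling only works if the reference onsets are time-ordered and number exactly 1300; a single missed or extra detection shifts every later label by one.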

Solver script (full code used)

Save as solve.py in the same directory as the WAV files:


Running it

python3 solve.py

Observed decode:

decoded_raw: ohyougotthisfzrdzmn

Interpreting the decode

The raw acoustic decode is very close to readable English and strongly suggests:

  • ohyougotthisfardamn

Why this is reasonable:

  • Most characters decode cleanly.
  • The two uncertain positions both decode as z where a fits the phrase; z and a are neighboring keys, so similar acoustic signatures are plausible.
  • ohyougotthisfardamn is a coherent phrase, while ohyougotthisfzrdzmn is not.
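
The neighboring-key argument can be made mechanical with a small candidate generator. This is a sketch under assumptions: the keyboard is modelled as a coarse three-row grid that ignores the physical stagger, the suspect positions (the two z characters, indices 13 and 16) are picked by hand, and the function names are hypothetical.

```python
from itertools import product

ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]

def neighbours(key):
    """Keys adjacent to key in a coarse grid model of QWERTY, including key itself."""
    for r, row in enumerate(ROWS):
        c = row.find(key)
        if c != -1:
            break
    else:
        return {key}  # not a letter key: no substitutions
    near = set()
    for rr in (r - 1, r, r + 1):
        if 0 <= rr < len(ROWS):
            for cc in (c - 1, c, c + 1):
                if 0 <= cc < len(ROWS[rr]):
                    near.add(ROWS[rr][cc])
    return near

def candidates(raw, suspect_positions):
    """All strings obtained by swapping each suspect character for a neighbour."""
    options = [
        sorted(neighbours(ch)) if i in suspect_positions else [ch]
        for i, ch in enumerate(raw)
    ]
    return ["".join(p) for p in product(*options)]

raw = "ohyougotthisfzrdzmn"
cands = candidates(raw, {13, 16})  # hand-picked positions of the two 'z' characters
print(len(cands))
print("ohyougotthisfardamn" in cands)
```

In this grid model z has only a, s and x as neighbours, so the search space is tiny and the readable phrase is easy to pick out by eye.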

Final flag:

apoorvctf{ohyougotthisfardamn}

Notes on robustness

  • I tested multiple feature-window sizes and pre-onset offsets; the prefix ohyougotthis stayed stable across all of them.
  • Too small an inter-peak gap on flag.wav can produce duplicate detections for a single keypress; increasing min_gap_s from 0.12 to 0.16 fixed that.
  • The reference onset detector returned a few extra events, so I kept only the strongest 1300, which matches the expected 26 * 50 samples exactly.
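
The last two notes can be illustrated with a short sketch. It assumes each detection is a (time_s, strength) pair; the helper names and the toy events are hypothetical, and only the 0.12 and 0.16 gap values come from the notes above.

```python
def merge_close(events, min_gap_s):
    """Collapse detections closer than min_gap_s, keeping the stronger one."""
    merged = []
    for t, s in sorted(events):
        if merged and t - merged[-1][0] < min_gap_s:
            if s > merged[-1][1]:
                merged[-1] = (t, s)
        else:
            merged.append((t, s))
    return merged

def keep_strongest(events, n):
    """Keep the n strongest detections, returned in time order."""
    strongest = sorted(events, key=lambda e: e[1], reverse=True)[:n]
    return sorted(strongest)

# Toy detections: the event at 0.63 s is an echo of the press at 0.50 s,
# and the event at 1.90 s is a weak spurious peak.
events = [(0.50, 1.0), (0.63, 0.4), (1.20, 0.9), (1.90, 0.05), (2.40, 0.8)]

print(len(merge_close(events, 0.12)))  # the 0.13 s echo survives a 0.12 s gap
print(len(merge_close(events, 0.16)))  # and is merged with a 0.16 s gap
print(keep_strongest(merge_close(events, 0.16), 3))
```

Re-sorting by time after the strength cut matters for the reference file, because the block labelling depends on the order of the onsets.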
https://nahil.xyz/vault/writeups/apoorvctf2026/forensics/author-on-the-run/
Author: Nahil Rasheed
Published: March 24, 2026
Disclaimer: This content is provided strictly for educational purposes only.