Mirrorfall Writeup
Project Mirrorfall - Detailed Writeup
Challenge
- Name: PROJECT MIRRORFALL: The Exquisite Dilemma of Offence vs Defence
- Given file: qn.md
- Expected flag format: apoorvctf{...}
The challenge gives three linked objectives:
- Find the correct Snowden archive PDF and extract a file-specific commit fragment (Variable X).
- Parse the PDF and identify the second ECI codeword after APERIODIC.
- Embed that codeword using all-MiniLM-L6-v2 and extract/round the first value (Variable Y).
Final answer found: apoorvctf{7d88323_0.0245}
Step 0 - Read the prompt
Read qn.md:
    read qn.md
Key clues:
- “public archive serving as an archival mirror for the 2013 intelligence disclosures”
- “raw PDF classification guide dated September 5, 2013”
- “overarching US encryption defeat program”
- “first ECI listed is APERIODIC; find second ECI”
- “use all-MiniLM-L6-v2 and take embedding[0], round 4 decimals”
This strongly points to Snowden document mirrors and specifically the NSA BULLRUN classification guide.
Step 1 - Locate the public archive + target PDF
Source used
- GitHub repository: https://github.com/iamcryptoki/snowden-archive
Repository description matches the prompt: “A collection of all documents leaked by former NSA contractor and whistleblower Edward Snowden.”
Commands
Search repository candidates:
    gh search repos snowden --limit 100
Clone likely mirror:
    git clone --depth 1 https://github.com/iamcryptoki/snowden-archive /mnt/Nahil/apoorvctf/ai/snowden-archive
Find PDFs on the exact target date:
    glob "**/20130905*.pdf" /mnt/Nahil/apoorvctf/ai/snowden-archive
Relevant hits:
- documents/2013/20130905-theguardian__sigint_enabling.pdf
- documents/2013/20130905-theguardian__cryptanalysis_classification.pdf
- documents/2013/20130905-theguardian__bullrun.pdf
The “overarching US encryption defeat program” clue maps to BULLRUN.
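The date filter and keyword match above can also be done from Python once the mirror is cloned. This is a minimal sketch, not the method used in the writeup: the helper names `find_dated_pdfs` and `pick_by_keyword` are hypothetical, and it assumes the mirror keeps the `YYYYMMDD-` file name prefix seen in the hits listed above.

```python
from pathlib import Path


def find_dated_pdfs(root: Path, date_prefix: str = "20130905") -> list[Path]:
    """Return every PDF under root whose file name starts with date_prefix."""
    return sorted(root.rglob(f"{date_prefix}*.pdf"))


def pick_by_keyword(paths: list[Path], keyword: str) -> list[Path]:
    """Keep only the paths whose file name contains keyword (case-insensitive)."""
    return [p for p in paths if keyword.lower() in p.name.lower()]
```

Calling `pick_by_keyword(find_dated_pdfs(Path("snowden-archive")), "bullrun")` should narrow the three same-day candidates to the single BULLRUN guide, provided the clone lives in a local `snowden-archive` directory.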
Step 2 - Extract Variable X (file-specific latest commit SHA prefix)
The prompt explicitly says to use the latest commit that touched this exact PDF file, not the repository HEAD.
Queried the GitHub API by file path. The endpoint lists commits newest first, so the first entry is the latest commit for that path:
    gh api "repos/iamcryptoki/snowden-archive/commits?path=documents/2013/20130905-theguardian__bullrun.pdf&per_page=5"
Relevant output field:
    sha: 7d88323521194ed8598624dc3a932930debdde1d
So:
- Variable X = first 7 chars = 7d88323
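If the `gh` CLI is not available, the same lookup can be sketched with the standard library against the public REST endpoint listed under Sources. The helper names (`commits_url`, `short_sha`, `fetch_commits`) are hypothetical, and the sketch assumes unauthenticated access is sufficient, which may not hold under GitHub's rate limits.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://api.github.com"


def commits_url(repo: str, path: str, per_page: int = 1) -> str:
    """Build the 'list commits' URL restricted to a single file path."""
    query = urllib.parse.urlencode({"path": path, "per_page": per_page}, safe="/")
    return f"{API_ROOT}/repos/{repo}/commits?{query}"


def short_sha(commits: list[dict], length: int = 7) -> str:
    """Take the first (newest) commit in the response and truncate its SHA."""
    if not commits:
        raise ValueError("no commits returned for this path")
    return commits[0]["sha"][:length]


def fetch_commits(url: str) -> list[dict]:
    """Perform the HTTP request; needs network access."""
    request = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Keeping the URL construction and SHA truncation separate from the network call makes the parsing step checkable offline, for example against the `sha` value recorded above.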
Step 3 - Parse PDF and recover second ECI after APERIODIC
Convert PDF to text and inspect appendix/remarks sections:
    pdftotext "/mnt/Nahil/apoorvctf/ai/snowden-archive/documents/2013/20130905-theguardian__bullrun.pdf" -
Important extracted lines:
- “Appendix A lists specific BULLRUN capabilities…”
- “Related ECIs include, but are not limited to:” APERIODIC, AMBULANT, AUNTIE, PAINTEDEAGLE, ...
From the ordered list:
- first ECI = APERIODIC
- second ECI (immediately after) = AMBULANT
Normalize per prompt:
- normalized codeword = ambulant
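The "codeword immediately after APERIODIC" rule can be sketched as a small regex helper that works on the `pdftotext` output. The function name `second_codeword_after` is hypothetical, and the sketch assumes codewords are runs of capital letters separated by commas, possibly wrapped across lines, as in the extracted line quoted above.

```python
import re


def second_codeword_after(text: str, first: str = "APERIODIC") -> str:
    """Return the lowercase codeword that directly follows `first` in a comma list."""
    # \s* also matches newlines, so a list wrapped by pdftotext still parses.
    pattern = rf"{re.escape(first)}\s*,\s*([A-Z]+)"
    match = re.search(pattern, text)
    if match is None:
        raise ValueError(f"no codeword found after {first}")
    return match.group(1).lower()
```

Requiring a comma right after the first codeword skips any earlier prose mention of APERIODIC that is not part of the list.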
Step 4 - Compute Variable Y using all-MiniLM-L6-v2
Model requirement
The prompt requires a semantic embedding with all-MiniLM-L6-v2 and:
- input = normalized 8-letter codeword (ambulant)
- output = embedding[0]
- round to 4 decimals
Practical environment note
- Installing sentence-transformers with full torch failed due to a disk quota.
- Used a lighter runtime (fastembed) that serves the same model family (sentence-transformers/all-MiniLM-L6-v2) and returns the embedding vector directly.
Install:
    python3 -m pip install --user fastembed
Compute embedding:
    from fastembed import TextEmbedding
    model = TextEmbedding(model_name="sentence-transformers/all-MiniLM-L6-v2")
    vec = next(model.embed(["ambulant"]))
    print(vec[0])
    print(round(float(vec[0]), 4))
Observed value:
- vec[0] = 0.024466823750619482
- Rounded 4 dp -> Variable Y = 0.0245
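One detail worth pinning down is how the rounded value is rendered: `round` returns a float, and printing a float drops trailing zeros. A minimal sketch of a formatting helper follows; the name `first_component_4dp` is hypothetical, and it works on any indexable vector, so no embedding library is needed to try it.

```python
def first_component_4dp(vector) -> str:
    """Round the first component to 4 decimals and keep exactly 4 digits."""
    value = round(float(vector[0]), 4)
    # The :.4f format keeps trailing zeros that printing a bare float would drop.
    return f"{value:.4f}"
```

With the observed `vec[0]` above this gives the same `0.0245`; the fixed-width format only matters for values whose fourth decimal rounds to zero.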
Step 5 - Construct and verify flag
Using X = 7d88323 and Y = 0.0245:
apoorvctf{7d88323_0.0245}
This was accepted by the platform.
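The assembly step can be sketched as a helper that also sanity-checks the result before submission. The name `build_flag` and the pattern `FLAG_PATTERN` are hypothetical; the pattern is an assumption inferred from this challenge's accepted answer (7 hex characters, underscore, 4-decimal number), not something the prompt states.

```python
import re

# Assumed shape: apoorvctf{<7 hex chars>_<number with 4 decimals>}
FLAG_PATTERN = re.compile(r"apoorvctf\{[0-9a-f]{7}_-?\d+\.\d{4}\}")


def build_flag(x: str, y: float) -> str:
    """Join Variable X and Variable Y into the flag and check its shape."""
    flag = f"apoorvctf{{{x}_{y:.4f}}}"
    if FLAG_PATTERN.fullmatch(flag) is None:
        raise ValueError(f"unexpected flag shape: {flag}")
    return flag
```

The shape check catches slips such as passing the full 40-character SHA instead of its 7-character prefix.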
Reproducible End-to-End Script
    #!/usr/bin/env python3
    import json
    import subprocess
    from pathlib import Path

    from fastembed import TextEmbedding

    PDF_PATH = "documents/2013/20130905-theguardian__bullrun.pdf"
    REPO = "iamcryptoki/snowden-archive"


    def sh(cmd: list[str]) -> str:
        return subprocess.check_output(cmd, text=True)


    def get_variable_x() -> str:
        out = sh([
            "gh",
            "api",
            f"repos/{REPO}/commits?path={PDF_PATH}&per_page=1",
        ])
        data = json.loads(out)
        sha = data[0]["sha"]
        return sha[:7]


    def get_second_eci_from_pdf(local_pdf: Path) -> str:
        text = sh(["pdftotext", str(local_pdf), "-"])
        # Find the line that starts with APERIODIC and parse comma-separated ECIs.
        lines = [ln.strip() for ln in text.splitlines() if "APERIODIC" in ln]
        if not lines:
            raise RuntimeError("Could not find ECI line containing APERIODIC")
        # Example segment: APERIODIC, AMBULANT, AUNTIE, ...
        parts = [p.strip() for p in lines[0].replace(".", "").split(",")]
        idx = parts.index("APERIODIC")
        return parts[idx + 1].lower()


    def get_variable_y(codeword: str) -> float:
        model = TextEmbedding(model_name="sentence-transformers/all-MiniLM-L6-v2")
        vec = next(model.embed([codeword]))
        return round(float(vec[0]), 4)


    def main():
        x = get_variable_x()
        local_pdf = Path("snowden-archive") / PDF_PATH
        eci = get_second_eci_from_pdf(local_pdf)
        y = get_variable_y(eci)
        flag = f"apoorvctf{{{x}_{y:.4f}}}"
        print("X:", x)
        print("ECI:", eci)
        print("Y:", f"{y:.4f}")
        print("FLAG:", flag)


    if __name__ == "__main__":
        main()
Sources
- Challenge prompt: qn.md
- Snowden archive mirror: https://github.com/iamcryptoki/snowden-archive
- Target PDF path in mirror: documents/2013/20130905-theguardian__bullrun.pdf
- GitHub commits API for file history: https://api.github.com/repos/iamcryptoki/snowden-archive/commits?path=documents/2013/20130905-theguardian__bullrun.pdf&per_page=1
- Embedding model reference: sentence-transformers/all-MiniLM-L6-v2