
The penetration tester role is one of the most misunderstood in security hiring: many guides list tools and certifications, but far fewer define what “good” looks like in observable, job-relevant behaviors, and almost none show you how to assess those behaviors consistently.
This assessment package helps fill that gap with a skills-first framework you can use to benchmark yourself, prepare for interviews, or evaluate candidates using shared criteria.
This page combines three things in one place: (1) a structured competency model for penetration tester skills (core, specialty, and senior-level expectations), (2) realistic scenario prompts that mimic day-to-day engagement decisions (scoping, enumeration, validation, reporting), and (3) a transparent scoring rubric that organizes responses into development bands you can use for discussion and planning. It is methodology-first—anchored to widely referenced practices such as PTES phases and OWASP WSTG categories—so it can translate across tools and environments.
If you’re an aspiring or working pen tester, you’ll come away with a clearer view of strengths and gaps by domain, plus a tailored 30/60/90-day plan that prioritizes fundamentals over shortcuts. If you’re hiring, you’ll get a lab-safe evaluation blueprint designed to support structured interviews and consistent reviewer alignment while avoiding unethical or illegal testing.
Use the content in order (matrix → scenarios → scoring → roadmap) or jump straight to the scenarios and score yourself. Either way, the goal is a more structured view of where you stand than a generic “tools you should know” checklist, plus a set of concrete prompts to guide next steps.
This assessment measures penetration tester skills as they show up in real engagements: decision-making, methodology, technical depth, and professional communication. It is designed to be job-relevant, time-boxable, and structured.
Intentionally avoided:
- “How to hack X” instructions, operational exploit steps, or production-impact techniques.
- Any guidance that enables wrongdoing. All practical evaluation is lab/sandbox framed.
Use this matrix as a taxonomy (what to assess) and a rubric anchor (what “proficient” looks like). It separates core skills from specialty depth and makes senior expectations explicit.
Level 1 — Foundation (Entry / Intern scope):
Explains concepts accurately; can follow safe procedures in a lab.
Produces basic notes and can reproduce steps with guidance.
Recognizes common vulnerability classes and can validate low-risk findings.
Level 2 — Practitioner (Junior Pen Tester scope):
Independently executes a scoped workflow in a lab.
Prioritizes enumeration based on evidence and constraints.
Writes clear technical findings with reproducible evidence and practical remediation.
Level 3 — Consultant (Mid-level scope):
Adapts methodology to novel environments; balances depth with timeboxing.
Produces client-ready deliverables: executive summary, risk narrative, and retest results.
Coaches stakeholders on remediation feasibility and verification.
Level 4 — Lead / Red Team (Senior scope):
Designs engagements, leads scoping/ROE, and manages risk to production.
Conducts attack-path analysis and designs objective-driven testing.
Improves team practices: playbooks, reporting standards, and ethical governance.
The scenarios that follow are designed to elicit evidence artifacts (what a strong candidate can show or produce) without requiring unsafe activity.
This package supports two use cases: self-assessment (benchmarking your skills and preparing for interviews) and hiring evaluation (screening candidates against shared criteria).
Design principles (why this is useful):
- Job-relevant coverage: scenarios reflect common work outputs (scoping, enumeration decisions, reporting).
- Structured scoring: anchored behaviors support more consistent reviewer calibration.
- Work-sample realism: candidates produce artifacts you’d expect on the job.
Important: Use assessment results as one input for structured interviews and calibration. Do not use this content as the sole basis for employment decisions; humans make the final call.
Use these ten scenarios as a 45–75 minute assessment. For hiring, pick 6–8 of them to fit a 60-minute screen.
Scenario 1. Context: You are contracted to test “the customer portal” for AcmeCo. The statement of work says “external web application penetration test,” but the portal uses SSO and third-party payment processing.
Prompt: List the top 8 clarifying questions you would ask before testing begins. Include at least: authentication/SSO, third parties, rate limits, data handling, and safety constraints.
Scenario 2. Context: You have a single public IP, one domain, and a 5-day test window. You can run safe scanning.
Prompt: Outline your first 90 minutes of activity: what you collect, what you defer, and what “stop conditions” look like (e.g., signs of production fragility).
Scenario 3. Context: You receive the following abbreviated scan summary from a lab segment:
- 10.0.2.10: 22/ssh, 80/http, 443/https
- 10.0.2.15: 445/smb, 3389/rdp
- 10.0.2.20: 53/dns, 88/kerberos, 389/ldap
- 10.0.2.25: 1433/mssql
Prompt: Prioritize what you would validate next and why. What are the top 5 risks suggested by this layout, assuming weak hardening?
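For calibration, it can help to see the kind of lightweight artifact a strong answer might produce. Below is a minimal, lab-only Python sketch that turns the scan summary into a ranked validation list; the service weights are illustrative assumptions rather than a standard, and real prioritization should also reflect scope, ROE, and observed hardening.

```python
# Minimal triage sketch (lab data only). The priority weights are illustrative
# assumptions for discussion, not an authoritative ranking.
SCAN = {
    "10.0.2.10": ["22/ssh", "80/http", "443/https"],
    "10.0.2.15": ["445/smb", "3389/rdp"],
    "10.0.2.20": ["53/dns", "88/kerberos", "389/ldap"],
    "10.0.2.25": ["1433/mssql"],
}

# Rough weights: identity services and data stores tend to anchor attack paths
# when hardening is weak, so they usually deserve earlier validation.
WEIGHTS = {"kerberos": 5, "ldap": 5, "smb": 4, "mssql": 4,
           "rdp": 3, "https": 2, "http": 2, "ssh": 2, "dns": 1}

def priority(ports):
    """Sum service weights for one host's open ports."""
    return sum(WEIGHTS.get(p.split("/", 1)[1], 0) for p in ports)

for host, ports in sorted(SCAN.items(), key=lambda kv: priority(kv[1]), reverse=True):
    print(f"{host:<12} priority={priority(ports):>2}  {', '.join(ports)}")
```

The numbers themselves matter less than the signal: the candidate can show evidence-driven ordering (likely domain controller first, then SMB/RDP and the database) instead of treating every host equally.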
Scenario 4. Context: In the portal, a user can view invoices at /invoice?id=18421. You notice invoice IDs are sequential.
Prompt: Describe a safe validation approach for an IDOR/access control issue, including what evidence you would capture and how you would avoid unauthorized data exposure.
Scenario 5. Context: You observe session tokens are long random strings, but the app does not invalidate sessions on logout and allows sessions to persist for 30 days.
Prompt: How would you explain this risk to (a) a product manager and (b) an engineer? Provide remediation guidance and retest criteria.
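As a remediation illustration (not a prescription for any particular framework), the sketch below shows the behavior an engineer would implement server-side: sessions are invalidated on logout and bounded by idle and absolute lifetimes. All names and timeout values here are hypothetical.

```python
# Hypothetical server-side session store illustrating the remediation concept:
# invalidate the session on logout and bound it with idle and absolute timeouts.
import secrets
import time
from typing import Optional

IDLE_TIMEOUT = 30 * 60          # 30 minutes without activity (example value)
ABSOLUTE_TIMEOUT = 8 * 60 * 60  # 8 hours regardless of activity (example value)

_sessions = {}  # token -> {"user", "created", "last_seen"}

def create_session(user_id: str) -> str:
    token = secrets.token_urlsafe(32)
    now = time.time()
    _sessions[token] = {"user": user_id, "created": now, "last_seen": now}
    return token

def validate_session(token: str) -> Optional[str]:
    """Return the user id if the session is still valid, otherwise None."""
    record = _sessions.get(token)
    if record is None:
        return None
    now = time.time()
    expired = (now - record["created"] > ABSOLUTE_TIMEOUT
               or now - record["last_seen"] > IDLE_TIMEOUT)
    if expired:
        _sessions.pop(token, None)  # expired sessions are removed, not just ignored
        return None
    record["last_seen"] = now
    return record["user"]

def logout(token: str) -> None:
    """Server-side invalidation: the old token must be rejected after logout."""
    _sessions.pop(token, None)
```

Retest criteria then become observable: after logout, the old token is rejected; idle and absolute limits are enforced on the server, not just in the cookie.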
Scenario 6. Context: A scanner flags 40 findings including missing headers, old jQuery, and “potential SQL injection” on three parameters.
Prompt: Describe your triage method. Which items do you validate manually first, and what evidence qualifies as “confirmed” vs “informational”?
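One way to make the triage method concrete is a small tracking sketch in which every scanner finding carries an explicit status and, if confirmed, a pointer to reproducible evidence. The statuses and field names below are illustrative assumptions, not a required schema.

```python
# Illustrative triage record: a finding is "confirmed" only once it has
# reproducible evidence; everything else stays informational or unvalidated.
from dataclasses import dataclass, field

@dataclass
class Finding:
    title: str
    source: str                      # e.g., "scanner", "manual"
    status: str = "unvalidated"      # unvalidated -> confirmed | informational | false_positive
    evidence: list = field(default_factory=list)  # request/response refs, screenshots

    def confirm(self, evidence_ref: str) -> None:
        """Promote to 'confirmed' only with a reproducible evidence reference."""
        self.evidence.append(evidence_ref)
        self.status = "confirmed"

queue = [
    Finding("Potential SQL injection on ?q=", "scanner"),
    Finding("Missing security headers", "scanner", status="informational"),
    Finding("Outdated jQuery, no exploitable sink observed", "scanner", status="informational"),
]

# Manual validation happens only in the lab/authorized scope; here we just record it.
queue[0].confirm("req-042: reproducible error-based response difference on benign input")
```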
Scenario 7. Context: In a lab, you are given read-only directory data showing:
- A service account is a local admin on multiple servers.
- Several users have passwords that never expire.
- A helpdesk group can reset passwords for a privileged group.
Prompt: Explain the most likely attack paths and which single control improvement would reduce risk fastest. Justify your choice.
Scenario 8. Context: A storage bucket is publicly readable and contains application logs with email addresses and password reset tokens.
Prompt: Write a short finding summary (6–10 sentences) including impact, exposure, evidence expectations, remediation, and logging/IR considerations.
Scenario 9. Context: You found three issues: (1) IDOR exposing invoices, (2) weak session invalidation, (3) verbose error messages.
Prompt: Draft a 5–7 sentence executive summary that a CIO would understand, including prioritization and a recommendation.
Scenario 10. Context: Mid-test, you discover a path that appears to allow downloading a full customer export, but you are not sure if downloading it violates ROE.
Prompt: What do you do next? Provide the steps you’d take to confirm risk while minimizing harm and staying authorized.
This rubric is built for structured calibration. Score each scenario 0–4 and sum for a total out of 40.
If you want closer job alignment, apply weights:
- Fundamentals + Methodology: 30%
- Web or Internal/AD (choose based on role): 30%
- Reporting/communication: 25%
- Ethics/legal safety: 15%
This helps teams balance technical depth with communication and safety signals.
Use these bands as discussion prompts and development planning guidance, not as automatic hiring outcomes:
- 0–14 — Foundation focus: prioritize fundamentals + methodology.
- 15–24 — Entry scope: can contribute with guidance; tighten reporting and prioritization.
- 25–32 — Practitioner scope: can run discrete workstreams with limited oversight; deepen specialization.
- 33–37 — Consultant scope: strong end-to-end execution and client-facing reporting.
- 38–40 — Advanced indicator: follow up with deeper work samples (scoping, full report review, stakeholder simulation).
Minimum bars (recommended):
- Any role: Ethics/ROE questions must average ≥ 2/4.
- Client-facing consultant: Reporting questions must average ≥ 3/4.
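A worked example helps reviewers see how the weights and minimum bars interact. The sketch below assumes each scenario is tagged with one domain and averages per-domain scores before weighting; the tagging and the sample scores are assumptions for illustration.

```python
# Hypothetical weighted-scoring sketch. Domain tags per scenario are assumptions;
# adjust them to match how your team maps scenarios to domains.
WEIGHTS = {"fundamentals": 0.30, "depth": 0.30, "reporting": 0.25, "ethics": 0.15}

# scenario number -> (domain, score 0-4) for one example candidate
scores = {
    1: ("ethics", 3), 2: ("fundamentals", 3), 3: ("fundamentals", 2),
    4: ("depth", 3), 5: ("depth", 2), 6: ("fundamentals", 3),
    7: ("depth", 2), 8: ("reporting", 3), 9: ("reporting", 2), 10: ("ethics", 3),
}

def domain_avg(domain):
    vals = [s for d, s in scores.values() if d == domain]
    return sum(vals) / len(vals) if vals else 0.0

raw_total = sum(s for _, s in scores.values())                      # out of 40
weighted = sum(w * (domain_avg(d) / 4) for d, w in WEIGHTS.items()) # 0.0 - 1.0

print(f"raw total: {raw_total}/40, weighted: {weighted:.0%}")
print("ethics minimum bar met:", domain_avg("ethics") >= 2)
print("reporting bar (client-facing roles):", domain_avg("reporting") >= 3)
```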
What a score of 0–14 often suggests: you may know tools, but need more consistent reasoning about protocols, evidence, and safe validation.
Focus next:
- TCP/IP + DNS + HTTP fundamentals (be able to explain, not just run commands)
- Linux navigation and scripting basics
- “Methodology muscle memory”: scoping, enumeration, validation, documentation
Career guidance: Consider roles that build operational context (internships, IT support, junior security analyst) while building lab-based evidence.
What a score of 15–24 often suggests: you can follow a workflow and identify common issues, but consistency and reporting maturity are uneven.
Focus next:
- Evidence quality: reproducible steps, screenshots/requests, exact affected scope
- Risk articulation: separate technical severity from business impact
- Timeboxing and prioritization: show why you chose next steps
What a score of 25–32 often suggests: you can execute meaningful testing without constant direction.
Focus next:
- Specialize: Web, Internal/AD, or Cloud.
- Retesting discipline: define “fixed means verified” criteria.
- Stakeholder communication: deliver clear narratives under time pressure.
What a score of 33–37 often suggests: you can produce deliverables that engineers can fix and leaders can act on.
Focus next:
- Engagement leadership: scoping calls, ROE negotiation, expectation management
- Repeatable quality: templates, checklists, consistent severity rationale
- Broader coverage: AD + cloud + modern auth flows
What a score of 38–40 often suggests: you demonstrate strong judgment, safety, and communication maturity.
Follow up with:
- A full work sample: executive summary + findings from a provided case pack.
- A stakeholder simulation: explain tradeoffs to engineering and leadership.
Choose the plan that matches your score band.
Foundation plan:
Next 30 days:
- Master: subnetting basics, DNS flow, HTTP request/response anatomy, cookies/sessions.
- Daily Linux reps: files, permissions, processes, networking commands.
- Write one script: parse logs or scan output into a table (a minimal example appears after this plan).
Next 60 days:
- Build a mini methodology: recon → enumerate → validate → document (on intentionally vulnerable labs).
- Practice evidence capture: screenshots, request/response, exact reproduction notes.
Next 90 days:
- Produce 2 sanitized findings using a consistent template.
- Get peer feedback on clarity and severity rationale.
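As a concrete target for the “write one script” item in the 30-day list above, here is a minimal sketch that parses simplified scan lines into a table. The input format is an assumption for practice, not any specific tool’s exact output.

```python
# Minimal parsing sketch: turn simplified scan lines into a host/port/service table.
# The sample input format is an assumption for practice, not a tool's real output.
import csv
import sys

SAMPLE = """\
10.0.2.10 22/tcp open ssh
10.0.2.10 80/tcp open http
10.0.2.15 445/tcp open microsoft-ds
10.0.2.25 1433/tcp open ms-sql-s
"""

def parse(lines):
    for line in lines:
        parts = line.split()
        if len(parts) >= 4 and parts[2] == "open":
            host, port_proto, _, service = parts[:4]
            port, proto = port_proto.split("/", 1)
            yield {"host": host, "port": int(port), "proto": proto, "service": service}

rows = list(parse(SAMPLE.splitlines()))
writer = csv.DictWriter(sys.stdout, fieldnames=["host", "port", "proto", "service"])
writer.writeheader()
writer.writerows(rows)
```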
Entry plan:
Next 30 days:
- Reporting upgrade: rewrite two findings until they’re readable by engineers.
- Learn risk language: impact, likelihood, exposure, compensating controls.
Next 60 days:
- Pick a track (Web or Internal/AD) and deepen core patterns.
- Practice “triage discipline”: scanner output → validation plan → confirmed vs informational.
Next 90 days:
- Build a portfolio: 1 executive summary + 4 findings (sanitized) + a retest note.
Practitioner plan:
Next 30 days:
- Timeboxed case practice: 90-minute scenarios where you must prioritize and justify.
- Communication reps: explain one issue in 60 seconds (exec) and 3 minutes (engineer).
Next 60 days:
- Specialization depth:
  - Web: access control + auth/session + API testing patterns
  - Internal/AD: identity misconfig reasoning, segmentation and credential hygiene narratives
  - Cloud: IAM misconfig patterns and detection-aware write-ups
Next 90 days:
- Simulate a full engagement deliverable: scope assumptions → findings → executive summary → retest plan.
Consultant/Lead plan:
Next 30 days:
- Standardize quality: templates/checklists for findings, severity rationale, evidence, remediation, retest.
Next 60 days:
- Lead-level skills: scoping call script, ROE negotiation checklist, deconfliction playbook.
Next 90 days:
- Mentorship and calibration: run a mock assessment with juniors; align scoring across reviewers.
Hiring managers routinely look for signals that your penetration tester skills align to recognized standards—without requiring you to name-drop them.
Methodology alignment (how you work):
- PTES-style phases: pre-engagement → intelligence gathering → threat modeling → vulnerability analysis → exploitation → post-exploitation → reporting.
- OWASP WSTG mindset: test categories systematically; validate with evidence.
Risk and vulnerability language:
- CVE/CVSS for technical severity context, but translate into business impact (data exposure, fraud, downtime).
- “Exploitability conditions” and “affected scope” are expected components in strong reports.
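To make those expected components tangible, here is a small sketch of a finding record that keeps technical severity and business impact as separate, explicit fields. The field names, the CVSS value, and the example content are illustrative assumptions, not a real assessment.

```python
# Illustrative finding record: technical severity (CVSS) and business impact are
# captured separately, alongside exploitability conditions and affected scope.
# All values below are hypothetical examples.
finding = {
    "title": "IDOR exposes invoices of other tenants",
    "cvss_v3_score": 7.1,  # technical severity context only (hypothetical value)
    "exploitability_conditions": "any authenticated user; sequential invoice IDs",
    "affected_scope": "/invoice endpoint, all tenants on the shared portal",
    "business_impact": "cross-customer financial data exposure; contractual and regulatory risk",
    "remediation": "enforce object-level authorization on every invoice lookup",
    "retest_criteria": "requests for another tenant's invoice are denied for all roles tested",
}

for key, value in finding.items():
    print(f"{key:>26}: {value}")
```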
Selection best practices (for hiring teams):
- Structured rubrics and work samples can improve consistency versus unstructured interviews.
- Keep assessments time-bounded and transparent.
These are intentionally “means to mastery,” not a shopping list.
A sample 60-minute screen:
1) 10 min: ROE/scoping questions (Scenario #1 + #10)
2) 20 min: enumeration and prioritization (Scenario #2 + #3 + #6)
3) 20 min: web or AD depth (choose #4/#5/#7/#8)
4) 10 min: executive summary writing (#9)
Decision guidance: Use the outputs to structure follow-up questions on judgment, safety, and reporting clarity. Combine results with additional signals (portfolio/work sample, references, and role-specific interviews) and apply the same criteria consistently across candidates.
If you want to operationalize this internally, convert the scenarios into a shared scorecard, require two independent raters, and review a short writing sample. Used together, those artifacts can support more consistent evaluation and clearer interview conversations.
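Below is a minimal sketch of what “two independent raters plus calibration” can look like in practice, assuming each rater scores every scenario 0–4. The sample scores and the disagreement threshold are arbitrary illustrations.

```python
# Two-rater scorecard sketch: average the scores and flag scenarios where raters
# disagree by more than one point, so calibration discussion focuses there.
rater_a = {1: 3, 2: 3, 3: 2, 4: 3, 5: 2, 6: 3, 7: 2, 8: 3, 9: 2, 10: 3}
rater_b = {1: 3, 2: 2, 3: 2, 4: 1, 5: 2, 6: 3, 7: 3, 8: 3, 9: 2, 10: 4}

DISAGREEMENT_THRESHOLD = 1  # illustrative; tune to your calibration process

for scenario in sorted(rater_a):
    a, b = rater_a[scenario], rater_b[scenario]
    flag = "  <-- discuss" if abs(a - b) > DISAGREEMENT_THRESHOLD else ""
    print(f"scenario {scenario:>2}: A={a} B={b} avg={(a + b) / 2:.1f}{flag}")

total = sum((rater_a[s] + rater_b[s]) / 2 for s in rater_a)
print(f"calibrated total: {total:.1f}/40")
```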