
A “data analysis exam” can mean three different things: a pre-employment data analyst skills assessment, a certification-style exam, or a course final. Most pages ranking today blur these together—and then stop at generic competency lists. This package is built to help you simulate real exam conditions, identify skill gaps quickly, and turn results into a focused study plan.
This data analysis exam prep is designed for professionals preparing for time-boxed screening tests (CodeSignal-style analytics screens, HackerRank-style SQL/stats questions, and SHL-style mixed formats), as well as managers who want a job-relevant evaluation that is structured and scoreable. You’ll practice the question archetypes that show up repeatedly: top‑N per group, rolling metrics, funnel conversion, cohort retention, A/B test interpretation, and data-quality triage.
Unlike many vendor landing pages, this guide includes realistic sample questions with the reasoning employers often look for, a transparent scoring model with rubrics for open-ended recommendations, and level-based interpretations that map to typical role expectations (Junior Analyst → Product/BI Analyst → Senior/Lead).
The goal isn’t just to “get the right answer”; it’s to demonstrate sound analytical judgment under constraints. Use this page in two ways: (1) as a structured mock exam and review workflow, or (2) as a diagnostic that shows what to practice next (SQL patterns, Excel modeling, statistical reasoning, or stakeholder-ready communication) so you can improve efficiently.
Before you practice, clarify which exam you’re actually facing. Most frustration comes from preparing for the wrong format.
1) Pre-employment skills assessment
Typical focus: SQL + interpretation + stats fundamentals + business judgment (sometimes Excel).
Typical format: 45–90 minutes, auto-graded sections plus 1–2 open-ended prompts.
What’s typically reviewed: correctness, efficiency, and the quality of the reasoning (not just calculations).
2) Certification-style exam
Typical focus: broader coverage, more definitions and governance, sometimes tool-specific.
Typical format: 90–180 minutes, many multiple-choice questions, strict proctoring.
What’s typically reviewed: breadth and standard terminology.
3) Course final
Typical focus: theory + methods, often heavier stats and formal proofs/derivations.
Typical format: varies; may include long-form responses.
What’s typically reviewed: methodological correctness and explanation.
If you don’t know which one it is: assume pre-employment and prepare for (1) SQL under time pressure, (2) interpretation and metric choice, (3) experimentation basics.
This assessment focuses on skill areas that show up frequently in analytics roles and in analytics screening tests. It’s designed as a practical, job-relevant work sample with a structured scoring approach.
SQL & Data Retrieval (30%)
Joins, aggregation, HAVING, subqueries/CTEs
Window functions (rank/top-N, rolling metrics)
Date/time bucketing, NULL handling, deduplication
Sanity checks: grain, join explosion, filter logic
Spreadsheet Analysis (15%)
Cleaning, text/date functions
XLOOKUP/INDEX-MATCH, SUMIFS/COUNTIFS
PivotTables, basic chart selection
Statistics & Experimentation (20%)
Distributions, sampling bias, confidence intervals
Hypothesis tests, p-values vs practical significance
A/B testing interpretation, guardrail metrics
Correlation vs causation, basic regression intuition
Business Interpretation & Communication (25%)
KPI definitions, metric selection
Funnel/cohort reasoning, anomaly investigation
Writing: assumptions → method → insight → recommendation
Data Quality, Governance & Ethics (10%)
Data validation checks, reconciliation
Access control and appropriate use
Documenting assumptions and limitations
Why this approach is useful: A structured, rubric-based assessment can make review more consistent and transparent than unstructured evaluation alone.
Most exams recycle a small number of patterns. We teach and test those patterns explicitly; a template for the first archetype is sketched after this list.
Top-N per group (e.g., top products per category)
Rolling metrics (7-day moving average)
Funnel conversion (view → add-to-cart → purchase)
Cohort retention (week 0, 1, 2…)
Anomaly/root-cause (drop in conversion)
Deduplication (latest record per entity)
Join grain control (avoiding join explosions)
Segmentation (new vs returning, region, channel)
A/B interpretation (lift, confidence, guardrails)
Metric definition traps (active users, revenue, churn)
Data cleaning triage (missing values, outliers)
Visualization critique (misleading axes, wrong chart)
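To make the first archetype concrete, here is a minimal top-N-per-group template in PostgreSQL-flavored SQL; the order_items table and its columns are hypothetical placeholders:

```sql
-- Top-N per group: top 3 products per category by sales
-- (order_items, category, product, sales are hypothetical names)
SELECT category, product, total_sales
FROM (
  SELECT
    category,
    product,
    SUM(sales) AS total_sales,
    ROW_NUMBER() OVER (PARTITION BY category
                       ORDER BY SUM(sales) DESC) AS rn
  FROM order_items
  GROUP BY category, product
) ranked
WHERE rn <= 3;  -- swap ROW_NUMBER for RANK if ties should share a spot
```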
Auto-gradable items: SQL outputs, calculations, MCQ.
Rubric-graded items: written recommendations and critiques.
Partial credit: awarded for correct setup even if the final number is off.
You can run this as a 60–75 minute mock.
Section | Time | Question types | Weight
A. SQL | 25–30 min | 4 questions | 30%
B. Excel/Sheets | 10–12 min | 2 questions | 15%
C. Stats/Experimentation | 15–18 min | 2 questions | 20%
D. Business Case (written) | 12–15 min | 1 scenario | 25%
E. Data Quality & Ethics | 5 min | 1 scenario | 10%
Allowed tools (typical in pre-employment tests): basic calculator, scratch paper; sometimes a SQL editor/sandbox; usually no internet. If you’re unsure, practice without external lookup.
Assume an e-commerce product with tables:
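The tables themselves aren’t spelled out here, so the following is an assumed minimal schema consistent with the questions below; every table and column name is a placeholder you can adapt:

```sql
-- Hypothetical practice schema (names assumed, not specified by the exam)
CREATE TABLE users (
  user_id    INT PRIMARY KEY,
  country    TEXT,
  created_at TIMESTAMP
);

CREATE TABLE orders (
  order_id    INT PRIMARY KEY,
  user_id     INT REFERENCES users(user_id),
  coupon_code TEXT,        -- NULL when no coupon was used
  revenue     NUMERIC,
  created_at  TIMESTAMP
);

CREATE TABLE events (
  user_id    INT,
  event_type TEXT,         -- 'view', 'add_to_cart', 'purchase'
  event_time TIMESTAMP
);
```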
You may assume timestamps are UTC.
Prompt: Compute daily revenue and daily purchasing users for the last 14 days.
Expected approach (high level): filter orders to the last 14 days, group by order date, SUM the revenue, and COUNT(DISTINCT user_id); see the sketch below.
Answer format: columns: order_date, revenue, purchasing_users
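A sketch in PostgreSQL-flavored SQL against the assumed schema above (date casting and interval syntax vary by dialect):

```sql
-- Daily revenue and distinct purchasing users, last 14 days
SELECT
  created_at::date        AS order_date,
  SUM(revenue)            AS revenue,
  COUNT(DISTINCT user_id) AS purchasing_users
FROM orders
WHERE created_at >= CURRENT_DATE - INTERVAL '14 days'
GROUP BY created_at::date
ORDER BY order_date;
```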
Prompt: For each country, return the top 3 coupon codes by total revenue in the last 30 days (exclude NULL coupon codes).
Key skills tested: per-group ranking with window functions (ROW_NUMBER/RANK), date filtering, excluding NULL coupon codes, and controlling join grain (sketch below).
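One plausible solution shape, assuming country lives on the users table (join differently if orders carry it):

```sql
-- Top 3 coupon codes per country by revenue, last 30 days
WITH coupon_revenue AS (
  SELECT u.country, o.coupon_code, SUM(o.revenue) AS total_revenue
  FROM orders o
  JOIN users u ON u.user_id = o.user_id
  WHERE o.coupon_code IS NOT NULL
    AND o.created_at >= CURRENT_DATE - INTERVAL '30 days'
  GROUP BY u.country, o.coupon_code
)
SELECT country, coupon_code, total_revenue
FROM (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY country
                            ORDER BY total_revenue DESC) AS rn
  FROM coupon_revenue
) ranked
WHERE rn <= 3
ORDER BY country, total_revenue DESC;
```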
Prompt: For users created in the last 60 days, compute the funnel conversion rates: view → add-to-cart, add-to-cart → purchase, and view → purchase (a sketch follows the answer format below).
Trap to avoid: counting events instead of distinct users; a user who viewed ten times should still count once at the view stage.
Answer format: one row with three rates.
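A sketch that counts distinct users per stage, sidestepping the event-counting trap (assumed schema and event names):

```sql
-- Funnel conversion for users created in the last 60 days
WITH cohort AS (
  SELECT user_id
  FROM users
  WHERE created_at >= CURRENT_DATE - INTERVAL '60 days'
),
stages AS (
  SELECT
    COUNT(DISTINCT CASE WHEN e.event_type = 'view'        THEN e.user_id END) AS viewers,
    COUNT(DISTINCT CASE WHEN e.event_type = 'add_to_cart' THEN e.user_id END) AS adders,
    COUNT(DISTINCT CASE WHEN e.event_type = 'purchase'    THEN e.user_id END) AS buyers
  FROM events e
  JOIN cohort c ON c.user_id = e.user_id
)
SELECT
  adders::numeric / NULLIF(viewers, 0) AS view_to_cart_rate,
  buyers::numeric / NULLIF(adders, 0)  AS cart_to_purchase_rate,
  buyers::numeric / NULLIF(viewers, 0) AS view_to_purchase_rate
FROM stages;
```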
Prompt: Create a 7-day rolling average of daily revenue for the last 30 days.
Key skills tested: window frames (ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) and handling gaps in the date spine (sketch below).
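A sketch over the assumed orders table; note the caveat in the comment about missing days:

```sql
-- 7-day rolling average of daily revenue, last 30 days.
-- Caveat: a ROWS frame spans the previous 6 rows, not 6 calendar days,
-- so missing days silently shrink the window; join to a date spine first
-- if days can be absent.
WITH daily AS (
  SELECT created_at::date AS order_date, SUM(revenue) AS revenue
  FROM orders
  WHERE created_at >= CURRENT_DATE - INTERVAL '30 days'
  GROUP BY created_at::date
)
SELECT
  order_date,
  revenue,
  AVG(revenue) OVER (
    ORDER BY order_date
    ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
  ) AS revenue_7d_avg
FROM daily
ORDER BY order_date;
```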
Prompt: You receive an order_id column that sometimes includes leading/trailing spaces and a revenue column imported as text with commas (e.g., "1,234.50"). Clean both so lookups match and the revenue sums correctly, then aggregate with SUMIFS.
Key skills tested: TRIM/CLEAN, VALUE/SUBSTITUTE (e.g., =TRIM(A2) for the IDs and =VALUE(SUBSTITUTE(B2,",","")) for the revenue), SUMIFS.
Prompt: A dashboard shows a line chart of revenue with the y-axis starting at 950,000 (not zero) and claims “Revenue skyrocketed 40%.” Critique the claim and propose a fairer presentation.
Key skills tested: visualization integrity, communication.
Prompt: In an A/B test, Variant B shows a +1.2% absolute lift in purchase conversion. The 95% CI for the lift is [-0.3%, +2.7%]. Should the team ship Variant B? Justify your decision.
Key skills tested: statistical significance vs uncertainty, decision framing. A strong answer notes that the interval includes zero (no effect is still plausible) and weighs collecting more data against the opportunity cost of waiting.
Prompt: You tested 10 UI variants and picked the one with the lowest p-value (0.04). Revenue increased slightly, but customer support tickets rose 8%. What concerns would you raise before rolling it out?
Key skills tested: multiple testing, guardrails, practical significance.
Scenario: Yesterday conversion dropped from 3.1% to 2.4%. Marketing says traffic quality didn’t change. Engineering deployed a checkout change.
Prompt: Write a short analysis plan and recommendation (8–12 sentences). Include: your ranked hypotheses (e.g., the checkout deploy), the checks you would run first, the segments you would compare, and a clear recommendation with next steps.
Key skills tested: structured thinking, triage, stakeholder communication.
Prompt: You’re asked to join employee performance ratings to badge-swipe data to “predict who’s likely to quit” and share results with managers. Would you proceed, and under what conditions? Name the risks and the safeguards you would require.
Key skills tested: appropriate use, privacy, dignity/oversight, governance.
Total: 100 points. Use this scoring to self-grade or to support a more consistent review process.
Q1: 7 pts
Q2: 8 pts
Q3: 8 pts
Q4: 7 pts
SQL rubric (per question): correct result at the correct grain, sound join/filter logic, intentional NULL and date handling, and readable structure; partial credit for a correct setup with a minor final error.
Q5: 9 pts (cleaning + SUMIFS correctness)
Q6: 6 pts (critique + better alternative)
Q7: 10 pts
Q8: 10 pts
Stats rubric: correctly characterizes the uncertainty, separates statistical from practical significance, names the relevant risks (multiple testing, guardrail regressions), and lands on a defensible decision.
Write-up rubric (0–5 each): structure (assumptions → method → insight → recommendation), hypothesis coverage and triage order, use of evidence and sanity checks, clarity for a non-technical stakeholder, and actionable next steps.
Rubric: identifies privacy and appropriate-use risks, proposes concrete safeguards (aggregation, notice/consent, human oversight), and documents assumptions and limitations.
If you use this in hiring, treat results as one input alongside structured interviews and past work samples.
What it means: You can follow tutorials, but you’re not yet reliable under exam pressure. Common issues: join explosions, incorrect grains, shaky experiment interpretation.
Action plan (next 2 weeks): follow Plan 1 in the study plans below.
Role fit today: internship/entry-level only with strong support.
What it means: You can solve standard problems, but may struggle with advanced windows, ambiguous prompts, or clear recommendations.
Action plan (7–14 days): follow Plan 2 in the study plans below.
Role fit today: Junior Data Analyst, Reporting Analyst.
What it means: You’re reliable across domains and can communicate tradeoffs. You likely perform well in many pre-employment screens.
Action plan (7 days): follow Plan 3 in the study plans below.
Role fit today: Product Analyst, BI Analyst, Analytics Engineer (light) depending on tooling.
What it means: You show strong analytical judgment, anticipate pitfalls, and produce stakeholder-ready recommendations.
Action plan (ongoing): follow Plan 4 in the study plans below.
Role fit today: Senior Analyst, Analytics Lead, Audit/Finance Analytics specialist.
Study plans by level
Plan 1 (next 2 weeks):
SQL core (week 1): joins, aggregation, distinct counting, date buckets
SQL patterns (week 2): window functions basics, dedup with ROW_NUMBER (sketch after this list)
Stats basics: CI intuition, p-values, practical significance
Practice routine: 30-minute timed drills; review mistakes with a checklist
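The dedup pattern from week 2, as a sketch; user_snapshots and updated_at are hypothetical names:

```sql
-- Dedup archetype: keep only the latest record per entity
SELECT *
FROM (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY user_id
                            ORDER BY updated_at DESC) AS rn
  FROM user_snapshots   -- hypothetical table of versioned user rows
) t
WHERE rn = 1;           -- rn = 1 is the most recent row per user_id
```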
Plan 2 (7–14 days):
Build an archetype playbook:
top-N per group template
rolling average template
funnel users template
cohort retention template (sketched after this list)
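A cohort-retention sketch against the assumed mock-exam tables (weekly cohorts; adapt the grain as needed):

```sql
-- Share of each weekly signup cohort active in week 0, 1, 2, ... after signup
WITH cohorts AS (
  SELECT user_id, DATE_TRUNC('week', created_at) AS cohort_week
  FROM users
),
cohort_size AS (
  SELECT cohort_week, COUNT(*) AS n_users
  FROM cohorts
  GROUP BY cohort_week
),
activity AS (
  SELECT DISTINCT
    c.cohort_week,
    c.user_id,
    -- 604800 = seconds per week; week 0 is the signup week
    FLOOR(EXTRACT(EPOCH FROM (e.event_time - c.cohort_week)) / 604800)::int AS week_n
  FROM cohorts c
  JOIN events e ON e.user_id = c.user_id
)
SELECT
  a.cohort_week,
  a.week_n,
  COUNT(DISTINCT a.user_id)::numeric / s.n_users AS retention
FROM activity a
JOIN cohort_size s ON s.cohort_week = a.cohort_week
WHERE a.week_n >= 0
GROUP BY a.cohort_week, a.week_n, s.n_users
ORDER BY a.cohort_week, a.week_n;
```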
Start writing 1-page analyses: define metric, method, result, decision.
Time-boxed mocks weekly.
Plan 3 (7 days):
Add “sanity checks” to every query:
row counts before/after joins
sum of revenue vs baseline
distribution checks for outliers
Improve executive communication: headline + 3 bullets + risk/next step.
Plan 4 (ongoing):
Own ambiguity: propose instrumentation improvements, define guardrails.
Mentor others: create reusable SQL snippets and documentation.
Tie analysis to decisions and value (cost, revenue, risk reduction).
Passing signal for many screens: roughly 70+ with no critical weaknesses (e.g., being unable to interpret a confidence interval).
SQL is a common filter: Many hiring funnels overweight SQL because it’s easy to auto-grade and shows up frequently in day-to-day work.
Use job-relevant work samples where possible.
Keep scoring consistent with a rubric.
Pair results with structured, role-relevant interviews to support more consistent decision-making.
Practice: LeetCode SQL (easy→medium), StrataScratch (analytics-focused)
Reference topics to master:
window functions: ROW_NUMBER, RANK, LAG/LEAD, rolling frames
date handling: DATE_TRUNC, EXTRACT, interval arithmetic
NULL behavior: COALESCE, COUNT(column) vs COUNT(*) (snippet below)
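A small demonstration of the COUNT/COALESCE distinctions, run against the assumed orders table from the mock exam:

```sql
-- COUNT(*) counts rows; COUNT(col) skips NULLs; COALESCE substitutes a default
SELECT
  COUNT(*)                    AS all_rows,          -- includes NULL coupon_code rows
  COUNT(coupon_code)          AS couponed_rows,     -- NULLs excluded
  COUNT(DISTINCT coupon_code) AS distinct_coupons,  -- NULLs excluded here too
  SUM(COALESCE(revenue, 0))   AS revenue_nulls_as_zero
FROM orders;
```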
Skills to drill:
XLOOKUP (or INDEX/MATCH), SUMIFS/COUNTIFS
PivotTables, calculated fields
Text/date functions: TRIM, SUBSTITUTE, DATEVALUE
Core concepts:
confidence intervals, statistical vs practical significance
randomization checks, sample ratio mismatch
multiple testing and guardrails
Framework to use in write-ups:
Claim → Evidence → Impact → Next step
Keep a visible “Assumptions & Limitations” section.
Target roles emphasizing reporting and stakeholder support while you upskill (Reporting Analyst, Junior BI).
Build one portfolio project per archetype (funnel + cohort + A/B interpretation) using a public dataset.
Apply to Product Analyst / BI Analyst roles.
Prepare a 2-minute narrative for each project:
problem, metric, method, result, decision, business value.
Publish a case study that includes:
data quality checks
segmentation strategy
decision recommendation + monitoring plan
In interviews, proactively discuss risks: bias, instrumentation, governance.
Time strategy:
First pass: answer what you can in under 90 seconds per question.
Second pass: tackle multi-step SQL/windows.
Final 5 minutes: sanity checks and formatting.
Sanity checklist (a row-count check is sketched after this list):
Did I define the grain (per user, per order, per day)?
Did any join increase row counts unexpectedly?
Are NULLs handled intentionally?
Are dates bucketed correctly (off-by-one is a top failure mode)?
Do results look plausible vs baseline?
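A quick way to run the join and baseline checks from this list, sketched against the assumed mock-exam tables:

```sql
-- Row counts before/after the join should match for a clean N:1 join;
-- the revenue baseline gives a plausibility anchor for downstream numbers.
SELECT
  (SELECT COUNT(*) FROM orders)               AS orders_rows,
  (SELECT COUNT(*)
     FROM orders o
     JOIN users u ON u.user_id = o.user_id)   AS joined_rows,   -- should equal orders_rows
  (SELECT SUM(revenue) FROM orders)           AS revenue_baseline;
```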
What does the exam cover? SQL joins/aggregation/window functions, Excel cleaning and pivoting, basic stats and A/B testing interpretation, plus business case reasoning and chart critique.
How hard is it? It’s hard if you rely on memorization. It’s manageable if you practice the recurring archetypes under time pressure and use a validation checklist.
What does a strong answer look like? Clear metric definitions, correct methods, explicit assumptions, sanity checks, and a recommendation with next steps, not just numbers.
What score should I aim for? For many pre-employment assessments, aim for 70+ with no domain at “critical weakness” (especially SQL grain control and stats interpretation).