30 SQL Interview Questions Every Data Analyst Should Know
SQL remains the single most important technical skill for data analysts in 2026. Every major database, data warehouse, and analytics platform speaks SQL. Cloud platforms like Snowflake, BigQuery, and Databricks all use it as their primary query language. If you are interviewing for any role that involves data, you will face SQL questions, and the bar has risen considerably in recent years.
This guide covers 30 questions organized from foundational queries through advanced window functions and optimization. Each question includes the SQL code an interviewer expects to see, an explanation of the correct answer, and insight into why the question is asked in the first place. Work through them sequentially if you are building your skills from scratch, or jump to the sections where you need the most practice.
Basic Queries (Questions 1-8)
1. What is the difference between WHERE and HAVING?
WHERE filters rows before grouping. HAVING filters groups after aggregation.
-- WHERE: filters individual rows
SELECT * FROM orders
WHERE order_date >= '2026-01-01';
-- HAVING: filters aggregated groups
SELECT customer_id, COUNT(*) AS order_count
FROM orders
WHERE order_date >= '2026-01-01'
GROUP BY customer_id
HAVING COUNT(*) > 5;
Why interviewers ask this: It tests whether you understand the SQL execution order. Confusing WHERE and HAVING is one of the most common mistakes analysts make, and it produces silently wrong results rather than errors.
2. What is the difference between DISTINCT and GROUP BY?
Both can remove duplicates, but they serve different purposes. DISTINCT removes duplicate rows from the result set. GROUP BY groups rows for aggregation.
-- DISTINCT: unique values
SELECT DISTINCT department FROM employees;
-- GROUP BY: aggregation per group
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department;
Why interviewers ask this: They want to see that you use each tool for its intended purpose rather than using GROUP BY as a clumsy substitute for DISTINCT.
3. How do NULL values behave in SQL?
NULL represents the absence of a value. It is not zero, not an empty string, and not false. Any comparison with NULL returns NULL (not true or false).
-- This returns NO rows, even if some salaries are NULL
SELECT * FROM employees WHERE salary = NULL;
-- Correct way to check for NULL
SELECT * FROM employees WHERE salary IS NULL;
SELECT * FROM employees WHERE salary IS NOT NULL;
-- NULL in aggregations: most functions ignore NULLs
SELECT AVG(salary) FROM employees; -- NULLs excluded from calculation
SELECT COUNT(salary) FROM employees; -- Counts non-NULL values only
SELECT COUNT(*) FROM employees; -- Counts all rows including NULLs
Why interviewers ask this: Mishandling NULLs is a leading cause of incorrect query results in production. If you do not understand NULL behavior, your analyses cannot be trusted.
4. What is the order of execution in a SQL query?
SQL does not execute in the order you write it. The logical execution order is:
- FROM and JOIN — Identify the source tables
- WHERE — Filter individual rows
- GROUP BY — Create groups
- HAVING — Filter groups
- SELECT — Choose columns and compute expressions
- DISTINCT — Remove duplicates
- ORDER BY — Sort the result
- LIMIT / OFFSET — Restrict rows returned
SELECT department, AVG(salary) AS avg_salary -- Step 5
FROM employees -- Step 1
WHERE hire_date >= '2024-01-01' -- Step 2
GROUP BY department -- Step 3
HAVING AVG(salary) > 70000 -- Step 4
ORDER BY avg_salary DESC -- Step 7
LIMIT 10; -- Step 8
Why interviewers ask this: Understanding execution order explains why you cannot use a column alias in WHERE, why HAVING exists separately from WHERE, and how to debug queries that produce unexpected results.
5. What is the difference between DELETE, TRUNCATE, and DROP?
- DELETE removes specific rows and can be rolled back. It fires triggers.
- TRUNCATE removes all rows but keeps the table structure. It is faster and typically cannot be rolled back.
- DROP removes the entire table including its structure, data, and indexes.
-- Remove specific rows
DELETE FROM orders WHERE status = 'cancelled';
-- Remove all rows, keep table structure
TRUNCATE TABLE temp_results;
-- Remove the entire table
DROP TABLE temp_results;
Why interviewers ask this: They want to know you understand the severity of each operation before running it on a production database.
6. Explain the difference between UNION and UNION ALL.
UNION combines result sets and removes duplicates. UNION ALL combines result sets and keeps all rows including duplicates.
-- UNION: removes duplicates (slower, requires sorting)
SELECT city FROM customers
UNION
SELECT city FROM suppliers;
-- UNION ALL: keeps duplicates (faster)
SELECT city FROM customers
UNION ALL
SELECT city FROM suppliers;
Why interviewers ask this: Using UNION when you mean UNION ALL wastes performance. Using UNION ALL when you need UNION produces incorrect counts. Both mistakes are common.
7. How do you use CASE statements?
CASE adds conditional logic to SQL queries, functioning like if/else statements.
SELECT
employee_name,
salary,
CASE
WHEN salary >= 120000 THEN 'Senior'
WHEN salary >= 80000 THEN 'Mid-Level'
WHEN salary >= 50000 THEN 'Junior'
ELSE 'Entry Level'
END AS salary_band,
CASE department
WHEN 'Engineering' THEN 'Tech'
WHEN 'Data Science' THEN 'Tech'
ELSE 'Non-Tech'
END AS division
FROM employees
ORDER BY salary DESC;
Why interviewers ask this: CASE statements are essential for creating calculated fields, bucketing data, and building reports. Analysts use them constantly.
8. How do you find duplicate records?
-- Find duplicate emails
SELECT email, COUNT(*) AS occurrence_count
FROM customers
GROUP BY email
HAVING COUNT(*) > 1;
-- Find and display the full duplicate rows
SELECT c.*
FROM customers c
JOIN (
SELECT email
FROM customers
GROUP BY email
HAVING COUNT(*) > 1
) dupes ON c.email = dupes.email
ORDER BY c.email;
Why interviewers ask this: Data quality is a core analyst responsibility. Identifying duplicates is one of the first things you do when exploring a new dataset.
Joins (Questions 9-14)
9. Explain the different types of JOINs.
- INNER JOIN — Returns only rows that have matches in both tables.
- LEFT JOIN — Returns all rows from the left table and matching rows from the right. Non-matches get NULL.
- RIGHT JOIN — Returns all rows from the right table and matching rows from the left.
- FULL OUTER JOIN — Returns all rows from both tables. Non-matches on either side get NULL.
- CROSS JOIN — Returns the Cartesian product of both tables (every combination).
-- INNER JOIN
SELECT o.order_id, c.customer_name
FROM orders o
INNER JOIN customers c ON o.customer_id = c.customer_id;
-- LEFT JOIN
SELECT c.customer_name, o.order_id
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id;
-- FULL OUTER JOIN
SELECT c.customer_name, o.order_id
FROM customers c
FULL OUTER JOIN orders o ON c.customer_id = o.customer_id;
Why interviewers ask this: Choosing the wrong join type is a silent error that produces incorrect results without any error message. This is the single most important SQL concept to master.
10. How do you write a self-join?
A self-join joins a table to itself. It is commonly used for hierarchical data or comparing rows within the same table.
-- Find employees and their managers
SELECT
e.employee_name AS employee,
m.employee_name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.employee_id;
-- Find customers in the same city
SELECT
a.customer_name AS customer_1,
b.customer_name AS customer_2,
a.city
FROM customers a
JOIN customers b ON a.city = b.city AND a.customer_id < b.customer_id;
Why interviewers ask this: Self-joins reveal whether you can think about a single table in two roles simultaneously, a skill that reflects deeper SQL fluency.
11. What is the difference between a JOIN and a subquery?
Both can achieve similar results, but they have different performance characteristics and readability tradeoffs.
-- Subquery approach
SELECT customer_name
FROM customers
WHERE customer_id IN (
SELECT customer_id
FROM orders
WHERE total > 1000
);
-- JOIN approach (often more efficient)
SELECT DISTINCT c.customer_name
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE o.total > 1000;
Why interviewers ask this: The ability to choose between joins and subqueries based on readability and performance shows analytical maturity.
12. How do you handle many-to-many relationships?
Many-to-many relationships require a junction table (also called a bridge table or association table).
-- Students and courses (many-to-many)
-- Junction table: enrollments
SELECT s.student_name, c.course_name
FROM students s
JOIN enrollments e ON s.student_id = e.student_id
JOIN courses c ON e.course_id = c.course_id
WHERE c.department = 'Computer Science';
-- Count courses per student
SELECT s.student_name, COUNT(e.course_id) AS course_count
FROM students s
LEFT JOIN enrollments e ON s.student_id = e.student_id
GROUP BY s.student_name;
Why interviewers ask this: Understanding junction tables is fundamental to working with relational databases. Every non-trivial schema has at least one many-to-many relationship.
13. What is the difference between EXISTS and IN?
IN checks if a value is in a list of values. EXISTS checks if a subquery returns any rows.
-- IN: checks membership in a result set
SELECT customer_name
FROM customers
WHERE customer_id IN (
SELECT customer_id FROM orders WHERE total > 1000
);
-- EXISTS: checks for existence (often faster with large subqueries)
SELECT customer_name
FROM customers c
WHERE EXISTS (
SELECT 1 FROM orders o
WHERE o.customer_id = c.customer_id AND o.total > 1000
);
Why interviewers ask this: EXISTS often outperforms IN with large subqueries because it can short-circuit after finding the first match. Knowing when to use each shows optimization awareness.
14. How do you write an anti-join (find rows with no match)?
An anti-join finds rows in one table that have no corresponding rows in another.
-- Method 1: LEFT JOIN with NULL check (most common)
SELECT c.customer_name
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
WHERE o.customer_id IS NULL;
-- Method 2: NOT EXISTS
SELECT c.customer_name
FROM customers c
WHERE NOT EXISTS (
SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id
);
-- Method 3: NOT IN (be careful with NULLs)
SELECT customer_name
FROM customers
WHERE customer_id NOT IN (
SELECT customer_id FROM orders WHERE customer_id IS NOT NULL
);
Why interviewers ask this: Finding "customers who never ordered" or "products never sold" is a common business question. The LEFT JOIN + IS NULL pattern is something you will write frequently.
Aggregation and Grouping (Questions 15-20)
15. How do you calculate a running total?
SELECT
order_date,
daily_revenue,
SUM(daily_revenue) OVER (ORDER BY order_date) AS running_total
FROM daily_sales
ORDER BY order_date;
Why interviewers ask this: Running totals are fundamental to financial reporting and trend analysis. This question also tests whether you know window functions or would resort to a clumsy self-join.
16. How do you find the top N rows per group?
-- Top 3 highest-paid employees per department
WITH ranked AS (
SELECT
employee_name,
department,
salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
FROM employees
)
SELECT employee_name, department, salary
FROM ranked
WHERE rn <= 3;
Why interviewers ask this: This is one of the most frequently asked SQL interview questions because it combines grouping, ranking, and filtering in a way that requires CTEs or subqueries.
17. What is the difference between COUNT(*), COUNT(column), and COUNT(DISTINCT column)?
SELECT
COUNT(*) AS total_rows, -- Counts all rows (including NULLs)
COUNT(email) AS non_null_emails, -- Counts rows where email is NOT NULL
COUNT(DISTINCT email) AS unique_emails -- Counts unique non-NULL email values
FROM customers;
Why interviewers ask this: Choosing the wrong COUNT variant silently produces incorrect numbers. This distinction is critical for accurate reporting.
18. How do you calculate percentage of total within groups?
SELECT
department,
employee_name,
salary,
ROUND(100.0 * salary / SUM(salary) OVER (PARTITION BY department), 2) AS pct_of_dept,
ROUND(100.0 * salary / SUM(salary) OVER (), 2) AS pct_of_total
FROM employees
ORDER BY department, salary DESC;
Why interviewers ask this: Percentage-of-total calculations appear in virtually every business report. The window function approach is cleaner and more performant than using a subquery.
19. How do you pivot data in SQL?
-- Pivot monthly sales into columns
SELECT
product_name,
SUM(CASE WHEN month = 'Jan' THEN revenue ELSE 0 END) AS jan_revenue,
SUM(CASE WHEN month = 'Feb' THEN revenue ELSE 0 END) AS feb_revenue,
SUM(CASE WHEN month = 'Mar' THEN revenue ELSE 0 END) AS mar_revenue
FROM monthly_sales
GROUP BY product_name;
-- Some databases support PIVOT syntax directly
-- SQL Server example:
-- SELECT * FROM monthly_sales
-- PIVOT (SUM(revenue) FOR month IN ([Jan], [Feb], [Mar])) AS pvt;
Why interviewers ask this: Pivoting data is a common reporting requirement that tests your ability to use CASE with aggregation creatively.
20. How do you calculate year-over-year growth?
WITH yearly AS (
SELECT
EXTRACT(YEAR FROM order_date) AS year,
SUM(total) AS annual_revenue
FROM orders
GROUP BY EXTRACT(YEAR FROM order_date)
)
SELECT
year,
annual_revenue,
LAG(annual_revenue) OVER (ORDER BY year) AS prev_year_revenue,
ROUND(
100.0 * (annual_revenue - LAG(annual_revenue) OVER (ORDER BY year))
/ LAG(annual_revenue) OVER (ORDER BY year),
2
) AS yoy_growth_pct
FROM yearly
ORDER BY year;
Why interviewers ask this: Year-over-year metrics are among the most important KPIs in any business. This question tests CTEs, window functions, and your ability to handle NULL for the first year.
Window Functions (Questions 21-25)
21. Explain the difference between ROW_NUMBER, RANK, and DENSE_RANK.
All three assign a number to each row within a partition, but they handle ties differently.
SELECT
employee_name,
salary,
ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num, -- 1, 2, 3, 4, 5
RANK() OVER (ORDER BY salary DESC) AS rank_val, -- 1, 2, 2, 4, 5
DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_val -- 1, 2, 2, 3, 4
FROM employees;
- ROW_NUMBER — Always unique, arbitrary for ties.
- RANK — Same rank for ties, skips subsequent numbers.
- DENSE_RANK — Same rank for ties, does not skip numbers.
Why interviewers ask this: This tests precision in your thinking. Choosing the wrong ranking function changes your results in subtle ways that are difficult to catch.
22. How do you use LAG and LEAD?
LAG accesses a row before the current row. LEAD accesses a row after the current row.
SELECT
order_date,
daily_revenue,
LAG(daily_revenue, 1) OVER (ORDER BY order_date) AS prev_day_revenue,
LEAD(daily_revenue, 1) OVER (ORDER BY order_date) AS next_day_revenue,
daily_revenue - LAG(daily_revenue, 1) OVER (ORDER BY order_date) AS day_over_day_change
FROM daily_sales;
Why interviewers ask this: LAG and LEAD are essential for time-series analysis, calculating changes between periods, and identifying trends, all core analyst tasks.
23. What is a window frame specification?
A window frame defines which rows relative to the current row are included in the window calculation.
SELECT
order_date,
daily_revenue,
-- 7-day moving average
AVG(daily_revenue) OVER (
ORDER BY order_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS moving_avg_7d,
-- Cumulative sum
SUM(daily_revenue) OVER (
ORDER BY order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS cumulative_revenue
FROM daily_sales;
Frame options include:
ROWS BETWEEN n PRECEDING AND CURRENT ROWROWS BETWEEN CURRENT ROW AND n FOLLOWINGROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROWRANGE BETWEEN(value-based rather than row-based)
Why interviewers ask this: Moving averages and rolling calculations are critical for dashboards and trend analysis. Understanding frame specifications separates intermediate analysts from advanced ones.
24. How do you use NTILE?
NTILE(n) divides rows into n approximately equal groups and assigns a group number.
-- Divide customers into quartiles by total spending
SELECT
customer_name,
total_spending,
NTILE(4) OVER (ORDER BY total_spending DESC) AS spending_quartile
FROM (
SELECT customer_name, SUM(total) AS total_spending
FROM orders
GROUP BY customer_name
) customer_totals;
Why interviewers ask this: Segmenting customers, products, or regions into percentile-based tiers is a standard analytics task.
25. How do you calculate a median in SQL?
SQL does not have a built-in MEDIAN function in standard SQL, so you need a workaround.
-- Method 1: Using PERCENTILE_CONT (available in PostgreSQL, SQL Server, Oracle)
SELECT
department,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) AS median_salary
FROM employees
GROUP BY department;
-- Method 2: Using ROW_NUMBER (works everywhere)
WITH ranked AS (
SELECT
salary,
ROW_NUMBER() OVER (ORDER BY salary) AS rn,
COUNT(*) OVER () AS total
FROM employees
)
SELECT AVG(salary) AS median_salary
FROM ranked
WHERE rn IN (FLOOR((total + 1) / 2.0), CEIL((total + 1) / 2.0));
Why interviewers ask this: Median is resistant to outliers, making it more useful than mean for skewed distributions. The fact that SQL does not have a standard MEDIAN function tests your problem-solving ability.
Query Optimization (Questions 26-30)
26. What is an execution plan and how do you read one?
An execution plan shows the steps the database engine takes to execute your query.
-- PostgreSQL
EXPLAIN ANALYZE
SELECT c.customer_name, COUNT(o.order_id) AS order_count
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_name;
-- MySQL
EXPLAIN
SELECT c.customer_name, COUNT(o.order_id) AS order_count
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_name;
Key things to look for in an execution plan:
- Seq Scan vs. Index Scan — Sequential scans read every row; index scans are faster.
- Estimated vs. actual rows — Large discrepancies indicate stale statistics.
- Sort operations — These are expensive and might indicate missing indexes.
- Nested loops vs. hash joins — The join algorithm choice affects performance dramatically.
Why interviewers ask this: Analysts who can read execution plans can troubleshoot slow queries instead of escalating to engineering.
27. How do indexes work and when should you create them?
An index is a data structure that speeds up data retrieval at the cost of slower writes and additional storage.
-- Create an index on a frequently filtered column
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
-- Composite index for queries that filter on multiple columns
CREATE INDEX idx_orders_date_status ON orders(order_date, status);
-- Check existing indexes (PostgreSQL)
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'orders';
Create indexes on columns that appear in:
- WHERE clauses
- JOIN conditions
- ORDER BY clauses
- GROUP BY clauses (sometimes)
Do not over-index. Every index slows down INSERT, UPDATE, and DELETE operations.
Why interviewers ask this: Understanding indexes is the most impactful optimization knowledge an analyst can have.
28. What are CTEs and why use them over subqueries?
A Common Table Expression (CTE) is a named temporary result set defined with the WITH keyword.
-- CTE approach: readable and maintainable
WITH monthly_revenue AS (
SELECT
DATE_TRUNC('month', order_date) AS month,
SUM(total) AS revenue
FROM orders
GROUP BY DATE_TRUNC('month', order_date)
),
revenue_with_growth AS (
SELECT
month,
revenue,
LAG(revenue) OVER (ORDER BY month) AS prev_month_revenue
FROM monthly_revenue
)
SELECT
month,
revenue,
ROUND(100.0 * (revenue - prev_month_revenue) / prev_month_revenue, 2) AS mom_growth
FROM revenue_with_growth
WHERE prev_month_revenue IS NOT NULL;
CTEs are preferred over nested subqueries because they are easier to read, debug, and maintain. Some databases also support recursive CTEs for hierarchical data.
Why interviewers ask this: CTEs are the standard way to write complex analytical queries in modern SQL. Interviewers want to see that you write clean, maintainable SQL.
29. How do you optimize a slow query?
A systematic approach to query optimization:
- Read the execution plan using EXPLAIN ANALYZE.
- Check for missing indexes on columns used in WHERE, JOIN, and ORDER BY.
- **Avoid SELECT * ** — only retrieve the columns you need.
- Reduce the dataset early — apply filters in WHERE before joining.
- Replace correlated subqueries with JOINs or window functions.
- Use EXISTS instead of IN for large subqueries.
- Avoid functions on indexed columns in WHERE clauses.
-- Slow: function on indexed column prevents index usage
SELECT * FROM orders WHERE YEAR(order_date) = 2026;
-- Fast: range comparison uses the index
SELECT * FROM orders
WHERE order_date >= '2026-01-01' AND order_date < '2027-01-01';
-- Slow: SELECT *
SELECT * FROM orders WHERE customer_id = 42;
-- Fast: only needed columns
SELECT order_id, order_date, total FROM orders WHERE customer_id = 42;
Why interviewers ask this: Slow queries cost money (compute resources) and hurt user experience. An analyst who can optimize their own queries is significantly more valuable.
30. What is the difference between a materialized view and a regular view?
A regular view is a saved query that executes every time you reference it. A materialized view stores the query results physically and must be refreshed to reflect new data.
-- Regular view: runs the query each time
CREATE VIEW customer_summary AS
SELECT
customer_id,
COUNT(*) AS order_count,
SUM(total) AS lifetime_value
FROM orders
GROUP BY customer_id;
-- Materialized view: stores results, must be refreshed (PostgreSQL)
CREATE MATERIALIZED VIEW customer_summary_mv AS
SELECT
customer_id,
COUNT(*) AS order_count,
SUM(total) AS lifetime_value
FROM orders
GROUP BY customer_id;
-- Refresh when data changes
REFRESH MATERIALIZED VIEW customer_summary_mv;
Use regular views for queries that must always return current data. Use materialized views for expensive queries that do not need real-time accuracy, such as dashboards that refresh hourly.
Why interviewers ask this: This tests your understanding of the tradeoff between query performance and data freshness, a fundamental concept in analytics engineering.
Preparation Tips
SQL interviews for data analyst roles in 2026 typically involve a live coding component where you write queries against a sample database or in a shared editor. Here is how to prepare effectively.
- Practice on real datasets. Use platforms like LeetCode, HackerRank, or StrataScratch for SQL-specific problems.
- Learn your target database. PostgreSQL, MySQL, SQL Server, and BigQuery all have syntax differences. Know which one the company uses and practice with it.
- Write clean, readable SQL. Use uppercase for keywords, consistent indentation, and meaningful aliases. Interviewers judge style as well as correctness.
- Explain your approach before writing. Talk through the tables you need, the join strategy, and the expected output before you start typing.
- Test edge cases mentally. What happens with NULLs? What if there are duplicates? What if a group has zero rows?
Learn data skills in our free Python for Business Beginners textbook.