跳到主要内容

SQL Interview Questions - Answers & Explanations


🟢 1. Database Design & Keys

What is a primary key?

Interview Answer: A primary key is a column (or combination of columns) that uniquely identifies each row in a table. It enforces UNIQUE and NOT NULL constraints. A table can have only one primary key.

Rich Explanation: The primary key is the cornerstone of relational database design. It serves as the unique identifier for each record, enabling referential integrity through foreign key relationships. In SQL Server, creating a primary key automatically creates a unique clustered index by default (unless specified otherwise), which physically orders the data on disk. This means lookups by primary key are extremely fast.


What is a foreign key?

Interview Answer: A foreign key is a column (or set of columns) in one table that references the primary key of another table. It maintains referential integrity by ensuring that the value in the foreign key column exists in the referenced table.

Rich Explanation: Foreign keys create relationships between tables and enforce referential integrity. For example, an Orders table might have CustomerId as a foreign key referencing Customers(CustomerId). You cannot insert an order with a non-existent CustomerId. You can configure cascade behaviors (CASCADE, SET NULL, NO ACTION) for updates and deletes to define what happens when the parent row is modified.


Can a table exist without a PK?

Interview Answer: Yes, technically a table can exist without a primary key. Such tables are called heap tables. However, it's a bad practice for most scenarios because you lose uniqueness guarantees, referential integrity, and query optimization benefits.

Rich Explanation: Without a primary key, the table becomes a "heap"—data is stored without any logical ordering. Heap tables can be useful for staging tables, bulk imports, or logging tables where uniqueness isn't critical. But for transactional/business tables, always define a primary key for data integrity and performance.


Composite key — pros/cons?

Interview Answer: A composite key uses multiple columns together to uniquely identify a row. Pros: Naturally models many-to-many relationships (e.g., StudentId + CourseId in enrollments), avoids surrogate keys when natural keys exist. Cons: Larger index size, slower joins, harder to reference in foreign keys, more complex to use in application code.

Rich Explanation: Use composite keys when the combination of columns naturally defines uniqueness—like (OrderId, LineItemId) or junction tables. For high-traffic tables, a single-column surrogate key (e.g., Id) is often preferred for simplicity and performance.


Why should PK not change?

Interview Answer: Because the primary key is referenced by foreign keys in other tables. Changing it would require cascading updates to all dependent tables, causing locks, potential downtime, and risk of data inconsistency.

Rich Explanation: Primary keys should be immutable (unchanged after creation). Use surrogate keys (auto-increment IDs or GUIDs) rather than natural keys (e.g., SSN, email) when the natural key might change. Changing a PK in production is a major migration effort.


Difference between PK and UK?

Interview Answer: A Primary Key uniquely identifies each row, enforces NOT NULL, and a table can have only one PK. A Unique Key (UK) also enforces uniqueness but allows NULL (in SQL Server, one NULL is allowed per unique constraint). A table can have multiple unique keys.

Rich Explanation: Use UK for alternate identifiers (e.g., Email, SSN) that must be unique but aren't the main identifier. Both PK and UK create indexes that support fast lookups.


Can PK be non-clustered?

Interview Answer: Yes. In SQL Server, you can create a primary key as non-clustered with PRIMARY KEY NONCLUSTERED. You'd do this when you want a different column to be the clustered index (e.g., a heavily queried date column for range scans).

Rich Explanation: The clustered index determines physical data order. If your main access pattern benefits from a different column order (e.g., CreatedDate for time-based queries), you might make the PK non-clustered and create a clustered index on that column instead.


🔵 2. Constraints & Data Types

Constraints in SQL Server?

Interview Answer: Constraints enforce data integrity. Main types: PRIMARY KEY, FOREIGN KEY, UNIQUE, CHECK (e.g., Age > 0), NOT NULL, DEFAULT.

Rich Explanation: Constraints protect data at the database level. CHECK constraints validate conditions. DEFAULT provides values for null inserts. Constraints are evaluated on INSERT/UPDATE; violations cause the statement to fail.


Types in SQL Server?

Interview Answer: Exact numerics: INT, BIGINT, SMALLINT, DECIMAL, NUMERIC. Approximate: FLOAT, REAL. Character: CHAR, VARCHAR, NVARCHAR (Unicode). Date/Time: DATE, TIME, DATETIME, DATETIME2. Binary: BINARY, VARBINARY. Other: UNIQUEIDENTIFIER, XML, JSON.

Rich Explanation: Prefer NVARCHAR for user text (Unicode). Use DECIMAL for money—never FLOAT. DATETIME2 is preferred over DATETIME for precision and range.


What is IDENTITY column?

Interview Answer: An IDENTITY column auto-generates sequential values (1, 2, 3, ...) on insert. Defined as Id INT IDENTITY(1,1) PRIMARY KEY. The database manages the value; you don't insert it.

Rich Explanation: IDENTITY is commonly used for surrogate primary keys. Use SCOPE_IDENTITY() to get the last inserted value in the current scope. For distributed systems, consider sequences or GUIDs instead, as IDENTITY can cause gaps and isn't suitable for merging data across servers.


🟣 3. SQL Query Basics

Written order for SQL query?

Interview Answer:
SELECTFROMWHEREGROUP BYHAVINGORDER BY

Rich Explanation: This is the logical order you write the query. Keywords must appear in this sequence (with optional DISTINCT, TOP, etc. in SELECT).


Execution order for SQL query?

Interview Answer:
FROMJOINWHEREGROUP BYHAVINGSELECTDISTINCTORDER BYTOP

Rich Explanation: The database executes in this order. That's why you cannot use column aliases from SELECT in WHERE (WHERE runs before SELECT). You can use them in ORDER BY, which runs after SELECT.


What is DISTINCT?

Interview Answer: DISTINCT removes duplicate rows from the result set. It applies to all columns in the SELECT list and is evaluated after SELECT.

Rich Explanation: SELECT DISTINCT Country FROM Customers returns unique countries. DISTINCT can be expensive on large datasets—ensure you're not masking design issues (e.g., unexpected duplicates).


Difference between WHERE and HAVING?

Interview Answer: WHERE filters rows before aggregation (GROUP BY). HAVING filters after aggregation. You use HAVING when the filter condition involves aggregate functions like COUNT, SUM, AVG.

Rich Explanation:

  • WHERE Age > 18 — filters individual rows.
  • HAVING COUNT(*) > 5 — filters groups.
    You cannot use aggregate functions in WHERE; you must use HAVING for that.

Difference between DELETE vs TRUNCATE vs DROP?

Interview Answer:

DELETETRUNCATEDROP
ScopeRows (can have WHERE)All rowsTable + structure
TriggersFiresDoes NOT fireN/A
IdentityResets? NoResetsN/A
LoggingRow-by-rowPage deallocationsDDL
RollbackPossible (with transaction)PossiblePossible (before commit)

Rich Explanation: Use DELETE for selective removal. Use TRUNCATE to quickly remove all rows (faster, minimal logging). Use DROP to remove the entire table. TRUNCATE cannot be used if the table is referenced by a foreign key.


🟡 4. Normalization & Data Modeling

3NF (Third Normal Form)

Interview Answer: 3NF means: (1) Table is in 2NF. (2) No transitive dependency—every non-key attribute must depend only on the primary key, not on other non-key attributes.

Rich Explanation: In 2NF, non-key attributes depend on the whole primary key. In 3NF, they must not depend on other non-key attributes. Example: If CustomerCity depends on CustomerId, and CityTaxRate depends on CustomerCity, then CityTaxRate has a transitive dependency. Move it to a Cities table. 3NF reduces redundancy and update anomalies.


🔴 5. Indexing

What is an index?

Interview Answer: An index is a data structure (typically B-tree) that speeds up data retrieval by maintaining a sorted copy of key columns. Think of it like a book index—you find the page number instead of scanning every page.

Rich Explanation: Indexes trade write performance (slower INSERT/UPDATE/DELETE) for read performance. Use them on columns in WHERE, JOIN, and ORDER BY clauses. Too many indexes can slow down writes.


Clustered vs non-clustered index

Interview Answer: A clustered index determines the physical order of data in the table; there is one per table. A non-clustered index is a separate structure pointing to the data; a table can have many. Non-clustered indexes include the clustered key (or row ID for heaps) to locate the full row.

Rich Explanation: Clustered index = the table is sorted by that key. Non-clustered = separate B-tree with pointers to the data. Range scans (e.g., WHERE Date BETWEEN ...) benefit from clustered index on that column.


Is PK always clustered?

Interview Answer: No. In SQL Server, the primary key creates a clustered index by default, but you can define it as PRIMARY KEY NONCLUSTERED to make it non-clustered.

Rich Explanation: Some designs use a non-clustered PK (e.g., GUID) and a clustered index on a different column (e.g., CreatedDate) for better range query performance.


What is a heap table?

Interview Answer: A heap is a table without a clustered index. Data is stored in no particular order. SQL Server uses a heap when no clustered index exists.

Rich Explanation: Heaps can be acceptable for staging or log tables. For transactional tables, a clustered index is usually better for performance.


What happens without clustered index?

Interview Answer: The table becomes a heap. Full table scans are required for unsorted access. Non-clustered indexes use a row identifier (RID) instead of the clustered key to find rows.

Rich Explanation: Without a clustered index, range scans and ordered retrieval are less efficient. Heaps can also lead to forwarding pointers when rows are updated and no longer fit on the original page.


What is composite index?

Interview Answer: A composite index includes multiple columns (e.g., (LastName, FirstName)). The column order matters—the index is sorted by the first column, then the second within each first-column value.

Rich Explanation: A composite index on (A, B) can be used for queries filtering on A or (A, B) but not efficiently for B alone.


Why column order matters?

Interview Answer: The leftmost column has the highest selectivity in the index. Put the most selective, most frequently filtered column first. The index (Country, City) helps WHERE Country = 'US' and WHERE Country = 'US' AND City = 'NYC' but not WHERE City = 'NYC' alone.

Rich Explanation: Follow the "leftmost prefix" rule. Order columns by equality filters first, then range filters, then columns for covering/sorting.


What is covering index?

Interview Answer: A covering index includes all columns needed by a query in the index itself. The query can be satisfied without going back to the table (no "key lookup").

Rich Explanation: Use INCLUDE to add columns to the leaf level of a non-clustered index: CREATE INDEX IX ON Orders(CustomerId) INCLUDE (OrderDate, Total). If a query only needs these columns, it's a "covered" query—very fast.


What is index fragmentation?

Interview Answer: Fragmentation occurs when logical order of index pages doesn't match physical order, or when pages are not full. It happens after many INSERT/UPDATE/DELETE operations and can slow down scans.

Rich Explanation: Use ALTER INDEX ... REBUILD or REORGANIZE to reduce fragmentation. Rebuild is more thorough but requires more resources. Monitor with sys.dm_db_index_physical_stats.


What situation would disable index?

Interview Answer: An index may not be used (effectively disabled) in situations where the query does not match the index's columns or order, when the SQL optimizer decides a full table scan is cheaper, or when functions/wildcards are applied to indexed columns (e.g., WHERE LOWER(Name) = 'bob'). Indexes can also become unusable if they are disabled or dropped by an administrator.

Rich Explanation: Indexes are not used when:

  • The WHERE condition cannot leverage the index (e.g., functions, wildcards, or columns not in the index).
  • The table is too small—scanning is faster than using the index.
  • Statistics are out of date or missing, causing poor optimizer choices.
  • The index is explicitly disabled (ALTER INDEX ... DISABLE) or not maintained (fragmented).
  • For filtered indexes, when query conditions don't match the filter. Monitor the actual execution plan to confirm when indexes are being used or skipped.

Why GUID is bad as clustered index?

Interview Answer: GUIDs are random (with NEWID()), causing constant page splits and fragmentation. Inserts go to random pages instead of the end, leading to poor write performance and wasted space.

Rich Explanation: Use NEWSEQUENTIALID() for GUIDs if you must use them as clustered keys, or make the PK non-clustered and cluster on a sequential column like CreatedDate.


How to design indexes?

Interview Answer: (1) Index columns in WHERE, JOIN, ORDER BY. (2) Prefer narrow, selective columns. (3) Use covering indexes for hot queries. (4) Avoid over-indexing (balance reads vs writes). (5) Monitor with execution plans and wait stats.

Rich Explanation: Design indexes based on actual query patterns. Use Query Store and execution plans to identify missing indexes. Consider filtered indexes for subset queries (e.g., WHERE Status = 'Active').


🟠 6. Transactions & Concurrency

ACID

Interview Answer: ACID guarantees for transactions: Atomicity (all or nothing), Consistency (valid state before and after), Isolation (transactions don't see each other's uncommitted changes), Durability (committed data persists after crash).

Rich Explanation: These properties ensure reliability. Isolation is configurable via isolation levels (READ UNCOMMITTED to SERIALIZABLE).


What is a transaction?

Interview Answer: A transaction is a logical unit of work—one or more SQL statements that either all succeed (COMMIT) or all fail (ROLLBACK). It ensures data consistency.

Rich Explanation: BEGIN TRANSACTION ... COMMIT or ROLLBACK. Use transactions for operations that must be atomic (e.g., transferring money between accounts).


COMMIT vs ROLLBACK

Interview Answer: COMMIT saves changes permanently. ROLLBACK undoes all changes since the last COMMIT or BEGIN TRANSACTION, restoring the previous state.

Rich Explanation: After COMMIT, changes are durable. After ROLLBACK, the transaction's changes are discarded. Use ROLLBACK in exception handlers to ensure consistency.


What is isolation level?

Interview Answer: Isolation level controls how transactions see each other's data. Levels (low to high): READ UNCOMMITTED, READ COMMITTED (default), REPEATABLE READ, SNAPSHOT, SERIALIZABLE. Higher isolation = less concurrency, fewer anomalies.

Rich Explanation: READ COMMITTED prevents dirty reads. REPEATABLE READ prevents non-repeatable reads. SERIALIZABLE prevents phantom reads but can cause blocking. SNAPSHOT uses row versioning to reduce blocking.


🟢 7. Joins & Query Logic

INNER JOIN, LEFT JOIN, RIGHT JOIN

Interview Answer:

  • INNER JOIN: Returns only matching rows from both tables.
  • LEFT JOIN: Returns all rows from the left table, matching rows from the right; right columns are NULL where no match.
  • RIGHT JOIN: Same as LEFT but from the right table's perspective.

Rich Explanation: Use LEFT JOIN when you need all rows from one side (e.g., all customers with their orders, or NULL if no orders). RIGHT JOIN is less common; you can usually rewrite with LEFT JOIN by swapping table order.


EXISTS vs IN

Interview Answer: Both check for the existence of values. EXISTS returns true if the subquery returns at least one row; it can short-circuit (stops at first match). IN checks if a value is in a list from a subquery. EXISTS is often better for correlated subqueries and large sets.

Rich Explanation: EXISTS typically performs better because it doesn't need to materialize the full result set—it stops at the first match. Use EXISTS for "has any" semantics; IN for "value in list" when the list is small or from a subquery.


Difference between EXISTS and IN

Interview Answer: IN compares a value to a set; NULL in the set can change results (NULL = anything yields UNKNOWN). EXISTS returns true/false and treats NULLs differently. EXISTS is preferred for performance with subqueries and often handles NULLs more predictably.

Rich Explanation: With IN, col IN (1, 2, NULL) can produce UNKNOWN due to three-valued logic. EXISTS doesn't have this issue. For NOT IN with possible NULLs, use NOT EXISTS to avoid unexpected results.


🔵 8. Performance & Query Optimization

What is query optimization?

Interview Answer: Query optimization is the process by which the database engine chooses the best execution plan—which indexes to use, join order, etc.—to execute a query efficiently.

Rich Explanation: The optimizer uses statistics, cardinality estimates, and cost-based reasoning. Poor statistics lead to bad plans. Keep statistics updated.


Why is a query slow?

Interview Answer: Common causes: missing indexes, index not used (non-SARGable), table scans, blocking/locks, parameter sniffing, outdated statistics, excessive I/O, network latency, or complex logic.

Rich Explanation: Use execution plans, wait statistics, and Query Store to diagnose. Look for "Index Scan" where "Index Seek" is possible, high estimated vs actual rows, and blocking.


What is execution plan?

Interview Answer: An execution plan shows how SQL Server will execute a query—operations (Scan, Seek, Join, Sort), order, estimated cost, and row counts. Use it to identify bottlenecks.

Rich Explanation: Enable "Include Actual Execution Plan" in SSMS or use SET STATISTICS XML ON. Look for high-cost operators, warnings (e.g., implicit conversions), and missing index suggestions.


What is SARGable query?

Interview Answer: SARGable = Search ARGument ABLE. A predicate is SARGable when it allows the optimizer to use an index. Examples of non-SARGable: WHERE YEAR(DateColumn) = 2024, WHERE Column + 1 = 5, WHERE UPPER(Column) = 'X'.

Rich Explanation: Write WHERE DateColumn >= '2024-01-01' AND DateColumn < '2025-01-01' instead of YEAR(DateColumn) = 2024. Functions on columns prevent index use.


Why functions in WHERE are bad?

Interview Answer: Applying functions to columns (e.g., UPPER(Email), CAST(Id AS VARCHAR)) forces a scan because the optimizer cannot use indexes—every row must be evaluated.

Rich Explanation: Store data in the format you query (e.g., pre-compute UPPER(Email) in a persisted column), or use computed columns with indexes.


Why SELECT * is bad?

Interview Answer: (1) Returns unnecessary columns—more data over the network. (2) Breaks if table schema changes (new columns). (3) Prevents covering indexes from being used. (4) Hurts maintainability.

Rich Explanation: Always specify needed columns. In views or ORM-generated SQL, SELECT * can cause unexpected performance and behavior changes.


When is index not used?

Interview Answer: When the predicate is non-SARGable, when statistics suggest a table scan is cheaper, when the index is fragmented or has outdated statistics, or when the query returns a large percentage of rows (optimizer may choose scan).

Rich Explanation: Small tables often get scanned regardless. Use index hints sparingly; usually fixing the query or statistics is better.


Why is LIKE '%abc%' slow?

Interview Answer: The leading wildcard % prevents the index from being used. The optimizer cannot seek—it must scan every row and check the pattern.

Rich Explanation: LIKE 'abc%' can use an index (prefix match). LIKE '%abc' or LIKE '%abc%' cannot. For full-text search, use Full-Text Search or dedicated search engines (e.g., Elasticsearch).


How to optimize a slow query?

Interview Answer: (1) Add or fix indexes. (2) Write SARGable predicates. (3) Avoid SELECT *, use covering indexes. (4) Update statistics. (5) Simplify logic, break into smaller queries. (6) Consider query hints only as last resort.

Rich Explanation: Start with the execution plan. Identify the highest-cost operator. Fix the root cause (index, predicate, or design) before tuning with hints.


🟣 9. Advanced / Real-world Issues

What is N+1 problem?

Interview Answer: N+1 occurs when you execute 1 query to get a list, then N additional queries (one per item) to load related data. Example: 1 query for customers, then 100 queries for each customer's orders = 101 queries.

Rich Explanation: Common with ORMs when lazy loading is enabled. Causes massive over-querying and poor performance. Fix with eager loading (Include/Join) or batch loading.


How to fix N+1?

Interview Answer: Use eager loading: load related data in the initial query with JOIN or Include. In EF Core: context.Customers.Include(c => c.Orders). Or use a single query with explicit JOIN and projection.

Rich Explanation: Fetch the main entity and related entities in one (or few) round-trips. Avoid lazy loading in loops.


Include vs Select?

Interview Answer: Include loads related entities into the context (full entities). Select projects to a DTO/shape—only the columns you need. Select can be more efficient (less data, no change tracking) and avoids over-fetching.

Rich Explanation: Use Include when you need full entities and change tracking. Use Select with projections for read-only scenarios—often better performance and memory.


When to use AsEnumerable?

Interview Answer: Use AsEnumerable() when you need to switch from database execution (IQueryable) to in-memory execution (IEnumerable). After AsEnumerable, further operations (e.g., custom C# logic) run in memory, not in SQL.

Rich Explanation: Be careful—AsEnumerable can force loading the entire result set into memory before filtering. Use it only when you need client-side logic that can't be translated to SQL.


IQueryable vs IEnumerable?

Interview Answer: IQueryable builds an expression tree and executes on the server (database). IEnumerable executes in memory. IQueryable defers execution until enumeration; operations are translated to SQL.

Rich Explanation: Use IQueryable for database queries so filtering/sorting happens in SQL. Converting to IEnumerable (e.g., ToList) too early pulls data into memory and prevents further SQL optimization.


What is deferred execution?

Interview Answer: Deferred execution means the query is not run when it's defined—it runs when the result is enumerated (e.g., foreach, ToList, Count). This allows query composition and a single database round-trip.

Rich Explanation: var query = db.Users.Where(u => u.Active); — no DB call yet. var list = query.ToList(); — now it executes. Enables chaining and optimization.


🟡 10. Database Objects

What is a view?

Interview Answer: A view is a virtual table defined by a SELECT query. It doesn't store data; it runs the query when accessed. Used for simplifying queries, security (column/row-level), and abstraction.

Rich Explanation: Simple views can sometimes be updatable. Indexed views (materialized) store data and can improve performance for complex aggregations.


What is a stored procedure?

Interview Answer: A stored procedure is a precompiled set of SQL statements stored in the database. It can accept parameters, contain logic, and be called by name. Benefits: performance (execution plan cached), security, and reduced network traffic.

Rich Explanation: Good for complex business logic, batch operations, and when you want to encapsulate SQL on the server. Can return result sets, output parameters, or nothing.


What is a function?

Interview Answer: A function returns a value (scalar) or a table. Unlike a procedure, it can be used in SELECT. Types: scalar (single value), inline table-valued, multi-statement table-valued. Cannot modify data (no INSERT/UPDATE/DELETE in function body).

Rich Explanation: Use functions for reusable calculations and table-valued results. Be cautious with scalar functions in queries—they can prevent parallelism and cause poor performance (use inline TVFs when possible).