10 Must have SQL questions

What are the different Analytic functions available in SQL Server?

  • FIRST_VALUE(): Returns the first value in an ordered set of values. If Partition By clause is specified
    then it returns First Value in each partition after ordering the partition by Order By Clause.
  • LAST_VALUE(): Returns the Last value in an ordered set of values. If Partition By clause is specified
    then it returns Last Value in each partition after ordering the partition by Order By Clause.
  • LAG(): Provides access to a row at a given physical offset that comes before the current row. Use this function in a SELECT statement to compare values in the current row with values in a previous row as
    specified by offset. Default offset is 1 if not specified. If Partition By clause is specified then it returns the offset Value in each partition after ordering the partition by Order By Clause.
  • LEAD(): Provides access to a row at a given physical offset that comes after the current row. Use this function in a SELECT statement to compare values in the current row with values in a subsequent row
    as specified by offset. Default offset is 1 if not specified. If Partition By clause is specified then it returns the offset Value in each partition after ordering the partition by Order By Clause.
  • PERCENT_RANK(): Used to evaluate the relative standing of a value within a query result set or partition. The range of values returned by PERCENT_RANK is greater than 0 and less than or equal to 1.

2. What are the different Ranking Window functions in SQL Server?

SYNTAX:

 ROW_NUMBER|RANK|DENSE_RANK|NTILE () OVER ([PARTITION BY ]
 ORDER BY )

The PARTITION BY clause is optional. If not used, data will be ranked based on a single partition.

  • ROW_NUMBER(): Always generate unique values without any gaps, even if there are ties.
  • RANK(): Ranks each row in the result set. Can have gaps in its sequence and when values are the same,they get the same rank.
  • DENSE_RANK(): It also returns the same rank for ties, but doesn’t have any gaps in the sequence.
  • NTILE(): Divides the rows in roughly equal sized buckets. Suppose you have 20 rows and you specify
    NTILE(2). This will give you 2 buckets with 10 rows each. When using NTILE(3), you get 2 buckets with 7
    rows and 1 bucket with 6 rows.

3. Which operator is used for Pattern Matching?

LIKE operator is used for pattern matching. It supports below wildcards.
 % : Matches any string of zero or more characters.
 _ : Matches any single character.
 [] : Matches any single character within the specified range ([a-f]) or set ([abcdef]).
 [^] : Matches any single character not within the specified range ([^a-f]) or set ([^abcdef])

4. What is Partitioning and its benefits?

SQL Server supports table and index partitioning. Partitioning is a way to divide a large table into smaller, more manageable parts without having to create separate tables for each part. Data in a partitioned table is physically stored in groups of rows called partitions and each partition can be accessed and maintained separately. Partitioning is not visible to end-users, a partitioned table behaves like one logical table when queried.
Benefits:

  • You can transfer or access subsets of data quickly and efficiently, while maintaining the integrity of a data collection
  • You can perform maintenance operations on one or more partitions more quickly. The operations are more efficient because they target only these data subsets, instead of the whole table.
    It is mostly intended to aid in maintenance on larger tables and to offer fast ways to load and remove large amounts of data from a table. Partitioning can enhance query performance, but there is no
    guarantee.

5. What is Index in SQL?

Indexes speed up the querying process by providing swift access to rows in the data tables, similarly tothe way a book’s index helps you find information quickly within that book. When a SQL Server has no index to
use for searching, the result is similar to the reader who looks at every page in a book to find a word; the SQL engine needs to visit every row in a table. In database terminology, we call this behavior a table scan, or just
scan. Indexes can be created on basically all columns except Large object data types such as Image, Text and Varchar (Max).
However, varchar(max), varbinary(max), and xml data types can participate in a non-clustered index as non-key index columns.

6. What are the different types of Joins in SQL?

  • INNER JOIN: Return rows only when there is a match in both tables.
  • LEFT OUTER JOIN: Returns all rows from Left Table even there isn’t any match in the Right table.
    Place NULL in the Right table when there isn’t any match with Left Table.
  • RIGHT OUTER JOIN: Opposite to LEFT OUTER JOIN. Returns all rows from the Right Table even if there isn’t any match in the Left table. Place NULL in Left table when there isn’t any match with
    Right Table.
  • FULL OUTER JOIN: Its combination of LEFT and RIGHT JOIN. It returns all rows from both tables
    and place NULL when there isn’t any match in both tables.
  • CROSS JOIN: Return the Cartesian Product of both Tables. i.e M*N rows.

7. What is Normalization? How many Normalization forms are there?

Normalization is used to organize the data in such a manner that data redundancy will never occur in the database and avoid insert, update and delete anomalies.
There are 5 forms of Normalization
First Normal Form (1NF): It removes all duplicate columns from the table. Creates a table for related data and identifies unique column values
Second Normal Form (2NF): Follows 1NF and creates and places data subsets in an individual table and defines the relationship between tables using the primary key
Third Normal Form (3NF): Follows 2NF and removes those columns which are not related through the primary key
Fourth Normal Form (4NF): Follows 3NF and does not define multi-valued dependencies. 4NF also known as BCNF

8. What are the different types of Indexes?
Ans.

  • Clustered Index: Clustered indexes sort and store the data rows in the table or view based on their key values. Key values are the column (or columns) included in the index definition. There can be only one
    clustered index per table, because the data rows themselves can be stored in only one order.
  • Non-Clustered Index: Non-clustered indexes cannot be sorted like clustered indexes. However, you can create more than one non-clustered index per table or view. The index in the back of a book is an
    example of a non-clustered index. A non-clustered index has the indexed columns and a pointer or bookmark pointing to the actual row. In the case of our example it contains a page number. Another
    example could be a search done on Google or another of the search engines. The results on the page contain links to the original web pages.
    In addition to an index being clustered or non-clustered, it can be configured in other ways:
    a. Composite index: An index that contains more than one column. You can include up to 16 columns in an index, as long as the index doesn’t exceed the 900-byte limit. Both clustered and non-clustered
    indexes can be composite indexes.
    b. Unique Index: An index that ensures the uniqueness of each value in the indexed column. If the index is a composite, the uniqueness is enforced across the columns as a whole, not on the individual columns. For example, if you were to create an index on the FirstName and LastName columns in a
    table, the names together must be unique, but the individual names can be duplicated.
    c. Covering Index: When all of the required columns in the SELECT list are part of the index, it is called a covering index. It is created using INCLUDE statement and can be created only with Non-Clustered
    Index. It can include Non-key columns in the Index to significantly improves the query performance because the query optimizer can locate all the column values within the index, table or clustered index data is not accessed resulting in fewer disk I/O operations.

9. What are the disadvantages of using an Index?

  • Disk Space: Indexes are stored on the disk, and the amount of space required will depend on the size of the table, and the number and types of columns used in the index. Disk space is generally cheap enough to trade for application performance, particularly when a database serves a large number of users.
  • Data Modification: Any time a query modifies the data in a table (INSERT, UPDATE, or DELETE), the database needs to update all of the indexes where data has changed. Hence, providing too many
    indexes to update can actually hurt the performance of data modifications.

10. What are the factors to be considered while creating an Index?

Well-designed indexes can reduce disk I/O operations and consume fewer system resources therefore improving query performance. Consider the following guidelines when you design an Index.

  • Understand the characteristics of the columns used in the queries. For example, an index is ideal for
    columns that have an integer data type and are also unique or nonnull columns.
  • Create non-clustered indexes on the columns that are frequently used in predicates(Where, Having) and join conditions in queries. However, you should avoid adding unnecessary columns. Adding too
    many index columns can adversely affect disk space and index maintenance performance.
  • Avoid over-indexing heavily updated tables and keep indexes narrow, that is, with as few columns as possible.
  • Use many indexes to improve query performance on tables with low update requirements, but large volumes of data. Large numbers of indexes can help the performance of queries that do not modify
    data, such as SELECT statements.
  • Indexes on views can provide significant performance gains when the view contains aggregations, table joins, or a combination of aggregations and joins
  • Determine which index options might enhance performance when the index is created or maintained.
    For example, creating a clustered index on an existing large table would benefit from the ONLINE index option. The ONLINE option allows for concurrent activity on the underlying data to continue
    while the index is being created or rebuilt.
  • Determine the optimal storage location for the index. A non-clustered index can be stored in the same filegroup as the underlying table, or on a different filegroup. The storage location of indexes can improve query performance by increasing disk I/O performance. For example, storing a non-clustered
    index on a filegroup that is on a different disk than the table filegroup can improve performance because multiple disks can be read at the same time. Alternatively, clustered and non-clustered
    indexes can use a partition scheme across multiple filegroups. Partitioning makes large tables or
    indexes more manageable by letting you access or manage subsets of data quickly and efficiently, while maintaining the integrity of the overall collection

Snippet from an article by Ombir Rathee

Keep Learning 🙂

The Data Monk



Author: TheDataMonk

I am the Co-Founder of The Data Monk. I have a total of 6+ years of analytics experience 3+ years at Mu Sigma 2 years at OYO 1 year and counting at The Data Monk I am an active trader and a logically sarcastic idiot :)