MsSQL – char and varchar comparison, storage concept

In Microsoft SQL Server, both CHAR and VARCHAR are used to store character string data, but they have some differences in terms of storage and behavior.

  1. Fixed-Length vs. Variable-Length:
    • CHAR: It is a fixed-length data type. When you define a column as CHAR(10), for example, it will always occupy 10 bytes in storage, regardless of the actual length of the data stored in it. If you store a string “hello” in a CHAR(10) column, it will be stored as “hello_____” (underscores represent padding to fill the fixed length).
    • VARCHAR: It is a variable-length data type. If you define a column as VARCHAR(10), it will only occupy the storage space equal to the actual length of the data stored in it. So, storing “hello” in a VARCHAR(10) column will use only 5 bytes.
  2. Storage Efficiency:
    • CHAR: Fixed-length storage can be less space-efficient when storing variable-length data, as it always reserves the maximum specified length.
    • VARCHAR: Variable-length storage is more space-efficient for columns with varying lengths of data.
  3. Trailing Spaces:
    • CHAR: Always pads the data with spaces to fill the fixed length.
    • VARCHAR: Does not pad the data, storing only the actual characters.

Now, one important question would be, how these data stored on pages? Each page can stored only 8KB of data, so how changing the length of a varchar column is being managed by pages.

Increasing Length:

  • If you update a VARCHAR column with a longer value, and the new length exceeds the available space in the original data page, SQL Server might need to move the row to a new page.
  • The new page can accommodate the longer value, and the original space on the old page is marked as available for future use.

Decreasing Length:

  • If you update a VARCHAR column with a shorter value, the existing space allocated for the column is not immediately reclaimed. SQL Server keeps the original space in the page for potential future updates with longer values to avoid unnecessary fragmentation.
  • Over time, as more updates are made and the available space in the page is used, SQL Server might eventually reclaim the space if it determines that it can be efficiently utilized.

When a row is too large to fit within a single page, SQL Server employs a mechanism known as Row-Overflow Data Exceeding 8 KB to handle the overflow.

Here’s how it works:

  1. In-Row Data:
    • The in-row data consists of columns that can fit within the 8,060-byte limit. This typically includes fixed-length data types (e.g., INT, DATETIME) and variable-length data types (e.g., VARCHAR, NVARCHAR) up to a certain size.
  2. Overflow Pages:
    • If a row’s in-row data exceeds the 8,060-byte limit, the portion that doesn’t fit is moved to one or more overflow pages.
    • A pointer in the original data page points to the overflow page(s) where the remaining data is stored.
  3. Performance Considerations:
    • Accessing the in-row data is generally more efficient than accessing overflow data. Therefore, queries that retrieve or modify in-row data only are more performant.
    • Queries that involve accessing or modifying overflow data may incur additional I/O operations and could be slower.
  4. Considerations for Large Rows:
    • Designing tables with rows approaching the 8,060-byte limit should be done carefully, considering performance implications.
    • Consider whether normalization or other database design changes could help distribute data across multiple tables or rows.

But the same mechanism is not used for varchar max column. VARCHAR(MAX) is considered a Large Object (LOB) data type in SQL Server, and it has a different way of handling large amounts of data.

For VARCHAR(MAX) and other LOB data types, the data is stored separately in LOB pages, and only a pointer to the actual LOB data is stored in the data page. LOB data is managed in a set of pages outside the normal data pages. This means that VARCHAR(MAX) columns do not participate in the same in-row overflow mechanism as regular columns.

Here are some key points regarding VARCHAR(MAX):

  1. Storage Mechanism:
    • The actual data for VARCHAR(MAX) is stored in separate LOB pages.
    • Only a 16-byte pointer to the LOB data is stored in the data page.
  2. Impacts on Performance:
    • Retrieving or modifying large VARCHAR(MAX) data can involve additional I/O operations because the LOB pages may need to be accessed.
    • Queries that do not reference the VARCHAR(MAX) column or modify other in-row columns might perform better.

Leave a Reply

Your email address will not be published. Required fields are marked *