Disk Block Size

New Big Hard Drives

The first hard drive offered for the PC, on the PC XT in 1983, held 10 megabytes of data formatted in 512 byte sectors. Today a hard disk may have 100,000 times as much storage, but because computer systems don't change anything until they have to, disks still store data in 512 byte units. This means that under the covers, anything you read from or write to disk is rounded up to occupy one or more 512 byte areas preformatted on the disk surface.

A program can ask the operating system to write smaller amounts of data. However, since the hardware can only accept a command to write an entire sector, the file system and device driver have to convert the program's small request into a hardware request by holding data in a buffer managed by the OS. When a program adds data onto the end of a file, the OS may have to read in the contents of the last sector, hold it in a buffer, append the program data, and eventually rewrite the buffer contents to disk.
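The buffering described above can be sketched in a few lines. This is a toy model, not how any real file system is written: the ToyDisk class, the append function, and the assumption that the file starts at sector 0 are all invented for illustration.

```python
SECTOR_SIZE = 512  # bytes, the sector size discussed in the text


class ToyDisk:
    """Hypothetical disk that accepts only whole-sector reads and writes."""

    def __init__(self, sector_count):
        self.sectors = [bytes(SECTOR_SIZE) for _ in range(sector_count)]

    def read_sector(self, n):
        return self.sectors[n]

    def write_sector(self, n, data):
        if len(data) != SECTOR_SIZE:
            raise ValueError("hardware accepts whole sectors only")
        self.sectors[n] = bytes(data)


def append(disk, file_length, data):
    """Append data to a file that begins at sector 0; return the new length.

    For simplicity every touched sector is read into a buffer first,
    even when the new data would overwrite it completely.
    """
    data = bytes(data)
    pos = file_length
    while data:
        sector, offset = divmod(pos, SECTOR_SIZE)
        buffer = bytearray(disk.read_sector(sector))  # read the last sector
        chunk = data[:SECTOR_SIZE - offset]           # what fits in this sector
        buffer[offset:offset + len(chunk)] = chunk    # append the program data
        disk.write_sector(sector, buffer)             # rewrite the whole sector
        pos += len(chunk)
        data = data[len(chunk):]
    return pos
```

Appending 5 bytes and then 600 bytes to an empty file touches sector 0 twice: once for the first write, and again when the second write fills the rest of it before spilling into sector 1.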

This tiny size is inefficient, and operating systems long ago abandoned it for (in almost all cases) a 4096 byte unit of storage called a "page" or "allocation unit". Physically, each page is simply a group of 8 consecutive sectors on the disk.
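The rounding involved is simple arithmetic. A minimal sketch, with function names made up for this example:

```python
SECTOR_SIZE = 512    # bytes per sector
PAGE_SIZE = 4096     # bytes per page (allocation unit)
SECTORS_PER_PAGE = PAGE_SIZE // SECTOR_SIZE  # 8


def units_needed(byte_count, unit_size):
    """Number of whole units needed to hold byte_count bytes (rounds up)."""
    return -(-byte_count // unit_size)


def on_disk_size(byte_count, unit_size):
    """Bytes actually occupied once the data is rounded up to whole units."""
    return units_needed(byte_count, unit_size) * unit_size


print(on_disk_size(5000, SECTOR_SIZE))  # 5120 (10 sectors)
print(on_disk_size(5000, PAGE_SIZE))    # 8192 (2 pages, or 16 sectors)
```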

The largest number that can be stored in 32 bits is about 4 billion (2^32 - 1). If this number is a sector number, then the largest hard drive you can address with a 32 bit number is 2 terabytes: roughly 4 billion sectors of 512 bytes each. Disks are now available that hold 2 terabytes of data, and bigger disks are in the pipeline. So for these large disks, a new sector size is a requirement.
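The limit is easy to check. The short calculation below assumes "terabyte" is meant in the binary sense (2^40 bytes); the function name is invented for the example.

```python
TIB = 2 ** 40  # one terabyte in the binary sense


def max_capacity_bytes(address_bits, sector_size):
    """Largest capacity reachable with sector numbers of the given width."""
    return (2 ** address_bits) * sector_size


print(max_capacity_bytes(32, 512) // TIB)   # 2
print(max_capacity_bytes(32, 4096) // TIB)  # 16
```

With the same 32 bit sector number, moving from 512 byte to 4096 byte sectors raises the ceiling by a factor of 8.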

A growing number of modern disks ship with a physical sector size of 4096 bytes. They can pretend to have 512 byte sectors for compatibility, but the emulation is inefficient.

Suppose a program writes 512 bytes of data to a disk file.

  1. The disk arms move to the location of the data. This is a seek and it takes around 8 milliseconds.
  2. Then the controller has to wait until the data rotates under the arms. On average this is half a rotation, or about another 4 milliseconds.
  • Now, if the disk has native 512 byte sectors, the data is written and the operation is over. Total time: 12 milliseconds.
  • However, if the disk has 4096 byte sectors and is only emulating the 512 byte sector size, the controller has to read 4096 bytes from the disk, replace 512 bytes inside the larger block, and then wait one entire rotation, or about 8.3 additional milliseconds, for the block to come all the way around before it can be rewritten to the disk. Total time: 20.3 milliseconds.
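The two totals above can be reproduced with a few lines of arithmetic. The figures are the rounded ones used in the list; a real drive will vary, and the function names are made up for this sketch.

```python
SEEK_MS = 8.0           # average seek time used in the text
HALF_ROTATION_MS = 4.0  # average rotational latency, as rounded in the text
FULL_ROTATION_MS = 8.3  # one complete rotation


def native_write_ms():
    """Seek, wait for the sector, write it."""
    return SEEK_MS + HALF_ROTATION_MS


def emulated_write_ms():
    """Seek, wait, read the 4096 byte block, then wait a full rotation
    before the modified block can be written back."""
    return SEEK_MS + HALF_ROTATION_MS + FULL_ROTATION_MS


print(round(native_write_ms(), 1))    # 12.0
print(round(emulated_write_ms(), 1))  # 20.3
```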

Of course, if the program always writes whole 4096 byte pages, or the system forces programs to write entire pages (as Unix, Linux, and Mac do), then this problem never occurs. It is also not a problem if the device drivers have been updated to buffer data and force the larger block size, as happens in Windows 7. It is just the older systems like XP that can get into trouble.
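Whether a given write runs into the slow path comes down to alignment. A minimal sketch of the check, assuming byte offsets measured from the start of the disk and a helper name invented for this example:

```python
PHYSICAL_SECTOR = 4096  # bytes, the real sector size on the newer disks


def needs_read_modify_write(offset, length, physical=PHYSICAL_SECTOR):
    """True if the write covers only part of some physical sector.

    That happens exactly when the write does not start or does not end
    on a physical sector boundary.
    """
    if length <= 0:
        return False
    return offset % physical != 0 or (offset + length) % physical != 0


print(needs_read_modify_write(0, 512))      # True: one eighth of a sector
print(needs_read_modify_write(0, 4096))     # False: exactly one sector
print(needs_read_modify_write(512, 4096))   # True: straddles two sectors
print(needs_read_modify_write(4096, 8192))  # False: two whole sectors
```

The third case shows why partition alignment matters as well as write size: a full 4096 byte write that starts 512 bytes off a boundary touches two physical sectors and covers neither completely.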

Solid State Drives

Solid State Drives (SSDs) present a much larger version of the same problem. SSDs store data in Flash memory, a faster version of the same technology found in your keyring USB memory stick. Flash memory simulates disk sectors, but in reality the basic unit of storage management on an SSD is a thousand times larger: 512K bytes (a half megabyte) instead of 512 bytes.

An SSD writes data something like the toy where a child presses down on a plastic sheet to draw lines and then erases the entire page by lifting the plastic sheet up. The SSD can erase a half megabyte of data, setting every bit in the area to 1. Then it can write 0s into individual locations. It can always turn an individual 1 into a 0, but it can only turn a 0 back into a 1 by erasing the entire 512K block.

A conventional file system designed for a regular hard disk expects that every disk sector can be rewritten individually. The system only keeps track of the sectors that have been allocated to individual files and the sectors that are "free space" (not assigned to any file and available for allocation to create new files or extend existing ones).

In an SSD, the "free space" can be divided into sectors that have been erased and can be immediately assigned to hold new data, and sectors that were previously part of some file and have not yet been erased. Sectors of the second kind can be assigned to a file, but they have to be erased before they can actually be written.

In the current generation of operating systems and SSDs, the work of erasing blocks and keeping track of what is erased and what needs erasing is mostly managed by the SSD hardware. However, it would be more efficient if the SSD hardware and the device driver collaborated, and if applications, particularly I/O-intensive applications like databases, optimized their patterns of use. Instead of rewriting small blocks of existing data in place, they would write new data to previously erased sections of the disk and adjust the control information to point to the new location instead of the old one. Once enough of the data in a 512K block has been replaced, the rest is migrated elsewhere and the block is queued for complete erasing.
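The write-elsewhere-and-remap idea can be sketched with a toy model. Everything here is invented for illustration (the ToySSD class, the three states, the tiny block of 4 sectors); real SSD firmware is far more elaborate and its internals are not described in this article.

```python
from enum import Enum


class State(Enum):
    ERASED = "erased"  # ready to accept new data
    VALID = "valid"    # holds the current copy of some logical sector
    STALE = "stale"    # holds replaced data; must be erased before reuse


SECTORS_PER_BLOCK = 4  # tiny for illustration; a 512K block holds far more


class ToySSD:
    def __init__(self, block_count):
        n = block_count * SECTORS_PER_BLOCK
        self.state = [State.ERASED] * n
        self.data = [None] * n
        self.mapping = {}  # control information: logical sector -> slot

    def _erased_slot(self, avoid=()):
        for slot, state in enumerate(self.state):
            if state is State.ERASED and slot not in avoid:
                return slot
        raise RuntimeError("no erased slot available")

    def write(self, logical, payload, avoid=()):
        """Write to an erased slot and mark any old copy stale."""
        slot = self._erased_slot(avoid)
        self.state[slot] = State.VALID
        self.data[slot] = payload
        old = self.mapping.get(logical)
        if old is not None:
            self.state[old] = State.STALE
        self.mapping[logical] = slot  # point at the new location

    def read(self, logical):
        return self.data[self.mapping[logical]]

    def reclaim(self, block):
        """Migrate what is still valid out of a block, then erase it whole."""
        start = block * SECTORS_PER_BLOCK
        slots = range(start, start + SECTORS_PER_BLOCK)
        for logical, slot in list(self.mapping.items()):
            if slot in slots:
                self.write(logical, self.data[slot], avoid=slots)
        for slot in slots:
            self.state[slot] = State.ERASED
            self.data[slot] = None
```

Rewriting a logical sector never touches the old slot except to mark it stale, so no erase is needed on the write path. The cost is paid later, in reclaim, when a block that is mostly stale is emptied and erased in one operation.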

If you have a laptop with a single SSD, then you have to use it for everything. However, in a desktop machine where there is room for more than one disk, it makes sense to use the SSD as the boot disk for the operating system and for your small or most frequently used programs. Then one or more regular hard drives can be used for bulk storage and for the types of files that do not translate well to the SSD erase-and-write mechanism.

SSDs and hard drives are roughly comparable for large sequential files. The hard drive can read sequentially without seek delay or rotational latency. Given the high price of SSD storage these days, it is probably better to put the pagefile on a hard drive because pages are stolen and written to the disk in 4K units. The hibernate file is written and read sequentially. If you can fit these files on the SSD they will probably run a bit faster, but they are big and you might find something better to do with those gigabytes.

However, every resource in the history of computers that once had to be carefully rationed is now insanely cheap, and eventually SSD storage will not be worth the trouble of optimizing. So while some care and consideration may be appropriate today, this is a temporary situation. Within a few years, everything will be optimized automatically.