ZFS is a new kind of 128-bit file system that provides simple administration, transactional semantics, end-to-end data integrity, and immense scalability. ZFS is not an incremental improvement to existing technology; it is a fundamentally new approach to data management. ZFS was first introduced in Solaris in 2004 and it is a default filesystem in OpenSolaris, but Linux ports are underway, Apple is shipping it in OS X 10.5 Leopard with limited zfs capability ( Apple shutdown this project afterward due to some known reason), and it will be included in FreeBSD 7.
- Pooled Storage Model
- Always consistent on disk
- Protection from data corruption
- Live data scrubbing
- Instantaneous snapshots and clones
- Portable snapshot streams
- Highly scalable
- Built in compression
- Simplified administration model
Pooled Storage Model: ZFS presents a pooled storage model that completely eliminates the concept of volumes and the associated problems of partitions, provisioning, wasted bandwidth and stranded storage. Thousands of file systems can draw from a common storage pool, each one consuming only as much space as it actually needs. The combined I/O bandwidth of all devices in the pool is available to all file systems at all times.
Always consistent on disk: All operations are copy-on-write transactions, so the on-disk state is always valid. Every block is checksummed to prevent silent data corruption, and the data is self-healing in replicated (mirrored or RAID) configurations. If one copy is damaged, ZFS detects it and uses another copy to repair it.
Protection from data corruption: ZFS introduces a new data replication model called RAID-Z. It is similar to RAID-5 but uses variable stripe width to eliminate the RAID-5 write hole (stripe corruption due to loss of power between data and parity updates). All RAID-Z writes are full-stripe writes. There's no read-modify-write tax, no write hole, and — the best part — no need for NVRAM in hardware. ZFS loves cheap disks.
Live data scrubbing: But cheap disks can fail, so ZFS provides disk scrubbing. Similar to ECC memory scrubbing, all data is read to detect latent errors while they're still correctable. A scrub traverses the entire storage pool to read every data block, validates it against its 256-bit checksum, and repairs it if necessary. All this happens while the storage pool is live and in use.
ZFS has a pipelined I/O engine, similar in concept to CPU pipelines. The pipeline operates on I/O dependency graphs and provides scoreboarding, priority, deadline scheduling, out-of-order issue and I/O aggregation. I/O loads that bring other file systems to their knees are handled with ease by the ZFS I/O pipeline.
Instantaneous snapshots and clones (Most important and useful for huge backups in seconds): ZFS provides 2 64 constant-time snapshots and clones. A snapshot is a read-only point-in-time copy of a file system, while a clone is a writable copy of a snapshot. Clones provide an extremely space-efficient way to store many copies of mostly-shared data such as workspaces, software installations, and diskless clients.
Portable snapshot streams (Important & useful feature): You snapshot a ZFS file system, but you can also create incremental snapshots. Incremental snapshots are so efficient that they can be used for remote replication, such as transmitting an incremental update every 10 seconds.
Highly scalable (Important useful feature): There are no arbitrary limits in ZFS. You can have as many files as you want: full 64-bit file offsets, unlimited links, directory entries, and so on.
Built in compression: ZFS provides built-in compression. In addition to reducing space usage by 2-3x, compression also reduces the amount of I/O by 2-3x. For this reason, enabling compression actually makes some workloads go faster.
In addition to file systems, ZFS storage pools can provide volumes for applications that need raw-device semantics. ZFS volumes can be used as swap devices, for example. And if you enable compression on a swap volume, you now have compressed virtual memory.
Simplified administration model: ZFS administration is both simple and powerful. zpool and zfs are the only two command you need to know. Please see the and man pages for more information.
The storage pool is a key abstraction: a pool can consist of many physical devices, and can hold many filesystems. Whenever you add storage to the pool, it becomes available to any filesystem that may need it. To take a newly-attached disk and use the whole disk for ZFS storage, you would use the command.
# zpool create zpool1 c2t0d0
Here, zpool1 represents the name of a pool, and c2t0d0 is a disk device.
If you have a disk had already been formatted – say, with a UFS filesystem on one partition – you can create a storage pool from another free partition:
# zpool create zpool1 c2t0d0s2
You can even use a plain file for storage:
# zpool create zpool1 ~/storage/myzfile
Once you have a storage pool, you can build filesystems on it:
# zfs create zpool1/data # zfs create zpool1/logs
Later on, if you run out of space, just add another device to the pool, and the filesystem will grow.
# zpool add zp1 c3t0d0
ZFS and Tablespaces:
innodb_data_file_path = /dbzpool/data/ibdatafile:20G:autoextend
Here is the only innodb_data_file_path that any ZFS system might ever need. You can split this over as many drives as you want, and ZFS will balance the load intelligently. You can stripe it, mirror it, add space when you need room to grow, bring spare disks online, and take faulted disks offline, without ever restarting the database.