Saturday, August 20, 2011

zfs Snapshot Commands Example

As I discussed in my previous post, zfs FileSystem and MySQL, which gave a ZFS overview and covered the two most important commands, zpool and zfs, I am going to continue with the usage of ZFS snapshots. This post covers creating a pool, creating a file system, taking a snapshot, renaming snapshots, listing all snapshots, restoring from a snapshot, and moving a snapshot to another system.

A snapshot is a read-only copy of a file system or volume. Snapshots can be created almost instantly, and initially consume no additional disk space within the pool. However, as data within the active dataset changes, the snapshot consumes disk space by continuing to reference the old data, thus preventing that space from being freed. Snapshots of volumes cannot be accessed directly, but they can be cloned, backed up, and rolled back to.

Creating a Pool:
# zpool create zpool1 c2t0d0

List pool:
# zpool list

Create file systems under the pool created above:
Once you have a storage pool, you can build file systems on it:

# zfs create zpool1/data01
# zfs create zpool1/logs

Here we have built the "/data01" and "/logs" file systems on pool zpool1; the snapshot examples below use zpool1/data01.

List all zfs file systems:
# zfs list

Taking a Snapshot:
zfs snapshot <pool name>/<filesystem name>@<snapshot name>
# zfs snapshot zpool1/data01@Snapshot1

Remove/Destroy a Snapshot:
zfs destroy <pool name>/<filesystem name>@<snapshot name>
# zfs destroy zpool1/data01@Snapshot1

Rename Snapshots:
You can rename snapshots but they must be renamed within the pool and dataset from which they were created.
zfs rename <pool name>/<filesystem name>@<snapshot name> <pool name>/<filesystem name>@<snapshot name>
# zfs rename zpool1/data01@Snapshot1 zpool1/data01@Snapshot2

The snapshot rename operation below is not supported, because the target pool and file system name are different from the pool and file system where the snapshot was created.

# zfs rename zpool1/data01@Snapshot1 zpool3/data01@Snapshot2

Displaying zfs Snapshots:
# zfs list
# zfs list -t snapshot

You can also list the snapshots that were created for a particular file system:
zfs list -r -t snapshot -o name,creation <pool name>/<filesystem name>
# zfs list -r -t snapshot -o name,creation zpool1/data01

Restore/Rolling Back zfs snapshots:
zfs rollback <pool name>/<filesystem name>@<snapshot name>
# zfs rollback zpool1/data01@Snapshot1

This will roll the entire file system back to the state it was in when the snapshot was taken. (Note that rolling back discards all changes made since that snapshot.)

Restoring individual files:
It is possible to copy an individual file from a snapshot by changing into the hidden ".zfs" directory of the file system that has been snapped; each snapshot's contents appear under ".zfs/snapshot/<snapshot name>".

cd /<pool name>/<file system name>/.zfs/snapshot/<snapshot name>
cp <required file source location> <destination>

For example:
cd /zpool1/data01/.zfs/snapshot/Snapshot1
cp <required file source location> <destination>

Moving a Snapshot to another system:
We can move the snapshot to another system and install it there as a usable file system, but first we need to create a pool to receive the snapshot on the target system.

Step 1: Create a pool on the other system.
# zpool create -f zpool11 c2t0d0

Step 2: Send the snapshot over the network and receive it into the pool, using a combination of the zfs send/receive commands and a network pipe.
# zfs send zpool1/data01@Snapshot1 | ssh <destination host> "/usr/sbin/zfs receive zpool11/<myfilesystem>"

Here zpool11 is the name of the pool on the other system, which we created above, and myfilesystem is the name of the file system you wish to receive the snapshot into.

Wednesday, August 17, 2011

zfs FileSystem and MySQL

ZFS is a new kind of 128-bit file system that provides simple administration, transactional semantics, end-to-end data integrity, and immense scalability. ZFS is not an incremental improvement to existing technology; it is a fundamentally new approach to data management. ZFS was first introduced in Solaris in 2004 and is the default file system in OpenSolaris. Linux ports are underway, Apple shipped it in OS X 10.5 Leopard with limited ZFS capability (Apple later shut the project down), and it is included in FreeBSD 7.

ZFS Features:
  • Pooled Storage Model
  • Always consistent on disk
  • Protection from data corruption
  • Live data scrubbing
  • Instantaneous snapshots and clones
  • Portable snapshot streams
  • Highly scalable
  • Built in compression
  • Simplified administration model

Pooled Storage Model: ZFS presents a pooled storage model that completely eliminates the concept of volumes and the associated problems of partitions, provisioning, wasted bandwidth and stranded storage. Thousands of file systems can draw from a common storage pool, each one consuming only as much space as it actually needs. The combined I/O bandwidth of all devices in the pool is available to all file systems at all times.

Always consistent on disk: All operations are copy-on-write transactions, so the on-disk state is always valid. Every block is checksummed to prevent silent data corruption, and the data is self-healing in replicated (mirrored or RAID) configurations. If one copy is damaged, ZFS detects it and uses another copy to repair it.

Protection from data corruption: ZFS introduces a new data replication model called RAID-Z. It is similar to RAID-5 but uses variable stripe width to eliminate the RAID-5 write hole (stripe corruption due to loss of power between data and parity updates). All RAID-Z writes are full-stripe writes. There's no read-modify-write tax, no write hole, and — the best part — no need for NVRAM in hardware. ZFS loves cheap disks.

Live data scrubbing: But cheap disks can fail, so ZFS provides disk scrubbing. Similar to ECC memory scrubbing, all data is read to detect latent errors while they're still correctable. A scrub traverses the entire storage pool to read every data block, validates it against its 256-bit checksum, and repairs it if necessary. All this happens while the storage pool is live and in use.
ZFS has a pipelined I/O engine, similar in concept to CPU pipelines. The pipeline operates on I/O dependency graphs and provides scoreboarding, priority, deadline scheduling, out-of-order issue and I/O aggregation. I/O loads that bring other file systems to their knees are handled with ease by the ZFS I/O pipeline.

Instantaneous snapshots and clones (Most important and useful for huge backups in seconds): ZFS provides 2^64 constant-time snapshots and clones. A snapshot is a read-only point-in-time copy of a file system, while a clone is a writable copy of a snapshot. Clones provide an extremely space-efficient way to store many copies of mostly-shared data such as workspaces, software installations, and diskless clients.

Portable snapshot streams (Important & useful feature): You can snapshot a ZFS file system, and you can also create incremental snapshots. Incremental snapshots are so efficient that they can be used for remote replication, such as transmitting an incremental update every 10 seconds.

Highly scalable (Important & useful feature): There are no arbitrary limits in ZFS. You can have as many files as you want: full 64-bit file offsets, unlimited links, directory entries, and so on.

Built in compression: ZFS provides built-in compression. In addition to reducing space usage by 2-3x, compression also reduces the amount of I/O by 2-3x. For this reason, enabling compression actually makes some workloads go faster.
In addition to file systems, ZFS storage pools can provide volumes for applications that need raw-device semantics. ZFS volumes can be used as swap devices, for example. And if you enable compression on a swap volume, you now have compressed virtual memory.

Simplified administration model: ZFS administration is both simple and powerful. zpool and zfs are the only two commands you need to know. Please see the zpool(1M) and zfs(1M) man pages for more information.
The storage pool is a key abstraction: a pool can consist of many physical devices and can hold many file systems. Whenever you add storage to the pool, it becomes available to any file system that may need it. To take a newly attached disk and use the whole disk for ZFS storage, you would use the following command:

# zpool create zpool1 c2t0d0

Here, zpool1 represents the name of a pool, and c2t0d0 is a disk device.

If a disk has already been formatted – say, with a UFS filesystem on one partition – you can create a storage pool from another, free partition:
# zpool create zpool1 c2t0d0s2 

You can even use a plain file for storage:
# zpool create zpool1 ~/storage/myzfile

Once you have a storage pool, you can build filesystems on it:
# zfs create zpool1/data # zfs create zpool1/logs 

Later on, if you run out of space, just add another device to the pool, and the file systems will grow:
# zpool add zpool1 c3t0d0

ZFS and Tablespaces:

innodb_data_file_path = /dbzpool/data/ibdatafile:20G:autoextend

Here is the only innodb_data_file_path that any ZFS system might ever need. You can split this over as many drives as you want, and ZFS will balance the load intelligently. You can stripe it, mirror it, add space when you need room to grow, bring spare disks online, and take faulted disks offline, without ever restarting the database.

MySQL Cluster Webinar: Best practices in scaling Web databases with Auto-Partitioning and SQL/noSQL Access

Register yourself here
Thursday, July 28, 2011

MySQL 5.6.3 Performance improvements

Mark Callaghan at Facebook tested the preview release of MySQL 5.6.3 and found some performance improvements in the InnoDB features. Read below...

Mark tried two of the previews for MySQL 5.6.3. His first attempt, with the multi-threaded slave preview, was not successful: parallel apply on the slave is serial when the master does not also run 5.6.3. He said, "I hope this changes, as a typical upgrade is first done on the slave."

He was more successful with the InnoDB features preview. A few more mutex contention bottlenecks were removed in it, and he wanted to compare the peak row-update rate between it and MySQL 5.1.52. He configured InnoDB to use a buffer pool large enough to cache all data and ran a custom version of sysbench with 8 tables. The peak rate on the preview is about twice the peak rate on the unmodified InnoDB plugin in 5.1.52 using an 8-core server.

This is good news. The results below list the number of rows updated per second using 8 to 256 concurrent clients updating 1 row by primary key per UPDATE statement.

Configuration used:

The database had 8 tables with 2M rows each.
The binlog was disabled during the test.

This is a configuration meant for benchmarks but it also allows maximum stress to be put on InnoDB. He only ran the test once for each level of concurrency and won't try to explain the results at 32 connections.

(The benchmark result charts comparing MySQL 5.1.52 and MySQL 5.6.3 are not reproduced here.)

Wednesday, July 27, 2011

Reduced contention during datafile extension

Another performance problem found by PoorMansProfiler

Inaam Rana said in his blog post on the InnoDB blog:

InnoDB has an internal file system management module that primarily manages the space in the data files. One of the pain points was the coarse level of locking used when a data file has to be extended. More about this issue can be found here. In the latest labs release we have fixed this problem.
When we need to extend a data file inside InnoDB, we write zero-filled pages synchronously to the file. The user thread which is extending the data file holds fil_system::mutex during the whole operation. This mutex covers changes to all data structures related to file system management. What this means is that when we do a regular IO we do need to acquire fil_system::mutex, though only for a short time. Because the thread doing the data file extension is holding the mutex during the whole IO operation, any other thread (user or background) trying to access a data file for a regular read or write ends up waiting. This brings the whole system to a virtual standstill, as no read or write activity can happen. This is true even if a thread is trying to access a data file that is not the one being extended.
We fixed this issue by introducing an internal flag in the data structure indicating that a file is being extended. Now if a user thread needs to extend a data file, it does acquire fil_system::mutex but releases it after setting the flag. Once it is done with the extension IO, it resets the flag. This allows other threads to access data files while one of the files is being extended. It also allows multiple files to be extended in parallel. Our tests have shown that the issue of stalls due to file extension is indeed fixed by this approach.
A related feature which can be considered as future work is to off load the file extension to a background thread.

Friday, July 22, 2011

When does InnoDB compress and decompress pages?

There are two sections for rows in the page format for InnoDB compressed tables. The compressed section has one or more rows and must be decompressed to access individual rows. The modification log has uncompressed rows and rows can be accessed without decompressing. The modification log is used to avoid decompressing and then possibly recompressing the compressed section on every row change. The buffer pool also has separate uncompressed copies of some pages so that every row read does not require a page decompression.

I want to understand when a page must be decompressed or recompressed. This is definitely an incomplete list.
  • A page is decompressed when a row is read and the uncompressed version of the page is not in the buffer pool.
  • I think a row can be deleted from the compressed section without decompressing it in many cases as I think marking it deleted uses fields not in the compressed section. 
  • Inserts are done to the modification log assuming it has room. When it is full the modification log and data from the compressed section are merged and the result is recompressed. When the result is too large to fit in a compressed page then the page is split and both post-split pages are recompressed.
  • I don't understand the code for UPDATE statements and need to read more source code. The docs state that updates can be done to the modification log but I don't know what that implies.
A compression failure occurs when a page is recompressed and the result is too big. InnoDB fixes this by splitting the page. This only works for index-organized tables, and InnoDB is index-organized. You can monitor the rate of compression failures using the information schema table INNODB_CMP. This reports the global rate of compression failures. When you have a server with many tables you need to know which tables have the high failure rate, and that information is only available in a yet-to-be-published Facebook patch.

But even the changes in the Facebook patch are not sufficient. In some cases it is important to understand which indexes in a table cause the compression failures. The alternative is to guess. All indexes on a table do not compress equally well yet the same compression factor for a table is used for all indexes on it via the KEY_BLOCK_SIZE option to CREATE TABLE.

By Mark Callaghan
Tuesday, July 19, 2011

MySQL Cluster Architecture


MySQL Cluster is a write-scalable, real-time, ACID-compliant transactional database, combining 99.999% availability with the low TCO of open source. Designed around a distributed, multi-master architecture with no single point of failure, MySQL Cluster scales horizontally on commodity hardware to serve read and write intensive workloads, accessed via SQL and NoSQL interfaces.

MySQL Cluster's real-time design delivers predictable, millisecond response times with the ability to service millions of operations per second. Support for in-memory and disk-based data, automatic data partitioning (sharding) with load balancing and the ability to add nodes to a running cluster with zero downtime allows linear database scalability to handle the most unpredictable web-based workloads.

 MySQL Cluster comprises three types of node which collectively provide service to the application:
  • Data nodes manage the storage and access to data.  Tables are automatically sharded across the data nodes which also transparently handle load balancing, replication, failover and self-healing.

  • Application nodes provide connectivity from the application logic to the data nodes. Multiple APIs are presented to the application. MySQL provides a standard SQL interface, including connectivity to all of the leading web development languages and frameworks. There is also a whole range of NoSQL interfaces including memcached, REST/JSON, C++ (NDB-API), Java, JPA and LDAP.

  • Management nodes are used to configure the cluster and provide arbitration in the event of a network partition.

Thursday, July 14, 2011

getopts in shell script

The getopts command simplifies the task of validating and parsing command line options and arguments for your shell scripts.


getopts optstring name [arg ...]


Step 1: First, I define all my option-holding variables (for example ListFiles=0, MoveFiles=0 and email="", matching the case branches used below).


Step 2: The while loop.

The following while statement loops through all the options and sets the corresponding variable for each one. getopts returns true while there are options left to process. The argument string, here "lme:h", specifies which options the script accepts. If the user specifies an option that is not in this string, execution falls into the * branch, which displays help on how to use this script, with examples. If an option is followed by a colon, the value immediately following that option on the command line is placed in the variable $OPTARG.

while getopts "lme:h" option; do
    case "$option" in
        l) ListFiles=1 ;;
        m) MoveFiles=1 ;;
        e) email="$OPTARG" ;;
        h|*) helpFunction ;;
    esac
done

Script Call (./script.sh stands for your script's name):

./script.sh -l
#It will go into case l) and set ListFiles=1.

./script.sh -m
#It will go into case m) and set MoveFiles=1.

./script.sh -m -e ""
#It will go into cases m) and e), set MoveFiles=1, and place the email address from $OPTARG into the "email" variable.

./script.sh -h
#It will go into case h|*) and call the function helpFunction to show the help.

./script.sh -<anything apart from the optstring we have provided>
#It will also go into case h|*), be treated as "*", and show the help.
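Putting the pieces together, here is a minimal, self-contained sketch of such a script. helpFunction, the variable defaults, and the simulated command line (via set --) are illustrative assumptions, not taken from the original script:

```shell
#!/bin/sh
# Minimal sketch of a getopts-driven script.

ListFiles=0
MoveFiles=0
email=""

helpFunction() {
    echo "Usage: $0 [-l] [-m] [-e email] [-h]"
}

# Simulate a command line for demonstration; a real script parses "$@" directly.
set -- -m -e "user@example.com"

while getopts "lme:h" option; do
    case "$option" in
        l) ListFiles=1 ;;
        m) MoveFiles=1 ;;
        e) email="$OPTARG" ;;
        h|*) helpFunction ;;
    esac
done

echo "ListFiles=$ListFiles MoveFiles=$MoveFiles email=$email"
```

Running it prints: ListFiles=0 MoveFiles=1 email=user@example.com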

Comments/suggestions are welcome.. Happy scripting ... :)
Friday, July 1, 2011


We all know the du command to get the size of a directory. But the problem is that when you use "du <directory name>", it gives you a list of all the subdirectories, including the directory you asked for, with their sizes.

But what if I only want the size of the directory which I have passed as an argument, and not all the subdirectories?

In that scenario we can use:

du -sh <directory name>                              

Example 1:

du -h /home/mysql/admin/                             
   1K   /home/mysql/admin/scripts/neel               
   8K   /home/mysql/admin/scripts                    
   1K   /home/mysql/admin/bin-logs/test_instance_4   
   1K   /home/mysql/admin/bin-logs/test_instance_3   
   1K   /home/mysql/admin/bin-logs/orphan            
   1K   /home/mysql/admin/bin-logs/test_instance_1   
   1K   /home/mysql/admin/bin-logs/test_instance_2   
   9K   /home/mysql/admin/bin-logs                   
  20K   /home/mysql/admin                            

In the above example I have passed "/home/mysql/admin/" as the argument of du, and it lists all subdirectories with their sizes. (Please note the -h switch converts the sizes into a human-readable and understandable format, i.e. KB.)

Example 2:

 du -sh /home/mysql/admin/
  20K   /home/mysql/admin

In this example I have used the switch "s" (to show the size of the given directory only, and not the subdirectories) along with "h" (human-readable format), and it gave me the size of the "/home/mysql/admin" directory only, rather than of all the subdirectories as well.
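The difference is easy to reproduce with a throwaway directory tree (the /tmp paths below are illustrative, not from the original post):

```shell
# Build a small tree with two subdirectories and one file.
mkdir -p /tmp/du_demo/sub1 /tmp/du_demo/sub2
echo "some data" > /tmp/du_demo/sub1/file.txt

du -h /tmp/du_demo    # one line per subdirectory, plus the top directory (3 lines)
du -sh /tmp/du_demo   # a single summary line for the directory itself
```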

I hope this will help someone. :)
Tuesday, June 28, 2011

Turn on or off color syntax highlighting in vi or vim editor

Vim or vi is a text editor. It can be used to edit all kinds of plain text. It is especially useful for editing programs or UNIX/Linux configuration files.

Turn on syntax highlighting:

Open a file, for example:
$ vi <filename>
Now press the ESC key to enter command mode, then type ":syntax on" or ":syn on"
:syntax on
:syn on
That's it: the color syntax highlighting will be enabled until you close the file (useful when you are working on a server where you can't enable it permanently because of restrictions and not having the desired access to the .vimrc file).

Turn off syntax highlighting:

Press the ESC key to enter command mode, then type ":syntax off" or ":syn off"
:syntax off
:syn off

Enable color syntax highlighting permanently:

Add "syntax on" (or "syn on") to your $HOME/.vimrc file:
$ vi $HOME/.vimrc
Add "syntax on" (or "syn on")
Press ESC to enter command mode, then save and close the file:
:wq <enter>
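For reference, the permanent setup amounts to a one-line $HOME/.vimrc fragment (the comment is illustrative):

```vim
" Enable color syntax highlighting every time vim starts
syntax on
```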

Please note: .vimrc is a hidden file (its name starts with a dot), so a plain "ls" will not show it; use "ls -a" to see it.

Any comments would be appreciated, thanks :)

Friday, June 24, 2011

mysql optimizer Index strategy

If the table has a multiple-column index, any leftmost prefix of the index can be used by the optimizer to find rows. For example, if you have a three-column index on (col1, col2, col3), you have indexed search capabilities on (col1), (col1, col2), and (col1, col2, col3).
MySQL cannot use an index if the columns do not form a leftmost prefix of the index. Suppose that you have the SELECT statements shown here:
SELECT * FROM tbl_name WHERE col1=val1;
SELECT * FROM tbl_name WHERE col1=val1 AND col2=val2;

SELECT * FROM tbl_name WHERE col2=val2;
SELECT * FROM tbl_name WHERE col2=val2 AND col3=val3;
If an index exists on (col1, col2, col3), only the first two queries use the index. The third and fourth queries do involve indexed columns, but (col2) and (col2, col3) are not leftmost prefixes of (col1, col2, col3).

For details please visit
Wednesday, February 16, 2011

Innodb plugin 1.0.4 fine tune for performance

#The MySQL server must always be started with the option ignore_builtin_innodb, as long as you want to use the InnoDB Plugin as a shared library.
#Load the InnoDB Plugin and all Information Schema tables implemented in the InnoDB Plugin when the server starts:
plugin-load=innodb=ha_innodb_plugin.so;innodb_trx=ha_innodb_plugin.so;innodb_locks=ha_innodb_plugin.so;innodb_lock_waits=ha_innodb_plugin.so;innodb_cmp=ha_innodb_plugin.so;innodb_cmp_reset=ha_innodb_plugin.so;innodb_cmpmem=ha_innodb_plugin.so;innodb_cmpmem_reset=ha_innodb_plugin.so
#To use compression, enable "file per table" (innodb_file_per_table) and the Barracuda file format (innodb_file_format).
#innodb_buffer_pool_size = 512M (better performance if increased; the suggested value is 50-80% of memory).
#The limit on the number of concurrent threads.
#The number of background threads used for reads. The purpose of this change is to make InnoDB more scalable on high-end systems. Each background thread can handle up to 256 pending I/O requests.
#The memory InnoDB uses to store data dictionary information and other internal data structures.
#The more tables you have in your application, the more memory you need to allocate here.
Cheers!!! (High Performance MySQL)
Tuesday, February 15, 2011

Linux Administration

Check out this SlideShare Presentation:

Linux Introduction (Commands)

Check out this SlideShare Presentation:
Friday, February 4, 2011

InnoDB Internals: InnoDB File Formats and Source Code Structure

Mastering InnoDB Diagnostics

Check out this SlideShare Presentation:
Thursday, February 3, 2011

Partitions Performance with MySQL 5.1 and 5.5

Check out this SlideShare Presentation:

Linux performance tuning & stabilization tips (mysqlconf2010)

10x Performance Improvements

Check out this SlideShare Presentation:

MySQL Monitoring 101

Check out this SlideShare Presentation:
© Copyright 2010-2012 Learn MySQL All Rights Reserved.