Introduction
VMware offered a great product for creating virtual systems that can be used for a variety of purposes (application packaging, system testing, software testing, etc.). At the time this document was written, VMware Workstation was superior to Microsoft's Virtual PC because Virtual PC did not allow for the creation of virtual SCSI disks, which prevented the user from creating a virtual cluster. Virtual Server (now Hyper-V) does allow for the creation of SCSI virtual disks, and therefore the creation of a virtual cluster.
There was not much free documentation on the steps for creating a virtual cluster under VMware Workstation floating around back in 2002. What follows is the result of personal experimentation combined with bits of information gleaned from dozens of web pages (most concerning clustering using VMware ESX or GSX Server), including VMware's documentation. The steps for creating a virtual machine that supports clustering are relatively simple, and I offer these instructions in the hope that they prevent others from losing hours of time recreating the process.
Create a Virtual Template
- Install VMware 4.5.2.
- File, New Virtual Machine.
- At the Wizard screen, click Next.
- Select a Typical virtual machine configuration, click Next.
- Specify the guest operating system (Microsoft Windows) and the version (Windows Server 2003 Enterprise Edition). Click Next.
- Name the virtual machine whatever you want, and specify the desired location. Click Next.
- Select a bridged network connection, click Next.
- Specify the size of the virtual disk (C: drive). 4GB should suffice. Do not check either of the two checkboxes ("allocate all disk space now" or "split disk into 2GB files"). Click Next.
- Your virtual system has been created. Highlight the newly created system and click "Edit virtual machine settings". Under the Hardware tab, make the following changes:
- Change the Memory setting to whatever your local system can handle. 128MB will work fine, 256MB is ideal for simple clustering, and 512MB+ for advanced clustering (e.g., SQL clustering).
- Change your CD-ROM to point to the Windows Server 2003 Enterprise Edition ISO on your hard drive.
- Change the floppy drive to NOT connect at power on (uncheck the "Connect at power on" box).
- Under Options tab, check the "Disable snapshots" box.
- Your virtual machine is now ready to have the OS installed on it. Power on the session and follow standard build procedures.
- Update the OS with applicable service packs and hotfixes.
- Do not join a domain.
- Install any standard software/tools (antivirus, etc.).
- Install the VMware Tools (via VM, Install VMware Tools).
- Shut down the OS, which closes the virtual machine.
- You now have a "template" virtual server. You won't be modifying this template anymore. Copy the template to two (or more) separate subdirectories of your choice, calling one NODE1 and the other NODE2 (or whatever you want). These copies will become the systems you modify from now on. I recommend configuring the virtual drive on the template system to be nonpersistent so you don't accidentally screw it up.
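If you would rather script the copy step than do it by hand, a minimal Python sketch might look like the following. The function name clone_template and the directory layout are my own assumptions for illustration; nothing here is a VMware tool. It only copies files, so run it after the template has been shut down.

```python
import shutil
from pathlib import Path


def clone_template(template_dir, dest_root, node_names=("NODE1", "NODE2")):
    """Copy the template VM directory once per node and return the new paths.

    All names here are hypothetical; adjust them to your own layout.
    """
    template = Path(template_dir)
    if not template.is_dir():
        raise FileNotFoundError(f"template directory not found: {template}")
    created = []
    for name in node_names:
        target = Path(dest_root) / name
        # copytree raises FileExistsError if the target already exists, which
        # protects a node directory that has already been configured.
        shutil.copytree(template, target)
        created.append(target)
    return created
```

For example, clone_template(r"C:\VMWare\TEMPLATE", r"C:\VMWare") would create C:\VMWare\NODE1 and C:\VMWare\NODE2, assuming that is where you keep your virtual machines.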
Create the Virtual Systems
- Edit the .vmx file for each of your newly copied virtual systems. If you followed the directions above exactly, there should be nothing that is directory-dependent within it. This should be verified; if there are directory references, fix them to point to the appropriate directory (that is, the directory within which the .vmx file resides).
- Via VMware, add the new systems to your Favorites (right click, Open). This will add them to the right window. Right click the tabs of the systems, select Add to Favorites. From the left pane, right click the systems and rename them (e.g., NODE1, NODE2, etc.).
- Configure the new nodes to have independent and persistent disks.
- Configure the OS on each node. Be sure to log on as an administrator for all OS configuration.
- Start each node separately (to avoid name/IP contention).
- Rename each node to whatever name you wish, and give it a static IP.
- Join the node to the domain.
- Add the cluster service account to the local administrators group.
- Shut down all nodes.
- Now it is time to add the cluster resources. For your first node, edit the virtual machine settings. You will need to add 2+ new hardware resources:
- NIC, configured as "Host-only" and "connect at power-on."
- SCSI hard drive, 100MB (the quorum drive). Select the option to "Create a new virtual disk" and "Allocate all disk space now." Name the virtual disk Quorum.vmdk, and configure it under Advanced to be SCSI 0:0, independent, and persistent.
- SCSI hard drive, 2GB (optional data drive). Select the option to "Create a new virtual disk" and "Allocate all disk space now." Name the virtual disk Data.vmdk, and configure it under Advanced to be SCSI 0:1, independent, and persistent. Additional drives can be added just like this, with sequentially higher SCSI IDs.
- Close VMware.
- Time to edit your .vmx file for the first node. Find the area where your SCSI drives are defined, and add the following two lines under the scsi0:0.present = "TRUE" line. Save and exit the file. I have included a sample .vmx file from my first node (only difference is that I made Data ID 0:0 and the Quorum ID 0:1).
- disk.locking = "FALSE"
- scsi0.sharedBus = "virtual"
- Power up the first node. You will be prompted to create a new unique identifier (UUID); select "Create a new identifier" and click OK.
- Configure the new NIC for private IP (e.g., 10.1.1.1). You can disable NetBIOS on this NIC, btw, since it is for the heartbeat network. Similarly, you can disable File and Print Sharing.
- Configure the new disks via Disk Management:
- Write signatures to (initialize) the new disks, but do NOT upgrade them to dynamic disks (unless you use Veritas Volume Manager, since without VVM clustering does not work on dynamic disks).
- Configure each drive as a single extended partition, create a single logical drive on each, and format (default allocation unit size for Quorum, 64K for Data). I assign the Data drive the letter E:, and the Quorum drive the letter Q:.
- Power down the first node. This node is now ready to have clustering installed on it. It is now time to configure the other nodes.
- The following steps need to be done for each additional node in the cluster. Via VMware, add the following virtual hardware:
- NIC, configured as "Host-only" and "connect at power-on."
- SCSI hard drive, 100MB (the quorum drive). Select the option to "Use a virtual disk," and specify the Quorum.vmdk file in the first node's directory. Configure it under Advanced to be SCSI 0:0, independent, and persistent.
- SCSI hard drive, 2GB (optional data drive). Select the option to "Use a virtual disk," and specify the Data.vmdk file in the first node's directory. Configure it under Advanced to be SCSI 0:1, independent, and persistent. Additional drives can be added just like this, with sequentially higher SCSI IDs.
- Close VMware.
- Edit the .vmx files for each node. Find the area where your SCSI drives are defined, and add the following two lines under the scsi0:0.present = "TRUE" line. Save and exit the files. I have included a sample .vmx file from my second node (only difference is that I made Data ID 0:0 and the Quorum ID 0:1).
- disk.locking = "FALSE"
- scsi0.sharedBus = "virtual"
- Physically (er, virtually), all your nodes are now ready for clustering. It should be noted that you should stagger the start-up of the nodes so that each virtual system queries its SCSI bus individually; starting all the nodes up at the same time may cause the nodes to freak out (and possibly corrupt the shared virtual drives) if multiple nodes try to query the shared disks at the same time.
- Power up all the nodes (including the first node). New nodes will prompt you to create a new unique identifier (UUID); select "Create a new identifier" and click OK.
- Verify that the new nodes can see the shared disk resources. Do not worry if the drive letters are not correct; clustering will configure that for you. Also don't worry about any unallocated space; for whatever reason, this sometimes gets detected on additional nodes.
- On each node, configure the new NIC for private IP (e.g., 10.1.1.x). You can disable NetBIOS on this NIC, btw, since it is for the heartbeat network. Similarly, you can disable File and Print Sharing.
- Install clustering on the first node. You can actually install clustering first on any node, but I prefer to do it to the first node since it has the drives already defined. Install clustering on the additional nodes. All versions of Microsoft clustering (MSCS) should work with these systems (e.g., NT 4.0, Windows 2000, and Windows 2003). Build the cluster according to your own documentation, or refer to Microsoft's white paper.
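The .vmx edit described in the steps above (adding the two shared-bus lines under the scsi0:0.present line) is easy to get wrong by hand, so here is a rough Python sketch of it. The function name add_shared_bus_settings is hypothetical, and the sketch assumes the anchor line is written exactly as scsi0:0.present = "TRUE", spacing included, as in my sample files. Make sure VMware is closed before rewriting the file.

```python
SHARED_LINES = ['disk.locking = "FALSE"', 'scsi0.sharedBus = "virtual"']
ANCHOR = 'scsi0:0.present = "TRUE"'


def add_shared_bus_settings(vmx_text):
    """Return vmx_text with the two shared-bus lines placed after the anchor.

    Lines that are already present are not added again, so running this
    twice leaves the text unchanged.
    """
    lines = vmx_text.splitlines()
    existing = {line.strip() for line in lines}
    missing = [setting for setting in SHARED_LINES if setting not in existing]
    if not missing:
        return vmx_text
    patched = []
    inserted = False
    for line in lines:
        patched.append(line)
        if not inserted and line.strip() == ANCHOR:
            patched.extend(missing)
            inserted = True
    if not inserted:
        # Assumption: the SCSI disk was already added through the VMware UI.
        raise ValueError("anchor line not found: " + ANCHOR)
    return "\n".join(patched) + "\n"
```

Read the .vmx file into a string, pass it through the function, and write the result back out. Keeping it as a plain string transformation makes it easy to try on a copy of the file first.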
Using the Virtual Systems
- Because the virtual systems are sharing SCSI drives, you need to be a bit careful about how you shut down and start up the nodes in order to prevent loss of data.
- Do not use the Suspend function of VMware. This tries to write the drive state, and will cause conflicts when you start the systems back up.
- Ideally, start up the node that owns the drive resources first and shut it down last.
- If you want to be fancy, configure the shared drive resources to have a specific preferred node with failback, and keep the drive configuration set to independent/persistent on that virtual system. Then configure those drive resources as independent/nonpersistent on the other virtual system(s). This should, in theory, prevent data loss and corruption. Note: You should write all data to the shared drive via the cluster name, but be sure that the node configured for drive persistency owns the resource before shutting down, otherwise all changes will get discarded. The more I think about it, configuring all of this may be a Bad Thing™...
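The ordering rule above (owner up first, owner down last) can be written down as a tiny helper if you keep a checklist or script for your lab. This is only a sketch; power_sequences is a made-up name and it does not talk to VMware at all, it just works out the order.

```python
def power_sequences(nodes, owner):
    """Return (startup_order, shutdown_order) for a list of node names.

    The node that owns the shared drive resources starts first and shuts
    down last; the remaining nodes keep their original relative order.
    """
    if owner not in nodes:
        raise ValueError(f"{owner!r} is not one of {nodes!r}")
    others = [name for name in nodes if name != owner]
    startup = [owner] + others
    shutdown = others + [owner]
    return startup, shutdown
```

With nodes NODE1, NODE2, and NODE3 and NODE2 owning the drives, this gives a start-up order of NODE2, NODE1, NODE3 and a shutdown order of NODE1, NODE3, NODE2. You would still power the nodes on one at a time, as noted earlier.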
Sample Configuration Files
Sample NODE1.vmx file:
config.version = "7"
virtualHW.version = "3"
scsi0.present = "TRUE"
memsize = "128"
ide0:0.present = "TRUE"
ide0:0.fileName = "Windows Server 2003 Enterprise Edition.vmdk"
ide1:0.present = "TRUE"
ide1:0.fileName = "auto detect"
ide1:0.deviceType = "cdrom-raw"
floppy0.fileName = "A:"
Ethernet0.present = "TRUE"
sound.present = "TRUE"
sound.fileName = "-1"
displayName = "NODE1"
guestOS = "winNetEnterprise"
priority.grabbed = "normal"
priority.ungrabbed = "normal"
powerType.powerOff = "default"
powerType.powerOn = "default"
powerType.suspend = "default"
powerType.reset = "default"
floppy0.startConnected = "FALSE"
sound.virtualDev = "es1371"
ide1:0.startConnected = "TRUE"
Ethernet0.addressType = "generated"
uuid.location = "56 4d 00 7e b6 56 03 e8-e6 31 c9 b2 26 67 6e 02"
uuid.bios = "56 4d 00 7e b6 56 03 e8-e6 31 c9 b2 26 67 6e 02"
ethernet0.generatedAddress = "00:0c:29:67:6e:02"
ethernet0.generatedAddressOffset = "0"
tools.syncTime = "FALSE"
redoLogDir = "."
undopoint.disableSnapshots = "TRUE"
ide0:0.mode = "independent-persistent"
scsi0:0.present = "TRUE"
disk.locking = "FALSE"
scsi0.sharedBus = "virtual"
scsi0:0.fileName = "Data.vmdk"
scsi0:0.mode = "independent-persistent"
scsi0:0.deviceType = "plainDisk"
scsi0:1.present = "TRUE"
scsi0:1.fileName = "Quorum.vmdk"
scsi0:1.mode = "independent-persistent"
scsi0:1.deviceType = "plainDisk"
Ethernet1.present = "TRUE"
Ethernet1.connectionType = "hostonly"
Sample NODE2.vmx file:
config.version = "7"
virtualHW.version = "3"
scsi0.present = "TRUE"
memsize = "128"
ide0:0.present = "TRUE"
ide0:0.fileName = "Windows Server 2003 Enterprise Edition.vmdk"
ide1:0.present = "TRUE"
ide1:0.fileName = "auto detect"
ide1:0.deviceType = "cdrom-raw"
floppy0.fileName = "A:"
Ethernet0.present = "TRUE"
sound.present = "TRUE"
sound.fileName = "-1"
displayName = "NODE2"
guestOS = "winNetEnterprise"
priority.grabbed = "normal"
priority.ungrabbed = "normal"
powerType.powerOff = "default"
powerType.powerOn = "default"
powerType.suspend = "default"
powerType.reset = "default"
floppy0.startConnected = "FALSE"
sound.virtualDev = "es1371"
ide1:0.startConnected = "TRUE"
Ethernet0.addressType = "generated"
uuid.location = "56 4d 00 7e b6 56 03 e8-e6 31 c9 b2 26 67 6e 02"
uuid.bios = "56 4d 00 7e b6 56 03 e8-e6 31 c9 b2 26 67 6e 02"
ethernet0.generatedAddress = "00:0c:29:67:6e:02"
ethernet0.generatedAddressOffset = "0"
tools.syncTime = "FALSE"
redoLogDir = "."
undopoint.disableSnapshots = "TRUE"
ide0:0.mode = "independent-persistent"
scsi0:0.present = "TRUE"
disk.locking = "FALSE"
scsi0.sharedBus = "virtual"
scsi0:0.fileName = "C:\VMWare\NODE1\Data.vmdk"
scsi0:0.mode = "independent-persistent"
scsi0:0.deviceType = "plainDisk"
scsi0:1.present = "TRUE"
scsi0:1.fileName = "C:\VMWare\NODE1\Quorum.vmdk"
scsi0:1.mode = "independent-persistent"
scsi0:1.deviceType = "plainDisk"
Ethernet1.present = "TRUE"
Ethernet1.connectionType = "hostonly"
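To double-check files like the two samples above before powering anything on, a small Python sketch along these lines may help. The function names are my own, and the parser only handles the simple key = "value" lines seen in these samples. Keys are lower-cased purely for convenience, since the samples mix Ethernet0 and ethernet0. Disk files are compared by file name only, so it cannot tell whether two nodes point at the same directory.

```python
import ntpath


def parse_vmx(text):
    """Parse 'key = "value"' lines into a dict with lower-cased keys."""
    config = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        config[key.strip().lower()] = value.strip().strip('"')
    return config


def scsi_disks(config):
    """Map each present scsi0 slot to its disk file name (directory dropped)."""
    disks = {}
    for key, value in config.items():
        if key.startswith("scsi0:") and key.endswith(".filename"):
            slot = key[: -len(".filename")]
            if config.get(slot + ".present", "").upper() == "TRUE":
                # ntpath handles Windows-style paths on any host OS.
                disks[slot] = ntpath.basename(value).lower()
    return disks


def check_cluster_nodes(vmx_by_node):
    """Return a list of problems found across the nodes' .vmx texts."""
    problems = []
    layouts = []
    for name, text in vmx_by_node.items():
        config = parse_vmx(text)
        if config.get("disk.locking", "").upper() != "FALSE":
            problems.append(f"{name}: disk.locking is not FALSE")
        if config.get("scsi0.sharedbus", "").lower() != "virtual":
            problems.append(f"{name}: scsi0.sharedBus is not virtual")
        layouts.append(tuple(sorted(scsi_disks(config).items())))
    if len(set(layouts)) > 1:
        problems.append("nodes map different disk files to the SCSI slots")
    return problems
```

An empty list means every node has both shared-bus settings and the same disk file name in each SCSI slot. It says nothing about whether clustering itself will come up cleanly.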