Wednesday, 3 October 2012

Mobility with Control

I attended a VMware conference just after VMworld 2012, and I appreciate VMware's vision and direction in the end-user computing area.

As we know, a major transformation is happening in the end-user computing space, but I believe it is still largely limited to PC-based end-user devices. By virtue of VDI we are able to consolidate consumer (employee) desktops and save on management, maintenance and hardware refresh costs. Adoption of VDI is still quite low, however, because of the latency it introduces, the user experience, and above all the ROI. Some customers are already running VDI instances for contractors and partners, and to some extent for FTEs, but a large percentage of customers are still evaluating the benefits of VDI.

As we know, modern users, or rather I would say "consumers", have come out of the PC world and into the phone/tablet era, and therefore they want their corporate and non-corporate applications to be available on their mobile devices. I read a case study published by a large automobile company that uses the iPhone for its sales and marketing department; they developed an iPhone application to distribute vehicle and parts information effectively. Some organizations are also promoting a BYOD (Bring Your Own Device) model to reduce investment in end-user devices.

Having said all of the above, we can visualize the end state of end-user computing in the coming years. So let's assume your company allows you to bring your own laptop, phone or tablet and use it for your routine work. That sounds excellent for you: you do not have to follow corporate IT policies and rules, and you can install any application you want, even if it is not licensed to your company. But if we look at it from the organization's perspective, you are introducing the following risks.

      • Breach of IT policy
      • Data security
      • Unmanaged devices - the IT department has no control over, or visibility into, what you are running on that device
      • Unknown risks introduced to the corporate network


and many more. I know some of you will say: well, it's my device, and if the company is willing to use it, why should I give them control? I agree, but let's also remember that it is you who wants mobility, correct? And how many of you are using a corporate laptop or tablet and actually following the guidelines? :) Having said that, we need a middle way that gives you full control over your device while, at the same time, your organization can control and manage it. The Windows folks will answer: let's have dual-boot systems... ha ha :) not a bad idea.

There is no need for dual boot; that's where VMware can help you. I like VMware's approach when it comes to neutralizing the underlying hardware. As we know, different OEMs ship different operating systems, and applications are also tied to platforms. VMware opens the door and gives you a platform to run on any OS or OEM device and access your applications. So what does VMware offer? There you go:

http://www.vmware.com/products/desktop_virtualization/horizon-application-manager/overview.html

VMware goes further and allows you to have multiple profiles on a single device: one for yourself, where you install what you like, and another for your corporation. One big question is how you manage this sprawl of devices. Well, the answer again is VMware.

Having said all of the above, it does not mean that other vendors are not evolving their products. However, I think VMware is putting its efforts in the right direction, and they really understand what the application delivery model of the future will be.

HTML5 is going to change a lot of things, and the reason I say this is that VMware Horizon is leveraging it.

Cheers,
Pankaj!!



  



Tuesday, 2 October 2012

Dell moves the needle in data center networking

Dell has been doing some great work in the data center networking area ever since it acquired Force10 (http://www.force10networks.com/). Recently Dell announced DFM (Data Center Fabric Manager), which shows the company's vision for data center networking and, most importantly, for SDN (Software Defined Networking).

Dell is capitalizing on some of the major challenges we face while designing and maintaining a data center network. Let's take a look at some of them.


      • Designing the core network :- It takes a huge amount of time to design a data center core network, and I am sure you have spent weeks and weeks in front of Visio and Excel.
      • Maintaining consistent software on switches :- Depending on the size of the organization, you may be maintaining hundreds of switches worldwide, or at least in certain regions. Have you checked whether they are all updated and running the latest binaries? The honest answer is: we know, to some extent.
      • Knowledge transfer :- If your network admin decides to move on, you are left with build and architecture documents that are mostly out of date. This is a known fact; look at your document repository and check when it was last updated :)
      • Lots of typing :- If you have to configure 100 switches with the same CLI commands, you may create a script or find some other way, but there is no standard way to make all network engineers and administrators follow the same standards while configuring switches. And if you have to validate the Layer 3 configuration on those switches, you again need to spend a week or more (a minimal scripted example follows this list).
      • Cabling information :- How many times have you asked for help tracing a cable? Almost every time, I know :). I have seen people maintaining Excel sheets for cabling information; even when cable dressing is done properly, you still need to look at the spreadsheet to find out what connects where. We have traced servers in the DC using MAC addresses, and I can tell you from experience that while it is easy, you need to log on to multiple switches to do it.
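To make the "lots of typing" point concrete, here is a minimal sketch of the scripted approach many of us hack together today; it is not part of Dell Fabric Manager. It assumes the Python netmiko library and reachable switches, and the hostnames, credentials and commands are made up for illustration.

# Minimal sketch: push the same config to many switches and verify a
# Layer 3 setting afterwards. Assumes the netmiko library and reachable
# switches; hostnames, credentials and commands are illustrative only.
from netmiko import ConnectHandler

SWITCHES = ["dc1-leaf01", "dc1-leaf02", "dc1-leaf03"]   # hypothetical inventory
CONFIG = [
    "ntp server 10.0.0.10",
    "logging 10.0.0.20",
]

for host in SWITCHES:
    device = {
        "device_type": "dell_force10",   # illustrative; pick the type matching your switch OS
        "host": host,
        "username": "admin",
        "password": "changeme",
    }
    conn = ConnectHandler(**device)
    conn.send_config_set(CONFIG)                     # same commands on every switch
    output = conn.send_command("show ip interface brief")
    print(f"=== {host} ===\n{output}")               # quick Layer 3 validation
    conn.disconnect()

This is exactly the kind of per-switch scripting that a fabric-wide management tool is meant to replace.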
Dell's answer to all of the above challenges is http://www.dell.com/us/enterprise/p/dell-fabric-manager/pd

Having said all of the above, I see great value in Dell helping customers with a tool-based approach not only to design the network but also to maintain it. Dell has published very limited information on the overall solution so far, so I am still not sure whether it will work with Cisco, Juniper or HP network stacks or only with Dell Force10.

I hope Dell has bigger plans in the networking space and will provide a unique solution for the new era of networking, which will be driven by software.

Software Defined Networking is going to be the future, and I will write a separate post on this topic.

Cheers!!
Pankaj


Monday, 1 October 2012

Getting Disk IOPS right

This is one of the most commonly discussed subjects in the storage space. Although guidelines and recommendations are always available from application vendors, it is worth spending some time to ensure you are designing the storage fabric and arrays correctly to provide the agreed capacity plus performance.

In my opinion, it is very important to first understand the nature of the application and its associated I/O pattern, plus its latency thresholds. At the same time, you need to know the complete architecture of your SAN fabric and storage arrays.

Having said that, what is the most important information I need to kick-start the exercise? Here you go!

      • Application I/O requirement :- The majority of application vendors (Microsoft, Oracle, SAP etc.) provide I/O sizing tools.
      • Latency threshold :- Some applications require the system to respond within a specified time.
      • I/O pattern :- A very important point to consider; you need to know whether the I/Os are random, very random or sequential. Also look at the de-duplication capabilities of the storage array, as these can turn sequential access into random access.
      • % Read/Write :- Another important point to consider; if there are 10 IOPS, you should know how many are reads and how many are writes.
      • I/O size :- You need to know the size of each read/write request (512 bytes, 4K, 8K etc.). This is considered the size of one transaction.
      • Sudden bursts & failure penalties :- Assume some buffer for peak hours and to maintain the same performance in the event of a few spindle failures in the RAID groups.
      • Future growth :- Think about future requirements as well; although you can always add capacity or performance later, spend some time understanding future requirements up front.
      • RAID penalties :- We all know the different RAID levels for performance and redundancy, but do we know the I/O penalties associated with each RAID level?
Let me emphasize RAID penalties a bit, because they are generally not discussed. So what are they? Depending on the RAID level, you have to commit a write request twice, three times or even more. Take RAID 10 as an example: you are writing twice for every write request, and therefore consuming two back-end write IOPS, which is not the case with RAID 0. There is no penalty when reading from a RAID volume. The following table shows the commonly quoted write penalty for each RAID level; a short worked example follows it.

      RAID level       Write penalty
      RAID 0           1
      RAID 1 / 10      2
      RAID 5           4
      RAID 6           6
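To make the penalty concrete, here is a minimal worked example in Python; the IOPS figure and read/write mix are made up for illustration.

# Worked example: translate front-end application IOPS into back-end disk IOPS
# once the RAID write penalty is applied. Numbers are illustrative.
RAID_WRITE_PENALTY = {"RAID 0": 1, "RAID 1/10": 2, "RAID 5": 4, "RAID 6": 6}

def backend_iops(frontend_iops, read_pct, raid_level):
    """Back-end IOPS = reads + (writes x RAID write penalty)."""
    reads = frontend_iops * read_pct
    writes = frontend_iops * (1 - read_pct)
    return reads + writes * RAID_WRITE_PENALTY[raid_level]

# 5000 front-end IOPS with a 70% read / 30% write mix
for level in RAID_WRITE_PENALTY:
    print(level, round(backend_iops(5000, 0.70, level)))
# RAID 5 example: 3500 reads + 1500 x 4 = 9500 back-end IOPS

The same 5000 front-end IOPS costs very different amounts of back-end work depending on the RAID level, which is exactly why the penalty has to be part of the sizing math.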
Now let's understand the layers of the storage fabric and arrays, which may help you think through some other aspects that can be important.

The following diagram shows the typical layers from the server down to the storage system.


So let's first understand how the physical disk type can significantly change the overall math. Transactions generated by the server pass through several layers and finally reach the physical disks, which are the actual laborers :)

I hope you are all aware of the different disk types and their performance characteristics, but for the sake of people who have not been exposed to arrays, let's quickly look at some of the important offerings:
      • SSD
      • FC
      • SCSI
      • SAS
      • NL-SAS
      • SATA
You can find more information on the web if you would like to dig deeper into how they differ. But since we are talking about IOPS, it is worth knowing the number of IOPS you can get from these disks depending on their RPM. The following table provides rough rule-of-thumb IOPS figures per drive.

      Drive type               Rule-of-thumb IOPS per drive
      7.2K RPM SATA/NL-SAS     ~75-80
      10K RPM SAS/FC           ~125-140
      15K RPM SAS/FC           ~175-180
      SSD                      thousands (varies widely by model and controller)
These are industry rule-of-thumb numbers; don't rely on vendor statements alone, as they often add 10-15% on top. To get a complete performance picture, you can visit www.spec.org, find published performance data from multiple vendors, and then make a decision.
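Building on the RAID penalty example above, here is a short follow-on sketch that turns back-end IOPS into an approximate spindle count; the headroom figure and per-disk rating are assumptions, not vendor data.

# Follow-on sketch: estimate how many spindles a RAID group needs, using the
# rule-of-thumb per-disk figures above. All inputs are illustrative.
import math

def disks_required(backend_iops, iops_per_disk, headroom=0.20):
    """Add ~20% headroom for bursts and rebuild/failure penalties."""
    return math.ceil(backend_iops * (1 + headroom) / iops_per_disk)

# 9500 back-end IOPS (the RAID 5 example above) on 15K drives rated ~180 IOPS
print(disks_required(9500, 180))   # ~64 spindles, before capacity is even considered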

I hope this post was informative, some interesting posts on storage are on the way.

Cheers!!

Pankaj



Friday, 28 September 2012

Blade System Series Part-2


I hope you read Part-1 of this series. Today we are going to talk about Cisco UCS (Unified Computing System). Cisco's journey in the x86 server market started back in 2009, and it is now giving tough competition to the other vendors in this space.

You will find all kinds of fancy articles and blogs on UCS, but I would like to keep it simple, because that's how I understand it.

So let's first understand the building blocks of UCS, and then we will deep-dive into the architecture, design and other topics.

Cisco UCS Components


1. Chassis :- Cisco offers the 6U 5100 series chassis, which can house 8 half-height or 4 full-height B-Series blades. A fully populated chassis requires roughly 3.5 to 4 kVA of power; if you want to calculate the power requirement accurately, click here to get the Cisco UCS Power Calculator. A total of 8 fans and 4 power supplies provide power and cooling to the system. From a density perspective the 5100 series does not look as attractive, as other vendors can offer 8 to 16 blades in a single chassis. Click here for more details.

Let's quickly understand the chassis architecture with the help of a picture.


              Front View of 5100 Chassis


                    
             Rear view

                                  
            
2. Fabric Extender :- The Cisco UCS chassis has 2 I/O slots (remember the I/O bays from my previous post). There is no intelligence built into the Fabric Extender, and therefore I call it a pass-through device. A pair of fabric extenders provides a total of 80 Gb of converged (LAN + SAN) bandwidth to the blade servers. With the release of the second-generation UCS hardware, the Fabric Extender (2208) can scale up to 8 x 10 Gb from each module, which means a total of 160 Gb of converged bandwidth to the blades. Each Fabric Extender provides 16 to 32 downlink ports to connect to the blade servers via the backplane, and 4 to 8 uplink ports to connect to the external world. The uplink ports can be populated with 10 Gb SFP+ transceivers. The short sketch below shows the bandwidth math.
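Just to show the bandwidth math rather than state it, here is a tiny sketch based on the port counts quoted above; treat the numbers as illustrative, not as a Cisco specification.

# Quick sketch of the bandwidth math in the paragraph above: per-chassis converged
# bandwidth and the downlink-to-uplink oversubscription ratio for a pair of fabric
# extenders. Port counts follow the figures quoted in the post.
def fex_pair_bandwidth(uplinks_per_fex, gbps_per_port=10, fex_per_chassis=2):
    return uplinks_per_fex * gbps_per_port * fex_per_chassis

def oversubscription(downlinks, uplinks):
    return downlinks / uplinks          # both port types run at 10 Gb here

print(fex_pair_bandwidth(4))            # 1st generation: 2 x 4 x 10 Gb = 80 Gb converged
print(fex_pair_bandwidth(8))            # 2208:           2 x 8 x 10 Gb = 160 Gb converged
print(oversubscription(32, 8))          # 2208 fully wired: 4:1 oversubscription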

3. Fabric Interconnect :- Too many interconnects :). Well, this is where Cisco differs from other vendors. The Fabric Interconnect is the first intelligent hop through which the UCS chassis accesses the external LAN or SAN fabric. Then you would ask: what is the Fabric Extender doing? :P Like I said, the extender is doing nothing; to put it simply, it is more or less like a port you see on a rack server, the only difference being that you can't plug in RJ-45. You need a twinax cable to connect the Fabric Extender to the Fabric Interconnect. So how is Cisco's offering different? The answer is simple: to a pair of Fabric Interconnects you can connect multiple UCS chassis, and since all of the management is done from this device, you do not need to think about multiple consoles. From the overall LAN infrastructure perspective, you are replacing the access-layer switches (the switches your servers would normally connect to, remember top-of-rack or end-of-row topologies). In simple words, if you go with HP you need to buy blade switches that fit into the I/O slots, then buy top-of-rack switches (Cisco/HP/Brocade), and then connect those to the aggregation layer. I hope you are aware of the conventional three-layer (access, aggregation/distribution and core) network architecture. You need a pair of Fabric Interconnects for redundancy.

Having said that, Cisco offers the following Fabric Interconnects:

  • 6100 series :- First-generation Fabric Interconnects, available in 2 models
      • 6120 :- 20 ports
      • 6140 :- 40 ports
  • 6200 series :- Second-generation Fabric Interconnects, available in 2 models
      • 6248 :- 48 ports
      • 6296 :- 96 ports
So let's understand, with a diagram, how and where this device fits into the UCS system.


As we can see in the diagram, the Fabric Interconnect is the first device the UCS chassis connects to, using twinax cables. There are no additional cables for FC, because all traffic between the UCS chassis and the Fabric Interconnect is converged (FCoE), which means a single cable carries both Fibre Channel and Ethernet packets, unlike traditional servers where you need additional cables for FC. Depending on the customer's requirements and the scale of the deployment, you can decide to split Ethernet and FC traffic at the Fabric Interconnect or continue with a converged network for LAN and SAN. Having said that, I have not shown any connections beyond the Fabric Interconnect; I will write another article on designing FCoE networks in the context of UCS and other blade centers.

4. UCS Blades :- Cisco offers a variety of blades. What Cisco markets as its differentiator is extended memory (a sales pitch), although HP can provide comparable or greater memory support with its Gen8 blades. For more information on Cisco's blade server offerings, click here.

5. I/O Devices :- A fancy term, but these are basically NICs and HBAs. All major vendors are now shipping servers with CNA cards. Many people get confused by CNAs; folks, it is simply a consolidated card with two controllers built onto it (one for the NIC, e.g. Broadcom, and one for the HBA, e.g. QLogic or Emulex). Those are the well-known vendors, but you may find CNAs with NIC/HBA silicon from others. A very common question is: if it is a single integrated card, how much bandwidth is allocated to the NIC and to the HBA? Very simple: on a 10 Gb CNA, 4 Gb is hardcoded for the HBA and the remaining 6 Gb is for the NIC. I know some people will say, "we will decide this allocation using hardware-based QoS in UCS" :)... my friend, go back and double-check: QoS reserves 40% of the 10 Gb, so you are only left with 60%, which is 6 Gb.

With the rise of server virtualization, a new trend has started: consolidating network, storage and NICs onto converged adapters. Having said that, remember the days when you needed 6 NICs on a server and there were no PCI slots left for an additional card? Talk to the people working on server virtualization projects; they need 8-10 NICs because of the number of VLANs and the traffic isolation required by the underlying virtual infrastructure (VMware FT, vMotion, Storage vMotion, Microsoft CSV, cluster heartbeat and many more).

So, in simple words, what we need is more I/O interfaces (ports) with fewer PCI or mezzanine cards. What is the answer? Build a mezzanine card and use PCIe functions (SR-IOV) to chop it into smaller pieces and present those to the OS or hypervisor as separate devices. Single Root I/O Virtualization is a huge topic in itself, with a lot of improvement happening in this space; I am planning to write an article purely focused on I/O virtualization.

So what does Cisco offer? They offer something called the "Virtual Interface Card", nicknamed the "Palo adapter"; it is a single card that can scale up to 256 NICs/HBAs, depending on the model. They also offer traditional cards with 2 and 4 ports. The complete portfolio is available here.

6. UCS Manager :- Last but not least, this is the software component that is built into the Fabric Interconnect and runs in HA mode. UCS Manager is built on a very modular architecture and is capable of abstracting hardware identity. I could write a two-page article on UCS Manager alone, but I think that is out of scope for this post.


So these are the major components that make up the Cisco UCS system. The Fabric Interconnect, UCS Manager and Palo adapters deserve another post, and hopefully you will see those soon.


Thanks for the time and I wish you great weekend ahead!

Cheers!!
Pankaj






                  



Wednesday, 26 September 2012

Blade System Series Part-1


I received a lot of requests to write about blade systems, so there you go! The way server technology is evolving, blade systems may soon become history, with some open server architecture adopted by all major hardware vendors. Our discussion is around x86 blade systems, not Power or SPARC.

Facebook and Google are two big examples where blade systems are not used; they are encouraging an open system architecture. For more details, have a look at the following articles.
I believe blade systems are going to run data centers for another few years, so there is no need to worry about it today, but it is good to keep an eye on future trends and research.

Alright!! All major hardware vendors (HP, Cisco, Dell, IBM, Sun/Oracle etc.) offer a wide range of blade systems with some common and some different feature sets. This post will focus on blade system architecture, followed by the offerings from different vendors in subsequent posts.

Any blade system you talk about is made up of more or less the following components:
      • Chassis :- Consider this an empty box, 8 to 10 rack units in height, which is the building block of the entire system.
      • Backplane :- This component is assembled inside the chassis to provide a high-speed I/O (input/output) path from the blade servers to the I/O bays.
      • Bays :- Consider these the slots where you install blades. Bays can be configured for full-height or half-height blades, or a mixture of both.
      • I/O interconnect bays :- These are again empty slots where you install switches (Fibre Channel or Ethernet) to connect the blade servers to external Fibre Channel or Ethernet networks. Unlike rack servers, which connect directly to the Fibre Channel or Ethernet network, blade servers connect to the high-speed backplane, which in turn connects to the I/O bays, and the switches installed in those bays provide the external connectivity.
      • Blades :- The actual compute power, which you install in the bays. They are called blades because of their dense form factor; they take very little space.
      • Fans :- No need to explain.
      • Power supplies :- No need to explain.
      • Management modules :- Consider this the device that allows you to manage all of the components above.
Too much text :). I think a picture speaks for itself, so let's understand all these components using the HP BladeSystem.



Other vendors follow more or less the same architecture, except Cisco. Having a management module and I/O switches in every chassis increases both management overhead and cabling; that is why Cisco splits the management module and I/O switches out of the chassis. They further merge the management module into the I/O switch itself to reduce the number of devices. This design increases efficiency by sharing the I/O switches across multiple chassis, which is not possible when the switches are mounted inside the chassis. So let's understand this design with an example.
  • Consider 2 Cisco chassis
  • Consider 2 other chassis that follow the integrated I/O switch and management architecture
      • For Cisco, you need only 2 I/O switches (redundant); you do not need management modules, as they are integrated into the I/O switches.
      • For the others, you need 4 I/O switches (2 in each chassis for redundancy) plus 4 management modules (redundant).
This is just an example; you will find a more detailed comparison in upcoming posts, and a quick sketch of how the device count scales is shown below.
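Here is a rough sketch of how the device count scales with the number of chassis under the two designs; the assumption of redundant pairs everywhere is mine, and real deployments are bounded by Fabric Interconnect port counts.

# Rough sketch of how management/switch device counts scale with chassis count
# under the two designs described above (redundant pairs assumed throughout).
def devices_traditional(chassis):
    io_switches = 2 * chassis        # 2 redundant I/O switches per chassis
    mgmt_modules = 2 * chassis       # 2 redundant management modules per chassis
    return io_switches + mgmt_modules

def devices_ucs(chassis):
    return 2                         # one redundant pair of fabric interconnects, shared

for n in (2, 4, 8):
    print(n, "chassis:", devices_traditional(n), "devices vs", devices_ucs(n))
# 2 chassis: 8 vs 2; 8 chassis: 32 vs 2 (until the fabric interconnect ports run out)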

Next Post is about Cisco UCS Blade system followed by other Vendors!!


Cheers!!

Pankaj




Tuesday, 25 September 2012

Battle Beyond Virtualization Part-1

With the release of the VMware vCloud Suite 5.1, a new war has kicked off between Microsoft and VMware in the comprehensive management, monitoring, automation and orchestration space. VMware released 5.1 after Microsoft announced Windows Server 2012 with all its exciting new features. Part of VMware's answer to the Microsoft release was to discontinue RAM-based licensing. Yes, you heard correctly: VMware is back to per-CPU licensing.

VMware did more: they added features like Storage vMotion, Hot Add, Fault Tolerance and vShield Zones to the vSphere 5.1 Standard edition at no extra charge; these were previously available only in more expensive editions. VMware also added its DR tool, vSphere Replication, to the Essentials Plus and higher editions at no extra charge.

Having said that, let's evaluate the Microsoft and VMware offerings in the private cloud space. Let's first take a look at the private cloud attributes defined by Microsoft, presented during WPC 2012 in June. I am going to evaluate both vendors against these attributes.





1. Pooled Resources:-


This is primarily about resource aggregation and maintaining sufficient resources to fulfill any on-demand request. So what is the definition of a resource? Is it just limited to compute?
No, we are talking about pooling the storage, network and security stacks as well.

Microsoft: -


Microsoft's resource pooling is primarily done at the SCVMM level, where you can define pools of Hyper-V clusters, IP/MAC addresses, storage and VIPs with load balancers. New service modeling has been introduced, which is seen as a good value proposition when defining service types, catalogs and so on. A Hyper-V cluster can scale up to 64 nodes, which will give you massive aggregated CPU and RAM capacity; however, resource aggregation is still not modular. The following diagram shows a logical representation of the aggregated/pooled resource, which is called a Cloud object in SCVMM 2012.


VMware:-


VMware has a different strategy: they further abstract virtual resources and aggregate them into something called a vDC (Virtual Data Center). RAM, CPU, network and storage are aggregated at the vCenter level and exposed as a vDC, and those vDCs are then aggregated again as resources in vCloud Director. vCloud Director has several other roles that are out of scope for today's discussion. The following diagram shows a logical representation of resource aggregation in VMware.





Conclusion :- I think the Microsoft solution is a good fit for SMB customers because of its single-layer aggregation, whereas VMware can scale up and produce massive pooled resources, which makes it suitable for enterprise customers and service providers. Although VMware only supports a cluster size of up to 32 nodes, the two-layer aggregation lets it present huge capacity, as the small sketch below illustrates.
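To illustrate the two-layer aggregation idea (and nothing more), here is a simplified Python model; the cluster names and capacities are invented, and this is not how vCloud Director actually stores its objects.

# Simplified model of the two-layer aggregation described above: clusters are
# pooled into virtual data centers (vDCs), and vDCs are pooled again at the
# vCloud Director layer. Names and capacities are made up for illustration.
clusters = {
    "gold-cluster-01":   {"cores": 256, "ram_gb": 4096},
    "gold-cluster-02":   {"cores": 256, "ram_gb": 4096},
    "silver-cluster-01": {"cores": 128, "ram_gb": 2048},
}

def pool(resources):
    """Aggregate a list of capacity dicts into one pooled capacity."""
    return {
        "cores": sum(r["cores"] for r in resources),
        "ram_gb": sum(r["ram_gb"] for r in resources),
    }

# Layer 1: vCenter-level pooling into vDCs
gold_vdc = pool([clusters["gold-cluster-01"], clusters["gold-cluster-02"]])
silver_vdc = pool([clusters["silver-cluster-01"]])

# Layer 2: vCloud Director pooling across vDCs
provider_pool = pool([gold_vdc, silver_vdc])
print(provider_pool)   # {'cores': 640, 'ram_gb': 10240}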



2. Self-Service


Self-service capability is a critical business driver that enables members of an organization to become more agile in responding to business needs, with IT capabilities that meet those needs in a manner that aligns and conforms with internal business IT requirements and governance. This means the interface between IT and the business is abstracted to a simple, well-defined and approved set of options, presented as a menu in a portal or available from the command line. The business selects these services from the catalog and begins the provisioning process, is notified upon completion, and is then charged only for what is actually used.



Microsoft: -


Microsoft offers a self-service portal (SSP) along with SCVMM. The SSP runs as a web service and can be installed on the VMM server; however, Microsoft recommends installing it on a separate machine for better performance.


VMware: -


VMware offers a self-service portal as part of vCloud Director, which does more or less what the Microsoft SSP can do. I have evaluated both vendors on a few parameters that I think are essential.


Parameter                                  Microsoft                       VMware
Design workflow                            Yes                             Yes
Role-based provisioning                    Yes                             Yes
Catalog templates                          Yes                             Yes
Extensible                                 Yes                             Yes
Integration with orchestration             Yes (works well with Opalis)    Yes
Allows partners to extend functionality    Yes                             Yes



Conclusion :- No significant difference; I think both offerings are equally good.




I hope this post was informative. I will talk about the other attributes in the next post.

Cheers!!
Pankaj



Monday, 24 September 2012

Manage unstructured Data


I hope you had great Weekend..

Today I am going to talk about unstructured data. I know most of you are already aware of the kind of data I am talking about...


Well, I am talking about data that is not stored in rows and columns (i.e. not in a typical database format). You can find a hundred different definitions, but in summary that is what it is.

I am talking about the following types of files:


      • Text Documents
      • Media files
      • Flash Files
      • JPEG/JPG images, and several other file types you can think of


Typically such files are hosted either on file servers or on SharePoint (Microsoft encourages SharePoint-based file sharing; I don't fully understand the reasoning, but let's not get into that :)).

There are plenty of solutions available to store such information, but this discussion is about file servers.


Whichever business vertical you look at, you are going to see a file server footprint with huge amounts of data.


Let me summarize some of the challenges customers typically face when they have a large file server infrastructure.



  • Access & rights management :- This is one of the pain areas. There is no built-in intelligence in Windows, or rather in NTFS, to help you correctly define an access, delegation and rights model for a distributed organization. Windows admins typically end up working with dozens of different consoles to manage permissions and rights.
  • Search & reporting :- How many times have you tried to write a PowerShell or VBScript script to find out who has what permission on which folder? I am sure we have all done that :) (a minimal sketch follows this list).
  • Backup & restore of permissions :- What if you are asked to restore only the permissions, not the complete data?
  • Migration of data :- Robocopy, FSMT and many other tools exist, so not a big challenge, right? Well, yes, it is a challenge when you are working on large file server migration or consolidation projects.
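As promised in the search & reporting bullet, here is a minimal sketch of such a report. It shells out to the built-in Windows icacls tool rather than any enterprise product; the share path and output file name are hypothetical.

# Minimal sketch of the "who has what permission where" report mentioned above.
# It uses the built-in Windows icacls tool; folder paths and the output file
# are illustrative only.
import csv
import subprocess
from pathlib import Path

ROOT = Path(r"\\fileserver01\shares")     # hypothetical file server root

with open("permission_report.csv", "w", newline="") as fh:
    writer = csv.writer(fh)
    writer.writerow(["folder", "acl"])
    for folder in (p for p in ROOT.rglob("*") if p.is_dir()):
        # icacls prints one line per ACE, e.g. DOMAIN\group:(OI)(CI)(M)
        result = subprocess.run(
            ["icacls", str(folder)], capture_output=True, text=True
        )
        for line in result.stdout.splitlines():
            line = line.strip()
            if line and not line.startswith("Successfully"):
                writer.writerow([str(folder), line])

Scripts like this work, but they do not scale well across thousands of shares, which is exactly where the enterprise-class tooling mentioned below comes in.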

Having said all of the above, let me propose some enterprise-class solutions that can help you to a great extent if you are managing a large file server infrastructure.

I am sure some of you might be using them already, or may have some other solution in place.

I hope this post was informative.


Cheers!!

Pankaj

















Saturday, 22 September 2012

Improve SLAs on EOS or EOL Hardware

This post is useful for people managing large data centers who have some server instances running on old hardware (EOL or EOS). If you think you don't, please take another look at your server inventory or CMDB :).

So what is the risk? Well, if the server hardware dies for any reason, you simply can't meet the server uptime SLA. Why? Some of you may say: well, our backup policy is really good, we will arrange new or old hardware and restore :). So far it sounds good, but let me ask you this: how many times have you successfully restored a system state backup onto different hardware? :) I know the success rate is fairly low, not because of the underlying hardware but because of the operating system (Windows folks are the main victims here).


So what is the solution?


      • Refresh the hardware :- The simple solution, but there is no budget left :) - very common.
      • Virtualize the server :- Excellent idea, but what if the application running on top of it is resource-hungry?
      • 1:1 virtualization :- I think this approach can help to some degree in remediating hardware failure, and at the same time the application gets the same level of resources thanks to the 1:1 ratio. You may expect some upfront investment, but that is not really the case, because standalone hypervisors are freely available.


Just to conclude: we are abstracting the hardware layer so that, in the event of a hardware failure, we can replace it with any make or model. One of my friends asked me: what if the server's VHD/VMDK is sitting on local disk and the disk fails? Very good question. Hardware abstraction gives you mobility, but it does not mean you stop taking backups of the server. The backup is different now, though: it is not the whole system state, it is a backup of a single file (VHD/VMDK) that contains the server and its installed applications. Restoring a single file is easier and faster than restoring a complete system state with thousands of files.

So what happens in the event of a hardware failure? This is what you need to do:

      • Bring in new or old hardware.
      • Install a standalone hypervisor.
      • Restore the VHD/VMDK (a minimal sketch of this step follows the list).
      • Power on the VM (you may need to restore some incremental files, depending on the backup policy).
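Here is a minimal sketch of the restore step above, assuming the backups are plain VHD/VMDK files sitting on a share; the paths and file-naming convention are hypothetical.

# Minimal sketch of the file-level restore idea above: copy the newest full
# VHD/VMDK backup plus any incrementals taken after it to the replacement
# host's datastore. Paths and naming conventions are hypothetical.
import shutil
from pathlib import Path

BACKUP_DIR = Path(r"\\backupsrv\vmbackups\legacy-app01")    # hypothetical backup share
RESTORE_DIR = Path(r"D:\VMs\legacy-app01")                  # new host's local datastore

RESTORE_DIR.mkdir(parents=True, exist_ok=True)

# Newest full backup (one file = the whole server, unlike a 1000-file system state)
full = max(BACKUP_DIR.glob("*_full.vhd"), key=lambda p: p.stat().st_mtime)
shutil.copy2(full, RESTORE_DIR)

# Any incremental files created after that full backup
for inc in BACKUP_DIR.glob("*_incr.avhd"):
    if inc.stat().st_mtime > full.stat().st_mtime:
        shutil.copy2(inc, RESTORE_DIR)

print("Restored", full.name, "to", RESTORE_DIR, "- attach it to the hypervisor and power on")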
I hope this is informative.


Cheers!!

Pankaj



Friday, 21 September 2012

What's New in Hyper-V 2012


Folks,

Microsoft has released a consolidated document covering all the new features in Hyper-V 2012.


A free Windows Server 2012 introduction guide is available at the following link



I am digging into all the new features; so far they appear to be average. I will share my findings and observations soon.


Cheers!!

Pankaj            

Server Sizing.. was really easy Part-2



I hope the first part of this series was informative and helped you to some degree. In this post I am going to discuss virtual host sizing. You can find tons of information on this topic on the web; what I really don't find is a consolidated view or approach. I am going to make this as simple as possible and hope it helps.
Having said that, let's get started. First of all you need to know the prerequisites before you start sizing virtual hosts.

  • Prerequisites
      • Identify workloads/servers which can be virtualized.
      • Peak-hour and average resource usage (CPU/memory/network/disk).
      • A resource usage report covering at least one month.
      • Consolidation ratio (how many servers are going to share the hardware).
      • Ad-hoc capacity demand.
There may be several other factors to consider, which I will talk about later in this post.
So how do you know what may or may not be virtualized, and what the current, peak-hour and average utilization is over a period of time? Let's talk about it one by one.

  • What can be virtualized :- Tons of case studies and guidelines are available on the web from VMware, Microsoft, Citrix etc. The question is: is there any thumb rule that can help me categorize workloads that are potential candidates for virtualization, even when I don't have a utilization report? Yes, there is. Some of them are:

      • Infrastructure servers (AD, DNS, DHCP, print servers, file servers etc.)
      • Web servers or intranet portal servers
      • Utility servers (WDS, AV distribution points, SCCM distribution points, Altiris distribution points, WSUS etc.)
      • Middleware servers
Alright, what about other servers (application servers, DB servers, messaging and collaboration servers etc.)? Well, that's where you need to do a utilization analysis. Now the challenge is: how do I get utilization reports from thousands of servers spread across multiple locations? You need an automated tool or solution that can do this work for you. Let's talk about them; these tools are smart and will tell you which workloads are good or bad candidates for virtualization.

  • Microsoft MAP Toolkit :- The Microsoft Assessment & Planning toolkit can help you discover servers and collect performance/resource utilization data from multiple machines.
  • VMware Capacity Planner :- Well, I am not a big fan of Capacity Planner, because you rely on VMware to do the analysis for you and publish the report on their dashboard, but many of you will like it.
  • PlateSpin Recon :- Excellent tool for analysis and reporting; I prefer using Recon.
  • IBM CDAT :- Although I have not used it, I have heard good feedback about this product. Worth trying.
You may find other tools and solutions on the web; I can't comment on those, as I have not used them.

Alright, so what next? Well, now you have complete data to process and to make a sizing decision. If the data collection and analysis were done properly, you should have the following data with you:

  • Total number of servers which can be virtualized
  • Total CPU allocation and actual usage on the servers identified as virtualization candidates
  • Total RAM allocation and actual usage
  • Disk size on the servers (SAN/DAS)
  • IOPS
  • Network bandwidth usage
   
So let's put in some data and understand it by example.
                  
  • Total number of servers which can be virtualized - assume 100
  • Total CPU allocation - 200 cores (assuming 2 cores per server)
  • Total RAM allocation - 1600 GB (assuming 16 GB per server)
  • Disk size on servers (SAN/DAS) - 6000 GB (assuming 60 GB per server)
  • Network bandwidth usage - 10000 Mbps (100 Mbps per server)
  • Consolidation ratio = 1:10 (just an example)

So for 10 virtual machines, the total capacity I need per host is as follows:

  • CPU: 2 x 10 = 20 cores (size on actuals; in the real world a server may have 2 cores allocated, but peak-hour utilization is often just 1 core or even half)
  • RAM: 16 GB x 10 = 160 GB (again, look at peak-hour utilization and adjust it with some buffer)
  • Disk: 60 GB x 10 = 600 GB (add some buffer for page file and swap space)
  • Network: 100 x 10 = 1000 Mbps (a Gigabit adapter)
Having said that, the size of the box will be:
                    CPU = 2 sockets (16 cores each)
                    RAM = 160 GB (add buffer)
                    Disk = 600 GB (add buffer)
                    NIC = dual-port 1 GbE
Then add 10-20% extra capacity for sudden bursts, depending on your budget, and do think about hypervisor overhead (the Hyper-V folks will know what I am talking about :)). A small sketch of this arithmetic follows.
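Here is the same arithmetic as a small Python sketch, with a 20% burst buffer rolled in; the per-VM numbers mirror the example above and are illustrative only.

# Sketch of the sizing arithmetic above: per-VM peak figures, a consolidation
# ratio, and a burst buffer rolled into one host specification. All numbers
# mirror the example in the post and are illustrative.
import math

def host_size(vms_per_host, per_vm, burst_buffer=0.20):
    """Return the capacity one host must provide for vms_per_host guests."""
    return {
        resource: math.ceil(value * vms_per_host * (1 + burst_buffer))
        for resource, value in per_vm.items()
    }

per_vm = {"cores": 2, "ram_gb": 16, "disk_gb": 60, "network_mbps": 100}

print(host_size(10, per_vm))
# {'cores': 24, 'ram_gb': 192, 'disk_gb': 720, 'network_mbps': 1200}
# In practice, size CPU and RAM on measured peak usage (often well below the
# allocation), and keep spare capacity for HA failover on top of this.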

When you are planning for HA or on-demand provisioning, you also need to think about failover and spare capacity.

This topic is really huge and you will hear a lot of debate about sizing, but I believe this post can help you to some extent.

I will talk about some of the other factors to consider in an upcoming post on virtualization planning.

Wish you all good weekend!!!

          Cheers!!
          Pankaj