
The Virtual Data Center

A Virtual Team Blog about the VDC and How To Get There

Archive for December, 2008

Why Did Microsoft Build Hyper-V? Let’s Turn Back The Clock…

December 23, 2008 By: Alan Category: data center, microsoft, systems, virtualization, vmware

I was reading Alex’s post on SearchServerVirtualization this morning and really focused in on this comment from John Humphreys at Citrix:

“If you think about what the operating system does, it only does three things. It provides an interface to the hardware [and] to the software and provides middleware,” Humphreys said. “If the hypervisor is doing one-third of that job, that’s a potential price cut for [Microsoft].”

This comment has had me thinking all morning, and now, as I sit down to enjoy some lunch, I’m really starting to noodle on this idea (the bouncing between past and present tense is intentional; this has been sitting with me all morning, seriously). First I started thinking about the OS vs. hypervisor idea. Do these two compete? Is it really the goal of the hypervisor to replace the OS, or at least to take on a third of its duties, as John suggests? I think the answer to both is “No.” I think the hypervisor’s role is to manage access to, and to a certain extent emulate, hardware on behalf of operating systems, not instead of them and not to replace them. Long term, I think this will let the OS focus more on managing resources for its own applications rather than also having to worry about where those resources come from. Separation of duties: let the OS manage the apps so they all play nice together, and let the hypervisor manage the hardware for the OS, while the OS still owns allocating the resources. Even better when OSes and hypervisors start working together à la paravirtualization, which is an interesting side note to all of this and may have answered my question, but before I get to that…

So then I latched on to this comment from Gordon Haff:

Haff believes Microsoft’s attitude is misguided. Rather than “stress out” about delivering a hypervisor, it “could have done more in the partnering arena and saved the world 75% of the ‘Microsoft doesn’t get virtualization’ stories and never lost a penny.”

…and that got me thinking about the “Why didn’t MS partner for the hypervisor?” idea. And here, I’m stumped. We all know MS isn’t a hardware company, and aside from the Xbox, they admit this as well. So if the hypervisor truly is meant to emulate hardware rather than software (i.e., the OS), why would MS want to build a virtual hardware platform? As far as the OS is concerned, the hypervisor is the hardware platform. Why buy and then build rather than partner? Why not work with a company that’s 100% focused on the hypervisor? I think we can also all agree that MS is a software company; its revenue comes from software licenses. They make apps, and operating systems to run those apps.

This raises the question: What was MS’s original benefit evaluation for building Hyper-V (knowing that Hyper-V grew out of Virtual Server, which MS acquired through its purchase of Connectix in 2003)? Why did they head down that path rather than working with VMware (remembering that until recently, Xen wouldn’t have been an option due to its *nix lineage)? MS makes money every time an enterprise installs VMware to run a Windows operating system. With issues like sprawl, MS stands to make a good amount of money on both OS and app licenses. And as we saw on Mike D’s blog (take it with a grain of salt, because Mike does work for VMware and has a vested interest in propagating the MS vs. VMware price war), every MS license except a full Datacenter license covers only a finite number of virtual instances of the OS, which makes sense.

Now please don’t misunderstand me: I am in no way suggesting that MS shouldn’t be in the hypervisor business or should kill Hyper-V. I’m simply thinking out loud. Long-time readers of this blog will know that I actually think MS is in the perfect position (long term) to completely change how we think about using discrete hypervisors, operating systems, and apps. The day I can run SharePoint on a bare-metal, bootstrapped hypervisor (i.e., hardware -> Hyper-V/WinKernel -> SharePoint) is the day I jump for joy. Now that we have Hyper-V, I think MS does have the ability to own the virtual stack and deliver true application virtualization better than anyone. And I also disagree with Haff’s comment about MS not getting virtualization. I think they get it extremely well; they’re just up against a technology originator (in the x86 space, anyway) and have to defend everything they do. Which is funny given the nature of this post, I guess. :)

But before we had Hyper-V… To me it seems they would have had nothing to lose by partnering with a hypervisor company like VMware and jointly developing a “WinOptimized Hypervisor” or something similar, much as they work directly with Intel on CPUs. Was their goal to own the stack for app virtualization without an OS? To create a paravirtualized running environment where Server 2008 and Hyper-V, for example, separate duties (which they’ve done on the hypervisor-only front)? Even if that comes at the huge cost of competing with VMware and having to become hardware experts for things like virtual switches and infrastructure? What’s in it for MS?

Anyone know the history (more recent than Connectix) and/or want to chime in? I’ll be the first to admit if I’ve got this wrong, so let the opinions fly! What am I missing? :)

VM Sprawl: Your Mom Was Right, It Is That Scary

December 22, 2008 By: Alan Category: data center, management, security, systems, virtualization

Ok, we have to talk for a second about Hoff’s comments on VM sprawl and its lack of security severity. I’m guessing he’s more focused on picking at the wound to spark conversation, and on driving an excellent discussion about terminology, than on actually suggesting VM sprawl isn’t a security threat. But I’ll bite for a minute and expand on my comments on his original post.

Yes, I do believe VM sprawl is a very serious threat, specifically because of what he calls out: lack of management. But to me it’s more than just a lack of management; the problem is the inability to even attempt to manage all VMs. The same problem exists with physical servers, too. If someone can get into the data center, bring in a box, find a switch port, and knows how that network is configured, she can install a rogue physical server. But the barrier is so, so much lower with VMs because everything is “flat” once it’s virtualized. Our management choices today, such as vCenter, are doing a good job of creating virtual security barriers analogous to a keycard on the data center door, but we’re not quite there yet.

This is like managing termites. You can do everything in your power to protect your house from those little invaders by managing your perimeter and doing your best to find where they’re coming in, but if your builder “accidentally” left an exposed beam in your root cellar and then built concrete blocks around it, that point of entry is going to be very difficult to find. Just like rogue VMs. It’s going to be extremely difficult to stop new VMs from spinning up at every point in your network. There are ways to do this at the network level in the data center, with IP registration, passive packet sniffing on the virtual switches, 100% control over application traffic routing, etc. But they’re not foolproof, and it becomes much more difficult in the desktop world.
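To make the registration idea a little more concrete, here’s a minimal Python sketch of the inventory check behind it. It assumes, hypothetically, that you already have a list of MACs seen on a virtual switch (from a passive capture or a MAC-table export) and an inventory of registered MACs; the function names are made up for illustration. The OUI prefixes are ones assigned to VMware, which helps guess whether an unknown device is a VM.

```python
# Hypothetical inputs: `observed` would come from a capture or a switch
# MAC-table export, `registered` from IT's device inventory.

# OUI prefixes assigned to VMware (used for VMware virtual NICs).
VMWARE_OUIS = {"00:50:56", "00:0c:29", "00:05:69", "00:1c:14"}


def normalize(mac: str) -> str:
    """Lower-case the MAC and use ':' separators so comparisons are consistent."""
    return mac.strip().lower().replace("-", ":")


def find_rogues(observed, registered):
    """Return (mac, looks_like_vmware) for each distinct unregistered MAC."""
    known = {normalize(m) for m in registered}
    rogues, seen = [], set()
    for mac in observed:
        m = normalize(mac)
        if m not in known and m not in seen:
            seen.add(m)
            rogues.append((m, m[:8] in VMWARE_OUIS))
    return rogues


if __name__ == "__main__":
    registered = ["00:50:56:aa:bb:01", "d4:be:d9:12:34:56"]
    observed = ["00:50:56:AA:BB:01", "00-0C-29-11-22-33", "d4:be:d9:12:34:56"]
    print(find_rogues(observed, registered))  # [('00:0c:29:11:22:33', True)]
```

Note what it can’t catch: a VM behind NAT never puts its own MAC on the wire, and MACs can be set by hand, so this is exactly the kind of not-foolproof control described above.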

Example: I use VMs for work on a daily basis. When I travel for work, those VMs come with me on my laptop. Inevitably, when I fire up my laptop and VMs on the hotel network and go through the “pay to play” registration process, they want to charge me for both my laptop and my VM. On a basic level, this makes sense: they charge and authenticate based on MAC address, I typically run my VMs in bridged mode, and each VM ARPs with its own MAC over the NIC attached to the hotel network. Cha-ching! But there’s an easy fix that I forget about until I see the second $19.95 charge for the same day on my bill: NAT mode. Hiding the VM behind the MAC the hotel already knows about saves me a call to the front desk.
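Here’s a toy model of that hotel portal in Python. It assumes (my assumption, not anything the hotel told me) that the portal bills a flat daily fee for every distinct source MAC it sees. It shows why bridged mode gets billed twice while NAT mode doesn’t, and why the portal can’t tell one NAT’d VM from a hundred. The MAC addresses are made up.

```python
from dataclasses import dataclass

DAILY_FEE = 19.95  # assumed flat fee per device, per day


@dataclass(frozen=True)
class Frame:
    src_mac: str  # source MAC the hotel network sees on the wire
    origin: str   # what actually generated the traffic: the host or a VM


def frames_on_wire(host_mac, vms, mode):
    """Frames the hotel sees from one laptop; `vms` maps VM name -> virtual NIC MAC."""
    frames = [Frame(host_mac, "host")]
    for name, vm_mac in vms.items():
        if mode == "bridged":
            src = vm_mac    # the VM's own MAC goes straight onto the hotel LAN
        elif mode == "nat":
            src = host_mac  # the host forwards the VM's traffic from its own NIC
        else:
            raise ValueError(f"unknown mode: {mode!r}")
        frames.append(Frame(src, name))
    return frames


def daily_bill(frames):
    """Hypothetical portal rule: flat daily fee per distinct source MAC."""
    return round(DAILY_FEE * len({f.src_mac for f in frames}), 2)


if __name__ == "__main__":
    laptop = "d4:be:d9:00:00:01"
    one_vm = {"work-vm": "00:0c:29:00:00:01"}
    farm = {f"vm{i}": f"00:0c:29:00:00:{i:02x}" for i in range(100)}
    print(daily_bill(frames_on_wire(laptop, one_vm, "bridged")))  # 39.9
    print(daily_bill(frames_on_wire(laptop, one_vm, "nat")))      # 19.95
    print(daily_bill(frames_on_wire(laptop, farm, "nat")))        # 19.95
```

The `origin` field is the part the portal never sees: from its point of view, the hundred-VM farm in NAT mode is just one laptop.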

This is a classic example of why VM sprawl is such a threat. I could easily set up hundreds of VMs on their hotel network and pay for only one access connection, and they would have no insight into what I was doing on their network with my VM farm (assuming I’m doing something malicious outside their network and encapsulating it in SSL). Sure, it would be slow as molasses, but I’m not concerned with speed; I want power. :)

This is the same threat enterprises currently face with VM sprawl. The specific threat is the ease with which users can spin up VMs: they don’t even need their own IP addresses, just a basic understanding of NAT. And for the most part, I don’t believe the threat of rogue VMs comes from traffic sent TO the rogue VM anyway. It comes from what that rogue VM is sending OUT on the network. What is it spewing across a corporate or production network? How hard is it to track down and shut off if it’s NAT’d? Even if it’s only up for 5 minutes, that’s enough. VDI Bombs only take seconds.

So as always, I thank Hoff for posing an excellent question and for the lively discussion on Twitter between him, me, and Lori. I do agree that a properly managed VM infrastructure, with tools like vCenter and third-party VM network monitoring tools, goes a long way toward staving off full-blown VM sprawl. But it’s the inability of these solutions to monitor everything that scares me the most. If nothing else, Hoff knows how to pick at the market’s shared security wounds. :)

Moving VMs to Find The Traffic: Constant “Road Work(load) Ahead”?

December 17, 2008 By: Alan Category: data center, management, storage, virtualization, vmware

I just read Andreas’ post about “virtual routing” and moving VMs to match and optimize traffic flows (yes, it’s a few days old, but catching up on email and blogs takes time after vacation), and I have to say my head started spinning – and not the good kind of spinning. The basic suggestion is that because we can move VMs around as needed, we should move them constantly to match traffic patterns and destinations on the network: as traffic patterns and routes change, we move VMs to follow the routes. Really? You want to, in essence, constantly move data centers to follow traffic patterns, over the same network that’s carrying said traffic?

Let’s look at this from another perspective: imagine you own a local chain of gas stations. These gas stations provide services (gas, food, coffee, etc.) to drivers as they travel along various highways. Drivers pull in when they need one of those services; otherwise they drive right past you. Your gas stations are located at various exits along major interstates as well as on smaller highways and byways throughout the state. Your business accelerates during peak travel times (rush hour, weekends, vacation periods) and drops off during slow times (2:00 AM, Christmas, etc.). In this slower economy, you’re looking for ways to maximize your revenue, and you’ve noticed an interesting trend: drivers are staying closer to home on the weekends and shopping locally, avoiding long trips to save money. You typically don’t have gas stations on local two-lane roads and you can’t afford to open more stations, so you’re faced with two options:

  1. Move your gas stations from the highways to local two-lane roads: Sure, it’s an option, but you would be sacrificing valuable peak highway revenue.
  2. Drive traffic from the towns to the highway and to your gas stations: Using signs and promotions such as “Free Coffee with Fill-Up!” - “2 Hot Dog Burritos for $1 with Large Coke!” - “5¢ Gas Discount for Local Residents!” - you can somewhat control how people find your gas station. You can control traffic flow to your service station.

What Andreas (along with Doug) is suggesting is a variation on #1: moving the gas station twice a day to accommodate traffic patterns while ignoring how and why traffic is driven to your station. The gas station sits at exit #421 during morning and evening rush hour and then moves to the intersection of Maple & Elm during off-peak hours. There’s no care or concern about the traffic patterns; the station just follows the flow.

This doesn’t resonate with me on many levels. First off, there’s the basic principle of moving a server to match the traffic rather than managing traffic to the server. Traffic is smaller and easier to manage than servers, even virtual servers. Managing constant VM movement with something like vCenter, clusters, resource pools, etc., is a surmountable task. Second, moving the VMs themselves has a cost with respect to management, downtime, and bandwidth (in the event Storage vMotion is used to follow the traffic). As an example, a quote from the post:

“VMotioning nodes (servers) to optimize the flow of traffic on the network”

Hmmm… How will moving a server optimize the traffic flow to that server? Again, back to our gas station example: moving the gas station won’t change the traffic patterns at all. And are we talking LAN or WAN? If we’re talking LAN, then there should already be traffic management devices in place to optimize access to the apps running on the VMs. If we’re talking WAN, moving VMs across the WAN on the fly has its own set of traffic and storage issues.

Your goal when pulling into a gas station is to request a service: gas, coffee, food.  Moving the gas station just addresses cars driving into the parking lot. As always, we should be focused on the service being delivered in addition to the service station.

Don’t get me wrong: mobility is one of the coolest benefits of VMs, and tools like vMotion and Storage vMotion are definitely changing the way we think about how and where we locate servers. But moving the servers instead of managing the traffic just because we can ain’t the way to go, IMO. Build the platforms for the services, build the infrastructure for the services, and then manage the traffic and access to the services. We have ADCs; let’s use them to manage the traffic instead of micro-managing the service stations. And this quote:

“…solve for least switch hops per flow”

I think there’s a whole ‘nother blog post on how this idea becomes moot with something like DVS across multiple physical platforms. But I’ll take a break for now. :)
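For what it’s worth, here’s a rough Python sketch of what “solve for least switch hops per flow” might look like for a single VM. It assumes a made-up three-switch topology, made-up host names, and made-up traffic weights; it’s my illustration of the idea, not Andreas’ actual proposal. It runs a breadth-first search for hop counts and picks the host that minimizes traffic-weighted hops.

```python
from collections import deque


def hops_from(graph, start):
    """Breadth-first search: number of switch-to-switch hops from `start`."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def best_host(graph, host_switch, peers):
    """Pick the host whose switch minimizes traffic-weighted hops to the peers.

    `peers` maps a peer's switch to a flow weight (relative traffic volume).
    Note what is NOT in this objective: the cost of the vMotion itself.
    """
    def cost(host):
        dist = hops_from(graph, host_switch[host])
        return sum(weight * dist[sw] for sw, weight in peers.items())
    return min(host_switch, key=cost)


# Hypothetical topology: one core switch with two access switches.
topology = {"core": ["acc1", "acc2"], "acc1": ["core"], "acc2": ["core"]}
hosts = {"esx-a": "acc1", "esx-b": "acc2"}
morning = {"acc1": 10, "acc2": 1}  # most traffic from users behind acc1
evening = {"acc1": 1, "acc2": 10}  # most traffic from users behind acc2

if __name__ == "__main__":
    print(best_host(topology, hosts, morning), best_host(topology, hosts, evening))
    # esx-a esx-b
```

The answer flips from esx-a to esx-b as soon as the traffic pattern flips, which is the twice-a-day gas station move in code form, and nothing in the objective accounts for the bandwidth or management cost of each migration.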

VMware View Offline Desktops: The Birth of the VDI Bomb?

December 04, 2008 By: Alan Category: data center

About two years ago, I was part of the team that helped design and build a technology showcase lab – a fully functioning lab housing all kinds of data center products and infrastructure and application technologies (think VMware, Microsoft, Dell, IBM, SAP, etc.). My role (besides cable monkey during install week ;) was to define and isolate the security risks associated with guest virtual machines and the segmentation (both physical and logical) of the virtual network and working spaces. Probably one of the coolest and most fun projects I’ve ever been involved in.

One of the areas I was most concerned about from the get-go was the portability and availability of guest VMs: users can bring in their own virtual machines to provide back-end services during their testing time at the DC, and those VMs run on shared virtual platforms. While all of these users are probably trustworthy people, we paranoid types know that trust is hard to come by in the security realm, so we needed to create a fully functional yet safe environment for these VMs to run in: safe for them, with minimal sharing of resources with other visiting users, and safe for us, since it’s our infrastructure. From day one I’ve been concerned about the check-in/check-out possibilities with virtual machines. It’s like building a secure box that’s ready to be racked, then, just before racking it, letting a stranger take it home for a week. The stranger brings it back, and you rack it, cable it, boot it, and let it run loose. Scary, huh?

We solved that problem in the showcase lab by using extremely tight networking configurations (walled gardens are your friends) and hardware isolation for our users and their specific virtual machine environments, with good management on top to make sure everything functions as expected.  To date it’s been an excellent solution.

Fast forward two years and meet VMware View: VMware’s VDI management solution that, in theory, should lead us down a path toward nearly complete mobile desktop computing and, as a side benefit, should drastically improve desktop security. With one huge exception: Offline Desktops. Still an experimental feature today, Offline Desktops allow end users to check out their VDI image (which is a complete VM: OS, apps, everything) and take it with them. Let’s say Alice (or maybe Mallory is a better example? ;) is in the office on Tuesday morning running her desktop remotely over the LAN. She checks out her desktop, hops on a plane, works on the flight, and then jumps on the VPN when she’s at her hotel and checks her desktop back in, changes and all. Rinse. Repeat. Cool idea with huge benefits, but…

The paranoid me writing this blog immediately started locking doors, lowering blinds, checking the lamps for bugs, etc. I see this as a huge security risk: once the desktop is out of the DC and out of IT’s control, it’s immediately suspect. Just like one of the primary use cases for NAC, where external, unknown laptops aren’t allowed to jack directly into the corporate network without authentication and some level of validation and sanity checking, any VDI image checked out via View and then checked back in should be treated as a new, external device requiring quarantine, inspection, sanitization, re-authentication, the whole ball of wax. The risks are even more severe for VMs that are allowed to leave and re-enter the data center, because of the shared resources those VMs run on. The risk is no longer limited to a segmented network; it now extends to the entire VDC platform.

Could this simple, highly usable, and beneficial feature open the door to VDI Bombs? A Frankensteinian marriage of trojan bombs (plant today, blow up on delivery or at a later time) delivered via VMs and targeting the host hypervisor, network, or CPU? Are IT departments going to build sophisticated quarantine environments for VDI VMs that are checked back in, and if so, will those tools be available soon enough to catch these bombs before time runs out? Or maybe this is a perfect opportunity to see VMsafe used as intended. The check-in/sync operation seems like the perfect time to scan the guest kernel, filesystem, VM tools, etc., via VMsafe.
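As a rough illustration of such a check-in gate, here’s a Python sketch of just the decision logic: compare a SHA-256 manifest taken at check-out with one taken at check-in, and route the image accordingly. Everything here is hypothetical: how the manifests get collected (VMsafe or otherwise), the path names, and the “sensitive” prefixes. It is not the View or VMsafe API. In keeping with treating every returning image as an external device, nothing is admitted straight back; matching manifests can’t prove an image is clean, so the best case is still inspection.

```python
import hashlib

# Hypothetical policy: paths whose modification while checked out is a red flag.
SENSITIVE_PREFIXES = ("Windows/System32/", "Program Files/")


def manifest(files):
    """Map each path to the SHA-256 hex digest of its contents (path -> bytes in)."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}


def diff(before, after):
    """Return (added, removed, changed) paths between two manifests, each sorted."""
    added = sorted(set(after) - set(before))
    removed = sorted(set(before) - set(after))
    changed = sorted(p for p in set(before) & set(after) if before[p] != after[p])
    return added, removed, changed


def check_in_decision(before, after):
    """Route a returning desktop image; nothing is admitted without inspection."""
    added, removed, changed = diff(before, after)
    touched = added + removed + changed
    if any(p.startswith(SENSITIVE_PREFIXES) for p in touched):
        return "quarantine"  # system files changed while outside IT's control
    return "inspect"         # still scan and re-authenticate before syncing back


if __name__ == "__main__":
    checked_out = {"Windows/System32/kernel32.dll": b"v1", "Users/alice/notes.txt": b"hi"}
    checked_in = {"Windows/System32/kernel32.dll": b"v1-patched", "Users/alice/notes.txt": b"hi"}
    print(check_in_decision(manifest(checked_out), manifest(checked_in)))  # quarantine
```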

I think a completely mobile virtual desktop is a great idea, even if it just pacifies us until true application virtualization becomes a reality, especially for IT client management.  But how much more management and security will be needed in the data center to make this a functional reality? Too much to justify the benefit? We’ll see. And while this type of concern has been around for a while, even when we were planning for somewhat mobile server VMs, VMware is the first company to make this easily accessible to end-users.  This scenario is applicable to any virtual machine architecture where VMs can come and go in and out of the data center. This is not restricted to VMware.

A lot to think about as VMs come full circle from end-user desktops with Workstation through the data center with ESX and now back to end-users with View, and hopefully a good chunk of that thinking involves security planning for portable VDI images. I already have ideas for bumper stickers: “Stop VDI Bombs!” :)