The Virtual Data Center

A Virtual Team Blog about the VDC and How To Get There
Archive for the ‘storage’

Application Virtualization: The Client Point of View

February 06, 2009 By: Alan Category: apple, cloud, data center, desktop, storage, systems, vmware, wax poetic No Comments →

I’ve hinted in the past at my ultimate application virtualization scenario – where I want the market to be for deploying and supporting remote applications for clients in the future. I’m still working on that giant whiteboard architecture map in my basement on what AppVirt looks like from the DC computing side, but today I want to write about the client side of that architecture. And while brevity has eluded me in all parts of my lingual life recently, I’m going to try to be succinct here (expect failure).

I attended the first of a four-part series on Cloud Computing last night held by the WTIA, which included an excellent presentation by Aaron Kimball from Cloudera on the basics of the cloud from the data point of view. Having retired the engineer title for marketing a few years ago, it always makes me happy to see someone who spends their career designing complex systems stand up and give an intro presentation that also includes the business benefit. So often engineers address the How rather than the What and the Why, and Aaron did an excellent job with the latter.

His presentation, along with others last night, got me thinking about what application virtualization in the cloud would look like to the client (and I’m not talking about GMail here).  Let’s look at a real example:

I bought a Mac a few months ago primarily to run Lightroom, so I spec’d out the Mac to go high-end because it would be running a very beefy photo application (along with Photoshop in the future I’m sure). The machine also had to run VMware Fusion in parallel (no pun intended; sorry to my Parallels friends) - I have a photo stitching application that’s currently Windows-only. Standard operation keeps me stable at 75% RAM and 40% CPU on average.

But what if I didn’t need to buy local computing resources and everything was processed remotely? Let’s jump ahead 10 years (a big leap, I know) and look at how this could be different if client apps were in the cloud.

I buy a local processing machine that’s drastically stripped down from my current Mac. I boot this machine to a web browser, where I head over to Adobe and say “This is Alan; I need to run Lightroom.” Adobe says “No problem. Let me push down the secure Lightroom App Shell. Ok, now you’re ready. Here’s a list of your albums pulled from Amazon S3.” I say “I need to process the latest batch of Mt. Baker HDR images.”  HDR images take a substantial amount of computing power to process, so Adobe comes back and says “No problem. I’m going to need 2gig RAM and a dedicated CPU core for this, but your monthly subscription only covers 1gig and .5 core. I’ll charge you $0.021/minute if you’d like to burst.”  I say “Great, let’s do it.”
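The burst conversation above boils down to a small piece of arithmetic. Here is a minimal sketch of it, assuming the figures from the scenario (a plan covering 1 gig of RAM and half a core, and a burst rate of $0.021/minute). The function names are made up for illustration and are not any real Adobe or AWS billing API.

```python
from decimal import Decimal

# Hypothetical figures taken from the scenario above; not a real Adobe or AWS price.
BURST_RATE_PER_MINUTE = Decimal("0.021")  # dollars per minute while bursting


def needs_burst(required_ram_gb, required_cores, plan_ram_gb=1, plan_cores=0.5):
    """Return True when a job asks for more than the subscription covers."""
    return required_ram_gb > plan_ram_gb or required_cores > plan_cores


def burst_cost(minutes, rate=BURST_RATE_PER_MINUTE):
    """Dollar cost of bursting for the given number of minutes."""
    return rate * Decimal(minutes)
```

With these numbers the HDR job (2 gig, one core) exceeds the plan, and an hour of bursting would come to $1.26.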

Amazon then pulls its own resources from AWS and starts distributing my HDR processing over thousands of machines/cores/RAM, all controlled from my local Lightroom App Shell. To me it’s displayed as one 8GHz core (remember, 10 years from now :) ), but in practice that value is an aggregate of distributed resources pooled from thousands of machines. I process my HDRs, they’re stored back in my Lightroom library which is managed by Adobe.com, I take my tiny laptop over to the client’s office, rinse, repeat.

But what if I don’t want the processing done exclusively in the cloud? I mean, after all, the above scenario is very similar to what I can do today with a web browser. What if instead my local computing platform is moderately beefy and I process everything locally, but as my machine starts to bog down from stitching together HDR panoramas it uses the Lightroom App Shell to request raw computing resources from AWS, and I become part of the distributed cluster? I scale and process background tasks remotely and only use my local resources for rendering the images to my display. My machine is part of the cluster and the line between local and remote processing becomes blurred. That’s some cool client-based application virtualization. The application running state is spread across elastic resources, of which my local resources are a part.
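To make the blurred local/remote line a little more concrete, here is a rough sketch of how an app shell might decide where work runs. Everything in it is an assumption for illustration: the Task fields, the 75% threshold, and the idea that anything touching the display stays local. It is not how Lightroom or AWS actually behave.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    cpu_cost: float    # fraction of the local CPU the task would consume
    interactive: bool  # True for work that must render to the local display


def place_tasks(tasks, local_capacity=1.0, offload_threshold=0.75):
    """Split tasks into (local, remote) lists.

    Interactive tasks always stay local. Background tasks stay local until
    the projected load would pass the threshold, then spill to the remote pool.
    """
    local, remote = [], []
    load = 0.0
    # Place interactive work first so rendering always gets local resources.
    for task in sorted(tasks, key=lambda t: not t.interactive):
        fits = load + task.cpu_cost <= offload_threshold * local_capacity
        if task.interactive or fits:
            local.append(task)
            load += task.cpu_cost
        else:
            remote.append(task)
    return local, remote
```

In this toy version a heavy panorama stitch gets pushed out to the remote pool while rendering and lighter background jobs stay on the local machine.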

True application virtualization will be a huge undertaking and this is simply one part and one idea. But why not think big? Go for the gusto I say. Oh, and there’s more here, but I’ve long since crossed the brevity line. Maybe more later.

Moving VMs to Find The Traffic: Constant “Road Work(load) Ahead”?

December 17, 2008 By: Alan Category: data center, management, storage, virtualization, vmware No Comments →

I just read Andreas’ post about “virtual routing” and moving VMs to match and optimize traffic flows (yes, it’s a few days old, but catching up on email and blogs takes time after vacation) and I’ll have to say my head started spinning – and not the good kind of spinning. The basic suggestion is that because we can move VMs around as needed we should move them constantly to match traffic patterns and destinations on the network. As traffic patterns and routes change, we move VMs to follow the routes. Really? You want to, in essence, move data centers to follow traffic patterns over the same network that’s supporting said traffic, constantly?

Let’s look at this from another perspective: imagine you own a local chain of gas stations. These gas stations provide services for users (gas, food, coffee, etc) as they travel along various highways. They come to you when they need something, said service, otherwise they drive right past you. Your gas stations are located along major interstates at various exits as well as smaller highways and byways throughout the state. Your business accelerates during high-peak travel times (rush hour, weekends, vacation periods) and drops off during slow times (2:00 AM, Christmas, etc). In this slower economy, you’re looking for ways to maximize your revenue and you’ve noticed an interesting trend: drivers are staying closer to home on the weekends and shopping local, avoiding long travel trips in order to save money. You typically don’t have gas stations on local 2-lane roads and you can’t afford to open more stations, so you’re faced with two options:

  1. Move your gas stations from the highways to local 2-lane roads: Sure, it’s an option, but you would be sacrificing valuable peak highway revenue.
  2. Drive traffic from the towns to the highway and to your gas stations: Using signs and promotions such as “Free Coffee with Fill-Up!” - “2 Hot Dog Burritos for $1 with Large Coke!” - “5¢ Gas Discount for Local Residents!” - you can somewhat control how people find your gas station. You can control traffic flow to your service station.

What Andreas (along with Doug) is suggesting is a morphing of #1: Moving the gas station 2 times/day to accommodate traffic patterns and ignoring how and why traffic is driven to your station. Gas station is on exit #421 during morning and evening rush hour and then is moved to the intersection of Maple & Elm during off-peak hours. There’s no care or concern about the traffic patterns, the station is just following the flow.

This doesn’t resonate with me on many levels. First off, there’s the basic principle: this moves a server to match the traffic rather than managing traffic to the server. Traffic is smaller and easier to manage than servers, even with virtual servers. Managing constant VM movement with something like vCenter, clusters, resource pools, etc, is a surmountable task. Second, the process of moving the VMs themselves will have a cost wrt management, downtime, and bandwidth (in the event Storage vMotion is used to follow the traffic). As an example, a quote from the post:

“VMotioning nodes (servers) to optimize the flow of traffic on the network”

Hmmm…How will moving a server optimize the traffic flow to that server? Again back to our gas station example: moving the gas station won’t change the traffic patterns at all. And are we talking LAN or WAN? If we’re talking LAN, then there should be traffic management devices in place already to manage optimized access to the apps running on the VMs. If you’re talking WAN, moving VMs cross-WAN on the fly has its own set of issues with traffic and storage.

Your goal when pulling into a gas station is to request a service: gas, coffee, food.  Moving the gas station just addresses cars driving into the parking lot. As always, we should be focused on the service being delivered in addition to the service station.

Don’t get me wrong: the mobility is one of the coolest benefits of VMs, and tools like vMotion and Storage vMotion are definitely changing the way we think about how and where we locate servers. But moving the servers instead of managing the traffic just because we can ain’t the way to go, IMO. Build the platforms for the services, build the infrastructure for the services, and then manage the traffic and access to the services. We have ADCs, let’s use them and manage the traffic and not micro-manage the service stations. And this quote:

“…solve for least switch hops per flow”

I think there’s a whole ‘nother blog post on how this idea becomes moot with something like DVS across multiple physical platforms. But I’ll take a break for now. :)

Provisioning For Election Application Traffic: Physical or Virtual, Old or New?

November 05, 2008 By: Alan Category: cloud, data center, management, storage, systems, virtualization No Comments →

I just read Rich Miller’s excellent blog post on sites scaling up for election traffic on Data Center Knowledge. As he points out in a post this morning, traffic hit record levels through Akamai’s CDN on election night. Some companies adequately planned for the burst, others didn’t. Spike management isn’t something new; however, we do deal with massively larger amounts of traffic than we have in the past, and our traffic usage is different. An election that everyone is watching is an excellent case study for these new traffic patterns. Me, I was sitting in front of the TV last night with my laptop open to MSNBC and Twitter, CNN mobile on the iPhone (primarily b/c I enjoyed seeing all those 404 and 500 errors that were showing up on CNN mobile; I know, I’m evil :) ). And I’m guessing this was the norm for people who use the Internet as their primary news source, like me. And the company responses that Rich covers, that did plan for the election spike and anticipated this flood of traffic, are interesting to me on two fronts:

  1. The lack of the V-word: Surprisingly in this day and age, none of the companies interviewed said they were relying on any virtualization solutions to scale for their traffic. All the remedies involved physical servers and physical space in a data center or with a hosting company. But with all the hype (and b/c it’s all I think about all day), I expected to see something about VMs or virtual storage as part of their spike management plans. On one hand this is encouraging that yes, the world can still spin without VMware or Microsoft virtual platforms. On the other, though, the election should serve as a perfect use case for provisioning and scaling using tools like virtual machines. This election is the best example I can imagine for “elastic computing,” and I’m surprised that it wasn’t first in responses from these companies. The ability to provision up and de-provision down as needed based on real-time, immediate traffic needs is the long term bread-and-butter for virtual platforms; companies like BlueLock and Joyent know this today and have built virtual hosting solutions around provisioning scale for both infrastructure and the applications. So why not use the virtual tools available today as part of your scaling and provisioning needs, rather than having to plan for a spike by pre-ordering batches of servers and waiting weeks for them to go online?
  2. Focus on the Apps: I have to say it warms my heart anytime someone mentions applications in the data center — I’m a softie for those darned apps! :) All of the examples in his post were customers who were expecting an increased need for their application: a political blog, a CDN that hosts political websites, Twitter, etc. Their concern isn’t with scaling core infrastructure (switches, routers, cables, trunks, etc), it’s with scaling the application platforms (servers, OS’, webservers, etc); again, a pointer to those hosting providers who have already built out virtual infrastructure platforms to allow VM and application scale and provisioning as needed.
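The “provision up and de-provision down” idea from the list above can be sketched in a few lines. This is only an illustration under made-up assumptions (a fixed per-VM capacity in requests per second and a flat headroom factor); it isn’t how BlueLock, Joyent, or anyone else actually sizes their platforms.

```python
import math


def instances_needed(requests_per_second, capacity_per_instance, headroom=0.2, minimum=1):
    """Number of VM instances needed to serve the load with spare headroom."""
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    target = requests_per_second * (1 + headroom) / capacity_per_instance
    return max(minimum, math.ceil(target))


def scaling_plan(traffic_samples, capacity_per_instance, headroom=0.2):
    """Yield (needed, delta) per traffic sample.

    A positive delta means provision more VMs, a negative one means
    de-provision; the first sample is compared against zero running VMs.
    """
    current = 0
    for rps in traffic_samples:
        needed = instances_needed(rps, capacity_per_instance, headroom)
        yield needed, needed - current
        current = needed
```

Feed it an election-night traffic curve and the deltas swing positive as the spike builds and negative as it fades, which is the whole point compared with pre-ordering servers weeks ahead.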

The phrase “old school” kept popping up in my head as I was reading the post. Are these companies sticking with what works, what’s tried and true, by provisioning physical servers well in advance of the expected spike? Or does this show that virtual platforms are still in their infancy and companies that know how to plan for and manage massive amounts of application data traffic don’t yet trust virtual solutions? I would probably lean somewhere in the middle, and until virtual platforms and dynamic provisioning prove themselves, we’ll continue to see dynamic provisioning in the VDC as more of a test case rather than a real-world use case.

VMware Buys Blue Lane - But For VM Security or VM Patching…or Both?

October 10, 2008 By: Alan Category: data center, management, security, storage, systems, virtualization, vmware No Comments →

As was reported by virtualization.info, VMware announced its acquisition today of Blue Lane. Blue Lane has some cool stuff that falls into two categories: Virtual Machine security (VirtualShield) and server security & patch management (ServerShield), virtual or physical. There’s been a good bit of chatter today on how this acquisition is going to play with VMsafe, and I think there’s definitely some obvious overlap between what Blue Lane is doing with VirtualShield and what VMware wants people to do with VMsafe. I actually assumed that this was the path Blue Lane was headed down; they already support inter-VM network traffic protection, but they don’t currently do it on the hypervisor level. It was the next obvious step to port VirtualShield directly to ESX with VMsafe.

However, I’m more intrigued by this purchase from the in-line server patching stuff with ServerShield. ServerShield is basically an application proxy that sits in front of an app server and inspects all traffic destined for a particular application running on a particular host. From what I understand, though, it doesn’t inspect that traffic in the same way an I[D|P]S does; it doesn’t use signatures looking for attack patterns, it’s looking for patterns that match exploits that have been remedied with application and/or server OS patches. It’s like the flip-side of the coin: IPS looks for the attack in the payload, ServerShield looks for a way to attack that’s been identified by a patch that’s already been released. So they take a new patch, ask “What does this patch do and what does it change?” and they look for pattern data in the application flow that matches that delta behavior. At least that’s my understanding from talking to them about a year ago. :)

So this has some really cool possibilities for VMsafe and using the virtual network to protect against both app and OS exploits, but it also sounds really cool for VMware’s VDDK (announced earlier this year). Just off the top of my head I can see the ServerShield management component in the VirtualCenter GUI, ServerShield itself inspecting all traffic on the virtual switch at the hypervisor level, and then throwing an event when it detects a payload that’s targeted at a known exploit destined for a VM. It:

  1. Corrects that traffic so it’s no longer a threat
  2. Throws an event to VC that there are machines that would have handled that traffic that aren’t patched correctly
  3. VC starts pulling machines out of clusters, mounts the VMDKs with VDDK, patches are applied off-line by ServerShield
  4. Freshly patched VM is powered back up, returned to the cluster, and on to the next one until that particular problem is corrected across the board. VC could then keep a real-time patch level list of every VM, and as new traffic came through, it could tell ServerShield “Hey, these guys are current, so you can opt out of inspecting if you wish.” Yes, I know security heads are exploding all over the place, but I’m just talking technical ability rather than how a security policy should be managed, etc.
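The four steps above can be sketched as a simple loop. To be loud about it: this is purely a toy model of the idea. The VM class, the patch identifiers, and both functions are invented for illustration, and none of them correspond to a real VirtualCenter, VDDK, VMsafe, or Blue Lane API.

```python
from dataclasses import dataclass, field


@dataclass
class VM:
    name: str
    patches: set = field(default_factory=set)
    in_cluster: bool = True


def handle_exploit_traffic(vms, required_patch):
    """Run the remediation loop for one detected exploit pattern.

    Returns the ordered event log. Each step is a stand-in for a real
    product action (traffic correction, a management event, an offline
    disk patch), not an actual API call.
    """
    log = [f"corrected traffic targeting {required_patch}"]
    unpatched = [vm for vm in vms if required_patch not in vm.patches]
    if unpatched:
        log.append(f"event: {len(unpatched)} VM(s) missing {required_patch}")
    for vm in unpatched:
        vm.in_cluster = False           # pull the VM out of its cluster
        vm.patches.add(required_patch)  # patch applied while the VM is offline
        vm.in_cluster = True            # power back up and rejoin the cluster
        log.append(f"patched {vm.name}")
    return log


def can_skip_inspection(vms, required_patch):
    """True when every VM is current, so inspecting for this exploit is optional."""
    return all(required_patch in vm.patches for vm in vms)
```

Once the loop has worked through every unpatched VM, the patch-level list says everyone is current and the inspection for that particular exploit could be switched off, security heads exploding notwithstanding.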

It’s like a mash-up of dynamic provisioning, dynamic security, and dynamic patching for both the OS and app. Gets me all tingly! :)

Storm in The Storage Cloud…And It Flooded My Office

July 22, 2008 By: Alan Category: data center, management, storage, systems No Comments →

For some strange reason I choose to work even when I’m not working and have what some could call two jobs (well, one real job and another job that supports itself, anyway). My day job is what you see here: helping to change the way people think about and implement virtualization in their data center. My moonlit weekend job that doesn’t quite pay any bills (yet) is professional photographer. To date, these two worlds haven’t had any relation or overlap at all (although I did take the main picture you see in the blog header, which is a shot of freshly installed data center racks, so maybe that counts). Last night, however, my separate professional lives collided in a storm I hadn’t witnessed before, and I felt rogue waves on both sides.

As has been widely reported, Amazon’s S3 service was down for a good while on Sunday, July 20th. I don’t personally or directly use their service (although I do know of individuals who are looking into it as a safe and secure backup system), however I do use SmugMug as my back-end photo “store” and processing lab for the pro photog business and (as I learned on Monday) SmugMug uses S3 for all of my valuable and (hopefully someday) bill-paying photography. I have my own local backup systems that I manage (more on that some other time) and I don’t rely on SmugMug as my content storage house, but I do rely on them to make my photography available for purchase (always available, always fast, and always securely). But I don’t want to know what they use in their data center or how they manage and store my content; I only want to know that my content is safe and available. And all was good in the fields until Sunday evening when S3 went down, and took SmugMug (and all of the pro photographers they support) down with it (details available here).

So on Monday morning I began looking into the S3 outage for the Day Job and just happened to see that my Night Job was impacted by the outage, and that got my head all spinning. It got me spinning primarily because this is the 2nd outage that S3 has suffered in the past few months, and that’s big business for a lot of people beyond SmugMug. For most normal enterprise IT shops that kept their storage in-house, a critical outage and unavailability of dynamic data twice in such a short amount of time would cause the higher-ups to start asking questions about what, why, who, and how to make sure this never happens again. I imagine those types of questions are happening for large-scale S3 customers, like SmugMug, all around the globe.

The other reason I got so spun up was the response, or lack thereof, from Amazon. As far as I can tell, the first reports came into their public forum from customers in droves reporting a “Service Unavailable” error message. Shouldn’t Amazon have known before the customers, and shouldn’t they have done a better job (beyond posting a green/yellow/red dot on a service page) notifying all their customers? Does SmugMug really want to find out about a storage outage when they try to retrieve my galleries for a prospective customer, or would they prefer to know beforehand so they don’t let their app spin indefinitely? Or here’s a novel idea: Perhaps Amazon should architect their storage service in an HA/DR manner so that a customer never sees a “Service Unavailable” message, or more importantly so that their service never goes down beyond a simple blip while service requests are redirected. Highly available data centers ain’t rocket science, and since Amazon is building VDCs like nobody’s business, perhaps they should already know this…

I don’t want to be too short or critical here, but if anything, Amazon is blazing a trail in the Clouds on how not to build a production-class cloud service. The core requirement for offering a cloud service has got to be availability above everything else. Otherwise there’s no reason for a customer to trust the service with their mission-critical data. My Night Job customer persona is hoping that SmugMug is really sticking it to Amazon for taking them down (and at the same time making sure all their own eggs don’t fall off the tree when the S3 nest crashes again).

I think I’m going to write Amazon’s regular storefront customer service and ask for a credit in their MP3 download store to compensate for all the money I lost by not being able to sell my photographs while S3 was down. Think they’ll go for it? ;)

Is There Really a Need or Market for OVF? Do the Apps Care?

June 17, 2008 By: Alan Category: data center, microsoft, storage, systems, virtualization, vmware No Comments →

Once my brain starts spinning around one particular topic, it basically stays there until I’ve reached some sort of mental closure. Now that closure may be achieved when I’ve reached a personal conclusion, or it may come when I throw my arms up and say “I’m out!” Either way, I need to keep processing something until I’ve reached one of those points. This week, it’s the overlay between VMs, VMDK/VHDs, and OVF, which I started a few days ago with this post. So here I am again, and now I’m wondering if there’s even a point to OVF.

As reported at Server Virtualization, the DMTF is saying that OVF is still a few months away from a standard. Now a few months may not seem like a long time, but there are going to be some big movements between now and then, depending on which projects release on time and which are delayed. Most importantly, we should see Hyper-V moving out of beta. Chris Wolf has some interesting comments on that post and to be honest…I just don’t get all the fuss. Mounting VMs so any hypervisor can run an application? Telling the hypervisor what the packaged VM OS needs in order to optimize the running environment? It just seems like too many steps to get to the endgame. Here are two examples where I think OVF is just going to get in the way:

  • Converting VM Disk Images: Chris states that even with OVF (right now it’s just a packaging framework standard, not a runtime standard) an interim conversion step will most likely be required. So when I grab a pre-packaged VM appliance from VMware wrapped in OVF and decide I want to run that on Hyper-V, I’m going to have to extract it, do a full conversion (which amounts to running P2V, or V2V in this case), and then re-wrap it before I drop it on Hyper-V. Hypervisors are platforms, and every hypervisor is going to run VMs in a different manner. Running 2008 in Hyper-V probably won’t take as many hypervisor resources as running it on VMware simply because 2008 shares kernel code with Hyper-V. So my app on 2008 will require X resources for Hyper-V but Y resources for VMware. Then what’s the point in packaging that data with the app? Is OVF going to have an XML switch element that contains running information for every possible hypervisor scenario? If I’m that concerned with app performance, I’m going to build the VM and app natively and not trust two translation layers (the original hypervisor the VM was built for and the OVF management metadata to allocate resources for me). To me, this is pushing OS virtualization further away from production environments.
  • Lose the OS: OVF and virtual appliances deal with full-blown VMs; the OS, the disk image, and the running hypervisor. But we’re making such strides towards true application virtualization these days, I don’t see the need to focus on a solution that’s only concerned with bloated OS and disk images, pieces of the virtualization puzzle that only exist to run applications. I’d much rather see work being done on something like APS (Application Packaging Standard). Unlike VMs and VMDKs/VHDs, applications truly are portable. I’m looking forward to the day when we don’t need a full-blown OS in the data center, where we run apps directly on a hypervisor, where a packaging solution like APS can really be valuable. But even until then, something like APS has more value today because it’s “future proofing” our solutions for tomorrow. With VMs, both the OS and hypervisor have to become hardware resource managers. With true application virtualization, you only need the app hypervisor to manage your resources.

So why OVF? Why not let the DC admins worry about the hypervisor and OS installs? These are platform decisions, just like choosing HP vs. Dell. You don’t see Microsoft offering a pre-built 2003 image installed on a Dell with a conversion utility to run it on HP hardware (more on that in a few days as I start to drift into the problems with P2V…stay tuned) because that wouldn’t make any sense. OVF is the exact same thing: it’s a system to create a full-blown OS image and move it around the heterogeneous data center. But why? Every OS install is different, and it will continue to be that way until we get rid of the OS, even with major band-aids like OVF. Focus on the application and why you’re virtualizing in the first place. Right now, OVF appears to be an extra step we don’t need.

Moving Beyond VMDKs and VMFS: Symantec Veritas VM Storage Solution

June 10, 2008 By: Alan Category: data center, management, microsoft, storage, systems, virtualization 1 Comment →

I know, it’s been quiet around here lately. I’ve been heads down in research and haven’t had a lot of time to digest new ideas and pick up old ones (or respond to Hoff :) ). But the Symantec+Veritas+Xen announcement today gave me good reason to poke my head up, log in, and revisit an idea I’ve been working on for a while.

When I’m not noodling virtualization and data centers, I’m a semi-professional photographer; most of my evening/weekend free time in 2008 has been spent on building a solid digital workflow from shooting to selling. One of the technology choices I’ve implemented in the middle of my workflow is converting from my camera’s proprietary RAW format to Adobe’s open Digital Negative file format, DNG. I made this decision because I don’t want to be stuck fighting with specific RAW format support down the road, and I can edit and process files natively in DNG using Adobe tools, which I use already. So you could say I’ve “Future Proofed” my workflow for tomorrow, even if I change cameras or processing software.

So the above started me thinking a few months ago about virtual machine filesystems and what’s going on under the hood. The whole model today seems silly to me: I have a VM guest that has a filesystem, say NTFS; that filesystem is packaged in a proprietary flat-file format for the virtual hypervisor platform, VMDK in VMware’s case, and that flat file is stored on top of another filesystem (VMFS, again for VMware), which is vaguely connected to the host OS filesystem (let’s say ext3 for ESX), and then layered on top of yet another file management tool with iSCSI, only to finally be stored on a real disk on a SAN. So my ‘index.html’ file hosted on my guest IIS VM has to go through approximately 6 virtual<->physical layers before it’s physically stored on a device that can manage that block data, such as VMware’s DRS. That seems excessive and very inefficient.

So that brought up two questions:

  1. Why can’t we have a solution like DNG for VM filesystems that will allow me to take that flat-file and manage it as part of my virtual infrastructure on any platform? Granted we do have the OVF, but this is mostly a transport and packaging solution; it’s not a running solution. And yes, I know that disk formats are part of each hypervisor’s secret sauce, but that’s exactly what I’m suggesting: Let each vendor continue to refine their secret sauce (just like Nikon and Sony will continue to refine their particular flavors of RAW), but let me store and run that secret sauce in an open utility so I can simply click to push a VMDK from VMware to Hyper-V.
  2. Beyond the above, do we even need that extra secret sauce filesystem layer at all? Why can’t ESX write directly to a block device in my SAN over iSCSI without storing my guest filesystem in a flat package that’s stored on VMware’s proprietary VMFS file system, only to be pushed out over an iSCSI network? If we’re going through so much trouble to virtualize the OS anyway, why can’t we simply write a translator that takes the guest block read/write request and maps that to a physical block on our remote SAN/NAS disk? Basically, let’s virtualize the guest filesystem. Think about the I/O we could save…may make those VMware storage benchmarks near moot. Which leads me to…
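The “translator” in the second question can be sketched as a toy, assuming the simplest possible layout: each guest gets one contiguous range of physical blocks on the remote disk. The class name, the in-memory dictionary standing in for the SAN, and the contiguous layout are all assumptions for illustration. A real implementation would have to deal with allocation, caching, and the iSCSI transport itself.

```python
class GuestBlockTranslator:
    """Map guest logical block addresses straight to physical SAN blocks."""

    def __init__(self, physical_start, block_count):
        self.physical_start = physical_start  # first SAN block reserved for this guest
        self.block_count = block_count        # number of blocks the guest may address
        self.san = {}                         # in-memory stand-in for the remote block device

    def translate(self, guest_lba):
        """Return the physical block for a guest block, with bounds checking."""
        if not 0 <= guest_lba < self.block_count:
            raise IndexError(f"guest block {guest_lba} out of range")
        return self.physical_start + guest_lba

    def write(self, guest_lba, data):
        """Store data at the physical block that backs the guest block."""
        self.san[self.translate(guest_lba)] = data

    def read(self, guest_lba):
        """Read a guest block; unwritten blocks come back empty in this toy model."""
        return self.san.get(self.translate(guest_lba), b"")
```

The point of the sketch is that a guest read or write becomes a single address calculation rather than a trip through a flat file sitting on another filesystem.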

The Symantec announcement. If it’s true, it’s exactly what we need in the VM storage space and a no-brainer. Anything that removes middleman components while also adding manageability is a great thing. We remove moving parts, which in itself can remove complexity, and then obfuscate the management (or probably integrate it into an existing management platform)…we move <this much> closer to a functional VDC. And since it’s Xen based and is purported to work with Hyper-V, this could also be a driver in customers choosing one hypervisor platform over another. If it delivers and specifically doesn’t work with VMware ESX, VMFS, Virtual Center, etc, then it could end up being a platform driver for Xen and Hyper-V. We’ll have to wait until the end of the year to see if this solution delivers as promised.