Research Papers

HAVEN: Holistic load balancing and auto scaling in the cloud

Load balancing and auto scaling are important services in the cloud. Traditionally, load balancing is achieved through either hardware or software appliances. Hardware appliances perform well but have several drawbacks. They are fairly expensive and are typically bought to handle peak load even if average volumes are only 10% of peak. Further, they offer little flexibility for adding custom load balancing algorithms, and they lack multi-tenancy support. To address these concerns, most public clouds have adopted software load balancers, which typically also include an auto scaling service. However, software load balancers do not match the performance of hardware load balancers. To avoid a single point of failure, they also require complex clustering solutions, which drive their cost even higher.

Effective switch memory management in OpenFlow networks

OpenFlow networks require a logically centralized controller to install flow rules in limited-capacity switch memory (Ternary Content Addressable Memory, or TCAM, in particular). A controller can manage the switch memory in an OpenFlow network through events that the switch generates at discrete time intervals. Recent studies have shown that data centers today can have up to 10,000 network flows per second per server rack. Increasing the TCAM size to accommodate this large number of flow rules is not a viable solution, since TCAM is costly and power hungry. Current OpenFlow controllers handle this issue by installing flow rules with a default idle timeout, after which the switch automatically evicts the rule from its TCAM. This results in inefficient use of switch memory for short-lived flows when the timeout is too high, and in increased controller workload for frequent flows when the timeout is too low.
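The idle timeout trade-off described above can be illustrated with a minimal sketch. This is a simplified model for a single flow, not the approach taken in the paper: the function name `simulate_idle_timeout` and its parameters are hypothetical, and the model assumes a rule is evicted exactly `idle_timeout` seconds after the last matching packet, whereas real switches check timeouts at their own granularity.

```python
def simulate_idle_timeout(packet_times, idle_timeout):
    """Model one flow's rule under a fixed idle timeout (illustrative only).

    packet_times: sorted packet arrival times in seconds for a single flow.
    idle_timeout: seconds of inactivity after which the rule is evicted.
    Returns (controller_installs, tcam_occupancy_seconds).
    """
    installs = 0         # times the controller had to (re)install the rule
    occupancy = 0.0      # total seconds the rule sat in switch memory
    installed_at = None  # install time of the rule currently in memory
    last_seen = None     # arrival time of the most recent packet

    for t in packet_times:
        if last_seen is None or t - last_seen >= idle_timeout:
            # The rule is absent (never installed, or already evicted),
            # so this packet misses and the controller installs the rule.
            if last_seen is not None:
                occupancy += (last_seen + idle_timeout) - installed_at
            installs += 1
            installed_at = t
        last_seen = t

    if last_seen is not None:
        # Account for the final rule, which lingers until it times out.
        occupancy += (last_seen + idle_timeout) - installed_at
    return installs, occupancy


# A frequent flow (one packet every 5 s) under a low and a high timeout.
frequent = [0, 5, 10, 15]
low = simulate_idle_timeout(frequent, idle_timeout=2)
high = simulate_idle_timeout(frequent, idle_timeout=10)

# A short-lived flow (two packets, then silence) under a high timeout.
short_lived = simulate_idle_timeout([0, 1], idle_timeout=60)
```

In this model, the low timeout forces a controller install for every packet of the frequent flow, while the high timeout keeps the short-lived flow's rule in memory long after its last packet.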

NCP: Service replication in data centers through software defined networking

Enterprise systems often use replication technology to keep multiple instances of applications or data stores synchronized. Existing data replication techniques are primarily storage based and are usually applied at coarse time granularity. Furthermore, they cannot be used to achieve service replication, which requires state synchronization in addition to disk synchronization.

VMPatrol: Dynamic and automated QoS for virtual machine migrations

As more and more data centers embrace end host virtualization and virtual machine (VM) mobility becomes commonplace, we explore its implications for data center networks. Live VM migrations are considered expensive operations because of the additional network traffic they generate, which can impact the network performance of other applications in the network, and because of the downtime that applications running on a migrating VM may experience. Most virtualization vendors currently recommend a separate network for VM mobility. However, setting up an alternate network just for VM migrations can be extremely costly and thus presents a barrier to seamless VM mobility. VM migrations should therefore be orchestrated in a network-aware manner, with appropriate QoS controls, so that they do not degrade the network performance of other flows while still receiving the bandwidth they need to complete within the specified timelines.

Remedy: Network-aware steady state VM management for data centers

Steady state VM management in data centers should be network-aware so that VM migrations do not degrade network performance of other flows in the network, and if required, a VM migration can be intelligently orchestrated to decongest a network hotspot. Recent research in network-aware management of VMs has focused mainly on an optimal network-aware initial placement of VMs and has largely ignored steady state management.

iTrack: Correlating user activity with system data

Human error has been identified as one of the major factors behind system outages and network downtime in a number of previous research papers and surveys. Gartner statistics show that almost 40% of unplanned application downtime is caused by operator errors such as unintentional changes to network configuration that result in a network outage, patch installations, and service restarts. Yet system admin activities on production IT systems are rarely logged and monitored properly. Existing tools to track user activities produce either too much information, without any hint of a potential outage scenario, or too little information to be useful.

CrossRoads: Seamless VM mobility across data centers through software defined networking

Most enterprises today run their applications on virtual machines (VMs). VM mobility, both live and offline, can provide enormous flexibility and also bring down operational expenditure (OPEX). However, both live and offline migration of VMs are still limited to a local network because of the complexities associated with cross-subnet migration. These complexities mainly arise from the hierarchical addressing used by various layer 3 routing protocols. For cross data center VM mobility, virtualization vendors require that the network configuration of the new data center to which a VM migrates be similar to that of the old data center. This severely restricts widespread use of VM migration across data center networks. For offline migration, the above limitations can be overcome by reconfiguring IP addresses for the migrated VMs. However, even this effort is non-trivial and time-consuming, as these IP addresses are embedded in various configuration files inside the VMs. As enterprises grow and new data centers emerge in different geographic locations, there is a need to interconnect these data centers in a way that allows seamless VM mobility.

Living on the edge: Monitoring network flows at the edge in cloud data centers

Scalable network-wide flow monitoring has remained a distant dream because of the strain it puts on network router resources. Recent proposals have advocated the use of coordinated sampling or host-based flow monitoring to enable a scalable network-wide monitoring service. As most hosts in data centers get virtualized with the emergence of the cloud, the hypervisor on a virtualized host adds another network layer in the form of a vSwitch (virtual switch). The vSwitch now forms the new edge of the network.

Identity: A data center network fabric to enable co-existence of identical addresses

Seamless virtual machine (VM) mobility within and across data centers brings its own set of problems. One of these problems is enabling co-existence of identical or overlapping layer-2 and layer-3 addresses in a single data center network. The motivation for this problem comes from a number of compelling scenarios. These include the need to back up and restore, or replicate, multi-tier applications that comprise multiple VMs, either from one data center to another or within the same data center. This incurs significant network reconfiguration costs, as IP addresses of replicated VMs may clash with other existing IP addresses in the data center or with other replicas of the same VMs. Similarly, when multiple data centers need to be consolidated through a single data center interconnect, their address ranges may overlap. Lastly, cloud providers need to ensure that customers can back up and restore their VMs, whose addresses may conflict with those of other customers' VMs, without time-consuming network reconfiguration.
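To make the overlap problem concrete, the following minimal sketch checks whether the layer-3 address ranges of sites being consolidated clash. It only detects the conflict and is not the mechanism the paper proposes for letting such addresses co-exist. The function name `find_overlaps`, the site names, and the subnets are hypothetical examples; the check itself uses the standard library `ipaddress` module.

```python
import ipaddress
from itertools import combinations


def find_overlaps(site_subnets):
    """Return the cross-site subnet pairs whose address ranges overlap.

    site_subnets: dict mapping a site name to a list of CIDR strings.
    Each result is a tuple (site_a, subnet_a, site_b, subnet_b).
    """
    tagged = [
        (site, ipaddress.ip_network(cidr))
        for site, cidrs in site_subnets.items()
        for cidr in cidrs
    ]
    clashes = []
    for (site_a, net_a), (site_b, net_b) in combinations(tagged, 2):
        if site_a == site_b:
            continue  # only compare subnets from different sites
        if net_a.version != net_b.version:
            continue  # an IPv4 range cannot overlap an IPv6 range
        if net_a.overlaps(net_b):
            clashes.append((site_a, str(net_a), site_b, str(net_b)))
    return clashes


# Hypothetical address plans for two data centers being consolidated.
plans = {
    "dc_a": ["10.0.0.0/16", "192.168.1.0/24"],
    "dc_b": ["10.0.128.0/17", "172.16.0.0/12"],
}
clashes = find_overlaps(plans)
```

Here `10.0.128.0/17` in `dc_b` falls inside `10.0.0.0/16` in `dc_a`, so without a fabric that tolerates overlapping addresses, one of the two sites would need to be renumbered.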
