Instrumentation, Measurement, and Control for the Cloud (IMC2)

EPSRC-funded Project (EP/L005255/1)

Project Summary

The Internet landscape is changing rapidly, from a decentralised paradigm in which distinct services were offered by different providers in a fully distributed way, to a unified ICT environment where data, storage, and processing resources are co-located in the Cloud and offered alongside connectivity. Although Cloud services and the underlying communication infrastructures are built on top of commodity Internet mechanisms (transport protocols, IP switching, multipath routing, etc.), it is becoming apparent that the performance-agnostic and slow-converging operational assumptions of today’s data communications are challenged by the new unified technological and business model. Massive overprovisioning of fully distributed resources that are managed over distinct and often long timescales (e.g., traffic aggregates over backbone networks) is not sustainable in an environment where connectivity and system resources need to be managed by a single ICT provider over a centralised infrastructure and over very short timescales. Cloud providers need to maximise return on investment from their infrastructures through rapid provisioning and elastic resource management, offering predictable services while operating at higher utilisation thresholds.

To achieve these goals, this project will design and develop an always-on Instrumentation, Measurement, and Control (IMC) framework that will dynamically and adaptively provision server and network resources in a unified manner and over short timescales. Evidence has shown that the distinct control loops typically employed to manage different resources over different timescales can themselves degrade performance in unified Cloud environments. For example, network-agnostic placement and migration of virtual machines can itself cause congestion in the underlying Data Centre topology. We will therefore revisit the one-dimensional, static or pseudo-random control loops that are typically employed over Cloud topologies, and develop an adaptive closed-loop system that will manage server and network resources synergistically, over short timescales and based on temporal topology-wide performance. In doing so, we will exploit often controversial concepts, such as non-shortest path routing to improve load balancing while meeting flow completion deadlines, and network-aware dynamic virtual machine migration, to demonstrate the feasibility and the benefits of combinatorial resource provisioning in achieving global performance optimisation and in increasing the usable capacity of future networks and services. One of the key aims of the proposed research is to investigate and demonstrate the applicability of measurement-based processes to control and admit resources in a unified manner and at appropriate, short timescales. Through the necessary system and network node instrumentation, we will devise a logically centralised, closed-loop measurement and control architecture that will be an integral part of the underlying infrastructure’s data forwarding operation.
The long-term impact of such an endeavour will be to revisit the currently disjoint data and control planes in packet communications, and to transform next-generation networked infrastructures from performance-agnostic to adaptive and self-managed, through synergy across the different layers and planes of the architecture.
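
As a purely illustrative sketch of the difference between network-agnostic and network-aware virtual machine placement described above, the following toy example selects a migration target in two ways. It is not the project's method: the host names, link names, utilisation figures, and both selection functions are hypothetical, and a real system would obtain these values from live measurement rather than constants.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    cpu_free: float      # fraction of CPU currently free (hypothetical measurement)
    path_links: tuple    # links the VM's traffic would cross from this host


def pick_host_network_agnostic(hosts, cpu_needed):
    """Choose the feasible host with the most free CPU, ignoring the network."""
    candidates = [h for h in hosts if h.cpu_free >= cpu_needed]
    return max(candidates, key=lambda h: h.cpu_free)


def pick_host_network_aware(hosts, link_util, cpu_needed, vm_traffic):
    """Choose the feasible host whose most loaded link stays least utilised."""
    def bottleneck(h):
        # Utilisation of the busiest link on the path once the VM's traffic is added.
        return max((link_util[l] + vm_traffic for l in h.path_links), default=0.0)

    candidates = [h for h in hosts if h.cpu_free >= cpu_needed]
    return min(candidates, key=bottleneck)


# Hypothetical measured link utilisation (fraction of capacity).
link_util = {"tor1-agg": 0.85, "tor2-agg": 0.30, "agg-core": 0.50}

hosts = [
    Host("h1", cpu_free=0.70, path_links=("tor1-agg", "agg-core")),
    Host("h2", cpu_free=0.40, path_links=("tor2-agg", "agg-core")),
]

agnostic = pick_host_network_agnostic(hosts, cpu_needed=0.25)
aware = pick_host_network_aware(hosts, link_util, cpu_needed=0.25, vm_traffic=0.10)

print("network-agnostic choice:", agnostic.name)
print("network-aware choice:", aware.name)
```

In this toy topology the network-agnostic policy picks the host with the most spare CPU even though its uplink is already heavily utilised, whereas the network-aware policy accepts a busier server to avoid the congested link.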

The proposed research will be carried out at the University of Glasgow, and experiments will be conducted over a purpose-built programmable Cloud services testbed infrastructure, supported partly by EPSRC’s First Grant scheme and partly through a generous contribution from the host institution. The research will be conducted in close collaboration with Onyx Group, Microsoft Research, and JANET(UK).

For more information, contact Dimitrios Pezaros.