NSF Grants
Project Title : I3C: An Infrastructure for Innovation in Information Computing
Funding Agency : National Science Foundation
The CISE Research Infrastructure grant will be used for
setting up a 128-node (64+64) Linux cluster, a 32-node Sun cluster,
a storage area network (SAN) and a wireless infrastructure for supporting
coordinated research of 25 investigators.
The multidisciplinary research, which spans many engineering and
science disciplines, focuses on three core areas that are essential to advance
the state-of-the-art in cluster computing. The first area is applications,
which is the main driving force of the infrastructure.
Here our strength lies in three specific application domains: Computational Science,
Digital Immortality, and Bioinformatics.
In addition, the clusters will be used extensively for simulation of various complex systems.
The second area of research is System Software,
which examines how various cluster resources can be effectively
used for improving the delivered performance.
Finally, we investigate many low-level Architectural issues
that are essential to provide high and assured performance. The research is proposed
in a cohesive manner explaining the interaction among the three main areas.
Project Title : An Integrated Approach for Quality of Service in Cluster
Networks
Funding Agency : National Science Foundation
With the increasing use of cluster systems for a variety of
interactive applications, predictable communication performance, or Quality
of Service (QoS), has become a major concern. The motivation of this
research was to design a cluster communication infrastructure to support
different applications with varying QoS requirements. In particular, the
research focused on using the wormhole switching paradigm, which has been
used in designing many commercial routers such as Myricom's Myrinet and
the IBM SP2, for providing QoS guarantees. The research activities include (1)
wormhole router architecture design and evaluation to provide predictable
and high performance; (2) network interface and software messaging layer
support to inject/eject traffic into/from the network as per the QoS
requirements; and (3) CPU scheduling mechanisms that propagate these
capabilities up to the application level.
Two main contributions of this research are the following: We have shown how
the commercially successful wormhole routers can be extended with minimal
hardware modifications to support QoS in clusters. The proposed router,
called MediaWorm, includes two modifications to provide predictable
performance. First, the virtual channels (VCs), which were originally
proposed for improving performance by time-multiplexing different
packets/flits on the same physical channel, were statically allocated to
different traffic classes. Second, the traditional FCFS or round-robin (RR)
scheduling was replaced by VirtualClock scheduling to provide
rate-proportional bandwidth to different traffic classes. It was shown that a
cluster designed with the MediaWorm routers can provide soft guarantees to
MPEG-2 media streams in the presence of both best-effort and media traffic.
Next, the design was extended for allocating the VCs dynamically to
different classes of traffic.
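
As an illustration of the rate-proportional scheduling described above, the following is a minimal Python sketch of the VirtualClock discipline applied to traffic classes. It is not the MediaWorm router logic itself: the class name `VirtualClockScheduler`, the per-class rates, and the use of a single software priority queue in place of hardware virtual channels are all assumptions made for this example. Each arriving flit is stamped with max(now, previous stamp for its class) plus 1/rate, and flits are served in increasing stamp order.

```python
import heapq


class VirtualClockScheduler:
    """Illustrative VirtualClock scheduler (hypothetical names, not MediaWorm code)."""

    def __init__(self, rates):
        # rates: traffic class -> reserved rate in flits per time unit (assumed values)
        self.vtick = {cls: 1.0 / rate for cls, rate in rates.items()}
        # Per-class virtual clock, advanced by one vtick for every arriving flit
        self.aux_vc = {cls: 0.0 for cls in rates}
        self._heap = []
        self._seq = 0  # tie-breaker: equal stamps are served in arrival order

    def enqueue(self, cls, flit, now):
        # A class that has been idle restarts from real time, so it cannot
        # bank unused bandwidth; a backlogged class falls behind by one
        # vtick for each flit it queues.
        stamp = max(now, self.aux_vc[cls]) + self.vtick[cls]
        self.aux_vc[cls] = stamp
        heapq.heappush(self._heap, (stamp, self._seq, cls, flit))
        self._seq += 1

    def dequeue(self):
        # Serve the flit with the smallest virtual-clock stamp
        if not self._heap:
            return None
        _, _, cls, flit = heapq.heappop(self._heap)
        return cls, flit
```

With a media class reserved at four times the rate of a best-effort class, and both classes backlogged, the scheduler serves roughly four media flits for every best-effort flit, which is the rate-proportional sharing that FCFS or round-robin arbitration does not provide.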
Second, for end-to-end QoS assurance, we have designed a QoS-capable NIC
based on the virtual interface architecture (VIA) design paradigm. The
design involves three modifications to the original VIA: (i) Inclusion of a
prioritized doorbell scheme for informing the NIC of the arrival of different
traffic classes; (ii) Partitioning of the NIC buffer into a number of VCs
compatible with the router design; and (iii) Providing rate-proportional
scheduling of the VCs to inject flits into the network. Co-evaluation of the
QoS-capable routers and QoS-capable NICs revealed that QoS provisioning in
the NIC is more critical than that in the router/network. We believe that
our NIC study is the first effort in highlighting the importance of network
interface design for QoS support.
Project Title : Scalable and Efficient Scheduling Techniques for Clusters
Funding Agency : National Science Foundation
The main motivation of this research is to design scalable
and efficient scheduling algorithms for clusters. The proposed research
addresses three closely intertwined issues for developing such algorithms.
First, an in-depth evaluation of the existing communication-induced
scheduling schemes will be done using real workloads to investigate their
performance, implementation complexity, scalability, and fairness
properties, and thereby to better understand the strengths and weaknesses of
these policies. Second, since these scheduling algorithms rely on a
low-latency, user-level communication mechanism, various design issues in
implementing these algorithms will be explored. The last component of the
research will examine how the scheduling algorithms can be tailored to
facilitate predictable performance in clusters. The research is being
conducted in collaboration with Penn State's Center for Academic
Computing (CAC) group and the Lawrence Livermore National Laboratory (LLNL).
We have developed a generic framework for implementing all prior
coscheduling techniques on a Linux cluster with minimal overhead. The framework has
been implemented on a 16-node Linux cluster connected through Myrinet. Three
prior coscheduling techniques (DCS, SB and PB) have been implemented and
analyzed using this framework. We have proposed two new coscheduling
algorithms, called Co-ordinated Coscheduling (CC) and Hybrid coscheduling,
which can outperform not only all prior coscheduling techniques but also the
traditional batch scheduling (PBS). In particular, we believe that the
proposed Hybrid scheme has the potential to replace the currently used batch
scheduling techniques in clusters. We are currently evaluating all the
coscheduling techniques on large clusters with LLNL and CAC workloads to
study the scalability issue.
Project Title : QoS Provisioning in InfiniBand Architecture (IBA) for
System Area Networks
Funding Agency : National Science Foundation
The InfiniBand™ Architecture (IBA) is envisioned
to be the default communication infrastructure for future System Area
Networks (SANs) or clusters. The InfiniBand Trade Association (IBTA) has
released the first IBA specification, and is currently augmenting it with
enhanced features such as congestion management, Quality of Service (QoS)
and router management. However, the IBA design is currently in its infancy
since the released specification outlines only the high level
functionalities, leaving it open for the research and industrial community
to explore various design alternatives. In particular, QoS support in IBA is
targeted as a critical issue to support many server applications with
real-time constraints. However, design of IBA for QoS is an unexplored area
of research and is the main focus of this investigation.
The main objective of this ongoing research is to explore various design
alternatives for providing high and predictable performance in IBA-style
SANs. It covers the design of an IBA fabric, design of congestion avoidance
techniques, developing multicasting algorithms, and developing
fault-tolerant techniques. The research is aimed at developing and releasing
a complete IBA simulator platform for various types of research on IBA.
We have proposed four different techniques for improving the performance of
IBA-style SANs. These include using the Internet-compliant Shortest Path
First (SPF) routing algorithm instead of the UP/DOWN routing, a
fault-tolerant version of the SPF routing to provide automatic path
migration (APM) in the presence of faults, a packet dropping mechanism to
avoid network congestion, and efficient hardware implementation of
multicasting. These are developed based on the IBA specification. We have
developed an IBA simulator and are currently augmenting the simulator with
additional features. We have also developed an energy model and an energy
optimization technique, called dynamic link shutdown (DLS), for the cluster
network. It is shown that the DLS scheme can provide better
performance-energy tradeoffs compared to the known dynamic voltage scaling
(DVS) scheme. Currently, we are extending our IBA design for better
performance and energy saving.
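
To make the SPF routing and its fault-tolerant variant mentioned above more concrete, the following is a minimal Python sketch, not the project's IBA simulator code. It computes a shortest path with Dijkstra's algorithm and, when a set of failed links is supplied, recomputes the path with those links excluded. The function name `shortest_path`, the link-cost dictionary, and the idea of obtaining the alternate path by recomputation are assumptions made for this example; how the actual design selects and migrates to alternate paths is not described here.

```python
import heapq


def shortest_path(links, src, dst, failed=frozenset()):
    """Return the least-cost path from src to dst as a list of nodes, or None.

    links:  dict mapping (u, v) -> cost, treated as bidirectional (assumption)
    failed: set of (u, v) links to exclude, in either direction
    """
    # Build the adjacency list, skipping any link marked as failed
    adj = {}
    for (u, v), cost in links.items():
        if (u, v) in failed or (v, u) in failed:
            continue
        adj.setdefault(u, []).append((v, cost))
        adj.setdefault(v, []).append((u, cost))

    dist = {src: 0}
    prev = {}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, cost in adj.get(u, []):
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))

    if dst not in dist:
        return None  # destination unreachable with the surviving links

    # Walk the predecessor chain back from dst to src
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]
```

For example, with `links = {("A", "B"): 1, ("B", "D"): 1, ("A", "C"): 1, ("C", "D"): 2}`, calling `shortest_path(links, "A", "D")` selects the route through B, while passing `failed={("B", "D")}` yields the route through C instead.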