Self-* Systems
CSE 598B
Fall 2005

Bhuvan Urgaonkar
Department of Computer Science and Engineering
The Pennsylvania State University

 

Schedule

Class meetings are Wednesdays 6-8:30 PM in IST 223B. Office hours are Thursdays 2-3 PM.

Course description

The ever-increasing complexity of modern computing and networking systems makes them difficult to manage and maintain. Consider modern Internet applications such as Web-mail, online retail sales, online auctions, wikis, discussion boards, and Web-logs. These applications are hosted in data centers. A data center is a facility that houses a large amount of electronic equipment, typically computers and communications equipment. Two key features of Internet applications make the design of data centers challenging. First, modern Internet applications are extremely complex. Existing resource management solutions rely on simple abstractions of these applications and therefore fail to capture this complexity accurately. Second, these applications exhibit highly dynamic workloads that vary over multiple time scales. Managing the resources in a data center to meet the often opposing goals of satisfying application performance requirements and achieving high resource utilization is therefore difficult.

Administrators of these systems typically use simple "rules of thumb" to manage and maintain them. However, such manual management is often infeasibly complex and error-prone. It is therefore desirable for these systems to have properties that allow them to manage themselves with minimal or no human intervention. In this course, we will study three such properties: the abilities to self-tune, self-heal, and self-stabilize.

Self-tuning systems: Such systems have the ability to adapt their behavior to dynamically changing conditions. A simple example is the congestion control mechanism used by TCP. How can we reason about how good a particular self-tuning mechanism is? Can we identify common principles underlying different kinds of self-tuning systems?
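As a rough illustration of the TCP example, the following sketch shows an additive-increase/multiplicative-decrease (AIMD) loop, the general idea behind TCP congestion avoidance. It is a simplified model, not TCP itself: the function name aimd, the fixed capacity, and the rule that a loss occurs whenever the window exceeds that capacity are all assumptions made for illustration.

```python
def aimd(capacity, rounds, cwnd=1.0, increase=1.0, decrease=0.5):
    """Simulate a sender tuning its window against a fixed capacity.

    Assumption (hypothetical model): a loss is detected in any round
    where the window exceeds `capacity`; otherwise the round succeeds.
    """
    trace = []
    for _ in range(rounds):
        if cwnd > capacity:
            # Loss signal: back off multiplicatively, never below 1.
            cwnd = max(1.0, cwnd * decrease)
        else:
            # No loss: probe for more bandwidth additively.
            cwnd += increase
        trace.append(cwnd)
    return trace


if __name__ == "__main__":
    # The window climbs toward the capacity, overshoots, halves, and repeats.
    print(aimd(capacity=10, rounds=25))
```

The sender never learns the capacity directly; it adapts using only the loss signal, which is what makes the mechanism self-tuning.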

Self-healing systems: These are systems that have the ability to recover from one or more kinds of failures. An example of a system where this property is desirable is a wireless sensor network consisting of numerous cheap devices. Such devices are typically energy-constrained, resulting in a finite lifetime. Furthermore, since these devices are cheap, their measurements may often be inaccurate. How do such systems cope with these failures?

Self-stabilizing systems: The idea of self-stabilization in distributed computing first appeared in a classic paper by E.W. Dijkstra in 1974. In this short paper in Communications of the ACM, he proposed the idea of a distributed system stabilizing from an illegitimate global state to a legitimate one. The idea was that the system should be able to converge to a legitimate state within a bounded amount of time by itself, without any outside intervention. We will study classic papers in this area and explore the applicability of this work to systems that interest us.
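To make the notion of converging from an illegitimate state concrete, here is a minimal sketch of a K-state token ring in the style usually attributed to Dijkstra's paper. The function names, the random choice of which privileged process moves next (a stand-in for a central scheduler), and the step limit are assumptions made for illustration. A state is treated as legitimate when exactly one process holds the privilege (the token).

```python
import random


def privileged(x):
    """Return the indices of processes that currently hold a privilege."""
    n = len(x)
    result = []
    for i in range(n):
        if i == 0:
            # Process 0 is privileged when it matches its left neighbor (the last process).
            if x[0] == x[n - 1]:
                result.append(0)
        elif x[i] != x[i - 1]:
            # Every other process is privileged when it differs from its left neighbor.
            result.append(i)
    return result


def step(x, k, i):
    """Let privileged process i make its move (states are integers mod k)."""
    if i == 0:
        x[0] = (x[0] + 1) % k
    else:
        x[i] = x[i - 1]


def stabilize(x, k, rng, max_steps=10_000):
    """Run moves until exactly one privilege remains; return the step count.

    Assumption: one privileged process moves at a time, chosen at random,
    and k is at least the number of processes.
    """
    for steps in range(max_steps):
        p = privileged(x)
        if len(p) == 1:
            return steps
        step(x, k, rng.choice(p))
    raise RuntimeError("did not stabilize within max_steps")


if __name__ == "__main__":
    rng = random.Random(1)
    ring = [rng.randrange(6) for _ in range(5)]  # arbitrary, possibly illegitimate start
    print("start:", ring, "privileged:", privileged(ring))
    print("steps to stabilize:", stabilize(ring, 6, rng))
    print("end:  ", ring, "privileged:", privileged(ring))
```

No process needs to know that the ring started in a bad state: each one acts only on its own value and its neighbor's, yet the ring as a whole settles into circulating a single token.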

Grading

Paper presentations: 30%
Class participation and discussion: 15%
Paper evaluations due before class: 15%
Semester-long project: 30%
Final Exam (take home): 10%

Note: Students may form groups of two for the project. A few weeks into the semester, each group will present its project's problem statement and its plan for the rest of the semester. There will be final presentations at the end of the semester to report the progress made on the projects. A project report will be due at the end of the semester. Students are encouraged to meet with me to discuss project ideas. I may consider replacing the project with a survey report for students who have heavy course or research loads during the semester.

Reading list and presentation schedule.

Required components of a paper review (text or PDF, due by midnight the night before class).

A PowerPoint template for presentations.