July 30, 2008
Professors Indranil Gupta, Roy Campbell, and Michael Heath from the Computer Science Department at the University of Illinois at Urbana-Champaign have formed a strong alliance with the National Science Foundation and industrial leaders Hewlett-Packard, Intel, and Yahoo! with the mission of building an innovative high-performance computing cluster known as the Cloud Computing Testbed (CCT).
The main goal of this research initiative is to explore the design, construction and application of large-scale data-intensive programming geared towards challenging research problems. The new cluster will have over a thousand cores, with gigabytes of RAM per node, connected to hundreds of terabytes of local and shared storage. With this storage and computational power, it will be capable of crunching numbers and solving large-scale problems by means of novel distributed and parallel computing algorithms for efficient and scalable computing. The Apache Hadoop framework, the Pig parallel programming language developed by Yahoo! Research, and other open-source software will provide a firm foundation for building a robust environment suitable for the development of first-rate parallel applications.
Using the distributed computing services afforded by this new system, researchers around the world will be able to build applications geared towards better understanding and managing the vast volumes of data we produce and store on a daily basis. Some compelling applications of this technology include:
- Cloud Resource Management, Sharing, Monitoring: The sheer scale of the system requires special provisions to ensure reliability, performance and efficiency. In this effort, we intend to investigate the use of virtualization technology to migrate nodes when necessary, conserve power by hibernating unused machines, enforce isolation policies for secure computations and transparently recover from failures.
- Cyber Tomography on Network Data: Today's cyber attacks are growing in sophistication at an alarming rate, while the capacity of our networks increases faster than traditional intrusion detection systems are able to scale. To protect network users, a new approach to intrusion detection is required. The CCT provides the perfect testbed to evaluate large-scale data mining techniques to identify, understand and develop effective defenses against sophisticated cyber attacks.
- Distributed Log Management: System logs provide a wealth of information about the health of a system and that of the network as a whole. Unfortunately, these log files are frequently hard to parse, query and aggregate due to their sheer size in enterprise networks. The CCT provides a scalable solution to analyze the contents and reason about the health of the system as a whole.
- Others: Large Scale Semantic Analysis, Search Engines Yielding Navigation Maps, Crawling Online Social Networks, Multimedia Applications
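To make the log management application more concrete, the following is a minimal single-machine sketch of the map-and-reduce style of aggregation that frameworks such as Hadoop distribute across many nodes. It is an illustration only, not the CCT's actual software: the log line format, the sample data, and the function names are all assumptions.

```python
from collections import Counter
from functools import reduce

# Hypothetical log shards, one list per machine. The line format
# "<host> <LEVEL> <message>" is an assumption made for this sketch.
SHARDS = [
    ["node01 INFO job started", "node01 ERROR disk failure"],
    ["node02 WARN high latency", "node02 ERROR disk failure", "node02 INFO job finished"],
]

def map_shard(lines):
    """Map step: count severity levels within one shard of log lines."""
    counts = Counter()
    for line in lines:
        parts = line.split(maxsplit=2)
        if len(parts) >= 2:  # skip lines that do not match the assumed format
            counts[parts[1]] += 1
    return counts

def reduce_counts(left, right):
    """Reduce step: merge two partial counts into one."""
    return left + right

def aggregate(shards):
    """Apply the map step to every shard, then fold the partial results together."""
    return reduce(reduce_counts, map(map_shard, shards), Counter())

if __name__ == "__main__":
    print(dict(aggregate(SHARDS)))
```

Because each shard is counted independently and the partial counts are merged afterwards, the map step could in principle run on the node that stores the shard, which is what makes this style of analysis scale with the size of the logs.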
Lessons learned from the process of analyzing, designing and building the CCT will help researchers evaluate and improve the fundamental software and hardware systems that drive the cluster. Since each node consists of several processing cores connected to gigabytes of memory, intrinsic architectural issues of the underlying hardware must be addressed to harness the full power of the computing hardware. On a larger scale, this initiative begins researching the different ways a user can transparently manage the computing resources of CCTs in order to maximize the collaboration among the nodes of the system to solve complex problems.
Unlike prior cluster systems, the service-oriented CCTs are driven by applications, rather than computing elements. Instead of using fixed a priori allocation strategies, the computing fabric of CCTs will be responsible for the allocation of resources for the applications. Virtual machine technology and a focus on the development of application stacks will afford the system the flexibility to migrate, load balance and handle faults gracefully. Applications can therefore treat the computing resources as utilities rather than individual nodes. The dynamic nature of the system allows it to administer and re-provision computing resources on demand, thus affording the cloud great versatility to serve the needs of its users.
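The contrast between fixed allocation and a fabric that places work on demand can be sketched in a few lines. This is a toy model under stated assumptions, not a description of the CCT's scheduler: the `Fabric` class, its method names, and the "most free cores" placement heuristic are all hypothetical.

```python
class Fabric:
    """Toy resource pool: tracks free cores per node and places requests on demand."""

    def __init__(self, cores_per_node):
        self.free = dict(cores_per_node)  # node name -> free cores
        self.placements = {}              # application name -> (node, cores)

    def allocate(self, app, cores):
        """Place an application on the node with the most free cores.

        Returns the chosen node name, or None if no node can fit the request.
        """
        node = max(self.free, key=self.free.get)  # simple load-balancing heuristic
        if self.free[node] < cores:
            return None
        self.free[node] -= cores
        self.placements[app] = (node, cores)
        return node

    def release(self, app):
        """Return an application's cores to the pool so they can be re-provisioned."""
        node, cores = self.placements.pop(app)
        self.free[node] += cores


if __name__ == "__main__":
    fabric = Fabric({"n1": 4, "n2": 8})
    print(fabric.allocate("indexer", 6))  # placed on the node with the most free cores
    fabric.release("indexer")             # cores become available to other applications
```

The application asks only for a number of cores and never names a node, which is the sense in which resources are treated as a utility in this sketch.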
RESEARCH GROUPS INVOLVED IN THIS EFFORT