Geeks invade LISA'08

Last week I was thrilled to attend the Large Installation System Administration (LISA ’08) conference put on by USENIX in San Diego, California. I was joined by two of our system administrators, Ted and Shaw. This was my first year attending LISA, and based on the wide range of workshop topics, I knew we were going to be treated to a very exciting week.
The atmosphere was electric as we met some of the technical world’s top celebrities and learned about the latest advancements in the industry. In some cases, LISA was the official unveiling of state-of-the-art new tech tools. A perfect example of this was a paper presented by a group from the IBM Almaden Research Center on a new Large-Window Compression tool geared toward the efficient storage of Virtual Machines (VM).
IZO Compression
This tool, called IZO, works alongside traditional compression tools we already use, such as gzip and bzip2. IZO efficiently compresses VMs by first splitting the data into chunks. It then hashes each chunk, indexes the hashes, and removes any chunk it has already seen, much like data deduplication. Once this process is complete, the data footprint is significantly smaller and can be passed through gzip, for example, which does the traditional Small-Window Compression. The result is a dramatically smaller final footprint for the original data. As a bonus, this method is often faster than using gzip alone. I won’t go into too much detail in this post, but you can read more about IZO and how it works, as well as comparisons to existing Large-Window Compression tools such as rzip or lrzip, in the full paper, posted here: IZO: Applications of Large-Window Compression to Virtual Machine Management.
For the simplified summary, you can think about this compression method in terms of packing a suitcase: if you were, for example, to take all your clothes and simply stuff them into your suitcase, you would pack in as much as you could, then maybe sit on your suitcase to try to compress it enough to get the zipper closed. However, if you first folded your clothes neatly, then placed them in your suitcase and only then closed the lid, sat on it to compress and zipped it up, you would be able to get a lot more clothes packed into the same space. Using IZO on your files is essentially like folding them neatly before trying to pack them into a gzip archive (only much faster and cooler than folding shirts).
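For readers who prefer code to suitcases, here is a minimal Python sketch of the chunk, hash, deduplicate, then gzip idea described above. It is not IZO itself: the fixed 4 KB chunk size, the use of SHA-256, and all function names are my own assumptions for illustration and are not taken from the paper. The returned size also ignores the small "recipe" needed to rebuild the original data.

```python
import gzip
import hashlib
import os

CHUNK_SIZE = 4096  # assumed chunk size, chosen only for illustration


def dedupe_chunks(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Split data into fixed-size chunks and keep one copy of each distinct chunk.

    Returns (unique_chunks, recipe). The recipe holds, for every chunk of the
    original data in order, the position of that chunk in unique_chunks.
    """
    index = {}          # chunk digest -> position in unique_chunks
    unique_chunks = []
    recipe = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).digest()
        if digest not in index:
            index[digest] = len(unique_chunks)
            unique_chunks.append(chunk)
        recipe.append(index[digest])
    return unique_chunks, recipe


def restore(unique_chunks, recipe) -> bytes:
    """Rebuild the original data from the unique chunks and the recipe."""
    return b"".join(unique_chunks[i] for i in recipe)


def two_stage_size(data: bytes) -> int:
    """Size after deduplication followed by gzip (recipe overhead not counted)."""
    unique_chunks, _ = dedupe_chunks(data)
    return len(gzip.compress(b"".join(unique_chunks)))


if __name__ == "__main__":
    # A 64 KB random block repeated 8 times: the repeats sit farther apart
    # than gzip's 32 KB window, so gzip alone cannot exploit them.
    block = os.urandom(64 * 1024)
    data = block * 8
    print("original bytes:     ", len(data))
    print("gzip alone:         ", len(gzip.compress(data)))
    print("dedupe, then gzip:  ", two_stage_size(data))
```

The demo input is deliberately artificial: identical data repeated at long distances is exactly the case a small-window compressor misses and a large-window pass catches, which is why VM images with many shared blocks are a natural fit.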

High Performance Computing
High Performance Computing (HPC) was another topic featured at LISA. The advances and challenges in HPC today are a good indication of what is on the industry’s horizon. I start feeling nostalgic thinking back to 1999/2000 when I was helping build out 100+ node Beowulf clusters as a more economical alternative to the Silicon Graphics Origin 3000 series (which used MIPS processors and a distributed shared memory architecture). This was a time before multi-core technology even existed. Today, these systems are dwarfed by several orders of magnitude.
Argonne National Laboratory is currently running one of the largest HPC clusters in the world. Their jaw-dropping 40,960-node cluster, housed in only 40 racks, is based on IBM’s Blue Gene/P system. They presented a paper outlining the experiences and challenges of running such a massive system: Petascale System Management Experiences. The days of CPU bottlenecks are over and the era of true cloud computing is fast approaching. At OpenSRS, we are already seeing and assessing these trends. A number of components in our architecture use the concepts of clustered computing and can be organically expanded and contracted to fit our needs.
The idea of virtualized systems has been around for a while, but has always been tightly tied to the physical platform. Today we have already started to divorce the two by deploying virtual machines essentially at will. The ease of deployment solves many problems and increases our overall flexibility. We have run into some of the same problems ourselves, with nodes occasionally becoming I/O bound while competing for resources. By keeping a vigilant eye on dimensioning we have thankfully been able to keep these sorts of problems in check.
Log, Trend and Relation Analysis
There are some other challenges which are often overshadowed by the focus on performance and availability. When dealing with the scale of the systems described above, the volume of logs generated by the system can be astounding. For example, a single incident on this system can generate up to 160,000 messages. The ability to efficiently parse and run diagnoses on this volume of data is essential. Currently, on the OpenSRS Email platform we generate over 100 gigabytes of logs daily. All of this is with debug mode/verbose logging turned off. This is just the tip of the iceberg if you include logs produced on the OpenSRS Domains platform and pure system-level logging. These volumes will only balloon further as the platform grows and we process more transactions. Tools to address these challenges are becoming readily available with the emergence of cloud computing.
Splunk is an example of a piece of commercial software designed to help analyze these large volumes of log information. If you haven’t seen it before, the free version will allow you to parse 500 MB of raw log data daily and is available for download from their site. Also interesting is a piece of software developed at the University of Notre Dame called ENAVis (Enterprise Network Activities Visualization). ENAVis offers a unique visualized view of a platform. It parses system statistics at regular intervals to create links between hosts, users and processes, providing a single picture of the entire platform. The interface allows one to drill down and look at a vast number of metrics. To get more details on this project read their paper here.
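To make the "links between hosts, users and processes" idea a little more concrete, here is a small Python sketch of how such a picture could be assembled from periodic samples. This is purely my own illustration and not ENAVis code or its data model: the sample format (host, user, process), the function names, and the example values are all hypothetical.

```python
from collections import defaultdict


def build_links(samples):
    """Build an undirected graph linking hosts, users and processes.

    samples is an iterable of (host, user, process) tuples, as might be
    collected at regular intervals. Nodes are (kind, name) pairs so that a
    host and a user with the same name never collide.
    """
    links = defaultdict(set)
    for host, user, process in samples:
        h, u, p = ("host", host), ("user", user), ("process", process)
        for a, b in ((h, u), (u, p), (h, p)):
            links[a].add(b)
            links[b].add(a)
    return links


def neighbors(links, kind, name, want):
    """Drill down: names of all nodes of kind `want` linked to (kind, name)."""
    linked = links.get((kind, name), ())
    return sorted(n for k, n in linked if k == want)


if __name__ == "__main__":
    samples = [
        ("web1", "alice", "httpd"),
        ("web2", "alice", "sshd"),
        ("web1", "bob", "httpd"),
    ]
    links = build_links(samples)
    print(neighbors(links, "user", "alice", "host"))     # ['web1', 'web2']
    print(neighbors(links, "process", "httpd", "user"))  # ['alice', 'bob']
```

Storing every relationship in both directions is what makes the drill-down cheap: starting from any host, user or process, the related items are a single lookup away.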
My personal focus at LISA was around virtualization, massive storage and compute clusters, a major focus for many organizations this year. There was no shortage of people willing to share their experiences on these subjects. I’ve touched on some of the highlights above, but it’s nearly impossible to capture the atmosphere this conference provided. The whole point of any professional conference is to help people make better decisions. Being able to have candid conversations and share experiences is what makes it all worthwhile. It’s clear that the challenges are the same for everyone operating at massive scale. The innovative solutions being developed are truly exciting. We will continue to analyze these developments and see how they fit with our needs to serve you.
