I'll present my understanding of the superscalar processor architecture being developed by the advanced VLSI architecture research group, led by Kemal Ebcioglu, at the IBM T.J. Watson Research Center in Yorktown. I visited IBM in March and talked to Kemal and his team, who explained their architecture to me. They claim to get 5 instructions per clock on a wide range of C code. The chief innovation, as I see it, is the use of static multiway predicated instructions. Each (wide) instruction consists of up to 8 component instructions, each of which is conditionally executed based on several condition registers. To explain the architecture, I'll show how a simple "reduction" loop is implemented.
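Before getting to the loop, the predication idea can be sketched as a toy model in C. This is an illustration only, not the IBM encoding: the names GuardedOp, WideInstr and execute_wide, the mask/value guard format, the 16-register file, the add-only operation, and the "all components read the old register values" semantics are all assumptions made for the sketch. The only detail taken from the description above is that a wide instruction holds up to 8 component instructions, each guarded by condition registers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_OPS  8   /* up to 8 component instructions per wide instruction */
#define NUM_REGS 16  /* assumed register file size, for illustration only */

/* Hypothetical guarded operation: it commits only when the condition
 * bits selected by `mask` equal `value`. mask == 0 means "always". */
typedef struct {
    uint8_t mask;        /* which condition bits this operation tests */
    uint8_t value;       /* required values of the tested bits */
    int dst, src1, src2; /* regs[dst] = regs[src1] + regs[src2] */
} GuardedOp;

typedef struct {
    size_t    n_ops;        /* number of slots in use, at most MAX_OPS */
    GuardedOp ops[MAX_OPS];
} WideInstr;

/* Execute one wide instruction. Assumed semantics: every component
 * reads the register values from before the instruction, and only the
 * components whose guard matches `cond` write their result back. */
void execute_wide(const WideInstr *wi, uint8_t cond, int32_t regs[NUM_REGS])
{
    int32_t result[MAX_OPS];
    bool    fire[MAX_OPS];

    /* Phase 1: evaluate guards and compute results from the old state. */
    for (size_t i = 0; i < wi->n_ops; i++) {
        const GuardedOp *op = &wi->ops[i];
        fire[i]   = (cond & op->mask) == op->value;
        result[i] = regs[op->src1] + regs[op->src2];
    }

    /* Phase 2: commit only the operations whose guard held. */
    for (size_t i = 0; i < wi->n_ops; i++) {
        if (fire[i])
            regs[wi->ops[i].dst] = result[i];
    }
}
```

In this model, the two arms of an if/else can sit in the same wide instruction with complementary guards, so the condition bits pick which one commits without a separate branch instruction being issued first.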