Capturing the Content of (Computer) Science in Digital Libraries

Speaker: Michael Kohlhase, CMU

When & Where:

3:30pm, Wednesday, Jan 22, 2003, Room 500 AKW


One of the pertinent tasks of computer science is to supply techniques for structuring data and representing it in a form that supports algorithmic problem solving and added-value services.

Surprisingly, the field does very little to apply these techniques to its own research and educational materials. We still predominantly use tools like LaTeX for publishing our papers and PowerPoint for presenting CS theory and practice to our students. In effect, we produce large volumes of data about CS knowledge without turning it into a structured resource.

In this talk I will present techniques for content-based markup of CS documents and some of the added-value services they support.

Content markup techniques are becoming increasingly popular on the XML-based World Wide Web. Unlike presentation markup, which serves human readers, content markup adds enough structure to allow automated document processing, without imposing the burden of fully formalizing the knowledge contained in the document.
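
As a minimal illustration of this distinction (a sketch, not material from the talk), the Python fragment below encodes the expression x + 2 twice: once with MathML presentation elements and once with MathML content elements. The evaluator, its function names, and the choice to omit XML namespaces are assumptions made for brevity; it handles only a tiny subset of Content MathML.

```python
import xml.etree.ElementTree as ET

# Presentation markup: records how "x + 2" is laid out for a human reader.
PRESENTATION = "<mrow><mi>x</mi><mo>+</mo><mn>2</mn></mrow>"

# Content markup: records what the expression means (apply plus to x and 2).
CONTENT = "<apply><plus/><ci>x</ci><cn>2</cn></apply>"


def evaluate(node, env):
    """Evaluate a tiny, hypothetical subset of Content MathML."""
    if node.tag == "cn":  # numeric literal
        return float(node.text)
    if node.tag == "ci":  # identifier, looked up in the environment
        return env[node.text.strip()]
    if node.tag == "apply":  # first child is the operator, the rest are operands
        operator, *operands = list(node)
        values = [evaluate(child, env) for child in operands]
        if operator.tag == "plus":
            return sum(values)
        if operator.tag == "times":
            product = 1.0
            for value in values:
                product *= value
            return product
        raise ValueError(f"unsupported operator: {operator.tag}")
    raise ValueError(f"unsupported element: {node.tag}")


def evaluate_content(xml_text, env):
    """Parse an XML string and evaluate it as content markup."""
    return evaluate(ET.fromstring(xml_text), env)
```

Calling `evaluate_content(CONTENT, {"x": 3.0})` returns `5.0`, whereas the same call on `PRESENTATION` raises `ValueError`: the layout elements say where the symbols go, not which operation is applied to which arguments.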

I will discuss relevant content markup formats such as MathML, OpenMath, ChemML, and OMDoc, and extend the latter with markup for program code (CodeML) to arrive at a full-coverage markup format for CS content.

The talk concludes with a brief overview of the Course Capsules Project at Carnegie Mellon University, where these techniques are employed in computer-supported courseware.