Technical Report Number
Data, Computer Systems Organization, Information Systems
This paper discusses the design of the BATRUN Distributed Processing System (DPS). We developed this system to automate the execution of jobs in a cluster of workstations whose machines belong to different owners. The objective is to use a general-purpose cluster as one massive computer for processing large applications. In contrast to a dedicated cluster, scheduling in BATRUN DPS must ensure that only idle cycles are used for distributed computing and that local users, whenever they are active, retain full control of their machines. BATRUN DPS has several unique features, including: (1) a group-based scheduling policy that assigns execution priority based on ownership of machines, and (2) a multi-cell distributed design that eliminates a single point of failure and provides better fault tolerance and scalability. The implementation of the system is based on multi-threading and remote procedure call mechanisms.
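To make the scheduling constraints above concrete, the following is a minimal sketch of one way a group-based, idle-cycle-only policy could be expressed. It is not the BATRUN DPS implementation: the names Machine, Job, pick_job, and schedule, the idle flag, and the first-come-first-served ordering within each priority class are all assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Machine:
    name: str
    owner_group: str  # hypothetical: the group that owns this workstation
    idle: bool        # hypothetical: True when no local user is active


@dataclass
class Job:
    job_id: int
    group: str        # hypothetical: the group of the submitting user


def pick_job(machine: Machine, pending: List[Job]) -> Optional[Job]:
    """Return the job that may run on `machine`, or None if none may."""
    if not machine.idle:
        # A local user is active, so the machine is not offered to the cluster.
        return None
    # Jobs from the owning group take priority (assumed FIFO within a class).
    for job in pending:
        if job.group == machine.owner_group:
            return job
    # Otherwise the idle machine is lent to the oldest waiting job, if any.
    return pending[0] if pending else None


def schedule(machines: List[Machine], queue: List[Job]) -> Dict[str, int]:
    """Assign at most one queued job to each idle machine."""
    pending = list(queue)  # work on a copy so the caller's queue is untouched
    placement: Dict[str, int] = {}
    for machine in machines:
        job = pick_job(machine, pending)
        if job is not None:
            placement[machine.name] = job.job_id
            pending.remove(job)
    return placement


if __name__ == "__main__":
    machines = [
        Machine("ws-a", "physics", idle=True),
        Machine("ws-b", "chemistry", idle=False),
        Machine("ws-c", "chemistry", idle=True),
    ]
    queue = [Job(1, "chemistry"), Job(2, "physics"), Job(3, "biology")]
    print(schedule(machines, queue))
```

In this sketch the busy workstation ws-b receives no job, and each idle workstation first serves a job from its owning group. A real system would also have to preempt or migrate a running job when the local user returns, which this sketch does not model.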