Scheduling

Standard schedulers such as PBS and CONDOR consider only user specifications (memory, CPU) when deciding where and how to submit jobs. Current smart, application-specific schedulers couple application performance metrics with knowledge of resource benchmarks and network usage to minimize the estimated wall-clock time required to execute the application code. Migrating a running simulation to another computer can yield further speedups. Dr. Stefano Cozzini of DEMOCRITOS, one of the senior personnel on this proposal, has extensive benchmarking experience. He has applied this strategy to the submission of complete classical MD simulations and will extend the approach to other simulation codes used within VLab. Some codes have fundamentally different requirements. For example, the checkpoint/restart procedures in the first-principles simulation code PWscf are more expensive (larger data files), which leads to more complex performance models. Together with Dr. Cozzini, we will consider some of the file migration strategies studied in Europe to enable a similar approach for PWscf. A key component of this work will be the collection and storage of, and efficient access to, metadata that describes all codes, hardware, and network connections. The network metrics will be updated at regular intervals to ensure that decisions are based on current data. (Pierce, Cozzini, and Erlebacher)
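The selection and migration decisions described above can be illustrated with a minimal sketch. It is not the VLab scheduler or Dr. Cozzini's performance model; the cost formula (queue wait plus data transfer plus compute time) and every name in it (`Resource`, `Job`, `estimated_wall_clock`, `pick_resource`, `should_migrate`, and all fields) are assumptions introduced purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """Hypothetical metadata record for one compute resource."""
    name: str
    gflops: float          # benchmarked sustained compute rate
    bandwidth_mb_s: float  # most recently measured network bandwidth
    queue_wait_s: float    # expected wait before a job starts


@dataclass
class Job:
    """Hypothetical application performance description."""
    work_gflop: float      # estimated total floating-point work
    input_mb: float        # input data that must be staged
    checkpoint_mb: float   # size of one checkpoint/restart file


def estimated_wall_clock(job: Job, res: Resource) -> float:
    """Assumed cost model: queue wait + input staging + compute time (seconds)."""
    transfer = job.input_mb / res.bandwidth_mb_s
    compute = job.work_gflop / res.gflops
    return res.queue_wait_s + transfer + compute


def pick_resource(job: Job, resources: list[Resource]) -> Resource:
    """Choose the resource with the smallest estimated wall-clock time."""
    return min(resources, key=lambda r: estimated_wall_clock(job, r))


def should_migrate(job: Job, fraction_done: float,
                   current: Resource, target: Resource) -> bool:
    """Migrate only if moving the checkpoint and finishing on the target
    is estimated to be faster than finishing on the current resource."""
    remaining = (1.0 - fraction_done) * job.work_gflop
    stay = remaining / current.gflops
    move = (target.queue_wait_s
            + job.checkpoint_mb / target.bandwidth_mb_s
            + remaining / target.gflops)
    return move < stay
```

Under this assumed model, a larger checkpoint file raises the cost of moving, so a code with expensive checkpoint/restart data is migrated less readily than one with small restart files. Because the bandwidth and queue-wait fields stand in for the regularly updated metrics, stale values would directly skew both decisions.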
