This timely text presents a comprehensive overview of fault tolerance techniques for high-performance computing (HPC). The text opens with a detailed introduction to the concepts of checkpoint protocols and scheduling algorithms, prediction, replication, silent error detection and correction, together with some application-specific techniques such as ABFT. Emphasis is placed on analytical performance models. This is then followed by a review of general-purpose techniques, including several checkpoint and rollback recovery protocols. Relevant execution scenarios are also evaluated and compared through quantitative models. Features: provides a survey of resilience methods and performance models; examines the various sources for errors and faults in large-scale systems; reviews the spectrum of techniques that can be applied to design a fault-tolerant MPI; investigates different approaches to replication; discusses the challenge of energy consumption of fault-tolerance methods in extreme-scale systems.
The information in the "Synopsis" section may refer to different editions of this title.
This timely text/reference presents a comprehensive overview of fault tolerance techniques for high-performance computing (HPC).
The text opens with a detailed introduction to the concepts of checkpoint protocols and scheduling algorithms, prediction, replication, silent error detection and correction, together with some application-specific techniques such as algorithm-based fault tolerance. Emphasis is placed on analytical performance models. This is then followed by a review of general-purpose techniques, including several checkpoint and rollback recovery protocols. Relevant execution scenarios are also evaluated and compared through quantitative models.
Topics and features:
This authoritative volume is essential reading for all researchers and graduate students involved in high-performance computing.
Dr. Thomas Herault is a Research Scientist in the Innovative Computing Laboratory (ICL) at the University of Tennessee, Knoxville, TN, USA. Dr. Yves Robert is a Professor in the Laboratory of Parallel Computing at the Ecole Normale Supérieure de Lyon, France, and a Visiting Research Scholar in the ICL.
The information in the "About this book" section may refer to different editions of this title.
EUR 9.90 shipping from Germany to Italy
EUR 9.70 shipping from Germany to Italy
From: Buchpark, Trebbin, Germany
Condition: Very good | Pages: 332 | Language: English | Product type: Books. Item code 25708812/12
Quantity: 1 available
From: Universitätsbuchhandlung Herta Hold GmbH, Berlin, Germany
ix, 320p. Hardcover. Dispatched from Germany via Air Mail. Imperfect copy due to slightly bumped cover, apart from this in very good condition. Stamped. Computer Communications and Networks. Language: English. Item code 4823IB
Quantity: 2 available
From: moluna, Greven, Germany
Condition: New. This is a print-on-demand item and will be printed for you after you order. The first complete overview of this increasingly important field. Presents a unique, rigorous approach based on the design of analytical models to predict performance. Provides a coherent collection of valuable insights from internationally-renown. Item code 31406393
Quantity: More than 20 available
From: Books Puddle, New York, NY, U.S.A.
Condition: New. pp. 320. Item code 26372815544
Quantity: 1 available
From: Majestic Books, Hounslow, United Kingdom
Condition: New. pp. 320. Item code 374278503
Quantity: 1 available
From: Biblios, Frankfurt am Main, Hesse, Germany
Condition: New. pp. 320. Item code 18372815538
Quantity: 1 available
From: BuchWeltWeit Ludwig Meier e.K., Bergisch Gladbach, Germany
Book. Condition: New. This item is printed on demand and takes 3-4 days longer. New stock. 332 pp. English. Item code 9783319209425
Quantity: 2 available
From: AHA-BUCH GmbH, Einbeck, Germany
Book. Condition: New. New stock, printed after ordering. Item code 9783319209425
Quantity: 1 available
From: buchversandmimpf2000, Emtmannsberg, Bavaria, Germany
Book. Condition: New. New stock. Springer Verlag GmbH, Tiergartenstr. 17, 69121 Heidelberg. 332 pp. English. Item code 9783319209425
Quantity: 2 available
From: GreatBookPrices, Columbia, MD, U.S.A.
Condition: New. Item code 23922726-n
Quantity: More than 20 available