Spark: Big Data Cluster Computing in Production - Softcover

Ganelin, Ilya

 
9781119254010: Spark: Big Data Cluster Computing in Production

Synopsis

Professional Spark addresses the challenges in moving from proof-of-concept or demo Spark applications to live Spark in production. It covers:

  • Cluster Managers
  • Performance Tuning
  • Security
  • Lifecycle of Jobs
  • Shared experience of the problems encountered while running Spark in production
  • Real use cases where Spark fits best
  • Data warehousing with Spark SQL
  • How to schedule resources on a production cluster between Spark applications, with tips and tricks
  • Spark with Tachyon: the benefits of storing RDDs off heap
  • The available Spark database connectors: how to use them, with tips and tricks
  • The limitations and advantages of Spark MLlib
  • Spark Streaming in production: the limitations, the problems encountered, and how they were fixed
  • Most important for a production environment, security: how to secure Spark (Kerberos)
  • Spark on YARN
  • Spark on Mesos
  • Spark hardware requirements / estimating cluster size

The information in the "Summary" section may refer to different editions of this title.

About the author

Ilya Ganelin is a data engineer working at Capital One Data Innovation Lab. Ilya is an active contributor to the core components of Apache Spark and a committer to Apache Apex.

Ema Orhian is a Big Data Engineer interested in scaling algorithms. She is the main committer on jaws-spark-sql-rest, a data warehouse explorer on top of Spark SQL.

Kai Sasaki is a software engineer working in distributed computing and machine learning. He is a Spark contributor who works mainly on the MLlib and ML libraries.

Brennon York has been a core contributor to Apache Spark since 2014, including development on GraphX and the core build environment.

The information in the "About this book" section may refer to different editions of this title.

Other known editions of the same title

9788126562480: Spark: Big Data Cluster Computing in Production

Featured edition

ISBN 10: 812656248X ISBN 13: 9788126562480
Publisher: Wiley India Pvt. Ltd
Softcover