Spark : big data cluster computing in production / Ilya Ganelin [and others]
Material type | E-book |
---|---|
Publisher | Indianapolis, IN : John Wiley & Sons, Inc |
Publication year | [2016] |
Extent | 1 online resource (219 pages) |
Author headings | *Ganelin, Ilya, author; Orhian, Ema, author; Sasaki, Kai, author; York, Brennon, author |
General notes | Spark: Big Data Cluster Computing in Production goes beyond general Spark overviews to provide targeted guidance toward using lightning-fast big-data clustering in production. Written by an expert team well-known in the big data community, this book walks you through the challenges in moving from proof-of-concept or demo Spark applications to live Spark in production. Real use cases provide deep insight into common problems, limitations, challenges, and opportunities, while expert tips and tricks help you get the most out of Spark performance. Coverage includes Spark SQL, Tachyon, Kerberos, MLlib, YARN, and Mesos, with clear, actionable guidance on resource scheduling, db connectors, streaming, security, and much more. Print version record. Contents: Spark: Big Data Cluster Computing in Production; About the Authors; About the Technical Editors; Credits; Acknowledgments; Contents at a glance; Contents; Introduction; Chapter 1 Finishing Your Spark Job; Installation of the Necessary Components; Native Installation Using a Spark Standalone Cluster; The History of Distributed Computing That Led to Spark; Enter the Cloud; Understanding Resource Management; Using Various Formats for Storage; Text Files; Sequence Files; Avro Files; Parquet Files; Making Sense of Monitoring and Instrumentation; Spark UI; Spark Standalone UI; Metrics REST API; Metrics System; External Monitoring Tools; Summary; Chapter 2 Cluster Management; Background; Spark Components; Driver; Workers and Executors; Configuration; Spark Standalone; Architecture; Single-Node Setup Scenario; Multi-Node Setup; YARN; Architecture; Dynamic Resource Allocation; Scenario; Mesos; Setup; Architecture; Dynamic Resource Allocation; Basic Setup Scenario; Comparison; Summary; Chapter 3 Performance Tuning; Spark Execution Model; Partitioning; Controlling Parallelism; Partitioners; Shuffling Data; Shuffling and Data Partitioning; Operators and Shuffling; Shuffling Is Not That Bad After All; Serialization; Kryo Registrators; Spark Cache; Spark SQL Cache; Memory Management; Garbage Collection; Shared Variables; Broadcast Variables; Accumulators; Data Locality; Summary; Chapter 4 Security; Architecture; Security Manager; Setup Configurations; ACL; Configuration; Job Submission; Web UI; Network Security; Encryption; Event logging; Kerberos; Apache Sentry; Summary; Chapter 5 Fault Tolerance or Job Execution; Lifecycle of a Spark Job; Spark Master; Spark Driver; Spark Worker; Job Lifecycle; Job Scheduling; Scheduling within an Application; Scheduling with External Utilities; Fault Tolerance; Internal and External Fault Tolerance; Service Level Agreements (SLAs); Resilient Distributed Datasets (RDDs); Batch versus Streaming; Testing Strategies; Recommended Configurations; Summary; Chapter 6 Beyond Spark; Data Warehousing; Spark SQL CLI; Thrift JDBC/ODBC Server; Hive on Spark; Machine Learning; DataFrame; MLlib and ML; Mahout on Spark; Hivemall on Spark; External Frameworks; Spark Package; XGBoost; spark-jobserver; Future Works; Integration with the Parameter Server; Deep Learning; Enterprise Usage; Collecting User Activity Log with Spark and Kafka; Real-Time Recommendation with Spark; Real-Time Categorization of Twitter Bots; Summary; Index; EULA. John Wiley and Sons. Wiley Online Library: Complete oBooks. URL: https://onlinelibrary.wiley.com/doi/book/10.1002/9781119254805 |
---|---|
Subjects | LCSH:Spark (Electronic resource : Apache Software Foundation); FREE:Spark (Electronic resource : Apache Software Foundation); LCSH:Electronic data processing -- Distributed processing; LCSH:Big data; LCSH:Parallel processing (Electronic computers); CSHF:Traitement réparti; CSHF:Données volumineuses; CSHF:Parallélisme (Informatique); FREE:COMPUTERS -- General; FREE:Parallel processing (Electronic computers); FREE:Electronic data processing -- Distributed processing; FREE:Big data |
Classification | LCC:QA76.9.D5 DC23:005.3/76 |
Bibliographic ID | EB00004470 |
ISBN | 9781119254805 |