The data science handbook / Field Cady

Material type E-book
Publisher (Hoboken, NJ : John Wiley & Sons, Inc)
Publication year 2017
Extent 1 online resource
Author heading *Cady, Field 1984- author

URL
Imizu (electronic) 007 EB0005185 Wiley Online Library: Complete oBooks

9781119092933

General notes Includes bibliographical references and index
Print version record and CIP data provided by publisher
Giving extensive coverage to computer science and software engineering since they play such a central role in the daily work of a data scientist, this comprehensive book provides a crash course in data science, combining all the necessary skills into a unified discipline. -- Edited summary from book
Cover -- Title Page -- Copyright -- Dedication -- Contents -- Preface -- Chapter 1 Introduction: Becoming a Unicorn -- 1.1 Aren't Data Scientists Just Overpaid Statisticians? -- 1.2 How Is This Book Organized? -- 1.3 How to Use This Book? -- 1.4 Why Is It All in Python™, Anyway? -- 1.5 Example Code and Datasets -- 1.6 Parting Words -- Part 1 The Stuff You'll Always Use -- Chapter 2 The Data Science Road Map -- 2.1 Frame the Problem -- 2.2 Understand the Data: Basic Questions -- 2.3 Understand the Data: Data Wrangling -- 2.4 Understand the Data: Exploratory Analysis -- 2.5 Extract Features -- 2.6 Model -- 2.7 Present Results -- 2.8 Deploy Code -- 2.9 Iterating -- 2.10 Glossary -- Chapter 3 Programming Languages -- 3.1 Why Use a Programming Language? What Are the Other Options? -- 3.2 A Survey of Programming Languages for Data Science -- 3.3 Python Crash Course -- 3.4 Strings -- 3.5 Defining Functions -- 3.6 Python's Technical Libraries -- 3.7 Other Python Resources -- 3.8 Further Reading -- 3.9 Glossary -- Interlude: My Personal Toolkit -- Chapter 4 Data Munging: String Manipulation, Regular Expressions, and Data Cleaning -- 4.1 The Worst Dataset in the World -- 4.2 How to Identify Pathologies -- 4.3 Problems with Data Content -- 4.4 Formatting Issues -- 4.5 Example Formatting Script -- 4.6 Regular Expressions -- 4.7 Life in the Trenches -- 4.8 Glossary -- Chapter 5 Visualizations and Simple Metrics -- 5.1 A Note on Python's Visualization Tools -- 5.2 Example Code -- 5.3 Pie Charts -- 5.4 Bar Charts -- 5.5 Histograms -- 5.6 Means, Standard Deviations, Medians, and Quantiles -- 5.7 Boxplots -- 5.8 Scatterplots -- 5.9 Scatterplots with Logarithmic Axes -- 5.10 Scatter Matrices -- 5.11 Heatmaps -- 5.12 Correlations -- 5.13 Anscombe's Quartet and the Limits of Numbers -- 5.14 Time Series -- 5.15 Further Reading -- 5.16 Glossary
Chapter 6 Machine Learning Overview -- 6.1 Historical Context -- 6.2 Supervised versus Unsupervised -- 6.3 Training Data, Testing Data, and the Great Boogeyman of Overfitting -- 6.4 Further Reading -- 6.5 Glossary -- Chapter 7 Interlude: Feature Extraction Ideas -- 7.1 Standard Features -- 7.2 Features That Involve Grouping -- 7.3 Preview of More Sophisticated Features -- 7.4 Defining the Feature You Want to Predict -- Chapter 8 Machine Learning Classification -- 8.1 What Is a Classifier, and What Can You Do with It? -- 8.2 A Few Practical Concerns -- 8.3 Binary versus Multiclass -- 8.4 Example Script -- 8.5 Specific Classifiers -- 8.6 Evaluating Classifiers -- 8.7 Selecting Classification Cutoffs -- 8.8 Further Reading -- 8.9 Glossary -- Chapter 9 Technical Communication and Documentation -- 9.1 Several Guiding Principles -- 9.2 Slide Decks -- 9.3 Written Reports -- 9.4 Speaking: What Has Worked for Me -- 9.5 Code Documentation -- 9.6 Further Reading -- 9.7 Glossary -- Part II Stuff You Still Need to Know -- Chapter 10 Unsupervised Learning: Clustering and Dimensionality Reduction -- 10.1 The Curse of Dimensionality -- 10.2 Example: Eigenfaces for Dimensionality Reduction -- 10.3 Principal Component Analysis and Factor Analysis -- 10.4 Skree Plots and Understanding Dimensionality -- 10.5 Factor Analysis -- 10.6 Limitations of PCA -- 10.7 Clustering -- 10.8 Further Reading -- 10.9 Glossary -- Chapter 11 Regression -- 11.1 Example: Predicting Diabetes Progression -- 11.2 Least Squares -- 11.3 Fitting Nonlinear Curves -- 11.4 Goodness of Fit: R2 and Correlation -- 11.5 Correlation of Residuals -- 11.6 Linear Regression -- 11.7 LASSO Regression and Feature Selection -- 11.8 Further Reading -- 11.9 Glossary -- Chapter 12 Data Encodings and File Formats -- 12.1 Typical File Format Categories -- 12.2 CSV Files -- 12.3 JSON Files -- 12.4 XML Files
17.5 Smoothing Signals -- 17.6 Logarithms and Other Transformations -- 17.7 Trends and Periodicity -- 17.8 Windowing -- 17.9 Brainstorming Simple Features -- 17.10 Better Features: Time Series as Vectors -- 17.11 Fourier Analysis: Sometimes a Magic Bullet -- 17.12 Time Series in Context: The Whole Suite of Features -- 17.13 Further Reading -- 17.14 Glossary -- Chapter 18 Probability -- 18.1 Flipping Coins: Bernoulli Random Variables -- 18.2 Throwing Darts: Uniform Random Variables -- 18.3 The Uniform Distribution and Pseudorandom Numbers -- 18.4 Nondiscrete, Noncontinuous Random Variables -- 18.5 Notation, Expectations, and Standard Deviation -- 18.6 Dependence, Marginal and Conditional Probability -- 18.7 Understanding the Tails -- 18.8 Binomial Distribution -- 18.9 Poisson Distribution -- 18.10 Normal Distribution -- 18.11 Multivariate Gaussian -- 18.12 Exponential Distribution -- 18.13 Log-Normal Distribution -- 18.14 Entropy -- 18.15 Further Reading -- 18.16 Glossary -- Chapter 19 Statistics -- 19.1 Statistics in Perspective -- 19.2 Bayesian versus Frequentist: Practical Tradeoffs and Differing Philosophies -- 19.3 Hypothesis Testing: Key Idea and Example -- 19.4 Multiple Hypothesis Testing -- 19.5 Parameter Estimation -- 19.6 Hypothesis Testing: t-Test -- 19.7 Confidence Intervals -- 19.8 Bayesian Statistics -- 19.9 Naive Bayesian Statistics -- 19.10 Bayesian Networks -- 19.11 Choosing Priors: Maximum Entropy or Domain Knowledge -- 19.12 Further Reading -- 19.13 Glossary -- Chapter 20 Programming Language Concepts -- 20.1 Programming Paradigms -- 20.2 Compilation and Interpretation -- 20.3 Type Systems -- 20.4 Further Reading -- 20.5 Glossary -- Chapter 21 Performance and Computer Memory -- 21.1 Example Script -- 21.2 Algorithm Performance and Big-O Notation -- 21.3 Some Classic Problems: Sorting a List and Binary Search
21.4 Amortized Performance and Average Performance -- 21.5 Two Principles: Reducing Overhead and Managing Memory -- 21.6 Performance Tip: Use Numerical Libraries When Applicable -- 21.7 Performance Tip: Delete Large Structures You Don't Need -- 21.8 Performance Tip: Use Built-In Functions When Possible -- 21.9 Performance Tip: Avoid Superfluous Function Calls -- 21.10 Performance Tip: Avoid Creating Large New Objects -- 21.11 Further Reading -- 21.12 Glossary -- Part III Specialized or Advanced Topics -- Chapter 22 Computer Memory and Data Structures -- 22.1 Virtual Memory, the Stack, and the Heap -- 22.2 Example C Program -- 22.3 Data Types and Arrays in Memory -- 22.4 Structs -- 22.5 Pointers, the Stack, and the Heap -- 22.6 Key Data Structures -- 22.7 Further Reading -- 22.8 Glossary -- Chapter 23 Maximum Likelihood Estimation and Optimization -- 23.1 Maximum Likelihood Estimation -- 23.2 A Simple Example: Fitting a Line -- 23.3 Another Example: Logistic Regression -- 23.4 Optimization -- 23.5 Gradient Descent and Convex Optimization -- 23.6 Convex Optimization -- 23.7 Stochastic Gradient Descent -- 23.8 Further Reading -- 23.9 Glossary -- Chapter 24 Advanced Classifiers -- 24.1 A Note on Libraries -- 24.2 Basic Deep Learning -- 24.3 Convolutional Neural Networks -- 24.4 Different Types of Layers. What the Heck Is a Tensor? -- 24.5 Example: The MNIST Handwriting Dataset -- 24.6 Recurrent Neural Networks -- 24.7 Bayesian Networks -- 24.8 Training and Prediction -- 24.9 Markov Chain Monte Carlo -- 24.10 PyMC Example -- 24.11 Further Reading -- 24.12 Glossary -- Chapter 25 Stochastic Modeling -- 25.1 Markov Chains -- 25.2 Two Kinds of Markov Chain, Two Kinds of Questions -- 25.3 Markov Chain Monte Carlo -- 25.4 Hidden Markov Models and the Viterbi Algorithm -- 25.5 The Viterbi Algorithm -- 25.6 Random Walks -- 25.7 Brownian Motion
John Wiley and Sons Wiley Online Library: Complete oBooks
HTTP:URL=https://onlinelibrary.wiley.com/doi/book/10.1002/9781119092919
Subjects LCSH:Databases -- Handbooks, manuals, etc
LCSH:Statistics -- Data processing -- Handbooks, manuals, etc
LCSH:Big data -- Handbooks, manuals, etc
LCSH:Information theory -- Handbooks, manuals, etc
CSHF:Statistique -- Informatique -- Guides, manuels, etc
CSHF:Données volumineuses -- Guides, manuels, etc
CSHF:Théorie de l'information -- Guides, manuels, etc
FREE:COMPUTERS -- Databases -- General
FREE:Big data
FREE:Databases
FREE:Information theory
FREE:Statistics -- Data processing
MESH:Handbook
FREE:handbooks
FREE:Handbooks and manuals
FREE:Guides et manuels
Classification LCC:QA76.9.D32
DC23:005.74
Bibliographic ID EB00004475
ISBN 9781119092933
