Data Science in Theory and Practice: Techniques for Big Data Analytics and Complex Data Sets


Basic Information

Format: Hardback, 400 pages

Publisher: John Wiley and Sons Ltd

Imprint: Wiley-Blackwell

ISBN: 9781119674689

Published: 12 Oct 2021

Weight: 686 g

Dimensions: 233 x 161 mm

The specifications listed on this page are for reference only; the physical item takes precedence.


About the Book

DATA SCIENCE IN THEORY AND PRACTICE

EXPLORE THE FOUNDATIONS OF DATA SCIENCE WITH THIS INSIGHTFUL NEW RESOURCE


Data Science in Theory and Practice delivers a comprehensive treatment of the mathematical and statistical models useful for analyzing data sets arising in various disciplines, like banking, finance, health care, bioinformatics, security, education, and social services. Written in five parts, the book examines some of the most commonly used and fundamental mathematical and statistical concepts that form the basis of data science. The authors go on to analyze various data transformation techniques useful for extracting information from raw data, long memory behavior, and predictive modeling.


The book offers readers a multitude of topics, all relevant to the analysis of complex data sets. Along with a robust exploration of the theory underpinning data science, it contains numerous applications to specific and practical problems. The book also provides code examples of algorithms in R and Python, along with pseudo-algorithms that help port the code to other languages.


The book is ideal for students and practitioners without a strong background in data science. Readers will also learn from topics such as:


Analyses of foundational theoretical subjects, including the history of data science, matrix algebra and random vectors, and multivariate analysis

A comprehensive examination of time series forecasting, including the different components of time series and transformations to achieve stationarity

Introductions to both the R and Python programming languages, including basic data types and sample manipulations for both languages

An exploration of algorithms, including how to write one and how to perform an asymptotic analysis

A comprehensive discussion of several techniques for analyzing and predicting complex data sets


Perfect for advanced undergraduate and graduate students in Data Science, Business Analytics, and Statistics programs, Data Science in Theory and Practice will also earn a place in the libraries of practicing data scientists, data and business analysts, and statisticians in the private sector, government, and academia.


About the Authors

MARIA CRISTINA MARIANI, PHD, is Shigeko K. Chan Distinguished Professor and Chair in the Department of Mathematical Sciences at The University of Texas at El Paso. She currently focuses her research on Stochastic Analysis, Differential Equations and Machine Learning with applications to Big Data and Complex Data sets arising in Public Health, Geophysics, Finance and others. Dr. Mariani is co-author of other Wiley books including Quantitative Finance.


OSEI KOFI TWENEBOAH, PHD, is Assistant Professor of Data Science at Ramapo College of New Jersey. His main research is Stochastic Analysis, Machine Learning and Scientific Computing with applications to Finance, Health Sciences, and Geophysics.


MARIA PIA BECCAR-VARELA, PHD, is Associate Professor of Instruction in the Department of Mathematical Sciences at the University of Texas at El Paso. Her research interests include Differential Equations, Stochastic Differential Equations, Wavelet Analysis and Discriminant Analysis applied to Finance, Health Sciences, and Earthquake Studies.


Table of Contents

Partial table of contents only

List of Figures xvii


List of Tables xxi


Preface xxiii


1 Background of Data Science 1


1.1 Introduction 1


1.2 Origin of Data Science 2


1.3 Who is a Data Scientist? 2


1.4 Big Data 3


1.4.1 Characteristics of Big Data 4


1.4.2 Big Data Architectures 5


2 Matrix Algebra and Random Vectors 7


2.1 Introduction 7


2.2 Some Basics of Matrix Algebra 7


2.2.1 Vectors 7


2.2.2 Matrices 8


2.3 Random Variables and Distribution Functions 12


2.3.1 The Dirichlet Distribution 15


2.3.2 Multinomial Distribution 17


2.3.3 Multivariate Normal Distribution 18


2.4 Problems 19


3 Multivariate Analysis 21


3.1 Introduction 21


3.2 Multivariate Analysis: Overview 21


3.3 Mean Vectors 22


3.4 Variance–Covariance Matrices 24


3.5 Correlation Matrices 26


3.6 Linear Combinations of Variables 28


3.6.1 Linear Combinations of Sample Means 29


3.6.2 Linear Combinations of Sample Variance and Covariance 29


3.6.3 Linear Combinations of Sample Correlation 30


3.7 Problems 31


4 Time Series Forecasting 35


4.1 Introduction 35


4.2 Terminologies 36


4.3 Components of Time Series 39


4.3.1 Seasonal 39


4.3.2 Trend 40


4.3.3 Cyclical 41


4.3.4 Random 42


4.4 Transformations to Achieve Stationarity 42


4.5 Elimination of Seasonality via Differencing 44


4.6 Additive and Multiplicative Models 44


4.7 Measuring Accuracy of Different Time Series Techniques 45


4.7.1 Mean Absolute Deviation 46


4.7.2 Mean Absolute Percent Error 46


4.7.3 Mean Square Error 47


4.7.4 Root Mean Square Error 48


4.8 Averaging and Exponential Smoothing Forecasting Methods 48


4.8.1 Averaging Methods 49


4.8.1.1 Simple Moving Averages 49


4.8.1.2 Weighted Moving Averages 51


4.8.2 Exponential Smoothing Methods 54


4.8.2.1 Simple Exponential Smoothing 54


4.8.2.2 Adjusted Exponential Smoothing 55


4.9 Problems 57


5 Introduction to R 61


5.1 Introduction 61


5.2 Basic Data Types 62


5.2.1 Numeric Data Type 62


5.2.2 Integer Data Type 62


5.2.3 Character 63


5.2.4 Complex Data Types 63


5.2.5 Logical Data Types 64


5.3 Simple Manipulations – Numbers and Vectors 64


5.3.1 Vectors and Assignment 64


5.3.2 Vector Arithmetic 65


5.3.3 Vector Index 66


5.3.4 Logical Vectors 67


5.3.5 Missing Values 68


5.3.6 Index Vectors 69


5.3.6.1 Indexing with Logicals 69


5.3.6.2 A Vector of Positive Integral Quantities 69


5.3.6.3 A Vector of Negative Integral Quantities 69


5.3.6.4 Named Indexing 69


5.3.7 Other Types of Objects 70


5.3.7.1 Matrices 70


5.3.7.2 List 72


5.3.7.3 Factor 73


5.3.7.4 Data Frames 75


5.3.8 Data Import 76


5.3.8.1 Excel File 76


5.3.8.2 CSV File 76


5.3.8.3 Table File 77


5.3.8.4 Minitab File 77


5.3.8.5 SPSS File 77


5.4 Problems 78


6 Introduction to Python 81


6.1 Introduction 81


6.2 Basic Data Types 82


6.2.1 Number Data Type 82


6.2.1.1 Integer 82


6.2.1.2 Floating-Point Numbers 83


6.2.1.3 Complex Numbers 84


6.2.2 Strings 84


6.2.3 Lists 85


6.2.4 Tuples 86


6.2.5 Dictionaries 86


6.3 Number Type Conversion 87


6.4 Python Conditions 87


6.4.1 If Statements 88


6.4.2 The Else and Elif Clauses 89


6.4.3 The While Loop 90


6.4.3.1 The Break Statement 91


6.4.3.2 The Continue Statement 91


6.4.4 For Loops 91


6.4.4.1 Nested Loops 92


6.5 Python File Handling: Open, Read, and Close 93


6.6 Python Functions 93


6.6.1 Calling a Function in Python 94


6.6.2 Scope and Lifetime of Variables 94


6.7 Problems 95


7 Algorithms 97


7.1 Introduction 97


7.2 Algorithm – Definition 97


7.3 How to Write an Algorithm 98


7.3.1 Algorithm Analysis 99


7.3.2 Algorithm Complexity 99


7.3.3 Space Complexity 100


7.3.4 Time Complexity 100


7.4 Asymptotic Analysis of an Algorithm 101


7.4.1 Asymptotic Notations 102


7.4.1.1 Big O Notation 102


7.4.1.2 The Omega Notation, Ω 102


7.4.1.3 The Θ Notation 102


7.5 Examples of Algorithms 104


7.6 Flowchart 104


7.7 Problems 105


8 Data Preprocessing and Data Validations 109


8.1 Introduction 109


8.2 Definition – Data Preprocessing 109


8.3 Data Cleaning 110


8.3.1 Handling Missing Data 110


8.3.2 Types of Missing Data 110


8.3.2.1 Missing Completely at Random 110


8.3.2.2 Missing at Random 110


8.3.2.3 Missing Not at Random 111


8.3.3 Techniques for Handling the Missing Data 111


8.3.3.1 Listwise Deletion 111


8.3.3.2 Pairwise Deletion 111


8.3.3.3 Mean Substitution 112


8.3.3.4 Regression Imputation 112


8.3.3.5 Multiple Imputation 112


8.3.4 Identifying Outliers and Noisy Data 113


8.3.4.1 Binning 113


8.3.4.2 Box and Whisker Plot 113


8.4 Data Transformations 115


8.4.1 Min–Max Normalization 115


8.4.2 Z-score Normalization 115


8.5 Data Reduction 116


8.6 Data Validations 117


8.6.1 Methods for Data Validation 117


8.6.1.1 Simple Statistical Criterion 117


8.6.1.2 Fourier Series Modeling and SSC 118


8.6.1.3 Principal Component Analysis and SSC 118


8.7 Problems 119


9 Data Visualizations 121


9.1 Introduction 121


9.2 Definition – Data Visualization 121


9.2.1 Scientific Visualization 123


9.2.2 Information Visualization 123


9.2.3 Visual Analytics 124


9.3 Data Visualization Techniques 126


9.3.1 Time Series Data 126


9.3.2 Statistical Distributions 127


9.3.2.1 Stem-and-Leaf Plots 127


9.3.2.2 Q–Q Plots 127


9.4 Data Visualization Tools 129


9.4.1 Tableau 129


9.4.2 Infogram 130


9.4.3 Google Charts 132


9.5 Problems 133


10 Binomial and Trinomial Trees 135
