Saturday, July 10, 2021

Data Analytics and Science - Introduction - Bibliography

 


DATA SCIENCE: THEORIES, MODELS, ALGORITHMS, AND ANALYTICS

S. R. DAS


Chapters and Sections

1 The Art of Data Science

1.1 Volume, Velocity, Variety

1.2 Machine Learning

1.3 Supervised and Unsupervised Learning

1.4 Predictions and Forecasts

1.5 Innovation and Experimentation

1.6 The Dark Side

1.6.1 Big Errors

1.6.2 Privacy

1.7 Theories, Models, Intuition, Causality, Prediction, Correlation


 This book’s viewpoint is that a data scientist is someone who asks unique, interesting questions of data based on formal or informal theory, to generate rigorous and useful insights. It is likely to be an individual with multi-disciplinary training in computer science, business, economics, statistics, and armed with the necessary quantity of domain knowledge relevant to the question at hand.


2 The Very Beginning: Got Math?

2.1 Exponentials, Logarithms, and Compounding

2.2 Normal Distribution

2.3 Poisson Distribution

2.4 Moments of a continuous random variable

2.5 Combining random variables

2.6 Vector Algebra

2.7 Statistical Regression

2.8 Diversification

2.9 Matrix Calculus

2.10 Matrix Equations



3 Open Source: Modeling in R

3.1 System Commands

3.2 Loading Data

3.3 Matrices

3.4 Descriptive Statistics

3.5 Higher-Order Moments

3.6 Quick Introduction to Brownian Motions with R

3.7 Estimation using maximum-likelihood

3.8 GARCH/ARCH Models

3.9 Introduction to Monte Carlo

3.10 Portfolio Computations in R

3.11 Finding the Optimal Portfolio

3.12 Root Solving

3.13 Regression

3.14 Heteroskedasticity

3.15 Auto-regressive models

3.16 Vector Auto-Regression

3.17 Logit

3.18 Probit

3.19 Solving Non-Linear Equations

3.20 Web-Enabling R Functions



4 MoRe: Data Handling and Other Useful Things

4.1 Data Extraction of stocks using quantmod

4.2 Using the merge function

4.3 Using the apply class of functions

4.4 Getting interest rate data from FRED

4.5 Cross-Sectional Data (an example)

4.6 Handling dates with lubridate

4.7 Using the data.table package

4.8 Another data set: Bay Area Bike Share data

4.9 Using the plyr package family



5 Being Mean with Variance: Markowitz Optimization

5.1 Quadratic (Markowitz) Problem

5.1.1 Solution in R

5.2 Solving the problem with the quadprog package

5.3 Tracing out the Efficient Frontier

5.4 Covariances of frontier portfolios: rp,rq

5.5 Combinations

5.6 Zero Covariance Portfolio

5.7 Portfolio Problems with Riskless Assets

5.8 Risk Budgeting



6 Learning from Experience: Bayes Theorem

6.1 Introduction

6.2 Bayes and Joint Probability Distributions

6.3 Correlated default (conditional default)

6.4 Continuous and More Formal Exposition

6.5 Bayes Nets

6.6 Bayes Rule in Marketing

6.7 Other Applications

6.7.1 Bayes Models in Credit Rating Transitions

6.7.2 Accounting Fraud

6.7.3 Bayes was a Reverend after all...


7 More than Words: Extracting Information from News

7.1 Prologue
7.2 Framework
7.3 Algorithms
7.3.1 Crawlers and Scrapers
7.3.2 Text Pre-processing
7.3.3 The tm package
7.3.4 Term Frequency - Inverse Document Frequency (TF-IDF)
7.3.5 Wordclouds
7.3.6 Regular Expressions
7.4 Extracting Data from Web Sources using APIs
7.4.1 Using Twitter
7.4.2 Using Facebook
7.4.3 Text processing, plain and simple
7.4.4 A Multipurpose Function to Extract Text
7.5 Text Classification
7.5.1 Bayes Classifier
7.5.2 Support Vector Machines
7.5.3 Word Count Classifiers
7.5.4 Vector Distance Classifier
7.5.5 Discriminant-Based Classifier
7.5.6 Adjective-Adverb Classifier
7.5.7 Scoring Optimism and Pessimism
7.5.8 Voting among Classifiers
7.5.9 Ambiguity Filters
7.6 Metrics
7.6.1 Confusion Matrix
7.6.2 Precision and Recall
7.6.3 Accuracy
7.6.4 False Positives
7.6.5 Sentiment Error
7.6.6 Disagreement
7.6.7 Correlations
7.6.8 Aggregation Performance
7.6.9 Phase-Lag Metrics
7.6.10 Economic Significance
7.7 Grading Text
7.8 Text Summarization
7.9 Discussion
7.10 Appendix: Sample text from Bloomberg for summarization


8 Virulent Products: The Bass Model
8.1 Introduction
8.2 Historical Examples
8.3 The Basic Idea
8.4 Solving the Model
8.4.1 Symbolic math in R
8.5 Software
8.6 Calibration
8.7 Sales Peak
8.8 Notes



9 Extracting Dimensions: Discriminant and Factor Analysis
9.1 Overview
9.2 Discriminant Analysis
9.2.1 Notation and assumptions
9.2.2 Discriminant Function
9.2.3 How good is the discriminant function?
9.2.4 Caveats
9.2.5 Implementation using R
9.2.6 Confusion Matrix
9.2.7 Multiple groups
9.3 Eigen Systems
9.4 Factor Analysis
9.4.1 Notation
9.4.2 The Idea
9.4.3 Principal Components Analysis (PCA)
9.4.4 Application to Treasury Yield Curves
9.4.5 Application: Risk Parity and Risk Disparity
9.4.6 Difference between PCA and FA
9.4.7 Factor Rotation
9.4.8 Using the factor analysis function



10 Bidding it Up: Auctions
10.1 Theory
10.1.1 Overview
10.1.2 Auction types
10.1.3 Value Determination
10.1.4 Bidder Types
10.1.5 Benchmark Model (BM)
10.2 Auction Math
10.2.1 Optimization by bidders
10.2.2 Example
10.3 Treasury Auctions
10.3.1 DPA or UPA?
10.4 Mechanism Design
10.4.1 Collusion
10.4.2 Clicks (Advertising Auctions)
10.4.3 Next Price Auctions
10.4.4 Laddered Auction


11 Truncate and Estimate: Limited Dependent Variables
11.1 Introduction
11.2 Logit
11.3 Probit
11.4 Analysis
11.4.1 Slopes
11.4.2 Maximum-Likelihood Estimation (MLE)
11.5 Multinomial Logit
11.6 Truncated Variables
11.6.1 Endogeneity
11.6.2 Example: Women in the Labor Market
11.6.3 Endogeneity – Some Theory to Wrap Up

12 Riding the Wave: Fourier Analysis
12.1 Introduction
12.2 Fourier Series
12.2.1 Basic stuff
12.2.2 The unit circle
12.2.3 Angular velocity
12.2.4 Fourier series
12.2.5 Radians
12.2.6 Solving for the coefficients
12.3 Complex Algebra
12.3.1 From Trig to Complex
12.3.2 Getting rid of a0
12.3.3 Collapsing and Simplifying
12.4 Fourier Transform
12.4.1 Empirical Example
12.5 Application to Binomial Option Pricing
12.6 Application to probability functions
12.6.1 Characteristic functions
12.6.2 Finance application
12.6.3 Solving for the characteristic function
12.6.4 Computing the moments
12.6.5 Probability density function


13 Making Connections: Network Theory
13.1 Overview
13.2 Graph Theory
13.3 Features of Graphs
13.4 Searching Graphs
13.4.1 Depth First Search
13.4.2 Breadth-first-search

13.5 Strongly Connected Components
13.6 Dijkstra’s Shortest Path Algorithm
13.6.1 Plotting the network
13.7 Degree Distribution
13.8 Diameter
13.9 Fragility
13.10 Centrality
13.11 Communities
13.11.1 Modularity
13.12 Word of Mouth
13.13 Network Models of Systemic Risk
13.13.1 Systemic Score, Fragility, Centrality, Diameter
13.13.2 Risk Decomposition
13.13.3 Normalized Risk Score
13.13.4 Risk Increments
13.13.5 Criticality
13.13.6 Cross Risk
13.13.7 Risk Scaling
13.13.8 Too Big To Fail?
13.13.9 Application of the model to the banking network in India
13.14 Map of Science


14 Statistical Brains: Neural Networks

14.1 Overview
14.2 Nonlinear Regression
14.3 Perceptrons
14.4 Squashing Functions
14.5 How does the NN work?
14.5.1 Logit/Probit Model
14.5.2 Connection to hyperplanes
14.6 Feedback/Backpropagation
14.6.1 Extension to many perceptrons

14.7 Research Applications
14.7.1 Discovering Black-Scholes
14.7.2 Forecasting
14.8 Package neuralnet in R
14.9 Package nnet in R


15 Zero or One: Optimal Digital Portfolios
15.1 Modeling Digital Portfolios
15.2 Implementation in R
15.2.1 Basic recursion
15.2.2 Combining conditional distributions
15.3 Stochastic Dominance (SD)
15.4 Portfolio Characteristics
15.4.1 How many assets?
15.4.2 The impact of correlation
15.4.3 Uneven bets?
15.4.4 Mixing safe and risky assets

16 Against the Odds: Mathematics of Gambling
16.1 Introduction
16.1.1 Odds
16.1.2 Edge
16.1.3 Bookmakers
16.2 Kelly Criterion
16.2.1 Example
16.2.2 Deriving the Kelly Criterion
16.3 Entropy
16.3.1 Linking the Kelly Criterion to Entropy
16.3.2 Linking the Kelly criterion to portfolio optimization
16.3.3 Implementing day trading
16.4 Casino Games

17 In the Same Boat: Cluster Analysis and Prediction Trees

17.1 Introduction

17.2 Clustering using k-means

17.2.1 Example: Randomly generated data in kmeans

17.2.2 Example: Clustering of VC financing rounds

17.2.3 NCAA teams

17.3 Hierarchical Clustering

17.4 Prediction Trees

17.4.1 Classification Trees

17.4.2 The C4.5 Classifier

17.5 Regression Trees

17.5.1 Example: California Home Data


18 Bibliography


Business Analytics: The Next Frontier for Decision Sciences

by James R. Evans, Carl H. Lindner College of Business, University of Cincinnati, Decision Sciences Institute, March 2012


Business Analytics, 2nd Edition

James R. Evans, University of Cincinnati


©2016 | Pearson


2016 Edition

Table of Contents

Brief Contents


Preface


About the Author


Part 1: Foundations of Business Analytics


1. Introduction to Business Analytics


2. Analytics on Spreadsheets


Part 2: Descriptive Analytics


3. Visualizing and Exploring Data


4. Descriptive Statistical Measures


5. Probability Distributions and Data Modeling


6. Sampling and Estimation


7. Statistical Inference


Part 3: Predictive Analytics


8. Trendlines and Regression Analysis


9. Forecasting Techniques


10. Introduction to Data Mining


11. Spreadsheet Modeling and Analysis


12. Monte Carlo Simulation and Risk Analysis


Part 4: Prescriptive Analytics


13. Linear Optimization


14. Applications of Linear Optimization


15. Integer Optimization


16. Decision Analysis


Supplementary Chapter A (online): Nonlinear and Non-Smooth Optimization


Supplementary Chapter B (online): Optimization Models with Uncertainty


Appendix A


Glossary


Index


https://www.pearson.com/us/higher-education/program/Evans-Business-Analytics-2nd-Edition/PGM285763.html?tab=contents


http://faculty.cbpp.uaa.alaska.edu/afef/business_analytics.htm

http://cs.furman.edu/~pbatchelor/csc105/MyPPT/Evans%20Ch%201%20My%20Version.pptx


http://cs.furman.edu/~pbatchelor


http://cs.furman.edu/~pbatchelor/csc105/
