
Probability and Statistics for Computer Science by David Forsyth


$45.00
Condition - Good
Only 1 left


Probability and Statistics for Computer Science Summary

Probability and Statistics for Computer Science by David Forsyth

This textbook is aimed at computer science undergraduates in their late sophomore or early junior year, and supplies a comprehensive background in qualitative and quantitative data analysis, probability, random variables, and statistical methods, including machine learning.

With careful treatment of topics that meet the curricular needs of the course, Probability and Statistics for Computer Science features:

* A treatment of random variables and expectations dealing primarily with the discrete case.

* A practical treatment of simulation, showing how many interesting probabilities and expectations can be extracted, with particular emphasis on Markov chains (see the first short sketch after this list).

* A clear but crisp account of simple point inference strategies (maximum likelihood; Bayesian inference) in simple contexts. This is extended to cover some confidence intervals, samples and populations for random sampling with replacement, and the simplest hypothesis testing.

* A chapter dealing with classification, explaining why it's useful; how to train SVM classifiers with stochastic gradient descent (see the second sketch after this list); and how to use implementations of more advanced methods such as random forests and nearest neighbors.

* A chapter dealing with regression, explaining how to set up, use and understand linear regression and nearest neighbors regression in practical problems.

* A chapter dealing with principal components analysis, developing intuition carefully, and including numerous practical examples. There is a brief description of multivariate scaling via principal coordinate analysis.

* A chapter dealing with clustering via agglomerative methods and k-means, showing how to build vector quantized features for complex signals.
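To give a flavor of the simulation material, here is a minimal sketch (not taken from the book) of estimating a Markov chain probability by simulation. The two-state weather chain, its transition probabilities, and all function names below are invented for illustration.

```python
import random

# A made-up two-state weather chain, used purely for illustration.
TRANSITIONS = {
    "sunny": [("sunny", 0.8), ("rainy", 0.2)],
    "rainy": [("sunny", 0.4), ("rainy", 0.6)],
}

def step(state):
    """Sample the next state from the transition distribution of `state`."""
    r, cumulative = random.random(), 0.0
    for next_state, p in TRANSITIONS[state]:
        cumulative += p
        if r < cumulative:
            return next_state
    return next_state  # guard against floating-point round-off

def estimate_rain_probability(days=7, trials=100_000):
    """Estimate P(rainy on day `days` | sunny on day 0) by simulating many runs."""
    hits = 0
    for _ in range(trials):
        state = "sunny"
        for _ in range(days):
            state = step(state)
        hits += (state == "rainy")
    return hits / trials

print(estimate_rain_probability())  # close to this chain's stationary probability of rain, 1/3
```

The same pattern (simulate many runs and count how often the event of interest occurs) is the basis of the simulation-based estimates of probabilities and expectations that the book develops.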
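Similarly, here is a minimal sketch (again not from the book) of the kind of classifier training the classification chapter describes: a linear SVM fit by stochastic gradient descent on the regularized hinge loss. The synthetic data, learning rate, and regularization constant are assumptions made up for this example.

```python
import numpy as np

def train_svm_sgd(x, y, lam=0.01, epochs=50, lr=0.01, rng=None):
    """Fit a linear SVM by SGD on the hinge loss plus an L2 regularizer.

    x: (n, d) feature array; y: (n,) labels in {-1, +1}.
    Objective: (1/n) * sum_i max(0, 1 - y_i (w.x_i + b)) + (lam/2) * ||w||^2.
    """
    rng = rng or np.random.default_rng(0)
    n, d = x.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        for i in rng.permutation(n):            # visit training examples in random order
            margin = y[i] * (x[i] @ w + b)
            if margin < 1:                      # hinge loss active: data term contributes
                w -= lr * (lam * w - y[i] * x[i])
                b -= lr * (-y[i])
            else:                               # only the regularizer contributes
                w -= lr * lam * w
    return w, b

# Two made-up, linearly separable blobs of points.
rng = np.random.default_rng(1)
x = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)
w, b = train_svm_sgd(x, y)
print("training accuracy:", np.mean(np.sign(x @ w + b) == y))
```

On these well-separated blobs the printed training accuracy should be essentially 1.0; the book's discussion of error rate and cross-validation covers how to evaluate a classifier properly on less convenient data.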

Illustrated throughout, each main chapter includes many worked examples and other pedagogical elements such as boxed Procedures, Definitions, Useful Facts, and Remember This (short tips). Problems and Programming Exercises are at the end of each chapter, with a summary of what the reader should know.

Instructor resources include a full set of model solutions for all problems, and an Instructor's Manual with accompanying presentation slides.

About David Forsyth

David Alexander Forsyth is Fulton Watson Copp Chair in Computer Science at the University of Illinois at Urbana-Champaign, where he is a leading researcher in computer vision.
Professor Forsyth has regularly served as a program or general chair for the top conferences in computer vision, and has just finished a second term as Editor-in-Chief for IEEE Transactions on Pattern Analysis and Machine Intelligence.

A Fellow of the ACM (2014) and IEEE (2009), Forsyth has also been recognized with the IEEE Computer Society's Technical Achievement Award (2005), the Marr Prize, and a prize for best paper in cognitive computer vision (ECCV 2002). Many of his former students are famous in their own right as academics or industry leaders.

He is the co-author, with Jean Ponce, of Computer Vision: A Modern Approach (2002; 2011), a leading textbook on the topic that has been published in four languages.

Among a variety of odd hobbies, he is a compulsive diver, certified up to normoxic trimix level.

Table of Contents

1 Notation and conventions 9

1.0.1 Background Information........................................................................ 10

1.1 Acknowledgements................................................................................................. 11

I Describing Datasets 12

2 First Tools for Looking at Data 13

2.1 Datasets.......................... 13

2.2 What's Happening? - Plotting Data................................................................. 15

2.2.1 Bar Charts.......................... 16

2.2.2 Histograms................................................................................................... 16

2.2.3 How to Make Histograms...................................................................... 17

2.2.4 Conditional Histograms.......................................................................... 19

2.3 Summarizing 1D Data.......................... 19

2.3.1 The Mean...................................................................................................... 20

2.3.2 Standard Deviation................................................................................... 22

2.3.3 Computing Mean and Standard Deviation Online...................... 26

2.3.4 Variance......................................................................................................... 26

2.3.5 The Median.................................................................................................. 27

2.3.6 Interquartile Range.......................... 29

2.3.7 Using Summaries Sensibly.................................................................... 30

2.4 Plots and Summaries............................................................................................. 31

2.4.1 Some Properties of Histograms.......................................................... 31

2.4.2 Standard Coordinates and Normal Data......................................... 34

2.4.3 Box Plots....................................................................................................... 38

2.5 Whose is bigger? Investigating Australian Pizzas.......................... 39

2.6 You should.................................................................................................................. 43

2.6.1 remember these definitions:................................................................. 43

2.6.2 remember these terms............................................................................ 43

2.6.3 remember these facts:............................................................................. 43

2.6.4 be able to...................................................................................................... 43

3 Looking at Relationships 47

3.1 Plotting 2D Data...................................................................................................... 47

3.1.1

3.1.2 Series.......................... 51

3.1.3 Scatter Plots for Spatial Data.............................................................. 53

3.1.4 Exposing Relationships with Scatter Plots..................................... 54

3.2 Correlation.................................................................................................................. 57

3.2.1 The Correlation Coefficient................................................................... 60

3.2.2 Using Correlation to Predict................................................................ 64

3.2.3 Confusion caused by correlation.......................... 68



3.3 Sterile Males in Wild Horse Herds.......................... 68

3.4 You should.................................................................................................................. 72

3.4.1 remember these definitions:................................................................. 72

3.4.2 remember these terms............................................................................ 72

3.4.3 remember these facts:.......................... 72

3.4.4 use these procedures:.......................... 72

3.4.5 be able to:.......................... 72

II Probability 78

4 Basic ideas in probability 79

4.1 Experiments, Outcomes and Probability....................................................... 79

4.1.1 Outcomes and Probability...................................................................... 79

4.2 Events.......................... 81

4.2.1 Computing Event Probabilities by Counting Outcomes............. 83

4.2.2 The Probability of Events...................................................................... 87

4.2.3 Computing Probabilities by Reasoning about Sets...................... 89

4.3 Independence............................................................................................................ 92

4.3.1 Example: Airline Overbooking............................................................ 96

4.4 Conditional Probability.......................... 99

4.4.1 Evaluating Conditional Probabilities.............................................. 100

4.4.2 Detecting Rare Events is Hard......................................................... 104

4.4.3 Conditional Probability and Various Forms of Independence.......................... 106

4.4.4 The Prosecutor's Fallacy.......................... 108

4.4.5 Example: The Monty Hall Problem................................................ 110

4.5 Extra Worked Examples.................................................................................... 112

4.5.1 Outcomes and Probability.......................... 112

4.5.2 Events.......................................................................................................... 114

4.5.3 Independence........................................................................................... 115

4.5.4 Conditional Probability......................................................................... 117

4.6 You should............................................................................................................... 121

4.6.1 remember these definitions:.............................................................. 121

4.6.2 remember these terms.......................... 121

4.6.3 remember and use these facts.......................................................... 121

4.6.4 remember these points:....................................................................... 121

4.6.5 be able to.................................................................................................... 121

5 Random Variables and Expectations 128

5.1 Random Variables................................................................................................. 128

5.1.1 Joint and Conditional Probability for Random Variables . . . 131

5.1.2 Just a Little Continuous Probability............................................... 134

5.2 Expectations and Expected Values................................................................ 137

5.2.1 Expected Values...................................................................................... 138

5.2.2 Mean, Variance and Covariance....................................................... 141

5.2.3 Expectations and Statistics.......................... 145

5.3 The Weak Law of Large Numbers................................................................ 145


5.3.1 IID Samples.......................... 145

5.3.2 Two Inequalities.......................... 146

5.3.3 Proving the Inequalities.......................... 147

5.3.4 The Weak Law of Large Numbers.................................................. 149

5.4 Using the Weak Law of Large Numbers 151

5.4.1 Should you accept a bet?..................................................................... 151

5.4.2 Odds, Expectations and Bookmaking - a Cultural Diversion.......................... 152

5.4.3 Ending a Game Early.......................... 154

5.4.4 Making a Decision with Decision Trees and Expectations.......................... 154

5.4.5 Utility.......................... 156

5.5 You should................................................................................... 159

5.5.1 remember these definitions:.............................................................. 159

5.5.2 remember these terms......................................................................... 159

5.5.3 use and remember these facts.......................................................... 159

5.5.4 be able to.................................................................................................... 160

6 Useful Probability Distributions 167

6.1 Discrete Distributions 167

6.1.1 The Discrete Uniform Distribution................................................. 167

6.1.2 Bernoulli Random Variables.......................... 168

6.1.3 The Geometric Distribution................................................................ 168

6.1.4 The Binomial Probability Distribution........................................... 169

6.1.5 Multinomial probabilities..................................................................... 171

6.1.6 The Poisson Distribution..................................................................... 172

6.2 Continuous Distributions 174

6.2.1 The Continuous Uniform Distribution........................................... 174

6.2.2 The Beta Distribution........................................................................... 174

6.2.3 The Gamma Distribution..................................................................... 176

6.2.4 The Exponential Distribution............................................................ 176

6.3 The Normal Distribution 178

6.3.1 The Standard Normal Distribution................................................. 178

6.3.2 The Normal Distribution..................................................................... 179

6.3.3 Properties of The Normal Distribution......................................... 180

6.4 Approximating Binomials with Large N 182

6.4.1 Large N.......................... 183

6.4.2 Getting Normal.......................... 185

6.4.3 Using a Normal Approximation to the Binomial Distribution 187

6.5 You should.......................... 188

6.5.1 remember these definitions:.......................... 188

6.5.2 remember these terms:.......................... 188

6.5.3 remember these facts:.......................... 188

6.5.4 remember these points:.......................... 188


III Inference 196

7 Samples and Populations 197

7.1 The Sample Mean................................................................................................. 197

7.1.1 The Sample Mean is an Estimate of the Population Mean . . 197

7.1.2 The Variance of the Sample Mean.......................... 198

7.1.3 When The Urn Model Works............................................................ 201

7.1.4 Distributions are Like Populations................................................. 202

7.2 Confidence Intervals............................................................................................ 203

7.2.1 Constructing Confidence Intervals.................................................. 203

7.2.2 Estimating the Variance of the Sample Mean............................ 204

7.2.3 The Probability Distribution of the Sample Mean.......................... 206

7.2.4 Confidence Intervals for Population Means................................. 208

7.2.5 Standard Error Estimates from Simulation................................. 212

7.3 You should............................................................................................................... 216

7.3.1 remember these definitions:.............................................................. 216

7.3.2 remember these terms......................................................................... 216

7.3.3 remember these facts:........................................................................... 216

7.3.4 use these procedures............................................................................. 216

7.3.5 be able to.................................................................................................... 216

8 The Significance of Evidence 221

8.1 Significance.......................... 222

8.1.1 Evaluating Significance......................................................................... 223

8.1.2 P-values....................................................................................................... 225

8.2 Comparing the Mean of Two Populations.................................................. 230

8.2.1 Assuming Known Population Standard Deviations................... 231

8.2.2 Assuming Same, Unknown Population Standard Deviation . 233

8.2.3 Assuming Different, Unknown Population Standard Deviation.......................... 235

8.3 Other Useful Tests of Significance................................................................. 237

8.3.1 F-tests and Standard Deviations...................................................... 237

8.3.2 χ² Tests of Model Fit.......................... 239

8.4 Dangerous Behavior............................................................................................. 244

8.5 You should............................................................................................................... 246

8.5.1 remember these definitions:.......................... 246

8.5.2 remember these terms.......................... 246

8.5.3 remember these facts:.......................... 246

8.5.4 use these procedures:.......................... 246

8.5.5 be able to:.......................... 246

9 Experiments 251

9.1 A Simple Experiment: The Effect of a Treatment.................................. 251

9.1.1 Randomized Balanced Experiments............................................... 252

9.1.2 Decomposing Error in Predictions.................................................. 253

9.1.3 Estimating the Noise Variance......................................................... 253

9.1.4 The ANOVA Table.................................................................................. 255

9.1.5 Unbalanced Experiments.................................................................... 257


9.1.6 Significant Differences.......................................................................... 259

9.2 Two Factor Experiments.................................................................................... 261

9.2.1 Decomposing the Error.......................... 264

9.2.2 Interaction Between Effects................................................................ 265

9.2.3 The Effects of a Treatment................................................................. 266

9.2.4 Setting up an ANOVA Table.............................................................. 267

9.3 You should............................................................................................................... 272

9.3.1 remember these definitions:.............................................................. 272

9.3.2 remember these terms.......................... 272

9.3.3 remember these facts:........................................................................... 272

9.3.4 use these procedures............................................................................. 272

9.3.5 be able to.................................................................................................... 272

9.3.6 Two-Way Experiments.......................................................................... 274

10 Inferring Probability Models from Data 275

10.1 Estimating Model Parameters with Maximum Likelihood.................. 275

10.1.1 The Maximum Likelihood Principle............................................... 277

10.1.2 Binomial, Geometric and Multinomial Distributions................ 278

10.1.3 Poisson and Normal Distributions................................................... 281

10.1.4 Confidence Intervals for Model Parameters................................ 286

10.1.5 Cautions about Maximum Likelihood............................................ 288

10.2 Incorporating Priors with Bayesian Inference.......................... 289

10.2.1 Conjugacy................................................................................................... 292

10.2.2 MAP Inference......................................................................................... 294

10.2.3 Cautions about Bayesian Inference................................................. 296

10.3 Bayesian Inference for Normal Distributions............................................ 296

10.3.1 Example: Measuring Depth of a Borehole................................... 296

10.3.2 Normal Prior and Normal Likelihood Yield Normal Posterior 297

10.3.3 Filtering.......................... 300

10.4 You should............................................................................................................... 303

10.4.1 remember these definitions:.............................................................. 303

10.4.2 remember these terms......................................................................... 303

10.4.3 remember these facts:........................................................................... 304

10.4.4 use these procedures............................................................................. 304

10.4.5 be able to.................................................................................................... 304


IV Tools 312

11 Extracting Important Relationships in High Dimensions 313

11.1 Summaries and Simple Plots........................................................................... 313

11.1.1 The Mean.......................... 314

11.1.2 Stem Plots and Scatterplot Matrices.............................................. 315

11.1.3 Covariance.................................................................................................. 317

11.1.4 The Covariance Matrix......................................................................... 319

11.2 Using Mean and Covariance to Understand High Dimensional Data . 321

11.2.1 Mean and Covariance under Affine Transformations............... 322


11.2.2 Eigenvectors and Diagonalization.......................... 324

11.2.3 Diagonalizing Covariance by Rotating Blobs.......................... 325

11.2.4 Approximating Blobs.......................... 326

11.2.5 Example: Transforming the Height-Weight Blob.......................... 327

11.3 Principal Components Analysis.......................... 329

11.3.1 Example: Representing Colors with Principal Components.......................... 332

11.3.2 Example: Representing Faces with Principal Components.......................... 334

11.4 Multi-Dimensional Scaling.......................... 335

11.4.1 Choosing Low D Points using High D Distances.......................... 335

11.4.2 Factoring a Dot-Product Matrix.......................... 338

11.4.3 Example: Mapping with Multidimensional Scaling.......................... 339

11.5 Example: Understanding Height and Weight.......................... 341

11.6 You should.......................... 345

11.6.1 remember these definitions:.......................... 345

11.6.2 remember these terms:.......................... 345

11.6.3 remember these facts:.......................... 345

11.6.4 use these procedures:.......................... 345

11.6.5 be able to:.......................... 345

12 Learning to Classify 349

12.1 Classification: The Big Ideas.......................... 349

12.1.1 The Error Rate.......................... 350

12.1.2 Overfitting.......................... 350

12.1.3 Cross-Validation.......................... 351

12.1.4 Is the Classifier Working Well?.......................... 351

12.2 Classifying with Nearest Neighbors.......................... 353

12.3 Classifying with Naive Bayes.......................... 355

12.3.1 Missing Data.......................... 357

12.4 The Support Vector Machine.......................... 358

12.4.1 Choosing a Classifier with the Hinge Loss.......................... 359

12.4.2 Finding a Minimum: General Points.......................... 360

12.4.3 Finding a Minimum: Stochastic Gradient Descent.......................... 361

12.4.4 Example: Training an SVM with Stochastic Gradient Descent 363

12.4.5 Multi-Class Classification with SVMs.............................................. 366

12.5 Classifying with Random Forests................................................................... 367

12.5.1 Building a Decision Tree..................................................................... 367

12.5.2 Choosing a Split with Information Gain........................................ 370

12.5.3 Forests......................................................................................................... 373

12.5.4 Building and Evaluating a Decision Forest.................................. 374

12.5.5 Classifying Data Items with a Decision Forest........................... 375

12.6 You should...................................................................

............................................ 378

12.6.1 remember these definitions:.............................................................. 378

12.6.2 remember these terms......................................................................... 378

12.6.3 remember these facts:........................................................................... 379

12.6.4 use these procedures............................................................................. 379

12.6.5 be able to.................................................................................................... 379


13 Clustering: Models of High Dimensional Data

13.1 The Curse of Dimension.......................... 384

13.1.1 The Curse: Data isn't Where You Think it is............................. 384

13.1.2 Minor Banes of Dimension.................................................................. 386

13.2 The Multivariate Normal Distribution......................................................... 387

13.2.1 Affine Transformations and Gaussians.......................................... 387

13.2.2 Plotting a 2D Gaussian: Covariance Ellipses.............................. 388

13.3 Agglomerative and Divisive Clustering........................................................ 389

13.3.1 Clustering and Distance....................................................................... 391

13.4 The K-Means Algorithm and Variants.......................... 392

13.4.1 How to choose K...................................................................................... 395

13.4.2 Soft Assignment....................................................................................... 397

13.4.3 General Comments on K-Means....................................................... 400

13.4.4 K-Medoids.......................... 400

13.5 Application Example: Clustering Documents........................................... 401

13.5.1 A Topic Model.......................... 402

13.6 Describing Repetition with Vector Quantization...................................... 403

13.6.1 Vector Quantization............................................................................... 404

13.6.2 Example: Groceries in Portugal....................................................... 406

13.6.3 Efficient Clustering and Hierarchical K Means.......................... 409

13.6.4 Example: Activity from Accelerometer Data............................... 409

13.7 You should............................................................................................................... 413

13.7.1 remember these definitions:.............................................................. 413

13.7.2 remember these terms......................................................................... 413

13.7.3 remember these facts:........................................................................... 413

13.7.4 use these procedures............................................................................. 413

14 Regression 417

14.1.1 Regression to Make Predictions....................................................... 417

14.1.2 Regression to Spot Trends.................................................................. 419

14.1 Linear Regression and Least Squares.......................................................... 421

14.1.1 Linear Regression................................................................................... 421

14.1.2 Choosing β.......................... 422

14.1.3 Solving the Least Squares Problem................................................ 423

14.1.4 Residuals.......................... 424

14.1.5 R-squared.................................................................................................... 424

14.2 Producing Good Linear Regressions............................................................. 427

14.2.1 Transforming Variables........................................................................ 428

14.2.2 Problem Data Points have Significant Impact............................ 431

14.2.3 Functions of One Explanatory Variable........................................ 433

14.2.4 Regularizing Linear Regressions...................................................... 435

14.3 Exploiting Your Neighbors

14.3.1 Using your Neighbors to Predict More than a Number............ 441

14.3.2 Example: Filling Large Holes with Whole Images.................... 441

14.4 You should.......................... 444

14.4.1 remember these definitions:.......................... 444

14.4.2 remember these terms:.......................... 444


14.4.3 remember these facts:........................................................................... 444

14.4.4 remember these procedures:............................................................. 444

15 Markov Chains and Hidden Markov Models 454

15.1 Markov Chains........................................................................................................ 454

15.1.1 Transition Probability Matrices........................................................ 457

15.1.2 Stationary Distributions....................................................................... 459

15.1.3 Example: Markov Chain Models of Text...................................... 462

15.2 Estimating Properties of Markov Chains.................................................... 465

15.2.1 Simulation.......................... 465

15.2.2 Simulation Results as Random Variables..................................... 467

15.2.3 Simulating Markov Chains.................................................................. 469

15.3 Example: Ranking the Web by Simulating a Markov Chain................ 472

15.4 Hidden Markov Models and Dynamic Programming............................. 473

15.4.1 Hidden Markov Models........................................................................ 474

15.4.2 Picturing Inference with a Trellis.................................................... 474

15.4.3 Dynamic Programming for HMM's: Formalities....................... 478

15.4.4 Example: Simple Communication Errors.......................... 478

15.5 You should............................................................................................................... 481

15.5.1 remember these definitions:.............................................................. 481

15.5.2 remember these terms......................................................................... 481

15.5.3 remember these facts:........................................................................... 481

15.5.4 be able to.................................................................................................... 481

V Some Mathematical Background 484

16 Resources 485

16.1 Useful Material about Matrices.......................... 485

16.1.1 The Singular Value Decomposition................................................. 486

16.1.2 Approximating A Symmetric Matrix............................................... 487

16.2 Some Special Functions..................................................................................... 489

16.3 Finding Nearest Neighbors............................................................................... 490

16.4 Entropy and Information Gain........................................................................ 493

Additional information

SKU: CIN3319644092G
ISBN 13: 9783319644097
ISBN 10: 3319644092
Title: Probability and Statistics for Computer Science by David Forsyth
Condition: Used - Good
Binding: Hardback
Publisher: Springer International Publishing AG
Date published: 2018-02-20
Pages: 367
Book picture is for illustrative purposes only, actual binding, cover or edition may vary.
This is a used book; there is no escaping the fact that it has been read by someone else and will show signs of wear and previous use. Overall we expect it to be in good condition, but if you are not entirely satisfied, please get in touch with us.
