1. Overview and Descriptive Statistics
Published: 2019-06-14


1. Populations, Samples, and Processes

An investigation will typically focus on a well-defined collection of objects constituting a population of interest.

When desired information is available for all objects in the population, we have what is called a census.

A subset of the population -- a sample -- is selected in some prescribed manner.
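One common prescribed manner is simple random sampling, in which every subset of size n is equally likely to be drawn. Below is a minimal Python sketch, assuming the population fits in a list. The function name `simple_random_sample` and the 100-object population are hypothetical. It relies on the standard library's `random.Random.sample`, which draws without replacement.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct objects from population, every n-subset equally likely."""
    rng = random.Random(seed)         # a seeded generator makes the draw reproducible
    return rng.sample(population, n)  # sampling without replacement

# Hypothetical population: 100 objects labeled 1..100
population = list(range(1, 101))
sample = simple_random_sample(population, 10, seed=1)
print(sample)
```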

 

A variable is any characteristic whose value may change from one object to another in the population.

  • A univariate data set consists of observations on a single variable.
  • Bivariate data arise when observations are made on each of two variables.
  • Multivariate data arise when observations are made on more than two variables.

Branches of Statistics

  • An investigator who has collected data may wish simply to summarize and describe important features of the data. This entails using methods from descriptive statistics.
  • Techniques for generalizing from a sample to a population are gathered within the branch of our discipline called inferential statistics.

Enumerative Versus Analytic Studies

  • In enumerative studies, interest is focused on a finite, identifiable, unchanging collection of individuals or objects that make up a population.
  • Analytic studies are often carried out with the objective of improving a future product by taking action on a process of some sort.

Collecting Data

 

2. Pictorial and Tabular Methods in Descriptive Statistics

Notation

The number of observations in a single sample will often be denoted by n.

Given a data set consisting of $n$ observations on some variable $x$, the individual observations will be denoted by $x_1, x_2, x_3, \ldots, x_n$.

Stem-and-Leaf Displays

Steps for Constructing a Stem-and-Leaf Display

  1. Select one or more leading digits for the stem values. The trailing digits become the leaves.
  2. List possible stem values in a vertical column.
  3. Record the leaf for every observation beside the corresponding stem value.
  4. Indicate the units for stems and leaves somewhere in the display.
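The four steps above can be sketched in Python as follows. This is an illustration only, assuming non-negative values and a single trailing digit as the leaf. The function `stem_and_leaf`, its `leaf_unit` parameter, and the data are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(data, leaf_unit=1):
    """Return the lines of a stem-and-leaf display (non-negative data assumed)."""
    leaves = defaultdict(list)
    for x in data:
        scaled = int(round(x / leaf_unit))  # express the value in leaf units
        stem, leaf = divmod(scaled, 10)     # step 1: trailing digit -> leaf
        leaves[stem].append(leaf)
    lines = []
    # step 2: every possible stem in a vertical column, even stems with no leaves
    for stem in range(min(leaves), max(leaves) + 1):
        # step 3: record each leaf beside its stem (sorted for readability)
        row = "".join(str(leaf) for leaf in sorted(leaves[stem]))
        lines.append(f"{stem:>3} | {row}")
    # step 4: state the units
    lines.append(f"(leaf unit = {leaf_unit}, stem unit = {10 * leaf_unit})")
    return lines

data = [64, 33, 52, 47, 58, 61, 39, 45, 50, 56, 72, 68]  # hypothetical sample
for line in stem_and_leaf(data):
    print(line)
```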

Dotplots

A dotplot is an attractive summary of numerical data when the data set is reasonably small or there are relatively few distinct data values. Each observation is represented by a dot above the corresponding location on a horizontal measurement scale.
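A rough text rendering of a dotplot is sketched below, assuming integer-valued data so that each integer from the minimum to the maximum gets its own column on the scale. The function `text_dotplot` and the sample are hypothetical. A plotting library would place dots on a true continuous scale instead.

```python
from collections import Counter

def text_dotplot(data):
    """Stack one '*' above each scale location per observation (integer data)."""
    counts = Counter(data)
    scale = range(min(data), max(data) + 1)   # every location, including empty ones
    width = max(len(str(v)) for v in scale)  # column width fits the longest label
    height = max(counts.values())
    rows = []
    for level in range(height, 0, -1):       # top row first, so dots stack upward
        cells = ("*" if counts[v] >= level else "" for v in scale)
        rows.append(" ".join(c.rjust(width) for c in cells).rstrip())
    rows.append(" ".join(str(v).rjust(width) for v in scale))  # horizontal scale
    return rows

for row in text_dotplot([3, 5, 5, 6, 8, 8, 8, 10]):
    print(row)
```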

Histograms

  • A variable is discrete if its set of possible values either is finite or else can be listed in an infinite sequence.
  • A variable is continuous if its possible values consist of an entire interval on the number line.

Consider data consisting of observations on a discrete variable x.

  • The frequency of any particular x value is the number of times that value occurs in the data set.
  • The relative frequency of a value is the fraction or proportion of times the value occurs.
  • A frequency distribution is a tabulation of the frequencies and/or relative frequencies.
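A minimal Python sketch of a frequency distribution for a discrete variable, using the standard library's `collections.Counter`. The function name `frequency_distribution` and the defect counts are hypothetical. Each relative frequency is the frequency divided by the number of observations n.

```python
from collections import Counter

def frequency_distribution(data):
    """Return (value, frequency, relative frequency) rows for a discrete variable."""
    n = len(data)
    counts = Counter(data)  # frequency of each distinct value
    return [(x, f, f / n) for x, f in sorted(counts.items())]

# Hypothetical data: number of defects found on each of 10 inspected items
defects = [0, 1, 0, 2, 1, 0, 0, 3, 1, 0]
for x, freq, rel in frequency_distribution(defects):
    print(f"{x:>2} {freq:>3} {rel:>6.2f}")
```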

Histogram Shapes

Histograms come in a variety of shapes.

  • A unimodal histogram rises to a single peak and then declines.
  • A bimodal histogram has two different peaks.
  • A multimodal histogram has more than two peaks.

 

  • A histogram is symmetric if the left half is the mirror image of the right half.
  • A unimodal histogram is positively skewed if the right or upper tail is stretched out compared with the left or lower tail, and negatively skewed if the stretching is to the left.

Qualitative Data

Multivariate Data

 

3. Measures of Location

The Mean

For a given set of numbers $x_1, x_2, x_3, \ldots, x_n$, the most familiar and useful measure of center is the mean, or arithmetic average, of the set: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$.
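A direct Python translation of the sample mean, adding up the observations and dividing by n. The function name `sample_mean` and the data are hypothetical.

```python
def sample_mean(xs):
    """Arithmetic average: (x1 + x2 + ... + xn) / n."""
    if not xs:
        raise ValueError("mean of an empty sample is undefined")
    return sum(xs) / len(xs)

print(sample_mean([2, 4, 4, 5, 7, 8]))  # prints 5.0
```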

The Median

The word median is synonymous with "middle", and the sample median is indeed the middle value when the observations are ordered from smallest to largest.
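The definition above covers odd n directly. For even n, the usual convention, assumed in this sketch, is to average the two middle ordered values. The function name `sample_median` is hypothetical.

```python
def sample_median(xs):
    """Middle value of the ordered sample (average of the two middle values if n is even)."""
    ordered = sorted(xs)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty sample is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # the ((n + 1) / 2)th ordered value
    return (ordered[mid - 1] + ordered[mid]) / 2  # mean of the (n/2)th and (n/2 + 1)th values

print(sample_median([7, 1, 3]))     # prints 3
print(sample_median([4, 1, 3, 2]))  # prints 2.5
```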

Other Measures of Location: Quartiles, Percentiles, and Trimmed Means

A trimmed mean is a compromise between the mean and the median. A 10% trimmed mean, for example, would be computed by eliminating the smallest 10% and the largest 10% of the sample and then averaging what is left over.
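A minimal sketch of the trimmed mean, assuming the trimming proportion times n is a whole number (as with 10% of n = 10). Other cases need an interpolation rule that is not covered here. The function `trimmed_mean` and the sample are hypothetical; the sample shows the trimmed mean landing between the mean and the median when one value is unusually large.

```python
def trimmed_mean(xs, proportion):
    """Drop the smallest and largest `proportion` of the ordered sample, then average.

    Assumes proportion * n is a whole number; this sketch rejects other cases.
    """
    ordered = sorted(xs)
    n = len(ordered)
    k = round(proportion * n)  # observations removed from each end
    if abs(proportion * n - k) > 1e-9:
        raise ValueError("proportion * n must be a whole number in this sketch")
    if 2 * k >= n:
        raise ValueError("trimming would remove every observation")
    kept = ordered[k:n - k]
    return sum(kept) / len(kept)

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # hypothetical sample with one large value
print(sum(data) / len(data))     # mean: 14.5
print(trimmed_mean(data, 0.10))  # 10% trimmed mean: 5.5
```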

Categorical Data and Sample Proportions

 

4. Measures of Variability

Measures of Variability for Sample Data

The simplest measure of variability in a sample is the range, which is the difference between the largest and smallest sample values.
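A short sketch of the range, with two hypothetical samples showing that it depends only on the two extreme values, so samples with very different middle portions can share the same range. The function name `sample_range` is hypothetical.

```python
def sample_range(xs):
    """Range: largest sample value minus smallest sample value."""
    return max(xs) - min(xs)

a = [10, 20, 20, 20, 30]  # values bunched in the middle
b = [10, 11, 29, 29, 30]  # values pushed toward the ends
print(sample_range(a), sample_range(b))  # prints 20 20
```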

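The sample variance, denoted by $s^2$, is given by $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$.

The sample standard deviation, denoted by $s$, is the positive square root of the variance: $s = \sqrt{s^2}$.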
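A minimal Python sketch of the sample variance (the sum of squared deviations from the sample mean, divided by n - 1) and the sample standard deviation. The function names `sample_variance` and `sample_std` are hypothetical.

```python
import math

def sample_variance(xs):
    """s^2: sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("s^2 needs at least two observations")
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

def sample_std(xs):
    """s: the positive square root of s^2."""
    return math.sqrt(sample_variance(xs))

data = [1, 2, 3, 4, 5]
print(sample_variance(data))  # prints 2.5
print(sample_std(data))       # about 1.581
```

Python's standard `statistics.variance` and `statistics.stdev` also use the n - 1 divisor, so they should agree with this sketch.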

Motivation for $s^2$

We will use $\sigma^2$ to denote the population variance and $\sigma$ to denote the population standard deviation.

It is customary to refer to $s^2$ as being based on $n-1$ degrees of freedom (df).

This terminology results from the fact that although $s^2$ is based on the $n$ quantities $x_1 - \bar{x}, x_2 - \bar{x}, \ldots, x_n - \bar{x}$, these sum to 0, so specifying the values of any $n-1$ of them determines the remaining value. For example, if $n = 4$ and $x_1 - \bar{x} = 8$, $x_2 - \bar{x} = -6$, and $x_4 - \bar{x} = -4$, then automatically $x_3 - \bar{x} = 2$, so only 3 of the 4 values of $x_i - \bar{x}$ are freely determined (3 df).
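The constraint can be checked numerically with a small sketch. The function name `remaining_deviation` and the second data set are hypothetical; the first call reproduces the n = 4 example.

```python
def remaining_deviation(known_deviations):
    """Deviations x_i - x-bar sum to 0, so the last one is forced by the others."""
    return -sum(known_deviations)

# The n = 4 example: x1 - x-bar = 8, x2 - x-bar = -6, x4 - x-bar = -4
print(remaining_deviation([8, -6, -4]))  # prints 2, i.e. x3 - x-bar

# Any data set shows the same constraint (up to floating-point rounding)
data = [3.1, 4.7, 2.2, 5.0]
xbar = sum(data) / len(data)
print(sum(x - xbar for x in data))
```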

A Computing Formula for $s^2$
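A standard computing (shortcut) formula is $s^2 = \frac{S_{xx}}{n-1}$ with $S_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}$, which avoids computing each deviation separately. The Python sketch below, with hypothetical function names, compares it against the definitional formula.

```python
def variance_definitional(xs):
    """s^2 = sum((x - xbar)^2) / (n - 1)."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

def variance_shortcut(xs):
    """s^2 = S_xx / (n - 1), with S_xx = sum(x^2) - (sum(x))^2 / n."""
    n = len(xs)
    if n < 2:
        raise ValueError("s^2 needs at least two observations")
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    s_xx = sum_x2 - sum_x ** 2 / n
    return s_xx / (n - 1)

data = [1, 2, 3, 4, 5]
print(variance_shortcut(data), variance_definitional(data))  # prints 2.5 2.5
```

On floating-point data whose mean is large relative to its spread, the shortcut can lose precision through cancellation, so the definitional form is often preferred in software.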

Boxplots

After the n observations in a data set are ordered from smallest to largest, the lower fourth and upper fourth are given by:

lower fourth:

  • median of the smallest n/2 observations, n even
  • median of the smallest (n+1)/2 observations, n odd

upper fourth:

  • median of the largest n/2 observations, n even
  • median of the largest (n+1)/2 observations, n odd

That is, the lower (upper) fourth is the median of the smallest (largest) half of the data, where the median is included in both halves if $n$ is odd. A measure of spread that is resistant to outliers is the fourth spread $f_s$, given by:

$f_s = \text{upper fourth} - \text{lower fourth}$
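The fourths and the fourth spread can be sketched in Python as below, following the definition above (the median goes into both halves when n is odd). The helper names `median_of_sorted`, `fourths`, and `fourth_spread` are hypothetical.

```python
def median_of_sorted(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def fourths(xs):
    """Return (lower fourth, upper fourth)."""
    ordered = sorted(xs)
    n = len(ordered)
    half = (n + 1) // 2  # n/2 when n is even, (n+1)/2 when n is odd
    lower = median_of_sorted(ordered[:half])      # smallest half
    upper = median_of_sorted(ordered[n - half:])  # largest half
    return lower, upper

def fourth_spread(xs):
    """f_s = upper fourth - lower fourth."""
    lower, upper = fourths(xs)
    return upper - lower

print(fourths([1, 2, 3, 4, 5, 6, 7]))  # prints (2.5, 5.5); median 4 sits in both halves
```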

Boxplots that Show Outliers

Any observation farther than $1.5f_s$ from the closest fourth is an outlier. An outlier is extreme if it is more than $3f_s$ from the nearest fourth, and it is mild otherwise.
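A minimal sketch of the outlier rule, assuming the lower and upper fourths have already been computed and are passed in. The function name `classify_outliers` and the sample are hypothetical; the fourths 3 and 8 are the medians of the smallest five and largest five ordered values.

```python
def classify_outliers(xs, lower_fourth, upper_fourth):
    """Return (value, 'mild' or 'extreme') for each outlier in xs."""
    fs = upper_fourth - lower_fourth  # fourth spread
    labeled = []
    for x in xs:
        # distance from x to the closest fourth; 0 if x lies between the fourths
        if x < lower_fourth:
            dist = lower_fourth - x
        elif x > upper_fourth:
            dist = x - upper_fourth
        else:
            dist = 0
        if dist > 3 * fs:
            labeled.append((x, "extreme"))
        elif dist > 1.5 * fs:
            labeled.append((x, "mild"))
    return labeled

data = [1, 2, 3, 4, 5, 6, 7, 8, 17, 25]  # hypothetical sample, fs = 8 - 3 = 5
print(classify_outliers(data, lower_fourth=3, upper_fourth=8))
# prints [(17, 'mild'), (25, 'extreme')]
```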

Comparative Boxplots

A comparative or side-by-side boxplot is a very effective way of revealing similarities and differences between two or more data sets consisting of observations on the same variable.

Reprinted from: https://www.cnblogs.com/cyoutetsu/p/6801925.html
