


Are Sample Size And Replication The Same Thing

Statistical method for determining the sample size of a population

Sample size determination is the act of choosing the number of observations or replicates to include in a statistical sample. The sample size is an important feature of any empirical study in which the goal is to make inferences about a population from a sample. In practice, the sample size used in a study is usually determined based on the cost, time, or convenience of collecting the data, and the need for it to offer sufficient statistical power. In complicated studies there may be several different sample sizes: for example, in a stratified survey there would be different sizes for each stratum. In a census, data is sought for an entire population, hence the intended sample size is equal to the population. In experimental design, where a study may be divided into different treatment groups, there may be different sample sizes for each group.

Sample sizes may be chosen in several ways:

  • using experience – small samples, though sometimes unavoidable, can result in wide confidence intervals and a risk of errors in statistical hypothesis testing.
  • using a target variance for an estimate to be derived from the sample eventually obtained, i.e., if a high precision is required (narrow confidence interval) this translates to a low target variance of the estimator.
  • using a target for the power of a statistical test to be applied once the sample is collected.
  • using a confidence level, i.e., the larger the required confidence level, the larger the sample size (given a constant precision requirement).

Introduction

Larger sample sizes generally lead to increased precision when estimating unknown parameters. For example, if we wish to know the proportion of a certain species of fish that is infected with a pathogen, we would generally have a more precise estimate of this proportion if we sampled and examined 200 rather than 100 fish. Several fundamental facts of mathematical statistics describe this phenomenon, including the law of large numbers and the central limit theorem.

In some situations, the increase in precision for larger sample sizes is minimal, or even non-existent. This can result from the presence of systematic errors or strong dependence in the data, or if the data follows a heavy-tailed distribution.

Sample sizes may be evaluated by the quality of the resulting estimates. For example, if a proportion is being estimated, one may wish to have the 95% confidence interval be less than 0.06 units wide. Alternatively, sample size may be assessed based on the power of a hypothesis test. For example, if we are comparing the support for a certain political candidate among women with the support for that candidate among men, we may wish to have 80% power to detect a difference in the support levels of 0.04 units.

Estimation

Estimation of a proportion

A relatively simple situation is estimation of a proportion. For example, we may wish to estimate the proportion of residents in a community who are at least 65 years old.

The estimator of a proportion is \hat{p} = X/n, where X is the number of 'positive' observations (e.g. the number of people out of the n sampled people who are at least 65 years old). When the observations are independent, this estimator has a (scaled) binomial distribution (and is also the sample mean of data from a Bernoulli distribution). The maximum variance of this distribution is 0.25, which occurs when the true parameter is p = 0.5. In practice, since p is unknown, the maximum variance is often used for sample size assessments. If a reasonable estimate for p is known, the quantity p(1 − p) may be used in place of 0.25.

For sufficiently large n, the distribution of \hat{p} will be closely approximated by a normal distribution.[1] Using this and the Wald method for the binomial distribution yields a confidence interval of the form

\left(\hat{p} - Z\sqrt{\frac{0.25}{n}},\ \hat{p} + Z\sqrt{\frac{0.25}{n}}\right),

where Z is a standard Z-score for the desired level of confidence (1.96 for a 95% confidence interval).

If we wish to have a confidence interval that is W units in total width (W/2 on each side of the sample mean), we would solve

Z\sqrt{\frac{0.25}{n}} = W/2

for n, yielding the sample size

n = \frac{Z^{2}}{W^{2}},

in the case of using 0.5 as the most conservative estimate of the proportion. (Note: W/2 = margin of error.)

[Figure: sample sizes for binomial proportions at different confidence levels and margins of error.]

Otherwise, the formula would be Z\sqrt{\frac{p(1-p)}{n}} = W/2, which yields n = \frac{4Z^{2}p(1-p)}{W^{2}}.

For example, if we are interested in estimating the proportion of the United States population that supports a particular presidential candidate, and we want the width of the 95% confidence interval to be at most 2 percentage points (0.02), then we would need a sample size of (1.96²)/(0.02²) = 9604. It is reasonable to use the 0.5 estimate for p in this case because presidential races are often close to 50/50, and it is also prudent to use a conservative estimate. The margin of error in this case is 1 percentage point (half of 0.02).
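To make the arithmetic concrete, here is a minimal Python sketch of this calculation (the function name and defaults are illustrative, not part of the source article). It implements n = 4Z²p(1 − p)/W², which reduces to Z²/W² for the conservative choice p = 0.5:

    from math import ceil
    from statistics import NormalDist

    def sample_size_proportion(width, confidence=0.95, p=0.5):
        # Minimum n so the Wald interval for a proportion has total width <= `width`.
        # n = 4 * Z^2 * p * (1 - p) / width^2; with p = 0.5 this is Z^2 / width^2.
        z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95% confidence
        return ceil(4 * z ** 2 * p * (1 - p) / width ** 2)

    print(sample_size_proportion(0.02))  # 9604, matching the example above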

The foregoing is commonly simplified:

\left(\hat{p} - 1.96\sqrt{\frac{0.25}{n}},\ \hat{p} + 1.96\sqrt{\frac{0.25}{n}}\right)

will form a 95% confidence interval for the true proportion. If this interval needs to be no more than W units wide, the equation

4\sqrt{\frac{0.25}{n}} = W

can be solved for n, yielding[2][3] n = 4/W² = 1/B², where B is the error bound on the estimate, i.e., the estimate is usually given as within ± B. So, for B = 10% one requires n = 100, for B = 5% one needs n = 400, for B = 3% the requirement approximates to n = 1000, while for B = 1% a sample size of n = 10000 is required. These numbers are quoted often in news reports of opinion polls and other sample surveys. However, the results reported may not be the exact values, as numbers are preferably rounded up. Since the value of n is the minimum number of sample points needed to acquire the desired result, the number of respondents must lie on or above this minimum.
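As a quick check of these frequently quoted figures, the same sketch style gives (note that the exact value for B = 3% is 1112, which the text above rounds loosely to 1000):

    from math import ceil

    def sample_size_from_bound(B):
        # Minimum n so a proportion estimate is within +/- B at roughly 95% confidence (p = 0.5).
        return ceil(1 / B ** 2)

    for B in (0.10, 0.05, 0.03, 0.01):
        print(B, sample_size_from_bound(B))  # 100, 400, 1112, 10000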

Estimation of a mean

A proportion is a special case of a mean. When estimating the population mean using an independent and identically distributed (iid) sample of size n, where each data value has variance σ², the standard error of the sample mean is:

\frac{\sigma}{\sqrt{n}}.

This expression describes quantitatively how the estimate becomes more precise as the sample size increases. Using the central limit theorem to justify approximating the sample mean with a normal distribution yields a confidence interval of the form

\left(\bar{x} - \frac{Z\sigma}{\sqrt{n}},\ \bar{x} + \frac{Z\sigma}{\sqrt{n}}\right),

where Z is a standard Z-score for the desired level of confidence (1.96 for a 95% confidence interval).

If we wish to have a confidence interval that is W units in total width (W/2 on each side of the sample mean), we would solve

\frac{Z\sigma}{\sqrt{n}} = W/2

for n, yielding the sample size

n = \frac{4Z^{2}\sigma^{2}}{W^{2}}. (Note: W/2 = margin of error.)

For example, if we are interested in estimating the amount by which a drug lowers a subject's blood pressure with a 95% confidence interval that is six units wide, and we know that the standard deviation of blood pressure in the population is 15, then the required sample size is 4 × 1.96² × 15² / 6² = 96.04, which would be rounded up to 97, because the obtained value is the minimum sample size, and sample sizes must be integers that lie on or above the calculated minimum.
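A minimal sketch of the same calculation for a mean (standard-library Python; the helper name is illustrative and σ is assumed known):

    from math import ceil
    from statistics import NormalDist

    def sample_size_mean(sigma, width, confidence=0.95):
        # Minimum n so the confidence interval for a mean (known sigma) has total width <= `width`.
        z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
        return ceil(4 * z ** 2 * sigma ** 2 / width ** 2)

    print(sample_size_mean(sigma=15, width=6))  # 97 (96.04 rounded up)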

Required sample sizes for hypothesis tests

A common problem faced by statisticians is calculating the sample size required to yield a certain power for a test, given a predetermined Type I error rate α. This can be estimated by pre-determined tables for certain values, by Mead's resource equation, or, more generally, by the cumulative distribution function:

Tables

Sample sizes per group for a two-sample t-test at significance level 0.05:[4]

                 Cohen's d
  Power       0.2     0.5     0.8
  0.25         84      14       6
  0.50        193      32      13
  0.60        246      40      16
  0.70        310      50      20
  0.80        393      64      26
  0.90        526      85      34
  0.95        651     105      42
  0.99        920     148      58

The table shown above can be used in a two-sample t-test to estimate the sample sizes of an experimental group and a control group that are of equal size; that is, the total number of individuals in the trial is twice that of the number given, and the desired significance level is 0.05.[4] The parameters used are listed below, and a computational sketch follows the list:

  • The desired statistical power of the trial, shown in the left-hand column.
  • Cohen's d (= effect size), which is the expected difference between the means of the target values between the experimental group and the control group, divided by the expected standard deviation.
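The table entries can be roughly reproduced with the usual normal approximation, n ≈ 2(z_{1−α/2} + z_{power})²/d² per group. The sketch below (standard-library Python; names are illustrative) comes within a unit or two of the t-based values in the table; dedicated power calculators (for example, statsmodels' TTestIndPower) solve the exact t-test version:

    from math import ceil
    from statistics import NormalDist

    def n_per_group(d, power, alpha=0.05):
        # Normal-approximation per-group n for a two-sided two-sample comparison at effect size d.
        z = NormalDist().inv_cdf
        return ceil(2 * (z(1 - alpha / 2) + z(power)) ** 2 / d ** 2)

    for power in (0.25, 0.50, 0.60, 0.70, 0.80, 0.90, 0.95, 0.99):
        print(power, [n_per_group(d, power) for d in (0.2, 0.5, 0.8)])
    # e.g. power 0.80 -> [393, 63, 25]; the t-based table row is 393, 64, 26.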

Mead's resource equation

Mead's resource equation is often used for estimating sample sizes of laboratory animals, as well as in many other laboratory experiments. It may not be as accurate as other methods of estimating sample size, but it gives a hint of what the appropriate sample size is when parameters such as expected standard deviations or expected differences in values between groups are unknown or very hard to estimate.[5]

All the parameters in the equation are in fact degrees of freedom of the number of their concepts, and hence their numbers are subtracted by 1 before insertion into the equation.

The equation is:[5]

E = N - B - T,

where:

  • Due north is the total number of individuals or units in the written report (minus ane)
  • B is the blocking component, representing environmental effects immune for in the blueprint (minus one)
  • T is the treatment component, corresponding to the number of treatment groups (including control group) beingness used, or the number of questions being asked (minus 1)
  • E is the degrees of liberty of the mistake component, and should be somewhere between 10 and 20.

For example, if a study using laboratory animals is planned with four treatment groups (T = 3), with eight animals per group, making 32 animals total (N = 31), without any further stratification (B = 0), then E would equal 28, which is above the cutoff of 20, indicating that the sample size may be a bit too large, and six animals per group might be more appropriate.[6]
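A trivial sketch of this bookkeeping (function and argument names are illustrative):

    def mead_error_df(total_units, treatment_groups, blocks=1):
        # Mead's resource equation E = N - B - T, with each term entered as (count - 1).
        N = total_units - 1
        T = treatment_groups - 1
        B = blocks - 1
        return N - B - T

    print(mead_error_df(total_units=32, treatment_groups=4))  # 28: above 20, so fewer animals per group may suffice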

Cumulative distribution function

Let X_i, i = 1, 2, ..., n be independent observations taken from a normal distribution with unknown mean μ and known variance σ². Consider two hypotheses, a null hypothesis:

H_{0}: \mu = 0

and an alternative hypothesis:

H_{a}: \mu = \mu^{*}

for some 'smallest significant difference' μ* > 0. This is the smallest value for which we care about observing a difference. Now, if we wish to (1) reject H_0 with a probability of at least 1 − β when H_a is true (i.e., a power of 1 − β), and (2) reject H_0 with probability α when H_0 is true, then we need the following:

If z_α is the upper α percentage point of the standard normal distribution, then

\Pr(\bar{x} > z_{\alpha}\sigma/\sqrt{n} \mid H_{0}) = \alpha

and so

'Reject H_0 if our sample average (\bar{x}) is more than z_{\alpha}\sigma/\sqrt{n}'

is a decision rule which satisfies (2). (This is a one-tailed test.)

Now we wish for this to happen with probability at least 1 − β when H_a is true. In this case, our sample average will come from a normal distribution with mean μ*. Therefore, we require

\Pr(\bar{x} > z_{\alpha}\sigma/\sqrt{n} \mid H_{a}) \geq 1 - \beta

Through careful manipulation, this can be shown (see Statistical power § Example) to happen when

n \geq \left(\frac{z_{\alpha} + \Phi^{-1}(1 - \beta)}{\mu^{*}/\sigma}\right)^{2}

where Φ is the normal cumulative distribution function.
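A minimal sketch of this bound (standard-library Python; the effect size and power in the example call are arbitrary illustrative choices):

    from math import ceil
    from statistics import NormalDist

    def min_n(mu_star, sigma, alpha=0.05, power=0.80):
        # Smallest n with n >= ((z_alpha + Phi^{-1}(1 - beta)) / (mu* / sigma))^2
        # for the one-sided test described above.
        z = NormalDist().inv_cdf
        return ceil(((z(1 - alpha) + z(power)) / (mu_star / sigma)) ** 2)

    print(min_n(mu_star=0.5, sigma=1.0))  # 25 observations to detect a half-sigma shift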

Stratified sample size

With more complicated sampling techniques, such as stratified sampling, the sample can often be split up into sub-samples. Typically, if there are H such sub-samples (from H different strata), then each of them will have a sample size n_h, h = 1, 2, ..., H. These n_h must conform to the rule that n_1 + n_2 + ... + n_H = n (i.e., the total sample size is given by the sum of the sub-sample sizes). Selecting these n_h optimally can be done in various ways, using (for example) Neyman's optimal allocation.

There are many reasons to use stratified sampling:[7] to decrease variances of sample estimates, to use partly non-random methods, or to study strata individually. A useful, partly non-random method would be to sample individuals where easily accessible, but, where not, sample clusters to save travel costs.[8]

In general, for H strata, a weighted sample mean is

\bar{x}_{w} = \sum_{h=1}^{H} W_{h}\bar{x}_{h},

with

\operatorname{Var}(\bar{x}_{w}) = \sum_{h=1}^{H} W_{h}^{2}\operatorname{Var}(\bar{x}_{h}).[9]

The weights, W_h, frequently, but not always, represent the proportions of the population elements in the strata, and W_h = N_h/N. For a fixed sample size, that is n = \sum n_h,

\operatorname{Var}(\bar{x}_{w}) = \sum_{h=1}^{H} W_{h}^{2}\operatorname{Var}(\bar{x}_{h})\left(\frac{1}{n_{h}} - \frac{1}{N_{h}}\right),[10]

which can be made a minimum if the sampling rate within each stratum is made proportional to the standard deviation within each stratum: n_h/N_h = kS_h, where S_h = \sqrt{\operatorname{Var}(\bar{x}_h)} and k is a constant such that \sum n_h = n.

An "optimum allocation" is reached when the sampling rates within the strata are made direct proportional to the standard deviations inside the strata and inversely proportional to the square root of the sampling cost per chemical element within the strata, C h {\displaystyle C_{h}} :

\frac{n_{h}}{N_{h}} = \frac{K S_{h}}{\sqrt{C_{h}}},[11]

where K is a constant such that \sum n_h = n, or, more generally, when

n_{h} = \frac{K' W_{h} S_{h}}{\sqrt{C_{h}}}.[12]
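A small sketch of this allocation rule (plain Python; the stratum weights, standard deviations, and costs in the example are made-up illustrative values). With equal costs it reduces to Neyman allocation; note that rounding the n_h may leave the total slightly off n, which would need a final adjustment:

    def optimal_allocation(n, weights, sds, costs=None):
        # Allocate total sample n across strata with n_h proportional to W_h * S_h / sqrt(C_h).
        if costs is None:
            costs = [1.0] * len(weights)  # equal costs: Neyman allocation
        raw = [w * s / c ** 0.5 for w, s, c in zip(weights, sds, costs)]
        scale = n / sum(raw)
        return [round(scale * r) for r in raw]

    # Three strata with population shares 0.5 / 0.3 / 0.2 and stratum SDs 4 / 10 / 25, equal costs:
    print(optimal_allocation(1000, [0.5, 0.3, 0.2], [4, 10, 25]))  # [200, 300, 500]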

Qualitative research

Sample size determination in qualitative studies takes a different approach. It is generally a subjective judgment, taken as the research proceeds.[13] One approach is to continue to include further participants or material until saturation is reached.[14] The number needed to reach saturation has been investigated empirically.[15][16][17][18]

There is a paucity of reliable guidance on estimating sample sizes before starting the research, with a range of suggestions given.[16][19][20][21] A tool akin to a quantitative power calculation, based on the negative binomial distribution, has been suggested for thematic analysis.[22][21]

See also

  • Design of experiments
  • Engineering response surface example under Stepwise regression
  • Cohen's h

Notes

  1. ^ NIST/SEMATECH, "7.2.4.2. Sample sizes required", e-Handbook of Statistical Methods.
  2. ^ "Inference for Regression". utdallas.edu.
  3. ^ "Conviction Interval for a Proportion" Archived 2011-08-23 at the Wayback Auto
  4. ^ a b Chapter xiii, page 215, in: Kenny, David A. (1987). Statistics for the social and behavioral sciences. Boston: Little, Brownish. ISBN978-0-316-48915-seven.
  5. ^ a b Kirkwood, James; Robert Hubrecht (2010). The UFAW Handbook on the Care and Management of Laboratory and Other Research Animals. Wiley-Blackwell. p. 29. ISBN978-1-4051-7523-4. online Page 29
  6. ^ Isogenic.info > Resources equation by Michael FW Festing. Updated Sept. 2006
  7. ^ Kish (1965, Section iii.1)
  8. ^ Kish (1965), p. 148.
  9. ^ Kish (1965), p. 78.
  10. ^ Kish (1965), p. 81.
  11. ^ Kish (1965), p. 93.
  12. ^ Kish (1965), p. 94.
  13. ^ Sandelowski, M. (1995). Sample size in qualitative research. Research in Nursing & Health, 18, 179–183.
  14. ^ Glaser, B. (1965). The constant comparative method of qualitative analysis. Social Problems, 12, 436–445.
  15. ^ Francis, Jill J.; Johnston, Marie; Robertson, Clare; Glidewell, Liz; Entwistle, Vikki; Eccles, Martin P.; Grimshaw, Jeremy M. (2010). "What is an adequate sample size? Operationalising data saturation for theory-based interview studies" (PDF). Psychology & Health. 25 (10): 1229–1245. doi:10.1080/08870440903194015. PMID 20204937. S2CID 28152749.
  16. ^ a b Guest, Greg; Bunce, Arwen; Johnson, Laura (2006). "How Many Interviews Are Enough?". Field Methods. 18: 59–82. doi:10.1177/1525822X05279903. S2CID 62237589.
  17. ^ Wright, Adam; Maloney, Francine L.; Feblowitz, Joshua C. (2011). "Clinician attitudes toward and use of electronic problem lists: A thematic analysis". BMC Medical Informatics and Decision Making. 11: 36. doi:10.1186/1472-6947-11-36. PMC 3120635. PMID 21612639.
  18. ^ Mason, Mark (2010). "Sample Size and Saturation in PhD Studies Using Qualitative Interviews". Forum Qualitative Sozialforschung. 11 (3): 8.
  19. ^ Emmel, N. (2013). Sampling and choosing cases in qualitative research: A realist approach. London: Sage.
  20. ^ Onwuegbuzie, Anthony J.; Leech, Nancy L. (2007). "A Call for Qualitative Power Analyses". Quality & Quantity. 41: 105–121. doi:10.1007/s11135-005-1098-1. S2CID 62179911.
  21. ^ a b Fugard AJB; Potts HWW (10 February 2015). "Supporting thinking on sample sizes for thematic analyses: A quantitative tool" (PDF). International Journal of Social Research Methodology. 18 (6): 669–684. doi:10.1080/13645579.2015.1005453. S2CID 59047474.
  22. ^ Galvin R (2015). How many interviews are enough? Do qualitative interviews in building energy consumption research produce reliable knowledge? Journal of Building Engineering, 1:2–12.

References

  • Bartlett, J. E., II; Kotrlik, J. W.; Higgins, C. (2001). "Organizational research: Determining appropriate sample size for survey research" (PDF). Information Technology, Learning, and Performance Journal. 19 (1): 43–50.
  • Kish, L. (1965). Survey Sampling. Wiley. ISBN 978-0-471-48900-9.
  • Smith, Scott (8 April 2013). "Determining Sample Size: How to Ensure You Get the Correct Sample Size". Qualtrics. Retrieved 19 September 2018.
  • Israel, Glenn D. (1992). "Determining Sample Size". University of Florida, PEOD-6. Retrieved 29 June 2019.
  • Rens van de Schoot, Milica Miočević (eds.). 2020. Small Sample Size Solutions (Open Access): A Guide for Applied Researchers and Practitioners. Routledge.

Further reading

  • NIST: Selecting Sample Sizes
  • ASTM E122-07: Standard Practice for Calculating Sample Size to Estimate, With Specified Precision, the Average for a Characteristic of a Lot or Process

External links

  • A MATLAB script implementing Cochran's sample size formula

Source: https://en.wikipedia.org/wiki/Sample_size_determination