Confidence intervals for frequencies and proportions. Confidence interval. Confidence probability.

Why a confidence interval is needed

The calculation of a confidence interval is based on the mean error of the corresponding parameter. A confidence interval shows the limits within which the true value of the estimated parameter lies with probability (1 - a). Here a is the significance level; (1 - a) is also called the confidence level.

In the first chapter we showed that, for example, the true population mean lies within two mean errors of the sample mean about 95% of the time. Thus, the boundaries of the 95% confidence interval for the mean are obtained by stepping off twice the mean error of the mean from the sample mean; in general, we multiply the mean error of the mean by some factor that depends on the confidence level. For the mean and the difference of means this factor is Student's coefficient (the critical value of Student's criterion); for a proportion and a difference of proportions it is the critical value of the z criterion. The product of the coefficient and the mean error can be called the marginal error of the parameter, i.e., the maximum error we can make when estimating it.

Confidence interval for the arithmetic mean:

x̄ ± t(a; f)·m, where m = s/√n.

Here x̄ is the sample mean;

m = s/√n is the mean error of the arithmetic mean;

s is the sample standard deviation;

n is the sample size (group size);

t(a; f) is the critical value of Student's criterion for significance level a and f = n - 1 degrees of freedom (Student's coefficient).
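This formula can be checked numerically with a short sketch (the sample values and the t coefficient below are hypothetical, chosen only for illustration; in practice t would be taken from the table for the chosen significance level):

```python
import math

def mean_ci(xbar, s, n, t_crit):
    """Confidence interval for the arithmetic mean: xbar +/- t * m, m = s/sqrt(n)."""
    m = s / math.sqrt(n)               # mean error of the arithmetic mean
    return (xbar - t_crit * m, xbar + t_crit * m)

# Hypothetical sample: mean 50, SD 10, n = 25; t(0.05; 24) = 2.064 from the table
lo, hi = mean_ci(50.0, 10.0, 25, 2.064)
print(round(lo, 2), round(hi, 2))      # 45.87 54.13
```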

Confidence interval for the difference of arithmetic means:

(x̄1 - x̄2) ± t(a; f)·m_d, where m_d = √(m1² + m2²) = √(s1²/n1 + s2²/n2).

Here x̄1 - x̄2 is the difference between the sample means;

m_d is the mean error of the difference of arithmetic means;

s1, s2 are the sample standard deviations;

n1, n2 are the sample sizes (group sizes);

t(a; f) is the critical value of Student's criterion for significance level a and f = n1 + n2 - 2 degrees of freedom (Student's coefficient).

Confidence interval for proportions:

d ± z(a)·m, where m = √(d(1 - d)/n).

Here d is the sample proportion;

m = √(d(1 - d)/n) is the mean error of the proportion;

n is the sample size (group size);

z(a) is the critical value of the z criterion for significance level a.

Confidence interval for the difference of proportions:

(d1 - d2) ± z(a)·m_d, where m_d = √(d1(1 - d1)/n1 + d2(1 - d2)/n2).

Here d1 - d2 is the difference between the sample proportions;

m_d is the mean error of the difference of proportions;

n1, n2 are the sample sizes (group sizes);

z(a) is the critical value of the z criterion for significance level a (for example, z = 1.96 for a = 0.05).

By calculating confidence intervals for the difference in indicators we, first, see the range of possible values of the effect directly, and not just its point estimate; second, we can conclude whether the null hypothesis is to be accepted or rejected; and third, we can draw a conclusion about the power of the test.

When testing hypotheses using confidence intervals, the following rule should be followed:

If the 100(1-a)-percent confidence interval of the mean difference does not contain zero, then the differences are statistically significant at the a significance level; on the contrary, if this interval contains zero, then the differences are not statistically significant.

Indeed, if this interval contains zero, the compared indicator may be either greater or smaller in one of the groups than in the other, i.e., the observed differences are random.

The position of zero within the confidence interval tells us about the power of the test. If zero is close to the lower or upper limit of the interval, then perhaps with larger sizes of the compared groups the differences would reach statistical significance. If zero is close to the middle of the interval, then an increase and a decrease of the indicator in the experimental group are equally probable, and most likely there really are no differences.

Examples:

Surgical mortality was compared for two different types of anesthesia: 61 people were operated on with the first type of anesthesia and 8 died; 67 people were operated on with the second type and 10 died.

d1 = 8/61 = 0.131; d2 = 10/67 = 0.149; d1 - d2 = -0.018.

The difference in lethality between the compared methods lies in the range (-0.018 - 0.122; -0.018 + 0.122), i.e., (-0.14; 0.104), with probability 100(1 - a) = 95%. The interval contains zero, so the hypothesis of equal lethality under the two types of anesthesia cannot be rejected.

Thus, with probability 95% the difference in mortality ranges from a 14% decrease to a 10.4% increase. Zero lies approximately in the middle of the interval, so it can be argued that these two methods most likely really do not differ in lethality.
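The anesthesia example can be reproduced in a few lines of Python (a sketch; the computed interval differs from the text's (-0.14; 0.104) only in the last digit, because the text rounds the marginal error to 0.122):

```python
import math

d1, n1 = 8 / 61, 61    # lethality with the first type of anesthesia
d2, n2 = 10 / 67, 67   # lethality with the second type
diff = d1 - d2

# mean error of the difference of proportions
m = math.sqrt(d1 * (1 - d1) / n1 + d2 * (1 - d2) / n2)
z = 1.96               # critical value of z for a = 0.05
lo, hi = diff - z * m, diff + z * m
print(round(diff, 3), round(lo, 3), round(hi, 3))   # -0.018 -0.138 0.102
```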

In the example considered earlier, the average tapping time was compared in four groups of students differing in their examination scores. Let us calculate the confidence intervals of the average pressing time for students who passed the exam with grades of 2 and 5, and the confidence interval for the difference between these means.

Student's coefficients are found from the tables of the Student distribution (see Appendix): for the first group t(0.05; 48) = 2.011; for the second group t(0.05; 61) = 2.000. Thus, the confidence interval for the first group is (162.19 - 2.011 × 2.18; 162.19 + 2.011 × 2.18) = (157.8; 166.6), and for the second group (156.55 - 2.000 × 1.88; 156.55 + 2.000 × 1.88) = (152.8; 160.3). So, for those who passed the exam with a 2, the average pressing time ranges from 157.8 ms to 166.6 ms with probability 95%; for those who passed with a 5, from 152.8 ms to 160.3 ms with probability 95%.

You can also test the null hypothesis using confidence intervals for the means, and not just for the difference in the means. For example, as in our case, if the confidence intervals for the means overlap, then the null hypothesis cannot be rejected. In order to reject a hypothesis at a chosen significance level, the corresponding confidence intervals must not overlap.

Let us find the confidence interval for the difference in the average pressing time between the groups that passed the exam with a 2 and with a 5. The difference of the means: 162.19 - 156.55 = 5.64. Student's coefficient: t(0.05; 49 + 62 - 2) = t(0.05; 109) = 1.982. The mean errors of the group means are m1 = 2.18 and m2 = 1.88 (as used in the confidence intervals above), so the mean error of the difference of the means is m_d = √(2.18² + 1.88²) = 2.87. Confidence interval: (5.64 - 1.982 × 2.87; 5.64 + 1.982 × 2.87) = (-0.044; 11.33).

So, the difference in the average pressing time between the groups that passed the exam with a 2 and with a 5 lies in the range from -0.044 ms to 11.33 ms. This interval includes zero, i.e., the average pressing time of those who passed the exam with excellent marks may be either greater or smaller than that of those who passed unsatisfactorily, so the null hypothesis cannot be rejected. But zero is very close to the lower limit, and a shorter pressing time is much more likely for the excellent students. Thus, we can conclude that differences in the average pressing time between those who scored 2 and 5 probably do exist; we simply could not detect them given the observed difference in means, the spread of the values, and the sample sizes.
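The same calculation can be sketched in Python; the tiny discrepancies from the text's -0.044 and 11.33 come from the text rounding m_d to 2.87:

```python
import math

xbar1, m1 = 162.19, 2.18   # group with grade 2: mean and mean error
xbar2, m2 = 156.55, 1.88   # group with grade 5
diff = xbar1 - xbar2       # 5.64

m_d = math.sqrt(m1**2 + m2**2)   # mean error of the difference of means
t_crit = 1.982                   # t(0.05; 109) from the table
lo, hi = diff - t_crit * m_d, diff + t_crit * m_d
print(round(m_d, 2), round(lo, 2), round(hi, 2))   # 2.88 -0.07 11.35
```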

The power of a test is the probability of rejecting an incorrect null hypothesis, i.e., of finding differences where they really exist.

The power of the test is determined based on the level of significance, the magnitude of differences between groups, the spread of values ​​in groups, and the sample size.

For Student's t-test and analysis of variance, you can use sensitivity charts.

The power of a test can be used to determine in advance the required sizes of the groups.

The confidence interval shows within what limits the true value of the estimated parameter lies with a given probability.

With the help of confidence intervals, you can test statistical hypotheses and draw conclusions about the sensitivity of the criteria.

LITERATURE.

Glantz S. - Chapter 6.7.

Rebrova O.Yu. - p.112-114, p.171-173, p.234-238.

Sidorenko E. V. - pp. 32-33.

Questions for self-examination of students.

1. What is the power of the criterion?

2. In what cases is it necessary to evaluate the power of criteria?

3. Methods for calculating power.

6. How to test a statistical hypothesis using a confidence interval?

7. What can be said about the power of the criterion when calculating the confidence interval?

Tasks.

A confidence interval (CI) obtained in a study on a sample gives a measure of the precision (or uncertainty) of the study's results, allowing conclusions to be drawn about the population of all such patients (the general population). The correct definition of a 95% CI can be formulated as follows: 95% of such intervals will contain the true value in the population. A somewhat less accurate interpretation: the CI is the range of values within which you can be 95% sure that it contains the true value. When using a CI the emphasis is on determining the quantitative effect, as opposed to the P value obtained from a test of statistical significance. The P value does not estimate any quantity but rather serves as a measure of the strength of the evidence against the null hypothesis of "no effect". The P value by itself tells us nothing about the magnitude of a difference, or even about its direction. Therefore standalone P values are absolutely uninformative in articles or abstracts. In contrast, a CI indicates both the size of the effect of immediate interest, such as the benefit of a treatment, and the strength of the evidence. The CI is therefore directly related to the practice of evidence-based medicine.

The estimation approach to statistical analysis, illustrated by CIs, aims to measure the magnitude of the effect of interest (the sensitivity of a diagnostic test, the predicted incidence, the relative risk reduction with treatment, etc.) and to measure the uncertainty in that effect. Most often, the CI is the range of values on either side of the estimate within which the true value is likely to lie, and you can be 95% sure of it. The convention of using 95% probability is arbitrary, as is the P < 0.05 threshold for statistical significance, and authors sometimes use 90% or 99% CIs. Note that the word "interval" means a range of values and is therefore singular. The two values that bound the interval are called the "confidence limits".

The CI is based on the idea that the same study performed on different sets of patients would not produce identical results, but that the results would be distributed around the true but unknown value. In other words, the CI describes this "sample-to-sample variability". The CI does not reflect additional uncertainty due to other causes; in particular, it does not include the effects of selective loss of patients to follow-up, poor compliance, inaccurate outcome measurement, lack of blinding, etc. The CI therefore always underestimates the total amount of uncertainty.

Confidence Interval Calculation

Table A1.1. Standard errors and confidence intervals for some clinical measurements

Typically, a CI is calculated from an observed estimate of a quantitative measure, such as the difference (d) between two proportions, and the standard error (SE) of the estimate of that difference. The approximate 95% CI thus obtained is d ± 1.96 SE. The formula changes according to the nature of the outcome measure and the coverage of the CI. For example, in a randomized placebo-controlled trial of an acellular pertussis vaccine, whooping cough developed in 72 of 1670 (4.3%) infants who received the vaccine and in 240 of 1665 (14.4%) in the control group. The difference in percentages, known as the absolute risk reduction, is 10.1%. The SE of this difference is 0.99%. Accordingly, the 95% CI is 10.1% ± 1.96 × 0.99%, i.e., from 8.2% to 12.0%.
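The vaccine-trial calculation can be verified directly (a sketch; the last digits differ from the text's 8.2-12.0 only through intermediate rounding):

```python
import math

# Pertussis vaccine RCT: 72/1670 cases with vaccine, 240/1665 with placebo
p1, n1 = 72 / 1670, 1670
p2, n2 = 240 / 1665, 1665
arr = p2 - p1                        # absolute risk reduction

se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
lo, hi = arr - 1.96 * se, arr + 1.96 * se
print(round(100 * arr, 1), round(100 * se, 2))   # 10.1 0.99
print(round(100 * lo, 1), round(100 * hi, 1))    # 8.2 12.1
```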

Despite different philosophical approaches, CIs and tests for statistical significance are closely related mathematically.

Thus a "significant" P value, i.e., P < 0.05, corresponds to a 95% CI that excludes the effect size indicating no difference. For example, for a difference between two means or proportions this value is zero, while for a relative risk or an odds ratio it is one. Under some circumstances the two approaches may not be exactly equivalent. The prevailing view is that estimation with CIs is the preferred approach to summarizing the results of a study, but CIs and P values are complementary, and many articles use both ways of presenting results.

The uncertainty (inaccuracy) of the estimate, expressed in CI, is largely related to the square root of the sample size. Small samples provide less information than large samples, and CIs are correspondingly wider in smaller samples. For example, an article comparing the performance of three tests used to diagnose Helicobacter pylori infection reported a urea breath test sensitivity of 95.8% (95% CI 75-100). While the figure of 95.8% looks impressive, the small sample size of 24 adult H. pylori patients means that there is significant uncertainty in this estimate, as shown by the wide CI. Indeed, the lower limit of 75% is much lower than the 95.8% estimate. If the same sensitivity were observed in a sample of 240 people, then the 95% CI would be 92.5-98.0, giving more assurance that the test is highly sensitive.

In randomized controlled trials (RCTs), non-significant results (i.e., those with P > 0.05) are particularly susceptible to misinterpretation. The CI is particularly useful here, as it indicates how compatible the results are with a clinically useful true effect. For example, in an RCT comparing sutured with stapled anastomosis in the colon, wound infection developed in 10.9% and 13.5% of patients, respectively (P = 0.30). The 95% CI for this difference of 2.6% is (-2% to +8%). Even in this study, which included 652 patients, it remains plausible that there is a modest difference in the incidence of infection between the two procedures. The smaller the study, the greater the uncertainty. Sung et al. performed an RCT comparing octreotide infusion with emergency sclerotherapy for acute variceal bleeding in 100 patients. In the octreotide group the rate of arrest of bleeding was 84%; in the sclerotherapy group it was 90%, giving P = 0.56. Note that the rates of continued bleeding are similar to those of wound infection in the study just mentioned. In this case, however, the 95% CI for the difference between the interventions is 6% (-7% to +19%). This range is wide compared with the 5% difference that would be of clinical interest, so the study clearly does not rule out an important difference in efficacy. Therefore the authors' conclusion that "octreotide infusion and sclerotherapy are equally effective in the treatment of variceal bleeding" is definitely not valid. In cases like this, where the 95% CI for the absolute risk reduction (ARR) includes zero, the CI for the NNT (number needed to treat) is rather difficult to interpret. The NNT and its CI are obtained as the reciprocals of the ARR (multiplying by 100 if the values are given as percentages). Here we get NNT = 100/6 ≈ 16.7, with a 95% CI extending from -14.3 to 5.3. As footnote "d" in Table A1.1 shows, this CI includes NNT(benefit) values from 5.3 to infinity and NNT(harm) values from 14.3 to infinity.
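The NNT arithmetic from this example can be sketched as follows, using the trial's ARR of 6% with CI from -7% to +19%:

```python
# Octreotide vs sclerotherapy: ARR = 6% with 95% CI from -7% to +19%
arr, lo, hi = 6, -7, 19

nnt = 100 / arr                    # number needed to treat
print(round(nnt, 1))               # 16.7

# Reciprocals of the CI limits; because the ARR interval includes zero,
# the NNT interval splits into NNT(benefit) and NNT(harm) branches
nnt_benefit = 100 / hi             # NNT(benefit) lower limit
nnt_harm = 100 / abs(lo)           # NNT(harm) lower limit
print(round(nnt_benefit, 1), round(nnt_harm, 1))   # 5.3 14.3
```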

CIs can be constructed for most commonly used statistical estimates or comparisons. For RCTs these include differences between means or proportions, relative risks, odds ratios, and NNTs. Similarly, CIs can be obtained for all the main estimates made in studies of diagnostic test accuracy—sensitivity, specificity, positive predictive value (all of which are simple proportions) and likelihood ratios—and for estimates obtained in meta-analyses and case-control studies. A personal computer program covering many of these uses of CIs is supplied with the second edition of Statistics with Confidence. Macros for calculating CIs for proportions are freely available for Excel and for the statistical programs SPSS and Minitab at http://www.uwcm.ac.uk/study/medicine/epidemiology_statistics/research/statistics/proportions.htm.

Multiple evaluations of treatment effect

While the construction of CIs is desirable for the primary outcomes of a study, they are not required for all outcomes. The CI should concern clinically important comparisons. For example, when comparing two groups, the correct CI is the one constructed for the difference between the groups, as shown in the examples above, and not the CIs that can be constructed for the estimate in each group. Not only is it useless to give separate CIs for the estimates in each group; this presentation can be misleading. Similarly, the correct approach when comparing treatment efficacy in different subgroups is to compare the two (or more) subgroups directly. It is incorrect to conclude that treatment is effective in only one subgroup simply because its CI excludes the value corresponding to no effect while the others do not. CIs are also useful when comparing results across several subgroups. Fig. A1.1 shows the relative risk of eclampsia in women with preeclampsia, by subgroup, from a placebo-controlled RCT of magnesium sulfate.

Fig. A1.2. The forest plot shows the results of 11 randomized clinical trials of bovine rotavirus vaccine for the prevention of diarrhea versus placebo. The 95% confidence interval was used to estimate the relative risk of diarrhea. The size of each black square is proportional to the amount of information. A summary estimate of treatment efficacy and its 95% confidence interval (indicated by a diamond) are also shown. The meta-analysis used a random-effects model.

We have already discussed the fallacy of taking the absence of statistical significance as an indication that two treatments are equally effective. It is equally important not to equate statistical significance with clinical importance. Clinical importance can be assumed when a result is statistically significant and the magnitude of the treatment response exceeds some pre-specified value; for example, this could be the value used in calculating the sample size. Under a more stringent criterion, the entire range of the CI must show a benefit exceeding the predetermined minimum.

Studies can show which results are statistically significant, and which are clinically important and which are not. Fig. A1.2 shows the results of four trials for which the entire CI < 1, i.e., their results are statistically significant at P < 0.05. If we suppose that the clinically important difference would be a 20% reduction in the risk of diarrhea (RR = 0.8), then all of these trials showed a clinically important point estimate of risk reduction, and only in the Treanor study was the entire 95% CI below this value. Two other RCTs showed clinically important results that were not statistically significant. Note that in three trials the point estimates of treatment efficacy were almost identical, but the widths of the CIs differed (reflecting sample size). Thus, taken individually, the evidential strength of these RCTs differs.

In statistics there are two types of estimates: point and interval. A point estimate is a single sample statistic used to estimate a population parameter. For example, the sample mean is a point estimate of the population mean, and the sample variance S² is a point estimate of the population variance σ². It was shown that the sample mean is an unbiased estimator of the population expectation. The sample mean is called unbiased because the mean of all sample means (of the same sample size n) is equal to the mathematical expectation of the general population.

In order for the sample variance S² to be an unbiased estimator of the population variance σ², the denominator of the sample variance must be set to n - 1, not n. With this choice, the population variance equals the average of all possible sample variances.

When estimating population parameters, it should be kept in mind that sample statistics such as the sample mean depend on the specific sample drawn. To take this fact into account, an interval estimate of the mathematical expectation of the general population is obtained by analyzing the distribution of sample means. The constructed interval is characterized by a certain confidence level, which is the probability that the true parameter of the general population is estimated correctly. Similar confidence intervals can be used to estimate the proportion p of a feature and other parameters of the general population.


Construction of a confidence interval for the mathematical expectation of the general population with a known standard deviation

Building a confidence interval for the proportion of a trait in the general population

In this section the concept of a confidence interval is extended to categorical data. This allows us to estimate the proportion p of a feature in the general population from the sample proportion p_S = X/n. As mentioned, if the values n·p and n·(1 - p) both exceed 5, the binomial distribution can be approximated by the normal one. Therefore, to estimate the proportion of a feature in the general population, we can construct an interval whose confidence level equals (1 - α)×100%:

p_S ± Z·√(p_S(1 - p_S)/n),

where p_S is the sample proportion of the feature, equal to X/n (the number of successes divided by the sample size), p is the proportion of the feature in the general population, Z is the critical value of the standardized normal distribution, and n is the sample size.

Example 3. Suppose a sample of 100 invoices completed during the last month is drawn from an information system, and 10 of these invoices turn out to be incorrect. Then p_S = 10/100 = 0.1. The 95% confidence level corresponds to the critical value Z = 1.96, so the interval is 0.1 ± 1.96·√(0.1 × 0.9/100) = 0.1 ± 0.0588, i.e., from 0.0412 to 0.1588.

Thus, there is a 95% chance that between 4.12% and 15.88% of invoices contain errors.

For a given sample size, the confidence interval for the proportion of a feature in the population is wider than for a continuous random variable. This is because measurements of a continuous random variable contain more information than measurements of categorical data. In other words, categorical data taking only two values contain too little information to estimate the parameters of their distribution.

Calculating estimates drawn from a finite population

Estimating the mathematical expectation. The finite population correction (fpc), equal to √((N - n)/(N - 1)), is used to reduce the standard error by that factor. When calculating confidence intervals for estimates of population parameters, the correction factor is applied in situations where samples are drawn without replacement. Thus, the confidence interval for the mathematical expectation with confidence level (1 - α)×100% is calculated by the formula:

X̄ ± t·(S/√n)·√((N - n)/(N - 1))     (6)

Example 4. To illustrate the finite population correction, let us return to the problem of calculating the confidence interval for the average invoice amount discussed above. Suppose the company issues 5,000 invoices per month, and X̄ = $110.27, S = $28.95, N = 5000, n = 100, α = 0.05, t99 = 1.9842. By formula (6): 110.27 ± 1.9842 × (28.95/√100) × √(4900/4999) = 110.27 ± 5.69, i.e., from $104.58 to $115.96.

Estimating the proportion of a feature. When sampling without replacement, the confidence interval for the proportion of a feature with confidence level (1 - α)×100% is calculated by the formula:

p_S ± Z·√(p_S(1 - p_S)/n)·√((N - n)/(N - 1))
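Both finite-population formulas can be checked with the numbers from Example 4 and from the invoice-proportion example (a sketch):

```python
import math

# Example 4: mean invoice amount, sampling without replacement
xbar, s, N, n, t99 = 110.27, 28.95, 5000, 100, 1.9842

fpc = math.sqrt((N - n) / (N - 1))             # finite population correction
margin = t99 * (s / math.sqrt(n)) * fpc
print(round(xbar - margin, 2), round(xbar + margin, 2))   # 104.58 115.96

# The proportion of faulty invoices (10 out of 100) with the same correction
p = 10 / n
m_p = 1.96 * math.sqrt(p * (1 - p) / n) * fpc
print(round(p - m_p, 4), round(p + m_p, 4))    # 0.0418 0.1582
```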

Confidence intervals and ethical issues

When sampling a population and formulating statistical inferences, ethical problems often arise. The main one is how the confidence intervals and point estimates of sample statistics agree. Publishing point estimates without specifying the appropriate confidence intervals (usually at 95% confidence levels) and the sample size from which they are derived can be misleading. This may give the user the impression that a point estimate is exactly what he needs to predict the properties of the entire population. Thus, it is necessary to understand that in any research, not point, but interval estimates should be put at the forefront. In addition, special attention should be paid to the correct choice of sample sizes.

Most often, the objects of statistical manipulations are the results of sociological surveys of the population on various political issues. At the same time, the results of the survey are placed on the front pages of newspapers, and the sampling error and the methodology of statistical analysis are printed somewhere in the middle. To prove the validity of the obtained point estimates, it is necessary to indicate the sample size on the basis of which they were obtained, the boundaries of the confidence interval and its significance level.


This note uses materials from Levin et al., Statistics for Managers. Moscow: Williams, 2004, pp. 448-462.

Central limit theorem states that for a sufficiently large sample size, the sample distribution of means can be approximated by a normal distribution. This property does not depend on the type of population distribution.

CONFIDENCE INTERVALS FOR FREQUENCIES AND PROPORTIONS

© 2008

National Institute of Public Health, Oslo, Norway

The article describes and discusses the calculation of confidence intervals for frequencies and proportions by the Wald, Wilson, and Clopper-Pearson methods, by means of the angular transformation, and by the Wald method with the Agresti-Coull correction. The material presented provides general information about methods of calculating confidence intervals for frequencies and proportions; it is intended to arouse the interest of the journal's readers not only in using confidence intervals when presenting the results of their own research, but also in reading the specialized literature before starting work on future publications.

Keywords: confidence interval, frequency, proportion

In one of our previous publications, the description of qualitative data was briefly mentioned, and it was noted that an interval estimate is preferable to a point estimate for describing the frequency of occurrence of the studied characteristic in the general population. Indeed, since studies are conducted on sample data, the projection of the results onto the general population must contain an element of sampling inaccuracy. The confidence interval is a measure of the accuracy of the estimated parameter. Interestingly, some medical textbooks on basic statistics ignore the topic of confidence intervals for frequencies entirely. In this article we consider several ways of calculating confidence intervals for frequencies, assuming such sample characteristics as sampling without replacement and representativeness, as well as independence of the observations from one another. Frequency in this article is understood not as an absolute number showing how many times a given value occurs in the aggregate, but as a relative value determining the proportion of study participants who have the trait under study.

In biomedical research, 95% confidence intervals are most commonly used. This confidence interval is the region within which the true proportion falls 95% of the time. In other words, it can be said with 95% certainty that the true value of the frequency of occurrence of a trait in the general population will be within the 95% confidence interval.

Most statistical textbooks for medical researchers report that the frequency error is calculated using the formula

s = √(p(1 - p)/n),

where p is the frequency of occurrence of the feature in the sample (a value from 0 to 1). Most domestic scientific articles report the frequency of occurrence of a feature in the sample (p) together with its error (s) in the form p ± s. It is more expedient, however, to present a 95% confidence interval for the frequency of occurrence of the trait in the general population, which includes values from p - 1.96·√(p(1 - p)/n) to p + 1.96·√(p(1 - p)/n).

In some textbooks, it is recommended, for small samples, to replace the value of 1.96 with the value of t for N - 1 degrees of freedom, where N is the number of observations in the sample. The value of t is found in the tables for the t-distribution, which are available in almost all textbooks on statistics. The use of the distribution of t for the Wald method does not provide visible advantages over other methods discussed below, and therefore is not welcomed by some authors.

The above method of calculating confidence intervals for frequencies or proportions is named after Abraham Wald (1902-1950), since it came into wide use after the publication of Wald and Wolfowitz in 1939. However, the method itself was proposed by Pierre-Simon Laplace (1749-1827) as early as 1812.

The Wald method is very popular, but its application involves significant problems. The method is not recommended for small sample sizes, or in cases where the frequency of occurrence of the feature tends to 0 or 1 (0% or 100%); it is simply impossible for frequencies of exactly 0 or 1. In addition, the normal approximation used in calculating the error "does not work" when n·p < 5 or n·(1 - p) < 5. More conservative statisticians require n·p and n·(1 - p) to be at least 10. A more detailed examination of the Wald method has shown that the confidence intervals it produces are in most cases too narrow, i.e., their use erroneously paints too optimistic a picture, especially as the frequency moves away from 0.5 (50%). Moreover, as the frequency approaches 0 or 1, the confidence interval can take negative values or exceed 1, which is absurd for frequencies. Many authors quite rightly recommend against this method not only in the cases already mentioned but also whenever the frequency of the trait is below 25% or above 75%. Thus, despite the simplicity of the calculations, the Wald method can be applied only in a very limited number of cases. Foreign researchers are more categorical in their conclusions and unambiguously recommend not using this method for small samples, yet it is precisely such samples that medical researchers often have to deal with.
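A minimal sketch of the Wald interval illustrates both the normal case and the absurd negative bound for a small sample:

```python
import math

def wald_ci(x, n, z=1.96):
    """Wald confidence interval for a frequency (see the caveats above)."""
    p = x / n
    s = math.sqrt(p * (1 - p) / n)   # frequency error
    return (p - z * s, p + z * s)

lo, hi = wald_ci(450, 1000)
print(round(lo, 3), round(hi, 3))    # 0.419 0.481

# Frequency near zero: the lower limit goes negative, which is absurd
lo2, hi2 = wald_ci(1, 20)
print(round(lo2, 3), round(hi2, 3))  # -0.046 0.146
```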

Since the new variable φ = 2·arcsin(√p) is approximately normally distributed with a standard error of 1/√n, the lower and upper bounds of the 95% confidence interval for the variable φ are φ - 1.96/√n and φ + 1.96/√n; the bounds are then converted back to the frequency scale via p = sin²(φ/2).

For small samples, it is recommended to substitute for 1.96 the value of t for N - 1 degrees of freedom. This method does not give negative values and estimates the confidence intervals for frequencies more accurately than the Wald method does. In addition, it is described in many domestic reference books on medical statistics, which, however, has not led to its widespread use in medical research. Calculating confidence intervals via the angular transformation is not recommended for frequencies approaching 0 or 1.
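A sketch of the angular-transformation interval as just described (for brevity z = 1.96 is used here rather than the small-sample t value):

```python
import math

def arcsine_ci(x, n, z=1.96):
    """CI via the angular transform phi = 2*arcsin(sqrt(p)); SE(phi) ~ 1/sqrt(n)."""
    phi = 2 * math.asin(math.sqrt(x / n))
    lo_phi = max(phi - z / math.sqrt(n), 0.0)
    hi_phi = min(phi + z / math.sqrt(n), math.pi)
    # back-transform the bounds to the frequency scale
    return (math.sin(lo_phi / 2) ** 2, math.sin(hi_phi / 2) ** 2)

lo, hi = arcsine_ci(450, 1000)
print(round(lo, 3), round(hi, 3))
```

Note that, unlike the Wald interval, the back-transformed bounds cannot leave the range from 0 to 1.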

This is where the description of methods for estimating confidence intervals in most books on the basics of statistics for medical researchers usually ends, and this problem is typical not only for domestic, but also for foreign literature. Both methods are based on the central limit theorem, which implies a large sample.

Taking into account the shortcomings of the above methods of estimating confidence intervals, Clopper and Pearson proposed in 1934 a method of calculating the so-called exact confidence interval based on the binomial distribution of the studied trait. This method is available in many online calculators, but the confidence intervals obtained in this way are in most cases too wide. At the same time, the method is recommended when a conservative estimate is required. The degree of conservativeness increases as the sample size decreases, especially for N < 15. The literature describes the use of the binomial distribution function for analyzing qualitative data in MS Excel, including for determining confidence intervals; however, the calculation of the latter for frequencies is not "tabulated" in spreadsheets in a user-friendly form, and is therefore probably not used by most researchers.
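The exact interval can be sketched with nothing but the binomial distribution and bisection (assuming the standard Clopper-Pearson definition, with probability α/2 in each tail):

```python
import math

def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

def clopper_pearson(x, n, alpha=0.05):
    """Exact (Clopper-Pearson) interval found by bisection on the binomial CDF."""
    def solve(root_is_above):
        lo, hi = 0.0, 1.0
        for _ in range(60):              # bisection on [0, 1]
            mid = (lo + hi) / 2
            if root_is_above(mid):
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2
    # lower limit solves P(X >= x | p) = alpha/2; upper solves P(X <= x | p) = alpha/2
    lower = 0.0 if x == 0 else solve(lambda p: 1 - binom_cdf(x - 1, n, p) < alpha / 2)
    upper = 1.0 if x == n else solve(lambda p: binom_cdf(x, n, p) > alpha / 2)
    return lower, upper

lo, hi = clopper_pearson(1, 20)          # 1 case out of 20
print(round(lo, 4), round(hi, 4))
```

For 1 case out of 20 the interval runs from roughly 0.1% to about 25%, illustrating how wide (conservative) the exact method is on small samples.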

According to many statisticians, the best estimation of confidence intervals for frequencies is provided by the Wilson method, proposed back in 1927 but practically never used in domestic biomedical research. This method makes it possible to estimate confidence intervals both for very low and for very high frequencies, and it is also applicable to small numbers of observations. In general form, the confidence interval by the Wilson formula runs from

( p + z²/2N − z·√( p(1 − p)/N + z²/4N² ) ) / ( 1 + z²/N )

to

( p + z²/2N + z·√( p(1 − p)/N + z²/4N² ) ) / ( 1 + z²/N ),



where z takes the value 1.96 when calculating the 95% confidence interval, N is the number of observations, and p is the frequency of the feature in the sample. The method is available in online calculators, so its application poses no difficulty. Some authors do not recommend using it when n · p < 4 or n · (1 − p) < 4, because the approximation of the distribution of p by the normal law is too crude in that situation; however, foreign statisticians consider the Wilson method applicable even for small samples.
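The Wilson formula translates directly into code. The sketch below (identifiers are ours) evaluates both of the article's examples:

```python
import math

def wilson_ci(x, n, z=1.96):
    """Wilson score confidence interval for a binomial proportion."""
    p = x / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return center - half, center + half

print(wilson_ci(1, 20))      # stays within [0, 1] even for x = 1
print(wilson_ci(450, 1000))
```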

In addition to the Wilson method, the Wald method with the Agresti–Coull correction is also believed to provide an optimal estimate of the confidence interval for frequencies. The Agresti–Coull correction replaces the sample frequency p in the Wald formula with p′, calculated by adding 2 to the numerator and 4 to the denominator: p′ = (X + 2) / (N + 4), where X is the number of study participants who have the trait under study and N is the sample size. This modification produces results very similar to those of the Wilson formula, except when the event frequency approaches 0% or 100% and the sample is small. In addition to the above methods, continuity corrections have been proposed for both the Wald method and the Wilson method for small samples, but studies have shown that their use is not justified.
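A sketch of the corrected method (the function name is ours, and we assume, as in the usual formulation, that the enlarged sample size N + 4 is also used in the error term; the bounds are clamped to [0, 1]):

```python
import math

def agresti_coull_ci(x, n, z=1.96):
    """Wald formula applied to the Agresti-Coull adjusted proportion
    p' = (x + 2) / (n + 4), with the adjusted sample size n + 4."""
    p_adj = (x + 2) / (n + 4)
    n_adj = n + 4
    se = math.sqrt(p_adj * (1 - p_adj) / n_adj)
    lo = max(0.0, p_adj - z * se)   # clamp: a frequency cannot be negative
    hi = min(1.0, p_adj + z * se)
    return lo, hi

print(agresti_coull_ci(1, 20))  # the lower bound no longer goes negative
```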

Let us consider the application of the above methods using two examples. In the first case we study a large sample of 1,000 randomly selected study participants, of whom 450 have the trait under study (which may be a risk factor, an outcome, or any other trait), a frequency of 0.45, or 45%. In the second case the study uses a small sample of, say, only 20 people, and only 1 study participant (5%) has the trait. Confidence intervals by the Wald method, the Wald method with the Agresti–Coull correction, and the Wilson method were calculated using an online calculator developed by Jeff Sauro (http://www./wald.htm). Wilson confidence intervals with continuity correction were calculated using the calculator provided by VassarStats: Website for Statistical Computation (http://faculty.vassar.edu/lowry/prop1.html). Calculations using the Fisher angular transformation were performed "by hand", using the critical values of t for 19 and 999 degrees of freedom, respectively. The results of the calculations for both examples are presented in the table.

Confidence intervals calculated in six different ways for the two examples described in the text

Confidence interval calculation method | 95% CI for X=1, N=20 (P=0.0500, or 5%) | 95% CI for X=450, N=1000 (P=0.4500, or 45%)
Wald | −0.0455–0.2541 | –
Wald with Agresti–Coull correction | <0.0001–0.2541 | –
Wilson | – | –
Wilson with continuity correction | – | –
Clopper–Pearson "exact method" | – | –
Angular transformation | <0.0001–0.1967 | –

As can be seen from the table, for the second example the confidence interval calculated by the "generally accepted" Wald method extends into negative territory, which cannot be the case for frequencies. Unfortunately, such incidents are not uncommon in the Russian literature. The traditional way of presenting data as a frequency plus its error partially masks the problem: for example, a frequency of occurrence reported as 2.1 ± 1.4 is not as "jarring" as 2.1% (95% CI: −0.7; 4.9), although the two mean the same thing. The Wald method with the Agresti–Coull correction and the calculation via the angular transformation give a lower bound tending to zero. The Wilson method with continuity correction and the "exact method" give wider confidence intervals than the Wilson method. For the first example, all methods give approximately the same confidence intervals (differences appear only in the thousandths), which is not surprising, since the frequency of the event in that example does not differ greatly from 50% and the sample size is quite large.

For readers interested in this problem we can recommend the works of R. G. Newcombe and of Brown, Cai and Dasgupta, which weigh the pros and cons of using 7 and 10 different methods for calculating confidence intervals, respectively. Among domestic manuals we can recommend the book in which, in addition to a detailed description of the theory, the Wald and Wilson methods are presented, as well as a method for calculating confidence intervals that takes the binomial frequency distribution into account. In addition to the free online calculators (http://www./wald.htm and http://faculty.vassar.edu/lowry/prop1.html), confidence intervals for frequencies (and not only for frequencies!) can be calculated using the CIA (Confidence Intervals Analysis) program, which can be downloaded from http://www.medschool.soton.ac.uk/cia/.

The next article will look at univariate ways to compare qualitative data.

Bibliography

Banerzhi A. Medical statistics in plain language: an introductory course. Moscow: Practical Medicine, 2007. 287 p.
Medical statistics. Moscow: Medical Information Agency, 2007. 475 p.
Glants S. Medico-biological statistics. Moscow: Praktika, 1998.
Data types, distribution verification and descriptive statistics // Human Ecology. 2008. No. 1. P. 52–58.
Medical statistics: textbook. Rostov-on-Don: Phoenix, 2007. 160 p.
Applied medical statistics. St. Petersburg: Folio, 2003. 428 p.
Biometrics. Moscow: Higher School, 1990. 350 p.
Mathematical statistics in medicine. Moscow: Finance and Statistics, 2007. 798 p.
Mathematical statistics in clinical research. Moscow: GEOTAR-MED, 2001. 256 p.
Yunkerov V. I. Medico-statistical processing of medical research data. St. Petersburg: VMedA, 2002. 266 p.
Agresti A., Coull B. Approximate is better than "exact" for interval estimation of binomial proportions // American Statistician. 1998. N 52. P. 119–126.
Altman D., Machin D., Bryant T., Gardner M. J. Statistics with confidence. London: BMJ Books, 2000. 240 p.
Brown L. D., Cai T. T., Dasgupta A. Interval estimation for a binomial proportion // Statistical Science. 2001. N 2. P. 101–133.
Clopper C. J., Pearson E. S. The use of confidence or fiducial limits illustrated in the case of the binomial // Biometrika. 1934. N 26. P. 404–413.
Garcia-Perez M. A. On the confidence interval for the binomial parameter // Quality and Quantity. 2005. N 39. P. 467–481.
Motulsky H. Intuitive biostatistics. Oxford: Oxford University Press, 1995. 386 p.
Newcombe R. G. Two-sided confidence intervals for the single proportion: comparison of seven methods // Statistics in Medicine. 1998. N 17. P. 857–872.
Sauro J., Lewis J. R. Estimating completion rates from small samples using binomial confidence intervals: comparisons and recommendations // Proceedings of the Human Factors and Ergonomics Society Annual Meeting. Orlando, FL, 2005.
Wald A., Wolfowitz J. Confidence limits for continuous distribution functions // Annals of Mathematical Statistics. 1939. N 10. P. 105–118.
Wilson E. B. Probable inference, the law of succession, and statistical inference // Journal of the American Statistical Association. 1927. N 22. P. 209–212.

CONFIDENCE INTERVALS FOR PROPORTIONS

A. M. Grjibovski

National Institute of Public Health, Oslo, Norway

The article presents several methods for calculating confidence intervals for binomial proportions, namely the Wald, Wilson, arcsine, Agresti–Coull and exact Clopper–Pearson methods. The paper gives only a general introduction to the problem of confidence interval estimation for a binomial proportion, and its aim is not only to stimulate readers to use confidence intervals when presenting the results of their own empirical research, but also to encourage them to consult statistics books before analyzing their own data and preparing manuscripts.

key words: confidence interval, proportion

Contact Information:

Senior Advisor, National Institute of Public Health, Oslo, Norway

For the vast majority of simple measurements, the so-called normal law of random errors (the Gauss law) is satisfied quite well. It is derived from the following empirical propositions:

1) measurement errors can take a continuous series of values;

2) with a large number of measurements, errors of equal magnitude but opposite sign occur equally often;

3) the larger the random error, the lower the probability of its occurrence.

The graph of the normal (Gaussian) distribution is shown in Fig. 1. The curve is described by the equation

y(Δ) = (1 / (σ√(2π))) · exp(−Δ² / (2σ²)),

where y(Δ) is the distribution function of the random errors Δ, characterizing the probability of an error of a given magnitude, and σ is the root-mean-square error.

The value σ is not a random variable; it characterizes the measurement process. If the measurement conditions do not change, σ remains constant. The square of this quantity is called the dispersion (variance) of the measurements. The smaller the dispersion, the smaller the spread of individual values and the higher the measurement accuracy.

The exact value of the root-mean-square error σ, like the true value of the measured quantity itself, is unknown. Instead, one uses its statistical estimate, the root-mean-square error of the arithmetic mean, determined by the formula

S_x̄ = √( Σ (x_i − x̄)² / ( n(n − 1) ) ),

where x_i is the result of the i-th measurement, x̄ is the arithmetic mean of the obtained values, and n is the number of measurements.

The larger the number of measurements, the smaller S_x̄ becomes and the more closely it approaches the true error of the mean. If μ is the true value of the measured quantity, x̄ is the arithmetic mean obtained from the measurements, and Δx is the random absolute error, then the measurement result is written as x = x̄ ± Δx.
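The estimate above can be sketched in a few lines (the sample readings are invented purely for illustration):

```python
import math

def mean_and_error(values):
    """Arithmetic mean and the root-mean-square error of the mean:
    S = sqrt( sum((x_i - mean)^2) / (n*(n-1)) )."""
    n = len(values)
    mean = sum(values) / n
    s_mean = math.sqrt(sum((v - mean) ** 2 for v in values) / (n * (n - 1)))
    return mean, s_mean

m, s = mean_and_error([9.8, 10.1, 10.0, 9.9, 10.2])
print(m, s)
```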

The interval of values from x̄ − Δx to x̄ + Δx, into which the true value μ of the measured quantity falls, is called the confidence interval. Since Δx is a random variable, the true value falls into this interval with a probability α, which is called the confidence probability, or reliability, of the measurements. This value is numerically equal to the area of the shaded curvilinear trapezoid (see figure).

All this holds for a sufficiently large number of measurements, when S_x̄ is close to σ. To find the confidence interval and confidence probability for the small numbers of measurements that we deal with in laboratory work, the Student probability distribution is used. It is the probability distribution of a random variable known as the Student coefficient t(α, n), which gives the half-width of the confidence interval in fractions of the root-mean-square error of the arithmetic mean: Δx = t(α, n) · S_x̄.


The probability distribution of this quantity does not depend on σ², but depends essentially on the number of experiments n. As the number of experiments n increases, the Student distribution tends to the Gaussian distribution.

The distribution function is tabulated (Table 1). The value of the Student coefficient is found at the intersection of the row corresponding to the number of measurements n and the column corresponding to the confidence probability α.
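The table lookup described above can be mirrored in code. The sketch below is our own illustration: the dictionary holds a fragment of the standard two-sided Student coefficients for α = 0.95, indexed, as in the text, by the number of measurements n (i.e. n − 1 degrees of freedom); the sample readings are invented:

```python
import math

# Fragment of the two-sided Student coefficient table for confidence
# probability 0.95, keyed by the number of measurements n (df = n - 1).
T_95 = {2: 12.706, 3: 4.303, 4: 3.182, 5: 2.776, 6: 2.571,
        7: 2.447, 8: 2.365, 9: 2.306, 10: 2.262}

def confidence_interval(values, t_table=T_95):
    """Mean and confidence-interval half-width: dx = t(alpha, n) * S_mean."""
    n = len(values)
    mean = sum(values) / n
    s_mean = math.sqrt(sum((v - mean) ** 2 for v in values) / (n * (n - 1)))
    t = t_table[n]   # row: number of measurements n; column: alpha = 0.95
    return mean, t * s_mean

m, dx = confidence_interval([9.8, 10.1, 10.0, 9.9, 10.2])
print(f"{m} +/- {dx}")
```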