Modified Regression-Cum-Dual Mean Imputation Schemes for Estimating Population Mean Under Two-Phase Simple Random Sampling

By

B. B. Bayedo¹, K. E. Lasisi², A. Ahmed³ & A. A. Issa⁴

¹,²,³,⁴Department of Statistics, Abubakar Tafawa Balewa University, Bauchi, Nigeria.

Corresponding Author email: bbb.edu79@gmail.com.

ABSTRACT

In sample surveys, researchers commonly incorporate auxiliary information alongside the study variable at the estimation stage in order to increase the precision of the estimated population mean. This research proposes modified regression-cum-dual mean imputation schemes for estimating the population mean under two-phase simple random sampling. In the simulation study, the most efficient estimator in every setting considered was one of the modified estimators, for both Case 1 and Case 2.

Keywords: Auxiliary variable, Population Mean, Mean Squared Errors (MSE) and Bias

1.0 Introduction

Sampling theory is the field of statistics concerned with the collection, analysis and interpretation of data gathered by sampling the population under study. The application of sampling theory is concerned not only with the proper selection of observations from the population but also with the use of information on an auxiliary variable. Many prominent authors have developed modified and improved ratio, regression and exponential-type estimators that use population information on the auxiliary variable x. However, the population mean of the auxiliary variable is not always available. In such situations, the most popular scheme is two-phase sampling, first established by Neyman (1938). It is customarily adopted when collecting information on the study variable is very costly but collecting information on auxiliary variables correlated with the study variable is relatively cheap. For these reasons, two-phase sampling is a powerful and cost-effective scheme for obtaining, from the first-phase sample, a reliable estimate of the unknown parameters of the auxiliary variable.

Authors such as Kumar and Bahl (2006), Singh and Vishwakarma (2007), Singh (2011), Ozgul and Cingi (2014), Kalita et al. (2016), Noor-Al-Amin et al. (2016), Bazad and Bazad (2019), Bhushan and Gupta (2019), Adamu et al. (2019) and Bhushan et al. (2023) have worked extensively under two-phase sampling. However, the applicability of the aforementioned estimators depends on the complete availability of sample information on both the study and auxiliary variables. A reduction in the amount of sample information due to non-response decreases the efficiency of these estimators. In the literature, Sukhatme (1962), Choudhury and Singh (2015), Audu and Adewara (2017a,b) and Adamu et al. (2019) investigated the classical ratio estimator in two-phase sampling.
Following Srivenkataramana (1980), Kumar and Bahl (2006) envisaged a class of dual to exponential-type ratio estimators using two phases. Singh and Vishwakarma (2007) investigated Bahl and Tuteja's (1991) exponential ratio and product estimators of the population mean under two-phase sampling. Singh (2011) and Ozgul and Cingi (2014) developed classes of exponential regression-cum-ratio estimators in two-phase sampling. Kalita et al. (2016) suggested exponential ratio-cum-exponential dual-to-ratio estimators using two-phase sampling. Following Kumar and Bahl (2006) and Kalita et al. (2016), Bazad and Bazad (2019) developed some classes of dual-to-ratio exponential-type estimators. Bhushan and Gupta (2019) provided some log-type estimators of the population mean using two-phase sampling. Zaman and Kadilar (2021a) introduced a new class of exponential estimators for the finite population mean in two-phase sampling, whereas Zaman and Kadilar (2021b) examined exponential ratio and product estimators of the population mean under two-phase sampling. Bhushan et al. (2021a) suggested some efficient classes of estimators, and Bhushan et al. (2021b) developed some efficient classes of estimators under two-phase sampling. Recently, Bhushan and Kumar (2023) suggested a new efficient class of estimators of the population mean using two-phase sampling.

2.0 LITERATURE REVIEW

2.1 Some Existing Estimators under SRSWOR

The contributions of Cochran (1942), Deming (1956), Hurvitz (1952) and others helped to lay the foundation of modern sampling theory. The work done during this period made important contributions by suggesting methods of using auxiliary information in the estimation of the population mean in order to increase the precision of the estimates.

To discuss some of the developed estimators in literature based on auxiliary variables, the following descriptions about the population and sample units were considered.

Let $U = (U_1, U_2, \dots, U_N)$ be a population of size $N$ and let $(y, x)$ be two real-valued functions taking values $(y_i, x_i)$ on the $i$-th unit of $U$. Let $\bar{Y}$ and $\bar{X}$ be the population means of $y$ and $x$ respectively, with $C_y$ and $C_x$ as the coefficients of variation of $y$ and $x$.

Let a simple random sample of size $n$ be drawn without replacement from the population, and let $\bar{y}$ and $\bar{x}$ be the sample means of $y$ and $x$ based on the sample drawn.

The usual sample mean, the traditional ratio estimator (Cochran, 1942), the product estimator (Murthy, 1964) and the dual to ratio estimator (Srivenkataramana, 1980), with their respective biases and variances/mean square errors, are defined as

                                                                                                                 (2.1)

                                                                                                  (2.2)

                                                                                                                    (2.3)

                                                                           (2.4)

                                                               (2.5)

                                                                                                                    (2.6)

                                                               (2.7)

                                                              (2.8)

                                                                                                       (2.9)

                                                 (2.10)

                                    (2.11)

Where , , , , , ,.
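For reference, the four estimators named above are usually written in the sampling literature as in the following LaTeX sketch, with biases and mean square errors given to the first order of approximation. The symbols $\theta$, $g$, $\rho$, $\bar{x}^{*}$ and the subscripts $R$, $P$, $DR$ are notation assumed here for illustration and may differ from the notation of the cited works.

```latex
% Sample mean
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i, \qquad
Var(\bar{y}) = \theta\,\bar{Y}^{2} C_y^{2}

% Ratio estimator (Cochran, 1942)
\bar{y}_{R} = \bar{y}\,\frac{\bar{X}}{\bar{x}}, \qquad
Bias(\bar{y}_{R}) \approx \theta\,\bar{Y}\left(C_x^{2} - \rho C_y C_x\right), \qquad
MSE(\bar{y}_{R}) \approx \theta\,\bar{Y}^{2}\left(C_y^{2} + C_x^{2} - 2\rho C_y C_x\right)

% Product estimator (Murthy, 1964)
\bar{y}_{P} = \bar{y}\,\frac{\bar{x}}{\bar{X}}, \qquad
Bias(\bar{y}_{P}) = \theta\,\bar{Y}\rho C_y C_x, \qquad
MSE(\bar{y}_{P}) \approx \theta\,\bar{Y}^{2}\left(C_y^{2} + C_x^{2} + 2\rho C_y C_x\right)

% Dual to ratio estimator (Srivenkataramana, 1980)
\bar{y}_{DR} = \bar{y}\,\frac{\bar{x}^{*}}{\bar{X}}, \qquad
Bias(\bar{y}_{DR}) = -\theta\,\bar{Y} g \rho C_y C_x, \qquad
MSE(\bar{y}_{DR}) \approx \theta\,\bar{Y}^{2}\left(C_y^{2} + g^{2} C_x^{2} - 2 g \rho C_y C_x\right)

% Assumed notation
\theta = \frac{1}{n} - \frac{1}{N}, \quad
g = \frac{n}{N-n}, \quad
\bar{x}^{*} = \frac{N\bar{X} - n\bar{x}}{N-n}, \quad
C_y = \frac{S_y}{\bar{Y}}, \quad
C_x = \frac{S_x}{\bar{X}}, \quad
\rho = \frac{S_{yx}}{S_y S_x}
```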

The traditional ratio estimator and the dual to ratio estimator are more efficient when the correlation between the study and auxiliary variables is strong and positive, while the product estimator is more efficient when that correlation is strong and negative.
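The effect of the sign of the correlation can be checked numerically. The following is a minimal Python sketch that evaluates the standard first-order MSE expressions of the four estimators; the function names and the population values (`Ybar`, `Cy`, `Cx`, `rho`) are hypothetical choices made for illustration and are not taken from the study.

```python
def theta(n, N):
    """Finite-population factor (1/n - 1/N) under SRSWOR."""
    return 1.0 / n - 1.0 / N


def var_sample_mean(Ybar, Cy, n, N):
    """Variance of the usual sample mean."""
    return theta(n, N) * Ybar**2 * Cy**2


def mse_ratio(Ybar, Cy, Cx, rho, n, N):
    """First-order MSE of the ratio estimator."""
    return theta(n, N) * Ybar**2 * (Cy**2 + Cx**2 - 2.0 * rho * Cy * Cx)


def mse_product(Ybar, Cy, Cx, rho, n, N):
    """First-order MSE of the product estimator."""
    return theta(n, N) * Ybar**2 * (Cy**2 + Cx**2 + 2.0 * rho * Cy * Cx)


def mse_dual_ratio(Ybar, Cy, Cx, rho, n, N):
    """First-order MSE of the dual to ratio estimator, with g = n / (N - n)."""
    g = n / (N - n)
    return theta(n, N) * Ybar**2 * (Cy**2 + g**2 * Cx**2 - 2.0 * g * rho * Cy * Cx)


# Hypothetical population values, chosen only to illustrate the comparison.
Ybar, Cy, Cx, n, N = 50.0, 0.3, 0.3, 300, 2000

for rho in (0.9, -0.9):
    print(
        f"rho={rho:+.1f}",
        f"mean={var_sample_mean(Ybar, Cy, n, N):.4f}",
        f"ratio={mse_ratio(Ybar, Cy, Cx, rho, n, N):.4f}",
        f"product={mse_product(Ybar, Cy, Cx, rho, n, N):.4f}",
        f"dual={mse_dual_ratio(Ybar, Cy, Cx, rho, n, N):.4f}",
    )
```

With these assumed values, the ratio and dual to ratio MSEs fall below the variance of the sample mean when the correlation is +0.9, and the product MSE falls below it when the correlation is -0.9.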

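For estimating the population mean of the study variate y, Singh and Espejo (2003) considered an estimator formed as a linear combination of the ratio and product estimators, given by

$\bar{y}_{RP} = \bar{y}\left[k\,\dfrac{\bar{X}}{\bar{x}} + (1-k)\,\dfrac{\bar{x}}{\bar{X}}\right]$                    (2.12)

and showed that the estimator attains optimality when $k = (1 + C)/2$ with $C = \rho C_y / C_x$, where $C_y$ is the coefficient of variation of y and $C_x$ is the coefficient of variation of x.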

Singh and Espejo (2007) noted that when the population mean of x is not known, a first-phase sample of size $n'$ should be drawn from the population, on which only the x-characteristic is measured, in order to furnish a good estimate of $\bar{X}$. A second-phase sample of size n is then drawn, on which both the variables y and x are measured. Letting $\bar{x}'$ denote the sample mean of x based on the first-phase sample of size $n'$, Singh and Espejo (2007) considered a ratio-product type estimator in two-phase sampling given by

$\bar{y}_{RPd} = \bar{y}\left[k\,\dfrac{\bar{x}'}{\bar{x}} + (1-k)\,\dfrac{\bar{x}}{\bar{x}'}\right]$                    (2.13)

Choudhury and Singh (2012) extended the work of Singh and Espejo (2007) and proposed a class of chain ratio-product type estimators for estimating the population mean using two auxiliary characters under the two conditions considered in Singh and Espejo (2007). The estimator, its bias and its MSE are respectively given as

                                                                                 (2.14)
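The two-phase design described for the Singh and Espejo (2007) estimator (a first-phase sample on which only x is observed, followed by a second-phase subsample on which both y and x are observed) can be sketched in Python as below. This is an illustration under stated assumptions rather than the authors' implementation: the population is hypothetical, the function names are invented for the example, and the ratio-product form with weight `k` is the commonly quoted one, which reduces to the classical two-phase ratio estimator at `k = 1`.

```python
import random


def two_phase_sample(N, n1, n, rng):
    """Draw a first-phase SRSWOR of size n1, then a second-phase SRSWOR of
    size n from the first-phase units. Returns two lists of unit indices."""
    first = rng.sample(range(N), n1)
    second = rng.sample(first, n)
    return first, second


def ratio_product_two_phase(ybar, xbar, xbar1, k):
    """Ratio-product type estimator in two-phase sampling (assumed form):
    ybar * [k * xbar1 / xbar + (1 - k) * xbar / xbar1]."""
    return ybar * (k * xbar1 / xbar + (1.0 - k) * xbar / xbar1)


rng = random.Random(2024)

# Hypothetical population in which y is exactly proportional to x.
N, n1, n = 2000, 800, 300
x = [rng.uniform(10.0, 100.0) for _ in range(N)]
y = [2.0 * xi for xi in x]

first, second = two_phase_sample(N, n1, n, rng)
xbar1 = sum(x[i] for i in first) / n1   # first phase: only x is observed
xbar = sum(x[i] for i in second) / n    # second phase: x and y are observed
ybar = sum(y[i] for i in second) / n

estimate = ratio_product_two_phase(ybar, xbar, xbar1, k=1.0)
print(f"two-phase ratio estimate: {estimate:.4f}")
print(f"population mean of y:     {sum(y) / N:.4f}")
```

Because y is exactly proportional to x in this toy population, the `k = 1` estimate equals twice the first-phase mean of x, which shows how the cheaper first-phase information stands in for the unknown population mean of x.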

Subramani and Prabavathy (2014) proposed two estimators of the population mean based on the median of the study variable, using auxiliary information, as:

                                                                                           (2.35)

           (2.36) where and are the median of the study variable, the median of the auxiliary variable and the sample median of the study variable, respectively.

The MSEs of the proposed estimators are given below:

                      (2.37) where and .

Singh (2015) modified an improved class of ratio type estimator for finite population mean using unknown weight and power transformation strategies. The particular cases of this estimator are Walsh (1970) estimator at and , Ray et al., (1979) estimator at and , Srivenkataramana and Tracy (1979) at and , Srivenkataramana and Tracy (1979) at and and Srivenkataramana (1980) at and . The modified estimator and its properties are given below;

                                                                    (2.38)

                   (2.39)

                                  (2.40)

In that study, the proposed estimator attains optimality when . The empirical study in that work showed that the proposed estimator is more efficient than the estimators of Walsh (1970), Ray et al. (1979), Srivenkataramana and Tracy (1979) and Srivenkataramana (1980).

3.0 MATERIALS AND METHODS

3.1 Robust Outlier-Resistant Measures Used in the Study

The study uses robust, outlier-resistant parameter estimation techniques to minimize or eliminate the influence of extreme values in the sample data. These methods include:

i. Gini’s Mean Difference method

ii. Downton’s Method

iii. The Method of Probability-Weighted Moments

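3.1.1 Gini’s Mean Difference Method

Let $D_i = x_{(i+1)} - x_{(i)}$, $i = 1, 2, \dots, n-1$, where $x_{(i)}$ denotes the $i$-th order statistic, so that $D_i$ is the distance between adjacent ordered observations. Then the Gini’s mean difference estimator proposed by Nair (1936) is given as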

                                                                                             (3.1)

3.1.2 Downton’s Method

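Let $x_1, x_2, \dots, x_n$ be a random sample from a normal distribution with mean $\mu$ and variance $\sigma^2$; that is, $x_i \sim N(\mu, \sigma^2)$. Let $x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}$ denote the corresponding order statistics. The Downton estimator, proposed by Downton (1966), is given as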

                                                                                                         (3.2)

Where for a normal distribution and does not depend on the sample size n.

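3.1.3 Method of Probability-Weighted Moments

Let $x_1, x_2, \dots, x_n$ be a random sample from a normal distribution with mean $\mu$ and variance $\sigma^2$; that is, $x_i \sim N(\mu, \sigma^2)$. Let $x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}$ denote the corresponding order statistics. The probability-weighted moments (PWM) estimator, defined by Greenwood et al. (1979), is given as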

                                                                                                (3.3)
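The three robust measures of Section 3.1 can be computed from the ordered sample. The Python sketch below uses the forms in which these measures are commonly stated in the literature, namely $G = \frac{2}{n(n-1)}\sum (2i-n-1)x_{(i)}$, $D = \frac{2\sqrt{\pi}}{n(n-1)}\sum \left(i - \frac{n+1}{2}\right)x_{(i)}$ and $S_{pw} = \frac{\sqrt{\pi}}{n^{2}}\sum (2i-n-1)x_{(i)}$. It is an assumption that these coincide with the expressions intended in (3.1) to (3.3), and the function names are hypothetical.

```python
import math


def _weighted_order_sum(x):
    """Return the sum over i of (2i - n - 1) * x_(i) for the ordered sample."""
    xs = sorted(x)
    n = len(xs)
    return sum((2 * i - n - 1) * v for i, v in enumerate(xs, start=1))


def gini_mean_difference(x):
    """Gini's mean difference G (assumed standard form)."""
    n = len(x)
    return 2.0 * _weighted_order_sum(x) / (n * (n - 1))


def downton(x):
    """Downton's measure D (assumed standard form); equals sqrt(pi)/2 * G."""
    n = len(x)
    return math.sqrt(math.pi) * _weighted_order_sum(x) / (n * (n - 1))


def pwm_scale(x):
    """Probability-weighted-moments measure S_pw (assumed standard form)."""
    n = len(x)
    return math.sqrt(math.pi) * _weighted_order_sum(x) / n**2


# Small hypothetical sample; the order of the observations does not matter.
sample = [3.0, 1.0, 2.0]
print("G    =", gini_mean_difference(sample))
print("D    =", downton(sample))
print("S_pw =", pwm_scale(sample))
```

All three measures are linear functions of the order statistics, so a single helper for the weighted sum is shared between them.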

4.0 Data Analysis and Discussion of Results

For empirical validation, four simulated population datasets with varying sampling conditions were generated to assess the performance of the proposed estimators. Their efficiency was evaluated against that of existing estimators using Bias, Mean Square Error (MSE) and Percentage Relative Efficiency (PRE). Bias measures the closeness of an estimator to the true population mean, MSE measures its overall precision, and PRE expresses its MSE relative to that of the sample mean, so that a PRE above 100 indicates a gain in efficiency over the sample mean.
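The three criteria can be estimated by Monte Carlo replication. The following Python sketch is a simplified illustration, not the simulation design of the study: it assumes a hypothetical normal population with a positive linear relationship between y and x, ignores non-response, and compares only the sample mean with the classical two-phase ratio estimator. The sizes N = 2000, n' = 800 and n = 300 mirror those of Table 1.

```python
import random


def monte_carlo(N=2000, n1=800, n=300, reps=1000, seed=42):
    """Estimate Bias, MSE and PRE of two estimators of the population mean."""
    rng = random.Random(seed)

    # Hypothetical population: y is positively related to x.
    x = [rng.gauss(50.0, 10.0) for _ in range(N)]
    y = [2.0 * xi + rng.gauss(0.0, 5.0) for xi in x]
    Ybar = sum(y) / N

    estimates = {"sample_mean": [], "two_phase_ratio": []}
    for _ in range(reps):
        first = rng.sample(range(N), n1)   # first phase: x only
        second = rng.sample(first, n)      # second phase: y and x
        xbar1 = sum(x[i] for i in first) / n1
        xbar = sum(x[i] for i in second) / n
        ybar = sum(y[i] for i in second) / n
        estimates["sample_mean"].append(ybar)
        estimates["two_phase_ratio"].append(ybar * xbar1 / xbar)

    summary = {}
    for name, values in estimates.items():
        bias = sum(values) / reps - Ybar
        mse = sum((v - Ybar) ** 2 for v in values) / reps
        summary[name] = {"bias": bias, "mse": mse}

    # PRE is taken relative to the sample mean, as in the tables.
    reference = summary["sample_mean"]["mse"]
    for name in summary:
        summary[name]["pre"] = 100.0 * reference / summary[name]["mse"]
    return summary


if __name__ == "__main__":
    for name, stats in monte_carlo().items():
        print(name, {key: round(value, 5) for key, value in stats.items()})
```

By construction the sample mean has a PRE of 100, and any estimator with a smaller simulated MSE has a PRE above 100.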

4.1 Simulation Study for Populations with a Positive Relationship

Table 1: Biases, MSEs and PREs of the existing estimators and the estimators of proposed Schemes 1, 2 and 3 using N = 2000, n' = 800, n = 300, r = 200 (Cases 1 and 2)

| Estimator | Bias (Case 1) | MSE (Case 1) | PRE (Case 1) | Bias (Case 2) | MSE (Case 2) | PRE (Case 2) |
|---|---|---|---|---|---|---|
| Sample mean T0 | 0.00974 | 0.029 | 100.00 | 0.00016 | 0.0298 | 100.00 |
| Lee et al. (1994) T1 | -0.00107 | 0.0102 | 284.29 | -0.00471 | 0.01019 | 292.35 |
| Singh & Horn (2000) T2 | -0.00265 | 0.00994 | 291.84 | -0.00543 | 0.01088 | 273.81 |
| Kadilar & Cingi (2008) T3 | 0.01267 | 0.02634 | 110.10 | 0.02825 | 0.04539 | 65.65 |
| Singh (2009) T4 | 0.00133 | 0.01002 | 289.56 | 0.00058 | 0.01134 | 262.83 |
| Audu et al. (2021b) T5 | -0.02768 | 0.01163 | 249.43 | -0.04152 | 0.01544 | 192.99 |
| Musa et al. (2023) | | | | | | |
| T6 (a=Cx, b=Sx) | -1.08465 | 1.25058 | 2.32 | -1.09581 | 1.26833 | 2.35 |
| T6 (a=Cx, b=Skw) | -1.18521 | 1.47997 | 1.96 | -1.19667 | 1.50113 | 1.98 |
| T6 (a=Cx, b=Kurt) | -1.20351 | 1.52412 | 1.90 | -1.2151 | 1.5463 | 1.93 |
| T6 (a=Sx, b=Cx) | -1.2069 | 1.5324 | 1.89 | -1.21852 | 1.55478 | 1.92 |
| T6 (a=Sx, b=Skw) | -1.20694 | 1.53248 | 1.89 | -1.21855 | 1.55486 | 1.92 |
| T6 (a=Sx, b=Kurt) | -1.20916 | 1.53792 | 1.89 | -1.22079 | 1.56043 | 1.91 |
| T6 (a=Skw, b=Sx) | -1.08394 | 1.24905 | 2.32 | -1.09511 | 1.26679 | 2.35 |
| T6 (a=Skw, b=Cx) | -1.1847 | 1.47874 | 1.96 | -1.19615 | 1.49987 | 1.99 |
| T6 (a=Skw, b=Kurt) | -1.20344 | 1.52395 | 1.90 | -1.21502 | 1.54612 | 1.93 |
| T6 (a=Kurt, b=Cx) | -1.13035 | 1.35207 | 2.14 | -1.14157 | 1.37098 | 2.17 |
| T6 (a=Kurt, b=Sx) | -1.00982 | 1.09422 | 2.65 | -1.0211 | 1.11102 | 2.68 |
| T6 (a=Kurt, b=Skw) | -1.13097 | 1.35346 | 2.14 | -1.14219 | 1.3724 | 2.17 |
| Estimators of Proposed Scheme 1 | | | | | | |
| Tp11 (G,D) | -0.00293 | 0.00959 | 302.32 | -0.00521 | 0.00993 | 300.00 |
| Tp12 (G,S) | -0.00311 | 0.00965 | 300.65 | -0.005 | 0.01024 | 290.96 |
| Tp13 (D,G) | -0.00323 | 0.00972 | 298.45 | -0.00479 | 0.01053 | 282.88 |
| Tp14 (D,S) | -0.00328 | 0.00976 | 297.27 | -0.00468 | 0.01067 | 279.24 |
| Tp15 (S,G) | -0.00303 | 0.00962 | 301.60 | -0.0051 | 0.01009 | 295.44 |
| Tp16 (S,D) | -0.0029 | 0.00959 | 302.48 | -0.00524 | 0.00989 | 301.32 |
| Estimators of Proposed Scheme 2 | | | | | | |
| Tp21 (G,D) | 0.01144 | 0.03097 | 93.63 | 0.00188 | 0.03217 | 92.63 |
| Tp22 (G,S) | 0.01264 | 0.0324 | 89.52 | 0.00312 | 0.0339 | 87.90 |
| Tp23 (D,G) | 0.01363 | 0.0336 | 86.30 | 0.00416 | 0.03538 | 84.23 |
| Tp24 (D,S) | 0.01407 | 0.03413 | 84.96 | 0.00461 | 0.03603 | 82.69 |
| Tp25 (S,G) | 0.01206 | 0.0317 | 91.47 | 0.00252 | 0.03305 | 90.15 |
| Tp26 (S,D) | 0.01126 | 0.03075 | 94.29 | 0.0017 | 0.0319 | 93.39 |
| Estimators of Proposed Scheme 3 | | | | | | |
| Tp31 (G,D) | -0.00219 | 0.00965 | 300.50 | -0.00558 | 0.00934 | 319.03 |
| Tp32 (G,S) | -0.0019 | 0.00974 | 297.64 | -0.00564 | 0.00924 | 322.49 |
| Tp33 (D,G) | -0.00165 | 0.00985 | 294.50 | -0.00566 | 0.0092 | 323.98 |
| Tp34 (D,S) | -0.00154 | 0.0099 | 292.94 | -0.00566 | 0.00919 | 324.23 |
| Tp35 (S,G) | -0.00204 | 0.00969 | 299.15 | -0.00561 | 0.00928 | 321.04 |
| Tp36 (S,D) | -0.00224 | 0.00964 | 300.84 | -0.00557 | 0.00936 | 318.34 |

Table 2: Biases, MSEs and PREs of the existing estimators and the estimators of proposed Schemes 1, 2 and 3 using N = 2000, n' = 1000, n = 500, r = 300 (Cases 1 and 2)

| Estimator | Bias (Case 1) | MSE (Case 1) | PRE (Case 1) | Bias (Case 2) | MSE (Case 2) | PRE (Case 2) |
|---|---|---|---|---|---|---|
| Sample mean T0 | -0.00227 | 0.01853 | 100.00 | 0.00277 | 0.01875 | 100.00 |
| Lee et al. (1994) T1 | -0.00414 | 0.00423 | 438.36 | -0.00157 | 0.00645 | 290.86 |
| Singh & Horn (2000) T2 | -0.00441 | 0.00407 | 454.81 | -0.00221 | 0.00685 | 273.55 |
| Kadilar & Cingi (2008) T3 | 0.01484 | 0.01824 | 101.60 | 0.01666 | 0.02677 | 70.04 |
| Singh (2009) T4 | -0.00115 | 0.00411 | 450.69 | 0.00135 | 0.00699 | 268.38 |
| Audu et al. (2021b) T5 | -0.02433 | 0.00531 | 348.68 | -0.0247 | 0.00846 | 221.71 |
| Musa et al. (2023) | | | | | | |
| T6 (a=Cx, b=Sx) | -0.99812 | 1.02416 | 1.81 | -0.97778 | 0.99538 | 1.88 |
| T6 (a=Cx, b=Skw) | -1.12083 | 1.28358 | 1.44 | -1.1005 | 1.25366 | 1.50 |
| T6 (a=Cx, b=Kurt) | -1.14291 | 1.33356 | 1.39 | -1.12257 | 1.30354 | 1.44 |
| T6 (a=Sx, b=Cx) | -1.14699 | 1.34293 | 1.38 | -1.12666 | 1.31289 | 1.43 |
| T6 (a=Sx, b=Skw) | -1.14703 | 1.34303 | 1.38 | -1.1267 | 1.31299 | 1.43 |
| T6 (a=Sx, b=Kurt) | -1.14971 | 1.34918 | 1.37 | -1.12938 | 1.31913 | 1.42 |
| T6 (a=Skw, b=Sx) | -0.99725 | 1.02243 | 1.81 | -0.97691 | 0.99367 | 1.89 |
| T6 (a=Skw, b=Cx) | -1.12021 | 1.28219 | 1.45 | -1.09988 | 1.25227 | 1.50 |
| T6 (a=Skw, b=Kurt) | -1.14282 | 1.33337 | 1.39 | -1.12249 | 1.30334 | 1.44 |
| T6 (a=Kurt, b=Cx) | -1.05418 | 1.13884 | 1.63 | -1.03384 | 1.10943 | 1.69 |
| T6 (a=Kurt, b=Sx) | -0.90522 | 0.84804 | 2.18 | -0.88493 | 0.82079 | 2.28 |
| T6 (a=Kurt, b=Skw) | -1.05493 | 1.14042 | 1.62 | -1.03459 | 1.111 | 1.69 |
| Estimators of Proposed Scheme 1 | | | | | | |
| Tp11 (G,D) | -0.00472 | 0.00392 | 472.18 | -0.00221 | 0.00697 | 268.78 |
| Tp12 (G,S) | -0.00462 | 0.00402 | 460.55 | -0.0021 | 0.00726 | 258.38 |
| Tp13 (D,G) | -0.00452 | 0.00414 | 447.80 | -0.00198 | 0.00752 | 249.15 |
| Tp14 (D,S) | -0.00446 | 0.0042 | 441.54 | -0.00193 | 0.00765 | 245.03 |
| Tp15 (S,G) | -0.00467 | 0.00397 | 466.73 | -0.00216 | 0.00711 | 263.53 |
| Tp16 (S,D) | -0.00473 | 0.00391 | 473.57 | -0.00223 | 0.00694 | 270.31 |
| Estimators of Proposed Scheme 2 | | | | | | |
| Tp21 (G,D) | -0.00096 | 0.0203 | 91.26 | 0.00419 | 0.02055 | 91.22 |
| Tp22 (G,S) | -0.00002 | 0.0216 | 85.78 | 0.00524 | 0.02188 | 85.67 |
| Tp23 (D,G) | 0.00079 | 0.02271 | 81.58 | 0.00614 | 0.02303 | 81.40 |
| Tp24 (D,S) | 0.00114 | 0.0232 | 79.85 | 0.00654 | 0.02354 | 79.63 |
| Tp25 (S,G) | -0.00048 | 0.02097 | 88.37 | 0.00473 | 0.02123 | 88.30 |
| Tp26 (S,D) | -0.00111 | 0.02011 | 92.15 | 0.00404 | 0.02035 | 92.12 |
| Estimators of Proposed Scheme 3 | | | | | | |
| Tp31 (G,D) | -0.00481 | 0.00389 | 476.51 | -0.00237 | 0.00645 | 290.58 |
| Tp32 (G,S) | -0.00477 | 0.00396 | 467.51 | -0.00233 | 0.00637 | 294.16 |
| Tp33 (D,G) | -0.00471 | 0.00406 | 456.56 | -0.00226 | 0.00635 | 295.35 |
| Tp34 (D,S) | -0.00468 | 0.00411 | 450.95 | -0.00222 | 0.00635 | 295.36 |
| Tp35 (S,G) | -0.00479 | 0.00392 | 472.48 | -0.00235 | 0.0064 | 292.72 |
| Tp36 (S,D) | -0.00481 | 0.00388 | 477.45 | -0.00237 | 0.00647 | 289.82 |

4.2 Interpretation of the Results

The simulation results are interpreted against the key aims of the proposed modifications, namely the reduction of bias, the minimization of Mean Square Error (MSE) and the improvement of efficiency in estimating the population mean. To evaluate the proposed estimators against these aims, simulated data were generated and analyzed under different sampling configurations. The results of the simulation study are summarized in Tables 1 and 2 for Cases 1 and 2.

These tables present the computed Biases, MSEs and Percentage Relative Efficiencies (PREs) of the proposed class of estimators and compare them with those of the existing estimators of Lee et al. (1994), Singh and Horn (2000), Kadilar and Cingi (2008), Singh (2009), Audu et al. (2021b) and Musa et al. (2023). In all four settings, the estimator with the highest PRE belongs to proposed Scheme 1 or Scheme 3: Tp16 (302.48) and Tp34 (324.23) in Table 1, and Tp36 (477.45) and Tp34 (295.36) in Table 2, for Cases 1 and 2 respectively. The Scheme 2 estimators have PREs below 100 throughout (79.63 to 94.29), so they are less efficient than the sample mean in these settings, and the Musa et al. (2023) estimators have PREs below 3.
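The PRE column can be cross-checked from the MSE column, since PRE is 100 times the ratio of the MSE of the sample mean T0 to the MSE of the estimator in question. The short Python sketch below recomputes a few Case 1 entries of Table 1; the helper name `pre` is hypothetical, and small discrepancies are expected because the tabulated MSEs are rounded.

```python
def pre(mse_reference, mse_estimator):
    """Percentage relative efficiency of an estimator against a reference."""
    return 100.0 * mse_reference / mse_estimator


# MSEs copied from Table 1, Case 1 (rounded as printed).
mse_case1 = {"T0": 0.029, "T1": 0.0102, "T2": 0.00994, "Tp16": 0.00959, "Tp26": 0.03075}

# PREs as reported in Table 1, Case 1.
reported_pre = {"T1": 284.29, "T2": 291.84, "Tp16": 302.48, "Tp26": 94.29}

for name, reported in reported_pre.items():
    recomputed = pre(mse_case1["T0"], mse_case1[name])
    print(f"{name}: recomputed {recomputed:.2f}, reported {reported:.2f}")
```

The recomputed values agree with the reported ones to within rounding, including the below-100 value for the Scheme 2 estimator Tp26.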

5.0 Conclusion

This study examined the statistical properties of the proposed modified estimators for estimating the population mean in the presence of non-response. The analytical properties of the estimators, particularly their Biases and Mean Square Errors (MSEs), were derived up to the third-order approximation using the Taylor series expansion. These derivations provided a theoretical basis for evaluating the performance of the modified imputation estimators and for establishing the conditions under which they outperform the existing estimators considered in the study.

Furthermore, the efficiency conditions of the modified estimators were obtained by comparing the minimum MSE expressions of the proposed estimators with the MSE (or minimum MSE) expressions of the competing estimators. The bias and MSE expressions of the proposed imputation schemes were derived using binomial expansion techniques up to the third order, which allowed for a more accurate approximation of the estimators’ sampling properties. This analytical framework enabled the determination of efficiency conditions and provided insight into the circumstances under which the proposed estimators yield improved estimation performance.

In addition to the theoretical derivations, an empirical study based on simulated data was conducted to further evaluate the performance of the modified estimators. In every simulated setting, the estimator with the lowest MSE and the highest Percentage Relative Efficiency (PRE) was one from proposed Scheme 1 or Scheme 3, with best PREs ranging from 295.36 to 477.45 relative to the sample mean. The estimators of proposed Scheme 2, by contrast, had PREs below 100 in all settings, and the biases of the proposed estimators were not uniformly smaller than those of the existing estimators. Overall, the results indicate that the estimators of proposed Schemes 1 and 3 provide efficient estimates of the population mean, offering a useful methodological contribution to handling missing data under two-phase sampling designs.

References

Adejumobi, A., Audu, A., Yunusa, M. A. and Singh, R. V. K. (2022). Efficiency of Modified Generalized Imputation Scheme for Estimating Population Mean with Known Auxiliary Information. Bayero Journal of Pure and Applied Sciences, 15(1): 105-112.

Ahmed, M. S., Al-Titi, O., Al-Rawi and Abu-Dayyeh, W. (2006). Estimation of a population mean using different imputation methods. Statistics in Transition, 7(6): 1247-1264.

Al-Omari, A. I., Bouza, C. N. and Herrera, C. (2013). Imputation methods of missing data for estimating the population mean using simple random sampling with known correlation coefficient. Quality and Quantity, 47: 353-365.

Audu, A., Ishaq, O. O., Singh, R. V. K., Danbaba, A. and Manu, F. (2023a). On the study of efficiency of exponential-type estimator of population mean using robust regression methods. Quality and Reliability Engineering International, 39: 190-205.

Audu, A., Singh, R. and Khare, S. (2023b). New Regression-Type Compromised Imputation Class of Estimators with Known Parameters of Auxiliary Variable. Communications in Statistics - Simulation and Computation, 52(10): 4789-4801.

Audu, A., Ishaq, O. O., Isah, U., Muhammed, S., Akintola, K. A., Rashida, A. and Abubakar, A. (2020a). On the Class of Exponential-Type Imputation Estimators of Population Mean with Known Population Mean of Auxiliary Variable. NIPES Journal of Science and Technology Research, 2(4): 1-11.

Audu, A., Ishaq, O. O., Zakari, Y., Wisdom, D. D., Muili, J. and Ndatsu, A. M. (2020b). Regression-cum-exponential ratio imputation class of estimators of population mean in the presence of non-response. Science Forum Journal of Pure and Applied Science, 20: 58-63.

Audu, A., Ishaq, O. O., Abubakar, A., Akintola, K. A., Isah, U., Rashida, A. and Muhammad, S. (2021a). Regression-type Imputation Class of Estimators using Auxiliary Attribute. Asian Research Journal of Mathematics, 17(5): 1-13.

Audu, A., Danbaba, A., Ahmad, S. K., Musa, N., Shehu, A., Ndatsu, A. M. and Joseph, A. O. (2021b). On the Efficiency of Almost Unbiased Mean Imputation When the Population Mean of Auxiliary Variable is Unknown. Asian Journal of Probability and Statistics, 15(4): 235-250.