A solution for multicollinearity in stochastic frontier production function models

Elkin Castaño; Santiago Gallón

doi:10.17533/udea.le.n86a01

Authors

Elkin Castaño Universidad de Antioquia
Santiago Gallón Universidad de Antioquia

DOI:

https://doi.org/10.17533/udea.le.n86a01

Keywords:

stochastic frontier analysis, technical efficiency, productivity, multicollinearity, principal component estimation.

Abstract

This article considers the problem of collinearity among inputs in a stochastic frontier production model, a topic that has received little attention in the econometric literature. To address the problem, a solution based on principal components is proposed, which allows the technical efficiency and the technology parameters of the model to be interpreted jointly. The results of applying the method to simulated and real data show that it is easy to use and performs well.

Author biographies

Elkin Castaño, Universidad de Antioquia

Adjunct Professor. Departamento de Economía, Facultad de Ciencias Económicas, Universidad de Antioquia, and Facultad de Estadística, Facultad de Ciencias, Universidad Nacional de Colombia, Medellín, Colombia.

Santiago Gallón, Universidad de Antioquia

Assistant Professor. Departamento de Matemáticas y Estadística, Facultad de Ciencias Económicas, Universidad de Antioquia, Medellín, Colombia. Postal address: Calle 67 No. 53-108, Oficina 13-116.

References

Aigner, Dennis; Lovell, Knox & Schmidt, Peter (1977). “Formulation and estimation of stochastic frontier production function models”, Journal of Econometrics, Vol. 6, Issue 1, pp. 21-37.

Belsley, David; Kuh, Edwin & Welsh, Roy (1980). Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. New York: John Wiley & Sons, Inc.

Coelli, Timothy & Henningsen, Arne (2013). Frontier: Stochastic Frontier Analysis. Retrieved from: http://CRAN.R-Project.org/package=frontier. R package version 1.1-0. (Accessed on July 2014).

Coelli, Timothy; Rao, Prasada D.S.; O’Donnell, Christopher J. & Battese, George E. (2005). An Introduction to Efficiency and Productivity Analysis (2nd. Ed.). New York: Springer.

Filippini, Massimo; Hrovatin, Nevenka & Zoric, Jelena (2008). “Cost efficiency of Slovenian water distribution utilities: an application of stochastic frontier methods”, Journal of Productivity Analysis, Vol. 29, Issue 2, pp. 169-182.

Fomby, Thomas B.; Johnson, Stanley R. & Hill, Carter (1984). Advanced Econometric Methods. New York: Springer.

Greene, William (1980a). “Maximum likelihood estimation of econometric frontier functions”, Journal of Econometrics, Vol. 13, Issue 1, pp. 27-56.

Greene, William (1980b). “On the estimation of a flexible frontier production model”, Journal of Econometrics, Vol. 13, Issue 1, pp. 101-115.

Greene, William (2008). “The econometric approach to efficiency analysis”. In: Fried, Harold; Lovell, Knox & Schmidt, Shelton (Eds.), The Measurement of Productive Efficiency and Productivity Growth (pp. 92-150). New York, Oxford University Press.

Groß, Jürgen (2003). “Linear Regression”, Lecture Notes in Statistics, Vol. 175. Springer.

Hwang, Gene J. T. & Nettleton, Dan (2003). “Principal components regression with data chosen components and related methods”, Technometrics, Vol. 45, No. 1, pp. 70-79.

Jolliffe, Ian T. (1982). “A note on the use of principal components in regression”, Journal of the Royal Statistical Society. Series C (Applied Statistics), Vol. 31, No. 3, pp. 300-303.

Jolliffe, Ian T. (2002). Principal Component Analysis (2nd Ed.). New York: Springer.

Kumbhakar, Subal C. & Lovell, C. Knox (2000). Stochastic Frontier Analysis. Cambridge: Cambridge University Press.

Mason, Robert & Gunst, Richard (1985). “Selecting principal components in regression”, Statistics and Probability Letters, Vol. 3, Issue 6, pp. 299-301.

Massy, William F. (1965). “Principal components regression in exploratory statistical research”, Journal of the American Statistical Association, Vol. 60, Issue 309, pp. 234-256.

Meeusen, Wim & van Den Broeck, Julien (1977). “Efficiency estimation from Cobb-Douglas production functions with composed error”, International Economic Review, Vol. 18, No. 2, pp. 435-444.

Puig-Junoy, Jaume (2001). “Technical inefficiency and public capital in U.S. states: A stochastic frontier approach”, Journal of Regional Science, Vol. 41, Issue 1, pp. 75-96.

Stevenson, Rodney (1980). “Likelihood functions for generalized stochastic frontier estimation”, Journal of Econometrics, Vol. 13, Issue 1, pp. 58-66.

Published

31-01-2017

How to cite

Castaño, E., & Gallón, S. (2017). Una solución para la multicolinealidad en modelos de función de producción de frontera estocástica. Lecturas De Economía, (86), 9–23. https://doi.org/10.17533/udea.le.n86a01

Issue

No. 86 (2017): January-June

Section

Articles

License

Copyright 2017 Elkin Castaño, Santiago Gallón

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.

Introduction

It is well known that production frontier and technical efficiency analyses of a productive unit assume that deviations of the observed output from its maximum (or potential) attainable output, located on the production frontier, are due exclusively to inefficiencies of the productive unit (see, e.g., Kumbhakar & Lovell, 2000; Coelli, et al., 2005). For instance, if the assumed production function is a Cobb-Douglas technology $y = x^\top \beta + v$, where $y$ and $x$ are the logarithms of the observed output and the input vector respectively, then the production frontier $x^\top \beta$ is deterministic, and $v = y - x^\top \beta$ corresponds to the production inefficiency. The lack of randomness in the production frontier of this kind of model does not correspond to real economic life, where uncontrollable random production shocks commonly occur.

The stochastic frontier production model (Aigner, Lovell & Schmidt, 1977; Meeusen & van den Broeck, 1977) is specified as

$$y_i = x_i^\top \beta + v_i - u_i, \qquad i = 1, \dots, n, \tag{1}$$

where $y_i$ is the observed output and $x_i$ the $k$-dimensional vector of inputs for the $i$th firm, $x_i^\top \beta$ and $v_i$ represent the deterministic and noise components of the frontier respectively, $x_i^\top \beta + v_i$ is the maximum output reached by the firm, which constitutes the stochastic frontier, and $u_i$ is the non-negative random technical inefficiency component (i.e., the amount by which the firm fails to achieve its optimum). A symmetric distribution, such as the normal distribution, is usually assumed for $v_i$. It is also common to assume that $v_i$ and $u_i$ are independent, and that both errors are uncorrelated with $x_i$. Typically, the production function relies on a Cobb-Douglas, translog, or any other logarithmic production model $\log(y_i) = x_i^\top \beta + v_i - u_i$, where the components of $x_i$ are logarithms of inputs, their squares and cross products.

Most of the stochastic frontier models proposed in the literature differ mainly in the probability distribution assumed for the inefficiency component $u \geq 0$, which is needed to apply the maximum likelihood estimation method. In this regard, Kumbhakar and Lovell (2000), Coelli, et al. (2005), and Greene (2008) review the literature on several such distributions. Some instances are the half-normal model $u \sim N^{+}(0, \sigma_u^2)$, where $N^{+}$ denotes the non-negative half-normal distribution (Aigner, Lovell & Schmidt, 1977); the exponential model $u \sim \mathrm{Exp}(\lambda)$, $\lambda > 0$ (Meeusen & van den Broeck, 1977; Aigner, Lovell & Schmidt, 1977); the gamma model $u \sim \Gamma(\lambda, \theta)$, $\lambda > 0$ and $\theta > 0$ (Stevenson, 1980; Greene, 1980a; Greene, 1980b); and the truncated normal model $u \sim N^{+}(\mu_u, \sigma_u^2)$ (Stevenson, 1980).
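As an illustration of these specifications, the following minimal sketch draws the inefficiency term from three of the distributions and forms the composed error $v - u$. The parameter values, the seed, and the reading of $\lambda$ as a rate are assumptions made only for this example; they do not come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)        # fixed seed, illustrative only
n = 100_000

# Hypothetical parameter values chosen for the illustration
sigma_u, lam, mu_u = 1.0, 2.0, 0.5

# Half-normal: absolute value of a N(0, sigma_u^2) draw
u_half = np.abs(rng.normal(0.0, sigma_u, n))

# Exponential, assuming lam is a rate (NumPy expects the scale 1/lam)
u_exp = rng.exponential(1.0 / lam, n)

# Truncated normal N+(mu_u, sigma_u^2) by simple rejection sampling
draws = rng.normal(mu_u, sigma_u, 4 * n)
u_trunc = draws[draws >= 0][:n]

# Composed error of the frontier: symmetric noise minus inefficiency
v = rng.normal(0.0, 1.0, n)
eps = v - u_half

print("mean of half-normal u:", u_half.mean())
print("mean of composed error:", eps.mean())
```

Because $u$ is non-negative, the composed error has a negative mean, which is what distinguishes the stochastic frontier from an ordinary regression error.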

An issue with applications of stochastic frontier analysis emerges when inputs are highly correlated. This gives rise to the multicollinearity problem, which leads to a loss of precision in the estimates. Low input variability causes a similar loss. In the presence of collinearity, it is known that: (i) separating the individual effects of each independent variable can be a difficult task; (ii) the precision loss is expressed in large estimated variances of the estimates, and hence the parameters may not be statistically significant; (iii) the estimated coefficients can have incorrect signs and impossible magnitudes; and (iv) there are instability problems in the sense that small changes in observations, or eliminating an apparently insignificant variable, can produce large changes in estimates (see, e.g., Belsley, Kuh & Welsh, 1980; Fomby, Johnson & Hill, 1984; Groß, 2003). Therefore, it is clear that multicollinearity is a data-driven issue rather than a statistical one (Belsley, Kuh & Welsh, 1980), which can have harmful implications for the estimation of technology coefficients due to their relation with the returns to scale generated by the production model.
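The precision loss in point (ii) can be made concrete with a small sketch. It assumes two standardized inputs and homoskedastic least squares, so that the slope variances are proportional to the diagonal of the inverse correlation matrix; the function name is hypothetical.

```python
import numpy as np

def slope_variance_factor(rho):
    """Diagonal of the inverse 2x2 correlation matrix of two inputs.

    Under ordinary least squares with standardized inputs, each slope
    variance is proportional to this factor, which equals 1 / (1 - rho^2).
    """
    R = np.array([[1.0, rho], [rho, 1.0]])
    return np.diag(np.linalg.inv(R))

for rho in (0.0, 0.7, 0.8, 0.9, 0.99):
    print(rho, slope_variance_factor(rho)[0])
```

The factor grows without bound as the correlation approaches one, which is why nearly collinear inputs produce large standard errors and unstable signs.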

Despite these drawbacks, much of the literature on stochastic frontier analysis treats the multicollinearity problem as unimportant or uses a non-statistical solution. For example, Filippini, et al. (2008) exclude the input whose correlation with other inputs is quite high in order to prevent multicollinearity. Other studies sacrifice the advantages of flexible functional forms for the deterministic component because linear dependencies between inputs yield unreliable and statistically insignificant parameter estimates (Kumbhakar & Lovell, 2000; Puig-Junoy, 2001; Filippini, et al., 2008). Finally, others argue that, when technical inefficiency estimation is the main aim, multicollinearity is not necessarily a serious problem and the interpretation of estimates is secondary (Puig-Junoy, 2001). To the best of our knowledge, no theoretical research has been reported that studies stochastic frontier analysis and multicollinearity jointly.

In this paper, we propose a principal-component-based solution for multicollinearity in a stochastic frontier model. Basically, we use a re-parameterization of the model in terms of all $k$ principal components and restrict the corresponding coefficient vector to those principal components associated with the $r < k$ nonzero eigenvalues. Finally, estimates of the original model are recovered. The solution permits a joint estimation of the technical efficiency and the parameters through this better specified model. Also, through a simulation experiment, the proposed estimator is shown to be consistent and to have a smaller mean squared error than the traditional stochastic frontier analysis.

The rest of the paper is organized as follows. Section I describes the solution, and Section II studies its performance through a Monte Carlo simulation experiment. Section III presents an application with real data. Finally, some conclusions are given.

I. The principal component solution

For the case where there is only near exact multicollinearity (i.e., when one or more nearly exact linear relations exist among the regressors), we consider the matrix representation of the stochastic frontier production model (1),

$$y = \beta_0 1 + X\beta + v - u, \tag{2}$$

where $y$, $v$, $u$, and $1$ are $n$-dimensional vectors of observed outputs, production and inefficiency random errors, and ones respectively; $\beta_0$ is the intercept; $X$ is the $n \times k$ design matrix of inputs; and $\beta$ the corresponding $k$-dimensional vector of coefficients. For clarity and notational simplicity, all inputs are assumed to be standardized in the sequel.

Now consider the spectral decomposition of the $k \times k$ symmetric matrix $X^\top X$,

$$X^\top X = P \Lambda P^\top,$$

where $\Lambda = \operatorname{diag}(\lambda_1, \lambda_2, \dots, \lambda_k)$ is the diagonal matrix of eigenvalues (with $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_k$), and $P = (p_1, p_2, \dots, p_k)$ the corresponding orthogonal matrix of eigenvectors.

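By the orthogonality of $P$ (i.e., $PP^\top = P^\top P = I$), the regression model (2) can be re-parameterized as

$$y = \beta_0 1 + XPP^\top \beta + v - u = \beta_0 1 + Z\theta + v - u, \tag{3}$$

where $Z = XP = (z_1, z_2, \dots, z_k)$ is the matrix of principal components $z_j = Xp_j$ with the property $z_j^\top z_j = \lambda_j$, $\forall j$, and $\theta = P^\top \beta$.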

From the theory of principal component analysis (PCA) (see, e.g., Jolliffe, 2002), it is well known that the principal components $z_j = Xp_j$ are orthogonal, where the first principal component $z_1$ has the maximal variance (i.e., the largest amount of information) of the original variables, the second principal component $z_2$ has the next maximal variance after the first principal component, and so on. Note that if the $j$th characteristic root $\lambda_j$ is approximately equal to zero, then $z_j \approx 0$.
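The decomposition and the properties of the principal components can be checked numerically. The sketch below uses simulated, standardized inputs in which two columns are nearly collinear; the data and variable names are illustrative assumptions, not the paper's data.

```python
import numpy as np

rng = np.random.default_rng(1)            # illustrative seed
n, k = 200, 3

# Simulated inputs: x2 is nearly collinear with x1
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize the inputs

# Spectral decomposition X'X = P Lambda P'; eigh returns ascending order
eigvals, P = np.linalg.eigh(X.T @ X)
order = np.argsort(eigvals)[::-1]          # reorder so lambda_1 >= ... >= lambda_k
lam, P = eigvals[order], P[:, order]

# Principal components z_j = X p_j
Z = X @ P

print("eigenvalues:", lam)
print("squared norms of z_j:", np.sum(Z**2, axis=0))
```

The squared norm of each component equals its eigenvalue, so the component attached to the near-zero eigenvalue is itself close to the zero vector.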

Additionally, if all $k$ principal components are used, the same parameter vector $\beta$ is obtained, which is unreliable under collinearity among the exogenous variables, as was pointed out in the introduction. In other words, fairly small eigenvalues of the $X^\top X$ matrix make the OLS estimator imprecise. Therefore, the strategy consists in preventing the estimate from moving in the directions $\lambda_j p_j$ associated with fairly small $\lambda_j$ (see Fomby, Johnson & Hill, 1984; Groß, 2003).

Thus, to deploy the strategy, we restrict $\beta$ to the subspace spanned by the columns $\lambda_1 p_1, \lambda_2 p_2, \dots, \lambda_r p_r$, where $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_r > 0$ are the $r < k$ largest eigenvalues of $X^\top X$ and $\lambda_{r+1} \approx \lambda_{r+2} \approx \cdots \approx \lambda_k \approx 0$. This means that $\operatorname{rank}(X) = r$. Hence, in order to eliminate imprecisions, Massy (1965), Jolliffe (1982), Mason and Gunst (1985), and Hwang and Nettleton (2003) suggest using (i) the first principal components with the largest variance that are highly correlated with the output $y$, and (ii) those principal components of low variance but with high output correlation.

Therefore, model (3) can be re-expressed by dividing the eigenvalues into the groups $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_r > 0$ and $\lambda_{r+1} \approx \lambda_{r+2} \approx \cdots \approx \lambda_k \approx 0$ and defining the corresponding partition $Z = (Z_1, Z_2) = (XP_1, XP_2)$, where $Z_1$ is the $n \times r$ matrix of principal components associated with the nonzero eigenvalues and $Z_2$ the $n \times (k - r)$ matrix of the remaining principal components, associated with the eigenvalues approximately equal to zero. Then, assuming that the first $r$ principal components are highly correlated with $y$ in order to simplify the notation, and using $Z_2 \approx 0$, the re-parameterized model (3) can be expressed as

$$y = \beta_0 1 + Z_1 \theta_1 + Z_2 \theta_2 + v - u \approx \beta_0 1 + Z_1 \theta_1 + v - u,$$

where $\theta = (\theta_1^\top, \theta_2^\top)^\top$, with $\theta_1 = P_1^\top \beta$ and $\theta_2 = P_2^\top \beta$. The constraint $Z_2 \approx 0$ is equivalent to $\theta_2 \approx 0$.

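Finally, since the inputs are standardized, the least squares estimator of $\theta_1$ is $\hat{\theta}_1 = (Z_1^\top Z_1)^{-1} Z_1^\top y$. Thus, the principal component estimator of $\beta$ in (2) is given by

$$\hat{\beta}_{PC} = P_1 \hat{\theta}_1, \tag{4}$$

with covariance matrix

$$\operatorname{Var}(\hat{\beta}_{PC}) = P_1 \operatorname{Var}(\hat{\theta}_1) P_1^\top.$$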
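The recovery of the original coefficients from the retained components can be sketched as follows. This is a plain least-squares illustration under the assumption of standardized inputs, not the maximum likelihood frontier fit; the function name `pc_estimator` and the simulated data are hypothetical.

```python
import numpy as np

def pc_estimator(X, y, r):
    """Principal-component estimate of the slopes from the first r components.

    X is assumed standardized (centered columns); returns (beta_pc, theta1).
    """
    eigvals, P = np.linalg.eigh(X.T @ X)
    order = np.argsort(eigvals)[::-1]            # largest eigenvalue first
    lam, P = eigvals[order], P[:, order]
    P1, lam1 = P[:, :r], lam[:r]
    Z1 = X @ P1                                  # retained principal components
    # Least squares on orthogonal components: Z1'Z1 = diag(lam1)
    theta1 = (Z1.T @ (y - y.mean())) / lam1
    beta_pc = P1 @ theta1                        # map back to the original inputs
    return beta_pc, theta1

# Illustrative data: two standardized inputs with correlation near 0.9
rng = np.random.default_rng(2)
n = 500
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + np.sqrt(1 - 0.9**2) * rng.normal(size=n)
X = np.column_stack([x1, x2])
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = 1.0 + X @ np.array([0.8, 0.7]) + rng.normal(size=n)

beta_full, _ = pc_estimator(X, y, r=2)           # all components: same as OLS
beta_r1, _ = pc_estimator(X, y, r=1)             # restricted estimator
ols, *_ = np.linalg.lstsq(X, y - y.mean(), rcond=None)
print(beta_full, ols, beta_r1)
```

With all components retained, the estimate coincides with ordinary least squares, which matches the remark that using all $k$ components returns the same parameter vector. With one component and two standardized inputs, both recovered slopes are equal because the first eigenvector weights the inputs equally.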

II. Simulation study

To evaluate the performance of the proposed principal-component-based method, we carried out a Monte Carlo simulation experiment with 20,000 replications on the stochastic frontier model

$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + v_i - u_i, \tag{5}$$

with a half-normal/normal specification, where $\sigma_u = 3$, $\sigma_v = 2.5$, $\sigma^2 = \sigma_u^2 + \sigma_v^2 = 15.25$, $\gamma = \sigma_u^2 / \sigma^2 = 0.59$, $(\beta_0, \beta_1, \beta_2) = (1, 0.8, 0.7)$; and $(x_1, x_2) \sim N(\mu, \Sigma)$ with $\mu = (20, 25)$ and $\Sigma = DRD$, where $D = \operatorname{diag}(\sigma_{x_1}, \sigma_{x_2}) = \operatorname{diag}(1, 2)$ and $R$ is the correlation matrix with $\rho = \operatorname{Corr}(x_1, x_2) = 0.7, 0.8, 0.9$. For the most severe multicollinearity problem, where $\rho = 0.9$, we performed the simulations with $n = 1000$ to study the large sample properties of the estimator. We used the frontier: Stochastic Frontier Analysis R package version 1.1-0 by Coelli and Henningsen (2013).
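The data-generating process of this design can be sketched in a few lines. The parameter values are those stated above; the function name, the seed, and the use of NumPy instead of the R package are assumptions of this illustration, and the sketch only generates one sample rather than running the full experiment.

```python
import numpy as np

def simulate_frontier(n, rho, rng):
    """Draw one sample from the half-normal/normal frontier design."""
    beta0, beta1, beta2 = 1.0, 0.8, 0.7
    sigma_u, sigma_v = 3.0, 2.5
    mu = np.array([20.0, 25.0])
    D = np.diag([1.0, 2.0])                      # input standard deviations
    R = np.array([[1.0, rho], [rho, 1.0]])       # input correlation matrix
    Sigma = D @ R @ D
    x = rng.multivariate_normal(mu, Sigma, size=n)
    v = rng.normal(0.0, sigma_v, n)              # symmetric noise
    u = np.abs(rng.normal(0.0, sigma_u, n))      # half-normal inefficiency
    y = beta0 + beta1 * x[:, 0] + beta2 * x[:, 1] + v - u
    return x, y

sigma2 = 3.0**2 + 2.5**2                         # total variance parameter
gamma = 3.0**2 / sigma2                          # share attributed to inefficiency

x, y = simulate_frontier(1000, 0.9, np.random.default_rng(3))
print(sigma2, round(gamma, 2), np.corrcoef(x.T)[0, 1])
```

Repeating the draw many times and fitting both estimators on each sample would give the means, biases, and mean squared errors compared in the study.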

Tables 1-3 show the means, biases, and mean squared errors (MSE) of the estimators of $\beta_1$ and $\beta_2$ obtained with the principal-component-based method and with the usual stochastic frontier analysis method for the assumed values of $\rho$. Results indicate that, in general, the coefficient estimators obtained with the principal-component-based method are biased, and these biases do not decrease asymptotically. However, the estimators have a smaller MSE than the ones obtained by the traditional method, even in large samples. The usual estimators are biased in finite samples, with greater biases than those of the proposed method, although these decrease asymptotically. The estimates of $\gamma$ and $\sigma^2$ remain unaffected if the principal components are chosen correctly. Finally, when the number of principal components is kept fixed, the biases increase as the linear relationship among the variables decreases.

III. Application

To see how the proposed solution behaves with real data, we use production data from the agricultural and livestock sector with a sample of $n = 23$ livestock farms. The output variable is total income, and the inputs are labor, capital and other inputs; all are measured in nominal Colombian (COL) pesos.

Then, a stochastic frontier production model was fitted assuming a Cobb-Douglas functional form with normal-exponential specification. Estimations were carried out using the LIMited DEPendent (LIMDEP) econometric software (version 10). As can be seen in Table 4, the only statistically significant parameter is the one corresponding to log(Other inputs). Although the variable log(Capital) is insignificant, its estimated coefficient has an unexpected opposite sign, which is a sign of possible multicollinearity.

To detect multicollinearity, we computed the scaled condition indexes. Table 5 shows there are two harmful condition indexes (with values greater than 30), indicating two possible near-linear dependencies among inputs. Thus, under the multicollinearity problem, we applied the proposed principal-component-based solution. The proportion of variance explained by the first principal component was 88.6%. Therefore, we applied the solution using this principal component. Table 6 displays the corresponding results. Based on these results, the estimates of the principal-component-based stochastic frontier using equation (4) are in Table 7. Results show that all inputs are statistically significant with correct signs in accordance with production theory.

Table 5: Condition indexes

Condition Index
1.000
12.829
42.981
101.730

Source: authors' elaboration.
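A diagnostic of this kind can be sketched as follows, assuming the usual construction of scaled condition indexes: each column of the data matrix (including the column of ones) is scaled to unit length, and each index is the ratio of the largest singular value to one of the singular values. The function name and the simulated data are illustrative and are not the farm data of the application.

```python
import numpy as np

def scaled_condition_indexes(X):
    """Scaled condition indexes of an n x p data matrix.

    X should include the column of ones if the model has an intercept.
    """
    Xs = X / np.linalg.norm(X, axis=0)        # scale every column to unit length
    s = np.linalg.svd(Xs, compute_uv=False)   # singular values, largest first
    return s[0] / s

# Illustrative data with one near-exact linear dependency
rng = np.random.default_rng(4)
n = 50
x1 = rng.normal(20.0, 1.0, n)
x2 = x1 + 0.001 * rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])

idx = scaled_condition_indexes(X)
print(idx)
```

The first index is always one, and values above the threshold of 30 used in the text flag near-linear dependencies among the columns.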

Conclusions

Based on the simulation results, the input coefficient estimators obtained with the proposed principal-component-based solution are biased, and such biases do not decrease asymptotically. Nevertheless, the estimators have a smaller MSE than the usual ones, even in large samples. In finite samples, the usual estimators are biased and seem to have greater biases than the principal-component-based estimators, although their bias diminishes as the sample size increases. If the principal components are chosen correctly, the estimation of $\gamma$ and $\sigma^2$ remains unaffected by the proposed method. Furthermore, when the number of principal components is kept fixed, the biases of the proposed estimator increase as the linear relation between covariates decreases. The choice of the number of principal components is critical to the estimation of $\beta$, $\gamma$ and $\sigma^2$, as well as of the efficiency component. After applying the proposed method to real data from the agricultural and livestock sector to evaluate its technical inefficiency, our method seems to provide better estimation results for the coefficients, as well as for the returns to scale, in comparison with the traditional method.
