Tuesday, May 6, 2014

Overanalysis and Analytic Quackery


Theoharakis and Skordia (2003) noted that “the recognition and development of an academic institution depends heavily on its faculty’s publication record in prestigious journals. As a result, an increased emphasis is placed on publishing in refereed journals and promotion criteria rest heavily on the faculty’s publication record.” The most prestigious scientific journals are more likely to accept for publication an article that includes a sophisticated-looking statistical model that serves as a hook. Thus, many scientific researchers are often under pressure to produce sophisticated-looking statistical models. Indeed, an article that used relatively simple statistical techniques would be difficult to pass through the review process of many scientific journals. Thus, statistical consultants who do not wish to be labeled as “incompetents” are obliged to come up with models that provide an air of sophistication.
Vardeman and Morris (2003) provided some advice regarding statistics and ethics for young statisticians: “Resolve that if you submit work for publication, it will be complete and represent your best effort. Submitting papers of little intrinsic value, half-done work, or work sliced into small pieces sent to multiple venues is an abuse of an important communication system and is not honorable scholarship…never borrow published/copyrighted words, even of your own authorship, without acknowledgement. To do so is plagiarism and is completely unacceptable,” etc. etc. etc. Vardeman and Morris (2003) go on to say: “Society also recognizes that when statistical arguments are abused, whether through malice or incompetence, genuine harm is done…Society disdains hypocrisy…(and)…has contempt for statisticians and statistical work that lack integrity…Principled people consistently do principled work, regardless of whether it serves their short-term personal interests. Integrity is not something that is turned on and off at one’s convenience. It cannot be generally lacking and yet be counted on to appear in the nick of time when the greater good calls.” Yada yada yada.
Society may have contempt for statisticians (and for statistical work that lacks integrity). Principled people may, in principle, consistently do principled work. The abuse of statistical arguments may occasionally cause genuine harm. However, society has considerable tolerance for hypocrisy. Integrity is frequently a matter of convenience--particularly when it comes to getting published in prestigious journals. And, plagiarism is generally well accepted, as long as it is done in a subtle manner. You don’t want to draw too much attention to it, so that no-one is likely to notice or care.
To provide a tangible example: a paper on the association between bovine-leukosis virus (BLV) and herd-level productivity on US dairy farms (Ott, Johnson and Wells, 2003) represents at least the fourth in a series of very similar statistical analyses that came from the same set of data (i.e., the Dairy ’96 Survey, which was conducted by the National Animal Health Monitoring System, NAHMS, of the United States Department of Agriculture, USDA). The first analysis, which examined the economics of Johne’s disease on dairy farms, was presented initially in a report that was published by the USDA (1997), and which is also available on line (http://www.aphis.usda.gov/vs/ceah/ncahs/nahms/dairy/Dairy96/DR96john.pdf). The same methods and results were presented again by Ott, Wells and Wagner (1999). Articles on economic impacts of Bovine Somatotropin (BST) (Ott and Rendleman, 2000) and bulk-tank somatic-cell counts (BTSCC) (Ott and Novak, 2001) followed. The statistical models of Ott, Johnson and Wells (2003) are given in Table 1, and the other statistical models appear in Table 2.
In most of the papers, the principal variable for analysis was what Ott, Johnson and Wells (2003) termed the “Annual Value of Production” (AVP), which Ott, Wells and Wagner (1999) called “Annual Adjusted Value of Production,” and which Ott and Novak (2001) called the “Value of Dairy Herd Productivity.” AVP was derived on an annual per-cow basis as the sum of the value of milk production (milk priced at 28.6 cents/kg) and the value of newborn calves (valued at $50 each), minus the net replacement cost. The “net replacement cost” was the cost of replacements (priced at $1100 each), minus the value of cows sold to other producers (priced at $1100 each) and to slaughter ($400 for cows in good condition, $250 for poor-condition cows). Ott and Rendleman (2000) analyzed “Non-Milk Productivity” (AVP minus the value of milk production). In addition, Ott and Rendleman (2000), Ott and Novak (2001), and Ott, Johnson and Wells (2003) used milk production per cow as a variable for analysis.
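The AVP definition above is simple arithmetic, and a short Python sketch makes it concrete. The dollar values are the ones quoted in the paragraph; the function name, the argument names, and the example herd are hypothetical, and the sketch covers only the components named above.

```python
# Dollar values quoted in the text for the AVP calculation.
MILK_PRICE_PER_KG = 0.286        # 28.6 cents/kg
CALF_VALUE = 50.0                # per newborn calf
REPLACEMENT_COST = 1100.0        # per replacement purchased or raised
SOLD_TO_PRODUCER_VALUE = 1100.0  # per cow sold to another producer
SLAUGHTER_GOOD_VALUE = 400.0     # per cull cow in good condition
SLAUGHTER_POOR_VALUE = 250.0     # per cull cow in poor condition


def annual_value_of_production(milk_kg, calves, replacements,
                               sold_to_producers, slaughter_good,
                               slaughter_poor):
    """Return AVP in US$ per cow; all arguments are annual per-cow quantities."""
    net_replacement_cost = (
        REPLACEMENT_COST * replacements
        - SOLD_TO_PRODUCER_VALUE * sold_to_producers
        - SLAUGHTER_GOOD_VALUE * slaughter_good
        - SLAUGHTER_POOR_VALUE * slaughter_poor
    )
    return (MILK_PRICE_PER_KG * milk_kg
            + CALF_VALUE * calves
            - net_replacement_cost)


# Hypothetical herd, per cow per year (not taken from the Dairy '96 data).
example_avp = annual_value_of_production(
    milk_kg=8000.0, calves=0.9, replacements=0.35,
    sold_to_producers=0.05, slaughter_good=0.20, slaughter_poor=0.05,
)
print(round(example_avp, 2))
```

Because every herd is assigned the same prices, two herds facing very different local prices would receive the same AVP for the same quantities.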
Creating a dependent variable by combining dollar-values attributed to various input- and output-quantities (as Ott, Johnson and Wells, 2003, did for AVP) is a rather unusual technique for analyzing multi-output production. In theory, producers make production decisions based upon the prices (and other constraints) that they face. Prices that producers receive and pay may vary considerably from one producer to the next. One producer may make production decisions that are very different from another, but that are appropriate given that producer’s conditions. Assigning the same dollar-value to outputs for all producers, and then summing the results to generate a dependent variable for analysis, may lead to erroneous conclusions that some producers are achieving higher profits than others based upon certain independent variables when, in fact, all may be maximizing profit given their particular constraints.
In proceeding from Ott, Wells and Wagner’s (1999) models to the models of Ott and Rendleman (2000) (Table 2), the “Johne’s Disease” variables were dropped, and the functional form for percent BST use was changed from a square root to a quadratic expression. Ott, Wells and Wagner (1999) chose a square-root representation for percent BST use because “initial analysis demonstrated a non-linear relationship between milk production and percent BST use,” and “in part because of the large number of herds that did not use any BST.” Ott and Rendleman (2000) used a quadratic term “to measure a potential declining marginal physical product of milk production as rBST increases.” Ott and Novak (2001) used a simple linear term for percent BST use. Because “Ott and Rendleman (2000) found that as the percentage of cows being administered BST rose, the associated marginal increase in milk yield became smaller,” Ott, Johnson and Wells (2003) reverted to a square-root representation for percent BST use.
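To see what the square-root form implies, the Python sketch below takes the milk-production coefficient on the square root of percent rBST use from Table 1 (110.7 kg/cow) and evaluates the implied effect and its slope at two usage levels. The function names are illustrative only, and the calculation ignores the standard errors and every other term in the model.

```python
import math

# Coefficient on sqrt(% of cows administered rBST), milk model, Table 1.
SQRT_RBST_COEF_MILK = 110.7  # kg/cow


def milk_effect(pct_rbst):
    """Implied change in milk (kg/cow) at a given percent of cows on rBST."""
    return SQRT_RBST_COEF_MILK * math.sqrt(pct_rbst)


def marginal_milk_effect(pct_rbst):
    """Slope of the square-root term; undefined at 0% use."""
    return SQRT_RBST_COEF_MILK / (2.0 * math.sqrt(pct_rbst))


for pct in (25.0, 100.0):
    print(pct, round(milk_effect(pct), 1), round(marginal_milk_effect(pct), 3))
```

Under this form the marginal effect falls from about 11.1 kg/cow per percentage point at 25% use to about 5.5 at 100% use, and it grows without bound as use approaches zero. A linear term would hold the slope constant, and a quadratic would let it change linearly.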
A new variable introduced by Ott, Johnson and Wells (2003) was the percent of cows in third or greater lactation (via “piece-wise regression”). In addition, Ott, Johnson and Wells (2003) added two new “management index” variables that resulted from a “correspondence analysis” that combined 24 variables into 2. In previous analyses, the use of Dairy Herd Improvement Association (DHIA) records “served as a proxy measure for management capability” (Ott, Wells and Wagner, 1999). Ott and Novak (2001) stated that they had attempted to combine 18 variables of management practice into four management indices, using factor analyses, to account for the influence of management ability on AVP. Ott and Novak (2001) decided to use DHIA records as a measure of management ability because 83% of the increase in the R-squared value could be obtained from the use of DHIA records, and because including additional management variables reduced the number of respondents with complete information by 6%. The R-squared values for the various models were not substantially different across analyses for the same dependent variable (Table 2).
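The piece-wise term can be read as a hinge. Assuming that the “% in excess of 37%” variable in Table 1 equals max(0, percent - 37) (an interpretation of the table, not something confirmed here), the AVP coefficients imply a slope of 6.24 per percentage point up to 37% and 6.24 - 10.91 = -4.67 beyond it. A minimal Python sketch under that assumption:

```python
# AVP coefficients from Table 1 (US$ per cow).
PCT_THIRD_LACTATION_COEF = 6.24
PCT_IN_EXCESS_OF_37_COEF = -10.91
KNOT = 37.0  # percent of herd; location of the break in the table


def third_lactation_effect(pct_of_herd):
    """Contribution to AVP, ASSUMING the excess term is max(0, pct - 37)."""
    excess = max(0.0, pct_of_herd - KNOT)
    return (PCT_THIRD_LACTATION_COEF * pct_of_herd
            + PCT_IN_EXCESS_OF_37_COEF * excess)


for pct in (20.0, 37.0, 47.0):
    print(pct, round(third_lactation_effect(pct), 2))
```

Under this reading the contribution peaks at 37% of the herd (about $231 per cow) and declines after that.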
Ott, Johnson and Wells (2003) described the creation of the sample weights for use in the analysis. The sample weight indicates the number of farms in the population that each farm in the sample represents. Because large farms (that account for a large portion of the animal population) are sampled at a much higher rate than the more numerous small farms (that account for a small proportion of the animals), large farms typically receive much smaller sample weights than small farms in NAHMS national studies (Losinger, 2002). Thus, responses from small farms tend to have a greater impact on farm-level estimates than responses from large farms. For animal-level estimates from NAHMS surveys, it is customary to modify the sample weights to reflect the number of animals (rather than the number of farms represented) by multiplying the sample weight by the number of animals (Losinger, 2002). Thus, large farms tend to receive much higher animal-level weights (which are emblematic of the number of animals that each participating operation represents) than small farms. Ott, Johnson and Wells (2003) used farm-level rather than animal-level weights in their AVP and milk-production models. The model for milk production is in terms of kg per cow per operation (rather than kg per cow). Using farm weights for animal-level estimates can yield highly inaccurate results.
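The difference between the two kinds of weights is easy to show with a toy sample. In the Python sketch below the three farms, their herd sizes, and their sample weights are all invented for illustration. The animal-level weight is the farm-level weight multiplied by the number of cows, as described above.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical sample: (milk kg/cow, number of cows, farm-level sample weight).
# Small farms are sampled at a low rate, so each represents many farms.
sample = [
    (6000.0, 40, 200.0),
    (6500.0, 80, 100.0),
    (9000.0, 1000, 5.0),
]

milk = [row[0] for row in sample]
farm_weights = [row[2] for row in sample]
animal_weights = [row[2] * row[1] for row in sample]

farm_level_mean = weighted_mean(milk, farm_weights)      # average farm
animal_level_mean = weighted_mean(milk, animal_weights)  # average cow

print(round(farm_level_mean, 1), round(animal_level_mean, 1))
```

With these made-up numbers the farm-weighted mean is about 6213 kg/cow and the animal-weighted mean is about 6905 kg/cow. The first describes the typical farm and the second describes the typical cow, so they answer different questions.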
Ott, Johnson and Wells’ (2003) computation of the reduction in equilibrium milk production was based on a $59 decline in AVP for BLV-positive herds, in addition to the demand and price elasticities for milk (Ott, Johnson and Wells, 2003). Ott, Johnson and Wells (2003) should have used the decline in milk production for BLV-positive herds, rather than the decline in AVP, because they were analyzing changes in equilibrium milk production. Analyses for the demand and supply of calves and culled cows should have been performed separately.
Ott, Johnson and Wells’ (2003) determination of a $285 million economic-surplus loss for producers, a $240 million economic-surplus loss for consumers, and a consequent sum-total loss to the economy of $525 million (due to reduced milk production in BLV-positive herds), differs substantially from what economic theory would ordinarily suggest. The presence of BLV in dairy cows may reduce milk production. Reduced milk production causes the equilibrium market price for milk to rise while the quantity falls. While a loss in economic surplus accrues to consumers, a portion of this loss is transferred to producers as an economic gain (Nicholson, 1995). Therefore, the loss to the economy is not the sum-total of economic-surplus losses experienced by producers and consumers. Ott, Johnson and Wells (2003) failed to provide precise details on how exactly they measured economic losses from reduced milk production due to BLV. Ott, Johnson and Wells (2003) made reference to the procedure described by Ott, Seitzinger and Hueston (1995), but state that they did not include “losses associated with potential lost international trade” (which Ott, Seitzinger and Hueston, 1995, had emphasized). The fact that Ott, Johnson and Wells’ (2003) estimate of total loss to the economy (as a result of reduced milk production attributed to BLV in dairy cows) equaled the sum of the changes in producer and consumer surplus, suggests that Ott, Johnson and Wells (2003) may have either double-counted the economic surplus that transferred between consumers and producers as a result of reduced milk production attributed to BLV in dairy cows, or ignored the transferred surplus when computing the change in either producer or consumer surplus.
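The transfer can be illustrated with a textbook linear market. The Python sketch below is not the model used by Ott, Johnson and Wells (2003); the demand curve P = a - bQ, the supply curve P = c + dQ, the upward supply shift s, and all parameter values are hypothetical. It shows that part of the consumer loss is a rectangle that moves to producers, and that adding the consumer loss to a producer loss measured before crediting that rectangle overstates the net loss by exactly the transfer.

```python
def equilibrium(a, b, c, d):
    """Equilibrium (Q, P) for demand P = a - b*Q and supply P = c + d*Q."""
    q = (a - c) / (b + d)
    return q, a - b * q


# Hypothetical parameters; s is an upward supply shift (e.g., disease cost).
a, b, c, d, s = 100.0, 1.0, 20.0, 1.0, 4.0

q0, p0 = equilibrium(a, b, c, d)
q1, p1 = equilibrium(a, b, c + s, d)

# Surplus triangles for linear curves: CS = b*Q^2/2, PS = d*Q^2/2.
consumer_loss = 0.5 * b * (q0 ** 2 - q1 ** 2)
producer_net_loss = 0.5 * d * (q0 ** 2 - q1 ** 2)

# Rectangle of consumer loss that producers receive through the higher price.
transfer = (p1 - p0) * q1

net_loss = consumer_loss + producer_net_loss
producer_loss_before_transfer = producer_net_loss + transfer
naive_sum = consumer_loss + producer_loss_before_transfer

print(consumer_loss, transfer, net_loss, naive_sum)
```

In this example the consumer loss of 78 consists of a transfer of 76 and a deadweight triangle of 2. The net loss is 156, while the naive sum is 232.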
Ott, Johnson and Wells (2003) determined “marginal effects associated with a percentage-point change in herd-level seropositivity of BLV” as the coefficient associated with the “BLV-prevalence” variable that resulted from replacing their model’s dependent variable (AVP) with the various individual components of AVP (in terms of both quantity and attributed-dollar value) (Table 3). Ott, Wells and Wagner (1999) followed the same procedure to establish the “marginal impact of Johne’s disease on dairy production parameters” (Table 4). This procedure is inappropriate, because factors that influence one component of production would be expected to differ substantially from factors that influence another. Some components, particularly the number of calves born, would not be expected to have a normal distribution (therefore, a linear-regression model would not apply). A Poisson distribution would have been more likely for this variable, and the authors should have considered a Poisson regression. The R-squared values were quite low for some of the components (0.08 for the number of calves born, 0.09 for cow mortality, and 0.11 for cows sold to other producers), and demonstrated that the predictive power of the model of Ott, Johnson and Wells (2003) was rather poor when applied to many of the individual components of AVP.
The models of Ott, Wells and Wagner (1999) had Johne’s disease in terms of positive or negative herds, and in terms of no culled cows with clinical signs, >0 but < 10% of culled cows with clinical signs, and >10% of culled cows with clinical signs (Table 2). The models of Ott and Novak (2001) were based on a low, medium and high differentiation for BTSCC. Some milk processors pay producers less when BTSCC is elevated, or pay premiums for milk with low BTSCC levels (Ott and Novak, 2001). This implies different demand curves for milk based on the level of BTSCC. Differences in the construction of the variable of interest, in addition to the fact that this was the first analysis that incorporated elasticities, render questionable the comparisons offered by Ott, Johnson and Wells (2003).
Results from separate model equations analyzing the economic costs of Johne’s disease, BTSCC and BLV do not imply that the economic benefit of eliminating all three conditions would equal the sum of the economic costs associated with each condition. Each regression model invokes the ceteris paribus assumption. If Johne’s Disease is eliminated before (or in tandem with) BTSCC and BLV, then ceteris are no longer paribus. Estimating the cumulative impact of eliminating all of these conditions would require a model that incorporated all of these variables, and that included a covariance analysis. Ott, Johnson and Wells (2003) did perform a multicollinearity test for their explanatory variables (which included BLV and BTSCC, but not Johne’s disease), and considered multicollinearity “not to be a problem” because “the maximum association of any single explanatory variable with the others was <50…”
Finally, the limitations inherent in performing repeated analyses from the same set of data must be vigorously emphasized. When one carries out multiple analyses to develop models that fit the data well, the ability of the models to make predictions from new data may be considerably less than the R-squared values would suggest (Neter and Wasserman, 1974). The models of Ott, Johnson and Wells (2003) (and of the preceding economic analyses from the Dairy ’96 Study) do indicate some relationships between disease and production. However, over-analysis and excessive data-tweaking can cause “statistical significance” to lose its meaning, however impressive the final results may appear.
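A small simulation shows how searching over specifications inflates apparent fit. In the Python sketch below everything is pure noise: the outcome and each candidate predictor are independent random draws, so the true R-squared of every candidate is zero. The sample size, the number of candidates, and the seed are arbitrary choices for illustration and have nothing to do with the Dairy ’96 data.

```python
import random


def r_squared(x, y):
    """R-squared of a simple linear regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def noise_scores(n_obs=50, n_candidates=40, seed=1):
    """R-squared of each of several pure-noise predictors against a noise outcome."""
    rng = random.Random(seed)
    y = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
    scores = []
    for _ in range(n_candidates):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        scores.append(r_squared(x, y))
    return scores


scores = noise_scores()
first_r2 = scores[0]    # what a single pre-specified model would report
best_r2 = max(scores)   # what is reported after picking the best fit
print(round(first_r2, 3), round(best_r2, 3))
```

The best of the candidates always looks at least as good as a single pre-specified one, even though none of them has any real relationship with the outcome.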
Some researchers may question the ethics of using similar statistical models multiple times. For example, Vardeman and Morris (2003) state: “Resolve that if you submit work for publication, it will be complete and represent your best effort. Submitting papers of little intrinsic value, half-done work, or work sliced into small pieces sent to multiple venues is an abuse of an important communication system and is not honorable scholarship.” Many of the methods and results that formed the basis of the four economics articles that came from the NAHMS Dairy ’96 Study were very similar, and probably could have been combined into one paper. Vardeman and Morris (2003) also say: “never borrow published/copyrighted words, even of your own authorship, without acknowledgement. To do so is plagiarism and is completely unacceptable.” Parts of the descriptions of the analytic procedures in various places across the four economics papers from the NAHMS Dairy ’96 Study are almost identical. For example, in describing the multicollinearity tests, Ott, Wells and Wagner (1999) wrote: “The maximum association of any one explanatory variable with the others was <50…”
Finally, most scientific researchers would agree with the statement of the National Institute of Standards and Technology (1994) that “a measurement result is complete only when accompanied by a quantitative statement of its uncertainty. The uncertainty is required in order to decide if the result is adequate for its intended purpose and to ascertain if it is consistent with other similar results.” Ott, Johnson and Wells (2003) concluded that BLV in dairy cows caused a $525 million loss to the economy because of reduced milk production, and provided no statement of their estimate’s uncertainty. The computation was based partially on an elasticity of demand (for milk) provided by Wohlgenant (1989), and on an elasticity of supply (for milk) provided by Adelaja (1991), neither of whom examined the uncertainty of their elasticities. Computer programs are widely available for computing uncertainties. For example, @RISK 4.5 (Palisade Corporation, 2002) allows users to specify the uncertainty involved in all key variables, with numerous probability density functions. The GUM Workbench (Metrodata GmbH, 1999) follows guidelines established by the European Co-operation for Accreditation (1999) for computing, combining, and expressing uncertainty in measurement.
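Dedicated software is not required to get a rough idea of how uncertain such an estimate is. The Python sketch below propagates uncertainty in two elasticities through a simple constant-elasticity market by Monte Carlo simulation. It is not the calculation of Ott, Johnson and Wells (2003): the model, the 2% supply shift, and the means and standard deviations assigned to the elasticities are all hypothetical placeholders.

```python
import random
import statistics


def quantity_change(shift, e_demand, e_supply):
    """Proportional change in equilibrium quantity after supply falls by
    the fraction `shift`, in a constant-elasticity (log-linear) market."""
    return e_demand * shift / (e_supply - e_demand)


def simulate(n_draws=10000, seed=0):
    """Draw elasticities from assumed normal distributions and propagate them."""
    rng = random.Random(seed)
    draws = []
    while len(draws) < n_draws:
        e_demand = rng.gauss(-0.3, 0.05)  # hypothetical mean and sd
        e_supply = rng.gauss(0.5, 0.10)   # hypothetical mean and sd
        if e_demand >= 0.0 or e_supply <= 0.0:
            continue  # discard draws with economically implausible signs
        draws.append(quantity_change(0.02, e_demand, e_supply))
    return draws


draws = sorted(simulate())
mean_change = statistics.mean(draws)
low = draws[int(0.025 * len(draws))]
high = draws[int(0.975 * len(draws)) - 1]
print(mean_change, low, high)
```

Reporting the interval from `low` to `high` alongside the point estimate is the kind of quantitative statement of uncertainty that the NIST guideline calls for.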

 

References

Adelaja, A.O., 1991. Price changes, supply elasticities, industry organization, and dairy output distribution. Am. J. Agric. Econ. 73, 89-102.
Debertin, D.L., 1986. Agricultural Production Economics. Macmillan Publishing Company, New York.
European Co-operation for Accreditation, 1999. Expression of the Uncertainty of Measurement in Calibration. EA-4/02, European Co-operation for Accreditation, Utrecht, The Netherlands. 79 pp.
King, L.J., 1990. The National Animal Health Monitoring System: fulfilling a commitment. Prev. Vet. Med. 8, 89-95.
Losinger, W.C., 2002. A look at raking for weight adjustment. Stats: The Magazine for Students of Statistics, 33(1): 8-12.
Metrodata GmbH, 1999. GUM Workbench: The Tool for Expression of Uncertainty in Measurement. Manual for version 1.2 English Edition. Teknologisk Institut, Taastrup, Denmark.
National Institute of Standards and Technology, 1994. Guidelines for evaluating and expressing the uncertainty of NIST measurement results. NIST Technology Note 1297. National Institute of Standards and Technology, Gaithersburg, Maryland, USA.
Neter, J., Wasserman, W., 1974. Applied Linear Statistical Models. Richard D. Irwin, Inc., Homewood, Illinois.
Nicholson, W., 1995. Microeconomic Theory: Basic Principles and Extensions, 6th edn. Dryden Press, Fort Worth.
Ott, S.L., Johnson, R., Wells, S.J., 2003. Association between bovine-leukosis virus seroprevalence and herd-level productivity on US dairy farms. Prev. Vet. Med., 61, 249-262.
Ott, S.L., Novak, P.R., 2001. Association of herd productivity and bulk-tank somatic cell counts in US dairy herds in 1996. J. Am. Vet. Med. Assoc. 218, 1325-1330.
Ott, S.L., Rendleman, C.M., 2000. Economic impacts associated with bovine somatotropin (BST) use based on a survey of US dairy herds. AgBioForum 3, 173-180.
Ott, S.L., Seitzinger, A.H., Hueston, W.D., 1995. Measuring the national economic benefits of reducing livestock mortality. Prev. Vet. Med. 24, 203-211.
Ott, S.L., Wells, S.J., Wagner, B.A., 1999. Herd-level economic losses associated with Johne’s disease on US dairy operations. Prev. Vet. Med. 40, 179-192.
Palisade Corporation, 2002. Guide to Using @RISK Risk Analysis and Simulation Add-In Software for Microsoft Excel, Version 4.5. Palisade Corporation, Newfield, New York.
Pollock, S., 2002. Recursive Estimation in Econometrics. Queen Mary University of London, Working Paper No. 462.
Theoharakis, V., Skordia, M., 2003. How do Statisticians Perceive Statistical Journals? The American Statistician 57, 115-123.
US Department of Agriculture, Animal and Plant Health Inspection Service, 1996. Part I: Reference of 1996 Dairy Management Practices. USDA:APHIS:VS, Centers for Epidemiology and Animal Health, Fort Collins, Colorado.
US Department of Agriculture, Animal and Plant Health Inspection Service, 1997. Johne’s disease on US dairy operations. USDA:APHIS:VS, Centers for Epidemiology and Animal Health, Fort Collins, Colorado.
Vardeman, S.B., Morris, M.D., 2003. Statistics and Ethics: Some Advice for Young Statisticians. The American Statistician 57, 21-26.
Wineland, N.E., Dargatz, D.A., 1998. The National Animal Health Monitoring System: a source of on-farm information. Veterinary Clinics of North America 14, 127-139.
Wohlgenant, M.K., 1989. Demand for farm output in a complete system of demand functions. Am. J. Agric. Econ. 71, 241-252.

Table 1. Model showing associations between explanatory variables and annual value of production and milk production. Standard errors are in parentheses.

Variable                                                  Annual value of production   Milk production
                                                          (US$ per cow)                (kg/cow)

BLV prevalence (% seropositive)                           -1.28 (0.49)                 -4.7 (1.7)

Herd size (natural log)                                   65.33 (21.57)                220.9 (75.2)

Region
  Midwest                                                 Reference                    Reference
  West                                                    9.37 (44.95)                 49.3 (156.4)
  Southeast                                               -157.06 (67.69)              -547.8 (220.4)
  Northeast                                               -12.68 (34.23)               -54.0 (117.0)

Bulk-tank somatic cell count (thousands of cells/ml)
  Low (<200)                                              Reference                    Reference
  Medium (200-399)                                        -75.45 (32.26)               -229.9 (109.7)
  High (400+)                                             -261.94 (43.26)              -759.0 (146.5)

Intensive pasture grazing
  (pastures supply >90% of summer forage)                 -107.33 (42.85)              -409.1 (145.7)

% of cows administered rBST
  Square root                                             29.45 (5.06)                 110.7 (16.5)

% Holstein breed                                          7.30 (0.54)                  25.5 (1.9)

Days dry, >70 days                                        -78.63 (39.28)               -280.7 (133.0)

Cows in third lactation
  % of herd                                               6.24 (2.30)                  11.9 (7.9)
  % in excess of 37%                                      -10.91 (3.14)                -36.5 (10.6)

Management practices
  Dimension 1                                             -207.29 (35.87)              -755.2 (123.8)
  Dimension 2                                             -213.24 (52.90)              -867.7 (179.2)

>90% of cows registered                                   76.70 (40.57)                193.5 (141.8)

% change in dairy cow inventory                           -9.87 (0.73)                 -4.9 (2.6)

Intercept                                                 1139.60 (124.72)             5014.6 (436.5)

R-squared                                                 0.534                        0.535
--------------------------------------------------------------------------------------------------------------------
Source: Ott, Johnson and Wells, 2003