Surgical Planning Laboratory - Brigham & Women's Hospital - Boston, Massachusetts USA - a teaching affiliate of Harvard Medical School

The Publication Database hosted by SPL

Instrumentation Bias in the Use and Evaluation of Scientific Software: Recommendations for Reproducible Practices in the Computational Sciences

Institution: Department of Radiology and Medical Imaging, University of Virginia, Charlottesville, VA, USA.
Journal: Front Neurosci
Publication Date: 2013 Sep 9
Volume Number: 7
Citation: Front Neurosci. 2013 Sep 9;7:162.
PubMed ID: 24058331
Keywords: best practices, comparative evaluations, confirmation bias, open science, reproducibility
Sponsors: U54 EB005149/EB/NIBIB NIH HHS/United States
Generated Citation:
Tustison N.J., Johnson H.J., Rohlfing T., Klein A., Ghosh S.S., Ibanez L., Avants B.B. Instrumentation Bias in the Use and Evaluation of Scientific Software: Recommendations for Reproducible Practices in the Computational Sciences. Front Neurosci. 2013 Sep 9;7:162. PMID: 24058331. PMCID: PMC3766821.

The neuroscience community benefits significantly from the proliferation of imaging-related analysis software packages. Established packages such as SPM (Ashburner, 2012), the FMRIB Software Library (FSL) (Jenkinson et al., 2012), Freesurfer (Fischl, 2012), Slicer (Fedorov et al., 2012), and the AFNI toolkit (Cox, 2012) aid neuroimaging researchers around the world in performing complex analyses as part of ongoing neuroscience research. In addition to distributing robust software tools, neuroimaging packages also continue to incorporate algorithmic innovations that improve analysis tools. As fellow scientists who actively participate in neuroscience research through our contributions to the Insight Toolkit (e.g., Johnson et al., 2007; Ibanez et al., 2009; Tustison and Avants, 2012) and other packages such as MindBoggle, Nipype (Gorgolewski et al., 2011), and the Advanced Normalization Tools (ANTs) (Avants et al., 2010, 2011), we notice an increasing number of publications that intend a fair comparison of algorithms, which, in principle, is a good thing. Our concern is the lack of detail with which these comparisons are often presented and the corresponding possibility of instrumentation bias (Sackett, 1979), where “defects in the calibration or maintenance of measurement instruments may lead to systematic deviations from true values” (considering software as a type of instrument requiring proper “calibration” and “maintenance” for accurate measurements). Based on our experience (including our own mistakes), we propose a preliminary set of guidelines that seek to minimize such bias, with the understanding that the discussion will require a more comprehensive response from the larger neuroscience community. Our intent is to raise awareness among both authors and reviewers of issues that arise when comparing quantitative algorithms.
Although herein we focus largely on image registration, these recommendations are relevant for other application areas in biologically focused computational image analysis, and for reproducible computational science in general. This commentary complements recent papers that highlight statistical bias (Kriegeskorte et al., 2009; Vul and Pashler, 2012), bias induced by registration metrics (Tustison et al., 2012) and by registration strategy (Yushkevich et al., 2010), as well as guideline papers for software development (Prlic and Procter, 2012).