Distributed Analytics and Machine Learning for Large-Scale Medical Image Processing: The scale of data generated in medicine and research can easily overwhelm typical analytic capabilities. This is particularly true of MRI/fMRI scanning, where 1) large file sizes often preclude studies of the magnitude needed to overcome the inherent noise, 2) no gold-standard protocol currently exists for extracting standardized characteristics from MRI/fMRI files, and 3) traditional methods for group-wise comparison often yield spurious findings. Here we address these challenges with an easily deployable, scalable image-processing pipeline that rapidly permutes multiple fMRI/MRI processing options to determine the optimal parameter set for each study. Uniquely, our approach leverages the rapid model-building capabilities of our real-time machine-learning software to iterate through normalization parameters for each disease class. The optimized pipeline exceeded the classification accuracy reported in previous analyses of comparable scope and integrated readily with other medical data types (genome sequence, phenotypic, and metabolic data), enabling more comprehensive disease-classification models. The ability to standardize and pre-process imaging data for machine learning, regardless of source or type, and to combine it effectively with other data types is a powerful capability that holds promise for the future of diagnostics and precision medicine.
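To make the parameter-permutation idea concrete, the following is a minimal sketch of a brute-force sweep over candidate normalization settings, scoring each combination by cross-validated classification accuracy and keeping the best. It uses synthetic stand-in data and scikit-learn; the parameter names (smoothing_fwhm, scaling), the toy preprocess helper, and the random-forest classifier are illustrative assumptions, not the actual pipeline described above.

```python
# Hypothetical sketch of a preprocessing-parameter sweep (not the authors' implementation):
# try every combination of candidate normalization settings, score each by
# cross-validated classification accuracy, and keep the best-performing one.
from itertools import product

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import MinMaxScaler, StandardScaler

rng = np.random.default_rng(0)
X_raw = rng.normal(size=(120, 50))   # placeholder for per-subject image-derived features
y = rng.integers(0, 2, size=120)     # placeholder disease-class labels

def preprocess(X, smoothing_fwhm, scaling):
    """Toy preprocessing: simple feature smoothing plus a choice of scaler."""
    if smoothing_fwhm:               # crude stand-in for spatial smoothing
        kernel = np.ones(smoothing_fwhm) / smoothing_fwhm
        X = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="same"), 1, X)
    scaler = StandardScaler() if scaling == "zscore" else MinMaxScaler()
    return scaler.fit_transform(X)

# Illustrative normalization options to permute per study / disease class.
param_grid = {"smoothing_fwhm": [0, 3, 5], "scaling": ["zscore", "minmax"]}

best_score, best_params = -np.inf, None
for smoothing_fwhm, scaling in product(*param_grid.values()):
    X = preprocess(X_raw, smoothing_fwhm, scaling)
    score = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=5).mean()
    if score > best_score:
        best_score = score
        best_params = {"smoothing_fwhm": smoothing_fwhm, "scaling": scaling}

print(f"best params: {best_params}, cross-validated accuracy: {best_score:.3f}")
```

In practice each parameter combination would be evaluated in parallel across a cluster, and the winning preprocessing configuration would then be applied before fusing the imaging features with other data modalities for the final classification model.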