The prevalence of smartphones and wearable devices and the widespread use of electronic health records have led to a surge in multimodal health data that is noisy, non-uniform, and collected at an unprecedented scale. My research focuses on machine learning techniques that learn expressive representations of multimodal, heterogeneous data for biomedical predictive models designed to interact with domain experts. In the first part of the talk, I will focus on techniques for partitioning data and leveraging low-dimensional structure to enable visualization and annotation by humans. In the second part, I will address the construction of hybrid models that combine deep learning with random forests, and the fusion of structured information into temporal representation learning. Together, these methods obviate the need for feature engineering while improving on the state of the art in diverse biomedical applications. Use cases include the classification of alerts in a vital sign monitoring system, the prediction of surgical outcomes in children with cerebral palsy, and the forecasting of osteoarthritis progression from subjects’ physical activity. Finally, I will present the use of weak supervision for the classification of rare aortic valve malformations from unlabeled cardiac MRI sequences.