
CheXpert Medical Imaging Competition
CS 156b (2025)
This project tackled automated anomaly detection in chest radiographs using more than 100,000 images from CheXpert, a large-scale labeled dataset curated for real-world medical imaging research. Our goal was to predict the presence of 9 different conditions from chest X-rays (e.g., Cardiomegaly, Lung Opacity, Pleural Effusion) using supervised learning. Submissions took the form of a CSV file containing, for each image, a predicted probability for every finding. Evaluation was based on mean squared error (MSE), scaled by class-wise variance.
​
Our team worked on the design, training, and evaluation of multiple deep learning models to identify optimal architecture-performance trade-offs:
- Vanilla CNN: Built and trained a baseline convolutional model to establish a performance benchmark.
- ResNet (Residual Networks): Implemented deeper architectures (ResNet-18 and ResNet-50) that use skip connections to capture complex radiographic patterns.
- DenseNet: Leveraged densely connected CNNs to enhance feature propagation and mitigate vanishing gradients in deeper layers.
​
To optimize model performance, we iteratively tuned:
- Learning rate schedules (step decay, cosine annealing)
- Optimizer selection (Adam vs. SGD with momentum)
- Batch size, weight decay, and dropout rates to reduce overfitting
- Loss functions (tested both BCEWithLogits and class-weighted variants due to label imbalance)
We also incorporated early stopping and model checkpointing to manage training stability and maximize generalization performance.
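
The two learning rate schedules and the early stopping logic mentioned above can be sketched in plain Python. This is a minimal illustration, not our training code: the function names, the `patience` and `min_delta` parameters, and the default values are assumptions chosen for the example. In a real pipeline the framework's built-in schedulers would normally be used instead.

```python
import math


def step_decay(base_lr, epoch, step_size=10, gamma=0.1):
    """Multiply the learning rate by `gamma` once every `step_size` epochs."""
    return base_lr * gamma ** (epoch // step_size)


def cosine_annealing(base_lr, epoch, total_epochs, min_lr=0.0):
    """Decay from `base_lr` to `min_lr` along half a cosine period."""
    progress = epoch / total_epochs
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


class EarlyStopping:
    """Track the best validation loss and signal when training should stop.

    Hypothetical helper: `patience` is the number of consecutive epochs
    without improvement that are tolerated before stopping.
    """

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.best_epoch = None
        self.bad_epochs = 0

    def step(self, epoch, val_loss):
        """Record one epoch; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            # Improvement: this is the point where a checkpoint would be saved.
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

Calling `step` once per epoch with the validation loss is enough to drive both checkpointing (on improvement) and termination (when it returns `True`).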
​
In terms of data processing and augmentation, we:
- Rescaled and normalized grayscale chest X-ray images to suit pretrained model input layers.
- Applied data augmentation techniques such as random horizontal flips, rotations, and brightness adjustments to improve robustness.
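
The preprocessing and augmentation steps above can be illustrated on a tiny grayscale image stored as a nested list. This is a sketch under stated assumptions: the helper names, the brightness range, and the idea of replicating the single channel three times for an RGB-pretrained model are all choices made for the example, and rotation is left out to keep it short. A real pipeline would typically rely on an image library's transforms.

```python
import random


def rescale(image, max_value=255.0):
    """Map raw pixel intensities to the range [0, 1]."""
    return [[pixel / max_value for pixel in row] for row in image]


def normalize(image, mean, std):
    """Standardize pixels by subtracting `mean` and dividing by `std`."""
    return [[(pixel - mean) / std for pixel in row] for row in image]


def to_three_channels(image):
    """Replicate a grayscale image into 3 channels (assumed RGB-style input)."""
    return [image, image, image]


def horizontal_flip(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def adjust_brightness(image, factor):
    """Scale intensities by `factor`, clipping the result to [0, 1]."""
    return [[min(1.0, max(0.0, pixel * factor)) for pixel in row] for row in image]


def augment(image, rng, flip_prob=0.5, brightness_range=(0.9, 1.1)):
    """Apply a random flip and a random brightness change to a [0, 1] image."""
    if rng.random() < flip_prob:
        image = horizontal_flip(image)
    factor = rng.uniform(*brightness_range)
    return adjust_brightness(image, factor)
```

A typical order would be `rescale`, then `augment` (training only), then `normalize` and `to_three_channels`. Passing a seeded `random.Random` instance as `rng` keeps augmentation reproducible.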
​
For the evaluation phase, we:
- Created scripts to compute the scaled MSE for each clinical label.
- Visualized label-wise prediction distributions and investigated error trends across patient subgroups.
- Identified which conditions were most prone to false positives or false negatives and refined models accordingly.
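
A per-label scaled MSE can be sketched as follows. The exact scaling used by the competition is not spelled out here, so this example assumes one plausible reading: each label's MSE is divided by the population variance of that label's ground-truth values, and the results are averaged across labels. The function names and the dictionary layout are hypothetical.

```python
from statistics import fmean, pvariance


def scaled_mse(y_true, y_pred):
    """MSE for one label divided by the variance of that label's targets.

    Assumption: "scaled by class-wise variance" means division by the
    population variance of the ground-truth values for the label.
    """
    mse = fmean((t - p) ** 2 for t, p in zip(y_true, y_pred))
    variance = pvariance(y_true)
    if variance == 0:
        raise ValueError("label has zero variance; scaled MSE is undefined")
    return mse / variance


def mean_scaled_mse(targets, predictions):
    """Return (mean score, per-label scores).

    Both arguments map a label name to a list of values, one per image.
    """
    scores = {
        label: scaled_mse(targets[label], predictions[label])
        for label in targets
    }
    return fmean(scores.values()), scores
```

Under this definition, a constant prediction equal to a label's mean scores exactly 1.0 for that label, so values below 1.0 indicate a model that beats the trivial baseline.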
​
This competition provided a hands-on opportunity to apply deep learning to a high-impact healthcare domain. We gained practical experience with multi-label classification, medical image preprocessing, model performance debugging and optimization, and team-based ML pipeline development.