Project Overview

The CAMELS Multifield Dataset Analysis project uses deep learning techniques to predict fundamental cosmological parameters from simulated astrophysical datasets. Focusing on a subset of the CAMELS dataset, this analysis explores the potential of machine learning to uncover insights about the universe.

This project is a demonstration of interdisciplinary research at the intersection of software development, data analysis, and computational physics.

Key Objectives:

  • Train machine learning models to predict cosmological parameters from individual 2D maps.
  • Explore the effectiveness of combining multiple maps for enhanced predictions.
  • Experiment with transfer learning across different astrophysical fields.

The full project report can be found here or viewed in the PDF viewer further down this page.

Dataset Description

The project uses a subset of the IllustrisTNG suite of the CAMELS dataset, focusing on 2D maps from simulations in which specific cosmological and astrophysical parameters are varied. The parameters the models predict are listed in the table below, together with the astrophysical fields the models were trained on.
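
For orientation, the sketch below shows how one field's maps and the matching parameter table might be loaded with NumPy. The file names and the 256×256 map size follow the public CAMELS Multifield Dataset release and are assumptions here, not necessarily the exact files used in this project.

```python
import numpy as np

# Assumed file names, following the public CAMELS Multifield Dataset release.
maps = np.load("Maps_Mtot_IllustrisTNG_LH_z=0.00.npy")  # (n_maps, 256, 256) total matter maps
params = np.loadtxt("params_IllustrisTNG.txt")          # one row of six parameters per simulation

# Each simulation contributes several map slices, so the parameter rows are
# repeated to give one target vector per map.
n_maps_per_sim = maps.shape[0] // params.shape[0]
targets = np.repeat(params, n_maps_per_sim, axis=0)

print(maps.shape, targets.shape)
```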

Parameter Descriptions

Variable        Description
Ωm              Matter density in the universe
σ8              Amplitude of matter fluctuations (clumpiness of the matter distribution)
ASN1, ASN2      Parameters controlling supernova feedback
AAGN1, AAGN2    Parameters controlling black-hole (AGN) feedback

Astrophysical Fields

Field                       Prefix
Gas density                 Mgas
Gas velocity                Vgas
Gas temperature             T
Gas metallicity             Z
Neutral hydrogen density    HI
Electron number density     ne
Magnetic fields             B
Magnesium over Iron         MgFe
Dark matter density         Mcdm
Dark matter velocity        Vcdm
Stellar mass density        Mstar
Total matter density        Mtot

[Video: Time evolution of four separate sub-regions of the Illustris volume, from high redshift to z = 0, showing gas temperature. Each region is several Mpc on a side, and each samples a different over-density relative to the cosmic mean, from highly over-dense (top left) to under-dense (counter-clockwise). Simulation video courtesy of the Illustris Collaboration.]

Methodology

The project methodology centers on developing and training deep learning models in PyTorch. The workflow included:

  1. Data Preparation:
    • Loaded and normalized the dataset using a custom CAMELS_Dataset class (a minimal illustrative sketch of such a dataset class follows this list).
    • Applied data augmentation techniques but found no significant improvement in accuracy.
  2. Model Selection and Training:
    • Developed a deep learning model using PyTorch for image-to-value regression (a schematic model of this kind is sketched after this list).
    • Conducted extensive hyperparameter tuning to optimize model performance.
    • Evaluated the models based on both validation loss and a custom cost function to balance model size and accuracy.
  3. Multi-Map Training:
    • Experimented with training models on combinations of multiple maps, finding that including additional maps as input channels yielded better results than training on more single-map images (see the multi-channel handling in the dataset sketch below).
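
As mentioned in the data-preparation step above, the maps are handled by a custom CAMELS_Dataset class. The block below is only a minimal sketch of that idea, not the project's actual implementation: the log-scaling, per-channel standardization, and the stacking of several fields as input channels are illustrative assumptions.

```python
import numpy as np
import torch
from torch.utils.data import Dataset


class CAMELSDatasetSketch(Dataset):
    """Illustrative stand-in for the project's custom CAMELS_Dataset class."""

    def __init__(self, field_maps, params):
        # field_maps: list of arrays, one per field, each of shape (n_maps, 256, 256).
        # Stacking along a new axis turns multiple fields into input channels,
        # which is how the multi-map experiments combine maps.
        x = np.stack(field_maps, axis=1).astype(np.float32)  # (n_maps, n_fields, 256, 256)

        # Assumed normalization: log-scale (density-like fields span orders of
        # magnitude; signed fields such as velocities would need a different
        # transform), then standardize each channel.
        x = np.log10(x + 1e-10)
        mean = x.mean(axis=(0, 2, 3), keepdims=True)
        std = x.std(axis=(0, 2, 3), keepdims=True)

        self.x = torch.from_numpy((x - mean) / std)
        self.y = torch.from_numpy(params.astype(np.float32))  # (n_maps, n_params) targets

    def __len__(self):
        return len(self.x)

    def __getitem__(self, idx):
        return self.x[idx], self.y[idx]
```

A single-field model passes one entry in field_maps; the multi-map experiments pass several fields so that each becomes an input channel.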
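
The report describes the actual network architecture and the size/accuracy cost function in detail; the sketch below is only a schematic PyTorch regressor of the same general kind (a small CNN mapping an n-channel 256×256 map to the six target parameters), and the form of the selection cost is an assumption used for illustration.

```python
import torch
import torch.nn as nn


class MapRegressor(nn.Module):
    """Schematic CNN for image-to-value regression on 256x256 maps."""

    def __init__(self, in_channels=1, n_params=6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 16, 3, stride=2, padding=1), nn.ReLU(),  # 256 -> 128
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),           # 128 -> 64
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),           # 64 -> 32
            nn.AdaptiveAvgPool2d(1),                                        # global average pool
        )
        self.head = nn.Linear(64, n_params)

    def forward(self, x):
        return self.head(self.features(x).flatten(1))


def selection_cost(val_loss, model, size_weight=1e-7):
    """Assumed form of a cost that trades accuracy against model size."""
    n_weights = sum(p.numel() for p in model.parameters())
    return val_loss + size_weight * n_weights
```

Training minimizes a standard regression loss such as nn.MSELoss; MapRegressor(in_channels=3) would accept a three-field input for the multi-map experiments.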

Network Architecture and Hyperparameter Tuning Figures:

[Figure: Network architecture]
[Figure: Hyperparameter tuning flow chart]

Results

The analysis demonstrated the potential of deep learning models to accurately predict cosmological parameters from astrophysical datasets. Key findings include:

  • Effective Multi-Map Training: Adding multiple maps as channels enhances prediction accuracy.
  • Model Performance: The best models were selected by balancing validation loss against a custom cost function that also accounts for model size.

[Figure: Example training graph for a model]
[Figure: Test loss on other maps for a model trained on one map]
[Figure: Difference between models trained on more images versus more channels]

Report Summary

The final report (Download PDF) provides a comprehensive analysis of the project, covering:

  • Introduction: Overview of objectives and problem statement.
  • Data: Detailed preprocessing steps and dataset exploration.
  • Methodology: In-depth discussion of the models and training process.
  • Results: Analysis of model performance and key findings.
  • Conclusion: Summary and future work recommendations.

Feedback

Feedback on Report:

  • Strengths:
    • Professionally formatted with clear figures, captions, and tables.
    • Justified model choice with a well-described training process.
    • Clear presentation of results.
  • Areas for Improvement:
    • A stronger connection to the underlying physics and parameters would enhance the report.
    • Inclusion of visualizations of the input data maps is recommended.

Feedback on Code:

  • Strengths:
    • Reasonably well-formatted and easy-to-navigate notebooks.
    • Clear code with a good use of machine learning techniques.
  • Areas for Improvement:
    • More markdown or commenting to clarify the purpose of each code block would be beneficial.

Potential Extensions

Future work could explore:

  • Transfer Learning: Evaluating models trained on one field and tested on another (a rough sketch follows below).
  • Generative Modeling: Developing models to predict one field from another.
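
As a rough illustration of the transfer-learning idea only (it was not implemented in this project), the sketch below evaluates a model trained on one field on maps of another field, optionally re-fitting just the final layer. It assumes a model with separate features and head modules, like the schematic MapRegressor above, and hypothetical data loaders.

```python
import torch
import torch.nn as nn


def transfer_evaluate(model, other_field_loader, fine_tune=False, epochs=3):
    """Evaluate a trained model on another field, optionally fine-tuning the head."""
    loss_fn = nn.MSELoss()

    if fine_tune:
        # Freeze the convolutional features; re-fit only the linear head.
        for p in model.features.parameters():
            p.requires_grad = False
        opt = torch.optim.Adam(model.head.parameters(), lr=1e-3)
        model.train()
        for _ in range(epochs):
            for x, y in other_field_loader:
                opt.zero_grad()
                loss_fn(model(x), y).backward()
                opt.step()

    model.eval()
    with torch.no_grad():
        losses = [loss_fn(model(x), y).item() for x, y in other_field_loader]
    return sum(losses) / len(losses)
```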