Project Overview

The CAMELS Multifield Dataset Analysis project uses deep learning techniques to predict fundamental cosmological parameters from simulated astrophysical datasets. Focusing on a subset of the CAMELS dataset, this analysis explores the potential of machine learning to uncover insights about the universe.

This project is a demonstration of interdisciplinary research at the intersection of software development, data analysis, and computational physics.

Key Objectives:

  • Train machine learning models to predict cosmological parameters from individual 2D maps.
  • Explore the effectiveness of combining multiple maps for enhanced predictions.
  • Experiment with transfer learning across different astrophysical fields.

The full project report can be found here or viewed in the PDF viewer further down this page.

Dataset Description

The project uses a subset of the IllustrisTNG suite of the CAMELS dataset, focusing on 2D maps from simulations in which specific cosmological and astrophysical parameters are varied. The parameters the models predict are listed in the table below, together with the astrophysical fields the models were trained on.
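
For orientation, the sketch below shows how one field's maps and the matching parameter table might be loaded with NumPy. The file names and the 256×256 map size follow the public CAMELS Multifield Dataset release and are assumptions here, not necessarily the exact files used in this project.

```python
import numpy as np

# Assumed file names, following the public CAMELS Multifield Dataset release.
maps = np.load("Maps_Mtot_IllustrisTNG_LH_z=0.00.npy")  # (n_maps, 256, 256) total matter maps
params = np.loadtxt("params_IllustrisTNG.txt")          # one row of six parameters per simulation

# Each simulation contributes several map slices, so the parameter rows are
# repeated to give one target vector per map.
n_maps_per_sim = maps.shape[0] // params.shape[0]
targets = np.repeat(params, n_maps_per_sim, axis=0)

print(maps.shape, targets.shape)
```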

Parameter Descriptions

Variable        Description
Ωm              Matter density in the universe
σ8              Amplitude of matter fluctuations (clumpiness of the matter distribution)
ASN1, ASN2      Parameters controlling supernova feedback
AAGN1, AAGN2    Parameters controlling black-hole (AGN) feedback

Astrophysical Fields

Field                       Prefix
Gas density                 Mgas
Gas velocity                Vgas
Gas temperature             T
Gas metallicity             Z
Neutral hydrogen density    HI
Electron number density     ne
Magnetic fields             B
Magnesium over Iron         MgFe
Dark matter density         Mcdm
Dark matter velocity        Vcdm
Stellar mass density        Mstar
Total matter density        Mtot

[Video: Time evolution of four separate sub-regions of the Illustris volume, from high redshift to z = 0, showing gas temperature. Each region is several Mpc on a side, and each samples a different over-density relative to the cosmic mean, from highly over-dense (top left) to under-dense (counter-clockwise). Simulation video courtesy of the Illustris Collaboration.]

Methodology

The project methodology centers on developing and training deep learning models in PyTorch. The workflow included:

  1. Data Preparation:
    • Loaded and normalized the dataset using a custom CAMELS_Dataset class (a minimal illustrative sketch of such a dataset class follows this list).
    • Applied data augmentation techniques but found no significant improvement in accuracy.
  2. Model Selection and Training:
    • Developed a deep learning model using PyTorch for image-to-value regression (a schematic model of this kind is sketched after this list).
    • Conducted extensive hyperparameter tuning to optimize model performance.
    • Evaluated the models based on both validation loss and a custom cost function to balance model size and accuracy.
  3. Multi-Map Training:
    • Experimented with training models on combinations of multiple maps, finding that including additional maps as input channels yielded better results than training on more single-map images (see the multi-channel handling in the dataset sketch below).
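
As mentioned in the data-preparation step above, the maps are handled by a custom CAMELS_Dataset class. The block below is only a minimal sketch of that idea, not the project's actual implementation: the log-scaling, per-channel standardization, and the stacking of several fields as input channels are illustrative assumptions.

```python
import numpy as np
import torch
from torch.utils.data import Dataset


class CAMELSDatasetSketch(Dataset):
    """Illustrative stand-in for the project's custom CAMELS_Dataset class."""

    def __init__(self, field_maps, params):
        # field_maps: list of arrays, one per field, each of shape (n_maps, 256, 256).
        # Stacking along a new axis turns multiple fields into input channels,
        # which is how the multi-map experiments combine maps.
        x = np.stack(field_maps, axis=1).astype(np.float32)  # (n_maps, n_fields, 256, 256)

        # Assumed normalization: log-scale (density-like fields span orders of
        # magnitude; signed fields such as velocities would need a different
        # transform), then standardize each channel.
        x = np.log10(x + 1e-10)
        mean = x.mean(axis=(0, 2, 3), keepdims=True)
        std = x.std(axis=(0, 2, 3), keepdims=True)

        self.x = torch.from_numpy((x - mean) / std)
        self.y = torch.from_numpy(params.astype(np.float32))  # (n_maps, n_params) targets

    def __len__(self):
        return len(self.x)

    def __getitem__(self, idx):
        return self.x[idx], self.y[idx]
```

A single-field model passes one entry in field_maps; the multi-map experiments pass several fields so that each becomes an input channel.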
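
The report describes the actual network architecture and the size/accuracy cost function in detail; the sketch below is only a schematic PyTorch regressor of the same general kind (a small CNN mapping an n-channel 256×256 map to the six target parameters), and the form of the selection cost is an assumption used for illustration.

```python
import torch
import torch.nn as nn


class MapRegressor(nn.Module):
    """Schematic CNN for image-to-value regression on 256x256 maps."""

    def __init__(self, in_channels=1, n_params=6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 16, 3, stride=2, padding=1), nn.ReLU(),  # 256 -> 128
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),           # 128 -> 64
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),           # 64 -> 32
            nn.AdaptiveAvgPool2d(1),                                        # global average pool
        )
        self.head = nn.Linear(64, n_params)

    def forward(self, x):
        return self.head(self.features(x).flatten(1))


def selection_cost(val_loss, model, size_weight=1e-7):
    """Assumed form of a cost that trades accuracy against model size."""
    n_weights = sum(p.numel() for p in model.parameters())
    return val_loss + size_weight * n_weights
```

Training minimizes a standard regression loss such as nn.MSELoss; MapRegressor(in_channels=3) would accept a three-field input for the multi-map experiments.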

Network Architecture and Hyperparameter Tuning Figures:

[Figure: Network architecture]
[Figure: Hyperparameter tuning flow chart]

Results

The analysis demonstrated the potential of deep learning models to accurately predict cosmological parameters from astrophysical datasets. Key findings include:

  • Effective Multi-Map Training: Adding multiple maps as channels enhances prediction accuracy.
  • Model Performance: The best models were selected by balancing validation loss against a custom cost function that also accounts for model size.

[Figure: Example training graph for a model]
[Figure: Test loss on other maps for a model trained on one map]
[Figure: Difference between models trained on more images versus more channels]

Report Summary

The final report (Download PDF) provides a comprehensive analysis of the project, covering:

  • Introduction: Overview of objectives and problem statement.
  • Data: Detailed preprocessing steps and dataset exploration.
  • Methodology: In-depth discussion of the models and training process.
  • Results: Analysis of model performance and key findings.
  • Conclusion: Summary and future work recommendations.

Feedback

Feedback on Report:

  • Strengths:
    • Professionally formatted with clear figures, captions, and tables.
    • Justified model choice with a well-described training process.
    • Clear presentation of results.
  • Areas for Improvement:
    • A stronger connection to the underlying physics and parameters would enhance the report.
    • Inclusion of visualizations of the input data maps is recommended.

Feedback on Code:

  • Strengths:
    • Reasonably well-formatted and easy-to-navigate notebooks.
    • Clear code with a good use of machine learning techniques.
  • Areas for Improvement:
    • More markdown or commenting to clarify the purpose of each code block would be beneficial.

Potential Extensions

Future work could explore:

  • Transfer Learning: Evaluating models trained on one field and tested on another (a rough sketch follows below).
  • Generative Modeling: Developing models to predict one field from another.
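
As a rough illustration of the transfer-learning idea only (it was not implemented in this project), the sketch below evaluates a model trained on one field on maps of another field, optionally re-fitting just the final layer. It assumes a model with separate features and head modules, like the schematic MapRegressor above, and hypothetical data loaders.

```python
import torch
import torch.nn as nn


def transfer_evaluate(model, other_field_loader, fine_tune=False, epochs=3):
    """Evaluate a trained model on another field, optionally fine-tuning the head."""
    loss_fn = nn.MSELoss()

    if fine_tune:
        # Freeze the convolutional features; re-fit only the linear head.
        for p in model.features.parameters():
            p.requires_grad = False
        opt = torch.optim.Adam(model.head.parameters(), lr=1e-3)
        model.train()
        for _ in range(epochs):
            for x, y in other_field_loader:
                opt.zero_grad()
                loss_fn(model(x), y).backward()
                opt.step()

    model.eval()
    with torch.no_grad():
        losses = [loss_fn(model(x), y).item() for x, y in other_field_loader]
    return sum(losses) / len(losses)
```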