Welcome to AI4ChemMat Hands-On Series

Hands-On Series organized by the Chemistry and Materials working group at Argonne National Laboratory

Summer and Fall 2023

Motivation

The integration of artificial intelligence (AI) into science has disrupted almost every domain, from accelerating the discovery of new materials to inspiring innovative solutions to critical problems. In less than a decade, the synergy of AI with chemistry and materials science has gained significant momentum in challenging tasks such as exploring chemical compound space, revealing complex patterns in atomistic models, accelerating simulations, and predicting the properties of experimentally unknown molecules. This has led to a yearly increase in publications and to the emergence of new peer-reviewed journals focused on this topic.

We are organizing a virtual hands-on training series to create common ground in the chemistry and materials science community for discussing and learning the basic elements of AI/ML, and to prepare participants to identify opportunities to adopt this technology. Overall, this workshop will foster the discussion of AI/ML within ANL and with external collaborators and, more importantly, spark new interdivisional collaborations. It will promote teamwork through small projects solved in groups. The workshop will be divided into about 6 presentations during the summer of 2023. Each lecture will consist of an introductory presentation and a hands-on session and will run for approximately 90 minutes.

The main difference from similar events is that this series will include demonstrations of how to use codes and will share ideas on how to transfer this knowledge to related projects and ongoing research.

The target audience is domain experts with beginner- to intermediate-level experience in data science and programming. The lectures will be self-contained and can be followed on Google Colab and on the jupyter.alcf.anl.gov platform. This workshop seeks to supplement other ALCF workshops, with a specific focus on chemistry and materials science.



Topics to be discussed


  • Datasets - Where to find them, common formats, using APIs, how to compile them, etc.
  • Fingerprints and descriptors (MorganFP/rdkit - molecular properties - MOLRES, SOAP, Behler, GCN)
  • Unsupervised learning - Dimension reduction - Data compression - Molecular maps
  • Sampling datasets
  • Intro to ML methods (Gaussian Process Regression, Kernel Ridge, Random Forest, etc.)
  • Intro to Deep Learning (Artificial Neural Networks, CNN, VAE, etc.)
  • Structure/property training and prediction
  • Interatomic potentials
  • Data mining
  • Parsers
  • Workflows
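
As a small taste of the fingerprints topic above, the following is a minimal sketch (not taken from the workshop materials) of how two molecular fingerprints can be compared with the Tanimoto similarity. It assumes each fingerprint is represented as a plain Python set of "on" bit indices; the function name `tanimoto` and the bit values are hypothetical and chosen only for illustration. In practice a cheminformatics library would generate the fingerprints from real molecules.

```python
def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' bit indices."""
    if not fp_a and not fp_b:
        # Convention assumed here: two empty fingerprints count as identical.
        return 1.0
    shared = len(fp_a & fp_b)  # bits set in both fingerprints
    return shared / (len(fp_a) + len(fp_b) - shared)  # shared / union size


# Hypothetical bit indices, invented for illustration only.
fp_first = {3, 17, 42, 101}
fp_second = {3, 17, 42, 256, 300}

similarity = tanimoto(fp_first, fp_second)
print(f"Tanimoto similarity: {similarity:.3f}")
```

A value of 1 means the two fingerprints set exactly the same bits, and 0 means they share none. Similarities of this kind are a common starting point for the dataset sampling and molecular map topics listed above.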

Schedule

August - November 2023. Titles and abstracts are listed below.


  • August 16, Hyun Park, Argonne National Laboratory & University of Illinois at Urbana-Champaign

    Title: End-to-end AI Framework for Interpretable Prediction of Molecular and Crystal Properties

    Abstract: This talk is aimed at audience members who want to understand the basics of how to train an ML model, perform a hyperparameter search, and run inference for molecular structures such as small molecules or MOFs. Moreover, with a bit of Python programming knowledge, users can apply my AI framework to visualize learned molecular representations and highlight the atoms most important for a prediction. I also demonstrate how an ML potential can be used to perform molecular dynamics (MD). Overall, the audience can expect to learn a comprehensive set of ML techniques developed and applied for molecular studies.

    Bio: I am a fifth-year PhD candidate in Professor Emad Tajkhorshid’s lab in the Biophysics and Quantitative Biology program at the University of Illinois at Urbana-Champaign. At Argonne National Laboratory, I work with Dr. Eliu Huerta on applications at the interface of AI and supercomputing for physics and biophysics. My main research interests are developing and applying machine learning (ML) methods to molecular structures. I have applied ML methods to proteins, polymers, and metal-organic frameworks (MOFs). I like playing tennis in my spare time.

  • August 30, Daniil Boiko, Carnegie Mellon University

    Title: Emergent autonomous scientific research capabilities of large language models

    Abstract: In this talk, we will discuss an intelligent agent system that integrates multiple large language models for autonomous design, planning, and execution of scientific experiments. We will demonstrate the Agent's scientific research abilities using several examples, with the most complex one involving the successful execution of catalyzed cross-coupling reactions. Lastly, we address the safety concerns related to such systems and suggest measures to prevent their potential misuse.

    Bio: Daniil Boiko obtained his MSc in organic chemistry from Lomonosov Moscow State University, researching machine learning applications in chemistry, including electron microscopy, mass spectrometry, and reaction discovery. He also worked at VK, developing machine learning models for web search. Now, he's pursuing a PhD in Chemical Engineering at Carnegie Mellon University, focusing on molecular machine learning, biocatalyst discovery recommender systems, and language model applications in natural sciences.

  • September 6, Esther Heid, Technical University of Vienna

    Title: Deep learning of reaction properties via graph-convolutional neural nets

    Abstract: Machine learning models are very successful in predicting various chemical properties. Graph-convolutional neural networks (GCNNs) are routinely used for the prediction of molecular properties, but their application to chemical reactions is largely unexplored. GCNNs allow for a learned extraction of important characteristics of a molecule and enable end-to-end learning, instead of relying on expert, system-dependent knowledge. However, the properties of chemical reactions, i.e. the combination of reactant and product molecules, are not readily accessible with current GCNNs which are designed to take molecular graphs as input. Recently, GCNNs based on the condensed graph of reaction (CGR) were shown to unlock the full potential of GCNNs also for reactions, where reactants and products are merged into a single pseudo-molecular graph, i.e. an artificial graph transition state. In this workshop, the anatomy of molecular GCNNs will be discussed in detail, as well as the changes necessary to encode reactions instead of molecules, including hands-on exercises to build your own reaction GCNN. Compared to previous approaches, GCNNs on CGRs offer a comparable or better performance with a lower number of parameters. We showcase the performance on different tasks, such as the prediction of barrier heights or rate constants, as well as the chemo- and regioselectivity of reactions.

    Bio: Esther Heid obtained her Bachelor’s (2014), Master’s (2016), and PhD (2019) degrees in Chemistry from the University of Vienna, Austria. Her thesis focused on the molecular dynamics simulation of soft matter, as well as quantum mechanical calculations for obtaining force field parameters. In 2020 she joined the Massachusetts Institute of Technology, holding an Erwin-Schroedinger Postdoctoral Fellowship from the Austrian Science Fund, which enables her to conduct research on the development of computer-aided tools for finding novel multi-enzyme networks which yield a specified target molecule. The project utilizes recent developments in machine learning, bioretrosynthesis, and cheminformatics, and aims toward a more efficient, selective, and environmentally favorable synthesis of compounds through the inclusion of biocatalytic transformations. A major part of the project is concerned with developing new machine learning methods for molecular and reaction property predictions. Her postdoc fellowship includes a one-year return phase in Austria to finish up the project, which she started in 2022 at the Vienna University of Technology.

  • October 18, Lars Leon Schaaf, University of Cambridge

    Title: Machine Learning Force Fields for Heterogeneous Catalysis

    Abstract: Machine learning force fields (MLFFs) are set to become an indispensable tool in computational catalysis. In this talk, we provide a detailed walkthrough on how to train an MLFF to accurately predict energy barriers for catalytic reaction pathways. We demonstrate the capabilities of the resulting interatomic potential that offers near ab-initio accuracy at a fraction of the cost. Specifically, we illustrate that MLFFs not only speed up routine catalytic tasks by orders of magnitude but also allow for a more realistic treatment of catalytic systems, identifying lower energy barriers and capturing finite temperature effects. We also present a Jupyter notebook that highlights the simplicity of training a state-of-the-art many-body equivariant graph neural network, namely MACE. The capacity of MLFFs to deepen our understanding of extensively studied catalysts emphasizes the importance of fast and accurate alternatives to direct ab-initio simulations. Automated training procedures are paramount in enhancing the accessibility of MLFFs for both academic and industrial applications, and for effective use of HPC resources.

    Bio: Lars is a 4th-year PhD student specializing in machine learning force fields with an emphasis on catalysis and non-local effects. His academic background is rooted in theoretical physics, which he studied at the University of Birmingham with a concentration on Astrophysics. During his internship at the Max Planck Institute for Nuclear Physics, Lars made his first contact with scientific computing while working on a high energy camera that is set to observe x-rays emitted by cosmic particle accelerators. After moving to the University of Cambridge for his master's, Lars started focusing on condensed matter physics with his thesis on quantum information. Here he discovered his passion for computational modelling at the atomic scale.

  • November 8, Venkata Surya Chaitanya Kolluru & Joshua Paul, Argonne National Laboratory

    Title: Structure determination of nanoscale materials using theory and experimental characterization data

    Abstract: The atomistic structure determines the stability and properties of a material and its potential use in applications. We develop software tools such as Ingrained and FANTASTX (Fully Automated Nanoscale To Atomistic Structure from Theory and eXperiments) to find the atomistic structure from experimental data. Ingrained can construct a grain boundary structure or a surface structure based on experimentally obtained TEM or STM images, respectively. FANTASTX is a multi-objective evolutionary algorithm that helps find the thermodynamically or kinetically stabilized structures observed experimentally. In this talk, we will show examples of the Ingrained-STM simulation tool with (111) Cu2O, and of CdTe grain boundary structures created using Ingrained-TEM. We also show how the FANTASTX tool can be used to search for the tellurene atomistic structure at the interface of a CdTe grain boundary system. These tools provide a path to understanding complex mechanisms in experimental systems using theory and further allow the local structure to be tailored to the required effect.

    Bio:
    Venkata Surya Chaitanya Kolluru is a Postdoc at the Center for Nanoscale Materials at Argonne, working with Dr. Maria Chan. He completed his Ph.D. in Materials Science and Engineering at the University of Florida in 2021. His research focuses on combining atomistic simulation methods with AI/ML and computer vision tools to address fundamental materials challenges such as structure inversion from experimental characterization data, materials discovery, and theoretical characterization of complex nanoscale materials systems.
    Joshua Paul is a Postdoc jointly appointed at Northwestern University and Argonne National Laboratory under Dr. Maria Chan. After graduating from the University of Florida in 2020 with a Ph.D. in Materials Science and Engineering, he joined the Center for Nanoscale Materials. His research focuses on high-throughput computational methods for materials discovery and characterization. By combining Density Functional Theory with experimental results, he characterizes the conditions of materials interfaces and surfaces with greater certainty than either approach alone allows.

  • November 15, Aikaterini Vriza, Argonne National Laboratory

    Title: Extracting and utilizing multimodal datasets of images and text with large language models

    Abstract: With the recent exponential growth in publication rates, it has become impossible for a scientist to keep up with all publications related to a specific topic. Although there are notable efforts to automate text parsing from literature, there are many instances where important information is communicated through images or tables in papers (1). In this talk, I will present the latest developments in two software tools developed at the Center for Nanoscale Materials (CNM): i) EXSCLAIM! for data mining from scientific literature (2), and ii) Plot2Spectra for image segmentation related to spectral images, with the aim of creating metadata (3). EXSCLAIM! has been enhanced with Large Language Models (LLMs), i.e., ChatGPT and appropriate prompt engineering, to extract image-text pairs from scientific journals, which can be foundational for creating multimodal models and advancing semantic searches. In this presentation, I will demonstrate various applications of the extracted multimodal datasets in building knowledge graphs, conducting semantic searches, and performing topic modelling. Additionally, I will illustrate how to utilize the image segmentation workflow in Plot2Spectra to extract additional metadata and create datasets suitable for machine learning (ML) and high-throughput experimentation.

    • (1) Olivetti, E. A.; Cole, J. M.; Kim, E.; Kononova, O.; Ceder, G.; Han, T. Y.-J.; Hiszpanski, A. M. Data-Driven Materials Research Enabled by Natural Language Processing and Information Extraction. Appl Phys Rev 2020, 7 (4), 041317. https://doi.org/10.1063/5.0021106.
    • (2) Schwenker, E.; Jiang, W.; Spreadbury, T.; Ferrier, N.; Cossairt, O.; Chan, M. K. Y. EXSCLAIM!-An Automated Pipeline for the Construction of Labeled Materials Imaging Datasets from Literature. Patterns 2023. https://arxiv.org/abs/2103.10631.
    • (3) Jiang, W.; Li, K.; Spreadbury, T.; Schwenker, E.; Cossairt, O.; Chan, M. K. Y. Plot2Spectra: An Automatic Spectra Extraction Tool. Digital Discovery 2022, 1 (5), 719–731. https://doi.org/10.1039/d1dd00036e.

    Bio: Aikaterini Vriza is a postdoctoral appointee at the Center for Nanoscale Materials at Argonne National Laboratory. She obtained her PhD from the Materials Innovation Factory at the University of Liverpool in 2022 and a Master's in Green Chemistry and Sustainable Industrial Technology from the University of York. Prior to that she was an aviation engineer in the Hellenic Air Force. Her research expertise lies at the intersection of AI/ML, ‘green’ chemistry, and laboratory automation, and she has worked on several related projects in both industrial and academic settings.

Registration

Registration is free and everybody is welcome. CLOSED for 2023!

https://events.cels.anl.gov/event/413/

Contact us

Website : ai4chemmat.github.io

Github : Argonne-lcf/ai4chemmat

Email : cps.chemmat@anl.gov

Code of conduct

All attendees, speakers, and sponsors at the AI4ChemMat Hands-On Series and any associated events are required to agree to the following code of conduct.

Organizers will enforce this code throughout the event. We expect cooperation from all participants to help ensure a safe environment for everybody: Be excellent to each other, show empathy, and help make this a safe space to explore tangible, equitable solutions.

The AI4ChemMat Hands-On Series is dedicated to providing a harassment-free experience for everyone, regardless of gender, age, sexual orientation, disability, physical appearance, body size, race, ethnicity, experience, or religion (or lack thereof). We do not tolerate harassment of meeting participants in any form. Harassment includes, but is not limited to, any inappropriate actions or statements based on individual characteristics such as age, race, ethnicity, sexual orientation, gender identity, gender expression, marital status, nationality, political affiliation, ability status, educational background, or any other characteristic protected by law. Participants asked to stop any harassing behavior are expected to comply immediately. If a participant engages in harassing behavior, the workshop organizers may take any action they deem appropriate, including warning the offender, expelling the offender from the workshop, and notifying employers, academic institutions, or authorities. If you are being harassed, notice that someone else is being harassed, or have any other concerns, please contact a member of the Organizing Committee immediately and/or write to cps.chemmat@anl.gov with Subject: BEHAVIOR COMPLAINT. Keep any evidence that might be useful; if this is not possible, do not worry, and do not hesitate to alert the organizers.