e-space
Manchester Metropolitan University's Research Repository

    Dealing with the big data challenges in AI for thermoelectric materials

    Jia, Xue, Aziz, Aziz (ORCID: https://orcid.org/0000-0002-6723-9871), Hashimoto, Yusuke and Li, Hao (2024) Dealing with the big data challenges in AI for thermoelectric materials. Science China Materials, 67 (4). pp. 1173-1182. ISSN 2095-8226

    Published Version
    Available under License In Copyright.

    Abstract

    The development of artificial intelligence (AI), particularly data science and machine learning (ML), is revolutionizing the field of materials science. Yet some key challenges remain, including errors contained in large-scale material datasets and the overfitting of predicted temperature-dependent properties. In this work, using thermoelectric (TE) materials as an archetypal example, we first performed a series of rational actions to identify and discard questionable data, and obtained 92,291 data points covering 7,295 compositions at different temperatures from the Starrydata2 database. Next, we proposed a composition-based cross-validation method, which emphasizes that data points with the same composition but different temperatures should not be split into different sets, in order to avoid overfitting. Then, we built ML models using the gradient boosting decision tree (GBDT) method, and achieved remarkable R² values of ∼0.89, ∼0.90, and ∼0.89 on the training dataset, the test dataset, and new out-of-sample experimental data published in 2023, respectively, verifying the model’s high accuracy in predicting newly available materials. Using this ML model, we carried out a large-scale evaluation of the stable materials from the Materials Project database, and Ge2Te5As2 and Ge3(Te3As)2 were predicted to exhibit high zT values. Density functional theory calculations were then performed, and the calculated maximum zT values were 1.98 and 2.12 for n- and p-type Ge2Te5As2, and 0.58 and 0.74 for n- and p-type Ge3(Te3As)2, respectively, indicating their potential as TE materials and supporting our ML model. This work presents an example of dealing with and overcoming big data challenges in AI for materials science.
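
The composition-based cross-validation idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `composition_kfold`, the toy rows, and the fold-assignment scheme are all hypothetical and chosen only to show the grouping constraint, namely that every data point sharing a composition lands on the same side of a split regardless of temperature.

```python
from collections import defaultdict
import random


def composition_kfold(compositions, n_splits=5, seed=0):
    """Yield (train_idx, test_idx) pairs for a grouped k-fold split.

    Hypothetical helper: all rows that share a composition string are
    kept together, so no composition appears in both train and test.
    """
    # Map each composition to the row indices where it occurs.
    groups = defaultdict(list)
    for idx, comp in enumerate(compositions):
        groups[comp].append(idx)

    unique = sorted(groups)
    if n_splits < 2 or n_splits > len(unique):
        raise ValueError("n_splits must be between 2 and the number of compositions")

    # Shuffle compositions (not rows) reproducibly, then deal them into folds.
    random.Random(seed).shuffle(unique)
    folds = [unique[i::n_splits] for i in range(n_splits)]

    for held_out in folds:
        held = set(held_out)
        test_idx = sorted(i for c in held_out for i in groups[c])
        train_idx = sorted(i for c in unique if c not in held for i in groups[c])
        yield train_idx, test_idx


# Hypothetical toy data: (composition, temperature in K), invented for illustration.
rows = [
    ("Bi2Te3", 300), ("Bi2Te3", 400),
    ("PbTe", 300), ("PbTe", 500),
    ("SnSe", 300), ("SnSe", 700),
]
compositions = [comp for comp, _ in rows]
splits = list(composition_kfold(compositions, n_splits=3))
```

A plain random split over rows could place Bi2Te3 at 300 K in the training set and Bi2Te3 at 400 K in the test set, which tends to inflate test scores because the model has effectively seen that material already. Splitting at the composition level avoids this leakage. For real workloads, scikit-learn's `GroupKFold` provides comparable grouped splitting when the composition is passed as the `groups` argument.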
