A Data-Mining Model for Predicting Low Birth Weight with a High AUC

Chapter in book

Authors/Editors

Selvaraj, Rajalakshmi

Publication Details

Author list: Hange U, Selvaraj R, Galani M, Letsholo K

Publisher: Springer Verlag (Germany)

Place: BERLIN

Publication year: 2018

Journal: Studies in Computational Intelligence (1860-949X)

Journal acronym: STUD COMPUT INTELL

Volume number: 719

Start page: 109

End page: 121

Number of pages: 13

ISBN: 978-3-319-60169-4

eISBN: 978-3-319-60170-0

ISSN: 1860-949X

Languages: English-Great Britain (EN-GB)

Abstract

Birth weight is a significant determinant of a newborn's probability of survival. Data-mining models are receiving considerable attention for identifying low birth weight risk factors. However, prediction of actual birth weight values based on the identified risk factors, which can play a significant role in identifying mothers at risk of delivering low birth weight infants, remains unsolved. This paper presents a study of data-mining models that predict the actual birth weight, with particular emphasis on achieving a higher area under the receiver operating characteristic curve (AUC). The prediction is based on 2006 birth data from the North Carolina State Center for Health Statistics. The steps followed to extract meaningful patterns from the data were data selection, handling missing values, handling imbalanced data, model building, feature selection, and model evaluation. Decision trees were used for classifying birth weight and were tested on both the original imbalanced dataset and a dataset balanced with the synthetic minority oversampling technique (SMOTE). The results showed that models built with datasets balanced by the SMOTE algorithm produce a higher AUC than models built with imbalanced datasets. The J48 model built with balanced data outperformed REPTree and Random Tree with an AUC of 90.3%, and thus it was selected as the best model. In conclusion, the feasibility of using J48 in birth weight prediction offers the possibility of reducing obstetric-related complications and thus improving overall obstetric health care.
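
The two ideas the abstract relies on, SMOTE balancing and AUC evaluation, can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the function names `smote` and `auc`, the toy feature rows, and the parameter `k` are assumptions made for illustration only. SMOTE is shown in its basic form, where each synthetic minority row is placed at a random point on the segment between a minority row and one of its nearest minority neighbours. AUC is computed as the fraction of positive/negative pairs in which the positive case receives the higher score.

```python
import random

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority rows by interpolating toward near neighbours.

    Assumes `minority` holds at least two numeric rows of equal length.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base` by squared Euclidean distance
        neighbours = sorted(
            (row for row in minority if row is not base),
            key=lambda row: sum((a - b) ** 2 for a, b in zip(base, row)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic

def auc(scores, labels):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy rows [gestation_weeks, maternal_age]; not the study's data.
low_bw = [[34.0, 19.0], [35.0, 22.0], [33.0, 30.0], [36.0, 41.0]]
normal_bw = [[39.0, 25.0], [40.0, 28.0], [38.0, 31.0], [41.0, 27.0],
             [39.0, 35.0], [40.0, 22.0], [38.0, 29.0], [39.0, 33.0]]

# Oversample the minority class until both classes have the same size.
balanced_low = low_bw + smote(low_bw, n_new=len(normal_bw) - len(low_bw))
```

In a workflow like the one described, the balanced rows would then be passed to a decision tree learner, and `auc` would be applied to the scores the tree assigns to held-out cases.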

Keywords

Birth weight, Data-mining, Imbalanced dataset, Low birth weight, SMOTE
