This project conducts an end-to-end data analysis with two goals: to develop a model that classifies children by immunization status, and to identify the features most strongly associated with that status. The analysis uses a dataset derived from raw data collected in Ethiopia in 2022 as part of a baseline survey designed to assess childhood immunization coverage and its associated factors among selected host and refugee populations in the Gambella Region, one of the country's remote regions. The survey followed the World Health Organization's (WHO's) Vaccination Coverage Cluster Surveys Reference Manual. Data were collected from a sample of 3,200 children aged 12–23 months and their mothers or caretakers. The present analysis uses a subset of 84 variables from the dataset, capturing information on demographic characteristics, socioeconomic status, health-care access, and knowledge, attitudes, and practices (KAP) related to childhood immunization.
The analysis will be presented in two parts:
- Part 1 covers data cleaning and exploration to understand the nature of the data, using descriptive statistics to summarize and present it.
- Part 2 applies three machine learning methods (logistic regression, random forest, and an artificial neural network) to answer the research questions.
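
As a rough illustration of how the Part 2 comparison might be set up, the sketch below fits the three model types with scikit-learn on synthetic data. The library choice, the synthetic dataset, the hyperparameters, and the use of random-forest impurity importances to rank features are all assumptions made for illustration; none of them is taken from the project itself.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the survey data (NOT the real dataset):
# 600 rows, 10 numeric features, binary target playing the role of
# "immunization status".
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# The three model families named in Part 2. Hyperparameters are
# illustrative assumptions, not tuned values.
models = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "neural_network": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
    ),
}

# Fit each model and record its accuracy on the held-out split.
accuracy = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracy[name] = model.score(X_test, y_test)

# One possible way to rank features: impurity-based importances from
# the random forest, sorted from most to least important.
importances = models["random_forest"].feature_importances_
ranking = np.argsort(importances)[::-1]

for name, acc in accuracy.items():
    print(f"{name}: test accuracy = {acc:.3f}")
print("Feature indices ranked by importance:", ranking.tolist())
```

With real survey data, categorical variables would first need encoding, and accuracy alone may be a poor yardstick if immunization status is imbalanced; both points are outside what this sketch covers.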