Predicting Household Income Using Cultural Infrastructure and Demographic Features

Understanding and predicting household income distribution is fundamental to urban planning, policy development, and socioeconomic research. We present a machine learning approach that predicts household income by combining traditional demographic indicators with cultural infrastructure metrics.

By adding cultural infrastructure metrics to traditional demographic data, this work offers insight into the relationship between cultural access and economic wellbeing, and into the socioeconomic disparities that planners and policymakers need to address.

Dataset Preparation Methodology

Our research leverages a comprehensive dataset combining cultural infrastructure, geographic coordinates, and demographic variables to predict household income across New York City. Below we outline our data collection and processing methodology.

Geographic Grid Creation

We sampled the city on a 500 x 500 grid (250,000 grid points) spanning the geographic bounds of NYC. This high-resolution grid enables detailed spatial analysis and captures local variations in both cultural and demographic characteristics.
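
As a rough illustration of the grid construction, the sketch below lays out evenly spaced latitude/longitude points over a bounding box. The bounding-box values are approximate and assumed for illustration, and the function name `make_grid` is hypothetical; neither is taken from the study.

```python
from typing import List, Tuple


def make_grid(lat_min: float, lat_max: float,
              lon_min: float, lon_max: float,
              n: int = 500) -> List[Tuple[float, float]]:
    """Return n * n evenly spaced (lat, lon) points covering the bounding box."""
    lat_step = (lat_max - lat_min) / (n - 1)
    lon_step = (lon_max - lon_min) / (n - 1)
    return [
        (lat_min + i * lat_step, lon_min + j * lon_step)
        for i in range(n)
        for j in range(n)
    ]


# Approximate NYC bounding box (assumed values, not from the study):
# (lat_min, lat_max, lon_min, lon_max)
NYC_BOUNDS = (40.4774, 40.9176, -74.2591, -73.7004)

grid = make_grid(*NYC_BOUNDS, n=500)
print(len(grid))
```

A regular grid over a bounding box will include points over water or outside the city limits; how such points were handled is not described here.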

Cultural Score Computation

We quantify cultural accessibility with an exponential decay function: each cultural site contributes to a grid point's score with a weight that diminishes with the site's distance from that grid point.
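
The exact formula is not reproduced on this page, so the following is only a minimal sketch of one plausible form: the score at a grid point is the sum over sites of weight * exp(-decay * distance). The summation, the great-circle (haversine) distance, the `decay_per_km` parameter, and the example sites are all assumptions made for illustration.

```python
import math
from typing import Iterable, Tuple


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def cultural_score(point: Tuple[float, float],
                   sites: Iterable[Tuple[float, float, float]],
                   decay_per_km: float = 1.0) -> float:
    """Sum of site weights, each damped by exp(-decay_per_km * distance_km).

    `sites` holds (lat, lon, weight) triples. The decay rate is a
    hypothetical tuning parameter, not a value from the study.
    """
    lat, lon = point
    return sum(
        weight * math.exp(-decay_per_km * haversine_km(lat, lon, s_lat, s_lon))
        for s_lat, s_lon, weight in sites
    )


# Hypothetical sites: (lat, lon, weight)
example_sites = [(40.7794, -73.9632, 2.0), (40.7614, -73.9776, 1.0)]
near = cultural_score((40.7794, -73.9632), example_sites)
far = cultural_score((40.6000, -74.1000), example_sites)
print(near > far)
```

With this form, a site located exactly at the grid point contributes its full weight, and the decay rate controls how quickly influence falls off with distance.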

Demographic Integration

Demographic features were integrated using U.S. Census data, including population density, racial composition statistics, and socioeconomic indicators. Spatial joins matched grid points with census polygons.
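
To make the spatial join concrete, here is a small standard-library sketch of matching points to polygons with a ray-casting point-in-polygon test. It is a stand-in for the join a GIS library would perform, not the pipeline used in this work; the function names and the toy square "tracts" are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Polygon = Sequence[Point]


def point_in_polygon(x: float, y: float, polygon: Polygon) -> bool:
    """Ray-casting test: count polygon edges crossed by a ray going right."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def spatial_join(points: List[Point],
                 tracts: List[Tuple[Polygon, Dict]]) -> List[Optional[Dict]]:
    """Attach the attributes of the first containing polygon to each point."""
    joined = []
    for x, y in points:
        match = next(
            (attrs for poly, attrs in tracts if point_in_polygon(x, y, poly)),
            None,  # point falls outside every polygon
        )
        joined.append(match)
    return joined


# Toy tracts with made-up attributes, for illustration only.
tracts = [
    ([(0, 0), (1, 0), (1, 1), (0, 1)], {"tract": "A", "pop_density": 100}),
    ([(2, 0), (3, 0), (3, 1), (2, 1)], {"tract": "B", "pop_density": 250}),
]
result = spatial_join([(0.5, 0.5), (2.5, 0.5), (5.0, 5.0)], tracts)
print(result)
```

Points that fall outside every polygon get `None` here; a real pipeline would need an explicit policy for such points (drop, nearest-polygon fallback, and so on).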

Feature Correlation Matrix

Feature correlation matrix highlighting relationships between variables in the dataset.

Final Dataset Description

Research Findings

The Random Forest model performed best of the five models evaluated, reaching an R² of 0.9127 against 0.7611 for the next-best model (FCNN). This result is consistent with the model's ability to capture complex, non-linear relationships between features.

0.9127

R² Score

Our Random Forest model accounts for over 91% of the variance in household income.

$11,566

RMSE

The Root Mean Squared Error of the Random Forest model, the lowest of the five models compared.

$7,771

MAE

The Mean Absolute Error of the Random Forest model: on average, predictions differ from actual household income by about $7,771.
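
For reference, the three metrics above follow their standard definitions, sketched below in plain Python. The income values in the example are made up for illustration and are not from the study's data.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float],
                       y_pred: Sequence[float]) -> Dict[str, float]:
    """Compute R², RMSE, and MAE from actual and predicted values."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
    }


# Made-up incomes (in dollars) for illustration only.
actual = [50_000, 60_000, 70_000, 80_000]
predicted = [52_000, 58_000, 71_000, 79_000]
metrics = regression_metrics(actual, predicted)
print(metrics)
```

RMSE penalizes large errors more heavily than MAE, which is why the RMSE reported above ($11,566) is larger than the MAE ($7,771).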

Model Performance Comparison

Model              R² Score  RMSE ($)   MAE ($)    Training Time (s)  Parameter Count
Linear Regression  0.4083    30,118.41  22,650.40  <0.01              17
Random Forest      0.9127    11,566.72   7,771.45  6.76               4,800
Gradient Boosting  0.6702    22,484.62  16,798.52  <0.01              5,040
XGBoost            0.7517    19,511.74  14,481.06  0.09               10,800
FCNN               0.7611    19,136.01  13,580.19  32.86              3,331

SHAP Summary Plot

SHAP summary plot showing feature importance for the Random Forest model.

Actual vs Predicted Values

Actual vs predicted household incomes for the Random Forest model.

Key Findings from SHAP Analysis

Model Complexity Chart

R² score vs number of parameters for all models evaluated in this study (log scale).

About Us

Our interdisciplinary research team combines expertise in urban planning, machine learning, and data science to understand the complex relationships between cultural infrastructure and socioeconomic factors.

Minhazul Islam

PhD Candidate
Civil and Environmental Engineering
Arizona State University

mislam23@asu.edu

Ripon Saha

PhD Student
School of Computing and Augmented Intelligence
Arizona State University

rsaha8@asu.edu

Protik Bose Pranto

PhD Student
School of Computing and Augmented Intelligence
Arizona State University

ppranto@asu.edu