The Problem
Financial institutions take on substantial risk when issuing loans. An inaccurate assessment of a customer's creditworthiness leads to direct capital loss through defaults. Conversely, overly strict policies reject safe borrowers and forfeit potential interest revenue.
Aprovify seeks to bridge this gap using Machine Learning to identify the subtle, non-linear relationships in a user's financial profile that traditional scoring mechanisms miss.
Business Objective
- Minimize False Positives (approving a risky borrower).
- Maximize True Positives (approving safe borrowers).
- Automate and speed up the initial screening process.
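To make the first two objectives concrete, the sketch below computes the false positive rate and true positive rate from a handful of made-up labels. It assumes the positive class (`1`) means "approve / safe borrower" and `0` means "reject / risky borrower"; the helper name `screening_rates` and the sample labels are hypothetical and are not taken from the project.

```python
def screening_rates(y_true, y_pred):
    """Return (false_positive_rate, true_positive_rate) for binary labels.

    Assumed convention: 1 = approve (safe borrower), 0 = reject (risky borrower).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # safe borrower approved
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # risky borrower approved
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # safe borrower rejected
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # risky borrower rejected

    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    return fpr, tpr


# Illustrative labels only; not real project data.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]

fpr, tpr = screening_rates(y_true, y_pred)
print(f"False positive rate: {fpr:.2f}, true positive rate: {tpr:.2f}")
```

Lowering the false positive rate usually costs some true positives, so in practice the two objectives are balanced by tuning the decision threshold.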
The Machine Learning Approach
1. Data Preprocessing & Leakage Prevention
Preprocessing is wrapped in a scikit-learn `Pipeline` combined with a `ColumnTransformer`. Numerical features are scaled with `StandardScaler`, and categorical features are encoded with `OrdinalEncoder`. Because these steps are encapsulated in the pipeline, fitted parameters (such as the per-feature mean and variance) are derived strictly from the training data, which prevents test-set statistics from leaking into training.
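A minimal sketch of such a pipeline is shown here, under stated assumptions: the column names, the synthetic data, the label rule, and the choice to place the classifier inside the same `Pipeline` are all illustrative and may differ from the project's actual code.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder, StandardScaler

# Synthetic stand-in data; column names are hypothetical.
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "cibil_score": rng.integers(300, 900, size=n),
    "loan_amount": rng.uniform(1e5, 5e6, size=n),
    "loan_term": rng.integers(2, 21, size=n),
    "education": rng.choice(["Graduate", "Not Graduate"], size=n),
    "self_employed": rng.choice(["Yes", "No"], size=n),
})
y = (X["cibil_score"] > 550).astype(int)  # toy label rule, for illustration only

numeric_cols = ["cibil_score", "loan_amount", "loan_term"]
categorical_cols = ["education", "self_employed"]

preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric_cols),
    # Unseen categories at prediction time are mapped to -1 instead of raising.
    ("cat", OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1),
     categorical_cols),
])

model = Pipeline([
    ("preprocess", preprocess),
    ("clf", RandomForestClassifier(n_estimators=100, random_state=0)),
])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# fit() learns the scaler and encoder parameters from X_train only.
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)

# The fitted scaler's means match the training split, not the full dataset.
scaler = model.named_steps["preprocess"].named_transformers_["num"]
print("Scaler means:", scaler.mean_)
print("Held-out accuracy:", accuracy)
```

Keeping the estimator inside the pipeline also means that cross-validation refits the preprocessing on each training fold, so the same leakage guarantee carries over to model selection.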
2. Modeling Strategy
We benchmarked Logistic Regression, KNN, Naive Bayes, Decision Trees, and SVM against a Random Forest Classifier. The Random Forest was chosen because bagging mitigates overfitting and its predictions are relatively robust to correlated features. It was validated using Stratified 5-Fold Cross-Validation, so that each fold preserves the overall class proportions.
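The comparison could be set up roughly as follows. This is a sketch, not the project's benchmark: the data comes from `make_classification`, all hyperparameters are defaults or arbitrary picks, and accuracy is an assumed metric because the scoring function used in the original comparison is not stated.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic, mildly imbalanced data standing in for the loan dataset.
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=4,
    weights=[0.4, 0.6], random_state=0,
)

# Scale-sensitive models get a scaler inside their own pipeline.
candidates = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Each test fold keeps roughly the same positive-class share as the full data.
fold_pos_rates = [y[test_idx].mean() for _, test_idx in cv.split(X, y)]

results = {
    name: cross_val_score(estimator, X, y, cv=cv, scoring="accuracy")
    for name, estimator in candidates.items()
}

for name, scores in results.items():
    print(f"{name:>20}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Given the objective of minimizing false positives, swapping `scoring="accuracy"` for `scoring="precision"` may be a closer match to the business goal.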
3. Feature Importance
The model ranked features such as CIBIL Score, Loan Term, and Loan Amount as the most important predictors of default risk.
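One common way to obtain such a ranking from a Random Forest is its impurity-based `feature_importances_` attribute, sketched below. Whether the project used this attribute or another method is an assumption. The feature names and data are synthetic, and the label is deliberately driven by `cibil_score` alone so that the ranking is predictable.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic data; feature names are hypothetical stand-ins.
rng = np.random.default_rng(0)
n = 500
feature_names = ["cibil_score", "loan_term", "loan_amount", "noise"]
X = np.column_stack([
    rng.integers(300, 900, size=n),
    rng.integers(2, 21, size=n),
    rng.uniform(1e5, 5e6, size=n),
    rng.normal(size=n),
])
y = (X[:, 0] > 550).astype(int)  # toy label that depends only on cibil_score

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Impurity-based importances are non-negative and sum to 1 across features.
ranked = sorted(
    zip(feature_names, forest.feature_importances_),
    key=lambda item: item[1],
    reverse=True,
)

for name, importance in ranked:
    print(f"{name:>12}: {importance:.3f}")
```

Impurity-based importances can be biased toward high-cardinality features and are split among correlated ones, so `sklearn.inspection.permutation_importance` on held-out data is a useful cross-check.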