The Problem

Financial institutions take on substantial risk when issuing loans. An inaccurate assessment of a customer's creditworthiness leads to direct capital loss through defaults. Conversely, overly strict policies reject safe borrowers and forfeit potential interest revenue.

Aprovify seeks to bridge this gap using Machine Learning to identify the subtle, non-linear relationships in a user's financial profile that traditional scoring mechanisms miss.

Business Objective

Minimize False Positives (approving risky borrowers).
Maximize True Positives (Approving safe borrowers).
Automate and speed up the initial screening process.
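
The first two objectives can be read as metrics on a confusion matrix in which "approve" is the positive class: fewer false positives means higher precision, and more true positives means higher recall. The sketch below is a minimal, dependency-free illustration; the function name `approval_metrics` and the sample labels are hypothetical and not taken from the project.

```python
def approval_metrics(y_true, y_pred):
    """Return (precision, recall), treating 1 = approve/safe as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # safe borrower approved
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # risky borrower approved
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # safe borrower rejected
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels: 1 = safe borrower / approved, 0 = risky borrower / rejected.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 1, 0, 1, 0, 1]
precision, recall = approval_metrics(y_true, y_pred)
```

Precision penalizes approving risky borrowers, while recall penalizes rejecting safe ones, so the two objectives pull in opposite directions and usually have to be traded off.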

The Machine Learning Approach

1. Data Preprocessing & Leakage Prevention

A Scikit-Learn `Pipeline` combined with a `ColumnTransformer` handles all preprocessing. Numerical features are scaled with `StandardScaler`, and categorical features are encoded with `OrdinalEncoder`. Because these steps are encapsulated in the pipeline, fitted parameters (such as the scaler's mean and variance) are derived only from the training data, which prevents leakage from held-out data.
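
The preprocessing described above can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: the column names, the synthetic data, and the `OrdinalEncoder` settings are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder, StandardScaler

# Hypothetical toy data; the real dataset and its column names may differ.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "cibil_score": rng.integers(300, 900, size=n),
    "loan_amount": rng.integers(100_000, 5_000_000, size=n),
    "loan_term": rng.integers(2, 20, size=n),
    "education": rng.choice(["Graduate", "Not Graduate"], size=n),
    "self_employed": rng.choice(["Yes", "No"], size=n),
})
y = (df["cibil_score"] > 550).astype(int)  # synthetic label: 1 = approve

numeric_cols = ["cibil_score", "loan_amount", "loan_term"]
categorical_cols = ["education", "self_employed"]

# Scale numeric columns and ordinal-encode categorical ones.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric_cols),
    ("cat", OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1),
     categorical_cols),
])

model = Pipeline([
    ("preprocess", preprocess),
    ("clf", RandomForestClassifier(n_estimators=100, random_state=0)),
])

X_train, X_test, y_train, y_test = train_test_split(
    df, y, test_size=0.25, stratify=y, random_state=0
)

# fit() learns the scaler statistics and encoder categories from X_train only;
# score() reuses those fitted parameters when transforming X_test.
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
```

Passing the whole pipeline to a cross-validation helper gives the same guarantee per fold, since the transformers are refit on each training split.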

2. Modeling Strategy

We benchmarked Logistic Regression, KNN, Naive Bayes, Decision Trees, and SVM against a Random Forest Classifier. The Random Forest was chosen because bagging reduces its tendency to overfit and because it is comparatively robust to correlated features. It was validated using Stratified 5-Fold Cross-Validation, which preserves the class proportions of the full dataset in every fold.
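
A benchmark of this kind might look like the sketch below. The synthetic dataset, the hyperparameters, and the choice of precision as the scoring metric are assumptions made for illustration; the source does not state which metric or settings were used.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the loan data (assumption).
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=4,
    weights=[0.4, 0.6], random_state=0,
)

# Scale-sensitive models get a scaler inside their own pipeline so that
# scaling is refit on the training part of every fold.
candidates = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "naive_bayes": GaussianNB(),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "svm": make_pipeline(StandardScaler(), SVC()),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Stratification keeps the approve/reject ratio roughly equal across folds.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

mean_scores = {
    name: cross_val_score(estimator, X, y, cv=cv, scoring="precision").mean()
    for name, estimator in candidates.items()
}

for name, score in sorted(mean_scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

Precision is used here because it drops whenever a risky borrower is approved, which matches the false-positive objective; another metric can be swapped in through the `scoring` argument.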

3. Feature Importance

The model ranked CIBIL Score, Loan Term, and Loan Amount as the most important features for predicting default risk.
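
One common way to obtain such a ranking from a Random Forest is its impurity-based `feature_importances_` attribute. The sketch below uses synthetic data and hypothetical feature names, so it shows the mechanics only and does not reproduce the project's results.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic data (assumption): the label depends mostly on the first column.
rng = np.random.default_rng(0)
feature_names = ["cibil_score", "loan_term", "loan_amount", "noise"]
X = rng.normal(size=(500, 4))
y = (2.0 * X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Impurity-based importances are non-negative and sum to 1 across features.
ranked = sorted(
    zip(feature_names, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)

for name, importance in ranked:
    print(f"{name}: {importance:.3f}")
```

Impurity-based importances can favor high-cardinality features, so `sklearn.inspection.permutation_importance` on held-out data is a useful cross-check.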