Loan origination systems with integrated scoring models help businesses improve underwriting decisions and provide faster customer service. In this post we will show how lenders can build more effective underwriting software by improving the accuracy of their predictive models, or scorecards.

Consider every stage of scorecard development

To prevent potential errors and ensure the best performance of a scoring model, we need to consider every stage of the scorecard development process.

Stages of scorecard development:

  1. Data sampling
  2. Processing, statistical evaluations of data characteristics
  3. Dividing data into training and validation datasets
  4. Scorecard development: performing regression procedure
  5. Scorecard validation
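Stage 3 of the list above, dividing the data into training and validation datasets, can be sketched in a few lines. This is an illustrative sketch, not the article's own tooling: the record fields, the 70/30 split ratio, and the fixed seed are all assumptions chosen for the example.

```python
import random

def train_validation_split(applications, validation_share=0.3, seed=42):
    """Randomly partition sampled applications into training and
    validation datasets (stage 3 of the workflow above)."""
    rng = random.Random(seed)       # fixed seed for reproducibility
    shuffled = applications[:]      # copy so the input order is untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - validation_share))
    return shuffled[:cut], shuffled[cut:]

# Toy records standing in for sampled loan applications.
apps = [{"id": i, "defaulted": i % 7 == 0} for i in range(100)]
train, valid = train_validation_split(apps)
print(len(train), len(valid))  # 70 30
```

A fixed seed keeps the split reproducible, so the regression procedure in stage 4 and the validation in stage 5 always see the same partition.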

To ensure that the scorecard can help us evaluate risks and build a comprehensive customer profile, we must focus on stages 1-3. It is far more effective to improve scorecard performance before its development is finished than to correct it afterwards.

To build the most robust loan application processing system, we need to focus on correctly preparing the application data and on processing the loan portfolio data.

The most common scorecard development errors

Errors during the data sampling stage

During data sampling, loan application data is sourced from the data warehouse.

Pay close attention to maintaining the two main requirements for the working sample: randomness and representativeness.

Randomness implies that every loan application has the same chance of entering the working dataset, and that records are added independently of one another.

To follow the rule of representativeness, the characteristics of the data in the working dataset must match, as closely as possible, the actual characteristics of the potential borrowers in the loan application portfolio. Since the scorecard is expected to provide insights about the whole population, this requirement is entirely sensible.
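One simple way to check representativeness is to compare the share of each category (region, product, age band, and so on) in the working sample against its share in the full portfolio. The sketch below is illustrative: the `region` field, the example counts, and the 5% tolerance are all assumptions, not values from the article.

```python
from collections import Counter

def share_by_category(records, key):
    """Relative frequency of each value of `key` in a list of records."""
    counts = Counter(r[key] for r in records)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

def representativeness_gaps(sample, population, key, tolerance=0.05):
    """Flag categories whose share in the working sample drifts from the
    portfolio by more than `tolerance` (an illustrative threshold)."""
    s = share_by_category(sample, key)
    p = share_by_category(population, key)
    return {k: round(s.get(k, 0.0) - p.get(k, 0.0), 3)
            for k in p if abs(s.get(k, 0.0) - p.get(k, 0.0)) > tolerance}

population = [{"region": "north"}] * 600 + [{"region": "south"}] * 400
sample = [{"region": "north"}] * 80 + [{"region": "south"}] * 20
print(representativeness_gaps(sample, population, "region"))
# {'north': 0.2, 'south': -0.2}
```

A non-empty result means the sample over- or under-represents some group, and the sampling procedure should be revisited before any model is fit.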

Performance and efficiency of the scoring model will be significantly impacted if requirements for representativeness and randomness are not met.

A working dataset can be formed randomly by sourcing data from the loan applications database within a certain time frame.
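Such time-framed random sampling might look like the following sketch. The `filed_on` field name, the seed, and the sample size are assumptions for illustration only.

```python
import random
from datetime import date

def sample_applications(applications, start, end, n, seed=7):
    """Draw a random working dataset of size `n` from applications
    filed inside the [start, end] time frame."""
    window = [a for a in applications if start <= a["filed_on"] <= end]
    rng = random.Random(seed)
    return rng.sample(window, min(n, len(window)))

# 240 toy applications spread evenly over the months of 2023.
apps = [{"id": i, "filed_on": date(2023, 1 + i % 12, 1)} for i in range(240)]
working = sample_applications(apps, date(2023, 3, 1), date(2023, 8, 31), n=50)
print(len(working))  # 50
```

Restricting the window first and then sampling uniformly keeps both requirements in view: the time frame controls which population is represented, and the uniform draw preserves randomness within it.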

This way, the time frame selected for the dataset acts as a decisive factor. Let us illustrate this by showing the mechanism of estimation along the time axis:


During the performance period, we source information on a given borrower's loan performance.

The observation point represents the moment of the forecast. Loan performance is evaluated and determined at the outcome point.

For instance: if the observation period lasts 12 months and the time horizon equals 6 months, then we can estimate the state of the borrower in 6 months by analyzing the performance of the loan case over 12 months.
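The date arithmetic behind this example is straightforward. The sketch below computes the start of a 12-month observation period and the outcome point 6 months ahead for an example observation point; the helper function and the anchor date are assumptions for illustration.

```python
from datetime import date

def add_months(d, months):
    """Shift a date by a whole number of months (day clamped to 28 so
    the result is valid in every month)."""
    total = d.year * 12 + (d.month - 1) + months
    return date(total // 12, total % 12 + 1, min(d.day, 28))

# Observation period of 12 months, outcome horizon of 6 months,
# anchored at an example observation point.
observation_point = date(2024, 1, 1)
observation_start = add_months(observation_point, -12)  # start of history
outcome_point = add_months(observation_point, 6)        # forecast target
print(observation_start, outcome_point)  # 2023-01-01 2024-07-01
```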

Please keep in mind that the observation period cannot be interrupted, and the historical data used has to be close to the data in the loan portfolio.

Interrupting the observation period will have a negative impact on scorecard performance.
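A quick continuity check catches interrupted observation periods before modeling begins. This is a minimal sketch under the assumption that the available history is tracked as a list of (year, month) pairs.

```python
def month_gaps(months):
    """Return the months missing between the first and last observed
    month; a non-empty result means the observation period is broken."""
    idx = sorted(y * 12 + (m - 1) for y, m in months)  # months as integers
    full = set(range(idx[0], idx[-1] + 1))
    return sorted((i // 12, i % 12 + 1) for i in full - set(idx))

observed = [(2023, m) for m in (1, 2, 3, 5, 6)]  # April is missing
print(month_gaps(observed))  # [(2023, 4)]
```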

Data samples covering at least the last three years are required to achieve the best forecasting results.

To prevent data sampling errors, take a twofold approach. First, take the procedures for sourcing application data under direct control. Second, statistically evaluate the characteristics of the potential borrowers.

Continue reading on improving scorecard accuracy in loan origination software…