Random Forest Capital: Re-Pricing Each Loan with Machine Learning 2018

Date: 2018-06-20 15:19:37

Random Forest Capital, founded in 2016, is a San Francisco-based investment management company built on cross-platform machine learning and data engineering. In January 2017, Random Forest Capital secured $1.75 million from angel investors.

Background

With the rapid development of online lending in recent years, many institutional investors have begun to seek investment opportunities there, and the demand for high-yield bond investments has increased. The core competitiveness of an investor lies in underwriting and risk pricing. Yet traditional personal credit rating methods, including FICO, are considered unreliable by many lending institutions. With the spread of the Internet and smartphones in the last two years, a great deal of new potential credit information has emerged that can be obtained through automated technology. FICO's variables also have limitations and do not take macro factors into account. For example, some people have high credit scores but short credit histories of only 2 to 5 years, so they have not yet had enough time to default. Moreover, some people's credit histories are not consistent enough, yet they still obtain high credit scores and may default.

These problems of credit information, underwriting and risk pricing are the fundamental issues that Random Forest Capital wants to solve. According to Random Forest, the existing underwriting methods are expensive, inefficient and inaccurate, which makes it difficult to assess credit risk accurately. To tackle this challenge, Random Forest uses cross-platform machine learning algorithms to price bonds, which the company believes greatly improves assessment accuracy and efficiency while resolving conflicts of interest between investors and borrowers. With growing demand for high-yield non-equity investments from investors in the insurance industry, a fair and accurate debt risk pricing system has considerable room for growth.

Business Model

Unlike stock-based quantitative funds, Random Forest invests in online lending products. The company focuses on three types of online lending (P2P) products: unsecured consumer debt, secured home debt, and secured commercial debt. Random Forest uses machine learning and other algorithms to invest funds in online lending products across different platforms. Currently, Random Forest only accepts institutional investment.

According to the platform’s statistics, the default rate can be reduced by 50% and the average return can be raised by 4% to 6% after the loans are screened by the platform. Packaged loan products also have low volatility and low correlation with the market index. The platform's alpha does not come from the riskiest loan products, but from loans rated C under the Lending Club and Prosper risk ratings.

Creativity

Innovative data sources: Random Forest gets its data from public data platforms, other platforms and purchased data. Random Forest is creative in the use of unstructured, open data including health conditions, house prices, and average income levels, which can be obtained from public data platforms.

Innovation in data characteristics: The data used by Random Forest includes general data and community data, such as health data and crime data. Random Forest estimates a user’s default rate by analyzing the various attributes surrounding applicants. Because American cities are sharply divided by neighborhood, average user profiles built from geographic location have better predictive power. For example, the platform found that residents of one community had a low crime rate, a low cancer diagnosis rate and an average default rate 40% lower than that of the surrounding area. In another example, the default rate on non-home-decoration loans for employees of one particular company was 50% lower than the average rate.
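The location-based enrichment described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Random Forest's actual pipeline: the field names (`zip_code`, `crime_rate`, `median_income`), the sample values, and the helper `attach_community_features` are all hypothetical.

```python
# Hypothetical community-level statistics keyed by zip code (made-up values).
COMMUNITY_STATS = {
    "94103": {"crime_rate": 0.031, "median_income": 82000},
    "10001": {"crime_rate": 0.024, "median_income": 91000},
}

def attach_community_features(loan, stats=COMMUNITY_STATS):
    """Return a copy of the loan record enriched with area-level features.

    Loans whose zip code has no entry get None for each community feature,
    so a downstream model can treat them as missing rather than as zero.
    """
    area = stats.get(loan["zip_code"], {})
    enriched = dict(loan)  # copy, so the original record is left untouched
    for name in ("crime_rate", "median_income"):
        enriched[name] = area.get(name)
    return enriched

# Hypothetical loan listings, as they might arrive from a lending platform.
loans = [
    {"loan_id": "A1", "zip_code": "94103", "amount": 12000},
    {"loan_id": "B2", "zip_code": "99999", "amount": 5000},
]
enriched_loans = [attach_community_features(loan) for loan in loans]
```

Keying the lookup on zip code means only aggregate, area-level statistics are attached to each loan, which matches the kind of public community data the article describes.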

Below is our interview with Kevin Farrelly, Co-founder and COO of Random Forest Capital:

Q&A

Q: What are the community data variables in risk assessment?

A: The community data variables include community poverty rate, income level, health data, ethnicity data, sex ratio, etc. We can learn how these are distributed in a community and then consider the impact of macroeconomics on it. For example, if the oil price falls, it will affect workers whose jobs are related to oil.

Q: Where does the data come from?

A: It comes from public platforms. As data scientists, we use crawlers to get data. Buying data is important of course, but it can’t become the core competency.

Q: Isn’t health data private?

A: At the individual level, it is private, but in the United States, many public platforms provide statistical data. You can focus on the community (such as its HIV rate) and locate it by zip code. Another point is that our role is different. We are investors, not lenders. The lenders want to issue loans and sometimes they can’t price them reasonably. The lenders need to follow fair lending rules, while we work at the level of individual loans and re-price them.

Q: What kind of personal data do you get?

A: We mainly get consumption data. For example, we can get the gym you often go to, your license plate number, etc. We use zip code, business, industry and education information to analyze personal facts and characteristics of the community.

Q: Is it right that you only need to consider the first error type while modeling, and not the second? (In other words, the company considers it important to avoid misjudging bad users as good users, rather than worrying about misjudging good users as bad users, which reduces the difficulty of modeling).

A: Exactly.
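One way to make this asymmetry concrete is to pick an acceptance cutoff with unequal error costs. The sketch below is an assumption-laden illustration, not the company's method: the function `pick_threshold`, the cost weights, and the sample scores are hypothetical, and it approximates "ignoring" the second error type by giving it a much smaller weight.

```python
def pick_threshold(scores, defaulted, cost_bad_accepted=10.0, cost_good_rejected=1.0):
    """Choose the default-probability cutoff that minimizes total cost.

    A loan is bought when its score is strictly below the cutoff. Buying a
    loan that later defaults is penalized more heavily than passing on a
    loan that would have performed.
    """
    # Every observed score is a candidate; infinity means "buy everything".
    candidates = sorted(set(scores)) + [float("inf")]
    best_cutoff, best_cost = None, float("inf")
    for cutoff in candidates:
        cost = 0.0
        for score, bad in zip(scores, defaulted):
            accepted = score < cutoff
            if accepted and bad:
                cost += cost_bad_accepted    # bad loan judged good
            elif not accepted and not bad:
                cost += cost_good_rejected   # good loan judged bad
        if cost < best_cost:                 # ties keep the lower cutoff
            best_cutoff, best_cost = cutoff, cost
    return best_cutoff, best_cost

# Hypothetical predicted default probabilities and observed outcomes.
scores = [0.02, 0.05, 0.10, 0.20, 0.40]
defaulted = [False, False, True, False, True]
cutoff, cost = pick_threshold(scores, defaulted)
```

With these made-up inputs, the cheapest policy buys only the two lowest-risk loans and gives up one good loan to avoid both defaults, which is the trade-off an investor (as opposed to a lender) can afford to make.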

Q: Can’t customers purchase your loans as a package?

A: No, we need to choose from different platforms. In the future, all investments will be data driven and automated. For example, our analysis shows that 0.5% of all loans on the Lending Club platform are worth buying, and customers can get a 10%-12% return from them. For platforms that do not have APIs, we simulate registration and then use crawlers to collect data after logging in.

Q: Currently, how many institutions have cooperated with Random Forest?

A: Random Forest has cooperated with six online platforms and four offline platforms (with mortgage loans), and the number of limited partners has grown tenfold in the past six months. Random Forest is now in contact with many top agencies, and that number may grow another tenfold in the future.

Q: Do you think that the FICO credit assessment method will change in the future because people are starting to generate more data?

A: Yes, there are more ways to use data now. In fact, many platforms are already using their own data and algorithms. For example, Zillow and FICO have started to use other types of data. Traditional scoring methods are also problematic, because they focus only on users’ credit history instead of their accounts and assets. For instance, many people will choose to keep their cars rather than their houses. A car is how a person gets to work, so even if a borrower's credit score is 500 points, that borrower may perform better than people with high scores. FICO does not pay attention to the purpose of users’ applications or to their assets.
