This blog post on the decision tree algorithm is part of the series Understanding AI Algorithms. Decision trees are classification algorithms.
A decision tree is a way of classifying information by sorting data based on attributes. We can think of this sorting like a set of decisions a person makes.
For example, let’s say you want to purchase a new car. The dealer can help find you the right model by asking a series of questions. Are you single or married? Do you have kids? What is your age and salary? Do you drive mostly in the city or in the country?
With each question, the dealer helps to classify you into a certain group. If you are single with a high salary living in a city, the dealer might suggest a sporty convertible, but if you have kids and a moderate salary, they might suggest a more affordable minivan.
This type of step-by-step decision making can be expressed in data science as well.
An algorithm can create interpretable rules similar to if/then/else statements by examining the structure of data. In our example, if you are single, then you need less space in the car. If you have a high salary, then you might want to buy a flashier vehicle.
Another example is a bank considering a loan to a client. If the client has the right credit score, then the bank will offer a loan; otherwise (else), the loan will not be approved. By creating a complex decision tree with many layered variables about the client (income, payment history, assets, etc.), the model can predict whether they are likely to repay their loan, and under what conditions.
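To see why such rules read like code, here is a minimal sketch of the loan decision expressed as nested if/then/else statements in Python; the thresholds and outcomes are hypothetical, chosen only for illustration, not any actual bank's policy:

```python
# A decision tree's learned rules are equivalent to nested if/then/else
# statements. All thresholds below are made up for illustration.

def loan_decision(credit_score: int, income: float, payment_history_ok: bool) -> str:
    """Return a loan decision by following simple tree-like rules."""
    if credit_score >= 650:
        if payment_history_ok:
            return "approve"
        else:
            return "approve with a higher interest rate"
    else:
        if income > 80_000:
            return "approve with collateral"
        else:
            return "reject"

print(loan_decision(credit_score=700, income=50_000, payment_history_ok=True))
# -> approve
```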
Decision trees divide data into a number of nodes where every decision rule is a branch. At the beginning, all the data are gathered in the root node. The algorithm then scans the data’s attributes to find the one that would give the best partition.
Let’s consider a simplified example of a business problem using the figure below.
The tree sorts customers into two groups: ‘responding to newsletters’ (Yes) and ‘not responding’ (No). The branch to the right leads directly to an output: if the answer to the question ‘Has the customer purchased an item in the last week?’ is Yes, the customer is immediately classified as ‘responding to newsletters.’
On the left branch, the customer did not purchase an item in the last week. The tree then branches again, dividing these customers into new and existing customers. The decision tree in the figure would therefore tell us to send the newsletter to two types of customers: those who have purchased an item in the last week, and those who haven’t purchased an item in the last week but are new customers.
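To make this concrete, here is a minimal sketch of the same tree in Python using scikit-learn; the tiny dataset, feature names, and labels are invented purely for illustration:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Features: [purchased_in_last_week, is_new_customer] (1 = yes, 0 = no)
X = [
    [1, 0], [1, 1], [1, 0],  # purchased in the last week -> responds
    [0, 1],                  # no recent purchase, but a new customer -> responds
    [0, 0], [0, 0], [0, 0],  # no recent purchase, existing customer -> does not respond
]
y = [1, 1, 1, 1, 0, 0, 0]   # 1 = responds to newsletters, 0 = does not

tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# Print the learned rules; they mirror the figure: split first on the
# recent purchase, then on new vs. existing customer.
print(export_text(tree, feature_names=["purchased_last_week", "new_customer"]))
```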
At each step, this type of algorithm chooses the attribute whose split generates the cleanest nodes, meaning nodes dominated by a single class (repayment vs. non-repayment, or responders vs. non-responders).
When there are no more useful branches, meaning no remaining attribute divides the group any further, the data has reached a leaf node. Each leaf represents a classification of a specific type of person, separated by their attributes from the people in the other leaf nodes.
When constructing a decision tree, the aim is to classify and predict the outcome with as few decisions (branches) as possible to reduce complexity and avoid overfitting. This makes the algorithm effective and efficient when just a few input variables contribute the majority of the required information.
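One common way to keep the number of branches small is to cap the tree's depth. The sketch below uses scikit-learn's max_depth parameter to compare a shallow tree with an unrestricted one; the synthetic dataset and the depth value are assumptions for demonstration only:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# A synthetic classification dataset, purely for demonstration.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A shallow tree (few branches) vs. a fully grown tree (many branches).
shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
deep = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Comparing held-out accuracy shows whether the extra branches actually
# add information or merely overfit the training data.
print("shallow test accuracy:", shallow.score(X_test, y_test))
print("deep    test accuracy:", deep.score(X_test, y_test))
```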
Because it is important to use as few branches as possible, it is crucial to choose the right variable to divide on, meaning which aspect of the group will divide the data the most. Let’s say that we have a hundred people, where fifty of them are customers who respond to emails and fifty do not.
We then want to find which variable separates responders and non-responders the most cleanly into different nodes.
Are responders more likely to be elderly? Are non-responders likely to live in the country with lower internet speeds? The attribute that helps us divide the data cleanly will help us understand our customers, and why some people respond to outreach while others do not.
Different measurements, such as Gini impurity and information gain (the reduction in entropy), can be used by an algorithm to determine which variable to split the data on. When examining a tree, the goal is to determine how important individual branches are and how many branches are needed to explain the data.
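To make these measurements concrete, here is a small worked sketch in plain Python. The Gini impurity and entropy formulas are standard; the node counts reuse the hypothetical 100-customer example above:

```python
from math import log2

def gini(yes: int, no: int) -> float:
    """Gini impurity of a node with the given class counts."""
    total = yes + no
    if total == 0:
        return 0.0
    p = yes / total
    return 1 - p**2 - (1 - p)**2

def entropy(yes: int, no: int) -> float:
    """Shannon entropy of a node, in bits."""
    total = yes + no
    result = 0.0
    for count in (yes, no):
        if count:
            p = count / total
            result -= p * log2(p)
    return result

# Root node: 100 customers, 50 responders and 50 non-responders.
print(gini(50, 50))     # 0.5 (maximally impure)
print(entropy(50, 50))  # 1.0

# Suppose splitting on some variable yields two child nodes:
# left: 40 responders, 10 non-responders; right: 10 responders, 40 non-responders.
weighted_gini = (50 / 100) * gini(40, 10) + (50 / 100) * gini(10, 40)
print(weighted_gini)    # 0.32 -- lower impurity means a cleaner split
```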
Decision trees do not need much training data to become accurate, and collecting the necessary data does not take long.
Decision trees can handle both categorical variables (like age groups or gender) and continuous variables (which can take on any value, like the horsepower of a car). They are also relatively robust to outliers, which simplifies preprocessing and thus decreases the time and money spent in the modeling phase.
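As a hedged sketch of mixing variable types, the example below one-hot encodes a categorical age-group feature alongside a continuous horsepower feature (scikit-learn trees require numeric input); all values are invented for illustration:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Features: age group one-hot encoded as [young, middle, senior],
# followed by horsepower (continuous).
X = np.array([
    [1, 0, 0, 90],
    [1, 0, 0, 300],  # an outlier horsepower; trees split on thresholds, so it is harmless
    [0, 1, 0, 120],
    [0, 0, 1, 110],
    [0, 1, 0, 150],
    [0, 0, 1, 95],
])
y = [0, 1, 0, 1, 1, 0]  # made-up class labels

model = DecisionTreeClassifier(random_state=0).fit(X, y)
print(model.predict([[0, 1, 0, 140]]))  # classify a new middle-aged customer
```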
Finally, the way decision trees classify data is easy to understand: the learned rules can be read and checked by a human.
If you want to read all the related articles on the topic of AI algorithms, here is the list of all blog posts in this article series: