This is the third installment in a series of blog posts in which I discuss data-science-related topics with an overly familiar stranger. Here are parts 1 and 2.
Ayo my man, who or what is a random forest?
Well it’s a—
Is it a gang? Are people after you? If people are after you, just lemme know cause I know people too, and those people can help you with the people who are after you.
What?
Are. People. After. You?
Why would people be after me?
You tell me. I’m not the one saying random forest, random forest, random forest in my sleep.
I did no such thing.
Do I look like I make up stories to you?
Well…
Well, is it a gang or is this some sorta MK Ultra thing? Are people doing some kinda science experiment on you against your will? Cause if they are, I know people who can take care of those people for you.
Wait, how did you get in here?
What do you mean?
I mean, how did you get in here? You don't have a key and I changed my locks two days ago.
That don’t matter. I had to use the bathroom.
But your bathroom is bigger than mine.
And how am I going to keep it clean if I use it all the time?
And how am I going to keep my bathroom clean if you’re using it all the time?
That sounds like a you problem, not a me problem, to me.
...
So you gonna tell me or...?
Sigh.
Less sighing and more talking my man.
Okay, you know those people you keep telling me can take care of other people for me?
Yeah
Do you ever ask them for their opinions when you’re making a decision because you’re too close to something to see it clearly?
Sometimes.
Well, in machine learning we have a technique that does that too. It’s called ensemble learning. We use it to help safeguard against a model learning the wrong thing or mistaking noise for signal.
Like when Tommy T tells me anything and then I have to get 10 people to confirm it, because Tommy T always be telling stories and half them stories ain't true.
Right.
Anyway, one type of ensemble method is bagging. Bagging is basically training a bunch of models in parallel, each one on its own random subset of the data.
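(For the curious reader: the "random subset" in bagging is usually a bootstrap sample, drawn with replacement. Here's a minimal sketch in plain Python; the toy dataset is made up for illustration, and no actual models are trained.)

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) points from data *with replacement*."""
    return [rng.choice(data) for _ in range(len(data))]

# A toy training set: (feature, label) pairs.
dataset = [(1, "a"), (2, "a"), (3, "b"), (4, "b"), (5, "b")]

rng = random.Random(42)  # seeded so the example is repeatable
# Bagging would train one model per bootstrap sample, in parallel.
samples = [bootstrap_sample(dataset, rng) for _ in range(3)]

for i, s in enumerate(samples):
    print(f"model {i} would train on: {s}")
```

Because the draws are with replacement, some points show up in a sample more than once and others not at all, which is what makes each model see slightly different data.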
This is interesting and all, but what does this have to do with you mumbling “random forest, random forest, random forest” all night in your sleep?
A random forest is an ensemble method that uses bagging, with decision trees as its individual models, to classify things.
Decision trees? Those flowchart looking things?
Yes!
So you pick how many subsets you want to make from your training data and train a decision tree on each one.
Then each tree makes a prediction based on its data and uses that prediction to cast a vote.
Once you tally all the votes, the random forest makes its final prediction based on the class with the most votes.
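(The vote-tallying step can be sketched with the standard library. The tree predictions below are hard-coded stand-ins, since training real trees is beside the point here.)

```python
from collections import Counter

# Pretend we trained five decision trees and each one cast a
# vote for the class it predicts.
votes = ["spam", "ham", "spam", "spam", "ham"]

# The forest's final prediction is the class with the most votes.
tally = Counter(votes)
prediction = tally.most_common(1)[0][0]
print(prediction)  # prints "spam"
```

In practice you'd get the votes by calling each trained tree's predict method on the same input, but the majority-vote logic is exactly this simple.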
Oh, aight. So what I’m hearing you say here is you don’t have anyone after you and you’re not part of some weird government experiment.
That’s all you got from that?
Nah, I got other stuff too.
I’m just saying, since you don’t got people coming after you or anything like that, you probably wanna go to the grocery store, because I used your last roll of toilet paper.