Photo by Leio McLaren (@leiomclaren) on Unsplash

Abstract

One can find numerous articles on Explainable AI today, some of which can be found here. The standard guide for Explainable AI is undoubtedly this book by Christoph Molnar. When I came across the recent paper Pitfalls to Avoid when Interpreting Machine Learning Models, I decided to write a few blog posts based on it. This article takes up one of the pitfalls presented in the paper.

This article focuses on the pitfalls to avoid while interpreting Partial Dependence Plots (PDPs) and Individual Conditional Expectation (ICE) plots. These are post hoc techniques used to observe how a model makes its decisions: all features are held fixed except one (or two, in the case of PDPs), which is regarded as the feature of interest. This variable is allowed to take all of its possible values, and we observe its marginal effect on the model’s predictions. …
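As a quick illustration of how these plots are typically generated (this sketch is mine, not from the paper; the dataset, model, and feature choice are arbitrary stand-ins), scikit-learn’s inspection module can produce both PDP and ICE curves:

import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import PartialDependenceDisplay

# Fit any model; the California housing data is just a stand-in here
X, y = fetch_california_housing(return_X_y=True, as_frame=True)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# kind="both" overlays the PDP (the average curve) on the individual ICE curves
PartialDependenceDisplay.from_estimator(model, X, features=["MedInc"], kind="both")
plt.show()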


To statistically test whether the population proportions of two groups are significantly different.

Photo by Evgeny Smirnov on Unsplash

We first get the data on which we will test the hypothesis. The data considered here is the famous Titanic dataset, which can be found on Kaggle.

Importing the libraries:

import numpy as np                         # numerical computations
import pandas as pd                        # data loading and manipulation
import scipy.stats.distributions as dist   # normal distribution for the p-value

Parameter of Interest

We read the data and select only the two relevant columns from the dataset: ‘Survived’ (1 if the individual survived the Titanic disaster, else 0) and ‘Sex’ (the gender of the individual). We set the parameter of interest as the difference in the proportions of individuals who survived the Titanic disaster, split by gender. …
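With the libraries imported above, a minimal sketch of the two-proportion z-test could look like the following (the file name ‘titanic.csv’ is an assumption; the column names match the Kaggle version of the data):

# Read the data and keep only the two relevant columns (file name assumed)
df = pd.read_csv("titanic.csv")[["Survived", "Sex"]]

# Sample proportions of survivors and group sizes, by gender
p_female = df.loc[df["Sex"] == "female", "Survived"].mean()
p_male = df.loc[df["Sex"] == "male", "Survived"].mean()
n_female = (df["Sex"] == "female").sum()
n_male = (df["Sex"] == "male").sum()

# Pooled proportion and standard error under H0: the two proportions are equal
p_pooled = df["Survived"].mean()
se = np.sqrt(p_pooled * (1 - p_pooled) * (1 / n_female + 1 / n_male))

# Test statistic and two-sided p-value from the standard normal distribution
z = (p_female - p_male) / se
p_value = 2 * dist.norm.cdf(-np.abs(z))
print(f"z = {z:.2f}, p-value = {p_value:.4g}")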


Photo by Alex Perez on Unsplash

Decision trees are prone to over-fitting. Pruning techniques help ensure that decision trees generalize better on ‘unseen’ data. A decision tree can be pruned before and/or after it is constructed, though either pruning method alone is usually sufficient to curb over-fitting. Post pruning is the more principled way to prune decision trees.

In this post, we focus on two things:

  • Understanding the gist of Cost Complexity Pruning, which is a type of post pruning.
  • Its implementation using Python.

Post pruning, as the name suggests, ‘prunes’ the decision tree after it has fully grown. It removes a sub-tree and replaces it with a leaf node; the most frequent class in the sub-tree determines the label of the new leaf. …
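As a sketch of the implementation side (the dataset, split, and model-selection shortcut below are my own illustrative choices), scikit-learn exposes cost complexity pruning through cost_complexity_pruning_path and the ccp_alpha parameter:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Effective alphas at which sub-trees are pruned away as alpha increases
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)

# Fit one pruned tree per alpha and pick the one that scores best on held-out data
trees = [DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(X_train, y_train)
         for a in path.ccp_alphas]
best = max(trees, key=lambda t: t.score(X_test, y_test))
print(f"best ccp_alpha: {best.ccp_alpha:.5f}, test accuracy: {best.score(X_test, y_test):.3f}")

In practice one would pick ccp_alpha by cross-validation rather than by peeking at the test set; the sketch above only illustrates the API.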

About

Satya Pattnaik

Data Scientist (by hobby and profession)
