Photo by Nathan Dumlao on Unsplash


I came across a new and promising Python library for time series: sktime. It provides a wide range of time series functionality, including transformations, forecasting algorithms, composition of forecasters, model validation, and pipelining of the entire flow. In this article, we explore some of the features the library provides, the most important being how to make a machine learning model, LightGBM, fit for time series forecasting.


When it comes to time series forecasting, ARIMA and its variants dominate the domain (simple yet powerful methods). However, having a strong personal liking for ensemble tree models, it is always…

Photo by Leio McLaren (@leiomclaren) on Unsplash


One can find numerous articles on Explainable AI today, some of which can be found here. The standard guide to Explainable AI is arguably this book by Christoph Molnar. When I came across the recent paper “Pitfalls to Avoid when Interpreting Machine Learning Models”, I decided to write a few blog posts based on it. This is a take on one of the aspects presented in the paper.

This article focuses on the pitfalls to avoid when interpreting Partial Dependence Plots (PDPs) and Individual Conditional Expectation (ICE) plots. These are post hoc techniques used to observe how the model takes a…
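As a rough illustration of what PDP and ICE curves are, the sketch below computes both by hand. It is a minimal example under stated assumptions, not the article's own code: the synthetic data, the random forest model, and the helper name `ice_and_pdp` are all hypothetical choices made for illustration. Each ICE curve is one row's prediction as a single feature is swept over a grid, and the PDP is the average of those curves.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data (an assumption for illustration): y depends on x0**2 and x1.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(300, 2))
y = X[:, 0] ** 2 + X[:, 1] + rng.normal(scale=0.05, size=300)

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)


def ice_and_pdp(model, X, feature, grid):
    """Hypothetical helper: return (ICE curves, PDP curve) for one feature.

    ICE: for every row, the prediction when `feature` is forced to each
    grid value while the other features keep their observed values.
    PDP: the mean of the ICE curves at each grid value.
    """
    ice = np.empty((X.shape[0], len(grid)))
    for j, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, feature] = value  # overwrite one feature for all rows
        ice[:, j] = model.predict(X_mod)
    return ice, ice.mean(axis=0)


grid = np.linspace(-1, 1, 11)
ice, pdp = ice_and_pdp(model, X, feature=0, grid=grid)
print(pdp.round(2))
```

Because the PDP is only an average, it can hide differences between individual rows, which is one reason to look at the ICE curves alongside it.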

To statistically test if population proportions of two groups are significantly different.

Photo by Evgeny Smirnov on Unsplash

We first get the data on which we will test the hypothesis. The data considered here is the well-known Titanic dataset, which can be found on Kaggle.

Importing the libraries:

import numpy as np
import pandas as pd
import scipy.stats.distributions as dist

Parameter of Interest

We read the data and select only two relevant columns from the dataset: the column ‘Survived’ (1 if the individual survived the Titanic disaster, else 0) and the variable ‘Sex’ (indicates the gender of the individual). We set the parameter of…
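A test for a difference in two population proportions can be sketched as follows. This is a minimal example under stated assumptions: the teaser does not show which variant of the test is used, so the pooled-proportion z-test with a two-sided alternative is assumed here, the function name `two_proportion_ztest` is hypothetical, and the counts are made up for illustration rather than taken from the Titanic data.

```python
import numpy as np
import scipy.stats.distributions as dist


def two_proportion_ztest(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test of H0: p_a == p_b (pooled standard error assumed)."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Under H0 both groups share one proportion, estimated by pooling.
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal survival function.
    p_value = 2 * dist.norm.sf(abs(z))
    return z, p_value


# Hypothetical counts for illustration only (not the Titanic figures).
z_stat, p_val = two_proportion_ztest(60, 100, 40, 100)
print(f"z = {z_stat:.3f}, p-value = {p_val:.4f}")
```

With real data, the success counts and group sizes would come from grouping the ‘Survived’ column by ‘Sex’.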

Photo by Alex Perez on Unsplash

Decision trees are prone to over-fitting. Pruning techniques help decision trees generalize better on ‘unseen’ data. A decision tree can be pruned before and/or after it is constructed; in practice, either pruning method on its own is usually enough to curb over-fitting. Post pruning is the more systematic way to prune decision trees.

In this post, we focus on two things:

  • Understanding the gist of Cost Complexity Pruning which is a type of Post Pruning.
  • Its implementation using Python.

Post pruning a decision tree, as the name suggests, ‘prunes’ the tree after it has fully grown. It removes a sub-tree…
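For a sense of how Cost Complexity Pruning looks in Python, here is a minimal sketch using scikit-learn's `cost_complexity_pruning_path` and the `ccp_alpha` parameter. It is not the article's own code: the breast cancer dataset, the train/validation split, and picking the alpha with the best validation accuracy are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Example data (an assumption; any classification dataset would do).
X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# Grow the full tree, then ask for the sequence of effective alphas at
# which successive sub-trees would be pruned away.
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
path = full_tree.cost_complexity_pruning_path(X_train, y_train)

# Guard against tiny negative values caused by floating-point error.
ccp_alphas = np.clip(path.ccp_alphas, 0, None)

# Refit one tree per alpha; a larger alpha means heavier pruning.
trees = [
    DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
    for alpha in ccp_alphas
]
node_counts = [tree.tree_.node_count for tree in trees]
val_scores = [tree.score(X_val, y_val) for tree in trees]

# Pick the alpha with the best validation accuracy (one simple choice).
best = int(np.argmax(val_scores))
print(f"best ccp_alpha = {ccp_alphas[best]:.5f}, "
      f"nodes = {node_counts[best]}, val accuracy = {val_scores[best]:.3f}")
```

Cross-validation over `ccp_alpha` would be a more robust way to choose the value than a single validation split.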

Satya Pattnaik

Data Scientist(By hobby and profession)
