Data Science & Analytics

Description

This is where I delve into data science, modelling and data analytics. I analyse data using machine learning, statistics and other analytical techniques to extract insights and answer questions about the data or the world.

Coming Soon!

StackerPy - Model Stacking for Scikit-Learn Models (including blending)

Here I have developed an API that enables people to stack models, building an ensemble whose predictions are added to your main data as meta-features for a final stacking model. The implementation lets you dictate which columns each model looks at, so you can optimise models individually, since different combinations of features work better or worse with different models. It also gives you the ability to blend. Blending is a technique used to reduce overfitting: the meta-features are produced with K-fold cross-validation, and each model's results are aggregated during the prediction phase.
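
The idea can be sketched with plain scikit-learn. This is a minimal illustration under stated assumptions, not StackerPy's actual API: the synthetic dataset, the two base models and the column choices are all made up for the example. Unlike the blending described above, the sketch refits each base model once on the full training set rather than aggregating the fold models at prediction time.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split

# Synthetic data stands in for a real dataset (assumption for illustration).
X, y = make_classification(n_samples=400, n_features=8, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Each base model is paired with the columns it is allowed to look at.
# Both the models and the column lists are hypothetical choices.
base_models = [
    (LogisticRegression(max_iter=1000), [0, 1, 2, 3]),
    (RandomForestClassifier(n_estimators=50, random_state=0), [2, 3, 4, 5, 6, 7]),
]

train_meta, test_meta = [], []
for model, cols in base_models:
    # Out-of-fold probabilities: every training row is scored by a model
    # that did not see that row during fitting (5-fold cross-validation).
    oof = cross_val_predict(
        model, X_train[:, cols], y_train, cv=5, method="predict_proba"
    )[:, 1]
    train_meta.append(oof)

    # Refit on all training rows to produce the meta-feature for the test set.
    model.fit(X_train[:, cols], y_train)
    test_meta.append(model.predict_proba(X_test[:, cols])[:, 1])

# Add the meta-features to the main data as extra columns.
X_train_stacked = np.column_stack([X_train] + train_meta)
X_test_stacked = np.column_stack([X_test] + test_meta)

# The final stacking model sees the original features plus the meta-features.
final_model = LogisticRegression(max_iter=1000)
final_model.fit(X_train_stacked, y_train)
stacked_accuracy = final_model.score(X_test_stacked, y_test)
```

Building the training meta-features out of fold is the important design choice: if the base models scored rows they had already been fitted on, the final model would learn to trust predictions that are more accurate than anything it will see at prediction time.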

Keep It Plain And Simplex: Linear Programming for the PL Fantasy Football

Having been invited by many people in many social circles to participate in a Fantasy Football League, I decided to take 30 minutes out of my day to approach it like I would any problem... with the data! In this post, I detail how I employed the simplex method, a linear optimisation technique that optimises a linear objective function subject to a number of linear constraints.
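
A squad-selection problem of this kind can be written as a linear programme. The formulation below is an illustrative sketch, and the notation is an assumption rather than something taken from the post: x_i is the share of player i selected, p_i the expected points, c_i the price, B the budget, P_k the set of players in position k, and n_k the number of players required in that position.

```latex
\begin{aligned}
\max_{x} \quad & \sum_{i=1}^{N} p_i x_i \\
\text{s.t.} \quad & \sum_{i=1}^{N} c_i x_i \le B, \\
& \sum_{i \in P_k} x_i = n_k \quad \text{for each position } k, \\
& 0 \le x_i \le 1 \quad \text{for each player } i.
\end{aligned}
```

The simplex method solves this relaxed problem, in which x_i may be fractional. A real squad needs each x_i to be 0 or 1, which calls for either an integer programming solver or a rounding step on top of the linear solution.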

Optimisation of Feature Selection in Machine Learning using Genetic Algorithm

In the world of data science, there can be thousands of variables to choose from when building a model, and there are techniques you can apply to find out which features are best. I came up with this idea when I had 2000 variables to choose from to build a regression model. With 2^2000 potential combinations, far too many to test exhaustively, I thought of using an optimisation technique to pick the best features.
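
A genetic algorithm for this kind of search can be sketched in standard-library Python. Everything specific here is an assumption for illustration: the 40-feature problem size, the `USEFUL` set, and the toy `fitness` function, which stands in for what would normally be a cross-validated model score.

```python
import random

N_FEATURES = 40                         # small stand-in for the 2000 variables
USEFUL = set(range(0, N_FEATURES, 4))   # hypothetical "truly useful" features


def fitness(mask):
    """Toy score: reward useful features, penalise every selected feature.

    In a real run this would be a cross-validated model score."""
    selected = {i for i, bit in enumerate(mask) if bit}
    return len(selected & USEFUL) - 0.25 * len(selected)


def crossover(a, b, rng):
    """Single-point crossover: head of one parent, tail of the other."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(mask, rng, rate=0.02):
    """Flip each bit independently with a small probability."""
    return [1 - bit if rng.random() < rate else bit for bit in mask]


def evolve(generations=60, pop_size=30, seed=0):
    rng = random.Random(seed)
    # Each individual is a 0/1 mask saying which features are switched on.
    population = [
        [rng.randint(0, 1) for _ in range(N_FEATURES)] for _ in range(pop_size)
    ]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]   # the fitter half survives
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness)


best_mask = evolve()
best_features = [i for i, bit in enumerate(best_mask) if bit]
```

Because the fitter half of each generation is carried over unchanged, the best score never gets worse from one generation to the next. With a real model as the fitness function, each evaluation means a model fit, so population size and generation count become the main cost levers.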

Kaggle - Titanic: Machine Learning From Disaster

This is a well-known challenge hosted by Kaggle, designed to acquaint people with competitions on their platform and how to compete. In this challenge, they ask you to complete the analysis of what sorts of people were likely to survive. In particular, they ask you to apply the tools of machine learning to predict which passengers survived the tragedy. In this post, I show how I visualise the data to complete the analysis and how I predict who survived.

DrivenData - Pump It Up: Data Mining The Water Table

Using data from Taarifa and the Tanzanian Ministry of Water, I predict which pumps are functional, which need some repairs, and which don't work at all. This is an intermediate-level practice competition on DrivenData. Using a Random Forest, I predict one of these three classes based on a number of variables about what kind of pump is operating, when it was installed, and how it is managed.

DrivenData - Predicting Blood Donations (Predictive Modelling)

DrivenData find real world questions where data science can have positive social impact, then run online modelling competitions for data scientists to develop the best models to solve them. It shows we, as data scientists, can change the world with data and modelling for a good cause. Here I start small with DrivenData’s starter dataset, predicting blood donations.

HackerRank - Email Prediction Competition (Ensemble-Modelling)

In this challenge, we are presented with a dataset that details information regarding users of the HackerRank site and emails sent to these users. The objective of the challenge is to predict whether or not a user will open the email. We are given a training set to build the model and a test set against which we predict in order to produce our submission.

Multi-class Logistic Classification and KMeans Clustering of Iris

Here I analyse the famous iris dataset using Python. Within my analysis, I look at interesting ways in which the data can be cut and viewed in order to extract insights.

I also look at K-Means Clustering and multi-class logistic regression modelling to classify the data into their respective species. The objectives are highlighted in the post.
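
Both techniques can be tried on iris in a few lines of scikit-learn. This is a minimal sketch rather than the analysis from the post: the 70/30 split, the random seeds and the use of the adjusted Rand index to compare clusters with species are assumptions made for the example.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import adjusted_rand_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Unsupervised: group the flowers into three clusters without using the labels,
# then measure how closely the clusters line up with the true species.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
clusters = kmeans.fit_predict(X)
cluster_agreement = adjusted_rand_score(y, clusters)

# Supervised: fit a multi-class logistic regression on a training split
# and score it on rows it has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)
classifier = LogisticRegression(max_iter=1000)
classifier.fit(X_train, y_train)
test_accuracy = classifier.score(X_test, y_test)
```

The adjusted Rand index is used because K-Means cluster numbers are arbitrary, so comparing them directly with the species labels would be meaningless.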

Restaurant Multi-Criteria Decision Analytics

Finding the perfect place to eat in town can be difficult. With a wide range of restaurants and multiple factors influencing your choice, it seems almost impossible to pick the best place for you! With the use of Multi-Criteria Decision Analytics methods like TOPSIS and VIKOR, I make that decision easier and find the perfect place to have a steak in Leeds (personal favourite meal).
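
The TOPSIS half of that approach can be sketched in standard-library Python (VIKOR is not shown). The restaurants, criteria, scores and weights below are made up purely for illustration and are not taken from the post.

```python
import math


def _distance(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def topsis(matrix, weights, benefit):
    """Return a closeness score in [0, 1] for each alternative (row).

    benefit[j] is True when a higher value is better for criterion j,
    and False when a lower value is better (a cost criterion).
    Assumes the alternatives are not all identical."""
    n_criteria = len(weights)
    # 1. Vector-normalise each column, then apply the criterion weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_criteria)]
    weighted = [
        [weights[j] * row[j] / norms[j] for j in range(n_criteria)] for row in matrix
    ]
    # 2. Build the ideal and anti-ideal alternatives, criterion by criterion.
    columns = list(zip(*weighted))
    ideal = [max(col) if benefit[j] else min(col) for j, col in enumerate(columns)]
    anti = [min(col) if benefit[j] else max(col) for j, col in enumerate(columns)]
    # 3. Score each alternative by its relative closeness to the ideal.
    scores = []
    for row in weighted:
        d_best = _distance(row, ideal)
        d_worst = _distance(row, anti)
        scores.append(d_worst / (d_best + d_worst))
    return scores


# Hypothetical data: food rating (higher is better), price in pounds and
# walking time in minutes (both lower is better).
restaurants = ["Restaurant A", "Restaurant B", "Restaurant C"]
matrix = [
    [9.0, 40.0, 20.0],
    [7.5, 28.0, 10.0],
    [6.0, 18.0, 5.0],
]
weights = [0.5, 0.3, 0.2]
benefit = [True, False, False]

scores = topsis(matrix, weights, benefit)
best = restaurants[max(range(len(scores)), key=scores.__getitem__)]
```

The weights carry the personal part of the decision: shifting weight from food rating towards price can change which restaurant comes out on top, so they are worth varying to see how stable the ranking is.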

FIFA Pro Clubs Promotion - Statistical Probabilities Analysis

FIFA Pro Clubs is my favourite game mode in FIFA! My team, MichuToday, continue to struggle to make it to Division 1.

We have remained in Divisions 2 and 3 for the majority of our career! With only 8 games remaining in the season, we need 5 wins, or rather 15 points, to be promoted to Division 1! What are the chances?! Well, we figure that out here!
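
One way to put a number on it is to enumerate every possible mix of wins, draws and losses. The sketch below makes assumptions the text does not state: 3 points for a win and 1 for a draw, independent games with the same outcome probabilities each time, and made-up values for `p_win` and `p_draw`.

```python
from math import comb


def promotion_probability(games, target_points, p_win, p_draw):
    """Probability of reaching at least target_points in the remaining games.

    Assumes 3 points per win, 1 per draw, 0 per loss, and that every game
    is independent with the same win and draw probabilities."""
    p_loss = 1.0 - p_win - p_draw
    total = 0.0
    for wins in range(games + 1):
        for draws in range(games - wins + 1):
            losses = games - wins - draws
            if 3 * wins + draws >= target_points:
                # Number of orderings with this many wins, draws and losses.
                ways = comb(games, wins) * comb(games - wins, draws)
                total += ways * p_win**wins * p_draw**draws * p_loss**losses
    return total


# Hypothetical form: win half of the games and draw one in five.
chance = promotion_probability(games=8, target_points=15, p_win=0.5, p_draw=0.2)
```

Counting points rather than wins matters here: 4 wins and 3 draws also reach 15 points, so requiring 5 wins outright would understate the chances.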
