Description
This is where I delve into data science, modelling and data analytics. Here I analyse data using machine learning, statistics and other analytical techniques to extract insights and to answer questions about the data and the world.
Coming Soon!
StackerPy - Model Stacking for
Keep It Plain And Simplex: Linear Programming for the PL Fantasy Football
Having been invited by many people in many social circles to join a Fantasy Football league, I decided to take 30 minutes out of my day to approach it like I would any problem... with the data! In this post, I detail how I used the simplex method, an algorithm for linear programming that optimises a linear objective function subject to a set of linear constraints.
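To make the idea concrete, here is a minimal sketch of the tableau simplex method in pure Python, assuming a problem in standard form: maximise c·x subject to Ax ≤ b, with every b ≥ 0 and x ≥ 0. The function name `simplex_max` and the toy two-variable problem are illustrative assumptions, not the model or player data from the post. Note that picking an actual squad involves pick/don't-pick (integer) decisions, which the plain simplex method does not enforce on its own.

```python
from fractions import Fraction


def simplex_max(c, A, b):
    """Maximise c.x subject to A x <= b, x >= 0, assuming every b_i >= 0.

    Adds one slack variable per constraint so the all-slack basis is a valid
    starting vertex. Uses exact Fractions to avoid rounding noise.
    Returns (optimal value, list of optimal x values).
    """
    m, n = len(A), len(c)
    # Tableau rows are [A | I | b]; the last row holds the objective [-c | 0 | 0].
    tableau = []
    for i in range(m):
        slack = [Fraction(1 if j == i else 0) for j in range(m)]
        tableau.append([Fraction(v) for v in A[i]] + slack + [Fraction(b[i])])
    tableau.append([Fraction(-v) for v in c] + [Fraction(0)] * (m + 1))
    basis = [n + i for i in range(m)]  # slack variables start in the basis

    while True:
        obj = tableau[-1]
        # Entering variable: most negative reduced cost (Dantzig's rule; can
        # cycle on degenerate problems, which a production solver guards against).
        col = min(range(n + m), key=lambda j: obj[j])
        if obj[col] >= 0:
            break  # no improving direction: the current vertex is optimal
        # Leaving variable: minimum ratio test over rows with a positive entry.
        ratios = [
            (tableau[i][-1] / tableau[i][col], i)
            for i in range(m)
            if tableau[i][col] > 0
        ]
        if not ratios:
            raise ValueError("problem is unbounded")
        _, row = min(ratios)
        # Pivot: scale the pivot row, then eliminate the column from all other rows.
        pivot = tableau[row][col]
        tableau[row] = [v / pivot for v in tableau[row]]
        for i in range(m + 1):
            if i != row and tableau[i][col] != 0:
                factor = tableau[i][col]
                tableau[i] = [a - factor * p for a, p in zip(tableau[i], tableau[row])]
        basis[row] = col

    # Read the solution off the basic variables; non-basic variables are zero.
    x = [Fraction(0)] * n
    for i, var in enumerate(basis):
        if var < n:
            x[var] = tableau[i][-1]
    return tableau[-1][-1], x


# Toy example: maximise 3x + 5y subject to x <= 4, 2y <= 12, 3x + 2y <= 18.
value, x = simplex_max([3, 5], [[1, 0], [0, 2], [3, 2]], [4, 12, 18])
print(value, x)  # 36 [Fraction(2, 1), Fraction(6, 1)]
```

In a fantasy football setting the objective coefficients would be something like predicted points per player and the constraints would encode budget and squad rules; that mapping is an assumption here, since the post's exact formulation is not reproduced.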
Optimisation of Feature Selection in Machine Learning using a Genetic Algorithm
In data science, there are often thousands of variables you can choose from to build a model, and there are techniques for finding out which features are best. I came up with this idea when I had 2000 variables to choose from to build a regression model. With 2^2000 possible subsets of features, exhaustive search was out of the question, so I thought of using an optimisation technique, a genetic algorithm, to pick the best features.
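The core loop of a genetic algorithm for feature selection can be sketched as below: each candidate subset is a bit string (1 = feature included), and selection, crossover and mutation evolve a population of subsets. The fitness function is a stand-in assumption: in practice it would be the validation score of the regression model trained on the selected features. Here a hypothetical toy score rewards a made-up set of "useful" features and penalises subset size, so the example runs without data. Population size, mutation rate and generation count are illustrative values, not the ones used in the post.

```python
import random

N_FEATURES = 20
# Hypothetical ground truth for the toy fitness: only these features carry signal.
USEFUL = {1, 4, 7, 12, 18}


def fitness(mask):
    """Toy stand-in for a model's validation score (higher is better).

    Rewards each useful feature selected and penalises every feature used,
    mimicking the accuracy-vs-complexity trade-off of real feature selection.
    """
    hits = sum(mask[i] for i in USEFUL)
    return hits - 0.1 * sum(mask)


def tournament(population, rng, k=3):
    """Tournament selection: return the fittest of k randomly chosen individuals."""
    return max(rng.sample(population, k), key=fitness)


def crossover(a, b, rng):
    """Single-point crossover of two parent bit strings."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(mask, rng, rate=0.02):
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in mask]


def genetic_feature_selection(pop_size=40, generations=60, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(N_FEATURES)] for _ in range(pop_size)]
    for _ in range(generations):
        # Elitism: carry the best individual over unchanged, so the best
        # fitness never decreases from one generation to the next.
        next_gen = [max(population, key=fitness)]
        while len(next_gen) < pop_size:
            child = crossover(tournament(population, rng), tournament(population, rng), rng)
            next_gen.append(mutate(child, rng))
        population = next_gen
    return max(population, key=fitness)


best = genetic_feature_selection()
print([i for i, bit in enumerate(best) if bit], fitness(best))
```

With 2000 real features, each fitness call means training and scoring a model, so the population size and number of generations become the main cost levers.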
Kaggle - Titanic: Machine Learning From Disaster
This is a well-known challenge hosted by Kaggle, designed to acquaint people with competitions on the platform and how to compete in them. The challenge asks you to analyse what sorts of people were likely to survive and, in particular, to apply the tools of machine learning to predict which passengers survived the tragedy. In this post, I show how I visualise the data to carry out the analysis and how I predict who survived.
DrivenData - Pump It Up: Data Mining The Water Table
Using data from Taarifa and the Tanzanian Ministry of Water, I predict which pumps are functional, which need some repairs, and which don't work at all. This is an intermediate-level practice competition on DrivenData. Using a Random Forest, I predict one of these three classes from a number of variables describing what kind of pump is operating, when it was installed, and how it is managed.
DrivenData - Predicting Blood Donations (Predictive Modelling)
DrivenData finds real-world questions where data science can have a positive social impact, then runs online modelling competitions for data scientists to develop the best models to solve them. It shows that we, as data scientists, can change the world with data and modelling for a good cause. Here I start small with DrivenData's starter dataset, predicting blood donations.
HackerRank - Email Prediction Competition (Ensemble Modelling)
In this challenge, we are presented with a dataset describing users of the HackerRank site and the emails sent to them. The objective is to predict whether a user will open an email. We are given a training set to build the model and a test set on which we make predictions to produce our submission.
Multi-class Logistic Classification and KMeans Clustering of Iris
Here I analyse the well-known Iris dataset using Python. Within my analysis, I look at interesting ways in which the data can be cut and viewed in order to extract insights.
I also look at K-Means clustering and multi-class logistic regression modelling to classify the data into their respective species. The objectives are highlighted in the post.
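For reference, K-Means (Lloyd's algorithm) alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points, until nothing changes. The pure-Python sketch below shows those steps; the function name `kmeans` and the six 2-D points are made up for illustration and are not the Iris measurements. A real analysis would typically use a library implementation such as scikit-learn's `KMeans`.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Cluster `points` (tuples of floats) into k groups with Lloyd's algorithm.

    Returns (centroids, labels), where labels[i] is the cluster index of points[i].
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise with k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, labels


points = [(1.0, 1.1), (0.9, 0.8), (1.2, 1.0), (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

Because K-Means is unsupervised, its cluster indices are arbitrary; comparing clusters with species labels requires mapping each cluster to its majority species, whereas multi-class logistic regression learns the species labels directly.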