Before You Begin

With any machine learning project, you should first develop an in-depth understanding of the problem the data represents, the structure of the dataset, and which machine learning algorithms are best suited to solving it.

When studying a dataset, work in a statistical environment so that your focus stays on the questions you want to answer about the data, rather than on a particular technique and how to implement it in code.

Iris Flowers Classification

The Iris flowers dataset is one of the best-known datasets in the classification literature. The aim is to classify iris flowers among three species (setosa, versicolor, or virginica) from measurements of the length and width of their sepals and petals. The dataset contains 3 classes of 50 instances each, where each class refers to a type of iris plant. The attributes are numeric, so beginners need to figure out how to load and handle the data. The dataset is small, fits easily into memory, and requires no special transformations or scaling to begin with. The Dataset is HERE.
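A minimal sketch of loading and classifying the Iris data with scikit-learn, which bundles a copy of the dataset so no download is needed. The choice of a k-nearest-neighbors classifier here is just one reasonable starting point, not the only suitable algorithm.

```python
# Load the bundled Iris dataset and fit a simple k-NN classifier.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)  # 150 samples, 4 measurements each

# Hold out 30% of the data for evaluation, stratified by species.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {accuracy:.2f}")
```

Because the classes are well separated, even this untuned model typically scores well above 90% accuracy, which makes Iris a gentle first exercise.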

Titanic Disaster

The sinking of the RMS Titanic is one of the most infamous shipwrecks in history. Although there was some element of luck involved in surviving the sinking, some groups of people were more likely to survive than others, such as women, children, and the upper class. By examining factors such as class, sex, and age, we will experiment with different machine learning algorithms and build a program that can predict whether a given passenger would have survived the disaster. The Dataset is HERE.
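A sketch of the survival-prediction workflow using logistic regression on the three factors mentioned above. The tiny DataFrame below is an illustrative stand-in for the real passenger file, and its column names (`pclass`, `sex`, `age`, `survived`) are assumptions rather than guaranteed field names.

```python
# Predict Titanic survival from class, sex, and age with logistic
# regression, using a small synthetic sample of passengers.
import pandas as pd
from sklearn.linear_model import LogisticRegression

passengers = pd.DataFrame({
    "pclass":   [1, 1, 2, 3, 3, 3, 2, 1],
    "sex":      ["female", "male", "female", "male",
                 "female", "male", "male", "female"],
    "age":      [29, 45, 30, 22, 18, 40, 35, 50],
    "survived": [1, 0, 1, 0, 1, 0, 0, 1],
})

# Encode the categorical 'sex' column as 0/1 before fitting.
passengers["sex"] = passengers["sex"].map({"male": 0, "female": 1})
X = passengers[["pclass", "sex", "age"]]
y = passengers["survived"]

model = LogisticRegression().fit(X, y)
predictions = model.predict(X)
print(predictions)
```

On the real dataset the same pattern applies, with the extra work of handling missing ages and splitting off a held-out test set before judging the model.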

Loan Prediction

This data corresponds to a set of financial transactions associated with individuals. The data has been standardized, de-trended, and anonymized. You are provided with over two hundred thousand observations and nearly 800 features, and each observation is independent of the others. For each observation, it was recorded whether a default was triggered. In case of a default, the loss was measured. This quantity lies between 0 and 100; it has been normalized so that the notional of each transaction at inception is 100. For example, a loss of 60 means that only 40 was reimbursed. If the loan did not default, the loss was 0. You are asked to predict the loss for each observation in the test set. The Dataset is HERE.
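One natural way to frame this task, sketched below on synthetic data, is as two stages: first classify default versus no default, then regress the loss (0 to 100) only for predicted defaults. The generated arrays are stand-ins for the real two-hundred-thousand-row, ~800-feature dataset, and the two-stage design is one plausible approach rather than the competition's required method.

```python
# Two-stage loss prediction on synthetic loan data:
# stage 1 classifies default, stage 2 regresses the loss amount.
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))  # stand-in for the ~800 real features

# Simulate defaults driven by the first feature, with noise.
defaulted = (X[:, 0] + rng.normal(scale=0.5, size=500) > 1).astype(int)
loss = np.where(defaulted == 1, rng.uniform(1, 100, size=500), 0.0)

clf = LogisticRegression().fit(X, defaulted)
reg = LinearRegression().fit(X[defaulted == 1], loss[defaulted == 1])

# Predicted loss is 0 unless the classifier predicts a default;
# clip regressed values into the normalized [0, 100] range.
pred = np.where(clf.predict(X) == 1,
                np.clip(reg.predict(X), 0, 100), 0.0)
print(pred[:5])
```

Splitting the problem this way respects the fact that most loans have a loss of exactly 0, which a single regressor tends to handle poorly.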

BigMart Sales Prediction

The data scientists at BigMart have collected 2013 sales data for 1559 products across 10 stores in different cities. The data also includes certain attributes of each product and store. The objective is to build a predictive model that estimates the sales of each product at a particular store. BigMart will use this model to understand which properties of products and stores play a key role in increasing sales. The Dataset is HERE.
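Before reaching for a full model, a per-product, per-store average is a useful baseline to beat. The sketch below uses a few made-up rows; the column names (`item_id`, `store_id`, `sales`) are illustrative, not the exact fields of the BigMart file.

```python
# Baseline for store/product sales: predict the mean historical
# sales for each (product, store) pair using a pandas groupby.
import pandas as pd

sales = pd.DataFrame({
    "item_id":  ["A", "A", "B", "B", "A", "B"],
    "store_id": ["S1", "S2", "S1", "S2", "S1", "S1"],
    "sales":    [120.0, 95.0, 60.0, 80.0, 130.0, 70.0],
})

# Average sales per item within each store.
baseline = sales.groupby(["item_id", "store_id"])["sales"].mean()
print(baseline)
```

Any regression model trained on the product and store attributes should outperform this lookup table; if it does not, the feature engineering needs another look.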