Let’s walk through the steps to predict salary using regression. The model used is multiple linear regression.

Assume that a company’s salaries for its programmers are as shown below.

Total,Lead,Manager,Certifications,Salary

1,0,0,0,20000

1.5,0,0,0,23000

2,0,0,0,25000

2,0,0,1,30000

2.5,0,0,0,27000

3,0,0,2,30000

3.5,1,0,0,33000

3.5,1,0,1,35000

3.5,1,0,2,40000

4,1,0,0,35000

4,2,0,0,40000

4,2,0,2,43000

The table has four independent variables: total years of experience, years of experience as team lead, years of experience as project manager, and the number of certifications the employee holds.

The dependent variable is salary.

We can now try to predict the salary of an employee with 5 years of total experience, 2 years as team lead, 1 year as project manager, and 2 certifications.

The first step is to create the independent variable matrix.

Take all rows, and all columns except the last one:

X = dataset.iloc[:, :-1].values

Next, take the dependent variable vector, which is the last column (the 5th column; indexing starts at zero, so its index is 4):

Y = dataset.iloc[:, 4].values
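The two lines above assume a pandas DataFrame named dataset already exists. As a minimal sketch of how it might be built, the block below inlines the table from this post as CSV text (the CSV_DATA name and the use of io.StringIO are assumptions made for illustration; reading the same content from a file with pd.read_csv should behave the same way).

```python
import io

import pandas as pd

# The salary table from this post, inlined so the example is self-contained.
CSV_DATA = """Total,Lead,Manager,Certifications,Salary
1,0,0,0,20000
1.5,0,0,0,23000
2,0,0,0,25000
2,0,0,1,30000
2.5,0,0,0,27000
3,0,0,2,30000
3.5,1,0,0,33000
3.5,1,0,1,35000
3.5,1,0,2,40000
4,1,0,0,35000
4,2,0,0,40000
4,2,0,2,43000
"""

dataset = pd.read_csv(io.StringIO(CSV_DATA))

# Independent variables: every column except the last one.
X = dataset.iloc[:, :-1].values
# Dependent variable: the Salary column (index 4).
Y = dataset.iloc[:, 4].values

print(X.shape)  # (12, 4)
print(Y.shape)  # (12,)
```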

Now let’s split the data into a training set and a test set, holding out 20% of the rows for testing.

X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size = 0.2, random_state = 0)

Now let’s fit a linear regression to the training data:

regressor = LinearRegression()

regressor.fit(X_train, y_train)

Now we can predict the salary of an employee using the fitted regressor. The input values are:

total = 5

lead = 2

manager = 1

cert = 2
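To make the prediction step concrete, here is a self-contained sketch that repeats the split and fit from above and then passes the four input values to regressor.predict. The data array and the new_employee name are introduced only for this example; the exact number printed depends on the scikit-learn version and on how the model is trained, so no output is claimed here.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Columns: Total, Lead, Manager, Certifications, Salary (table from this post).
data = np.array([
    [1, 0, 0, 0, 20000],
    [1.5, 0, 0, 0, 23000],
    [2, 0, 0, 0, 25000],
    [2, 0, 0, 1, 30000],
    [2.5, 0, 0, 0, 27000],
    [3, 0, 0, 2, 30000],
    [3.5, 1, 0, 0, 33000],
    [3.5, 1, 0, 1, 35000],
    [3.5, 1, 0, 2, 40000],
    [4, 1, 0, 0, 35000],
    [4, 2, 0, 0, 40000],
    [4, 2, 0, 2, 43000],
], dtype=float)

X = data[:, :-1]
Y = data[:, 4]

# Same split parameters as in the post: 12 rows give 9 training and 3 test rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, Y, test_size=0.2, random_state=0
)

regressor = LinearRegression()
regressor.fit(X_train, y_train)

total, lead, manager, cert = 5, 2, 1, 2

# predict expects a 2-D array: one row per employee, one column per feature.
new_employee = np.array([[total, lead, manager, cert]])
predicted = regressor.predict(new_employee)
print(predicted[0])
```

One thing worth noticing in the sample data: the Manager column is 0 in every row, so the fit has nothing to learn about manager experience from this table, and changing manager should not move the prediction.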

We have created an api.py that listens for requests on port 5000.

The API call below returns a predicted salary of 48017.16738197425:

curl -X POST -d "total=5&lead=2&manager=1&cert=2" http://127.0.0.1:5000

{
  "predicted": 48017.16738197425
}
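The post does not show api.py itself, so the following is only a rough sketch of what such a file could look like, assuming Flask is used and the model is trained when the module loads. The route, the train_model helper, and the error response are all hypothetical; the real api.py in the linked repository may be organised differently.

```python
import numpy as np
from flask import Flask, jsonify, request
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Columns: Total, Lead, Manager, Certifications, Salary (table from this post).
DATA = np.array([
    [1, 0, 0, 0, 20000], [1.5, 0, 0, 0, 23000], [2, 0, 0, 0, 25000],
    [2, 0, 0, 1, 30000], [2.5, 0, 0, 0, 27000], [3, 0, 0, 2, 30000],
    [3.5, 1, 0, 0, 33000], [3.5, 1, 0, 1, 35000], [3.5, 1, 0, 2, 40000],
    [4, 1, 0, 0, 35000], [4, 2, 0, 0, 40000], [4, 2, 0, 2, 43000],
], dtype=float)

FIELDS = ("total", "lead", "manager", "cert")


def train_model():
    """Hypothetical helper: fit the regressor on the training split."""
    X, Y = DATA[:, :-1], DATA[:, 4]
    X_train, _, y_train, _ = train_test_split(
        X, Y, test_size=0.2, random_state=0
    )
    return LinearRegression().fit(X_train, y_train)


app = Flask(__name__)
regressor = train_model()


@app.route("/", methods=["POST"])
def predict():
    # Read the four form fields sent by curl -d "total=5&lead=2&...".
    try:
        features = [float(request.form[name]) for name in FIELDS]
    except (KeyError, ValueError):
        return jsonify({"error": "expected numeric total, lead, manager, cert"}), 400
    predicted = regressor.predict([features])[0]
    return jsonify({"predicted": float(predicted)})

# Assumed way to start the server on port 5000:
#   flask --app api run --port 5000
```

With the server running, the curl command above should return a JSON object with a single predicted key.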

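Now let’s check how well the model fits the held-out test data, as shown below.

r2 = regressor.score(X_test, y_test)

The score returned is 0.9321110353200895. For a regressor, score returns the coefficient of determination (R²) rather than a classification accuracy, so this means the model explains about 93% of the variance in the test-set salaries. A score of 1 would mean the predicted salaries match the test salaries exactly. With only 3 rows in the test set, treat this number as a rough indication rather than a reliable estimate.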
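To show what the number returned by score actually measures, here is a small sketch that computes R² by hand as 1 minus the ratio of the residual sum of squares to the total sum of squares. The r_squared function name and the three salary values are made up for illustration and are not taken from the dataset above.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Squared error of the model's predictions.
    ss_res = np.sum((y_true - y_pred) ** 2)
    # Squared error of always predicting the mean of y_true.
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Illustrative values only: each prediction is off by 1000.
y_true = [20000, 30000, 40000]
y_pred = [21000, 29000, 41000]

print(r_squared(y_true, y_pred))  # about 0.985
```

Here SS_res is 3,000,000 and SS_tot is 200,000,000, so R² is 1 - 0.015 = 0.985. A model that always predicted the mean salary would score 0, and a model worse than that would score below 0.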

Sample code can be found at https://github.com/abdunnasir/data-science-open/tree/master/multi_linear_regression