Contents:

  • What is Machine Learning?
  • Types of Machine Learning
  • What is regression?
  • Types of regression
  • What is Linear Regression?
  • Linear regression terminology
  • Advantages and disadvantages
  • Example

1. What is Machine Learning?

Machine Learning (ML) is a branch of Artificial Intelligence (AI), which is itself a branch of computer science. As the volume of data grows day by day, we can use it to automate tasks. Humans learn by gathering information and from past experience, which in technical terms can be considered data. In a similar way, we can make machines learn by themselves by providing them with data.

ML gives a machine the ability to learn from data and perform tasks that would otherwise need human judgement. The demand for machine learning has increased over time, and machine learning models are now implemented in many different industries.

ML can be used in sales prediction, stock prediction, email spam detection and in many more areas.

2. Types of Machine Learning

  • Supervised
  • Unsupervised
  • Reinforcement

3. What is Regression?

The term “Regression” was introduced by Francis Galton. He used it while studying how the heights of children relate to the heights of their parents: children of unusually tall or unusually short parents tend to be closer to the average height of the population. This is called “Galton’s law of universal Regression”.

The main goal of regression is to predict the value of a dependent variable and to describe the relation between the dependent and independent variables. Regression is also used as a statistical tool in housing and investing applications.

A regression analysis involves fitting a line over a set of data points so that it most closely matches the overall shape of the data.

4. Types of Regression

  • Simple Linear Regression
  • Support Vector Regression
  • Polynomial Regression
  • Decision Tree Regression
  • Random Forest Regression

5. What is Linear Regression?

Linear Regression is one of the simplest algorithms in Machine Learning. It shows the relationship between two variables using a linear equation.

Linear regression can be used to evaluate trends and to make estimates or forecasts. For example, it can be used to observe the impact of price changes on a consumer product, with quantity as the dependent variable and price as the independent variable.
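
As a rough illustration of the price and quantity case, here is a minimal sketch in plain Python. The numbers are made up for the example and the helper name fit_line is hypothetical; it fits the line quantity = b0 + b1 * price with the standard least-squares formulas and then forecasts the quantity at a new price.

```python
# Hypothetical (price, quantity sold) observations; not taken from a real data set.
prices = [1.0, 2.0, 3.0, 4.0, 5.0]
quantities = [98.0, 91.0, 79.0, 72.0, 60.0]


def fit_line(xs, ys):
    """Return (b0, b1) of the least-squares line y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y).
    b0 = mean_y - b1 * mean_x
    return b0, b1


b0, b1 = fit_line(prices, quantities)
forecast = b0 + b1 * 6.0  # estimated quantity at a price of 6.0
print(f"quantity = {b0:.2f} + ({b1:.2f}) * price")
print(f"forecast at price 6.0: {forecast:.2f}")
```

The negative slope says that, in this toy data, each unit increase in price goes with a drop in the quantity sold.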

Linear regression

6. Linear Regression Terminology

Cost function

The cost function measures how well a line with intercept b0 and slope b1 fits the data points. Finding the best-fit line then becomes a minimization problem: we look for the values of b0 and b1 that minimize the error between the actual values and the predicted values.
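
A small sketch of such a cost function, assuming the error is measured as the mean squared error (the MSE referred to under Gradient descent). The function name mse and the sample points are chosen only for this illustration.

```python
def mse(b0, b1, xs, ys):
    """Mean squared error of the line y = b0 + b1 * x on the given points."""
    n = len(xs)
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys)) / n


# Sample points that lie exactly on the line y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

perfect = mse(0.0, 2.0, xs, ys)  # the true line: every error is zero
worse = mse(0.0, 1.0, xs, ys)    # a poorer line: errors are 1, 2 and 3
print(perfect, worse)
```

The lower the value, the better the line fits, so the best b0 and b1 are the ones that make this number as small as possible.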

Gradient descent

It is a method of updating the b0 and b1 values to reduce the mean squared error (MSE). The idea is to keep adjusting b0 and b1 in small steps until the MSE reaches its minimum.
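
The update loop can be sketched as follows. This is a minimal version in plain Python; the learning rate, the number of steps and the sample data are assumptions picked so that this small example converges, not recommended settings.

```python
def gradient_descent(xs, ys, learning_rate=0.05, steps=5000):
    """Fit y = b0 + b1 * x by repeatedly stepping against the MSE gradient."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction error for every point with the current b0 and b1.
        errors = [(b0 + b1 * x) - y for x, y in zip(xs, ys)]
        # Partial derivatives of the MSE with respect to b0 and b1.
        grad_b0 = 2.0 / n * sum(errors)
        grad_b1 = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        # Move a small step in the direction that lowers the MSE.
        b0 -= learning_rate * grad_b0
        b1 -= learning_rate * grad_b1
    return b0, b1


# Sample points generated from y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
b0, b1 = gradient_descent(xs, ys)
print(round(b0, 4), round(b1, 4))
```

If the learning rate is too large the updates overshoot and the MSE grows instead of shrinking; if it is too small, many more steps are needed.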

7. Advantages and disadvantages

Advantages

  • One of the easiest ML algorithms to implement and train.
  • Over-fitting can be controlled using dimensionality reduction techniques, regularization and cross-validation.
  • It can extrapolate beyond the specific data set it was trained on.

Disadvantages

  • It assumes a linear relationship between the dependent and independent variables.
  • Linear regression is quite sensitive to outliers.
  • It is prone to multi-collinearity.
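
To see how sensitive the fitted line is to outliers, here is a small sketch with made-up numbers. The helper name slope is hypothetical; it computes the least-squares slope before and after one value is replaced by an outlier.

```python
def slope(xs, ys):
    """Least-squares slope b1 of the line y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
clean = [2.0, 4.0, 6.0, 8.0, 10.0]        # points exactly on y = 2x
with_outlier = [2.0, 4.0, 6.0, 8.0, 30.0]  # the last value replaced by an outlier

slope_clean = slope(xs, clean)
slope_outlier = slope(xs, with_outlier)
print(slope_clean, slope_outlier)
```

A single unusual point is enough to pull the slope far away from the trend of the other four points.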

8. Example

Spam detection for YouTube comments using Django and a Naive Bayes classifier

  1. Create a Django project
  2. Create an app in the Django project and create template and static folders inside that app
  3. Create a data folder in the root of the project
  4. Download a dataset of YouTube comments from the internet and place the CSV file inside the data folder
  5. In views.py create two functions

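Add these imports at the top of views.py

import os

import pandas as pd
from django.conf import settings
from django.shortcuts import render
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

This function displays the initial web page

def home(request):
    return render(request, 'home.html')

The function below predicts whether a comment is spam or not

def predict(request):
    # Only a submitted form carries a comment; otherwise show the start page
    if request.method != 'POST':
        return render(request, 'home.html')
    # settings.MEDIA_ROOT is assumed to point at the data folder created above
    df = pd.read_csv(os.path.join(settings.MEDIA_ROOT, 'YoutubeSpamMergedData.csv'))
    df_x = df['CONTENT']  # comment text
    df_y = df['CLASS']    # spam label
    # Turn every comment into a vector of word counts
    cv = CountVectorizer()
    X = cv.fit_transform(df_x)
    X_train, X_test, y_train, y_test = train_test_split(X, df_y, test_size=0.33, random_state=42)
    # Naive Bayes classifier
    clf = MultinomialNB()
    clf.fit(X_train, y_train)
    clf.score(X_test, y_test)  # accuracy on the held-out comments (not used further)
    # Vectorize the submitted comment with the same vocabulary and classify it
    comment = request.POST.get('comment', '')
    vect = cv.transform([comment]).toarray()
    my_prediction = clf.predict(vect)
    return render(request, 'result.html', {'prediction': my_prediction})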
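
To give a feel for what the word counting and the Naive Bayes classifier do, here is a rough sketch in plain Python. It is not scikit-learn's implementation: the six training comments are made up, the names tokenize and classify are hypothetical, and words are split on whitespace only. It does use the same idea of word counts per class with add-one (Laplace) smoothing.

```python
import math
from collections import Counter

# Tiny made-up training set (1 = spam, 0 = not spam); not the CSV used above.
comments = [
    ("check out my channel and subscribe", 1),
    ("free gift click this link", 1),
    ("subscribe for free gift", 1),
    ("great song love it", 0),
    ("this video is great", 0),
    ("love this song", 0),
]


def tokenize(text):
    """Lower-case the text and split it on whitespace."""
    return text.lower().split()


# Count how often each word occurs in each class.
word_counts = {0: Counter(), 1: Counter()}
class_counts = Counter()
for text, label in comments:
    word_counts[label].update(tokenize(text))
    class_counts[label] += 1
vocabulary = set(word_counts[0]) | set(word_counts[1])


def classify(text):
    """Return the class with the highest log-probability for the text."""
    scores = {}
    for label in (0, 1):
        total = sum(word_counts[label].values())
        # Start from the prior: the share of training comments in this class.
        score = math.log(class_counts[label] / len(comments))
        for word in tokenize(text):
            if word in vocabulary:  # words never seen in training are skipped
                count = word_counts[label][word]
                score += math.log((count + 1) / (total + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


print(classify("click the link for a free gift"))
print(classify("love this great song"))
```

Words such as "free" and "gift" appear only in the spam examples, so a comment containing them scores higher for the spam class.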

6. Add some comments in the box and click on predict.

Add comment and click on predict

7. It will display result – “Not spam”.

Result
