

Machine Learning Interview Questions and Answers

Machine learning is an application of AI that gives systems the capacity to learn and improve from experience without being explicitly programmed. These machine learning interview questions and answers from Besant Technologies were prepared by our trainers, who are skilled professionals with many years of experience. The questions and answers below are only a sample, picked as the most useful.

Best Machine Learning Interview Questions and Answers

Students trained at Besant Technologies have been placed in top MNCs with good salaries. We have received a lot of positive feedback on these machine learning interview questions and answers, which were analyzed and prepared through our tie-ups with top MNCs. Join the machine learning course at Besant Technologies in Chennai to get placed.

Q1). If a highly positively skewed variable has missing values and we replace them with mean, do we underestimate or overestimate the values?

In positively skewed data the mean is greater than the median, so replacing missing values with the mean overestimates them.
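
As a small illustration, the sketch below compares the mean and the median of a made-up, positively skewed sample. The `incomes` values are hypothetical and only serve to show the direction of the effect.

```python
import statistics

# Hypothetical, positively skewed sample: most values are small, a few are large.
incomes = [20, 22, 23, 25, 26, 28, 30, 95, 180]

mean_value = statistics.mean(incomes)
median_value = statistics.median(incomes)

# The few large values pull the mean above the median.
print("mean:", mean_value)
print("median:", median_value)
```

Filling a missing value with `mean_value` here would place it well above what a typical observation looks like.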

Q2). Why is Harmonic mean used to calculate F1 score and not arithmetic mean?

Because the harmonic mean gives more weight to lower values. Thus, we only get a high F1 score if both precision and recall are high.
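
A minimal sketch of the difference, using made-up precision and recall values; the helper names `arithmetic_mean` and `harmonic_mean` are illustrative only.

```python
def arithmetic_mean(a, b):
    return (a + b) / 2


def harmonic_mean(a, b):
    # F1 is the harmonic mean of precision and recall.
    return 2 * a * b / (a + b) if (a + b) > 0 else 0.0


# Hypothetical model: very high precision, very low recall.
precision, recall = 0.9, 0.1

average = arithmetic_mean(precision, recall)
f1 = harmonic_mean(precision, recall)

# The arithmetic mean hides the weak recall; the harmonic mean does not.
print("arithmetic mean:", average)
print("F1 (harmonic mean):", f1)
```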

Q3). Why do we convert categorical variables into factor? Which function is used in R to perform the same?

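Most machine learning algorithms require numbers as input. A factor stores each category as an integer code with a label, so converting categorical values to factors gives us numerical values. R's modelling functions also handle factors directly, so we don't have to create dummy variables ourselves.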

We can use both factor() and as.factor() to convert variables to factors.

Q4). Does 100% precision mean that our model predicts all the values correctly?

No. We can get perfect precision in many ways, but it doesn’t mean that our model predicts every value accurately. For example, if we make one single positive prediction and make sure it is correct, our precision reaches 100%. Generally, precision is used together with other metrics (such as recall) to measure performance.
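
The single-prediction case can be sketched as follows. The `actual` and `predicted` labels are invented for illustration (1 = positive, 0 = negative).

```python
# Hypothetical labels: 1 = positive, 0 = negative.
actual = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 0, 0, 0, 0, 0, 0, 0]  # only one positive prediction, and it is correct

tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)

precision = tp / (tp + fp)  # share of positive predictions that are correct
recall = tp / (tp + fn)     # share of actual positives that were found

print("precision:", precision)
print("recall:", recall)
```

Precision is perfect, yet three of the four actual positives are missed, which is what recall exposes.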

Q5). Give a drawback of gradient descent.

It does not always converge to the same point: on a non-convex function it may reach a local minimum instead of the global minimum.

Q6). What does linear in ‘linear regression’ actually mean?

It implies that the dependent variable should be a linear function of the parameters. For the same reason, polynomial regression is classified as linear even though it fits a non-linear relationship between the dependent and independent variables.
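
One way to see this is to write a quadratic model as a weighted sum of features. The sketch below uses a hypothetical `predict` helper and made-up coefficients: doubling the parameters doubles the prediction, while doubling the input does not.

```python
def predict(params, x):
    # Polynomial features of x; the model is a weighted sum of these features.
    features = [1.0, x, x ** 2]
    return sum(w * f for w, f in zip(params, features))


params = [1.0, 2.0, 3.0]  # hypothetical coefficients b0, b1, b2
x = 2.0

base = predict(params, x)
doubled_params = predict([2 * w for w in params], x)  # linear in the parameters
doubled_x = predict(params, 2 * x)                    # not linear in x

print(base, doubled_params, doubled_x)
```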

Q7). Logistic regression gives probabilities as its result, so how do we use it to predict a binary outcome?

A logistic model outputs a value between 0 and 1. To convert these probabilities into classes we use a decision threshold (decision boundary): observations with a probability above the threshold are assigned to one class and the rest to the other. The threshold can be set at 0.5 or at an unequal value, depending on the requirement.
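
A minimal sketch of turning probabilities into classes. The `scores` are hypothetical linear outputs of a model, and `classify` is an illustrative helper, not part of any library.

```python
import math


def sigmoid(z):
    # Maps any real-valued score to a probability between 0 and 1.
    return 1.0 / (1.0 + math.exp(-z))


def classify(probability, threshold=0.5):
    return 1 if probability >= threshold else 0


scores = [-2.0, -0.1, 0.3, 1.5]  # hypothetical linear scores (w.x + b)
probabilities = [sigmoid(z) for z in scores]

default_classes = [classify(p) for p in probabilities]
strict_classes = [classify(p, threshold=0.8) for p in probabilities]

print(default_classes)
print(strict_classes)
```

Raising the threshold makes the model more reluctant to predict the positive class, which is one way to trade recall for precision.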

Q8). How do we separate one dimensional, two dimensional and three-dimensional data?

One-dimensional data can be separated by a point, two-dimensional data by a line, and three-dimensional data by a plane. In general, n-dimensional data is separated by a hyperplane of dimension n-1.

Q9). What is Standardization and Normalisation? Give one advantage of each over the other.

Both are feature scaling techniques.

Standardization rescales a feature to zero mean and unit variance, and is less affected by outliers than Normalisation.

Normalisation rescales a feature to a fixed range (typically 0 to 1). Standardization doesn’t bound values to a specific range, which may be a problem for algorithms that expect inputs within a bounded range.
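
Both techniques can be sketched with the standard library alone. The `values` list is a made-up feature, and min-max scaling to [0, 1] is assumed as the form of Normalisation.

```python
import statistics

values = [1.0, 2.0, 3.0, 4.0, 100.0]  # hypothetical feature with one large value

# Standardization (z-score): subtract the mean, divide by the standard deviation.
mean = statistics.mean(values)
std = statistics.pstdev(values)
standardized = [(v - mean) / std for v in values]

# Normalisation (min-max scaling): map values into the range [0, 1].
lo, hi = min(values), max(values)
normalized = [(v - lo) / (hi - lo) for v in values]

print(standardized)
print(normalized)
```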

Q10). When should one use Mean absolute error over Root mean square error as a performance measure for regression problems?

When we have many outliers in the data, Mean absolute error is a better choice.
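
The reason is that RMSE squares each error, so a single large error dominates the result. A small sketch with invented values, where `mae` and `rmse` are illustrative helpers:

```python
import math


def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


actual = [10, 12, 11, 13, 50]  # the last value is an outlier
predicted = [11, 11, 12, 12, 12]

mae_value = mae(actual, predicted)
rmse_value = rmse(actual, predicted)

# Four errors of 1 and one error of 38: RMSE is pulled up far more than MAE.
print("MAE:", mae_value)
print("RMSE:", rmse_value)
```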

Q11). What type of learning is needed when the system needs to adapt to rapidly changing data?

Online learning, because each learning step is fast and cheap and the system can be trained by feeding it data instances sequentially.

Q12). What are some common unsupervised tasks other than clustering?

Visualization, dimensionality reduction, and association rule learning.

Q13). What are the three stages to build any model in Machine learning?

There are three stages to building a model in machine learning:

  1. Model building:- Choose a suitable algorithm for the model and train it according to the requirements of your problem.
  2. Model testing:- Check the accuracy of the model using the test data.
  3. Applying the model:- Make the required changes after testing and apply the final model.

Q14). What is machine learning?

The main difference between humans and computers is that humans learn from past experience, whereas computers need to be told what to do through a set of instructions. Machine learning is about getting computers to learn from their past experience the way humans do. For a computer, that past experience is called data. So there is nothing to fear about machine learning: it simply means training a computer, or machine, through data, much as a person learns through experience.

Q15). How is machine learning used at the moment?

As far as I know, many people already use machine learning in their everyday lives. When you engage with the internet, you express your preferences, likes and dislikes through your searches. All of this is picked up by cookies stored on your computer, and from it the behavior of a user can be evaluated, which helps to improve that user's experience on the internet. Navigation is another example, where machine learning is used with optimization techniques to find the distance between two places. One area where I think people will engage more with machine learning in the near future is health.

Example:- Watson is already being used in health. It looks at body scans and data and tries to recognize the symptoms of cancer. These are some of the ways machine learning is used at the moment.

Q16). What are the similarities & difference between machine learning and human learning?

Machine learning and human learning are actually quite similar. Machine learning is about an algorithm, or computer, engaging with its environment through data and adapting its code based on what it learns. Suppose a program fails to make the right prediction: it adjusts itself in some sense in order to make better predictions next time. That is very similar to the way a human learns, by engaging with the environment and learning from it. So machine learning has a kind of evolutionary aspect to it, which I think is quite new to the area of artificial intelligence.

Q17). What is the difference between A.I. and machine learning, and has A.I. been oversold for decades because of sci-fi?

Most people have probably heard of A.I. (artificial intelligence) more than of machine learning. Artificial intelligence goes back to Alan Turing, whose aim was to somehow make a machine have the sort of intelligence that a human might have, in particular a program that could convince you it is human if you chat with it. But I think artificial intelligence has evolved since then towards a distinct sort of intelligence that a machine might have. Machine learning has a slightly different quality. It is a more specific part of artificial intelligence, based on the idea that a program changes its own code as it interacts with the world, much as humans do. By the end, we might not actually know how the program is written, because it has been changing as it has been interacting. So when we look at the final program, we do not necessarily know why it makes its decisions in a particular way. There are a lot of connections between artificial intelligence and machine learning.

Q18). What kind of problems lend themselves to machine learning?

I think machine learning has become such a big deal because of big data. We now have access to so much data that a machine can interact with, and I think problems with a lot of data are where machine learning is going to make great progress. It is a way of exploiting that data. One of the big challenges for artificial intelligence is computer vision, which is something humans do incredibly well.

Example:- When humans look at a picture, they interpret it very well. For computers, this is very difficult.

That is because we used to try to program it from the bottom up. Now we can expose an algorithm to many pictures so that it learns as it goes. So I think the ability of a machine to view its environment, interpret it and read it is an area where a lot of progress can be made. Frankly, machine learning will be successful wherever there is data. Examples are recommendations on the internet and navigation: whenever we drive, we give the system new information, which it uses to adapt and improve. Likewise, health is one of the biggest fields where a machine can study far more data than doctors can study or keep track of.

Q19). Why is it important for the Royal Society to be doing a project about machine learning?

I think it is very important for the Royal Society to do a project on machine learning, to understand how much impact machine learning is going to have in the future. There are people who have not even heard of machine learning yet, and that is going to change in our society in the near future. The project is a way to address both the potential of these technologies and where people are right now, while the world moves forward with cutting-edge technology. I think it is all about transparency: we need to explain the potential of these things and where they can take us in the future. It is all about looking into the future to make predictions.

Q20). What is false positive and false negative in terms of machine learning?

Suppose you conduct a test or an experiment. If the actual outcome is negative but you predicted it as positive, the case is a false positive. A false negative is the opposite: the actual outcome is positive but you predicted it as negative.
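
Counting both kinds of error is straightforward once actual and predicted labels are side by side. The labels below are invented for illustration (1 = positive, 0 = negative).

```python
# Hypothetical labels: 1 = positive, 0 = negative.
actual = [1, 0, 1, 0, 0, 1]
predicted = [1, 1, 0, 0, 1, 1]

# False positive: actually negative, predicted positive.
false_positives = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)

# False negative: actually positive, predicted negative.
false_negatives = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)

print("false positives:", false_positives)
print("false negatives:", false_negatives)
```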

Q21). How do we decide whether a problem is a machine learning problem or not?

When you analyze a problem, check whether it contains patterns that cannot be expressed as mathematical equations. If it does, we need machine learning to extract those patterns from lots of data. These key features help to decide whether a problem is a machine learning problem or not.

Example 1:- We need to find whether a number is even or odd. This problem is very simple because we know the logic and the mathematics behind it: divide the number by 2; if the remainder is 1 the number is odd, and if the remainder is 0 the number is even. So the problem has a pattern, but we can solve it with a mathematical equation and we do not need a lot of data. We could turn it into a machine learning problem by feeding in lots of individual numbers labelled as odd or even and letting the machine classify new numbers. But since we already know the logic and the mathematics, this is definitely not a machine learning problem.

Example 2:- Say you have a lot of photos, and we need to find whether a particular photo contains a human face. There is a pattern here, the human face, that we need to find across all the photos. Can we solve this with mathematical equations? That is very difficult. Instead, we can feed this large amount of data to an algorithm as training data, which means training the machine on it. After training, we get a mathematical model based on the patterns found in the data, but humans cannot write this logic themselves. So this is definitely a machine learning problem: the machine automatically forms a rule from the training data, and that rule detects whether a photo contains a human face or not.

Q22). How do we know which machine learning algorithm is better for us to solve our problem?

If accuracy is the concern, test different algorithms and cross-validate them to see which gives good accuracy. As a rule of thumb, when the training dataset is small, use models with low variance and high bias; when the training dataset is large, use models with high variance and low bias. Following these guidelines makes it easier to find out which algorithm is better for solving your machine learning problem.

Q23). How is ML different from artificial intelligence?

AI is a way of making computers think, whereas ML is an application of AI that gives computers the ability to learn from experience.

Q24). Differentiate between statistics and ML.

Statistics is used to find relationships in data and to reach conclusions on the basis of evidence and reasoning about the data. ML builds on statistical ideas, but its focus is on learning from data and optimizing predictions rather than on explaining the data.

Q25). What are neural networks?

Neural network models are used to process data. They are functions modelled on biological neurons, which are found in the human brain. Since the job of ML is to find patterns in data, neural network models help to find patterns in complex data.

Q26). What is Normal Distribution?

When we plot the distribution of the data and the shape looks like a ‘bell curve’ with the mean at the center, it is called a normal distribution. This is a widely used distribution in statistics.

Q27). What is Standard Normal Distribution?

This is the same as the normal distribution, but with a mean of 0 and a standard deviation (SD) of 1.

Q28). Reasons for using Regression?

Regression is powerful. It’s versatile because it can be used for all kinds of data, including non-linear relationships. In fact, regression can be really thought of as the first crossover hit from machine learning that has gained wide acceptance in everyday life.

Q29). What is Residual?

The residuals of a regression are the difference between the actual and the fitted values of the dependent variable. If the regression was a perfect fit, the residuals would all be equal to 0.
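
As a short sketch, residuals are computed by subtracting each fitted value from the corresponding actual value. The `actual` and `fitted` numbers are made up and do not come from a real regression.

```python
# Hypothetical observed values and the values a fitted regression produced for them.
actual = [3.0, 5.0, 7.5, 9.0]
fitted = [3.2, 4.8, 7.0, 9.5]

residuals = [a - f for a, f in zip(actual, fitted)]

# A perfect fit reproduces the actual values, so every residual is 0.
perfect_residuals = [a - f for a, f in zip(actual, actual)]

print(residuals)
print(perfect_residuals)
```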

Q30). Give some examples of Machine Learning Scenarios?

Detecting credit card fraud : Suppose you have some number of credit card customers who are supplying their credit cards to some payment application. The challenge is to work out which of those transactions the application should reject, because they’re likely fraudulent.

Predicting customer churn : For each caller, you want the call center staff to be able to figure out how likely that customer is to churn, that is, switch to a competitor.

Predict imminent failure : Suppose we’ve got a bunch of devices, robots, thermostats, whatever, that generate lots of streaming data that’s being handled by some kind of real time data processing software. That software is looking for anomalies or patterns that predict imminent failure.

Q31). What is Categorical Data?

Categorical data consists of variables that take on a limited set of discrete values (categories). Such variables may need special treatment and preprocessing before you can feed them into a machine learning module, because many machine learning modules accept only numeric data.
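
One common preprocessing step is one-hot encoding, sketched below in plain Python. The `colors` column is a made-up example, and one-hot encoding is only one of several possible encodings.

```python
# Hypothetical categorical column.
colors = ["red", "green", "blue", "green"]

# Fix an order for the categories so every row uses the same columns.
categories = sorted(set(colors))

# One-hot encoding: one 0/1 column per category.
one_hot = [[1 if value == category else 0 for category in categories] for value in colors]

print(categories)
for row in one_hot:
    print(row)
```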

Q32). What are Collaborative Filtering Techniques?

Nearest Neighbors Model – Use the ratings of “most similar” users.

Latent Factor Analysis – Solve for underlying factors that drive the ratings.

Q33). What is Logistic Regression?

Logistic Regression helps estimate how probabilities of categorical variables are influenced by causes.

Q34). What is the difference between rule-based and ML-based approaches?

ML-based:

  • Dynamic
  • Experts optional
  • Corpus required
  • Training step

Rule-based:

  • Static
  • Experts required
  • Corpus optional
  • No training step

Q35). Suppose you have $100, which you can invest with a 10% return each year. Write code to calculate how much money you end up with after 7 years.
Note: 1) Complete the code in a single line. 2) Use some functional knowledge to solve it (which makes you smart).

print(100*(1.1 ** 7))

Q36). Create a list ‘a_list’ with the following elements: 1, 'hello', [1, 2, 3] and True.

a_list = [1, 'hello', [1, 2, 3], True]

Q37). Find the value stored at index 1 of ‘a_list’, where a_list = [1, 'hello', [1, 2, 3], True].

a_list[1]

Q38). Concatenate the following lists: A = [1, 'a'] and B = [2, 1, 'd'].

A = [1, 'a']
B = [2, 1, 'd']
A + B

Q39). What is the value of y after y = (3 + 2) * 2?

Ans: 10

Q40). What is the value of the variable ‘A’ after the following code is executed?

A = "1"

"1"

Q41). Find the value of the variable ‘C’ after the following code is executed.

A = "1"
B = "2"
C = A + B

"12"

Q42). The albums ‘Back in Black’, ‘The Bodyguard’ and ‘Thriller’ have the following music recording sales in millions: 50, 50 and 65 respectively.
Create a dictionary ‘album_sales_dict’ where the keys are the album names and the values are the sales in millions.

album_sales_dict = {"The Bodyguard": 50, "Back in Black": 50, "Thriller": 65}

Q43). Find the length of the tuple ‘genres_tuple’:
genres_tuple = ('pop', 'rock', 'soul', 'hard rock', 'soft rock', 'R&B', 'progressive rock', 'disco')

len(genres_tuple)

Q44). Generate a sorted List from the Tuple C_tuple=(-5,1,-3):

C_tuple = (-5,1,-3)

C_list = sorted(C_tuple)

C_list

Q45). Write an if statement to determine if an album had a rating greater than 8. Test it using the rating for the album ‘Back in Black’, which had a rating of 8.5. If the statement is true, print ‘Amazing !’.

rating = 8.5
if rating > 8:
    print("Amazing !")

Q46). Write an if-else statement that performs the following. If the rating is larger than 8, print "this album is amazing". If the rating is less than or equal to 8, print "this album is ok".

rating = 8.5
if rating > 8:
    print("this album is amazing")
else:
    print("this album is ok")

Q47). Write an if statement to determine if an album came out before 1980 or in the years 1991 or 1993. If the condition is true, print out the year the album came out.

album_year = 1979
if album_year < 1980 or album_year == 1991 or album_year == 1993:
    # Print the year itself, as the question asks.
    print(album_year)

Q48). Write a for loop that prints out all the elements between -5 and 5 using the range function.

for i in range(-5, 6):
    print(i)

Q49). Print the elements of the following list:
Genres = ['rock', 'R&B', 'Soundtrack', 'R&B', 'soul', 'pop'] Make sure you follow Python conventions.

Genres = ['rock', 'R&B', 'Soundtrack', 'R&B', 'soul', 'pop']
for Genre in Genres:
    print(Genre)

Q50). Write a while loop to display the values of the rating of an album playlist stored in the list ‘PlayListRatings’. If the score is less than 6, exit the loop. The list ‘PlayListRatings’ is given by:
PlayListRatings = [10, 9.5, 10, 8, 7.5, 5, 10, 10]

PlayListRatings = [10, 9.5, 10, 8, 7.5, 5, 10, 10]
i = 0
# Stop at the end of the list or at the first score below 6.
while i < len(PlayListRatings) and PlayListRatings[i] >= 6:
    print(PlayListRatings[i])
    i = i + 1

Q51). Write a while loop to copy the strings ‘orange’ of the list ‘squares’ to the list ‘new_squares’. Stop and exit the loop if the value on the list is not ‘orange’:
squares = ['orange', 'orange', 'purple', 'blue ', 'orange']
new_squares = []

squares = ['orange', 'orange', 'purple', 'blue ', 'orange']
new_squares = []
i = 0
# Stop at the end of the list or at the first value that is not 'orange'.
while i < len(squares) and squares[i] == 'orange':
    new_squares.append(squares[i])
    i = i + 1

Q52). How to address overfitting?

1. Reduce the number of features.

  • Manually select which features to keep.
  • Use a model selection algorithm.

2. Regularization.

  • Keep all the features, but reduce the magnitude/values of the parameters.
  • Works well when we have a lot of features, each of which contributes a bit to the prediction.

Q53). Use a stride value of 2 to print out every second character of the string ‘E’:

print(E[::2])

Q54). How does NLP relate to AI?

NLP:

– NLP is short for Natural Language Processing.

– It is concerned with processing and understanding natural (human) language, and is a sub-field of AI.

Q55). Multiply the numpy array y = np.array([1, 2]) with -2:

import numpy as np
y = np.array([1, 2])
-2 * y

Q56). Consider the lists [1, 2, 3, 4, 5] and [1, 0, 1, 0, 1]. Cast both lists to numpy arrays, then multiply them together:

a=np.array([1,2,3,4,5])

b=np.array([1,0,1,0,1])

a*b

Q57). Convert the lists [1, 0] and [0, 1] to numpy arrays ‘a’ and ‘b’. Then, plot the arrays as vectors using the function Plotvec2 and find the dot product:

a = np.array([1, 0])
b = np.array([0, 1])
Plotvec2(a, b)
print("the dot product is", np.dot(a, b))

Q58). Convert the following list to a set: ['rap', 'house', 'electronic music', 'rap']

set(['rap', 'house', 'electronic music', 'rap'])

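Q59). Consider the list A = [1, 2, 2, 1] and the set B = set([1, 2, 2, 1]). Does sum(A) = sum(B)?

No. sum(A) is 6, while sum(B) is 3, because the set keeps only the unique elements 1 and 2.

A = [1, 2, 2, 1]
B = set([1, 2, 2, 1])
print("the sum of A is:", sum(A))
print("the sum of B is:", sum(B))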

Q60). Create a new set ‘album_set3’ that is the union of ‘album_set1’ and ‘album_set2’:
album_set1 = set(['Thriller', 'AC/DC', 'Back in Black'])
album_set2 = set(['AC/DC', 'Back in Black', 'The Dark Side of the Moon'])

album_set3 = album_set1.union(album_set2)
album_set3

Q61). In the dictionary ‘soundtrack_dict’, what are the keys?
soundtrack_dict = {'The Bodyguard': '1992', 'Saturday Night Fever': '1977'}

The keys are 'The Bodyguard' and 'Saturday Night Fever'. The values are '1992' and '1977'.

Q62). Consider the variable ‘D’: use slicing to print out the first three elements:
D = 'ABCDEFG'

print(D[:3]) or print(D[0:3])

Q63). Content Based Filtering vs Latent Factor Analysis?

In Content Based Filtering:

  • Factors are identified by experts.
  • Factors are product attributes.

In Latent Factor Analysis:

  • Factors are derived using machine learning techniques.
  • Factors may be related to product attributes or may be abstract.

Q64). Python 3.x vs. Python 2.x

Python 3 is cleaner than its predecessor and is the actively developed version of the language. Python 2 reached end of life on 1 January 2020.

Python 2 was around for a long time, so older projects and some legacy packages still depend on it, but the major third-party packages have moved to Python 3.

Some Python 3 features were backported to Python 2, so legacy Python 2 code can still use those features.

Q65). What is Jupyter Notebook?

Jupyter Notebook has become a go-to tool and environment for most data scientists these days.

Jupyter Notebook, formerly known as IPython Notebook, has become an integral part of data science projects due to its ability to combine code blocks with human-friendly text that can be formatted using Markdown.

You can also view images and videos right in your notebook.

You can work on it using your favorite web browser.

It supports not only Python code: through other kernels you can also run code in other languages, such as R, Julia or Scala.

Q66). What are the different ways of extracting data?

  • Data from Databases
  • Data Through APIs
  • Data Using Web Scraping
  • Data from files like csv/xls/notepad

and so on.

Q67). What is Centrality Measure?

Centrality measure provides you a number that you can use to represent the entire set of values for a certain feature. This number will be central to the data, and that’s why we call it central tendency.
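
The usual measures of central tendency are the mean, the median and the mode. A short sketch with Python's standard `statistics` module, using a made-up `ages` feature:

```python
import statistics

ages = [22, 25, 25, 27, 31, 35, 52]  # hypothetical feature values

mean_age = statistics.mean(ages)      # arithmetic average
median_age = statistics.median(ages)  # middle value of the sorted data
mode_age = statistics.mode(ages)      # most frequent value

print(mean_age, median_age, mode_age)
```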

Q68). What is Data Munging?

Data munging involves activities such as looking into potential issues in the data and solving them using appropriate techniques.

Q69). How do we use cross-validation?

In the cross-validation setup, instead of two parts, we split the data into three parts: training, cross-validation, and test. You pass the training set to the model and apply the training process to get the trained model, and then evaluate the performance of the trained model on the cross-validation set. The test set is kept aside for the final evaluation.
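
A minimal sketch of such a three-way split in plain Python. The helper name `three_way_split`, the 60/20/20 proportions and the fixed seed are assumptions chosen for illustration.

```python
import random


def three_way_split(rows, train=0.6, validation=0.2, seed=0):
    # Shuffle a copy so the original order does not leak into the split.
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train = round(len(rows) * train)
    n_val = round(len(rows) * validation)
    # Whatever is left after training and validation becomes the test set.
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]


# Hypothetical dataset: 100 row indices.
train_set, validation_set, test_set = three_way_split(range(100))

print(len(train_set), len(validation_set), len(test_set))
```

The model is trained on `train_set`, tuned against `validation_set`, and only evaluated on `test_set` at the very end.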

Q70). What is Model Persistence?

Model persistence is a technique where you take your trained model and write or persist it to the disk. And once you have your model saved on the disk, you can use it whenever you want.
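
A minimal sketch using Python's built-in `pickle` module. The dictionary stands in for a trained model, and the file name `model.pkl` is a hypothetical choice; real model objects are saved and loaded the same way as long as they can be pickled.

```python
import os
import pickle
import tempfile

# Stand-in for a trained model: any picklable Python object works the same way.
model = {"weights": [0.4, -1.2], "bias": 0.7}

path = os.path.join(tempfile.gettempdir(), "model.pkl")  # hypothetical file name

# Persist the model to disk.
with open(path, "wb") as f:
    pickle.dump(model, f)

# Later, possibly in another program, load it back and use it.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored)
```

Only load pickle files from sources you trust, since unpickling can execute arbitrary code.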

Q71). Create a histogram of the life_exp data:
life_exp = [43.828, 76.423, 72.301, 42.731, 75.32, 81.235, 79.829, 75.635, 64.062, 79.441, 56.728, 65.554, 74.852, 50.728, 72.39, 73.005, 52.295, 49.58, 59.723, 50.43, 80.653, 44.74100000000001, 50.651, 78.553, 72.961, 72.889, 65.152, 46.462, 55.322, 78.782, 48.328, 75.748, 78.273, 76.486, 78.332, 54.791, 72.235, 74.994, 71.33800000000002, 71.878, 51.57899999999999, 58.04, 52.947, 79.313, 80.657, 56.735, 59.448, 79.406, 60.022, 79.483, 70.259, 56.007, 46.38800000000001, 60.916, 70.19800000000001, 82.208, 73.33800000000002, 81.757, 64.69800000000001, 70.65, 70.964, 59.545, 78.885, 80.745, 80.546, 72.567, 82.603, 72.535, 54.11, 67.297, 78.623, 77.58800000000002, 71.993, 42.592, 45.678, 73.952, 59.44300000000001, 48.303, 74.241, 54.467, 64.164, 72.801, 76.195, 66.803, 74.543, 71.164, 42.082, 62.069, 52.90600000000001, 63.785, 79.762, 80.204, 72.899, 56.867, 46.859, 80.196, 75.64, 65.483, 75.53699999999998, 71.752, 71.421, 71.688, 75.563, 78.098, 78.74600000000002, 76.442, 72.476, 46.242, 65.528, 72.777, 63.062, 74.002, 42.56800000000001, 79.972, 74.663, 77.926, 48.159, 49.339, 80.941, 72.396, 58.556, 39.613, 80.884, 81.70100000000002, 74.143, 78.4, 52.517, 70.616, 58.42, 69.819, 73.923, 71.777, 51.542, 79.425, 78.242, 76.384, 73.747, 74.249, 73.422, 62.698, 42.38399999999999, 43.487]

import matplotlib.pyplot as plt

plt.hist(life_exp,bins=10)

# Display histogram

plt.show()

Q72). Iterate over europe – Write a for loop that goes through each key:value pair of europe. On each iteration, print "the capital of x is y", where x is the key and y is the value.

europe = {'spain': 'madrid', 'france': 'paris', 'germany': 'berlin',
          'norway': 'oslo', 'italy': 'rome', 'poland': 'warsaw', 'austria': 'vienna'}

# items() yields each key:value pair directly.
for country, capital in europe.items():
    print("the capital of " + country + " is " + capital)
