Which of the following is not a correct application of classification?
A. credit scoring
B. tumor detection
C. image recognition
D. drug discovery
Support vector machines (SVMs) are a set of supervised learning methods used for:
A. Linear classification
B. Non-linear classification
C. Regression
Select the correct statement regarding naive Bayes classification:
A. It requires only a small amount of training data to estimate the parameters
B. The variables can be assumed to be independent
C. Only the variances of the variables for each class need to be determined
D. The entire covariance matrix needs to be determined for each class
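To make the options above concrete, here is a minimal sketch of parameter estimation for a Gaussian naive Bayes model. It assumes continuous features and uses a hypothetical helper name, fit_gaussian_nb, plus made-up toy data. Under the independence assumption it stores only a mean and a variance per feature per class, with no covariance terms.

```python
from collections import defaultdict
from statistics import mean, pvariance


def fit_gaussian_nb(rows, labels):
    """Estimate a prior, per-feature means, and per-feature variances per class."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    params = {}
    for label, members in by_class.items():
        columns = list(zip(*members))  # one tuple of values per feature
        params[label] = {
            "prior": len(members) / len(rows),
            "means": [mean(col) for col in columns],
            # Only variances are stored: features are treated as independent.
            "variances": [pvariance(col) for col in columns],
        }
    return params


# Toy data (illustrative only): two features, two classes.
rows = [(1.0, 2.0), (3.0, 4.0), (10.0, 20.0), (14.0, 24.0)]
labels = ["a", "a", "b", "b"]
params = fit_gaussian_nb(rows, labels)
```

With d features, this keeps 2d numbers per class, whereas a full covariance matrix would need on the order of d squared entries.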
Which of the following techniques can be used in the design of recommender systems?
A. Naive Bayes classifier
B. Power iteration
C. Collaborative filtering
D. A and C
E. B and C
As a data science consultant at ABC Corp, you are working on a recommendation engine for learning resources for end users. Which recommender system technique benefits most from additional user preference data?
A. Naive Bayes classifier
B. Item-based collaborative filtering
C. Logistic regression
D. Content-based filtering
Which of the following are true with regard to the k-means clustering algorithm?
A. Labels are not pre-assigned to each object in the cluster.
B. Labels are pre-assigned to each object in the cluster.
C. It classifies the data based on the labels.
D. It discovers the center of each cluster.
E. It finds which cluster each object falls into.
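For reference, the following is a minimal sketch of Lloyd's algorithm, the usual k-means procedure, run on made-up 2-D points. The function name kmeans, the fixed iteration count, and the starting centroids are assumptions chosen for illustration. Note that the input carries no labels: the routine returns both the cluster centers and the cluster index of each point.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    centroids = [tuple(c) for c in centroids]
    assignments = []
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for each point.
        assignments = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, assignments


# Unlabeled toy data (illustrative only).
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
centroids, assignments = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
```

A production implementation would normally stop when the assignments no longer change rather than run a fixed number of iterations.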
What are the advantages of mutual information over Pearson correlation for text classification problems?
A. Mutual information has a meaningful test for statistical significance.
B. Mutual information can signal non-linear relationships between the dependent and independent variables.
C. Mutual information is easier to parallelize.
D. Mutual information doesn't assume that the variables are normally distributed.
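As a small illustration of how the two measures can differ, the sketch below computes both on a made-up discrete data set where y is the square of x. The helper names pearson and mutual_information are assumptions for this example, and the mutual information estimate is the plain plug-in formula for discrete values, measured in bits.

```python
import math
from collections import Counter


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def mutual_information(xs, ys):
    """Plug-in mutual information (in bits) of two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


# y depends on x exactly, but not linearly (illustrative data).
xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]

r = pearson(xs, ys)
mi = mutual_information(xs, ys)
```

Here the Pearson correlation is zero because the relationship is symmetric around x = 0, while the mutual information is clearly positive because y is fully determined by x.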
What is the best way to evaluate the quality of the model found by an unsupervised algorithm like k-means clustering, given metrics for the cost of the clustering (how well it fits the data) and its stability (how similar the clusters are across multiple runs over the same data)?
A. The lowest cost clustering subject to a stability constraint
B. The lowest cost clustering
C. The most stable clustering subject to a minimal cost constraint
D. The most stable clustering
What is the best way to ensure that the k-means algorithm will find a good clustering of a collection of vectors?
A. Only consider values of k larger than log(N), where N is the number of observations in the data set
B. Run at least log(N) iterations of Lloyd's algorithm, where N is the number of observations in the data set
C. Choose the initial centroids so that they all lie along different axes
D. Choose the initial centroids so that they are far away from each other
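The idea of spreading out the initial centroids can be sketched with a simple farthest-point heuristic. This is an illustration under stated assumptions, not a prescribed method: the function name farthest_point_init is hypothetical, the first centroid is arbitrarily taken to be the first point, and the data are made up. The k-means++ scheme is a randomized relative of this idea that samples new centroids with probability proportional to squared distance.

```python
import math


def farthest_point_init(points, k):
    """Pick k initial centroids, each far from the ones already chosen."""
    centroids = [points[0]]  # assumption: start from the first point
    while len(centroids) < k:
        # Choose the point whose nearest chosen centroid is farthest away.
        next_point = max(
            points,
            key=lambda p: min(math.dist(p, c) for c in centroids),
        )
        centroids.append(next_point)
    return centroids


# Toy data (illustrative only): two tight pairs and one isolated point.
points = [(0, 0), (1, 0), (10, 0), (11, 0), (5, 9)]
initial = farthest_point_init(points, k=3)
```

On this data the heuristic picks one point from each of the three visually separate groups, which gives Lloyd's algorithm a reasonable starting position.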
Clustering is a type of unsupervised learning with the following goals:
A. Maximize a utility function
B. Find similarities in the training data
C. Not to maximize a utility function
D. A and B
E. B and C