DATABRICKS-CERTIFIED-PROFESSIONAL-DATA-SCIENTIST Online Practice Questions and Answers

Questions 4

Which of the following is not a correct application of classification?

A. credit scoring

B. tumor detection

C. image recognition

D. drug discovery

Questions 5

Support vector machines (SVMs) are a set of supervised learning methods used for:

A. Linear classification

B. Non-linear classification

C. Regression

Questions 6

Select the correct statement regarding naive Bayes classification:

A. It requires only a small amount of training data to estimate the parameters.

B. The variables can be assumed to be independent.

C. Only the variances of the variables for each class need to be determined.

D. The entire covariance matrix needs to be determined for each class.
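
As a rough illustration of the parameter estimates the options above refer to, here is a minimal Gaussian naive Bayes sketch in plain Python. The function names `fit_gaussian_nb` and `predict`, the variance floor, and the toy data are all assumptions made for this example and are not part of the exam material. Under the independence assumption, the sketch stores one mean and one variance per feature per class and never builds a covariance matrix.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate a prior plus per-feature means and variances for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # Assumed small floor so a variance never collapses to zero.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(X), means, variances)
    return model

def predict(model, row):
    """Return the class with the highest log posterior for one row."""
    def log_posterior(params):
        prior, means, variances = params
        score = math.log(prior)
        # Independence assumption: add one univariate Gaussian term per feature.
        for v, m, var in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score
    return max(model, key=lambda label: log_posterior(model[label]))

# Hypothetical toy data: two well-separated classes with two features each.
X = [[1.0, 2.1], [1.2, 1.9], [0.8, 2.0], [3.0, 4.1], [3.2, 3.9], [2.9, 4.0]]
y = ["a", "a", "a", "b", "b", "b"]
model = fit_gaussian_nb(X, y)
```

Each class entry in `model` holds a prior, a list of means, and a list of variances, so the number of stored parameters grows linearly with the number of features.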

Questions 7

Which of the following techniques can be used in the design of recommender systems?

A. Naive Bayes classifier

B. Power iteration

C. Collaborative filtering

D. A and C

E. B and C

Questions 8

As a data science consultant at ABC Corp, you are working on a recommendation engine that suggests learning resources to end users. Which recommender system technique benefits most from additional user preference data?

A. Naive Bayes classifier

B. Item-based collaborative filtering

C. Logistic Regression

D. Content-based filtering
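
For readers unfamiliar with item-based collaborative filtering, the following plain-Python sketch shows one common way it can be set up: each item is represented by the ratings users gave it, and items are compared by cosine similarity. The user names, resource names, ratings, and the helper functions `item_vectors`, `cosine`, and `most_similar` are hypothetical and exist only for this illustration.

```python
import math

# Hypothetical user-to-item ratings for learning resources.
ratings = {
    "alice": {"sql_course": 5, "spark_course": 4, "ml_book": 1},
    "bob": {"sql_course": 4, "spark_course": 5},
    "carol": {"spark_course": 2, "ml_book": 5},
}

def item_vectors(ratings):
    """Invert user -> item ratings into item -> user ratings."""
    items = {}
    for user, scores in ratings.items():
        for item, score in scores.items():
            items.setdefault(item, {})[user] = score
    return items

def cosine(a, b):
    """Cosine similarity between two sparse rating vectors keyed by user."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[u] * b[u] for u in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

def most_similar(item, ratings):
    """Return the other item whose rating vector is closest to `item`."""
    vectors = item_vectors(ratings)
    others = [other for other in vectors if other != item]
    return max(others, key=lambda other: cosine(vectors[item], vectors[other]))
```

In this sketch every additional rating extends the item vectors, so the similarity estimates are driven entirely by the user preference data.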

Questions 9

Which of the following are true with regard to the K-Means clustering algorithm?

A. Labels are not pre-assigned to the objects in each cluster.

B. Labels are pre-assigned to the objects in each cluster.

C. It classifies the data based on the labels.

D. It discovers the center of each cluster.

E. It finds which cluster each object falls into.
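
The mechanics behind the options above can be sketched with a small version of Lloyd's algorithm in plain Python. The function names `sq_dist` and `kmeans`, the fixed iteration count, the starting centroids, and the toy points are assumptions for this example. Note that the input contains no labels: the procedure alternates between assigning points to the nearest centroid and recomputing each centroid as the mean of its members.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda i: sq_dist(point, centroids[i]))

def kmeans(points, centroids, iterations=10):
    """Run a fixed number of Lloyd iterations from the given starting centroids."""
    for _ in range(iterations):
        # Assignment step: group points by their nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its members.
        centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else old
            for members, old in zip(clusters, centroids)
        ]
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels

# Hypothetical unlabeled points forming two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.0), (8.0, 8.0), (9.0, 9.0), (8.0, 10.0)]
centroids, labels = kmeans(points, [(1.0, 1.0), (8.0, 8.0)])
```

The returned `centroids` are the discovered cluster centers and `labels` gives the cluster index of each point.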

Questions 10

What are the advantages of mutual information over the Pearson correlation for text classification problems?

A. The mutual information has a meaningful test for statistical significance.

B. The mutual information can signal non-linear relationships between the dependent and independent variables.

C. The mutual information is easier to parallelize.

D. The mutual information doesn't assume that the variables are normally distributed.
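
To make the comparison in this question concrete, the sketch below computes both quantities on a toy discrete data set where y depends on x through y = x², a purely non-linear relationship. The data and the function names `pearson` and `mutual_information` are assumptions for illustration, and the example uses small integers rather than real text features.

```python
import math
from collections import Counter

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def mutual_information(xs, ys):
    """Mutual information in bits, from empirical frequencies of discrete values."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (count / n) * math.log2(count * n / (px[x] * py[y]))
        for (x, y), count in pxy.items()
    )

# Hypothetical data: y is fully determined by x, but not linearly.
xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]
r = pearson(xs, ys)
mi = mutual_information(xs, ys)
```

On this data the Pearson correlation is zero because the relationship is symmetric around x = 0, while the mutual information is clearly positive.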

Questions 11

What is the best way to evaluate the quality of the model found by an unsupervised algorithm like k-means clustering, given metrics for the cost of the clustering (how well it fits the data) and its stability (how similar the clusters are across multiple runs over the same data)?

A. The lowest cost clustering subject to a stability constraint

B. The lowest cost clustering

C. The most stable clustering subject to a minimal cost constraint

D. The most stable clustering

Questions 12

What is the best way to ensure that the k-means algorithm will find a good clustering of a collection of vectors?

A. Only consider values of k larger than log(N), where N is the number of observations in the data set

B. Run at least log(N) iterations of Lloyd's algorithm, where N is the number of observations in the data set

C. Choose the initial centroids so that they all lie along different axes

D. Choose the initial centroids so that they are far away from each other
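
One simple way to pick initial centroids that are far from each other is farthest-first traversal, sketched below in plain Python. This is an illustrative heuristic chosen for the example (a deterministic relative of k-means++ style seeding), not a method named in the question. The function names `sq_dist` and `farthest_first`, the choice of the first point as the starting centroid, and the toy points are assumptions.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def farthest_first(points, k):
    """Pick k starting centroids, each as far as possible from those already chosen."""
    # Assumption: start from the first point; a random start is also common.
    centroids = [points[0]]
    while len(centroids) < k:
        # Choose the point whose nearest chosen centroid is farthest away.
        next_point = max(
            points,
            key=lambda p: min(sq_dist(p, c) for c in centroids),
        )
        centroids.append(next_point)
    return centroids

# Hypothetical points forming three loose groups.
points = [(0.0, 0.0), (0.5, 0.0), (10.0, 0.0), (10.0, 1.0), (0.0, 10.0)]
seeds = farthest_first(points, 3)
```

The resulting `seeds` can be passed as starting centroids to any k-means implementation that accepts explicit initial centers.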

Questions 13

Clustering is a type of unsupervised learning with the following goals:

A. Maximize a utility function

B. Find similarities in the training data

C. Not to maximize a utility function

D. A and B

E. B and C

Exam Name: Databricks Certified Professional Data Scientist
Last Update: Jun 20, 2026
Questions: 138