A company's marketing team has asked for help in identifying a high performing long-term storage service for their data based on the following requirements:
The data size is approximately 32 TB uncompressed.
There is a low volume of single-row inserts each day.
There is a high volume of aggregation queries each day.
Multiple complex joins are performed.
The queries typically involve a small subset of the columns in a table.
Which storage service will provide the MOST performant solution?
A. Amazon Aurora MySQL
B. Amazon Redshift
C. Amazon Neptune
D. Amazon Elasticsearch
A smart home automation company must efficiently ingest and process messages from various connected devices and sensors. The majority of these messages are comprised of a large number of small files. These messages are ingested using Amazon Kinesis Data Streams and sent to Amazon S3 using a Kinesis data stream consumer application. The Amazon S3 message data is then passed through a processing pipeline built on Amazon EMR running scheduled PySpark jobs.
The data platform team manages data processing and is concerned about the efficiency and cost of downstream data processing. They want to continue to use PySpark.
Which solution improves the efficiency of the data processing jobs and is well architected?
A. Send the sensor and devices data directly to a Kinesis Data Firehose delivery stream to send the data to Amazon S3 with Apache Parquet record format conversion enabled. Use Amazon EMR running PySpark to process the data in Amazon S3.
B. Set up an AWS Lambda function with a Python runtime environment. Process individual Kinesis data stream messages from the connected devices and sensors using Lambda.
C. Launch an Amazon Redshift cluster. Copy the collected data from Amazon S3 to Amazon Redshift and move the data processing jobs from Amazon EMR to Amazon Redshift.
D. Set up AWS Glue Python jobs to merge the small data files in Amazon S3 into larger files and transform them to Apache Parquet format. Migrate the downstream PySpark jobs from Amazon EMR to AWS Glue.
A company analyzes historical data and needs to query data that is stored in Amazon S3. New data is generated daily as .csv files that are stored in Amazon S3. The company's analysts are using Amazon Athena to perform SQL queries against a recent subset of the overall data.
The amount of data that is ingested into Amazon S3 has increased substantially over time, and the query latency also has increased.
Which solutions could the company implement to improve query performance? (Choose two.)
A. Use MySQL Workbench on an Amazon EC2 instance, and connect to Athena by using a JDBC or ODBC connector. Run the query from MySQL Workbench instead of Athena directly.
B. Use Athena to extract the data and store it in Apache Parquet format on a daily basis. Query the extracted data.
C. Run a daily AWS Glue ETL job to convert the data files to Apache Parquet and to partition the converted files. Create a periodic AWS Glue crawler to automatically crawl the partitioned data on a daily basis.
D. Run a daily AWS Glue ETL job to compress the data files by using the .gzip format. Query the compressed data.
E. Run a daily AWS Glue ETL job to compress the data files by using the .lzo format. Query the compressed data.
A social media company is using business intelligence tools to analyze its data for forecasting. The company is using Apache Kafka to ingest the low-velocity data in near-real time. The company wants to build dynamic dashboards with machine learning (ML) insights to forecast key business trends. The dashboards must provide hourly updates from data in Amazon S3. Various teams at the company want to view the dashboards by using Amazon QuickSight with ML insights. The solution also must correct the scalability problems that the company experiences when it uses its current architecture to ingest data.
Which solution will MOST cost-effectively meet these requirements?
A. Replace Kafka with Amazon Managed Streaming for Apache Kafka. Ingest the data by using AWS Lambda, and store the data in Amazon S3. Use QuickSight Standard edition to refresh the data in SPICE from Amazon S3 hourly and create a dynamic dashboard with forecasting and ML insights.
B. Replace Kafka with an Amazon Kinesis data stream. Use an Amazon Kinesis Data Firehose delivery stream to consume the data and store the data in Amazon S3. Use QuickSight Enterprise edition to refresh the data in SPICE from Amazon S3 hourly and create a dynamic dashboard with forecasting and ML insights.
C. Configure the Kafka-Kinesis-Connector to publish the data to an Amazon Kinesis Data Firehose delivery stream that is configured to store the data in Amazon S3. Use QuickSight Enterprise edition to refresh the data in SPICE from Amazon S3 hourly and create a dynamic dashboard with forecasting and ML insights.
D. Configure the Kafka-Kinesis-Connector to publish the data to an Amazon Kinesis Data Firehose delivery stream that is configured to store the data in Amazon S3. Configure an AWS Glue crawler to crawl the data. Use an Amazon Athena data source with QuickSight Standard edition to refresh the data in SPICE hourly and create a dynamic dashboard with forecasting and ML insights.
A data analytics specialist is creating a solution that uses AWS Glue ETL jobs to process .csv and .json files as they arrive in Amazon S3. The data analytics specialist has created separate AWS Glue ETL jobs for processing each file type. The data analytics specialist also has set up an event notification on the S3 bucket for all new object create events. The event invokes an AWS Lambda function to call the appropriate AWS Glue ETL job to run.
The daily number of files is consistent. The files arrive continuously and take 5-10 minutes to process. The data analytics specialist has set up the appropriate permission for the Lambda function and the AWS Glue ETL job to run, but the solution fails in quality testing with the following error:
ConcurrentRunsExceededException
All the files are valid and are in the expected format for processing.
Which set of actions will resolve the error?
A. Create two separate S3 buckets for each file type. Create two separate Lambda functions for the file types and for calls to the corresponding AWS Glue ETL job.
B. Use job bookmarks and turn on continuous logging in each of the AWS Glue ETL job properties.
C. Ensure that the worker type of the AWS Glue ETL job is G.1X or G.2X and that the number of workers is equivalent to the daily number of files to be processed.
D. Increase the maximum number of concurrent runs in the job properties.
A company uses Amazon Redshift for data analysis. The data is not encrypted at rest. A data analytics specialist must implement a solution to encrypt the data at rest.
Which solution will meet this requirement with the LEAST operational overhead?
A. Use the ALTER TABLE command with the ENCODE option to update existing private information columns in the Amazon Redshift tables to use LZO encoding.
B. Export data from the existing Amazon Redshift cluster to Amazon S3 by using the UNLOAD command with the ENCRYPTED option. Create a new Amazon Redshift cluster with encryption enabled. Load data into the new cluster by using the COPY command.
C. Create a manual snapshot of the existing Amazon Redshift cluster. Restore the snapshot into a new Amazon Redshift cluster with encryption enabled.
D. Modify the existing Amazon Redshift cluster to use AWS Key Management Service (AWS KMS) encryption. Wait for the cluster to finish resizing.
A business intelligence (BI) engineer must create a dashboard to visualize how often certain keywords are used in relation to others in social media posts about a public figure. The BI engineer extracts the keywords from the posts and loads them into an Amazon Redshift table. The table displays the keywords and the count corresponding to each keyword.
The BI engineer needs to display the top keywords with more emphasis on the most frequently used keywords.
Which visual type in Amazon QuickSight meets these requirements?
A. Bar charts
B. Word clouds
C. Circle packing with words
D. Heat maps
A data analytics specialist has a 50 GB data file in .csv format and wants to perform a data transformation task. The data analytics specialist is using the Amazon Athena CREATE TABLE AS SELECT (CTAS) statement to perform the transformation. The resulting output will be used to query the data from Amazon Redshift Spectrum.
Which CTAS statement should the data analytics specialist use to provide the MOST efficient performance?
A. CREATE TABLE new_Table
WITH (
format = 'TEXTFILE',
orc_compression = 'SNAPPY')
AS SELECT *
FROM old_table;
B. CREATE TABLE new_Table
WITH (
format = 'TEXTFILE',
)
AS SELECT *
FROM old_table;
C. CREATE TABLE new_Table
WITH (
format = 'PARQUET',
parquet_compression = 'SNAPPY')
AS SELECT *
FROM old_table;
D. CREATE TABLE new_Table
WITH (
format = JSON,
)
AS SELECT *
FROM old_table;
An online retail company has an application that runs on Amazon EC2 instances launched in a VPC. The company wants to build a solution that allows the security team to collect VPC Flow Logs and analyze network traffic.
Which solution MOST cost-effectively meets these requirements?
A. Publish VPC Flow Logs to Amazon CloudWatch Logs and use Amazon Athena for analytics.
B. Publish VPC Flow Logs to Amazon CloudWatch Logs and stream log data to an Amazon OpenSearch Service cluster for analytics.
C. Publish VPC Flow Logs to Amazon S3 in text format and use Amazon Athena for analytics.
D. Publish VPC Flow Logs to Amazon S3 in Apache Parquet format and use Amazon Athena for analytics.
A large energy company is using Amazon QuickSight to build dashboards and report the historical usage data of its customers. This data is hosted in Amazon Redshift. The reports need access to all the fact tables' billions of records to create aggregations in real time, grouping by multiple dimensions.
A data analyst created the dataset in QuickSight by using a SQL query and not SPICE. Business users have noted that the response time is not fast enough to meet their needs.
Which action would speed up the response time for the reports with the LEAST implementation effort?
A. Use QuickSight to modify the current dataset to use SPICE.
B. Use AWS Glue to create an Apache Spark job that joins the fact table with the dimensions. Load the data into a new table.
C. Use Amazon Redshift to create a materialized view that joins the fact table with the dimensions.
D. Use Amazon Redshift to create a stored procedure that joins the fact table with the dimensions. Load the data into a new table.