Top 10 Redshift DBA Interview Questions!
Answers to these 10 interview questions should really be known by any qualified DBA candidates who have a decent level of Redshift knowledge and experience.
These questions imho are must-asks!
Q1. What is Redshift?
AWS Redshift is a fast, fully managed relational datastore offered by Amazon Web Services that is primarily used for data warehouse storage and processing. It allows one to run complex analytics type queries such as large and complex aggregations, etc. It allows a user to keep the ETL complexity low and great performance.
Q2. What is Redshift Spectrum?
It is an AWS Redshift offering that allows for analysis of large amounts of data in its native format without requiring you to load the data.
It gives the ability to work on data formats such as textile, csv, tsv etc.
Q3. What makes Redshift fast?
1) Storage – Columnar storage on high-performance local disks, and massively parallel query execution
2) MPP – Queries are distributed and parallelized across multiple physical resources
Q4. True or false, … standard ODBC and JDBC connections can be used against AWS Redshift to query over structured data using SQL-based clients?
Q5. What are some AWS Redshift technologies that can be made use of to protect your Redshift data in transit and at rest?
- Hardware Security Modules (HSM)
- AES-256 encryption (and encrypted volumes for data at rest)
- Virtual Private Cloud (VPC)
Q6. What is a leader node and compute node?
In a leader node the queries from the client are acknowledged and read and then the queries are parsed. execution plan developed, then the steps to process the queries are laid out on the leader node and (after the compute nodes process the result) the resultset is then sent back to the client.
As mentioned above, a compute node executes the steps generated by the leader node. The result is not immediately sent from the compute node to the client but instead it is sent back to the leader node..
Q7. What datasources can you use to load data into Amazon Redshift?
Some datasources which can be used to load data into Redshift are:
- AWS DynamoDB
- INSERT sql to load data via ODBC or JDBC clients
- AWS S3
- AWS Glue
- AWS Data Pipeline
- Amazon Elastic Map Reduce (EMR)
Q8. Why does using columnar storage (as is used in Redshift) for data warehouse make more sense for speed and storage?
The use case that Redshift is used for is primarily analytical queries, which are concerned with aggregates across many rows, hence a columnar structure performs much better.
Using columnar storage, each data block can hold column field values for as many as three times the records as row-based storage.
Q9. Name a few Amazon Redshift CLI/API commands that can be used to manage RS clusters?
Q10. What are the system views to query when troubleshooting slow performance on the queries in Redshift?
- Check queries and their start and run times in stv_recents
- Check table info in svv_table_info
- Check locks in stv_locks view