Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud.
DataRow helps you make sense of your Amazon Redshift data and analyze it interactively. You can use DataRow to analyze trends and make predictions using your data.
In this tutorial, you use the DataRow Notebooks to visualize data in the Amazon Redshift “Berlin Airbnb Listings” sample table.
In this tutorial you;
- Set up an environment to run DataRow Notebooks.
- Query and visualize Amazon Redshift data using DataRow Notebooks.
Setting Up the Environment
Before you begin;
- Go to Kaggle and download the Berlin Airbnb Sample Data.
- Login DataRow, or create one if you haven’t set up a DataRow account yet.
- Connect to the Amazon Redshift database you want to work on. Add a connection, if you haven’t done it yet.
Setting up the sample data;
- You’ll only use the listings.csv file from the sample database that we have downloaded before. Create a new table using the Create Table Tool.
- After creating your table, upload the file to S3 and transfer all the data to the “listings” table, using DataRow’s COPY Command Wizard. Visit the Help Center for detailed instructions on how to use DataRow COPY Command Wizard to import data into Amazon Redshift.
- You may need to troubleshoot the data load issues after processing the COPY command. To do so, use DataRow’s Amazon Redshift System Tables tool.
Querying and visualizing Amazon Redshift data
In this section, you create a DataRow Notebook used to query and visualize data in Amazon Redshift. You create visualizations using the data in the Berlin Airbnb Sample data.
Querying the Data
- To get the data ready for querying, select the “listings” table that you created before, and double-click it. DataRow will automatically bring the needed SQL code into the editor.
- Click RUN and get the result of the query.
- Click Open Notebooks button and wait for DataRow to perform the action. The query result from Amazon Redshift now appears in your Notebook.
Visualizing the Data
Now, your environment is ready for analysis and visualization. You’ll continue with a sample Python code which shows the average prices of Airbnb listings by room types.
Sample Python Code
import seaborn as sns import matplotlib.pyplot as plt import numpy as np import pandas as pd #Set my query result to data variable data = datasets["dataset1"] #Group by room types room_types = data.groupby(['room_type']) room_types['price'].agg(np.mean).plot(kind='bar') plt.show()
- Click +Add Paragraph just below the default paragraph to open a new one.
- Copy the sample Python code above and paste it to the new paragraph.
- Click the Run button at the top right corner of the paragraph and perform the Python code.
- You’ll get the visual analysis of the average pricing of the Berlin Airbnb listings by room types.