Are you looking for some Excel tips and tricks to help you be a stronger analyst at work?
Well, you’ve come to the right place.
For me, one of the best classes I took at university was an Excel class. It was extremely beneficial, and I learned a lot. I was lucky to have taken it, because I had friends who started internships or jobs after graduation with no Excel skills at all. I even remember one of my fellow interns who had no idea how to use Excel and would always ask me Excel questions!
Having worked for some big firms worldwide, I can guarantee you strong Excel skills are a must for data analysis. Most of the financial world runs on Excel. So, having these skills will ensure you will be technically competent in your career.
Let’s get started.
This one tip made my data analysis far more efficient and gave me a massive boost in confidence in what I was doing.
For most of you working a 9-5, it’s challenging to stay alert the entire day. So, not having to waste energy switching between the keyboard and mouse is helpful as well.
For instance, when you have a 20-sheet Excel file with thousands of columns and rows, it makes a lot of sense to use hotkeys. I’ve seen coworkers scroll down through thousands of rows, which can take 20-30 seconds. Pressing Ctrl + Down instead jumps to the end of the data in about a second!
Knowing hotkeys makes navigating through your Excel workbook easy and, quite honestly, fun.
So how do we accomplish this?
First, I recommend learning shortcuts and hotkeys for the current actions you use. For instance, if you’re currently building a spreadsheet, some simple hotkeys you can use are Ctrl + C for copying and Ctrl + V for pasting.
From there, any other actions in Excel, you can learn as you go. Googling will be your best friend for this.
To help you get started, I’ve created a list of my favourite hotkeys for you to use.
First of all, what is hardcoding?
Hardcoding is when cells contain numbers or text with no explanation behind them.
Hardcoding causes problems for multiple reasons, including:
- Users reviewing your work will have to ask where these numbers came from
- If you need to make changes later on, you may need to go back and fix the hardcoded cells.
So, when performing an analysis on a dataset or any other Excel exercise, avoid filling cells with unexplained numbers; use a formula that references other cells instead (e.g. =A1*B3 rather than typing 0.05).
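To make the difference concrete, here’s a quick sketch (the cell layout and the 5% rate are made up for illustration):

```
Hardcoded (bad):    =B3*0.05     <- a reviewer has no idea where 0.05 came from
Referenced (good):  =B3*$A$1     <- A1 holds the rate in a labelled cell, so one
                                    change to A1 updates every formula that uses it
```

The second version also answers the reviewer’s question for free: they can just look at the label next to A1.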
When we receive data from clients or surveys, very rarely is the data clean.
The most common issues with messy datasets are:
- Missing data values
- Different formatting
- Incorrect spelling
- Incorrect logic
Here is my 3-step process for cleaning datasets before conducting any data analysis.
Whenever I open a huge dataset, I always get a little overwhelmed. Not knowing where to start can make you quite anxious. Before we clean the data, we want to have a broad understanding of what’s going on.
To help you understand this step, I’ll be walking you through this dataset of bike sale transactions from Kaggle.
The first step of this process is to look at the data from a top-down approach. You can do this by looking at the first row of the dataset and quickly interpreting the row.
In the “Transactions Data” Excel sheet above, I have highlighted the first row in red. You can quickly create filters by pressing Ctrl + Shift + L.
Some insights I’ve garnered from this dataset are:
- There are 13 columns. You can find this by looking at the bottom right of the spreadsheet.
- The “Transaction ID” column is the unique ID for the dataset. Unique IDs are essential to datasets, as they differentiate the rows.
There should not be any identical transaction IDs. You can check for duplicates in multiple ways. The simplest way is to go to the Data ribbon and click on the “Remove Duplicates” button.
No duplicate values were found after running “Remove Duplicates”.
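If you’d rather flag duplicates without removing anything, a helper column works too. Assuming the transaction IDs sit in column A (adjust to your layout):

```
=COUNTIF(A:A, A2)>1     <- TRUE if the ID in A2 appears more than once
```

Fill this formula down next to the data; any row showing TRUE has a duplicated transaction ID, and you can filter on the helper column to inspect them before deleting.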
- Each category label (e.g. online_order) makes sense, as in there is some relevance between the category label and bike transactions. For instance, if I noticed a column labelled “Clothes”, that would be a red flag.
- Products from transactions are categorized under the following columns: product ID, brand, product line, product class, product size, list price, and standardized price.
- Transactions are categorized under the following columns: customer ID, transaction date, online order, and order status.
Analysis: To conclude, all transactions are unique. The bike transaction data is categorized into thirteen columns: seven columns describe the product purchased, and four columns describe the transaction details.
Simple enough, right?
We’ve gained some insights into the dataset, and now it’s time to dig further. Part of doing data analysis is being a detective and understanding your data.
Next, we can use data filters to scan through all the unique row items for each column. Data filters help break down the dataset without having to go through every row. Some critical insights derived from the filters are:
- Product ID: For product ID, it seems there are 100 unique product IDs. For reference, there are 20,001 rows of transactions.
- Customer ID: It’s not clear at a glance how many unique customer IDs are in the dataset, but I can check later when doing the data analysis. One observation, however, is that the IDs go up by one until customer ID 3500. From customer IDs 3500 to 5034, there appears to be missing transaction data.
- Transaction Date: For the transaction date column, all transactions occurred in 2017, and there is no missing data for any of the months.
- Online Order: For the online order column, it appears there are three options. It can be set to True for an online order or False for a non-online order.
Interestingly enough, there is a third option: blank. There are 360 blank rows, for which we are unsure whether the transaction was an online order.
- Order Status: No issues identified for order status. It can be set to either approved or cancelled.
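A couple of formulas can confirm what the filters suggest. The column letters below are assumptions, so point them at wherever product ID and online order actually sit in your sheet:

```
=SUMPRODUCT(1/COUNTIF(B2:B20001, B2:B20001))   <- count of unique product IDs
                                                  (errors out if the range has blanks)
=COUNTBLANK(E2:E20001)                         <- number of blank online-order cells
```

The first is the classic unique-count trick: COUNTIF returns each value’s frequency, so the reciprocals sum to exactly 1 per distinct value.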
To summarize, data filters let you find missing data and draw insights by scouring the entire dataset without having to check each row for missing values or other issues.
From step 2, it appears the most common issue in our dataset is missing data.
We can quickly filter each column for blank labels and remove those rows from the dataset. This is the essence of cleaning a dataset.
From the dataset, there are approximately 197 rows with blanks. The first step is selecting all the cells with blanks.
We can do this by selecting cell A139, then pressing Ctrl + Shift + Right and Ctrl + Shift + Down to select the data. Then, pressing Ctrl + – deletes the selected rows. It’s crucial we delete the cells rather than just clearing their contents, or else we’ll have empty rows in our dataset.
For this data analysis, it’s essential to know what we’re looking for. In this case, I’m going to ask the question:
With that in mind, the following tips can help us answer this question.
Pivot tables are incredibly useful in data analysis. There is a slight learning curve to understanding how these tables work, but once you learn it, you’ll be able to summarize, organize, and create insights.
Let’s walk through some pivot tables I created from the bike transaction dataset.
Pivot Table 1
For this pivot table, I’ve summarized each bike brand’s transactions as a % of total transactions. In doing so, it appears the company’s most popular bike sales are attributed to Solex bikes.
Insight 1: The company should focus on selling Solex Bikes and perhaps create a partnership with Solex. These bikes tend to sell faster and are much more popular.
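If you ever want to sanity-check a pivot-table percentage with a plain formula, a COUNTIF does the job. Assuming the brand column is G (hypothetical; adjust to your sheet):

```
=COUNTIF(G2:G20001, "Solex") / COUNTA(G2:G20001)
```

Format the cell as a percentage and it should match the pivot table’s figure for Solex.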
Pivot Table 2
For this pivot table, I’ve summarized the % of online orders between True and False, and it appears the orders are split fairly evenly between online and non-online orders.
Speculative Insight: There are not too many insights we can draw from this pivot table. However, one speculative insight is if the company had invested money in online marketing campaigns, we could conclude the online campaigns were ineffective.
Pivot Table 3
For this pivot table, I’ve summarized the transactions for each bike size as a % of total transactions. In doing so, it appears the medium bike size is the most popular at 65.66%, versus the large and small bike sizes.
Insight 2: From an operations perspective, the company can focus its supply chain on stocking more medium-sized bikes instead of large and small bikes.
For instance, if the company has a new bike model coming into the store, the company can supply one large-sized bike and one small-sized bike and three medium-sized bikes.
- Index, Match or Vlookup
When performing data analysis, you mustn’t make changes directly to the raw dataset. Users may be confused when trying to understand your model, as it would be difficult to differentiate between your analysis and the underlying data.
To solve this, you can create a new sheet in the workbook and reference raw data using Index and Match, or Vlookup.
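As a sketch, assuming the raw data sits in the “Transactions Data” sheet with transaction IDs in column A and the value you want in column D, either of these pulls a value into your analysis sheet without touching the raw data:

```
=VLOOKUP($A2, 'Transactions Data'!$A:$D, 4, FALSE)

=INDEX('Transactions Data'!$D:$D, MATCH($A2, 'Transactions Data'!$A:$A, 0))
```

Index and Match is the more flexible of the two, since the lookup column doesn’t have to be the leftmost one in the range.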
Now, for the visualization portion. We’ve already done the heavy lifting of data cleaning and data analysis. We need to finalize all our hard work and present our data analysis in easy-to-digest, bite-sized pieces.
Conditional formatting is great for helping you quickly develop insights from your spreadsheet, but you must use the right colours. I typically stick to red and green, as the two colours are generally understood to mean bad and good, respectively. However, there may be instances where it’s appropriate to use other colours.
Charts can be a simple way to show insights that are much more visually appealing to users. Executives at firms love graphs. It makes things super easy to understand, and quite honestly, not dull.
Microsoft Excel is an excellent tool for data analysis, and many take it for granted since there are so many other tools, such as Python and R, that can be used. But just knowing how to use Excel effectively can be great for data analysis.
I hope you’ve learned some Excel tips from this data analysis case study, and if you have any suggestions or tricks you know, feel free to drop a comment in the comment section below!