Data Analysis Project 1 (Power BI) - HR Relevant Data with Staff Attendance Insights
Importing Data, Transposing Data, and Metrics Creation (DAX) with Power Bi
Table of contents
- Project Details
- Importing the data from Excel and Building the Database & Dashboard on Power BI
- Step 1: Import the data into Power BI
- Step 2: Create the template
- Step 3: Transpose the template data
- Step 4: Remove data type errors
- Step 5: Create a Parameter
- Step 6: Create a Function
- Step 7: Load the prepared data into Power BI
- Step 8: Metrics/Measures creation
- Step 9: Dashboard building & more measures
- Recommendations
- My Portfolio
- REFERENCES:
Project Details
Company: AtliQ
Stakeholders:
Founder of Atliq
HR Manager
Task:
- Understand the work preference of staff, between working from home and working in the office. Also the frequency on what days of the week.
(Why? For better capacity planning for a hybrid system, better team bonding activity planning and to save office rental space/size costs.)
- Understand the percentage of sick leave and frequency of what days of the week.
(Why? For adequate office sensitization procedures in the case of an epidemic/pandemic and putting appropriate health precautionary measures in place.)
SUMMARY:
Staff Working Preference
Staff Sick Leave Percentage
Importing the data from Excel and Building the Database & Dashboard on Power BI
If you don't have Power BI already on your Windows desktop, you can get it from the Microsoft Store.
Step 1: Import the data into Power BI
Open Power BI and click on “Import data from Excel”.
Right-click on the file and select “Transform data” (to combine data from multiple sheets). A power query editor, which is a data preparation feature of Power BI, opens up containing the different sheet titles.
Note: The goal is to put the dates in one column and not have the dates in separate columns as seen in the first image. We are transforming the dataset from the wide format to the long format for better data appending.
Step 2: Create the template
Right-click on the "Attendance Sheet 2022..." query and duplicate it. Then rename the duplicate to “Template”.
Left-click on the dropdown beside the "Name" column to filter the rows and leave the “Apr 2022” row.
Right-click on the "Data" column and select "Remove Other Columns".
Click on "Table" to expand the data table.
Note: The template is created to be applied to all available sheets and future sheets added.
Step 3: Transpose the template data
Delete the "Change Type" step. If it is not deleted and the transformation is applied to the other sheets an error will be thrown, it would look out for similar column names and the sheets have different column names, in this case (Apr 1, May 1, ...).
Click on the table dropdown and select “Use First Row as Headers”.
Click on the table dropdown and select “Remove Top Rows...”
In the pop-up enter “1” for just the first row.
Rename the first two columns to "Employee Code" & "Name".
Select the first two columns “Employee Code" & "Name” (Ctrl & click).
Select the “Transform” tab and click on the Unpivot dropdown then select “Unpivot other columns”.
Rename the "Attributes" column to "Date".
Note: "Unpivot" is the same as transpose, which puts the rows into a single column. Observe the dates are in rows and paired with the accurate names and employee codes.
Step 4: Remove data type errors
The data has some insights in the "Date" column that are "Text" data types, therefore the data type needs to be changed to "Date" data type. "Error" shows in the place of the text insights after this change, as a result, the errors need to be removed. Removing these errors is for future automation purposes so that when the Excel sheet is updated with new attribute columns they are not automatically displayed in this modified "Date" column.
Right-click on the "Date" column and select "Change Type" then select "Date".
Right-click on the "Date" column and select “Remove Errors”.
Note: When transforming data, consider putting dynamic structures in place when creating solutions.
Step 5: Create a Parameter
A parameter is a dynamic way of filtering the data.
In the Home tab, select the dropdown for “Manage parameters” and select “New Parameter”.
In the prompt locate the Name input and enter “Worksheet”, for Type enter “Text” and Current Value enter “Apr 2022”.
Go back to the Template query and click on the applied step titled “Filtered rows”.
Select the gear/setting icon beside the step for the pop-up to display
Select the "Parameter" option which filters with “Worksheet” and not with Apr 2022 directly.
Note: To apply this transformation to all the sheets a function needs to be created.
Step 6: Create a Function
All the transformation steps are encapsulated in the “Template” query, therefore to apply it to all the other sheets a function is created for code reusability and to reduce repetition.
Right-click on “Template” select “Create Function” and enter “GetData” as the function name in the prompt.
Go back to “Attendance sheet 2022 - 2023…” left-click on the dropdown icon beside the "Name" column to filter the rows and remove the "Attendance Key" row.
To add the function as a column go to the “Add Column” tab and select “Invoke custom column".
In the prompt under “New column name” & “Function Query” enter “GetData”
Errors pop up due to the column names missing April date headers. Go to “Template” and identify the step that references the column headers, the first “Change Type” and delete the step.
Go back to “Attendance sheet 2022-2023…”. To expand the data click on the icon beside the “GetData” column name and deselect “Use original column name as prefix”.
"Ctrl and right-click" to select multiple columns, "Name, Data, Kind, Hidden", to delete the columns right-click and select "Remove Columns".
Rename the “Item” column to “Sheet Name” and “Name.1” to “Name”.
Right-click on the columns "Employee Code, Value, and Name" and change their data type to "Text". Then change the "Date" column to the date data type.
Note: Ensure the data is rechecked in random sections to ensure correlation with the Excel data. Whatever is added to the Excel data over time will reflect.
Step 7: Load the prepared data into Power BI
Only what is needed is loaded to the Power BI for visualization therefore the load on “Template” is disabled.
Right-click on "Template" and select "Enable Load" to disable it.
Rename “Attendance sheet 2022-2023…” to “Final Data”
Go to the “Home” tab and click on “Close & Apply” to load into Power BI.
Note: When you have date values in multiple columns remember to bring them into one column for easy working/format creation when using Power BI.
Step 8: Metrics/Measures creation
Note: Measures are created using DAX which is a query expression language.
Metrics are quantitative assessments that enable decision-making. In Power BI metrics are equal to measures. We are looking for two main metrics for this project:
Percentage of workers from home (working preference).
Percentage of sick leave.
Note: More metrics/measures will be created for more specific insights.
I.
Measure: Total Working days
Select "Final Data" and switch to the "Table view" then click on the "Home” tab and select “Enter data”.
In the prompt under the “Name” input enter “Measure Table” to enter the data manually.
Right-click on “Measure Table” and select “New measure”.
- Change “Measure” to “Total Working Days” and enter the following code:
Total Working Days = Var totaldays = COUNT('Final Data'[Value]) Var noworkdays = CALCULATE(COUNT('Final Data'[Value]), 'Final Data'[Value] in {"WO", "HO"}) RETURN totaldays - noworkdays
Note: WO (weekly off) and HO (holiday off) need to be removed from the count. The noworkdays
variable uses the CALCULATE
function to only count "WO" and "HO" from "Final Data" and then subtract them from the totaldays
count to get the Total Working Days
.
- Drag the “Total working days” measure, to the canvas on the “Report View” tab for visualization. Select the “Card” to change it to the number visualization.
Now we have the total working days we would like to know the total present days of the staff. That is, how many days are staff present out of the total working days. WFH(work from home) and HWFH (half work from home) are also considered as present days. We need to create a column in the “Final Data” table that calculates these as 1 and 0.5 respectively to have an accurate total count of present days.
II.
Measure: Total Present Days (WFH & HWFH Count)
- Navigate back to the "Table View" and create a new column called "WFH Count" with the following code:
WFH Count = SWITCH(
'Final Data'[Value],
"WFH", 1,
"HWFH", 0.5,
0)
Note: Right-click on the “Value” column to filter only “HWFH” and “WFH” This is to confirm that the code works and the “WFH Count” column should reflect 0.5 beside HWFH and 1 beside WFH and all the other values should be 0.
- Create a new measure called “WFH count” to get the sum count with the following code:
WFH Count = SUM('Final Data'[WFH Count])
- Create a new measure, "Present Days" with the following code.
Present Days =
Var Presentdays = CALCULATE(COUNT('Final Data'[Value]), 'Final Data'[Value] = "P")
RETURN
Presentdays + [WFH Count]
The Present Days
code is updated by adding the WFH Count
measure to the Presentdays
variable being returned.
- Drag the measure: “Present Days” to the canvas on the “Report View” tab for visualization. Select the “Card” to change it to the number visualization.
III.
Measure: Present %
- Create a new measure, Present % with the following code:
Present % = DIVIDE([Present Days], 'Measure Table'[Total Working Days],0)
Drag the “Present %” measure, to the canvas on the “Report View” tab for visualization. Select the “Card” to change it to the number visualization.
Left-click on the “Present %” measure and select “%” under the “Measure tools” tab to change the decimal to percentage.
For better visualization practices, a month column will be created in the table to visualize the data dynamically by month.
- Navigate to the “Final Data” table view section and create a new column “Month” for the months only from the dates, with the following code:
Month = STARTOFMONTH('Final Data'[Date])
- Change the display format to "Mmm yyyy". That is in the “Format” input under the “Column tools” tab.
Navigate back to the "Report View", drag "Month" to the canvas, and remove the “quarter” and “day” selections.
Select “Slicer” from the Build Visual, Visualizations section to change the data display.
Step 9: Dashboard building & more measures
The most important insights are at the TOP LEFT of the dashboard.
IV.
Measure: Work from Home Percentage (WFH %)
- Create a new measure, WFH % with the following code:
WFH % = DIVIDE([WFH Count], [Present Days], 0)
Note: 0 is added so that no error is thrown if there is an alternate value.
Drag the “WFH %” measure, to the canvas on the “Report View” tab for visualization. Select the “Card” to change it to the number visualization.
Left-click on the “WFH %” measure and select “%” under the “Measure tools” tab to change to percentage.
V.
Measure: Sick Leave Insights (SL %)
Create a column in the “Final Data” table that calculates “SL”(sick leave) and “HSL” (half sick leave) as 1 and 0.5 respectively to have an accurate total count of sick leaves.
- Navigate back to the "Table View" and create a new column called "SL Count" with the following code:
SL Count =
SWITCH(
'Final Data'[Value],
"SL", 1,
"HSL", 0.5,
0)
Note: Right-click on the “Value” column to filter only “HSL” and “SL” This is to confirm that the code works and the “SL count” column should reflect 0.5 beside HSL and 1 beside SL and all the other values should be 0.
- Create a new measure, “SL count” which gets the sum of the count with the following code:
SL Count = SUM('Final Data'[SL Count])
- Create a new measure, “SL %” with the following code:
CODE: SL % = DIVIDE([SL Count], [Total Working Days], 0)
Drag the “SL %” measure, to the canvas on the “Report View” tab for visualization. Select the “Card” option in the build visualization tab to change it to the number visualization.
Left-click on the “SL %” measure and select “%” under the “Measure tools” tab to change to percentage.
To understand who is working from home and on which days(underlying data), these are the next steps that help answer the "WHY?".
Dashboard Build: Add Table
Select “Table” from the “Build visual” in the Visualizations tab.
Drag “Name”, “Present %”, “WFH %” and “SL%” into the table columns.
Add the dashboard title and select “Text box” from the “Home” tab. Enter the name and make positioning and format changes using the “Format page” under the Visualizations tab.
Note: If you left-click on the table column headers e.g.: “Present %”, it sorts the data from ascending to descending and vice versa when clicked.
After the steps above your dashboard should look like this:
Dashboard Build: Add Matrix
Copy and paste the table, change it to a “Matrix” and then remove the other columns leaving the “Name” column.
Drag “Value” into the value input of the Matrix.
Drag the “Date” into the column input of the Matrix
Left-click the Date drop-down icon in the input and select Date, not Date Hierarchy.
Note: The tables can be expanded by clicking the expand icon, Focus mode.
After the steps above your dashboard should look like this:
Dashboard Build: Add Area Chart (Present %)
We want to see the trends of each insight over time therefore we’ll add an “area chart” to the dashboard.
Select “Area chart” and drag the “Present %” measure to the Y-axis input of the area chart.
Drag the Date to the X-axis input of the area chart.
Left-click the “Date” drop-down icon in the input and select Date, not Date Hierarchy.
After the steps above your dashboard should look like this:
Note: Notice the irregularity in the area chart and the numerous 0 percentages this is the weekend, follow the next steps to remove the weekend counts from the visualization.
- Drag value to the “Filter on all pages” input and select all and unselect “WO”.
After the step above your dashboard should look like this:
Note: Notice the drop in the graph, referring to the Excel file there are blank cells within those timeframes. The following steps below filter those timeframes out.
In the “Filter on this page” input, drag the “Date” and select “Advance filtering” under Filter type.
Under the “show items when the value” input select “on or before” and enter 6/17/2022. This is because the data ends on June 17th, 2022, then select “Apply filter”.
Add a trend line to the area chart from the “add further analyses to your visual” section in the visualizations tab to understand the data trend over time.
After the steps above your dashboard should look like this:
Dashboard Build: Add an Area Chart (WFH %)
Copy and paste the area chart, drag the WFH %, and remove the Present % from the Y-axis input of the copied area chart.
Change the color under the “Format your visual” section in visualizations. In "color" under the "lines" dropdown section.
Dashboard Build: Add an Area Chart (SL %)
Copy and paste the area chart, drag the SL%, and remove the WFH% from the Y-axis input of the copied area chart.
Change the color under the “Format your visual” section in visualizations. In "color" under the "lines" dropdown section.
Add data labels to each area chart, under the "Format your visual" tab.
After the steps above your dashboard should look like this:
Dashboard Build: Add Tables (Present %, WFH% & SL%)
To have more insights per day of the week. A "Day of Week" column needs to be added to the "Final" Data table.
- Navigate to the “Final Data” table view section and create a new column “Day of Week”, with the following code:
Day of week = FORMAT('Final Data'[Date], "ddd")
Navigate back to the "Report View" of "Final Data", drag “Day of Week” to the canvas, and add “Present %” to the table.
Copy and paste the table, remove the Present % from the Column input of the copied table then drag the WFH% measure to the columns input.
Copy and paste that table, remove the WFH% from the Column input of the copied table then drag the SL% measure to the columns input.
After the steps above your dashboard should look like this:
Note: Power BI visualizations make reusability easier by copying and pasting visualizations promoting productivity and saving you time.
Feel free to change the theme (colors, font, font size, etc) of the dashboard to your preference.
That's it!!
Recommendations
My recommendations can be found in this Data Presentation.
My Portfolio
https://www.datascienceportfol.io/estheribom
I hope this was helpful, the idea was to make the steps very detailed and beginner-friendly. The pictorial steps were very detailed as well and the goal is to help the reader save time locating the features as a first-timer working with Power BI.
The data used and other references are highlighted below. I hope you learned one or more Power BI basics. Feel free to add your twist along the way and share more knowledge in the comments!