Chapter: 5 Power Query Basics That Actually Matter

If Power BI is a house, Power Query is the foundation. If the foundation is weak, everything built on top of it will eventually fail.

In public health and development programs, most reporting problems are not caused by poor visuals. They are caused by inconsistent, fragile, or undocumented data preparation. Power Query exists to make data cleaning explicit, repeatable, and defensible.

This chapter focuses on the parts of Power Query that actually matter for reliable data use.

5.1 Why Power Query Matters in Public Health Data

Health data is complex by nature. It comes from multiple systems, is collected by different actors, and often changes definitions over time. Missing values, placeholder codes, and inconsistent formats are common.

When these issues are handled manually in Excel, the logic is hidden. Hidden logic cannot be reviewed, reproduced, or trusted.

Power Query forces transparency.

Every transformation is recorded, ordered, and replayed exactly the same way every time data is refreshed. This is not convenience. It is governance.

5.2 What Power Query Is (and Is Not)

Power Query is a data transformation engine.

When you click Transform Data in Power BI Desktop, you open the Power Query Editor, a dedicated environment for preparing data before analysis.

Power Query is:
- Deterministic
- Repeatable
- Explicit

Power Query is not:
- A visualization tool
- A place to calculate indicators
- A replacement for data modeling

Its purpose is singular:
turn raw, messy data into structured analytical inputs.

5.3 Understanding the Power Query Interface

The Power Query Editor has three areas that matter conceptually.

5.3.1 The Ribbon

The ribbon contains buttons for common transformations such as:
- Split Column
- Group By
- Replace Values
- Change Data Type

These buttons do not directly clean data. They generate steps. The step matters more than the button.

Getting CSV file using the Get Data button in Power BI Desktop

5.3.2 The Queries Pane

Each item in the Queries pane represents a dataset, referred to as a query.

In this book, these include:
- fact_patient_visits
- dim_patient
- dim_facility
- dim_date

At this stage:
- Do not join queries
- Do not calculate indicators
- Do not optimize visuals

Think of each query as an ingredient being prepared independently.

5.3.3 The Applied Steps Pane

This is the most important part of Power Query.

Every transformation is recorded as a step:
- Source
- Navigation
- Changed Type
- Removed Columns
- Replaced Values

Note: Together, these steps form a data preparation contract: This is exactly how raw data becomes analysis-ready.

Deleting or reordering steps changes that contract.

5.4 Why Power Query Feels Like “Magic”

In Excel, data cleaning is manual and repetitive. If you clean this month’s file, you must repeat the same work next month.

In Power Query, you are not cleaning data once. You are recording logic.

When new data arrives and you click Refresh, Power Query replays every applied step automatically. The same rules. The same order. The same results.

This is the difference between one-off reporting and sustainable monitoring.

5.5 Power Query Thinking vs Excel Thinking

Excel encourages manual correction.

Power Query requires logical definition.

In Excel:

- You fix a cell

- The logic lives in your head

In Power Query:

- You define a rule

- The logic lives in the steps

If a fix cannot be described as a rule, it does not belong in Power Query.

5.6 The Principle of Repeatable Cleaning

In Monitoring and Evaluation work, numbers must be defensible.

If someone asks, “How did you arrive at this figure?”, the answer should be a documented sequence of steps, not personal judgment.

Before applying any transformation, ask:
- Will this still work if new rows are added?
- Will this break if values change?
- Can another analyst understand this step?

If the answer is no, rethink the approach.

5.7 Query-Level vs Column-Level Cleaning

Not all cleaning decisions are equal.

Some issues affect the entire dataset, while others affect specific columns.

Query-level cleaning applies to the structure of the dataset
Examples include:
- Removing empty rows
- Filtering invalid records
- Renaming queries
Column-level cleaning applies to individual fields
Examples include:
- Standardizing Yes/No values
- Converting data types
- Replacing sentinel values

A good rule of thumb:
- Structural problems belong at the query level while Meaning problems belong at the column level.

5.8 Working With the Sample Dataset

This chapter focuses primarily on the fact_patient_visits table.

This dataset intentionally includes:
- Inconsistent Yes/No values
- Dates stored as text
- Sentinel values such as 999
- Missing ART start dates and regimens

These are not errors. They reflect real system behaviour and are included to support learning.

5.9 Understanding the Role of Each Table in Cleaning

Not all tables require the same level of cleaning, and not all issues should be handled in the same place.

In this project, the tables serve different purposes:

fact_patient_visits
Contains transactional records and is the primary source of complexity. Most cleaning effort happens here.
dim_patient
Contains relatively stable demographic attributes. Cleaning focuses on consistency and validity.
dim_facility
Mostly descriptive. Cleaning focuses on standardization rather than logic.
dim_date
Should be clean by design. Any issues here usually indicate upstream problems.

Understanding the role of each table helps you decide what to clean, where to clean it, and what to defer.

5.10 Core Data Cleaning Patterns for Messy Health Data

5.10.1 Setting Correct Data Types

This is the most common source of silent errors.

Data types determine what can be calculated, aggregated, and filtered correctly.

Common issues include: - Dates stored as text - Numeric values mixed with symbols - Boolean values recorded inconsistently

Always review the automatically generated Changed Type step. Do not assume Power BI guessed correctly.

5.10.2 Exercise 5.1: Data Type Audit

Objective: Identify and correct incorrect data types.

Steps: 1. Open the Power Query Editor 2. Select the fact_patient_visits query 3. Review the data type icon beside each column 4. Identify at least three columns with incorrect data types 5. Explicitly set the correct data type

Reflection: - Which values became null after correction? - What does this reveal about source data quality?

5.10.3 Standardizing Categorical Values

Health datasets often encode the same concept in multiple ways: - Yes, YES, Y, 1 - No, NO, N, 0

These differences matter in filters and calculations.

The goal is one representation per concept.

5.10.4 Exercise 5.2: Standardizing Yes and No Fields

Objective: Normalize categorical values.

Steps:

1. Choose a Yes/No column (for example, hiv_tested)

2. List all distinct values

3. Use Replace Values to standardize to Yes and No

Reflection:

- How many representations existed?

- What errors would occur if these were left uncleaned?

5.10.5 Handling Missing and Sentinel Values

Missing data is not the same as zero.

Common placeholders include: - 999 - -1 - Unknown - Not Recorded

Leaving these values untreated will distort averages and rates.

Convert placeholders to null. Power BI ignores nulls in calculations.

5.10.6 Exercise 5.3: Replacing Sentinel Values

Objective: Prevent placeholder values from distorting analysis.

Steps:

1. Identify numeric columns with sentinel values

2. Replace sentinel values with null

3. Confirm the column data type remains numeric

Reflection:

- Why is null safer than a numeric placeholder?

- How would this affect indicator calculations?

5.10.8 Exercise 5.4: ART Logic Consistency

Objective: Apply logical rules to treatment data.

Steps:

1. Identify records where art_started = Yes but art_start_date is missing

2. Identify records where art_started = No but art_regimen is populated

3. Define a consistent rule

4. Apply the rule using Power Query logic

Reflection:

- What assumptions are safe at the cleaning stage?

- What should be deferred to analysis?

5.10.9 Unpivoting for Time and Indicator Analysis

Wide tables are human-readable. Tall tables are machine-readable.

Power BI works best with tall tables where:

- Time is a variable

- Indicators are values, not column names

5.10.10 Exercise 5.5: Unpivoting Data

Objective: Convert wide data into analysis-ready format.

Steps:

1. Identify identifier columns

2. Select those columns

3. Right-click and choose Unpivot Other Columns

4. Rename resulting columns appropriately

Reflection:

- Why does this structure support trend analysis?

- How does it simplify dashboard design?

5.11 Reading Power Query Steps as Logic

Every Power Query step is written in M language.

You do not need to master M at this stage, but you should:

- Read step names carefully

- Rename steps to reflect intent

- Avoid leaving steps as generic defaults

Note:

Clear steps build trust.

5.12 Why the Order of Steps Matters

Power Query steps are executed from top to bottom.

This means:

Changing data types before replacing values may introduce errors
Removing columns too early may break later steps
Reordering steps can silently change results

For example:

Replacing 999 with null should happen before setting numeric data types
Splitting columns should happen before renaming them

Note:

When debugging Power Query issues, always review step order first.

5.12.1 Exercise 5.6: Renaming Applied Steps

Objective: Improve transparency and maintainability.

Steps:

1. Review the Applied Steps pane

2. Rename steps such as Changed Type

3. Use descriptive names

Set Correct Data Types
Standardize Yes No Values
Replace Sentinel Values

Reflection:

Could another analyst understand this logic?
Would this hold up in an audit or handover?

5.13 Data Quality Rules Reference

The table below summarizes data quality rules applied throughout this book.

Domain	Data Element	Common Issue	Rule Applied	Rationale
Demographics	Age	Text instead of numeric	Convert to numeric, invalid to null	Prevent calculation errors
Visits	Visit Date	Stored as text	Convert to Date	Enable time analysis
ANC	ANC Visit	Inconsistent categories	Standardize values	Reliable filtering
HIV	HIV Tested	Mixed encodings	Normalize categories	Accurate rates
ART	ART Start Date	Missing when ART = Yes	Flag as incomplete	Avoid false counts
ART	ART Regimen	Inconsistent labels	Standardize text	Regimen analysis
Malaria	Test Result	Placeholder values	Replace with null	Correct positivity
TB	TB Screened	Missing values	Treat conservatively	Avoid overestimation
NCD	Screening Result	Mixed formats	Normalize categories	Comparable indicators

These rules support clean analysis, not indicator definitions.

5.14 What Not to Do in Power Query

At this stage, do not:

- Calculate indicators

- Create ratios or percentages

- Join fact and dimension tables

- Apply user-specific logic

Note:

Power Query prepares data. Meaning is defined later.

5.15 A Simple Exit Checklist

Before leaving Power Query, ask:

- Are data types correct?
- Are categories consistent?
- Are missing values handled properly?
- Are steps readable and ordered?

If yes, you are ready to model.

5.16 What This Enables Next

By the end of this chapter, you should have:
- Clean, structured tables
- Documented transformations
- Confidence in your inputs

In the next chapter, we will build a data model that connects these tables using relationships and prepares them for indicator calculation.

Notes:

Power Query gives you control over data. Modeling gives you control over meaning.
Good dashboards start with good visuals.Trusted dashboards start with clean data.

5.17 Stop Point: When to Leave Power Query

You should exit Power Query when:

Data types are correct
Categories are standardized
Missing values are handled consistently
Structural issues are resolved

You should not remain in Power Query to:

Interpret indicators
Create performance metrics

At this point, the data is clean, not meaningful. The meaning comes next.

4 Connecting to Common Data Sources

6 The Star Schema: The Backbone of Analytical Models

Chapter: 5 Power Query Basics That Actually Matter

5.1 Why Power Query Matters in Public Health Data

5.2 What Power Query Is (and Is Not)

5.3 Understanding the Power Query Interface

5.3.1 The Ribbon

5.3.2 The Queries Pane

5.3.3 The Applied Steps Pane

5.4 Why Power Query Feels Like “Magic”

5.5 Power Query Thinking vs Excel Thinking

5.6 The Principle of Repeatable Cleaning

5.7 Query-Level vs Column-Level Cleaning

5.8 Working With the Sample Dataset

5.9 Understanding the Role of Each Table in Cleaning

5.10 Core Data Cleaning Patterns for Messy Health Data

5.10.1 Setting Correct Data Types

5.10.2 Exercise 5.1: Data Type Audit

5.10.3 Standardizing Categorical Values

5.10.4 Exercise 5.2: Standardizing Yes and No Fields

5.10.5 Handling Missing and Sentinel Values

5.10.6 Exercise 5.3: Replacing Sentinel Values

5.10.7 Cleaning ART-Related Fields

5.10.8 Exercise 5.4: ART Logic Consistency

5.10.9 Unpivoting for Time and Indicator Analysis

5.10.10 Exercise 5.5: Unpivoting Data

5.11 Reading Power Query Steps as Logic

5.12 Why the Order of Steps Matters

5.12.1 Exercise 5.6: Renaming Applied Steps

5.13 Data Quality Rules Reference

5.14 What Not to Do in Power Query

5.15 A Simple Exit Checklist

5.16 What This Enables Next

5.17 Stop Point: When to Leave Power Query