
Piece of cake! Learning KQL for DP-700

I’ve recently been studying to take the new DP-700 Fabric Data Engineering Associate exam. I already have some Databricks certifications, so I have existing Spark and Data Lakehouse knowledge, and I was still surprised to find that much of the DP-700 exam covered topics I already knew quite well. There were, however, some notable gaps in my knowledge.

The most significant gap was around KQL or Kusto Query Language. I had never really learned much about KQL aside from the fact it exists. So I realised I needed to do some research and improve my KQL knowledge for the exam. I thought I would share my findings to help other people in a similar position.

Luckily, you don’t need to be a KQL expert for the exam, so I won’t go into too much detail, but I will hopefully cover everything you need. This blog focuses on the KQL language itself: writing queries and commands. I may look at wider Real Time Intelligence in a future blog.

I have assumed some basic knowledge of SQL in this guide, as it is aimed at DP-700 takers.

 

What is KQL?

Kusto Query Language is a “simple but powerful” (Microsoft’s words!) language that allows you to query structured, semi-structured, and unstructured data. It is designed to be optimal for querying telemetry, metrics and logs and performing time series analytics on sensor data, financial data or anything similar.

KQL was initially developed and used internally by Microsoft, and was publicly released with Azure Data Explorer in 2018. It has since been open sourced, and KQL is now part of Fabric within the Real Time Intelligence suite.


 

Differences to SQL

You may be wondering how similar KQL is to SQL. The two query languages share some similarities in terms of commands, and in fact most tools that use KQL also let you use SQL; however, KQL may be more performant for the task.

The main difference is that SQL is designed for managing and querying structured, relational data, whereas KQL is designed for analytics on structured, semi-structured, and unstructured data.

KQL is also inherently designed for low-latency, real-time analytics. SQL databases can be horizontally and vertically scaled, but they may struggle with very large datasets and real-time processing.

Another key difference comes from SQL being built around the declarative SELECT statement. This can be limiting at times, as you need to know the schema of what you are querying unless you just SELECT * (select every column into the results).

Unlike SQL, KQL is also case-sensitive for everything, including table names, column names, operators, and functions.
Keywords can be used as identifiers by enclosing them in brackets and quotes ([' and '] or [" and "]). For example, ['where'].
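As a quick sketch of this escaping (ConfigEvents and its columns are hypothetical names for illustration, not part of the sample data), a column that shares its name with a keyword has to be wrapped in brackets and quotes:

ConfigEvents
| project ['where'], Timestamp   // ['where'] escapes a column literally named "where"

Note that the operator names themselves must also be lower case: | take 5 is valid, but | Take 5 would fail because KQL is case-sensitive.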

 

Writing a KQL query

There are three main types of KQL statements:

  • Tabular Expression
  • Let statement
  • Set statement

 

Tabular Expression

Tabular expression queries are the most common in KQL. They are defined as queries where both the input and the output consist of tables or tabular datasets. This is also the most SQL-like type, as SQL queries also take tabular input and produce tabular output.

Let’s look at some simple SQL examples and the KQL version. For this I will use the weather analytics sample data which is used in the Microsoft docs.

Here is a simple query to count the records in the StormEvents table.

SELECT count(*) FROM StormEvents

In the KQL version, the table comes first and we pipe an operator after it. We use the count operator, which has the same name as its SQL counterpart.

StormEvents
| count

When I worked as an analyst in an earlier role, I would often use something similar to the following query to see a sample of data from a table.

SELECT TOP 5 * FROM StormEvents

In KQL we use the take operator here to take a specified number of records from the table and return them.

StormEvents
| take 5

To see a specified set of columns in SQL, we add them to the SELECT statement.

SELECT TOP 5 State, EventType, DamageProperty FROM StormEvents

In KQL we use the project operator. This requires a new line and another pipe. Think of the structure like a funnel or a filter: we start with a table, and as we add more pipes and operators the data is filtered further and further.


Here is the KQL query with project added using another pipe.

StormEvents
| take 5
| project State, EventType, DamageProperty

 

Let Statements

Let statements give you the ability to create a binding between a name and an expression. This can be thought of as a variable in other programming languages. In the following query, values are defined using let statements and then used in the query.

let n = 10;
let place = "VIRGINIA";
let cutoff = ago(12345d);
StormEvents
| where StartTime > cutoff and State == place
| take n

 

Set statements

Set statements are used to set request properties, which control how a query or command executes and returns results. They are on the more advanced side of KQL, and you don’t need to know about them for DP-700.

If you do want to know more about request properties you can look at the official Microsoft documentation here.
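For completeness, here is a minimal sketch. The notruncation request property, for example, removes the default limit on the number of rows a query can return:

set notruncation;
StormEvents
| take 5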

 

User defined functions

You can create user-defined functions that can be reused in your KQL by using a let statement to bind the function name to its expression. The following query defines the function MultiplyByN for multiplying two numbers.

let MultiplyByN = (val:long, n:long) { val * n };
range x from 1 to 5 step 1
| extend result = MultiplyByN(x, 5)

 

Order of Operations

Order of operations is very important in KQL, as a query executes in the order it is written. Therefore, if you want to improve the speed of a KQL query, filter the data as early as possible in the pipeline so that later operators process fewer rows, rather than retrieving data you don’t need and filtering it at the end.
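For example, using the sample StormEvents table from earlier, putting the where operator first means everything after it only sees the matching rows:

StormEvents
| where State == "VIRGINIA"                  // filter early so less data flows down the pipeline
| project State, EventType, DamageProperty
| take 5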

 

Other Kusto commands

dcount – Returns an estimate of the number of distinct values in a column. Used as an aggregation with summarize.

distinct – Eliminates duplicates, leaving only the distinct rows. The same as DISTINCT in SQL.

extend – Adds calculated columns to the dataset.

mv-expand – Creates a row for every value in an array. It is comparable to explode in Databricks SQL.

pivot – Rotates a table by turning unique values in a column in the input into columns in the output. The same as pivot in SQL.

summarize – Aggregates the content of the input table using aggregation functions such as avg() or count(), optionally grouped by one or more columns.

tolower – Converts an input string to lower case.
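To sketch a few of these together on the sample StormEvents table (the column names are those used in the earlier examples):

StormEvents
| extend DamageInThousands = DamageProperty / 1000            // extend adds a calculated column
| summarize EventCount = count(), StateCount = dcount(State) by EventType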

 

Kusto Window Functions

row_cumsum – Gives a running total of a numeric column.

row_number – Returns the row index number in the returned list

row_rank_dense – Returns the dense rank (rank with ties returned as the same rank)

row_rank_min – Returns the rank with the lowest ranked first

row_window_session – Returns the timestamp of the beginning of the defined session based on matching IDs
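These functions operate over a serialized (ordered) row set, so the input typically needs to be sorted first (sort by serializes the rows). A small sketch on the sample data:

StormEvents
| where State == "VIRGINIA"
| sort by StartTime asc                       // serializes the rows so window functions can run
| extend RowNum = row_number(), RunningDamage = row_cumsum(DamageProperty)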

 

Kusto Documentation

https://learn.microsoft.com/en-us/kusto/query/?view=microsoft-fabric

KQL quick reference guide

https://learn.microsoft.com/en-us/kusto/query/kql-quick-reference?view=microsoft-fabric

 

Good luck for the exam!

I hope I have given a good introduction to KQL and helped you prepare for DP-700.

There is much more learning available on Microsoft Learn and even free practical labs, so give KQL a go.


If you need help with KQL or anything Fabric-related, you can explore our Fabric Offerings here. Or, if you’d like to chat with us directly, head over to our Contact Us page. We’d love to help!


 


Author

Ed Oldham