
Checking out Sentinel Data Lake

Ever since the announcement of Sentinel Data Lake, I've wanted to try it!

Introduction

Data Lake sounds like it's going to revolutionize the usage of Sentinel (and I really dislike using big words, but I really think so). Data Lake costs about 1% of Analytics Logs, which would at last make it possible to simply ingest all security logs into Sentinel and then pick out what's needed for Analytics. I'm a big fan of being able to consolidate security logs in a single source; it's a big time-saver for incident response and forensics.

Below are my thoughts and experiences trying Data Lake.

Setup

The setup is fairly simple and I won't go into detail right now. Just follow Microsoft's documentation.

There are only two notes I'd add to the official documentation:

Regions

According to Microsoft, only some regions are supported: Supported regions

You have to keep that in mind, because you simply can't deploy Data Lake if your Sentinel instance is in another region. And as you might know, Sentinel is one of the Azure resources that can't be moved natively. Moving it manually is possible but a major PITA.

Permissions

The docs simply state the following regarding Azure permissions:

You must be the subscription owner.

Now this is only half true, because - for whatever reason - you will get a permission error if your Owner permissions are inherited. As soon as you assign the Owner permission directly to the user, onboarding works.

Warning

Don't use inherited permissions for Data Lake onboarding

Success

Use directly assigned Owner permissions

Though do consider removing them after onboarding is done if you don't need them.

Usage

Data Lake Explorer

Located in the sidebar under Microsoft Sentinel --> Data lake exploration --> KQL queries, this is where you can run queries against your Data Lake logs.

Duplication

All data ingested into the Analytics Logs is automatically duplicated into Data Lake - for no additional cost. So you can theoretically query it here as well. Do keep in mind, though, that querying Data Lake is a billable action, whereas querying Analytics Logs is always free!

[Screenshot: Data Lake KQL queries]
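Querying works just like regular KQL. As a minimal sketch (assuming SigninLogs is ingested and therefore mirrored into the Data Lake), counting failed sign-ins per user over the last 30 days could look like this:

```kql
// Minimal sketch: count failed sign-ins per user over the last 30 days.
// Assumes SigninLogs is ingested and therefore mirrored into the Data Lake.
SigninLogs
| where TimeGenerated > ago(30d)
| where ResultType != "0"            // "0" = successful sign-in
| summarize FailedSignIns = count() by UserPrincipalName
| sort by FailedSignIns desc
```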

Inventory Logs

In addition to the transferred Analytics Logs, Microsoft - at last - provides some inventory information from Entra ID. Until now that was something that was always missing, especially for correlations. In the below screenshot you can see all the native tables, as well as two custom log tables at the bottom (which are Analytics Logs).

[Screenshot: available Data Lake tables]
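Assuming the inventory tables get filled properly, a correlation could look something like the sketch below. The EntraUsers table and its column names are assumptions here, so check the actual schema before relying on it:

```kql
// Hypothetical correlation sketch - the EntraUsers table and its column
// names are assumptions for illustration, not a confirmed schema.
SigninLogs
| where TimeGenerated > ago(1d)
| join kind=leftouter (
    EntraUsers
    | project UserObjectId = tostring(id), department, jobTitle
) on $left.UserId == $right.UserObjectId
| project TimeGenerated, UserPrincipalName, AppDisplayName, department, jobTitle
```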

Though right now some of the data is not really usable, e.g. the EntraGroupMemberships table doesn't really contain any relevant information.

[Screenshot: contents of the EntraGroupMemberships table]

Jobs

It's a funny name, just Jobs 😁 Anyway, it's essentially what Summary Rules used to be: you create a KQL query and all matching results are copied to a selected target table. This table can be in the Analytics Logs tier, so you effectively "promote" your logs.

[Screenshots: creating a Job and the resulting custom table]
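The Job query itself is plain KQL. As a sketch, a Job that only promotes successful connections to RDP/SSH ports into an Analytics-tier table could look like this (assuming DeviceNetworkEvents is kept lake-only in your setup):

```kql
// Sketch of a Job query: promote only the interesting subset of a verbose,
// lake-only table into the Analytics tier. Assumes DeviceNetworkEvents is
// kept in the lake tier in this setup.
DeviceNetworkEvents
| where ActionType == "ConnectionSuccess"
| where RemotePort in (22, 3389)
| project TimeGenerated, DeviceName, RemoteIP, RemotePort, InitiatingProcessFileName, InitiatingProcessAccountName
```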

Considerations

Custom Table name

As you can see in the screenshot above, the newly created custom table gets the suffix _KQL_CL, which is an addition to the general requirement that custom tables have to end in _CL. I'm not sure why it's done that way, but since creating a custom table manually is quite the hassle in my opinion, I'll just take it.

Column names and special characters

Funnily enough, the built-in tables I mentioned in Inventory Logs have columns that start with an underscore _, and Jobs don't support that. The error message is pretty cryptic until you find out that _SnapshotTime and _ReceivedTime are the columns that can't be added. So just use project-away or similar to get rid of them, as in the snippet below.

[Screenshot: Job error for unsupported column names]
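A Job query over one of the inventory tables just needs to drop those two columns, e.g.:

```kql
// Drop the underscore-prefixed system columns that Jobs refuse to write
// to the target table.
EntraGroupMemberships
| project-away _SnapshotTime, _ReceivedTime
```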

Timing

Jobs can currently only run once per day at most, whereas Summary Rules could run every 20 minutes. So keep that in mind: for now this makes Jobs unsuitable for near-real-time alerting (something a SIEM should do, in my opinion).

Verdict

After trying Data Lake, I still think it will be revolutionary (I still don't like the sound of that). BUT there are some caveats that need to be addressed:

  1. Jobs need to be able to run more often - every 5 minutes at minimum in my opinion. Or another option needs to be provided, but I don't want to go around splitting logs before ingest to get fast alerts and cost savings.
  2. The inventory data needs to be expanded; about half of the current tables are essentially worthless.
  3. Some insight into accrued costs for Data Lake querying would be nice. I want to be able to see how much I've racked up by querying Data Lake, since that's not free.

So I'm really excited to see where Data Lake goes during the preview or at launch. It could be a game changer or just good - that depends on the points above.