Azure Data Explorer is a fully managed data analytics service for real-time analysis of large volumes of streaming data. It offers fast ingestion with linear scaling that supports up to 200 MB of data per second per node. It lets you query large amounts of structured, semi-structured (JSON-like nested types), and unstructured (free-text) data with an intuitive query language (KQL).
Like many other big data services, Azure Data Explorer runs on a cluster of compute resources. You can scale the cluster vertically (scale the compute resources of each node up or down) or horizontally (scale the number of nodes out or in).
Scaling the cluster appropriately is critical to the performance of Azure Data Explorer. In most cases, the demand on a cluster changes over time and can't always be predicted. A static cluster size can lead to underutilization or overutilization, neither of which is ideal. This is why you should configure auto-scaling for your cluster.
Note that auto-scaling is only available for horizontal scaling (scaling out and in). You can also scale vertically, but you need to do that manually. The vertical scaling process can take a few minutes, and during that time your cluster will be suspended.
So let's focus on horizontal auto-scaling. You can configure it in two ways. The first is called Optimized Auto-Scale, and the second is called Custom Auto-Scale.
With optimized auto-scale, the cluster manages everything automatically. All you need to do is to configure the minimum and maximum node count in the cluster. Once you enable optimized auto-scale, the cluster begins to analyze CPU utilization, cache utilization, and ingestion utilization.
If your workload has a seasonal pattern, such as a daily spike at roughly the same time, then Azure Data Explorer can identify this pattern. It analyzes resource utilization over the past few weeks and schedules scaling operations ahead of time, which allows the cluster to complete the scaling and rebalancing process before the spike hits.
But when there are sudden workload changes that differ from the pattern and create a state of overutilization or underutilization, then the cluster will auto-scale (out or in) immediately. Well, not exactly immediately. This evaluation takes place once every hour, and it analyzes data from the last 24 hours.
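To make the predictive part of this more concrete, here is a minimal sketch in Python. It is not the algorithm Azure Data Explorer actually uses (the service does not expose that logic); it only illustrates the idea of learning an hour-of-day profile from past utilization and planning a scale-out ahead of a recurring spike. The function names, the 80% threshold, and the one-hour lead time are all assumptions made up for this illustration.

```python
from collections import defaultdict
from statistics import mean


def hourly_profile(history):
    """Average CPU utilization per hour of day.

    history: iterable of (hour_of_day, cpu_percent) pairs collected
    over the past few weeks (hypothetical input format).
    """
    buckets = defaultdict(list)
    for hour, cpu in history:
        buckets[hour].append(cpu)
    return {hour: mean(values) for hour, values in buckets.items()}


def planned_scale_out_hours(profile, threshold=80.0, lead_hours=1):
    """Hours at which to start scaling out, ahead of each expected spike.

    threshold and lead_hours are illustrative assumptions, not service defaults.
    """
    return sorted(
        {(hour - lead_hours) % 24 for hour, avg in profile.items() if avg > threshold}
    )


# Two weeks of synthetic data: a spike at 09:00 every day, quiet otherwise.
history = [(hour, 95 if hour == 9 else 30) for _ in range(14) for hour in range(24)]
profile = hourly_profile(history)
schedule = planned_scale_out_hours(profile)
```

With this synthetic history, the schedule contains a single entry one hour before the daily spike, which gives the cluster time to finish scaling and rebalancing before the load arrives.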
Sometimes optimized auto-scaling is not a good fit. If your workload is unpredictable and changes all the time, then optimized auto-scaling might not adjust fast enough, and performance will suffer. Because of the way optimized auto-scale works, it might also take too long to scale in (or it might never scale in at all), and this can result in overspending.
In such cases, you should consider custom auto-scaling. This option gives you maximum flexibility in defining the rules for auto-scaling. For example, you can specify that when CPU utilization is above 90% for more than 30 minutes, an additional node should be added to the cluster. You can combine multiple rules, and you can even configure the minimum time between scaling operations (the cooldown period).
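The example rule above can be sketched as a small Python simulation. This is not the Azure API or the service's evaluation logic; it is a simplified model, assuming CPU samples arrive as (timestamp, percent) pairs and that "above 90% for more than 30 minutes" means every sample in the last 30 minutes exceeds the threshold. The class name `ScaleOutRule`, the `evaluate` function, and the 60-minute cooldown are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ScaleOutRule:
    cpu_threshold: float = 90.0                   # percent, from the example in the text
    duration: timedelta = timedelta(minutes=30)   # how long the condition must hold
    cooldown: timedelta = timedelta(minutes=60)   # assumed wait between scaling operations
    step: int = 1                                 # nodes to add per scaling operation


def evaluate(rule, samples, now, node_count, max_nodes, last_scale_at=None):
    """Return the node count after applying the rule once.

    samples: list of (timestamp, cpu_percent), sorted by timestamp.
    """
    # Skip evaluation while the cooldown from the previous operation is active.
    if last_scale_at is not None and now - last_scale_at < rule.cooldown:
        return node_count

    window_start = now - rule.duration
    # Require history that covers the whole window before acting.
    if not samples or samples[0][0] > window_start:
        return node_count

    window = [cpu for ts, cpu in samples if window_start <= ts <= now]
    if window and all(cpu > rule.cpu_threshold for cpu in window):
        # Never exceed the configured maximum node count.
        return min(node_count + rule.step, max_nodes)
    return node_count


# Usage: 30 minutes of sustained 95% CPU, sampled every 5 minutes.
t0 = datetime(2024, 1, 1, 12, 0)
hot = [(t0 + timedelta(minutes=5 * i), 95.0) for i in range(7)]
now = t0 + timedelta(minutes=30)
new_count = evaluate(ScaleOutRule(), hot, now, node_count=2, max_nodes=5)
```

In this run the rule fires and the node count goes from 2 to 3. If a scaling operation had happened 10 minutes earlier, the cooldown check would return the node count unchanged, which is what keeps the cluster from oscillating.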
Custom auto-scaling allows more flexibility and better optimization, but it is more complex to manage. If you use it, monitor the cluster and adjust the auto-scaling configuration over time to get the best performance and cost.