Optimizing Microsoft Azure: Microsoft’s Journey to Cloud Efficiency
How does Microsoft extract maximum value from its Microsoft Azure environment? The answer is not simply about minimizing the monthly Azure bill. It’s a continuous process of refinement and enhancement, a journey that began with the transition from managing its own data centers to leveraging Microsoft Azure. This shift has been a catalyst for learning and growth.
Initially, Microsoft migrated over 600 services and solutions, comprising approximately 1,400 components, to Azure-based cloud technologies. These technologies need fewer specialized skills to manage and offer faster, more agile access to resources and solutions. The Microsoft Digital team spearheaded this move, facilitating the company’s transition from on-premises data centers to the cloud, which enabled best-in-class platforms and productivity services for a mobile-first, cloud-first world. This strategy benefits the employees who use their services, the company’s developers, and the IT implementers who are the backbone of IT operations. The ability to provision resources in minutes, rather than days, markedly changed how teams utilized their resources, allowing them to create operating environments and resources quickly. This empowers engineering teams to respond rapidly to evolving business needs.
However, Microsoft discovered that easy provisioning and rapid deployments could lead to increased costs. An unmanaged or poorly managed enterprise estate in Microsoft Azure can result in skyrocketing cloud billing costs and underutilized resources. Through their journey, Microsoft has learned many effective strategies for optimizing cloud usage to keep costs down while transforming operations.
This post explores the lessons learned by Microsoft on how to fine-tune the use of Microsoft Azure.
Modern Engineering Practices
Modern engineering practices are critical to all elements of Microsoft Azure, ranging from single resource deployments to enterprise-scale, globally distributed Azure-based solutions that involve hundreds of resources. The vision for modern engineering has produced a cohesive culture, tools, and practices focused on developing high-quality, secure, and feature-rich services to support digital transformation across the organization. Microsoft’s operations and engineering teams have progressed through various phases of efficiency maturity. With each phase, the operations substructure needed to advance resulting in greater operational efficiency.
“Our operations and engineering teams have journeyed through several phases of efficiency maturity. Through each of these phases, our operations substructure had to evolve, and many of those changes resulted in increased efficiency, not just with the bottom line on our monthly Azure bill, but with the way we do service management in Azure, including development, deployment, change management, monitoring, and incident management.” —Pete Apple, principal program manager for Microsoft Azure engineering, Microsoft Digital

In fact, now that Microsoft has completely migrated to Microsoft Azure, the company is finding smart ways to use its cloud product more efficiently, says Pete Apple, a principal program manager for Microsoft Azure Engineering in Microsoft Digital. Apple and his team have been responsible for overseeing and implementing its massive migration to the cloud over the past eight years. They are also responsible for ensuring that the company’s enterprise estate in Microsoft Azure runs efficiently.
Microsoft’s journey to greater efficiency in Microsoft Azure involved three main phases:
- Improving operational efficiency.
- Delivering value through innovation.
- Embracing the digital ecosystem.
Phase One: Improving Operational Efficiency
Microsoft Digital plays a crucial role in their business strategy, as most business processes depend on it. During the first phase, key areas were identified for improvement: aligning services, optimizing infrastructure, and assessing the existing culture. Teams realigned to remove information silos between support areas which led to the realization of redundant projects with similar goals. Reducing projects and streamlining delivery methods freed up engineering resources. The engineering culture shifted to enable engineers to create business solutions, resulting in increased innovation, creativity, and productivity throughout engineering processes.
Phase Two: Delivering Value Through Innovation
In phase two, Microsoft embraced the Azure platform and cloud-native engineering design principles by adopting Infrastructure as Code and continuous deployment, streamlining IT operations. Rapid provisioning of Azure made speed improvements fortyfold. Azure native solutions, especially platform-as-a-service (PaaS) offerings, were adopted across all areas of the engineering and operations lifecycle. These included infrastructure as code with ARM templates, APIs, and PowerShell.
Phase Three: Embracing the Digital Ecosystem

Optimizing the firm’s use of Microsoft Azure has helped to keep costs down, notes Heather Pfluger, the general manager of Infrastructure and Engineering Services in Microsoft Digital Employee Experience. The focus of this last stage is to develop intelligent systems on Microsoft Azure to provide reliable, scalable services while connecting operational processes across Microsoft. Automation was integrated further into support and development processes by adopting a DevOps culture and open-source standards within their solutions. Microsoft Azure PaaS offerings and Microsoft Azure DevOps together enable engineers to concentrate on features and usability, while the ARM fabric and Azure Monitor provide unified management to provision, manage, and decommission infrastructure resources securely.
“This final phase is never really final. Continual evaluation and optimization of our Microsoft Azure environment is built into how we manage our resources in the cloud. As new features and engineering approaches arise, we’re adapting our methods and best practices to get the most from our investment.” —Heather Pfluger, general manager, Infrastructure and Engineering Services, Microsoft Digital
Solutions that were lifted and shifted into Microsoft Azure infrastructure as a service (IaaS) resources are regularly reassessed for migrating or refactoring into PaaS offerings. In addition, they adopted Microsoft Azure Monitor for aggregated monitoring, covering both Azure resources and on-premises resources.
Apple adds that customers’ migrations can benefit from a shortcut Microsoft didn’t take.
“As early adopters, our migration practices were pushing the toolsets available. When we looked at our on-premises environment and what was available in Azure, it made sense to move a significant portion of our solutions directly into IaaS resources.” —Pete Apple, principal program manager for Microsoft Azure engineering, Microsoft Digital
By leveraging the tools and best practices available today, on-premises solutions can be migrated directly into PaaS resources. This bypasses the need for lift-and-shift migrations and avoids the costs that are inherent with IaaS infrastructure.
Managing Data and Resource Sprawl

Microsoft is also working to lower costs and increase efficiency across their Microsoft Azure estate. Data sprawl is a constant challenge. Through migrating to Microsoft Azure and optimizing its use, Microsoft was able to maintain flat costs despite a 20 percent workload increase.
Specific details in areas such as Apache Spark clusters can make a big difference.
“Each job that comes through Azure Synapse Analytics is run on a Spark cluster for compute services. There’s a large selection of compute sizes available. The largest ones process data the quickest, but, naturally, they’re also the most expensive. We all like things to be done quickly, so many of our engineers were using very large compute sizes because they’re fast.” —Dan Babb, Principal Software Engineering Manager
Babb clarifies that high speed is not always necessary.
“Many of our jobs aren’t crucially time-sensitive, so we stopped using the bigger cluster sizes because we didn’t need to. Processing a workload on a smaller instance for 20 minutes instead of using a larger instance for 5 minutes has resulted in significant cost savings for us. We’re monitoring our subscriptions and if a really big cluster size gets spun up, an Azure Monitor Alert notifies our engineering leads they can follow up to ensure that the cluster size is appropriate for the job it’s running.” —Dan Babb, Principal Software Engineering Manager
The team is also creating solutions that are distributed across clusters and Microsoft Azure Synapse Analytics workspaces to create a distributed platform architecture that is more flexible and less prone to a single point of failure.
Designing for Zero Trust
The Zero Trust security model is prevalent across their Microsoft Azure environment. Based on the principle of verified trust, Zero Trust eliminates the embedded trust assumed inside the traditional corporate network. This architecture minimizes risk across all environments by establishing strong identity verification, validating device compliance, and ensuring least privilege access. The model assumes every request is a potential breach.
Microsoft Azure Entra ID allows them to centralize their identity and access workload, simplifying identification and authorization across the hybrid cloud. Zero Trust, coupled with Microsoft’s flexible network infrastructure, enables micro-segmentation scenarios with Microsoft Azure Bicep Templates and Virtual Networks. These scenarios would have been unimaginable using traditional networking practices to their engineers.
“Real-time detection data is more expensive than some of the other data storage options we have. As convenient as it would be to have it all, it’s not that critical. We move our older data into Azure Data Explorer where it’s less expensive to store, but still allows us to use Kusto Query Language (KQL) queries just like we would in Sentinel.” —Mei Lau, principal program manager, Security Monitoring Engineering, Microsoft Digital Security and Resilience
Lau’s team actively monitors data storage in Microsoft Sentinel for sudden spikes in usage or other indicators that data usage practices might need to be assessed. This contributes to an efficient and streamlined threat management system that works well and avoids excessive spending.
Observing Results and Managing Governance
Governance is critical for effective identification and implementation of cost optimization recommendations. The model consists of several components including:
- Microsoft Azure Advisor recommendations and automation.
- Tailored cost insights.
- Improved Microsoft Azure budget management.
Implementing a governance solution has enabled Microsoft to realize significant savings by making simple changes to how they use Microsoft Azure resources across the board.
As Microsoft continues its journey, it’s focused on refining its efforts and uncovering new opportunities for increased cost optimization in Microsoft Azure. Embracing modern engineering practices, using data to drive results, and using proactive cost-management are critical aspects of efficient cloud utilization.
“Our Microsoft Digital Azure footprint will continue to grow in the years ahead, and our cost-optimization and efficiency efforts will grow to ensure that we’re making the most of our Azure investment.” —Heather Pfluger, general manager, Infrastructure and Engineering Services, Microsoft Digital
Cloud optimization is an ongoing process. Adopting modern engineering practices, staying alert to new Azure services, and changes to existing functionality will help you recognize cost-optimization opportunities. Use data insights and proactive cost-management practices to implement changes quickly. Implement central governance with local accountability to identify gaps and improve cost-management methods.