How AI on the Cloud Is Changing Everything | Narendra Mangala | TEDxGaya College of Engineering
Narendra Mangala presents a five-year research journey (2021–2026) tracing how AI on the cloud transformed enterprise data engineering, from foundational medallion architecture to autonomous agentic pipelines. He covers the evolution of data governance, MLOps-ready infrastructure, and the critical role of responsible AI compliance. His central argument is that trustworthy, scalable, and ethically grounded data architecture is the prerequisite for meaningful AI-driven business decisions.
Summary
Narendra Mangala opens with a vivid scenario of a CFO reading an AI-generated overnight report that processed 400 million rows of sales data, detected anomalies, and flagged supply chain risks — all without human intervention. He frames this not as science fiction but as the real-world outcome of five years of research (2021–2026) into cloud-based AI data infrastructure.
He begins with historical context, describing how enterprises 15 years ago operated with siloed data systems where answering a basic revenue question could take analysts three to four days. The ETL (Extract, Transform, Load) backbone of that era was fragile — brittle pipelines that routinely failed overnight. The arrival of cloud platforms like Azure and AWS changed the philosophy: instead of building infrastructure and fitting data into it, organizations could design for data first and scale infrastructure to match.
Mangala's 2021 research introduced the medallion architecture — a three-layer data refinement system. The bronze layer stores raw, unfiltered data exactly as received. The silver layer applies cleaning, deduplication, and business rules. The gold layer contains aggregated, semantically enriched data ready for dashboards, executive reports, and machine learning models. He reports that organizations adopting this architecture saw a 40% reduction in data processing failures and significantly faster time to insights. A companion 2021 study benchmarked PySpark against Scala for distributed data transformations, finding that processing language choice could reduce pipeline runtime from two hours to twenty minutes.
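The three-layer refinement he describes can be sketched in plain Python. This is a minimal illustration only, using toy order records and dicts in place of a real Azure Data Lake and PySpark; all record fields and function names are hypothetical:

```python
# Illustrative sketch of the bronze/silver/gold medallion flow.
# Plain Python stands in for a real lakehouse; names are assumptions.

def bronze_ingest(raw_records):
    """Bronze: store data exactly as received, no filtering."""
    return list(raw_records)

def silver_refine(bronze):
    """Silver: clean, deduplicate, and apply basic business rules."""
    seen, silver = set(), []
    for rec in bronze:
        key = rec.get("order_id")
        # Drop malformed rows (negative amounts) and duplicate order IDs.
        if key is None or key in seen or rec.get("amount", 0) < 0:
            continue
        seen.add(key)
        silver.append(rec)
    return silver

def gold_aggregate(silver):
    """Gold: aggregated, analysis-ready data for dashboards and ML."""
    totals = {}
    for rec in silver:
        totals[rec["region"]] = totals.get(rec["region"], 0) + rec["amount"]
    return totals

raw = [
    {"order_id": 1, "region": "EU", "amount": 100},
    {"order_id": 1, "region": "EU", "amount": 100},  # duplicate
    {"order_id": 2, "region": "US", "amount": -5},   # malformed
    {"order_id": 3, "region": "US", "amount": 250},
]
gold = gold_aggregate(silver_refine(bronze_ingest(raw)))
# gold == {"EU": 100, "US": 250}
```

The point of the layering is that each stage has one job: bronze preserves the raw record for auditability, silver owns data quality, and gold owns business meaning.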
In 2022, Mangala shifted focus to data governance, arguing that technical accuracy alone cannot build business trust. He introduced Databricks Unity Catalog as a centralized governance layer — a 'control tower' that catalogs every dataset, tags PII columns, enforces masking and encryption, and provides audit trails. His research addressed the challenge of governing 50 business units with distinct access requirements under one unified framework, solving it with a federated model offering local autonomy with centralized policy enforcement.
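The 'control tower' idea can be made concrete with a small sketch: a tag set marks PII columns, reads are masked, and every access is logged. Unity Catalog does this with SQL grants and column masks; this pure-Python version is only an illustration, and every name in it is an assumption:

```python
# Illustrative governance sketch: PII tagging, masking on read, audit trail.
import datetime

PII_COLUMNS = {"email", "phone"}   # assumed tags; normally held in the catalog
AUDIT_LOG = []

def governed_read(user, dataset, rows):
    """Return rows with PII columns masked; log who accessed what, and when."""
    AUDIT_LOG.append({
        "user": user,
        "dataset": dataset,
        "at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    })
    return [
        {col: ("***MASKED***" if col in PII_COLUMNS else val)
         for col, val in row.items()}
        for row in rows
    ]

rows = [{"name": "Asha", "email": "asha@example.com", "spend": 420}]
out = governed_read("analyst_7", "sales.customers", rows)
# out[0]["email"] is masked; AUDIT_LOG records the access
```

A federated model like the one he describes would keep `PII_COLUMNS` and the masking policy centrally defined while letting each business unit decide who its `user`s are.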
The 2022 public release of ChatGPT marked a turning point. AI's sudden enterprise prominence created both excitement and anxiety among data engineers. Mangala responded with 2023 research on MLOps-ready pipelines, redefining the gold layer of the medallion architecture as a feature store: a curated repository of pre-engineered attributes that machine learning models can consume directly in milliseconds. He also published research on automating PII compliance within AI-driven ecosystems, arguing that ungoverned data fed into ML models produces 'confidently wrong answers at scale.'
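The feature-store idea reduces to a simple contract: features are engineered once in batch, then served by key with no recomputation at inference time. A toy sketch, with customer transactions and feature names that are purely illustrative:

```python
# Illustrative feature-store sketch: batch feature engineering plus
# constant-time online lookup. All names here are assumptions.

def build_feature_store(transactions):
    """Batch job: precompute per-customer features from raw transactions."""
    store = {}
    for t in transactions:
        f = store.setdefault(t["customer_id"],
                             {"txn_count": 0, "total_spend": 0.0})
        f["txn_count"] += 1
        f["total_spend"] += t["amount"]
    for f in store.values():
        f["avg_spend"] = f["total_spend"] / f["txn_count"]
    return store

def get_features(store, customer_id):
    """Online path: a single dict access, no recomputation at inference time."""
    return store.get(customer_id)

store = build_feature_store([
    {"customer_id": "c1", "amount": 10.0},
    {"customer_id": "c1", "amount": 30.0},
    {"customer_id": "c2", "amount": 5.0},
])
# get_features(store, "c1") == {"txn_count": 2, "total_spend": 40.0, "avg_spend": 20.0}
```

The millisecond serving he describes comes from exactly this split: the expensive aggregation happens ahead of time, so the model's request is just a key lookup.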
His 2024 research explored prompt-driven data transformations, testing whether large language models like GPT-4 could generate production-ready ETL pipeline code from plain English descriptions. Results showed the generated code was production-ready with minimal editing for straightforward transformations, but human expertise remained essential for complex, domain-specific logic. The key distinction he emphasizes is that AI is an accelerator, not a replacement. A separate 2024 comparative study between Microsoft Fabric and Azure Databricks found neither universally superior: Databricks excels in raw compute performance and ML flexibility, while Fabric excels in unified governance, end-to-end integration, and lower total cost of ownership for mid-sized enterprises. The real question, he argues, is which platform fits an organization's specific data maturity, team skills, and three-year strategic ambitions.
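The 'accelerator, not replacement' point can be made concrete by treating LLM-generated transformation code as a candidate that must pass human-written golden test cases before promotion. The candidate function below merely stands in for generated code; the harness is the part an engineer owns. Everything here is a hypothetical sketch:

```python
# Illustrative review gate for LLM-generated ETL code. The candidate
# function is a stand-in for generated output; names are assumptions.

def candidate_transform(rows):
    """Stand-in for generated code: normalize a currency field to uppercase."""
    return [{**r, "currency": r["currency"].upper()} for r in rows]

GOLDEN_CASES = [
    # (input, expected output) pairs written and reviewed by a human
    ([{"currency": "usd"}], [{"currency": "USD"}]),
    ([], []),
]

def passes_review(transform, cases):
    """Promote generated code only if every golden case holds."""
    return all(transform(inp) == expected for inp, expected in cases)

ok = passes_review(candidate_transform, GOLDEN_CASES)
# ok is True for this simple candidate
```

The human contribution shifts from writing the transformation to specifying and verifying its behavior, which matches the division of labor his results suggest.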
By 2025, Mangala's research entered what he describes as territory that would have seemed like science fiction just three years earlier: agentic data pipelines. In this model, AI agents, rather than humans or pre-written scripts, make real-time decisions about how data flows, is transformed, and is routed. These agents autonomously detect anomalies, trace root causes, and either fix issues or escalate with detailed diagnoses. His research found that organizations adopting agentic orchestration cut the mean time to discover pipeline failures by 60%. Parallel 2025 research addressed real-time feature engineering for streaming AI workloads, enabling fraud detection systems to enrich and evaluate transactions within 200 milliseconds using PySpark on Azure Event Hubs.
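The shape of that streaming enrichment can be sketched in a single process: each incoming transaction is enriched against a small in-memory rolling window and scored inside a fixed latency budget. The real setting is PySpark Structured Streaming on Azure Event Hubs; the window size, threshold, and 200 ms budget below are illustrative assumptions:

```python
# Illustrative real-time enrichment sketch for fraud scoring.
# Single-process stand-in for a PySpark streaming job; names are assumptions.
import time
from collections import defaultdict, deque

WINDOW = 5                                    # keep the last 5 amounts per card
recent = defaultdict(lambda: deque(maxlen=WINDOW))

def enrich_and_score(txn, budget_ms=200):
    start = time.perf_counter()
    history = recent[txn["card_id"]]
    avg = sum(history) / len(history) if history else txn["amount"]
    # Flag transactions far above this card's recent average spend.
    flagged = len(history) > 0 and txn["amount"] > 3 * avg
    history.append(txn["amount"])
    elapsed_ms = (time.perf_counter() - start) * 1000
    return {"flagged": flagged, "within_budget": elapsed_ms < budget_ms}

enrich_and_score({"card_id": "c9", "amount": 20.0})    # builds history
result = enrich_and_score({"card_id": "c9", "amount": 500.0})
# result["flagged"] is True: 500 is far above the card's recent average of 20
```

The latency budget matters because the enrichment sits in line with the payment decision; anything that cannot finish inside the budget has to be precomputed, as in the feature-store pattern.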
His 2026 research addresses responsible AI data architecture, arguing that GDPR and PII compliance cannot be afterthoughts or checkboxes — they must be architectural principles embedded from the bronze layer through model training. This includes PII tagging in Unity Catalog, differential privacy in feature engineering, audit trails for every transformation, and consent management integrated at the data ingestion layer.
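One of those principles, differential privacy in feature engineering, can be sketched as Laplace noise added to an aggregate before it leaves the pipeline, so no single customer's row can be inferred from the published number. The epsilon and sensitivity values below are illustrative, not recommendations:

```python
# Illustrative differential-privacy sketch: a noisy count via the standard
# inverse-CDF Laplace sampler. Parameter values are assumptions.
import math
import random

def dp_count(true_count, epsilon=0.5, sensitivity=1.0, rng=random):
    """Return true_count plus Laplace(0, sensitivity/epsilon) noise."""
    scale = sensitivity / epsilon
    u = rng.random() - 0.5                     # uniform on [-0.5, 0.5)
    noise = -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))
    return true_count + noise

rng = random.Random(42)                        # seeded only for reproducibility
noisy = dp_count(1000, epsilon=0.5, rng=rng)
# noisy is close to 1000 but not exact; smaller epsilon means more noise
```

Embedding this at the feature-engineering stage, rather than bolting it on at reporting time, is exactly the 'architectural principle, not checkbox' stance he argues for.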
Mangala closes by recapping the full journey: from the medallion architecture foundation in 2021, through governance in 2022, AI integration in 2023, platform evaluation in 2024, autonomous orchestration in 2025, and responsible compliance in 2026. He concludes that the distance from a data center to a decision-maker, once measured in days or weeks, is now measured in milliseconds — and that data which is clean, governed, AI-enriched, and ethically grounded is the foundation of a better, smarter world.
Key Insights
- Mangala's 2021 research found that organizations adopting medallion architecture on Azure Data Lake saw a 40% reduction in data processing failures, because for the first time data had a defined home, journey, and purpose across bronze, silver, and gold layers.
- Mangala argues that trust in data is not just about accuracy but about governance — knowing who accessed what data, when, and for what purpose — and that without this, even the most technically elegant pipeline will fail to earn business confidence.
- Mangala's 2024 research on prompt-driven data transformations found that LLM-generated ETL code was production-ready with minimal editing for straightforward transformations, but human expertise remained essential for complex domain-specific logic, leading him to characterize AI as an accelerator rather than a replacement.
- Mangala's 2025 research on agentic data pipelines found that organizations adopting AI-agent-based orchestration cut the mean time to discover pipeline failures by 60%, because AI agents do not sleep, go on holiday, or need to scroll through thousands of log lines to find a single failure point.
- Mangala argues in his 2026 research that GDPR and PII compliance cannot be a post-build checkbox but must be an architectural principle embedded from the bronze layer onward, including differential privacy in feature engineering and consent management integrated at the data ingestion layer, because regulated industries now treat these as table stakes.