WED, 03 JUN 2026 · 18:33:57 UTC

Databricks

Flagship · Platform

USA · HQ San Francisco · Est. 2013

Data + AI platform — DBRX open-weights model.

9.0

our score

Our take

Databricks dominates the lakehouse category and is racing to own the enterprise AI stack from data prep to model serving.

At a glance

Best known for
Pioneering the lakehouse architecture and the DBRX open-weights model.
Biggest strength
End-to-end data + AI platform rooted in the open-source Apache Spark ecosystem.
Biggest risk
Hyperscaler competition and a high burn rate from AI investments ahead of an IPO.
Stage
Series J
Primary revenue
Subscription platform fees for lakehouse compute, storage, and AI/ML workloads.

What they do

Databricks sells a cloud data and AI platform built around the lakehouse concept—a single architecture that combines the flexible storage of data lakes with the reliability and performance of data warehouses. The core offering, Databricks Lakehouse, runs on a customer’s existing AWS, Azure, or Google Cloud account and layers open-source technologies such as Delta Lake, Apache Spark, and MLflow with proprietary optimization, governance, and collaboration tools. Data engineering teams use it for ETL and streaming, data scientists use it for model training and feature stores, and—increasingly—software developers use Mosaic AI and DBRX to embed generative AI into business applications.

The platform monetizes primarily through usage-based subscriptions tied to compute and storage consumed on the lakehouse, alongside premium capabilities in Unity Catalog governance and Mosaic AI model serving. Databricks targets large enterprises and fast-growing digital natives that need to unify analytics and machine learning on one copy of data rather than siloing them across separate warehouses and lakes. By shipping its own open-weights model and acquiring MosaicML, the company is trying to move up the stack from infrastructure provider to full-stack AI platform vendor.

Origin story

Databricks was founded in 2013 by seven University of California, Berkeley researchers (Ali Ghodsi, Matei Zaharia, Ion Stoica, Reynold Xin, Patrick Wendell, Andy Konwinski, and Arsalan Tavakoli-Shiraji) from the AMPLab, where Apache Spark was created. The company began as a commercial support and hosting provider for Spark, but soon pivoted to a fully managed cloud platform that abstracted cluster management and notebook collaboration for data teams. This academic-to-commercial lineage gave Databricks deep credibility in the open-source community and early enterprise traction with advanced analytics workloads.

The defining strategic shift came when Databricks popularized the 'lakehouse' architecture around 2020, positioning Delta Lake and the Databricks runtime as a replacement for legacy data warehouses. The model gained enterprise adoption as companies sought to reduce data duplication between lakes and warehouses. In 2023, Databricks acquired MosaicML, signaling a decisive bet on generative AI model training and serving. The release of the DBRX open-weights model in March 2024 and a $10 billion Series J round at a $62 billion valuation in December 2024 cemented its status as one of the most valuable private data and AI platform vendors.

Key products

Databricks Lakehouse

Unified data and AI platform combining data lake storage with warehouse-style governance and performance for enterprise analytics.

Mosaic AI

2023

Suite for building, tuning, and deploying generative AI and ML models, built on MosaicML training infrastructure.

DBRX

2024

Open-weights mixture-of-experts large language model developed in-house for enterprise customization and private deployment.

Delta Lake

Open-source storage layer that brings ACID transactions and reliable data pipelines to cloud data lakes.

Unity Catalog

Unified governance solution for data and AI assets across clouds, formats, and compute engines.

Leadership

  • Ali Ghodsi

    Co-founder & CEO

    Former UC Berkeley AMPLab researcher; led Databricks from open-source support to a $62B data+AI platform.

  • Matei Zaharia

    Co-founder & CTO

    Creator of Apache Spark; drives technical vision for the lakehouse and AI platform.

  • Ion Stoica

    Co-founder & Executive Chairman

    Berkeley computer science professor and AMPLab co-director; co-founded several successful infrastructure startups.

Funding history

Year | Round | Amount | Lead investors
2013 | Series A | $14M | Andreessen Horowitz
2024 | Series J | $10B | Thrive Capital, Andreessen Horowitz, GIC, Ontario Teachers' Pension Plan

Strengths & risks

Strengths

  • Category creator of the lakehouse architecture, now standard for modern data platforms.
  • Deep open-source moat via Apache Spark, Delta Lake, MLflow, and Unity Catalog.
  • Multi-cloud deployment across AWS, Azure, and GCP with enterprise-grade security.
  • Integrated AI stack from data ingestion to custom LLM training via MosaicML.
  • Massive balance sheet after $10B Series J fuels aggressive R&D and acquisitions.

Risks

  • Direct competition from AWS, Azure, GCP, and Snowflake on price and features.
  • High capital intensity of training frontier models and integrating MosaicML.
  • Platform complexity creates longer onboarding cycles versus simpler SQL warehouses.
  • Delayed IPO risks employee retention and investor liquidity expectations.

Recent moves

  1. Acquired MosaicML to bolster generative AI stack

    June 2023

    Purchased MosaicML to add large-model training, inference optimization, and research talent, later integrated into the Mosaic AI product suite.

  2. Launched DBRX open-weights LLM

    Mar 2024

    Released DBRX, a mixture-of-experts model offered as open weights to attract enterprises seeking private-cloud and on-prem AI customization.

  3. Closed $10B Series J at $62B valuation

    Dec 2024

    One of the largest private financing rounds ever, providing capital to consolidate the data and AI market before an eventual IPO.

  4. Pushed Unity Catalog as open-source standard

    Mid-2024

    Open-sourced the governance layer to reduce vendor lock-in fears and drive cross-platform metadata adoption.

Competitive position

Databricks’ primary rival is Snowflake, which offers a simpler, SQL-first data cloud but has been racing to add AI and ML capabilities. Databricks generally wins when enterprises need advanced analytics, custom model training, or open-source flexibility; Snowflake wins when the buyer is a SQL-centric analyst team prioritizing ease of use and native data sharing. Against hyperscalers, Databricks competes by being multi-cloud and workload-agnostic, whereas AWS, Azure, and Google prefer to lock customers into native stacks. The MosaicML acquisition and DBRX release give Databricks a credible story for enterprises that want to own their models rather than rent APIs from OpenAI or Anthropic.

The platform’s biggest competitive vulnerability remains complexity. Implementing a lakehouse demands skilled data engineers and platform teams, creating an opening for simpler competitors and managed Spark alternatives. Databricks is investing heavily in serverless autoscaling and natural-language interfaces to close that gap. If it succeeds, the $62B valuation will look prescient; if not, large customers may default to the 'good enough' AI services bundled by their cloud provider.

What to watch

  1. Path to profitability and operating cash flow given massive AI R&D spend.
  2. Mosaic AI and DBRX adoption rates versus closed-model APIs from OpenAI and Anthropic.
  3. Win rates in head-to-head enterprise deals against Snowflake and native cloud stacks.
  4. IPO timing and ability to sustain $62B valuation in volatile tech markets.
  5. Open-source community momentum for Delta Lake and Unity Catalog vs. competitor formats.

Frequently asked questions

What is a lakehouse and why did Databricks create it?

A lakehouse merges low-cost data lake storage with warehouse-style management and performance. Databricks popularized the term to describe unifying analytics, data engineering, and AI on one copy of data.

How does Databricks differ from Snowflake?

Databricks targets AI/ML and Spark-based workloads with an open-core model, while Snowflake emphasizes cloud-native SQL warehousing and simplicity for business analysts.

Is DBRX fully open source?

DBRX is released as open weights, letting enterprises download and customize it, but it does not meet the strict Open Source Initiative definition for full open-source licensing.

What did the MosaicML acquisition bring?

It added large-model training infrastructure, efficient inference engines, and top AI research talent, forming the backbone of the Mosaic AI generative AI suite.

Can Databricks run inside my own cloud account?

Yes. It is a managed service deployed within AWS, Azure, or Google Cloud, keeping data in the customer’s own environment for security and compliance.

Does Databricks support standard BI tools?

The platform offers SQL warehouses (formerly called SQL endpoints) and connectors for Tableau, Power BI, and Looker, though its deepest capabilities lie in advanced analytics and machine learning.

When will Databricks IPO?

No date has been confirmed. While the company is widely expected to go public, timing depends on market conditions and readiness to justify its $62B valuation.

The bottom line

Databricks sits at the center of the enterprise data and AI convergence, with a $62B valuation and $10B war chest that signal clear ambition to outbuild hyperscalers and specialized rivals alike. Its lakehouse architecture has become industry shorthand, while the MosaicML acquisition and DBRX model give it genuine differentiation in generative AI infrastructure rather than mere repackaging of others' APIs. The strategic risk is execution at scale: integrating MosaicML, pushing Unity Catalog as an open standard, and simplifying a historically complex platform for mainstream data teams are all expensive, simultaneous bets.

If Databricks can translate its technical leadership into broader wallet share and a clear path to profitability, it is poised to become the default operating system for enterprise AI. A stumble in user experience, a prolonged private-market stay, or a pricing war with AWS and Snowflake could force a reset of expectations.

Latest announcements

  1. Octopus Energy reduced margin data engineering costs by 50x while scaling for MHHS using Databricks.

  2. An overview of analytics strategies for pharmaceutical launches to accelerate early impact and sustain long-term success.

  3. New product capabilities enable production-ready tracing and observability for AI agents using OpenTelemetry and Unity Catalog.

  4. A case study on how the World Bank Group leverages Databricks to share knowledge and advance poverty eradication efforts.

  5. Financial services organizations can use Databricks Genie to make data insights more accessible across business teams.

  6. Guidance for security teams on effectively communicating cyber risk and metrics to board-level stakeholders.

  7. Strategies for leveraging observability data to proactively identify and prevent operational incidents.

  8. Databricks introduces prompt caching to accelerate LLM inference for open-source models on its platform.

  9. Energy and sustainability teams can move from emissions reporting to actionable decarbonization strategies.

  10. Partners are building industry-specific conversational AI solutions powered by Databricks Genie.

  11. Unlocking business user access to insights across the data estate.
