The simple answer to the question "What is Azure Databricks?" is that it is a Databricks workspace integrated into the Azure platform. That leads to the question "What is Databricks?" The simple answer there is that Databricks is a platform that lets you easily manage and interact with an Apache Spark analytics service. Which, of course, leads to the question "What is Apache Spark?"
A high-level answer to "What is Spark?" is that it is an open-source cluster computing framework for large-scale data processing, covering both batch jobs and near-real-time streaming. A high-level explanation of "cluster computing framework" is software that lets a group of computers (a cluster) split up large volumes of work and process it in parallel, while the whole group looks like a single computer from the end user's perspective. That doesn't tell you much, but it is enough to start using Databricks and all of the data connections that come with it.
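To make the "many computers, one view" idea a little more concrete, here is a loose single-machine analogy in plain Python. It is not Spark and not how Spark is implemented: the data is split into partitions, each partition is handled by a separate worker, and the partial results are combined, while the caller only ever makes one function call. The helper names are hypothetical, and a thread pool stands in for the cluster's worker nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(data, num_partitions):
    """Split a list into roughly equal chunks (hypothetical helper)."""
    size = max(1, -(-len(data) // num_partitions))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def square_and_sum(chunk):
    """The work each 'worker' does on its own partition."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, num_workers=4):
    """One call from the user's point of view, parallel work underneath.

    A thread pool stands in for a cluster's worker nodes in this analogy;
    Spark itself distributes partitions across separate machines.
    """
    chunks = partition(data, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Each chunk is processed by a worker thread.
        partial_results = list(pool.map(square_and_sum, chunks))
    # Combine the partial results, like a coordinator collecting worker output.
    return sum(partial_results)


if __name__ == "__main__":
    numbers = list(range(1, 101))
    print(parallel_sum_of_squares(numbers))  # 338350
```

The point of the analogy is that the caller never manages the workers directly. In Spark that coordination is handled by the framework, and in Databricks the cluster itself is provisioned for you as well.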
A Databricks workspace lets you use the computing power of Spark to analyze data. Despite what I pictured when I first heard the name Databricks, there are no permanent bricks of data when using Azure Databricks. The data itself usually lives in a database, a data stream, or files in Azure Blob Storage. But using Python or Scala, you can create easily manipulated "bricks", called data frames, regardless of the data source. This ability to quickly combine different data sources and to use the analytical libraries of languages like Python or R makes Spark a great tool for data engineers and data scientists. Databricks provides a cloud platform that removes the need to provision your own installation of Apache Spark. With Databricks you can quickly create notebooks in Python or Scala, spin up a Spark cluster that runs jobs in parallel, connect to Azure Blob Storage, and schedule jobs.
TL;DR: What is Azure Databricks? A computer (really, a cluster of computers acting as one) running in the Azure cloud that comes with a large set of APIs for accessing structured and unstructured data and running code on it.
I realize this doesn’t dig into how Spark technology actually works, but that is more of a book than a blog post. If you are curious to learn more you can download the free e-book from Databricks, Apache Spark Under the Hood.
“Data! data! data!” he cried impatiently. “I can’t make bricks without clay.” ― Arthur Conan Doyle, The Adventure of the Copper Beeches