External reviews
External reviews are not included in the AWS star rating for the product.
One platform for all Data Management & Analytics
What do you like best about the product?
Seamless integration between the Spark, PySpark, Scala, SparkR, and SQL APIs and cloud storage. It is easy to build and schedule streaming and batch services, with Delta Lake as the storage layer for all data engineering needs, plus Git integration and revision control.
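As a rough sketch of that interplay (a minimal, illustrative example only; the table name and storage paths are hypothetical, and on Databricks the `spark` session is already provided):

```python
from pyspark.sql import SparkSession, functions as F

# On Databricks, `spark` already exists; this builder is only needed when running locally
# with a Spark + Delta Lake setup.
spark = SparkSession.builder.appName("delta-demo").getOrCreate()

# Batch write: land raw events as a Delta table (path and table name are hypothetical).
raw = spark.read.json("s3://my-bucket/raw/events/")
raw.write.format("delta").mode("overwrite").saveAsTable("bronze_events")

# The same table is immediately queryable from the SQL API...
daily_sql = spark.sql("""
    SELECT date(event_ts) AS event_date, count(*) AS n_events
    FROM bronze_events
    GROUP BY date(event_ts)
""")

# ...and from the PySpark DataFrame API, interchangeably.
daily_df = (
    spark.table("bronze_events")
         .groupBy(F.to_date("event_ts").alias("event_date"))
         .count()
)
daily_df.show()
```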
What do you dislike about the product?
The UI could be a little more like VS Code or a cloud editor to give you more choices. Better support for modular, packaged code, unit testing, and CI/CD would improve the developer experience drastically.
What problems is the product solving and how is that benefiting you?
Handling multiple tools for different data roles is an issue for many organizations. Databricks provides data ingestion, storage, data engineering, analytics, modeling, and deployment all in one place, with the scale to handle petabytes of data processing using the power of Spark distributed processing.
Lakehouse: Great Goals with poor execution
What do you like best about the product?
The Lakehouse platform is a solution that is easy to set up, the infrastructure is easy to maintain, and the UI is accessible to a wide variety of engineers.
It allows for a fast rollout to production and covers most common needs of a data company.
What do you dislike about the product?
The biggest kink in the Lakehouse platform is its speed. It does not deliver on the performance promised.
In addition, the Databricks UI is not easy to use. It feels like it's a smartphone app.
On the side of technology, it is slow and expensive, with authorization added as an afterthought.
It's an absolute pain to administer and hard to control expenses.
What problems is the product solving and how is that benefiting you?
We used the Lakehouse to ingest events from our event-based infrastructure. We produced a moderate volume of events, and they all landed in the Lakehouse for analysis and additional processing.
Unified Platform & Collaborative Workspace for Data & AI/ML team
What do you like best about the product?
Databricks Serverless SQL with Photon query acceleration for data analysts & business analysts
In-built visualizations & dashboards, along with geospatial & advanced SQL functions
Unified pipeline for Structured Streaming batch & real-time ingestion
Auto Loader for ingesting standard file formats, with built-in schema evolution (see the sketch after this list)
Delta Live Tables for data engineering workloads & pipelines
Databricks multi-task orchestration job workflows
Unity Catalog metastore & its integration with other data catalogs
MLflow for building and tracking ML experiments & Feature Store for centralized feature supply to production/inference models
Time travel & Z-order optimization
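For the Auto Loader point above, a minimal sketch of a streaming ingest with schema evolution (the paths, checkpoint location, and table name are hypothetical, and this assumes a Databricks runtime where `spark` and the `cloudFiles` source are available):

```python
# Auto Loader ("cloudFiles") incrementally ingests new files and can evolve the schema.
stream = (
    spark.readStream.format("cloudFiles")
         .option("cloudFiles.format", "json")
         # The inferred schema is tracked here; newly appearing columns trigger evolution.
         .option("cloudFiles.schemaLocation", "s3://my-bucket/_schemas/orders/")
         .load("s3://my-bucket/landing/orders/")
)

(
    stream.writeStream
          .format("delta")
          .option("checkpointLocation", "s3://my-bucket/_checkpoints/orders/")
          .option("mergeSchema", "true")  # let added columns flow into the target table
          .trigger(availableNow=True)     # batch-style run; remove for continuous streaming
          .toTable("bronze_orders")
)
```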
What do you dislike about the product?
A more comprehensive orchestration Jobs panel is needed for a more diverse set of workflow design patterns
Serverless clusters for data engineering streaming/batch pipelines would be welcome
Most IDE features should be integrated into the notebook
Clear documentation on creating custom Databricks runtime Docker images would be helpful
A lineage & flow monitoring dashboard could be built automatically for non-DLT jobs as well
The DLT implementation could be extended to other warehouses that support the Delta format in the future
What problems is the product solving and how is that benefiting you?
Unified pipeline for Structured Streaming batch & real-time ingestion
The schema merge feature helps track changes in the schema (a minimal sketch follows after this list)
The DLT feature helps build data quality lineage along with automated pipeline links to the referenced LIVE tables
Auto Loader helps build the common ingestion framework for our enterprise
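As a rough illustration of the schema merge behaviour mentioned above (table and column names are hypothetical; by default Delta Lake rejects appends whose schema does not match unless schema merging is enabled):

```python
from pyspark.sql import functions as F

# Existing Delta table, e.g. with columns (id, amount).
base = spark.table("sales")

# A new batch arrives carrying an extra column.
new_batch = base.limit(10).withColumn("currency", F.lit("USD"))

# Without mergeSchema this append would fail on the schema mismatch;
# with it, the `currency` column is added to the table's schema.
(
    new_batch.write.format("delta")
             .mode("append")
             .option("mergeSchema", "true")
             .saveAsTable("sales")
)

# The table history records the schema change, so it can be audited or time-travelled.
spark.sql("DESCRIBE HISTORY sales").show(truncate=False)
```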
Solid Data Platform
What do you like best about the product?
Fast iterative abilities and a notebook-based UI. It is also helpful to have multiple contributors on a single notebook at one time; you can see where others are in the notebook, which helps with collaboration.
What do you dislike about the product?
The UI frequently exhibits unintended behavior. Occasionally random characters get added to random cells in the notebook, causing errors. It makes debugging difficult when you have made no changes and a previously working cell is now failing.
What problems is the product solving and how is that benefiting you?
We are using Databricks to move large amounts of data. Our team is able to run different ETL pipelines with different schedules in an organized way. We are able to quickly iterate on our notebooks to add new features.
Great platform for working collaboratively
What do you like best about the product?
- Ability to edit the same notebook with collaborators
- GitLab compatibility
- Multiple languages supported
- Broad functionality allows most of our digital teams to use it for their own needs
- Spark compute is fast and the amount of processors on a cluster is clear
What do you dislike about the product?
- UI is constantly changing, and changes are not announced with any lead-up
- UI can be buggy - WebSocket disconnects, login timeouts, copy/pasting into incorrect cells
- Pricing structure is a little opaque - DBUs don't have a clear dollar-to-time amount
- Notebook structure isn't perfect for production engineering, better for ML or ad-hoc operations
What problems is the product solving and how is that benefiting you?
- Maintains access to all of our business data on both AWS and Azure, and can switch between those platforms
- Has an interface for data scientists, engineers, and business users and prevents needing to buy additional tools
- Allows big data applications to run without having to do much Spark configuration
BIA
What do you like best about the product?
Databricks is an excellent tool for data processing and analysis. The platform is user-friendly and intuitive, making it easy for team members of all technical skill levels to collaborate and work on data projects. The integration with popular data storage systems and the ability to run both SQL and Python code make it a versatile option for handling a variety of data types and tasks. The platform also offers robust security features and the ability to scale resources as needed. Overall, I highly recommend Databricks for anyone looking for a reliable and efficient data platform.
What do you dislike about the product?
Nothing. I like the UI and the toggle between Python and SQL.
What problems is the product solving and how is that benefiting you?
Visualization and tables are the best for my use case.
Great All-in-One Platform for data handling
What do you like best about the product?
- Repo deployment allows my team to collaboratively develop against Databricks resources while still using their local development toolkit, and quickly deploy out to it when they're ready
- Delta Live Tables are a breeze to set up to get streaming data into the lakehouse (a minimal sketch follows after this list)
- Language mixing is very nice; most of my data engineering work is SQL-focused, but I can leverage Python or Scala for more complex data manipulation, all within the same notebook
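A minimal Delta Live Tables sketch for the streaming point above (the source path and table names are hypothetical; code like this runs as part of a DLT pipeline rather than interactively):

```python
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Raw clickstream files ingested incrementally with Auto Loader.")
def clicks_bronze():
    return (
        spark.readStream.format("cloudFiles")
             .option("cloudFiles.format", "json")
             .load("s3://my-bucket/landing/clicks/")
    )

@dlt.table(comment="Cleaned clickstream with a basic data quality expectation.")
@dlt.expect_or_drop("valid_user", "user_id IS NOT NULL")
def clicks_silver():
    return (
        dlt.read_stream("clicks_bronze")
           .withColumn("event_date", F.to_date("event_ts"))
    )
```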
What do you dislike about the product?
- Data Explorer can be incredibly slow and cumbersome if your data lake is unevenly distributed
- Cold-starting clusters can take a frustratingly long time, at least for the way our clusters are set up (the minimum size for our cluster options is i3.xlarge on AWS)
- While developing in notebooks is nice, the concept of running notebooks in production where anyone can edit from the UI is concerning; I wish there were more ways to "lock down" production processes
What problems is the product solving and how is that benefiting you?
As a data engineer, Databricks has been huge in speeding up my ETL development time, connecting to external databases, and rapidly creating new data objects in a sustainable way.
Great way to automate
What do you like best about the product?
I have been actively engaged in Databricks training and I find it very relevant to the work our organization does. We usually have large amounts of data we need to process for our power generation and revenue needs, and I find that Databricks can be a one-stop shop for our automation and for streamlining the process.
What do you dislike about the product?
I believe it could be a steep learning curve for someone who may not know how to program or have a general understanding of it. The best way to work around this is to follow the training offered by Databricks.
What problems is the product solving and how is that benefiting you?
We need to build processes around our time-series data for generation and flow. This platform allows us to build quick processes and intuitive dashboards, which help with fast data processing and workflow setup.
Journey: Delta Lake to Lakehouse
What do you like best about the product?
Databricks' Lakehouse platform combines the capabilities of a data lake and a data warehouse to provide a unified, easy-to-use platform for big data processing and analytics. The platform automatically handles tasks such as data ingestion, data curation, data lineage, and data governance, making it easy to manage and organize large amounts of data. The platform includes features such as version control, collaboration tools, and access controls, making it easy for teams to work together and ensure compliance with data governance policies.
What do you dislike about the product?
Spinning up a new cluster takes around 10-15 minutes. Moreover, the limited resources and learning materials for new users are a challenge. It would be great if Databricks could provide more learning resources.
What problems is the product solving and how is that benefiting you?
The platform allows for seamless integration of data from various sources, including structured, semi-structured, and unstructured data, and provides a unified view of all data stored in the lake. The platform includes features such as version control, collaboration tools, and access controls, making it easy for teams to work together and ensure compliance with data governance policies.
Built to accelerate development
What do you like best about the product?
I have been using Databricks for almost 4 years and it has been a great asset to our development as a team and to our product.
Shared folders of re-usable and tracked notebooks allow us to work on tasks only once, minimising duplication of work, which in turn accelerates the development cycle.
One of my personal favourites is Workflows, which has allowed us to automate a variety of tasks and freed up capacity for us to focus on the right problems at the right time (a hedged sketch follows below).
Another great selling point for me is that collaborators can see each other typing and highlighting live.
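For the Workflows point above, a hedged sketch of creating a simple scheduled job with the Databricks Python SDK (the notebook path, cluster id, and cron expression are hypothetical; the same thing can of course be configured through the Workflows UI):

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

# Picks up host and token from the environment or a configured profile.
w = WorkspaceClient()

job = w.jobs.create(
    name="nightly-refresh",
    tasks=[
        jobs.Task(
            task_key="refresh_notebook",
            notebook_task=jobs.NotebookTask(notebook_path="/Repos/team/project/refresh"),
            existing_cluster_id="1234-567890-abcdefgh",  # hypothetical cluster id
        )
    ],
    schedule=jobs.CronSchedule(
        quartz_cron_expression="0 0 2 * * ?",  # 02:00 daily
        timezone_id="UTC",
    ),
)
print(f"Created job {job.job_id}")
```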
What do you dislike about the product?
UX could be improved
While I appreciate the addition of new features, developments and experiments, the frequency of changes made it tiring and frustrating for me recently.
Too much, too frequently. The 'new notebook editor' is a great example here. The editor itself could be a very useful change, but changing all the keyboard shortcuts at the same time without letting the user know is questionable to me.
I would prefer it if changes were rolled out less frequently, with detailed patch notes (see Dota 2 for example) and configurable options in the user settings.
E.g. I would use the experimental 'new notebook editor' if I could keep the keyboard shortcuts the same.
Less frequent, more configurable updates please.
One of the biggest pain points for me is the log-in and log-out process. Why does Databricks have to log me out every couple of hours, especially while I am typing in a command cell?
Could this be improved please?
Also, would love it if libraries on clusters could be updated without having to restart the cluster.
Having said all this, I do love some of the new features, such as the new built-in visualisation tool; however, I would love it even more if titles could be added and adjusted.
What problems is the product solving and how is that benefiting you?
Databricks is used as the core of our research environment.
It is used to provide quick and efficient analysis of whatever question or problem might arise while keeping the production environment safe and undisturbed.