crazieh
Barbara Lake Wisdom
crazieh · 4 years ago
Myths about data lakes and their role in enterprise data storage
Data lakes are one of the latest trends in data management, and a number of misconceptions and myths about them have spread through the data management community. To use a data lake well, it helps to know what those myths are. First, though, we need to define what a data lake is so that everyone is on the same page.
A data lake is a user-defined method for organizing large and diverse volumes of data. It can be built on different data management platforms, such as relational databases, Hadoop clusters, and clouds. Depending on the platform, the data lake can handle a variety of data types, including structured, semi-structured, and unstructured data.
For most business enterprises, the data lake supports several use cases, including data warehouse extensions, advanced analytics, broad data exploration, data staging, and data landing. Data lakes are useful in departments such as supply chain and marketing, and in industries such as logistics and healthcare.
Here are some of the myths associated with data lakes.
Data Lakes are useful for Internet organizations only
Internet firms were the pioneers of data lakes and Hadoop, and the industry owes them thanks for such major innovations. However, many other companies now run data lakes in production in mainstream industries such as insurance, finance, healthcare, pharma, and telco.
Some data lakes serve departmental analytics and operations. Other organizations run several forms of analytics on the data lake, including clustering, text and data mining, predictive analytics, graph analytics, and natural language processing. Data lake-based analytics supports a variety of applications, such as customer segmentation, risk calculation, and the detection of security breaches, fraud, and insider trading, to name a few.
Data Lake is a dumping ground
At times, a data lake might turn into a dumping ground. However, early adopters do not treat the data lake as a mere dumping ground. Instead, they treat it as a balancing act: a few users dump data into the lake, but most do not. Data scientists, data analysts, and power users should create data sandboxes for their work. They can move data into and out of the lake freely, as long as they can govern themselves. The majority of other users, however, need to petition the lake curator, or steward, who will vet the incoming data.
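The policy described above can be sketched in a few lines of code. This is only an illustration, not a real governance tool: the role names, the `Dataset` fields, and the steward's vetting rule (data must have an owner and a description) are all assumptions made for the example.

```python
from dataclasses import dataclass, field

# Hypothetical roles that are trusted to govern themselves in a sandbox.
SELF_GOVERNING_ROLES = {"data_scientist", "data_analyst", "power_user"}


@dataclass
class Dataset:
    name: str
    owner: str
    description: str = ""  # assumed vetting criterion: data must be documented


@dataclass
class DataLake:
    sandboxes: dict = field(default_factory=dict)  # user -> list of datasets
    curated: list = field(default_factory=list)    # steward-approved datasets
    pending: list = field(default_factory=list)    # petitions awaiting review

    def submit(self, user: str, role: str, dataset: Dataset) -> str:
        """Route incoming data based on the submitter's role."""
        if role in SELF_GOVERNING_ROLES:
            # Self-governing users load data straight into their own sandbox.
            self.sandboxes.setdefault(user, []).append(dataset)
            return "sandbox"
        # Everyone else petitions the steward.
        self.pending.append(dataset)
        return "pending"

    def steward_review(self) -> None:
        """Steward vets every petition; undocumented data is rejected."""
        for ds in self.pending:
            if ds.owner and ds.description:
                self.curated.append(ds)
        self.pending.clear()
```

For instance, a dataset submitted by a `data_scientist` lands in that user's sandbox immediately, while one submitted under any other role waits in `pending` until `steward_review()` runs.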
Hadoop is a must for Data Lakes
A recent survey found that more than half of the data lakes in production run exclusively on Hadoop. But Hadoop is not a must-have for data lakes. Some data lakes run on relational database management systems (RDBMSs). Keep in mind that a data lake, like any other logical data architecture, can be physically distributed across several platforms.
This explains why some data lakes are deployed on top of a Hadoop cluster that is integrated with an RDBMS. Either platform, or both, may also run in the cloud.
A Data Lake is a product that can be purchased
A data lake is a reference architecture that does not depend on any particular technology. It is an approach that organizations use to make data the focal point of their business operations. It includes data quality, governance, and data management, which together enable self-service analytics and empower data consumers. A data lake is not a product that can be purchased, and you cannot buy a data warehouse product and simply call it a data lake.
If we create a Data Lake, users will come
Implementing a data lake does not mean that technical and business users will automatically flock to it. They will not come until there is a compelling business case. Business users need to perform data preparation, data exploration, and visualization with the aid of the data lake, and they want to do so in a self-service fashion. The potential audience will also not stay unless you offer trusted, high-quality, governed data. Finally, business users are unlikely to succeed without some training and consulting support.
All Data Lakes get converted into data swamps
A data lake can certainly turn into a data swamp: a disorganized and undocumented data store that is hard to trust, use, and navigate. A data swamp results from the absence of data governance, curation, and stewardship, and from a lack of control over incoming data and over access to the lake. Where those controls are in place, a swamp is not inevitable.
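One simple way a steward might watch for the "undocumented" symptom described above is to audit the lake's catalog for missing metadata. The sketch below is a hypothetical example: the catalog structure, the dataset names, and the list of required fields are assumptions, not part of any particular data lake product.

```python
# Assumed governance standard: every dataset must carry these metadata fields.
REQUIRED_FIELDS = ("owner", "description", "source")


def find_swamp_risks(catalog: dict) -> dict:
    """Return {dataset_name: [missing fields]} for entries with metadata gaps."""
    risks = {}
    for name, meta in catalog.items():
        # A field counts as missing if it is absent or empty.
        missing = [f for f in REQUIRED_FIELDS if not meta.get(f)]
        if missing:
            risks[name] = missing
    return risks


# Hypothetical catalog entries, for illustration only.
catalog = {
    "orders_2020": {"owner": "sales", "description": "Order lines", "source": "ERP"},
    "tmp_dump_17": {"owner": "", "description": "", "source": "unknown upload"},
    "sensor_raw": {"owner": "iot_team", "source": "gateway"},
}

risks = find_swamp_risks(catalog)
for name, missing in risks.items():
    print(f"{name}: missing {', '.join(missing)}")
```

A check like this does not replace governance, but it gives the steward a concrete list of datasets to document or remove.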
A data lake can be a replacement for the data warehouse
A data lake can incorporate several data warehouses along with other data sources. All of them come together in the data lake, where governance is embedded, which simplifies trusted data discovery for users across the organization.
Rather than replacing them, the data lake augments enterprise data warehouse (EDW) environments, giving data analysts and data scientists a way to explore data easily. It also helps them discover new insights and perspectives, which in turn can boost business growth and accelerate innovation.
Summary
Thanks to the Internet of Things, applications, and smart devices, the amount of unstructured data will grow exponentially, so the demand for data storage will intensify. Adoption of data lakes is likely to increase in the majority of organizations across the globe. To avoid a data swamp, the data steward needs to curate the lake data, and governance policies should define the standards and controls for the lake and its data.