What is a data product, and why does everyone need one?

In the world of data, a Data Product can be defined as an entity that is generated from raw or source data and presents a consistent interface regardless of where that data lives: files, APIs, streams or databases. It can provide additional metadata on top of the data, and it comes with governance and quality controls. Data Products are part of a broader set of offerings called Data Services, which include Machine Learning & AI Platforms, Cybersecurity & IoT Security Services, Big Data Analytics Services and Analytic Consulting Services.

Data Products are ready-to-use entities that let you consume data in one way or another. They make it possible to get actionable insights in near real time. This can be done by exposing the raw data, which requires investment in compute and storage infrastructure, or by creating derived data products through multiple transformation steps and other technical approaches. Automating the production of raw data products can save substantial money and time. However, many organizations do not see those savings, because data governance, quality and integrity (which I will discuss later) are often considered only after the process is deployed rather than while it is being built.
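To make the idea of a derived data product more concrete, here is a minimal sketch in Python. It assumes a toy list of order records and a handful of hypothetical step functions (`drop_incomplete`, `to_float_amount`, `total_by_customer`, `check_non_negative`), none of which come from a real library. The point is only to show transformation steps chained together, with a quality check running inside the pipeline instead of after deployment.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]
Step = Callable[[List[Row]], List[Row]]


def drop_incomplete(rows: List[Row]) -> List[Row]:
    """Quality step: discard records that have no amount."""
    return [r for r in rows if r.get("amount") is not None]


def to_float_amount(rows: List[Row]) -> List[Row]:
    """Transformation step: parse the amount into a float."""
    return [{**r, "amount": float(r["amount"])} for r in rows]


def total_by_customer(rows: List[Row]) -> List[Row]:
    """Transformation step: aggregate amounts per customer."""
    totals: Dict[str, float] = {}
    for r in rows:
        totals[r["customer"]] = totals.get(r["customer"], 0.0) + r["amount"]
    return [{"customer": c, "total": t} for c, t in sorted(totals.items())]


def check_non_negative(rows: List[Row]) -> List[Row]:
    """Integrity check that runs as part of the pipeline itself."""
    bad = [r for r in rows if r["total"] < 0]
    if bad:
        raise ValueError(f"negative totals found: {bad}")
    return rows


def build_derived_product(raw: Iterable[Row], steps: List[Step]) -> List[Row]:
    """Apply each step in order to turn raw records into a derived product."""
    rows = list(raw)
    for step in steps:
        rows = step(rows)
    return rows


# Hypothetical raw data, for illustration only.
raw_orders = [
    {"customer": "acme", "amount": "10.5"},
    {"customer": "acme", "amount": "4.5"},
    {"customer": "globex", "amount": None},
    {"customer": "initech", "amount": "7"},
]

derived = build_derived_product(
    raw_orders,
    [drop_incomplete, to_float_amount, total_by_customer, check_non_negative],
)
```

Because the checks are ordinary steps in the list, adding a governance or quality rule means adding one more function to the pipeline, not bolting it on later.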

Data Products are much like other products in that they present a ready-to-use entity. In the case of data, however, this entity is the combination of raw data and additional metadata, where “raw” means data from any batch, stream, real-time or API source. So a spreadsheet is not a Data Product, just as leather or fabric is not a pair of shoes. A Data Product built on top of ready-to-use data can present a consistent interface across data files, APIs and databases, meaning that anyone can consume it with ease.
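One way to picture "raw data plus metadata behind a consistent interface" is the small Python sketch below. The `DataProduct` class, its fields, and the `csv_reader` and `sqlite_reader` helpers are hypothetical names made up for this illustration. They are not a standard API, and a real implementation would also need access control, versioning and so on. The sketch uses only the standard library, with an in-memory CSV string standing in for a file and an in-memory SQLite table standing in for a database.

```python
import csv
import io
import sqlite3
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List

Record = Dict[str, str]


@dataclass
class DataProduct:
    """Hypothetical wrapper: records from some source plus descriptive metadata."""
    name: str
    owner: str
    schema: List[str]
    reader: Callable[[], Iterator[dict]]  # hides where the data actually lives
    metadata: Dict[str, str] = field(default_factory=dict)

    def read(self) -> Iterator[Record]:
        """Yield records limited to the declared schema, whatever the source."""
        for row in self.reader():
            yield {col: str(row[col]) for col in self.schema}


def csv_reader(text: str) -> Callable[[], Iterator[dict]]:
    """Build a reader over CSV text (a stand-in for a data file)."""
    def _read() -> Iterator[dict]:
        yield from csv.DictReader(io.StringIO(text))
    return _read


def sqlite_reader(conn: sqlite3.Connection, query: str) -> Callable[[], Iterator[dict]]:
    """Build a reader over a SQL query (a stand-in for a database)."""
    def _read() -> Iterator[dict]:
        cursor = conn.execute(query)
        columns = [d[0] for d in cursor.description]
        for row in cursor:
            yield dict(zip(columns, row))
    return _read


# The same orders, stored in two different places (illustrative data only).
csv_text = "order_id,amount\n1,9.50\n2,20.00\n"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id TEXT, amount TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [("1", "9.50"), ("2", "20.00")])

from_file = DataProduct(
    "orders", "sales-team", ["order_id", "amount"],
    csv_reader(csv_text), {"source": "csv"},
)
from_db = DataProduct(
    "orders", "sales-team", ["order_id", "amount"],
    sqlite_reader(conn, "SELECT order_id, amount FROM orders"), {"source": "sqlite"},
)

# Consumers call read() the same way in both cases.
file_rows = list(from_file.read())
db_rows = list(from_db.read())
```

The consumer only ever calls `read()` and looks at `schema` and `metadata`, so swapping the file for the database does not change any downstream code.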