In the era of data-driven decision-making, the availability of vast and diverse datasets is crucial for organisations. Data marketplaces are emerging as the go-to platforms for acquiring, sharing and monetising data, changing the way organisations approach data acquisition. They offer much of the convenience of online shopping, where a vast array of products is just a click away.
Just as online shopping brings together multiple vendors and allows users to browse, compare and select products based on their needs, data marketplaces serve as digital market hubs that connect data providers and data consumers. In this article, we look at some important building blocks of a data marketplace, with reference to the DOME 4.0 platform [1], an industrial data marketplace developed as part of the DOME 4.0 project [2].
A Collection of Databases or Data Catalogues
A data marketplace consists of a collection of databases or data catalogues, that is, an inventory of datasets available to potential consumers. Depending on its focus and scope, the marketplace can cater to specific industries or to a wide range of data categories. In some cases, the collection consists of curated datasets provided by the platform itself. In others, as with the DOME 4.0 platform, the databases are registered by independent data providers, much as applications are registered on an app store. Some data marketplaces also offer data consumer tools, such as data analytics software, machine learning algorithms and visualisation tools, which users can apply to inspect, explore and get value out of the data directly.
Easy Onboarding of Data Providers
For data providers to register and provide data on a platform, onboarding and connecting to the marketplace must be made easy. To simplify the registration process for data providers, data marketplaces can offer standardised plugin templates. These templates serve as a starting point, providing a pre-defined structure that data providers can customise for their specific databases.
By using the template to create their plugins, data providers ensure that their datasets can be queried uniformly within the marketplace. During registration, data providers supply information about their datasets, including metadata such as the data model description, format, size and licensing terms. Data consumer tools, such as data analytics software or visualisation tools, can be registered in a similar manner. DOME 4.0 provides plugin templates for both data providers and data consumers to keep onboarding simple.
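To make the idea of a plugin template concrete, here is a minimal sketch in Python. It is not the DOME 4.0 template: the class names, method names and metadata fields are all hypothetical, chosen only to show how a shared interface lets the marketplace query every registered provider in the same way.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class DatasetMetadata:
    """Metadata supplied at registration (field names are illustrative)."""
    title: str
    data_model: str
    data_format: str
    size_bytes: int
    licence: str


class DataProviderPlugin(ABC):
    """Hypothetical template: every provider implements the same two methods."""

    @abstractmethod
    def metadata(self) -> DatasetMetadata:
        ...

    @abstractmethod
    def query(self, text: str) -> list[dict]:
        ...


class InMemoryProvider(DataProviderPlugin):
    """Example provider backed by a plain list of records."""

    def __init__(self, records: list[dict]):
        self._records = records

    def metadata(self) -> DatasetMetadata:
        return DatasetMetadata(
            title="Alloy test results",
            data_model="one record per tensile test",
            data_format="JSON",
            size_bytes=2048,
            licence="CC-BY-4.0",
        )

    def query(self, text: str) -> list[dict]:
        # Case-insensitive substring match over all field values.
        needle = text.lower()
        return [r for r in self._records
                if any(needle in str(v).lower() for v in r.values())]


def search_all(plugins: list[DataProviderPlugin], text: str) -> list[dict]:
    """Query every registered plugin through the shared interface."""
    hits = []
    for plugin in plugins:
        for record in plugin.query(text):
            hits.append({"source": plugin.metadata().title, **record})
    return hits
```

Because the marketplace only ever calls metadata() and query(), a provider can wrap an SQL database, a REST API or a file store behind the same two methods without the search code changing.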
Enabling Enhanced Search Functionalities
A data marketplace depends on an efficient data discovery process, which makes advanced search functionality a crucial feature. As part of its ongoing development, the DOME 4.0 platform is exploring ways to make search more efficient. Two notable approaches are the use of metadata standards and ontologies, and the use of indexing and AI, both aimed at helping users retrieve the right datasets quickly.
Leveraging Metadata Standards and Ontologies:
Data marketplaces that adhere to the FAIR principles [3] (i.e., making data findable, accessible, interoperable and reusable) employ metadata standards and ontologies to enable efficient search and discovery of datasets. Metadata provides additional information about each dataset, so that data consumers can assess its relevance and compatibility with their specific needs. By using metadata standards and mapping them to ontologies, data marketplaces can offer consistent search and filtering across providers, which increases the findability, accessibility and reusability of datasets.
Furthermore, mapping data concepts to ontological concepts enables the marketplace to match datasets with the capabilities and requirements of registered data consumer tools, which improves the interoperability of the data ecosystem. DOME 4.0 has developed ontologies and maps datasets to them during platform registration to support interoperability across data providers and data consumers. These features add value to data by promoting exchange and collaboration, which allows diverse datasets to be combined and analysed together. Being able to integrate data from multiple sources and analyse it with compatible tools gives users a more complete picture than any single dataset could.
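The matching step described above can be sketched as a simple set comparison. This is an illustration only: the dataset names, tool names and concept labels below are invented, and a real ontology would also carry relationships between concepts (for example subclass links) that this sketch ignores.

```python
# Hypothetical result of mapping each dataset's fields to shared ontology concepts.
DATASET_CONCEPTS = {
    "tensile_tests": {"Material", "TensileStrength"},
    "sensor_logs": {"Temperature", "Timestamp"},
}

# Concepts each registered consumer tool needs as input (also illustrative).
TOOL_REQUIREMENTS = {
    "strength_plotter": {"Material", "TensileStrength"},
    "trend_analyser": {"Timestamp"},
}


def compatible_datasets(tool: str) -> list[str]:
    """Return datasets whose mapped concepts cover everything the tool requires."""
    required = TOOL_REQUIREMENTS[tool]
    return sorted(name for name, concepts in DATASET_CONCEPTS.items()
                  if required <= concepts)  # subset test: all required concepts present
```

Because both sides are described with the same vocabulary, the marketplace can make this check without knowing anything about the provider's own column names.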
Harnessing the Power of Indexing and AI:
Once data providers have registered their databases, the platform can generate indexes based on the data content. Indexing involves creating data structures that optimise data retrieval operations, significantly improving search speed and efficiency, even with large volumes of data. Furthermore, data marketplaces can leverage natural language processing (NLP) to make the search experience more intuitive and user-friendly.
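One common indexing structure is the inverted index, which maps each term to the datasets that contain it. The sketch below is a generic illustration, not a description of how DOME 4.0 builds its indexes; the function names and the idea of indexing short text descriptions are assumptions made for the example.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(descriptions: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the set of dataset ids whose description contains it."""
    index = defaultdict(set)
    for dataset_id, text in descriptions.items():
        for token in tokenize(text):
            index[token].add(dataset_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return ids of datasets containing every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result
```

A lookup touches only the entries for the query terms rather than scanning every description, which is why search stays fast as the catalogue grows.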
Additionally, AI techniques like machine learning can be employed to enhance search relevance. For example, embeddings, which are vector representations of datasets based on their characteristics and relationships, can be analysed using AI algorithms to identify patterns, correlations and similarities.
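As a small illustration of how embeddings can support relevance ranking, the sketch below orders datasets by cosine similarity to a query vector. The three-dimensional vectors used in the usage note are made up for the example; in practice they would come from a trained embedding model, and the function names here are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_by_similarity(query_vec: list[float],
                       dataset_vecs: dict[str, list[float]]) -> list[str]:
    """Return dataset names ordered from most to least similar to the query."""
    return sorted(
        dataset_vecs,
        key=lambda name: cosine_similarity(query_vec, dataset_vecs[name]),
        reverse=True,
    )
```

For instance, with toy vectors {"steel_fatigue": [0.9, 0.1, 0.0], "steel_corrosion": [0.8, 0.3, 0.1], "weather_logs": [0.0, 0.2, 0.9]} and the query [1.0, 0.0, 0.0], the two steel datasets rank ahead of the weather logs because their vectors point in nearly the same direction as the query.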
As data marketplaces continue to evolve and thrive, they are revolutionising the accessibility, reusability and abundance of data. This offers immense potential for unlocking insights, driving innovation and fostering collaboration in the ever-changing data-driven landscape of tomorrow.
References:
[1]: DOME 4.0 platform, https://dome.the-marketplace.eu/
[2]: DOME 4.0 project, https://dome40.eu/, funded by the European Union’s Horizon 2020 Research and Innovation Programme (Grant Agreement no. 953163).
[3]: Wilkinson et al., The FAIR Guiding Principles for scientific data management and stewardship, Scientific Data volume 3, 160018, 2016. https://www.nature.com/articles/sdata201618