Mining Approximate Acyclic Schemas from Relations

Published in ACM SIGMOD International Conference on Management of Data, 2020

Abstract

Acyclic schemes have numerous applications in databases and in machine learning, such as improved design, more effi- cient storage, and increased performance for queries and ma- chine learning algorithms. Multivalued dependencies (MVDs) are the building blocks of acyclic schemes. The discovery from data of both MVDs and acyclic schemes is more chal- lenging than other forms of data dependencies, such as Func- tional Dependencies, because these dependencies do not hold on subsets of data, and because they are very sensitive to noise in the data; for example a single wrong or missing tuple may invalidate the schema. In this paper we present Mai- mon, a system for discovering approximate acyclic schemes and MVDs from data. We give a principled definition of ap- proximation, by using notions from information theory, then describe the two components of Maimon: mining for ap- proximate MVDs, then reconstructing acyclic schemes from approximate MVDs. We conduct an experimental evaluation of Maimon on 20 real-world datasets, and show that it can scale up to 1M rows, and up to 30 columns.

Download Paper Here