University of California

Query and Goal Driven Entity Resolution Framework

Overview

The significance of data quality research is motivated by the observation that the effectiveness of data-driven technologies, such as decision support, data exploration, analysis, and scientific discovery tools, is closely tied to the quality of the data to which they are applied. It is well recognized that the outcome of an analysis is only as good as the data on which it is performed. That is why organizations today spend a significant share of their budgets on cleaning tasks, such as removing duplicates, correcting errors, and filling in missing values, to improve data quality before pushing data through the analysis pipeline.

Given the critical importance of the problem, many efforts in both industry and academia have explored systematic approaches to data cleaning. Our group focuses primarily on entity resolution, a challenge that arises because real-world objects are referred to by references or descriptions that are not always unique identifiers of those objects, which leads to ambiguity.

The traditional approach to entity resolution uses features associated with a reference (or a record) to find references that co-refer. In our project we are exploring which other sources and types of information could be used, in addition to features, to better disambiguate among references. This information could be present in the data being cleaned itself or obtained from external data sources, including ontologies, encyclopedias, and the Web. We are also looking into ways to guide and fine-tune the cleaning process based on the type of analysis that will be performed on the cleaned data, so that it achieves both higher disambiguation quality and better efficiency.
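The contrast between feature-only matching and matching that also uses additional evidence can be illustrated with a toy sketch. This is not the group's algorithm (in particular, it is not RelDC). The records, the `name` and `related` fields, the weight `alpha`, and the threshold are all hypothetical. Name similarity uses Python's `difflib.SequenceMatcher` as a stand-in feature comparator, relationship evidence is a simple Jaccard overlap of co-occurring entities, and matched pairs are merged into clusters with union-find.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical toy references: a descriptive feature (a name string) plus a
# set of related entities each reference co-occurs with (e.g. coauthors).
RECORDS = {
    "r1": {"name": "J. Smith", "related": {"A. Lee", "B. Kim"}},
    "r2": {"name": "John Smith", "related": {"A. Lee"}},
    "r3": {"name": "J. Smith", "related": {"C. Diaz"}},
    "r4": {"name": "Jane Doe", "related": {"B. Kim"}},
}


def feature_sim(a, b):
    """Feature evidence: case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def relation_sim(a, b):
    """Relationship evidence: Jaccard overlap of related entities in [0, 1]."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def score(x, y, alpha):
    """Weighted mix of the two signals; alpha=0 means features only."""
    return ((1 - alpha) * feature_sim(x["name"], y["name"])
            + alpha * relation_sim(x["related"], y["related"]))


def resolve(records, alpha=0.5, threshold=0.6):
    """Merge every pair scoring >= threshold; return sorted clusters of ids."""
    parent = {rid: rid for rid in records}

    def find(x):
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(records, 2):
        if score(records[a], records[b], alpha) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for rid in records:
        clusters.setdefault(find(rid), []).append(rid)
    return sorted(sorted(c) for c in clusters.values())


if __name__ == "__main__":
    print("features only:      ", resolve(RECORDS, alpha=0.0))
    print("features + relations:", resolve(RECORDS, alpha=0.5))
```

On this toy data, the feature-only setting (`alpha=0.0`) lumps the three "Smith" references together because their names look alike. Once relationship evidence is weighted in (`alpha=0.5`), the shared coauthor keeps `r1` and `r2` together, while `r3`, whose name is identical to `r1` but whose context differs, is left on its own.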

Faculty

Current Students

Alumni

Publications

  1. Progressive Approach to Relational Entity Resolution.
    Yasser Altowim, Dmitri V. Kalashnikov, and Sharad Mehrotra.
    In PVLDB, 7(11), Sep 1-5, 2014.
    [Download Paper] [Download Slides]

  2. Context Assisted Face Clustering Framework with Human-in-the-Loop.
    Liyan Zhang, Dmitri V. Kalashnikov, and Sharad Mehrotra.
    In International Journal of Multimedia Information Retrieval (IJMIR), Springer, 2014.
    [Download Paper]

  3. Efficient Summarization Framework for Multi-Attribute Uncertain Data.
    Jie Xu, Dmitri V. Kalashnikov, and Sharad Mehrotra.
    In Proc. of ACM SIGMOD Int'l Conf. on Management of Data (ACM SIGMOD), June 22-27, 2014.
    [Download Paper] [Download Slides]

  4. Query Aware Determinization of Uncertain Objects.
    Jie Xu, Dmitri V. Kalashnikov, and Sharad Mehrotra.
    In IEEE Transactions on Knowledge and Data Engineering (IEEE TKDE), 2014.
    [Download Paper]

  5. Query-driven approach to entity resolution.
    Hotham Altwaijry, Dmitri V. Kalashnikov, and Sharad Mehrotra.
    In Proc. of International Conference on Very Large Data Bases (VLDB 2013), Aug 26-30, 2013.
    [Download Paper] [Download Slides]

  6. Context-based Person Identification Framework for Smart Video Surveillance.
    Liyan Zhang, Dmitri V. Kalashnikov, Sharad Mehrotra, and Ronen Vaisenberg.
    In Machine Vision and Applications (MVA), 2013.
    [Download Paper]

  7. A Unified Framework for Context Assisted Face Clustering.
    Liyan Zhang, Dmitri V. Kalashnikov, and Sharad Mehrotra.
    In ACM International Conference on Multimedia Retrieval (ACM ICMR 2013), Apr 16-19, 2013.
    (Best Paper Award)
    [Download Paper] [Download Slides]

  8. Adaptive connection strength models for relationship-based entity resolution.
    Rabia Nuray-Turan, Dmitri V. Kalashnikov, and Sharad Mehrotra.
    In ACM Journal of Data and Information Quality (ACM JDIQ), 2013.
    [Download Paper]

  9. Exploiting web querying for web people search.
    Rabia Nuray-Turan, Dmitri V. Kalashnikov, and Sharad Mehrotra.
    In ACM Transactions on Database Systems (ACM TODS), 37(1), February 2012.
    [Download Paper]

  10. Attribute and Object Selection Queries on Objects with Probabilistic Attributes.
    Rabia Nuray-Turan, Dmitri V. Kalashnikov, Sharad Mehrotra, and Yaming Yu.
    In ACM Transactions on Database Systems (ACM TODS), 37(1), February 2012.
    [Download Paper]

  11. Video Entity Resolution: Applying ER Techniques for Smart Video Surveillance.
    Liyan Zhang, Ronen Vaisenberg, Sharad Mehrotra, and Dmitri V. Kalashnikov.
    In Workshop on Information Quality and Quality of Service for Pervasive Computing (IQ2S 2011), in conjunction with IEEE PerCom 2011, invited paper, Mar 21-25, 2011.
    [Download Paper]

  12. Exploiting Context Analysis for Combining Multiple Entity Resolution Systems.
    Zhaoqi Stella Chen, Dmitri V. Kalashnikov, and Sharad Mehrotra.
    In Proc. of ACM SIGMOD Int'l Conf. on Management of Data (ACM SIGMOD), June 29-July 2, 2009.
    [Download Paper]

  13. Exploiting Web querying for Web People Search in WePS2.
    Rabia Nuray-Turan, Zhaoqi Chen, Dmitri V. Kalashnikov, and Sharad Mehrotra.
    In 2nd Web People Search Evaluation Workshop (WePS 2009), 18th WWW Conference, April 2009.
    [Download Paper]

  14. WEST: Modern Technologies for Web People Search.
    Dmitri V. Kalashnikov, Zhaoqi Chen, Rabia Nuray-Turan, Sharad Mehrotra, and Zheng Zhang.
    In Proc. of IEEE International Conference on Data Engineering (IEEE ICDE), demo publication, March 29-April 4, 2009.
    [Download Paper]

  15. Web people search via connection analysis.
    Dmitri V. Kalashnikov, Zhaoqi Chen, Rabia Nuray-Turan, and Sharad Mehrotra.
    In IEEE Transactions on Knowledge and Data Engineering (IEEE TKDE), 20(11), November 2008.
    [Download Paper]

  16. Towards breaking the quality curse: A web-querying approach to Web People Search.
    Dmitri V. Kalashnikov, Rabia Nuray-Turan, and Sharad Mehrotra.
    In Annual International ACM SIGIR Conference, July 20-24, 2008.
    [Download Paper]

  17. Adaptive Graphical Approach to Entity Resolution.
    Stella Chen, Dmitri V. Kalashnikov, and Sharad Mehrotra.
    In Proc. of ACM/IEEE Joint Conference on Digital Libraries (ACM/IEEE JCDL), June 17-23, 2007.
    [Download Paper]

  18. Self-tuning in Graph-based Reference Disambiguation.
    Rabia Nuray-Turan, Dmitri V. Kalashnikov, and Sharad Mehrotra.
    In Proc. of Int'l Conf. on Database Systems for Advanced Applications (DASFAA), Apr 9-12, 2007.
    [Download Paper]

  19. Disambiguation Algorithm for People Search on the Web.
    Dmitri V. Kalashnikov, Stella Chen, Rabia Nuray, Sharad Mehrotra, and Naveen Ashish.
    In Proc. of IEEE International Conference on Data Engineering (IEEE ICDE), short publication, April 16-20, 2007.
    [Download Paper]

  20. Domain-independent data cleaning via analysis of entity-relationship graph.
    Dmitri V. Kalashnikov and Sharad Mehrotra.
    In ACM Transactions on Database Systems (ACM TODS), June 2006.
    [Download Paper] [Code]

  21. Exploiting relationships for object consolidation.
    Zhaoqi Chen, Dmitri V. Kalashnikov, and Sharad Mehrotra.
    In Proc. of International ACM SIGMOD Workshop on Information Quality in Information Systems (ACM IQIS), June 13-17, 2005.
    [Download Paper]

  22. Exploiting relationships for domain-independent data cleaning.
    Dmitri V. Kalashnikov, Sharad Mehrotra, and Zhaoqi Chen.
    In Proc. of SIAM International Conference on Data Mining (SIAM Data Mining), April 21-23, 2005.
    [Download Paper] [Code]

Software

  • RelDC: code for relationship-based entity resolution

Acknowledgement

This material is based upon work supported by the National Science Foundation under Grant No. 1118114. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.


    © 2013 SHERLOCK @ UCI. All Rights Reserved.