When: Friday, Jan. 12, 2018 @ 11 a.m. - Noon
Where: Marcus Nanotechnology Building, Room 1116-1117
Title: Multi-Dimensional Analysis of Massive Text Corpora
Speaker: Jiawei Han, Abel Bliss Professor,
Department of Computer Science,
University of Illinois at Urbana-Champaign
Much real-world big data is unstructured and interconnected, largely in the form of natural-language text. It is highly desirable to conduct multi-dimensional analysis on massive text data. However, this poses a major challenge: how to transform unstructured text into structured data and analyze that data in a multidimensional space. To support such analysis, we propose a textcube model and discuss how to construct textcubes from massive text corpora and how to conduct multidimensional OLAP analysis on them. Over the past several years, we have developed a text mining approach that requires only distant or minimal supervision but relies on massive data. We show that (i) quality phrases can be mined from massive text data, (ii) types can be extracted from massive text data with distant supervision, (iii) entities, attributes, and values can be discovered by meta-path-directed pattern discovery, (iv) faceted taxonomies can be constructed from massive corpora, (v) textcubes can be constructed from massive text, and (vi) multi-dimensional analysis can be conducted on such cubes. We argue that this paradigm represents a promising direction for turning massive text data into structured and useful knowledge.
Jiawei Han is Abel Bliss Professor in the Department of Computer Science, University of Illinois at Urbana-Champaign. His research covers data mining, information network analysis, database systems, and data warehousing, with over 900 journal and conference publications. He has chaired or served on the program committees of most major international data mining and database conferences. He also served as the founding Editor-in-Chief of ACM Transactions on Knowledge Discovery from Data and as Director of the Information Network Academic Research Center supported by the U.S. Army Research Lab (2009-2016), and has been co-Director of KnowEnG, an NIH-funded Center of Excellence in Big Data Computing, since 2014. He is a Fellow of ACM and of IEEE, and received the 2004 ACM SIGKDD Innovations Award, the 2005 IEEE Computer Society Technical Achievement Award, and the 2009 M. Wallace McDowell Award from the IEEE Computer Society. His co-authored book "Data Mining: Concepts and Techniques" has been widely adopted as a textbook worldwide.