January 15 (Wednesday), 2014
Time | Title | Invited Speakers |
12:00-13:00 | Emerging Technologies for Big Data Management and Analytics | Dr. Divy Agrawal (Professor, Dept. of Computer Science, Univ. of California at Santa Barbara & Visiting Scientist, Advertising Infrastructure, Google Inc.) |
14:10-15:40 | Towards an Intelligent Keyword Search over XML and Relational Databases | Dr. Tok Wang Ling (Professor, Dept. of Computer Science, National University of Singapore) |
Keynote Speech 1: Emerging Technologies for Big Data Management and Analytics
Dr. Divy Agrawal, Professor in the Dept. of Computer Science, Univ. of California at Santa Barbara
Visiting Scientist, Advertising Infrastructure, Google Inc.
Abstract:
During the past decade, Google has been instrumental in establishing the broader research agenda for Big Data management and Big Data analytics.
With the advent of BigTable and related technologies (Google File System, Chubby Lock Service, and a Paxos-based distributed consensus protocol),
Google initiated a data management revolution called NoSQL that has taken both data management researchers and practitioners by storm. Numerous offerings,
both proprietary and open source, are now available that essentially mimic Google’s approach to managing Big Data. Similarly, Google’s MapReduce
paradigm has resulted in the abandonment of established data analytics paradigms, both within Google and in the broader commercial arena.
While academics and practitioners are enamored with Google’s Big Data technologies that are almost a decade old, Google is continuing to define the future
research agenda in the context of Big Data. In particular, Google has recognized the deficiencies of the NoSQL approach to data management, especially in the context
of data-centric products and services. In the recent past, Google has revealed a flurry of next-generation Big Data management technologies that provide stronger
consistency guarantees, similar to those of traditional database management solutions. Notable examples are Megastore, Spanner, and a distributed database management
solution called F1. What is noteworthy is that all these technologies are inherently designed to be scalable and are multi-homed (i.e., can withstand large-scale
datacenter outages). In the same vein, in the context of Big Data analytics, Google has developed key technologies such as Dremel, Photon, PowerDrill,
and MillWheel. Dremel is a system that enables interactive analysis (as opposed to batched analysis using MapReduce) of Web-scale datasets. Photon is a system
that enables fault-tolerant and scalable joining of continuous data streams (e.g., query logs with advertising click logs). PowerDrill is an analytic engine
that is capable of processing trillions of cells with a single mouse click. Finally, MillWheel is a system that has been developed for fault-tolerant stream
processing at Internet scale.
In this presentation, we will introduce and summarize these point solutions from the recent research papers published by the research and engineering teams at
Google. The goal of this undertaking is to underscore that there is more to Big Data management and analytics than just BigTable and MapReduce, especially in the
broader research and development context of Big Data.
|
Short Biography
Dr. Divyakant Agrawal is a Professor of Computer Science and the Director of Engineering Computing Infrastructure at the University of California
at Santa Barbara. His research expertise is in the areas of database systems, distributed computing, data warehousing, and large-scale information
systems. From January 2006 through December 2007, Dr. Agrawal served as VP of Data Solutions and Advertising Systems at the Internet Search Company
ASK.com. Dr. Agrawal has also served as a Visiting Senior Research Scientist at the NEC Laboratories of America in Cupertino, CA from 1997 to 2009.
During his professional career, Dr. Agrawal has served on numerous Program Committees of International Conferences, Symposia, and Workshops and
served as an editor of the journal Distributed and Parallel Databases (1993-2008) and the VLDB Journal (2003-2008). He currently serves as the
Editor-in-Chief of Distributed and Parallel Databases and is on the editorial boards of the ACM Transactions on Database Systems and IEEE Transactions
on Knowledge and Data Engineering. He has recently been elected to the Board of Trustees of the VLDB Endowment and elected to serve on the Executive
Committee of ACM Special Interest Group SIGSPATIAL. Dr. Agrawal's research philosophy is to develop data management solutions that are theoretically
sound and are relevant in practice. He has published more than 320 research manuscripts in prestigious forums (journals, conferences, symposia, and
workshops) on a wide range of topics related to data management and distributed systems and has advised more than 35 doctoral students during his academic
career. He received the 2011 Outstanding Graduate Mentor Award from the Academic Senate at UC Santa Barbara. Dr. Agrawal was recognized
as an Association for Computing Machinery (ACM) Distinguished Scientist in 2010 and was inducted as an ACM Fellow in 2012. He was also inducted as
a Fellow of IEEE in 2012. His current interests are in the area of scalable data management and data analysis in Cloud Computing environments, security
and privacy of data in the cloud, and scalable analytics over social network data and social media. He is currently on sabbatical leave from UCSB
and is serving as a Visiting Scientist in the Advertising Infrastructure Group at Google, Inc. in Mountain View, CA.
|
Keynote Speech 2: Towards an Intelligent Keyword Search over XML and Relational Databases
Dr. Tok Wang Ling, Professor in the Dept. of Computer Science, National University of Singapore
Abstract:
Keyword search has been the major retrieval method in information retrieval systems, and has become an important way for novice users to explore
data-centric XML and relational databases (RDB). Recent years have witnessed many approaches proposed for keyword search over XML and RDB.
For XML keyword search, existing approaches are structure-based because they mainly rely on exploring the structure of XML data. These approaches
can be classified as tree-based and graph-based search. Tree-based search is used when an XML document is modeled as a tree, i.e., with no ID references
(IDREFs), while graph-based search is used for XML documents with IDREFs. Almost all tree-based approaches are based on some variation of LCA
(Least Common Ancestor) semantics, such as SLCA and ELCA. Because they are unaware of the semantics in XML data, these LCA-based approaches suffer from several serious
problems, such as meaningless answers, duplicated answers, and missing answers.
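To make the LCA-based idea concrete, the following is a minimal sketch of SLCA-style matching, not the speaker's implementation. It assumes the common reading of SLCA: an element is an answer when its subtree contains every query keyword and no child subtree does. The function name `slca`, the toy document `DOC`, and the choice to match keywords against tag names and element text are all illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy document, used only for illustration.
DOC = """
<university>
  <student><name>Anna</name><course>Database</course></student>
  <student><name>Bob</name><course>Logic</course></student>
</university>
"""


def slca(root, keywords):
    """Return elements whose subtree matches all keywords while no child subtree does."""
    wanted = {k.lower() for k in keywords}
    answers = []

    def visit(node):
        # Keywords matched directly by this element (its tag or its own text).
        words = set((node.text or "").lower().split())
        words.add(node.tag.lower())
        found = wanted & words
        child_is_complete = False
        for child in node:
            child_found, child_complete = visit(child)
            found |= child_found
            child_is_complete = child_is_complete or child_complete
        complete = found == wanted
        # Keep only the smallest subtrees that cover every keyword.
        if complete and not child_is_complete:
            answers.append(node)
        return found, complete

    visit(root)
    return answers


root = ET.fromstring(DOC)
same_student = [e.tag for e in slca(root, ["Anna", "Database"])]
different_students = [e.tag for e in slca(root, ["Anna", "Logic"])]
print(same_student)        # ['student']
print(different_students)  # ['university']
```

The second query illustrates how a purely structural answer can be hard to interpret: the only common ancestor of "Anna" and "Logic" is the document root, which joins two unrelated students.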
For RDB keyword search, existing approaches are also structure-based because they rely on the foreign key-key references of RDB. These approaches can be classified
as data graph based and schema graph based. Data graph based keyword search treats a relational database as a data graph. Each node in the
data graph represents a tuple in some relation in the database, and each edge between two nodes represents a foreign key-key reference between
the two tuples represented by these nodes. An answer to a keyword query is defined as a minimal connected subgraph that contains nodes matching the keywords
in the query. Schema graph based keyword search, on the other hand, treats a relational database schema as a schema graph. Each node in the schema graph
represents a relation in the database, and each edge between two nodes represents a foreign key-key reference between the two relations
represented by these nodes. To answer a keyword query, a set of SQL queries is generated with respect to the possible interpretations of the keyword query. The results of
the SQL queries are considered the answers. Because they do not consider the semantics in the database, these RDB keyword search techniques suffer from the problems of
retrieving incomplete, duplicated, and meaningless answers. Moreover, the retrieved answers are highly dependent on the schema of the relational database, and
their intuitive meanings are difficult to understand.
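As an illustration of the data graph based approach described above, here is a small sketch under simplifying assumptions. It handles only two keywords, for which a shortest path between a pair of matching tuples is a connected subgraph with the fewest nodes; general multi-keyword search needs a more involved subgraph search. The tuple identifiers, the `TUPLES` and `REFERENCES` data, and all function names are hypothetical.

```python
from collections import deque

# Hypothetical toy database: tuple id -> text content of the tuple.
TUPLES = {
    "student:1": "Anna",
    "student:2": "Bob",
    "course:10": "Database",
    "enrol:100": "grade A",  # references student:1 and course:10
    "enrol:101": "grade B",  # references student:2 and course:10
}
# Each pair is one foreign key-key reference between two tuples.
REFERENCES = [
    ("enrol:100", "student:1"), ("enrol:100", "course:10"),
    ("enrol:101", "student:2"), ("enrol:101", "course:10"),
]


def build_data_graph(references):
    """Build an undirected adjacency map: one node per tuple, one edge per reference."""
    graph = {}
    for a, b in references:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def matches(keyword):
    """Return ids of tuples whose text contains the keyword as a whole word."""
    key = keyword.lower()
    return sorted(t for t, text in TUPLES.items() if key in text.lower().split())


def shortest_path(graph, source, target):
    """Breadth-first search; returns the node list from source to target, or None."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in sorted(graph.get(node, ())):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None


def answer(keyword_a, keyword_b):
    """Smallest connecting path over all pairs of tuples matching the two keywords."""
    graph = build_data_graph(REFERENCES)
    paths = [shortest_path(graph, s, t)
             for s in matches(keyword_a) for t in matches(keyword_b)]
    paths = [p for p in paths if p]
    return min(paths, key=len) if paths else None


print(answer("Anna", "Database"))  # ['student:1', 'enrol:100', 'course:10']
print(answer("Anna", "Bob"))
```

The second query returns a path that runs through both enrolment tuples and the shared course. The answer is structurally valid, but a user has to know the schema to read it as "Anna and Bob take the same course", which is the kind of schema dependence the abstract points to.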
In this presentation, we point out mismatches between the answers returned and common user expectations in keyword search over XML and RDB. We analyze these mismatches and find that their main cause is unawareness of the semantics of objects, relationships, and attributes of objects/relationships in databases. We refer to these as ORA-semantics. To capture the ORA-semantics, we propose the Object Relationship (OR) data graph for XML and the Object Relationship Mixed (ORM) data graph for RDB. Based on the OR data graph and the ORM data graph, we achieve an intelligent keyword search over XML and RDB that avoids the problems mentioned above.
To further improve the usability of keyword search, we also show our ongoing work to enhance the expressive power of keyword queries. In particular, we enable users to explicitly indicate their search intentions with keywords that match relation names, attribute names, and tag names. We also handle recursive relationships and
identifier-dependency (IDD) relationships in databases. We incorporate aggregate functions into keyword queries so that users can explore databases with aggregate queries.
|
Short Biography
Dr. Tok Wang LING is a professor in the Department of Computer Science at the National University of Singapore. He was Head of the IT Division,
Deputy Head of the Department of Information Systems and Computer Science, and Vice Dean of the School of Computing. He received his PhD and
M.Math, both in Computer Science, from the University of Waterloo (Canada), and his BSc in Mathematics from Nanyang University (Singapore). His research
interests include Database Modeling, Semi-Structured Data Modeling, XML Twig Pattern Query Processing, and Keyword Query Processing over XML and
Relational Databases. He serves/served on the steering committees of 4 international conferences, including ER and DASFAA. He served as Conference
Co-chair of 10 international conferences, including ER 2004, DASFAA 2005, SIGMOD 2007, and VLDB 2010, and as Program Committee Co-chair of 6
international conferences, including DASFAA 1995, and ER 1998, 2003 and 2011. He received the ACM Recognition of Service Award in 2007, the DASFAA
Outstanding Contributions Award in 2010, and the Peter P. Chen Award at ER 2011. He is an ER Fellow.
|