BigComp 2014

Keynote Speeches

January 15 (Wednesday), 2014

Time	Title	Invited Speakers
12:00-13:00	Emerging Technologies for Big Data Management and Analytics	Dr. Divy Agrawal (Professor, Dept. of Computer Science, Univ. of California at Santa Barbara & Visiting Scientist, Advertising Infrastructure, Google Inc.)
14:10-15:40	Towards an Intelligent Keyword Search over XML and Relational Databases	Dr. Tok Wang Ling (Professor, Dept. of Computer Science, National University of Singapore)

Keynote Speech 1: Emerging Technologies for Big Data Management and Analytics
Dr. Divy Agrawal, Professor in the Dept. of Computer Science, Univ. of California at Santa Barbara & Visiting Scientist, Advertising Infrastructure, Google Inc.

Abstract:
During the past decade, Google has been instrumental in establishing the broader research agenda in the context of Big Data management and Big Data analytics. With the advent of BigTable and related technologies (Google File System, Chubby Lock Service, and the Paxos-based distributed consensus protocol), Google initiated a data management revolution called NoSQL that has taken both data management researchers and practitioners by storm. Numerous offerings, both proprietary and in the open-source domain, are now available that essentially mimic Google’s approach for managing Big Data. Similarly, Google’s MapReduce paradigm has resulted in the abandonment of established data analytics paradigms both within Google and in the broader commercial arena.

While academics and practitioners are enamored with Google’s Big Data technologies that are almost a decade old, Google is continuing to define the future research agenda in the context of Big Data. In particular, Google has recognized the deficiencies of the NoSQL approach for data management, especially in the context of data-centric products and services. In the recent past, Google has revealed a flurry of next-generation Big Data management technologies that provide stronger consistency guarantees similar to traditional database management solutions. Notable examples are Megastore, Spanner, and a distributed database management solution called F1. What is noteworthy is that all these technologies are inherently designed to be scalable and are multi-homed (i.e., can withstand large-scale datacenter outages). In the same vein, in the context of Big Data analytics, Google has developed key technologies such as Dremel, Photon, PowerDrill, and MillWheel. Dremel is a system that enables interactive analysis (as opposed to batched analysis using MapReduce) of Web-scale datasets. Photon is a system that enables fault-tolerant and scalable joining of continuous data streams (e.g., query logs with advertising click logs). PowerDrill is an analytic engine that is capable of processing trillions of cells with a single mouse click. Finally, MillWheel is a system that has been developed for fault-tolerant stream processing at Internet scale.

In this presentation, we will introduce and summarize these point solutions from the recent research papers published by the research and engineering teams at Google. The goal of this undertaking is to underscore that there is more to Big Data management and analytics than just BigTable and MapReduce, especially in the broader research and development context of Big Data.

Short Biography
Dr. Divyakant Agrawal is a Professor of Computer Science and the Director of Engineering Computing Infrastructure at the University of California at Santa Barbara. His research expertise is in the areas of database systems, distributed computing, data warehousing, and large-scale information systems. From January 2006 through December 2007, Dr. Agrawal served as VP of Data Solutions and Advertising Systems at the Internet search company ASK.com. Dr. Agrawal also served as a Visiting Senior Research Scientist at the NEC Laboratories of America in Cupertino, CA from 1997 to 2009. During his professional career, Dr. Agrawal has served on numerous program committees of international conferences, symposia, and workshops, and served as an editor of the journal Distributed and Parallel Databases (1993-2008) and the VLDB Journal (2003-2008). He currently serves as the Editor-in-Chief of Distributed and Parallel Databases and is on the editorial boards of the ACM Transactions on Database Systems and IEEE Transactions on Knowledge and Data Engineering. He has recently been elected to the Board of Trustees of the VLDB Endowment and elected to serve on the Executive Committee of the ACM Special Interest Group SIGSPATIAL. Dr. Agrawal's research philosophy is to develop data management solutions that are theoretically sound and relevant in practice. He has published more than 320 research manuscripts in prestigious forums (journals, conferences, symposia, and workshops) on a wide range of topics related to data management and distributed systems and has advised more than 35 doctoral students during his academic career. He received the 2011 Outstanding Graduate Mentor Award from the Academic Senate at UC Santa Barbara. Dr. Agrawal was recognized as an Association for Computing Machinery (ACM) Distinguished Scientist in 2010 and was inducted as an ACM Fellow in 2012. He was also inducted as a Fellow of IEEE in 2012.
His current interests are in the area of scalable data management and data analysis in Cloud Computing environments, security and privacy of data in the cloud, and scalable analytics over social network data and social media. He is currently on sabbatical leave from UCSB and is serving as a Visiting Scientist in the Advertising Infrastructure Group at Google, Inc. in Mountain View, CA.

Keynote Speech 2: Towards an Intelligent Keyword Search over XML and Relational Databases
Dr. Tok Wang Ling, Professor in the Dept. of Computer Science, National University of Singapore

Abstract:
Keyword search has been the major retrieval method in information retrieval systems, and has become an important way for novices to explore data-centric XML and relational databases (RDB). Recent years have witnessed many approaches proposed for keyword search over XML and RDB.

For XML keyword search, existing approaches are structure-based because they mainly rely on the exploration of the structure of XML data. These approaches can be classified as tree-based and graph-based search. Tree-based search is used when an XML document is modeled as a tree, i.e., with no ID references (IDREFs), while graph-based search is used for XML documents with IDREFs. Almost all tree-based approaches are based on some variation of LCA (Lowest Common Ancestor) semantics, such as SLCA and ELCA. Because they are unaware of the semantics in XML data, these LCA-based approaches suffer from several serious problems such as meaningless answers, duplicated answers, and missing answers.
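As a rough illustration of the tree-based semantics described above, the following Python sketch computes SLCA-style answers over a tiny XML tree. It is not the algorithm of any particular paper: the sample document, the function name `slca`, and the whitespace-token matching against element text are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
SAMPLE = """
<dept>
  <course><title>Database</title><student>Mary</student><student>John</student></course>
  <course><title>Networks</title><student>John</student></course>
</dept>
"""


def slca(root, keywords):
    """Return elements whose subtree contains every keyword and that have
    no child subtree containing every keyword (SLCA-style semantics)."""
    wanted = {k.lower() for k in keywords}
    answers = []

    def visit(node):
        # Keywords matched by this element's own text.
        found = wanted & set((node.text or "").lower().split())
        child_covers = False
        for child in node:
            child_found = visit(child)
            # A child that covers all keywords rules this node out.
            child_covers = child_covers or child_found == wanted
            found |= child_found
        if found == wanted and not child_covers:
            answers.append(node)
        return found

    visit(root)
    return answers


doc = ET.fromstring(SAMPLE.strip())
first = slca(doc, ["John", "Database"])
second = slca(doc, ["Mary", "Networks"])
print([e.tag for e in first])   # the course that contains both keywords
print([e.tag for e in second])  # only the root covers both keywords
```

In this toy data the second query returns the whole `dept` element even though Mary is not enrolled in Networks, which is one way a purely structural answer can be meaningless to the user.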

For RDB keyword search, existing approaches are also structure-based because they rely on the foreign key-key references of RDB. These approaches can be classified as data graph based and schema graph based. Data graph based keyword search treats a relational database as a data graph. Each node in the data graph represents a tuple in some relation in the database, and each edge between two nodes represents a foreign key-key reference between the two tuples represented by these nodes. An answer to a keyword query is defined as a minimal connected subgraph that contains nodes matching the keywords in the query. On the other hand, schema graph based keyword search treats a relational database schema as a schema graph. Each node in the schema graph represents a relation in the database, and each edge between two nodes represents a foreign key-key reference between the two relations represented by these nodes. To answer a keyword query, a set of SQL queries is generated with respect to possible interpretations of the keyword query. The results of the SQL queries are considered the answers. Because they do not consider the semantics in the database, these RDB keyword search techniques suffer from the problems of retrieving incomplete, duplicated, and meaningless answers. Moreover, the retrieved answers depend heavily on the schema of the relational database, and their intuitive meanings are difficult to understand.
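The data graph based approach described above can be sketched in Python as follows. This is a deliberately simplified illustration, not an implementation of any published system: it handles only two-keyword queries and approximates the minimal connected subgraph by a shortest path found with breadth-first search. The tuples, the edges, and the names `TUPLES`, `FK_EDGES`, `shortest_path`, and `keyword_search` are hypothetical.

```python
from collections import deque

# Hypothetical data graph: each node is a tuple, identified by an id and
# carrying the text of its attribute values.
TUPLES = {
    "S1": "Mary",      # Student tuple
    "S2": "John",      # Student tuple
    "C1": "Database",  # Course tuple
    "E1": "A",         # Enrol tuple (grade), references S1 and C1
    "E2": "B",         # Enrol tuple (grade), references S2 and C1
}
# Each edge stands for a foreign key-key reference between two tuples.
FK_EDGES = [("E1", "S1"), ("E1", "C1"), ("E2", "S2"), ("E2", "C1")]


def shortest_path(graph, src, dst):
    """Breadth-first search; returns the list of nodes from src to dst, or None."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in graph[node]:
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None


def keyword_search(tuples, fk_edges, kw1, kw2):
    """Connect every tuple matching kw1 to every tuple matching kw2."""
    graph = {node: set() for node in tuples}
    for a, b in fk_edges:
        graph[a].add(b)
        graph[b].add(a)

    def matches(keyword):
        return [n for n, text in tuples.items()
                if keyword.lower() in text.lower().split()]

    answers = []
    for src in matches(kw1):
        for dst in matches(kw2):
            path = shortest_path(graph, src, dst)
            if path is not None:
                answers.append(path)
    return sorted(answers, key=len)


print(keyword_search(TUPLES, FK_EDGES, "Mary", "Database"))
print(keyword_search(TUPLES, FK_EDGES, "Mary", "John"))
```

The second query connects the two students through the course they share, so the answer is a chain of tuple identifiers whose meaning depends on how the schema happens to be laid out.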

In this presentation, we point out mismatches between the answers returned and common user expectations in keyword search over XML and RDB. We analyze these mismatches and find that the main cause is unawareness of the semantics of objects, relationships, and attributes of objects/relationships in databases. We refer to these as ORA-semantics. To capture the ORA-semantics, we propose the Object Relationship (OR) data graph for XML and the Object Relationship Mixed (ORM) data graph for RDB. Based on the OR data graph and the ORM data graph, we achieve an intelligent keyword search over XML and RDB that avoids the problems mentioned above.

To further improve the usability of keyword search, we also show our ongoing work to enhance the expressive power of keyword queries. In particular, we enable users to explicitly indicate their search intentions with keywords that match relation names, attribute names, and tag names. We also handle recursive relationships and identifier-dependency (IDD) relationships in databases. We incorporate aggregate functions into keyword queries so that users can explore databases with aggregate queries.

Short Biography
Dr. Tok Wang LING is a professor in the Department of Computer Science at the National University of Singapore. He was Head of the IT Division, Deputy Head of the Department of Information Systems and Computer Science, and Vice Dean of the School of Computing. He received his PhD and M.Math, both in Computer Science, from the University of Waterloo (Canada), and his BSc in Mathematics from Nanyang University (Singapore). His research interests include Database Modeling, Semi-Structured Data Modeling, XML Twig Pattern Query Processing, and Keyword Query Processing over XML and Relational Databases. He serves or has served on the steering committees of 4 international conferences, including ER and DASFAA. He served as Conference Co-chair of 10 international conferences, including ER 2004, DASFAA 2005, SIGMOD 2007, and VLDB 2010, and as Program Committee Co-chair of 6 international conferences, including DASFAA 1995, and ER 1998, 2003 and 2011. He received the ACM Recognition of Service Award in 2007, the DASFAA Outstanding Contributions Award in 2010, and the Peter P. Chen Award at ER 2011. He is an ER Fellow.
