By Leslie Harper,2014-05-28 14:27



     China Faculty Summit

     November, 2010

     Maggie Johnson, Director of Education and University Relations


     Google's Faculty Summits

     - A relatively small gathering
     - Rotating faculty invitations
     - Goals
       o Explain
       o Interact
       o Learn
       o Have fun

     China Faculty Summit Themes

     - Commerce
     - Mobile
     - Education and Curriculum Development

     Agenda: Day one

     Keynote 1: Google Research Overview
     Keynote 2: Google China R&D
     Keynote 3: Mobile 2014

     Commerce Track

     Talk 1: "Large Scale Data Mining in Product Search", Si Li, Senior Software Engineer, Google China
     Talk 2: "Online Shopping & Social Media", Yonggang Wang, Staff Software Engineer, Google China
     Round Table

     Mobile Track

     Talk 1: "Mobile Peer-to-Peer Streaming", Prof. Gary Chan, Hong Kong University of Science and Technology
     Talk 2: "Location Determination for Mobile Devices", Dr. Tsuwei Chen, Senior Software Engineer, Google
     Round Table

     Agenda: Day Two

     Education and Curriculum Development

     - Google University Relations: a Global Perspective
     - Google University Relations in China
     - Collaboration Example 1: Tsinghua University, Cloud Computing
     - Collaboration Example 2: Zhejiang University, Android
     - Collaboration Example 3: Sun Yat-Sen University, Web technologies

     Innovation and Research at Google


     Organizing the world's information and making it universally accessible and useful.

     Google Scale in Commerce

     - Over 1 million AdWords advertisers worldwide
     - Over 1 million AdSense publishers worldwide host ads
     - Via the Google Ad Network, AdSense publishers reach over 80% of global internet users in 100 countries and 20 languages
     - YouTube is monetizing over a billion video views per week globally
     - In 2009, Google generated $54 billion of economic activity for businesses, website publishers, and nonprofits in the US alone, with similar benefits elsewhere

     Google Scale in Hardware and Storage

     - Giga 10^9, Tera 10^12, Peta 10^15, Exa 10^18, Zetta 10^21
     - Publicized: Bigtable of 70 petabytes, 10M ops/sec.
     - Some representative numbers:
       o Storage: 10^18 -> 10^20 to 10^21
       o Users: 10^9 -> 10^10
       o Network: 10^20 now -> 10^21/yr (32 KB/sec. for 1B people)
     - Warehouse computing possibilities
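The network figure can be sanity-checked with a little arithmetic. The sketch below assumes the quantities are bytes, that 1 KB is 1000 bytes, and that a year has 365 days; the slide does not state its units, so these are assumptions rather than the speaker's definitions.

```python
# Assumptions (not stated on the slide): quantities are bytes,
# 1 KB = 1000 bytes, and a year has 365 days.
BYTES_PER_KB = 1000
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def yearly_traffic_bytes(people: int, kb_per_second: float) -> float:
    """Total bytes moved in one year if every person sustains the given rate."""
    return people * kb_per_second * BYTES_PER_KB * SECONDS_PER_YEAR


total = yearly_traffic_bytes(people=10**9, kb_per_second=32)
print(f"{total:.2e} bytes/yr")  # on the order of 10^21, i.e. about one zettabyte
```

Under these assumptions, a sustained 32 KB/sec. for one billion people lands almost exactly on the 10^21 per year quoted for the network.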

     A variety of science and engineering challenges

     Focus on Innovation to Benefit Our Users

     - Commitment to advancing technology
     - Rich domain of work due to our mission
     - Internal consensus that production issues are often as challenging/fun as pure invention
     - Technical leverage
       1. Google Common Distributed System
       2. A Focus on Services
       3. Empiricism and a Holistic Approach to Design

     Our Innovation Culture

     - We work very hard to have a culture of innovation
     - Managers try hard to say "yes" and empower teams
     - 20% time
     - Focus on research, distributed across the organization
       o Impacting Google necessitates broad, diverse involvement in science and engineering
       o Research is done both in our research team and in our engineering organization, organized opportunistically
       o Research ideas can immediately influence products, and product experience can motivate and shape the research agenda

     The "Lab"


     Research Challenges in Distributed Computing

     - Alternative designs that would give better energy efficiency at lower utilization
     - Server O.S. design aimed at many highly-connected machines in one building
     - Unifying abstractions for exploiting parallelism beyond inter-transaction parallelism and map-reduce
     - Latency reduction
     - A general model of replication
     - Machine learning techniques applied to monitoring/controlling such systems
     - Automatic, dynamic world-wide placement of data & computation to minimize latency and/or cost, given constraints on
     - Building retrieval systems that efficiently and usably deal with ACLs
     - Holistic models of privacy
     - The user interface to the user's diverse processing and state

     Totally Transparent Processing

     For all d in D, all l in L, all m in M, and all c in C

     D: The set of all end-user access devices
     L: The set of all human languages
     M: The set of all modalities
     C: The set of all corpora

     D: Personal Computers, Phone, Media Players/Readers, Telematics, Set-top Boxes, Appliances, Health devices, ...
     L: Current languages, Historical languages, Other forms of human notation, Possible language specialization, Formal languages, ...
     M: Text, Image, Audio, Video, Graphics, Other sensor-based data, ...
     C: The normal web, The deep web, Periodicals, Books, Catalogs, Blogs, Geodata, Scientific datasets, Health data, ...
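The "for all d, l, m, c" statement describes a Cartesian product of the four sets. As a rough illustration of how quickly that space grows, the sketch below enumerates the product over a few sample members taken from the lists above. The variable names and the choice of samples are illustrative assumptions, and the real sets are open-ended.

```python
from itertools import product

# Small illustrative samples drawn from the slide's lists; the real sets are open-ended.
devices = ["Personal Computers", "Phone", "Set-top Boxes"]
languages = ["Current languages", "Historical languages"]
modalities = ["Text", "Image", "Audio", "Video"]
corpora = ["The normal web", "Books", "Geodata"]

# "For all d in D, all l in L, all m in M, and all c in C":
combinations = list(product(devices, languages, modalities, corpora))

print(len(combinations))  # 3 * 2 * 4 * 3 = 72
print(combinations[0])
```

Even these tiny samples yield 72 combinations, each of which would need to work for processing to be fully transparent.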


     Selected Research Areas

     - Machine Translation
     - Speech
     - Structured Data
     - Vision
     - Operations Research
     - Digital Humanities

     Machine Translation @ Google

     Statistical Machine Translation
     - Model translation process with an automated statistical model
     - Learning from data: monolingual & bilingual
       o More data: better translation quality
     - Computationally expensive approach
       o Models have many hundreds of gigabytes of data

     Results:
     - Much better translation quality
     - Ongoing progress
       o Constant feedback and improvement
     - 58 languages (so far)
       o recently: Haitian Creole, Urdu, Georgian, ..., Latin
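One classic way to combine monolingual and bilingual data in statistical translation is the noisy-channel formulation: pick the target sentence e that maximizes P(e) * P(f | e), where P(e) is a language model learned from monolingual text and P(f | e) is a translation model learned from bilingual text. The slide does not say which model Google uses, so the following is only a toy sketch of that textbook idea; every probability and sentence in it is invented for illustration.

```python
import math

# Hypothetical toy probabilities, invented for illustration only.
# Language model P(e): would be estimated from monolingual target-language text.
language_model = {
    "the house": 0.05,
    "house the": 0.0001,
    "the home": 0.02,
}
# Translation model P(f | e): would be estimated from bilingual sentence pairs.
translation_model = {
    ("la casa", "the house"): 0.4,
    ("la casa", "house the"): 0.4,
    ("la casa", "the home"): 0.3,
}


def score(source: str, candidate: str) -> float:
    """Log of P(candidate) * P(source | candidate)."""
    return math.log(language_model[candidate]) + math.log(
        translation_model[(source, candidate)]
    )


def best_translation(source: str, candidates: list[str]) -> str:
    """Return the candidate with the highest noisy-channel score."""
    return max(candidates, key=lambda c: score(source, c))


print(best_translation("la casa", ["the house", "house the", "the home"]))
```

In this toy setup the translation model cannot tell "the house" from "house the", and the language model breaks the tie. That is one way to see why more monolingual data can improve translation quality.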

     More Usage

     - Chrome/Toolbar (websites)
     - YouTube (CLIR, captions, snippets)
     - Reader (feeds)
     - Gmail, Docs, Spreadsheets, and more

     Speech Technology at Google

     Much of the world's information is spoken; we need to recognize it before we can organize it:
     - YouTube transcription and translation (breaking the language barrier for YouTube access)
     - Voicemail transcription

     Mobile is the fastest growing and most widespread platform for communication and services that has ever existed
     - Spoken input and output is key to usability
     - Our goal is completely ubiquitous availability of speech i/o (every application/service, every usage scenario, every language)

     How do we get there?
     - Delivery from the cloud: support constant iteration and refinement
     - Operating at large scale: train huge statistical models on huge amounts of data

     Structured Data on the Web

     Discovery and search for structured data:
     - The deep Web: significant gap in coverage
     - Structured tables on the Web: not leveraged in search

     Enable easy creation, management, sharing and publishing of structured data:
     - Fusion Tables: www.google.com/fusiontables

     Google Fusion Tables

     Host, manage, collaborate on, visualize, and publish data tables online

     What can I do with Fusion Tables?
     - Host data online - and stay in control
       o control can be at the level of columns or rows
     - Re-use data without making copies
     - Collaborate on the details
       o Merge data from multiple tables
       o Comment on individual rows, columns or cells
     - Make a map (or chart or timeline) in minutes!
     - Manage data via our site or an API

     Fusion Table Example Gallery

     Easy Data Upload, Attribution recorded

     Easily Create Informative Maps

     Baby steps towards the dream platform
     - DEMO:


     circle of blue

     Discussions, Data Integration

     Computer Vision

     Advance the state of the art in 3 key areas of image/audio/video analysis and apply results to our multimedia products.
       o Semantic Interpretation: Generate human-understandable descriptions of content (e.g. auto-tagging videos on YouTube, image annotation, etc.).
       o Matching: Find similar entities from a large corpus (e.g. "find similar" on image search, etc.).
       o Synthesis: Generate better images/video by understanding the statistics of a large corpus of images (e.g. better facades in 3D buildings on Google Earth, automatic shadow removal from aerial images, etc.).

     Semantic Interpretation sample problem - Video Annotation

     - Video metadata has a cognitive cost on the user because they have to type it in, be careful about what keywords they use, and in general try to make their video searchable
     - Many uploaders don't have the motivation or energy to provide proper metadata
     - Noisy metadata hurts everyone: spam, misspellings, acronyms, etc.

     Operations Research Challenges

     - Size: Optimization is often NP-complete
       o The tools are barely keeping up with the problems.
     - Uncertainty: Data is often fuzzy.
       o How do you route cars when there are roadblocks, new one-ways, traffic jams?
       o How well can you optimize against forecasted data, and how do you react if the forecast is bad?

     Operations Research Opportunities

     - Machine Learning can help us in three ways:
       o By providing guidance towards good solutions.
       o By qualifying valid solutions.
       o By reducing the search space.
     - Large computing resources mean we can try a bit harder.
     - Crowd-sourcing means better data, better feedback, better evaluations of algorithms and solutions.
     - Having all our code open-source means we can collaborate on building the best set of tools.
       o See http://code.google.com/p/or-tools

     An Application: Earth Engine

     Initial Earth Engine Motivation: Forest Carbon Tracking




     Rondonia, Brazil

     UNEP: "Atlas of our Changing Environment"


     - Original image is divided into 256px sub-units.
     - Sub-units are distributed to separate machines, where they can be processed in parallel.
     - Thousands can be processed simultaneously.
     - Result is reassembled into a finished image.
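The split, distribute, process, and reassemble steps can be sketched in a few lines. This is a minimal single-machine illustration, not Earth Engine's implementation: a thread pool stands in for the separate machines, the image is a plain 2D list of 8-bit grayscale values, and the per-pixel operation (inversion) is a hypothetical placeholder. All function names are made up for this example.

```python
from concurrent.futures import ThreadPoolExecutor

TILE = 256  # sub-unit edge length in pixels, as on the slide


def split(image, tile=TILE):
    """Yield (row, col, block) for each tile-sized sub-unit of a 2D list."""
    height, width = len(image), len(image[0])
    for r in range(0, height, tile):
        for c in range(0, width, tile):
            yield r, c, [row[c:c + tile] for row in image[r:r + tile]]


def process(block):
    """Hypothetical per-pixel operation: invert an 8-bit grayscale value."""
    return [[255 - px for px in row] for row in block]


def reassemble(results, height, width):
    """Paste processed sub-units back at their original offsets."""
    out = [[0] * width for _ in range(height)]
    for r, c, block in results:
        for i, row in enumerate(block):
            out[r + i][c:c + len(row)] = row
    return out


def run(image, tile=TILE, workers=4):
    """Split the image, process the sub-units concurrently, and reassemble."""
    height, width = len(image), len(image[0])
    tiles = list(split(image, tile))
    # Threads stand in for the "separate machines" on the slide.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        processed = pool.map(process, (block for _, _, block in tiles))
        results = [(r, c, block) for (r, c, _), block in zip(tiles, processed)]
    return reassemble(results, height, width)


image = [[(x + y) % 256 for x in range(600)] for y in range(500)]
finished = run(image)
```

Because each sub-unit depends only on its own pixels, the work needs no coordination between workers, which is what lets thousands of sub-units be processed at once.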


     Global-scale earth observation and informatics platform

     - For public benefit, and to support emerging green economy
     - Help science come out of research lab and into operational use, at scale
     - Unprecedented catalog of earth observation data for mining and analysis
     - Promote transparency, reproducibility, collaboration, "open science"
     - Intrinsically-parallel pixel processing system
     - Built-in Google algorithms as well as user-supplied
     - Earth Engine API for 3rd party algorithm development
     - Access control, versioning, provenance
     - Online and desktop versions (open source desktop version)
     - Every available Landsat and MODIS scene (more satellites coming)
     - Commercial datasets (very high resolution satellite imagery)
     - Environmental data (atmospheric, ocean, terrestrial)
     - User-supplied (ex: in-situ data collected via Android phones)

     Very fast computation of scientific map products

     On a lot of useful data

     Digital Humanities and Education

     Illuminating the Humanities

     Q: What can you do with:
     - 12 million books in
     - over 400 languages
     - comprised of 5 billion pages and 2 trillion words
     - ... all digitized?

     A: Look to the humanities for new questions...

     For example, what are the differences between early and later editions of:

     红楼梦 (Dream of the Red Chamber), published in 1784

     1. Early versions are Rouge versions (脂本) with 80 chapters, "The Story of the Stone (石头记)" (10+ editions)
     2. Around 30 years after the first edition, the book was amended with another 40 chapters, making the novel 120 chapters; these are called the Cheng-Gao versions (程高本).

     Digital Humanities Awards

     Research program supporting university research taking a computational approach to traditional humanist questions.

     US program, Summer 2010
     - 12 projects
     - 23 researchers
     - 15 universities

     European program, Winter 2010
     - 15 projects planned

     Curriculum Development

     Seeding and supporting computing curriculum development

     o Exploring computational thinking in K12 (google.com/edu/ect)
     o CS4HS: High school computer science (cs4hs.com)
     o Undergraduate open source CS curriculum: Google Code University (code.google.com/edu)

     Supporting our Academic Institutions


     o Research Awards Programs - 230+ projects funded in the last year
       Next due date: Feb 1, 2011
       Research-awards@google.com
     o Focused Grant Program
       Mobile 2014
     o Visiting Faculty Program - 20 faculty (ongoing)
       University-relations@google.com
     o Ph.D. Fellowship Program
       2009: 13 students supported in US
       2010: 15 in US, 15 in EMEA, 2 in China (and more to come)
     o Over 150 other scholarships, most in China
     o ~1000 interns worldwide

     Final Thoughts

     o Scale of Communication and Computing is profound
     o Endless opportunity for technical growth
     o Rapid innovation in science/technology and value to consumers
     o We are providing increased support for academic institutions in computer science and related areas

     It's a most exciting area in which to innovate

     Thank you! 谢谢

