Technology and innovation are constantly evolving, thanks to the contributions of pioneers like Luc Vincent – a visionary who has overseen some of the most advanced computer vision and geospatial projects on the planet. Luc started his career at Rochester-based Xerox, leading an R&D team at its Palo Alto Research Center. From there he moved on to Google, where he used his 20% time (time employees get to work on innovations outside their direct job) to build a project that would come to be known as Google Street View. Luc bootstrapped that program, transforming the way we navigate and explore our world. He later went on to make sizable advancements at Lyft, where he led their self-driving car efforts, and at Meta, where he worked on advanced AR glasses. In this post, we delve into the profound insights Luc gathered along his journey building remarkable technologies that we use every day.
The Dawn of Google Street View
I kicked off my career immersed in document imaging, focusing my efforts on reading machines tailored for the visually impaired as well as general-purpose Optical Character Recognition. This eventually led me to the Xerox Palo Alto Research Center (PARC), the birthplace of many computing technologies we now take for granted, where I led a range of projects focused on document recognition and compression.
After over a decade working on document-related technologies, I got the opportunity to join Google and help take Google Books, which had not publicly launched yet, to the next level. But upon joining Google, I was immediately introduced to a crazy idea: collecting imagery at street level and making it universally accessible and useful, at scale. Seeing the immense potential, and backed by Google's spirit of innovation, I eagerly adopted this as a 20% side project. With an enthusiastic team of interns and fellow technologists, we built an end-to-end demo platform that collected, processed, and published street-level panoramas in a map context. While initially complex, our endeavors prompted a eureka moment: the birth of the 360-degree camera configuration.
With a custom 8-camera rosette configuration, our reimagined vehicles began offering a holistic view of our streets. In 2007, the world was introduced to Street View through Google Maps. The sensation was immediate. We had not only hatched an idea but executed it well enough to create a compelling user experience that early users found uniquely useful and oddly addictive.
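To make the rosette idea concrete, here is a rough back-of-the-envelope sketch of the geometry involved – our illustration, not the actual Street View camera design. Spacing eight cameras evenly around a circle puts their optical axes 45 degrees apart, and each camera then needs a horizontal field of view of at least that much, plus some overlap for stitching the images into a seamless panorama (the 10-degree overlap below is an assumed figure):

```python
def rosette_headings(num_cameras: int) -> list[float]:
    """Evenly space camera optical axes around a full 360-degree circle.

    Returns each camera's heading in degrees, measured from camera 0.
    """
    step = 360.0 / num_cameras
    return [i * step for i in range(num_cameras)]

def min_fov(num_cameras: int, overlap_deg: float = 10.0) -> float:
    """Minimum horizontal field of view per camera so that adjacent
    images overlap by `overlap_deg` degrees, leaving room for stitching."""
    return 360.0 / num_cameras + overlap_deg

headings = rosette_headings(8)  # [0.0, 45.0, 90.0, 135.0, 180.0, 225.0, 270.0, 315.0]
fov = min_fov(8)                # 55.0 degrees of horizontal FOV per camera
```

The real engineering challenge, of course, lies in everything this sketch leaves out: calibrating the cameras relative to one another, blending exposures across lenses, and warping the eight frames into a single spherical panorama.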
As Street View’s popularity soared, scaling became our new challenge. We innovated on camera technology, honed data processing, and integrated AI to autonomously extract information such as street signs and house numbers from the imagery. From a few cities, I steered our team’s expansion to over a hundred countries around the globe, accumulating imagery spanning tens of millions of miles.
A Decade at Google and the Leap to Lyft
My contributions to Google extended beyond Street View. I ventured deeper into geospatial imaging, leading projects involving aerial and satellite data at global scale and playing a pivotal role in shaping Google Maps’ iconic 3D city views. Piloting planes and crafting computer vision pipelines at such vast scales was daunting, but our tenacity ensured that our offerings thrived. After 12 long and exciting years at Google, it was time for a new adventure. My journey took me to Lyft, a platform best known for its ride-hailing service. However, my vision there was to innovate in the realm of self-driving cars. I embarked on building Lyft’s Level 5 division, a unique venture focused on building robotaxis designed specifically for the Lyft platform.
At its core, the concept of self-driving cars revolves around a sensing platform on the road, making use of cameras and lidars. As you might imagine, this involves a heavy dose of computer vision and AI models. These models play a crucial role in understanding the environment around the vehicle, deciphering data both online and offline, ensuring that every journey is safe and efficient.
The Intricacies of Level 4 and Level 5 Autonomy
Although we’re witnessing a surge in autonomous vehicles, with a couple of services operational in cities like San Francisco, the journey to Level 4 autonomy has been slow. The reason? Many underestimated the sheer complexity of the problem.
Getting to Level 5 – where vehicles can drive anywhere, under any condition – is a lofty goal. Some might even argue that there is no business case for Level 5 autonomy. As for Level 4, while companies like Waymo and Cruise are making significant strides, there’s still much ground to cover before Level 4 robotaxi services become commercially viable on a large scale.
Some of the challenges, such as AV perception, are well understood and no longer a bottleneck. However, safe motion planning under a wide range of real-world conditions is proving more challenging than anybody anticipated a few years ago. And the ultimate challenge in the world of Level 4 and Level 5 autonomy may be validation: how do you prove that your AVs are capable of operating safely under all possible conditions? And how safe is “safe enough”? We still have a lot of work to do!
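A back-of-the-envelope calculation shows why validation is so daunting. Using the standard zero-failure binomial bound (this is our illustration of the statistics, not a figure from Luc), the miles of failure-free driving needed to demonstrate a given failure rate at a given confidence level grows inversely with that rate:

```python
import math

def miles_to_demonstrate(rate_per_mile: float, confidence: float = 0.95) -> float:
    """Miles of failure-free driving needed to claim, at the given
    confidence level, that the true failure rate is below `rate_per_mile`.

    This is the zero-failure binomial bound (the "rule of three"):
    observing n failure-free miles rules out rates above ~3/n at 95%.
    """
    return math.log(1.0 - confidence) / math.log(1.0 - rate_per_mile)

# Human drivers have roughly one fatal crash per 100 million miles
# (an illustrative figure). Merely matching that would require:
miles_to_demonstrate(1e-8)  # ≈ 3e8 miles of failure-free driving
```

Hundreds of millions of miles without a single failure, just to match human performance with statistical confidence – which is why validation in practice leans heavily on simulation, scenario coverage, and arguments beyond raw mileage.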
The Future: AR Glasses and Contextual Understanding
Post-Lyft, I found myself drawn to the world of Augmented Reality (AR) glasses. As of now, most AR glasses in the market are chunky and not particularly stylish. But imagine a future where they’re as sleek as everyday eyewear, equipped with cameras, microphones, displays, and computing capabilities.
These glasses would essentially be an extension of ourselves, seeing what we see and hearing what we hear. By processing this information, AR glasses could assist us in various aspects of our lives. Whether it’s reminding us of someone’s name or helping us communicate more effectively, the possibilities are endless.
Building AI models to power these AR glasses is no small feat. They have to process a plethora of information from video to audio, all in real-time, primarily on-device. It’s a daunting yet thrilling challenge, one that holds the promise of transforming the way we interact with the world.