Man, why is it that every time I get a new fanart idea it starts as "this can be a simple smaller one, just to capture this concept" and immediately my alien brain takes over and now I need the freaking ceiling of the Sistine chapel to fit every single detail or I will have no peace ever again?
Help.
#art struggles#neurospicy#100% size prints of these would be 4 meters wide#thank goodness for digital art cause I would *not* be able to afford the traditional equivalent
Where Art Thy Data, Brother? The Digital Economy's Natural Resource and the Impact on AI Startups' Product Strategies
Today, businesses are being transformed by three inexorable trends. First, businesses are creating massive amounts of data stemming from a dizzying plethora of sources. That data must consequently be managed at scale. Second, the "cloud" is the most efficient and scalable platform to provide the fast, reliable, and financially viable storage and analysis of that data. That is due in part to the economies of scale brought about by the largest technology firms' capital expenditures procuring massive server farms. Indeed, what would have taken millions of dollars in the past to procure computing power can now be done with a credit card in a matter of minutes. And third, businesses should now look to predict behaviors and trends from data captured from edge sources (i.e. devices, mobile apps, sensors). In that vein, predictions are gleaned by analyzing in real time the data obtained from the "edge", which can now be done affordably thanks to the economies of scale brought about by cloud computing. Moving forward, businesses should be able to look at data in real time rather than review historical trends to make the most intelligent decisions about the future of the business. To draw an analogy, just as we've segued from the hardware stack to the software stack (from buying servers to buying the software services enabled by those servers), firms are moving from asking "What happened?" about their business to "What should we do now?"
Artificial Intelligence, the prodigal child of the marriage between "cloud" and "data", can help businesses answer that last question. While I may be a clearly biased Microsoft employee, I still find the Redmond-based behemoth's world view of the implications of Artificial Intelligence to be universally accepted by others in the space. As the Microsoft AI Data Marketing team would suggest, we've moved from descriptive analytics ("what happened?"), to diagnostic ("why did it happen?"), and now to predictive ("what will happen?"). The next frontier is the promise of AI: the ability to be prescriptive ("how can we make it happen?")[1]. Nowadays, businesses can reallocate money that would previously have been spent on capital IT expenditures to R&D investments geared towards innovating their core business. Take for instance a typical Internet of Things use case around device monitoring for a company that sells hardware appliances to its business customers. The business selling that hardware could leverage a cohort of cloud-based services (i.e. data warehousing, data analysis, machine learning) to monitor the device by connecting the hardware to the cloud. Rather than having to build out the infrastructure to run those services for effective device monitoring and maintenance, the business can leverage a cloud vendor's capabilities; the analysis collected from running these compute-intensive workloads can ultimately drive efficiencies, impacting the business's bottom line. For instance, the firm could build higher quality devices that break less by analyzing which parts cause the most devices to break prematurely. This will create efficiencies in production and maintenance, all thanks to the cloud-based machine learning service that analyzed patterns in device wear and tear.
In an overview of the vendor space offering such cloud-based services, we've come to see the centralization of AI-enriched software solutions within the hands of a few multinational publicly traded technology firms. It would also seem that the firm that creates the most complete digital profiles of its end users, based on their preferences, will be able to provide the most compelling and relevant experiences for those same customers. This type of relationship in turn creates near insurmountable barriers to entry. For instance, Google's monopoly on search is near impossible to break, as it has collected petabytes of data on user search queries and preferences, which it has continued to feed into its algorithms to make its results ever more relevant. In contrast, DuckDuckGo can't compete, even if they tout their differentiation as a search engine that eschews collecting user data. One could argue that more people prefer to use Google than DuckDuckGo because its search results are simply better[2].
That ability to provide such recommendations - which is at the heart of Google, Amazon, and Microsoft's market power - stems directly from data. That view is not unique. In its May 6th, 2017 edition, The Economist argued that data is the newest natural resource in the modern digital economy.[3] Consumers have shown they are willing to forfeit some of their privacy for more contextual digital experiences, which are powered by users' past behavior, preferences, and actions (their "data", effectively). For instance, companies like Facebook and Tencent collect terabytes of data on user behavior and preferences; AI techniques like machine learning extract more value from that data by smartly predicting and recommending products, services, and goods that users are more inclined to purchase based on the patterns underlining their digital lives. Even non-traditional software vendors are leveraging the data they collect to compete in their markets. Tesla, a maker of electric cars, uses its latest models to collect data from its customers, which allows it to optimize the self-driving algorithms it one day hopes to embed in all its cars. Talk about a differentiated product from its competitors at GM and Ford! These latter two have respectively acquired Cruise Automation and partnered with Argo AI to counter Tesla, Uber, and Waymo's momentum in the self-driving space. Indeed, they ultimately hope to create new business models and spur growth for their less than healthy car businesses. These examples lead us to believe that whoever has the most data will most likely erect the deepest moats around their core businesses.
But what does that mean for startups trying to offer AI applications to their end users? How do they compete with these kingpins if data is king? At first glance, it would not seem they can. These firms are spinning up large AI product investments. Indeed, Azure has a set of Cognitive Services that can be embedded or extended into applications. Google has TensorFlow. Amazon is starting to provide an externally available set of ML algorithms. These behemoths also keep using their massive resources to corral large teams and even larger infrastructures. Microsoft alone spends $12 billion a year on AI advancements and employs 5,000 technical staff members to work solely on advancing that technology. The Google Brain Team is a separate group that can leverage Google's gargantuan resources in its quest to usher in AI as well. And if organic product development doesn't work, these firms will reach into their deep pockets and scoop up any startup that is complementary - or anathema - to their own offerings. Facebook notoriously bought WhatsApp for $22 billion; Microsoft, LinkedIn for $26.2 billion; IBM, the Weather Company for $2 billion to power its Watson services. So even if a startup reaches critical mass, it is more likely to be ingested by a larger firm than to stay independent and deal with the challenges that come with such status. At first glance, many of the advances being made in cognitive services, machine learning, and AI are baked into these large companies; and not many startups that seek to go after the holy grail of AI have the resources or the runway to get to a point where they are a self-sustainable business.[4]
This essay will challenge the belief that large entrenched vendors are monopolistic and have erected such moats that it is near impossible for startups to create meaningful and differentiated AI-based consumer and enterprise products. There are still growing niche markets and use cases that startups can dominate, as the Googles and Microsofts of the world can't be everywhere at once. The question this paper will seek to answer is how these startups should think about the evolution of their product so as to foster business growth and sustainability. With that in mind, I will focus on how startups should construct their product strategies in light of the fact that the best AI-powered businesses must rely on data. I will consequently argue that a startup's ability to create an AI-enriched software product stems from the value of its data, which can come from one of two sources: either from the customers it serves (who willingly agree to share it), or from the public domain (i.e. users or software partners that are willing to distribute it).
The data is public, but your access may not always be
Artificial Intelligence, a machine's ability to act as a human would, has been classified into two categories by researchers over the last sixty years. As Andreessen Horowitz elucidates in their Introduction to AI[5], the first category, dubbed "Classical", spans roughly the sixty years following the 1956 Research Project on AI. During that time frame, scientists attempted to use logical rules to mimic intelligence. For instance, they would represent the world in data structures and apply logic rules to deduce something from the data set. This framework is often called "symbolic AI", as machines are essentially given symbolic representations of rules from which to deduce the correct course of action. An example of a symbolic system would be an "if-then" statement.
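To make the "Classical" category concrete, here is a minimal sketch of such a symbolic rule system. The world state, rule set, and action names are all invented for illustration; no real system is this simple.

```python
# A toy "symbolic AI": the world is encoded as facts in a data structure,
# and hand-written if-then rules deduce an action from those facts.

def deduce_action(world):
    """Apply hard-coded logic rules to a symbolic representation of the world."""
    if world.get("obstacle_ahead"):
        return "turn"
    if world.get("battery_low"):
        return "return_to_base"
    return "move_forward"

print(deduce_action({"obstacle_ahead": True}))  # turn
print(deduce_action({"battery_low": True}))     # return_to_base
print(deduce_action({}))                        # move_forward
```

Every behavior here was written by a human; the program never learns from data, which is precisely the limitation the "Modern" definition addresses.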
In contrast, the "Modern" definition of AI holds that once a data set is orchestrated, a set of machine learning techniques can be applied to that data in order to solve different classes of problems. The idea is to "use a set of techniques like logistic regression, decision trees, Gaussian Naïve Bayes, random forest, k-nearest neighbors, deep learning" so that eventually a program is trained to predict actions from new, incoming data sets as they are fed into the engine.[6] Some have argued that AI and machine learning are different: while machine learning enables programs to predict the future, AI programs machines to make decisions.[7] That being said, many AI techniques stem from machine learning, so the two are nonetheless intertwined. For instance, the concept of neural nets - an AI concept whereby interconnected groups of computer programs act as "nodes", much as neurons are interconnected and layered in your brain - is based on the fundamentals of machine learning. As ML is based on training models, neural nets' prerequisite is to train the triggers of specific node pathways. In layman's terms, ML means feeding a black box data points, letting the program in the black box identify patterns and unique identifiers, and from there, coming out with decisions.[8]
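As a sketch of the "Modern" pattern, here is a toy k-nearest-neighbors classifier (one of the techniques quoted above) written from scratch. The two "clusters" of labeled points are fabricated for the example; a real pipeline would use a library and far more data.

```python
# Toy k-nearest-neighbors: train on labeled data points, then predict labels
# for new, incoming points by majority vote among the k closest examples.
from collections import Counter

def knn_predict(training, point, k=3):
    """training: list of ((x, y), label). Returns the majority label of the
    k training points closest to `point` (squared Euclidean distance)."""
    by_distance = sorted(
        training,
        key=lambda item: (item[0][0] - point[0]) ** 2 + (item[0][1] - point[1]) ** 2,
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Fabricated data: "cheap" orders near the origin, "premium" orders far out.
data = [((0, 0), "cheap"), ((1, 0), "cheap"), ((0, 1), "cheap"),
        ((5, 5), "premium"), ((6, 5), "premium"), ((5, 6), "premium")]

print(knn_predict(data, (0.5, 0.5)))  # cheap
print(knn_predict(data, (5.5, 5.5)))  # premium
```

Nothing in the classifier is hand-written domain logic; the decision boundary comes entirely from the data, which is the essay's point about data being the raw material.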
In both the "Classical" and "Modern" definitions, data is at the root of the algorithms, which are at the root of AI. As Figure 1 below illustrates, the value of machine learning stems from the algorithms applied to the raw data. Thirty-four startups were purchased in the first quarter of 2017[9] because these nimble companies had created unique methods of processing data and applying machine learning techniques to create new digital products. Microsoft, Google, and AWS may provide some ML APIs, but these startups address specific use cases that the large vendors don't have the breadth to tackle all at once. So to assume the latter will monopolize the AI space doesn't do justice to the use cases and verticals startups can address.
That being said, while in 2017 the startups with the best algorithms will outflank their competitors, at some point there will be parity on the algorithms, and data will become the biggest differentiator and the fodder that makes the end software the most powerful. For instance, today one startup in the food delivery space - DoorDash vs. UberEats vs. Postmates - may be better than another because it tailors recommendations by virtue of its algorithm. At some point, there should be parity amongst the firms building those models; the algorithms will be commoditized. Eventually, an algorithm can only get smarter if it has more data to build from. Take a drone-analytics-as-a-service use case. Today, the hardware and the image capture process are becoming a more competitive space because there aren't many levers of distinction there. The next generation of drone analytics will differentiate by applying AI to the data being collected and analyzed. Rather than simply taking pictures of a roof, the drone will be able to recognize damage patterns and signs, and thus automate the process of identifying whether an insurance claim is viable or not. Today the drone startup with the best algorithm will win that battlefield; three years from now, when that algorithm gets commoditized (it could be built on top of ML models sold as a platform, like Google TensorFlow or Azure ML), whoever is able to recognize objects and patterns with the highest accuracy and confidence, based on the richest data set, will have the most differentiated end product. So if the data is the ultimate differentiation, then firms must ensure that their data pipe does not get cut off; in other words, the value of the software diminishes if it has less data to feed its machine learning.
Figure 1: Algorithms will provide the first level of differentiation, but ultimately, as those get commoditized, the depth and breadth of data will provide the ultimate competitive barrier to entry. Source: Wintellect's Guide to Azure Machine Learning
If we look at some of the more successful startups, one will notice that the data powering their AI products is mostly collected from the public domain. In this scenario, the startup sells a multi-tenant AI-driven Software as a Service, where the value is driven by external public data and network effects: the more data and the more users consume the software, providing the vendor with more patterns and data points, the more powerful, valuable, and differentiated the AI-powered software becomes. For example, DoorDash, an on-demand delivery startup, used AI so successfully that they saw a 25% lift in sales from machine-learned food recommendations to their end users.[10] Such recommendations were tailored by the vast amount of data that those users agreed DoorDash could leverage to improve recommendations and the overall end-user experience.[11] Vertical startups building very specific use cases - IoT for delivery tracking (Samsara), log analysis (Loom Systems) - have also begun to come out of the woodwork. Similar to DoorDash, they collect data from their users and feed it into their machine learning models to tailor their solution to the context and unique needs of individual users and customers. Alternatively, companies can ingest their data from third-party sources, which is considered public in the sense that it can be consumed through readily available third-party APIs. Take for instance the intelligent CRM software affinity.co, which acts as a sort of RelateIQ/LinkedIn hybrid. The software can predict which connections will result in the warmest relationships within prospective client accounts. It predicts that based on the data it collects from external sources - LinkedIn, social media, your address book, among others - and then feeds into its algorithms.
In both use cases - where data is supplied by the public or by third-party vendors - the more data a startup collects from its end users, the more powerful its AI becomes (since it has more data to learn from). In other words, there is a direct correlation between the amount of data training the model and the "stickiness" of the end product.
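That correlation can be illustrated with a deliberately naive recommender. The order logs below are fabricated for this sketch: with only a handful of orders the recommendation is noisy, while a longer history recovers the user's true preference.

```python
# Toy recommender: suggest the cuisine a user orders most often. More data
# per user means a more reliable (and thus "stickier") recommendation.
from collections import Counter

def recommend(order_log):
    """Return the most frequently ordered cuisine in the log."""
    return Counter(order_log).most_common(1)[0][0]

# Hypothetical user whose true preference is thai (half of all orders).
short_history = ["pizza", "pizza", "thai"]  # a sparse sample misleads
long_history = ["thai"] * 50 + ["pizza"] * 30 + ["sushi"] * 20

print(recommend(short_history))  # pizza (wrong guess from sparse data)
print(recommend(long_history))   # thai  (correct with a richer history)
```

A real system would model far more signals than raw frequency, but the dynamic is the same: the vendor with the longer history makes the better call, and the better call keeps the user on the platform.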
In this case, these startups can't necessarily create moats protecting the data they collect, because the data is either available to everyone or can be hoarded by jealous third-party vendors. Nothing precludes a competitor from copying DoorDash's model and recreating the experience - take for instance UberEats or Postmates. The Affinity example demonstrates that sometimes data is collected from within closed gardens managed by third-party vendors. Some are open (think of Twitter with their developer APIs), but there's always the danger that the data hose can be abruptly shut off (speaking of Twitter…). It's the same problem we've seen with developers and ISVs building on top of platforms that seem open to them, only to be closed off as soon as those competitors become too dangerous to the platforms they stand on.
Consequently, the startup must think about cornering two data ingestion strategies before its competitors do. First, it has to receive its customers' consent to aggregate their data so that it can feed it into its algorithms for training purposes. This is what we saw in DoorDash's terms of service. Uber does this as well - they can tailor their pricing depending on the paths and surges in demand, based on traffic data and route patterns. Their terms and conditions stipulate that users' data is anonymized and aggregated - so sound legal terms and conditions suffice here. But second, if that data cannot be sourced from users, it should be ingested through a network of data partners. In that case, outright licensing deals or product partnerships are the way to go, because they ensure that the data pipe does not get cut off; there's a commercial agreement backing it up. Additionally, in the agreement, the startup can negotiate exclusivity for a predetermined frame of time, precluding any competitor from making the same agreement. Even at the end of that agreement, should a competitor replicate the same alliance, the first startup will have had first mover advantage - it will have had more data for longer to train its models. This also assumes the vendor doesn't sweeten the renewal of the product partnership in year two. Partnering is a key motion to ensure that the pipe remains open, all whilst closing it temporarily to competitors.
The Gatekeeper is your friend
What happens if the AI solution is sold as a platform and not simply as a pre-built piece of software? Let's first define platform. A platform can support plugins and extensions - it has a core functionality, but different customers can build on top of it. In the software-as-a-service delivery model, clients consume a piece of software whose user experience doesn't change from customer to customer; it's how you use it that does. In the platform delivery model, the client builds on top of a core set of functionalities they pay to access; the interface may be the same on a customer-to-customer basis, but the data powering the service is distinct for each customer. In the AI world, the delivery model usually mirrors that of SaaS - that is, the customer interacts with an application. It ultimately depends on where the data stems from: from the public, as we saw previously, or from the customer, as we shall soon see. Here the AI software is powered by data on a customer-to-customer basis, rather than by multiple sources supporting a product that everyone can use. The distinction is nuanced, but the data delivery and ingestion model is not.
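The two data models above can be sketched in a few lines of code. The "model" here is just a running average standing in for a real ML pipeline, and the tenant names are hypothetical; the point is only where the training data flows.

```python
# SaaS vs. platform data models. In the multi-tenant SaaS model, one shared
# model learns from every user's pooled data; in the platform model, each
# customer gets a model trained only inside their own walls.

class AverageModel:
    """Stand-in for an ML model: predicts the mean of the values it has seen."""
    def __init__(self):
        self.total, self.count = 0.0, 0

    def train(self, values):
        self.total += sum(values)
        self.count += len(values)

    def predict(self):
        return self.total / self.count

# SaaS: one multi-tenant model fed by all users' data.
shared = AverageModel()
shared.train([10, 20])   # data from user A
shared.train([30, 40])   # data from user B
print(shared.predict())  # 25.0 -- every tenant benefits from pooled data

# Platform: a separate model per customer, trained only on that tenant's data.
per_tenant = {"acme": AverageModel(), "globex": AverageModel()}
per_tenant["acme"].train([10, 20])
per_tenant["globex"].train([30, 40])
print(per_tenant["acme"].predict())    # 15.0 -- learns from acme's data only
print(per_tenant["globex"].predict())  # 35.0
```

The trade-off the following sections explore falls out of this structure: pooled data compounds network effects but can be cut off or copied, while per-tenant data is ringfenced but harder to scale across customers.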
A few examples to draw the distinction. Take for instance Infer - they've built a predictive analytics AI application that makes it easier for their enterprise customers to rank and score leads, consequently focusing sales teams on prospects that are more likely to close. While they've built their AI stack by training their models with sample data, the real value - the ROI - for customers comes when it's applied to their own data. Right now Infer competes because it has the best algorithms; but it needs more data from its business customers to 1) show value to those customers and 2) improve its models over time. If you recall Wintellect's machine learning diagram (Figure 1), Infer has built the ML algorithms, but the raw data comes from its customers rather than the public. The value of the software is based on the data the customer provides for it to learn from; the customer plugs in their data to see the ROI of the solution. Another example: Cylance leverages AI methods to help their enterprise customers predict and thwart cyberattacks on their networks. Sure, they sell an application delivered through mobile and web apps that an administrator can monitor, but the software has to be deployed over the customer's network. Each customer has to deploy Cylance's software, and that process looks different from customer to customer because no two infrastructures are the same. The value Cylance brings is the novelty of their algorithms, which already have the base signals to identify malicious behavior; and over time, they learn from their customers and improve their algorithms' ability to thwart cyberattacks.
In the case where the AI software is delivered as a platform, the startup selling the service can apply its algorithms to data that is proprietary on a customer-to-customer basis. Customers provide their network, their infrastructure, and their data, and the startup's AI product lives within that environment. The data is closed off within the walls of the customers they serve, which is more advantageous than the SaaS model in the sense that the product can be applied safely within those ringfences. In fact, there's less of a risk that the product will be devalued, because the data stems from the customer rather than the public domain. There is no public data hose that can get cut off at any time; the customer dictates that by signing a commercial agreement with their vendor. Additionally, the startup's product is very "sticky." Ripping the AI-powered product out of the infrastructure is time-consuming and costly; a competitor will have to build an ROI model that takes into consideration the time necessary to remove the existing service and the loss in business impact from having to wait for the value of the competing product to materialize.
The challenge then is to scale the business; startups offering an AI-driven platform require heavy upfront development on a customer-to-customer basis (think professional services) and continual improvement of algorithms that are at risk of being commoditized. It's incredibly time consuming to deploy a software service within a customer's network rather than simply deliver it as software as a service. In SaaS you can access the software through a mobile or web app - the onboarding process is pretty straightforward. It takes time and resources (think pre-sales engineers and consultants) to deploy an AI solution delivered in the network. Additionally, the solution being delivered may be differentiated in 2017 because no one else has developed similar algorithms, but that may not be the case for long. Take the AI-powered cybersecurity example. There are several software companies - Invincea, Harvest.ai, Niara, Darktrace, and Deep Instinct, to name a few[12] - that tout that they address the same or similar use cases as Cylance. To play devil's advocate, security is already a competitive field where differentiation is hard to come by. But if these companies have to differentiate themselves based on the value their algorithms bring to the table, how do they do so? As we've seen: to those that have the most data go the best algorithms. The data is where they will ultimately have to compete.
Therefore, startups that sell their AI software as a platform have to be proactive in their go-to-market and data acquisition strategies in two ways. First, to scale the deployment of their software, startups should invest in partnering with professional services firms. Delivering services is a low-margin business - if you're in the business of selling an AI product, minimizing the percentage of your revenue stemming from professional services is important for the business to be profitable long term. Partnering with firms that specialize in services - from digital agencies to global system integrators - means that when you win a customer, you can leverage your partners' trained consultants to do the heavy lifting for you. Additionally, if the startup keeps bringing business to the partner, that partner will in turn proactively sell your solution (and their professional services in tandem) to their customer base. The startup has thus doubly protected its own go-to-market and opened up a new route to win customers it doesn't have the time or the bandwidth to go after. Second, the startup should learn as much as it can from the data collected from individual customers, and from there improve its algorithms. If a vendor is first to market, and its survival in a few years rests on the premise that its technology is superior, everything stems from data.
Oligopolistic Competition?
In the use cases previously discussed, we've argued that data is at the root of the algorithms. If algorithms are to get commoditized, why should startups continue to focus on developing their own? Won't they simply get replaced by what the large vendors are already building? Amazon, Google, and Microsoft are already building out their own AI services and selling them as RESTful APIs. These vendors are collecting terabytes of data to feed into those services - making the products more intelligent, more resourceful, and ultimately stickier for developers to embed in their own applications. Why compete with these behemoths if they already have such a head start?
The answer may be rooted in history. Salesforce sells a CRM product built on the infrastructure of cloud computing - it delivers a service over the internet. Amazon provides the infrastructure - networking, data storage and processing, virtual machines - for that type of service. The two giants recently announced a partnership whereby Amazon will power Salesforce's product. Why is this relevant to our discussion? The two provide different services but on the same principle: that of cloud computing. Google and Microsoft provide the foundation for AI, but startups can actually build on top of what they've already done. Just as Salesforce sells a specific application (CRM) on top of the foundation of cloud computing, startups can build vertically focused applications on top of the foundation of machine learning algorithms (which Microsoft, Google, and Amazon are now building). Consider Kespry, a drone-analytics-as-a-service startup of the kind discussed earlier: they built their object-recognition-for-hail-detection algorithms on top of Google TensorFlow. Moreover, Google and Microsoft can't be everywhere at once - the standard argument that big companies can't be everywhere at once prevails once again. Take a startup called ScaleAPI - they're building a set of cognitive service APIs that facilitate "human intelligence." More specifically, they offer five distinct AI products (delivered as APIs): image annotation, audio transcription, categorization, comparison, and data collection. These are use cases that Microsoft and Google could address but have chosen not to yet. Perhaps one day they may copy ScaleAPI, but that would only support the thesis of this argument: that ScaleAPI may be ahead of them by that point, because of the breadth and depth of data collected up to then and fed into their ML algorithms.
It ultimately comes down to the data. It's the newest currency in the digital world - the equivalent of gold in the 1848 gold rush. And as history has shown, those that find the gold (data) first win the most riches (customers and revenue streams).
 -----
[1] For a full account of the AI Data Marketing team's take on AI as a service, navigate to: https://channel9.msdn.com/Events/UKDX/Introducing-AI
[2] I draw that conclusion from the fact that Google sees 3.5 billion searches a day to DuckDuckGo's 15 million on average (source: http://duckduckgo.com/traffic.html). Perhaps it's a marketing issue that prevents DuckDuckGo from seeing higher volumes, but for the purpose of this article I'm assuming it's because users have developed muscle memory for Google, as the search engine has repeatedly proven its results to be the most accurate (thanks to its years of ML training) and delivered them reliably (thanks to its prodigious and globally distributed server infrastructure).
[3] The Economist, May 6th, 2017. Two articles provided the fountain of sources for this essay: "The world's most valuable resource," pg. 9; and "Fuel of the future," pg. 19-22.
[4] Though pundits would argue their exits aren't bad either, considering we may be in the middle of AI's overhyped promise. Only time will tell.
[5] http://aiplaybook.a16z.com/docs/guides/ai
[6] Ibid.
[7] https://medium.com/safegraph/a-non-technical-introduction-to-machine-learning-b49fce202ae8
[8] For the sake of simplicity we do not dive into supervised vs unsupervised learning, but for more on the subject, the aforementioned footnote provides a solid introduction to the subject. The book Artificial Intelligence: What Everyone Needs to Know by Jerry Kaplan provides a fundamental overview of AI as well.
[9] https://venturebeat.com/2017/05/28/tech-giants-acquired-34-ai-startups-in-q1-2017/
[10] https://venturebeat.com/2017/05/17/doordash-sees-25-lift-from-ai-recommendations/
[11] From DoorDash's terms and conditions: "You hereby grant the Company a perpetual, irrevocable, transferable, fully paid, royalty-free, non-exclusive, worldwide, fully sublicenseable right and license to use, copy, display, publish, modify, remove, publicly perform, translate, create derivative works, distribute and/or otherwise use the User Content in connection with the Company's business and in all forms now known or hereafter invented ("Uses"), without notification to and/or approval by you." Source: https://www.doordash.com/terms/
[12] http://www.nanalyze.com/2017/04/6-ai-cybersecurity-startups/