#( KAFKA COME BACK -user is offline- )
moltenfire · 5 months
[ 📱 — sms ] i’m pregnant.
( sms : kafka ) — ??????
( sms : kafka ) — KAFKA PLEASE ANSWER ME
( sms : kafka ) — YOU CAN'T JUST SAY THAT AND GO OFFLINE
[ ask meme. ] // @kafkaisms
newsjerk · 7 years
How Google Book Search Got Lost
Google Books was the company’s first moonshot. But 15 years later, the project is stuck in low-Earth orbit.
Books can do anything. As Franz Kafka once said, “A book must be the axe for the frozen sea inside us.”
It was Kafka, wasn’t it? Google confirms this. But where did he say it? Google offers links to some quotation websites, but they’re generally unreliable. (They misattribute everything, usually to Mark Twain.)
To answer such questions, you need Google Book Search, the tool that magically scours the texts of millions of digitized volumes. Just find the little “more” tab at the top of the Google results page — it’s right past Images, Videos, and News. Then click on it, find “Books,” and click on that. (That’s if you’re at your desk. On mobile, good luck locating it anywhere.)

Google Book Search is amazing that way. When it started almost 15 years ago, it also seemed impossibly ambitious: An upstart tech company that had just tamed and organized the vast informational jungle of the web would now extend the reach of its search box into the offline world. By scanning millions of printed books from the libraries with which it partnered, it would import the entire body of pre-internet writing into its database.

“You have thousands of years of human knowledge, and probably the highest-quality knowledge is captured in books,” Google cofounder Sergey Brin told The New Yorker at the time. “So not having that — it’s just too big an omission.”

Today, Google is known for its moonshot culture, its willingness to take on gigantic challenges at global scale. Books was, by general agreement of veteran Googlers, the company’s first lunar mission. Scan All The Books!

In its youth, Google Books inspired the world with a vision of a “library of utopia” that would extend online convenience to offline wisdom. At the time it seemed like a singularity for the written word: We’d upload all those pages into the ether, and they would somehow produce a phase-shift in human awareness. Instead, Google Books has settled into a quiet middle age of sourcing quotes and serving up snippets of text from the 25 million-plus tomes in its database.

Google employees maintain that’s all they ever intended to achieve. Maybe so. But they sure got everyone else’s hopes up.

Two things happened to Google Books on the way from moonshot vision to mundane reality.
Soon after launch, it quickly fell from the idealistic ether into a legal bog, as authors fought Google’s right to index copyrighted works and publishers maneuvered to protect their industry from being Napsterized. A decade-long legal battle followed — one that finally ended last year, when the US Supreme Court turned down an appeal by the Authors Guild and definitively lifted the legal cloud that had so long hovered over Google’s book-related ambitions.

But in that time, another change had come over Google Books, one that’s not all that unusual for institutions and people who get caught up in decade-long legal battles: It lost its drive and ambition.

When I started work on this story, I feared at first that Books no longer existed as a discrete part of the Google organization — that Google had actually shut the project down. As with many aspects of Google, there’s always been some secrecy around Google Books, but this time, when I started asking questions, it closed up like a startled turtle. For weeks there didn’t seem to be anyone around or available who could or would speak to the current state of the Books effort.

The Google Books “History” page trails off in 2007, and its blog stopped updating in 2012, after which it got folded into the main Google Search blog, where information about Books is nearly impossible to find. As a functioning and useful service, Google Books remained a going concern. But as a living project, with plans and announcements and institutional visibility, it seemed to have pulled a vanishing act. All of which felt weird, given the legal victory it had finally won.

When I talked to alumni of the project who’d left Google, several mentioned that they suspected the company had stopped scanning books.
Eventually, I learned that there are, indeed, still some Googlers working on Book Search, and they’re still adding new books, though at a significantly slower pace than at the project’s peak around 2010–11.

“We’re not focused on shiny features and things that are very visible to users,” says Stephane Jaskiewicz, a Google engineer who has worked on Books for a decade and now leads its team. “It’s more like behind the scenes work and perfecting the technology — acquiring content, processing it properly so that we can view the entire book online, and adjusting the search algorithm.”

One focus of work has been a constant throughout Google Books’ life: improving the scanners that add new books to the “corpus,” as the database is known. At the birth of the project, in 2002, as Larry Page and Marissa Mayer set out to gauge how long it might take to Scan All The Books, they set up a digital camera on a stand and timed themselves with a metronome. Once the company got serious about ramping its scanning up to efficient scale, it started jealously guarding details of the operation.

Jaskiewicz does say that the scanning stations keep evolving, with new revisions rolling out every six months. LED lighting, not widely available at the project’s start, has helped. So has studying more efficient techniques for human operators to flip pages. “It’s almost like finger-picking on a guitar,” Jaskiewicz says. “So we find people who have great ways of turning pages — where is the thumb and that kind of stuff.”

Still, the bulk of the work at Google Books continues to be on “search quality” — making sure that you find the Kafka passage you need, fast. It’s an unglamorous game of inches — less moonshot and more, say, satellite maintenance.
To understand how Google Books arrived at this point, you need to know a few things about copyright law, which essentially divides books into three classes. Some books are in the public domain, which means you can do what you want with their texts — mostly, those published before 1923, as well as more recent books whose authors chose to release them from standard copyright. Plenty of more recent books are still in print and under copyright; if you want to do anything with these texts, you have to come to terms with their authors and publishers.
Then there’s the third category: books that are out of print but still under copyright, known informally as “orphan works.” It turns out there are a whole lot of these — “between 17 percent and 25 percent of published works and as much as 70 percent of specialized collections,” a study by the US Copyright Office suggests.
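The three-way split described above can be sketched as a toy rule. This is a deliberate oversimplification: real copyright status also turns on renewal records, author death dates, and jurisdiction, and the function name and flags here are invented for illustration.

```python
def copyright_class(pub_year, under_copyright, in_print):
    """Rough three-way split of books by copyright status (US-centric,
    simplified): public domain, in print, or 'orphan work'."""
    # Works published before 1923, or released from copyright, are free to use.
    if pub_year < 1923 or not under_copyright:
        return "public domain"
    # In-print, copyrighted works require a deal with authors/publishers.
    if in_print:
        return "in print, under copyright"
    # Everything else is the contested middle: out of print, still copyrighted.
    return "orphan work (out of print, under copyright)"

print(copyright_class(1915, True, False))  # -> public domain
print(copyright_class(1990, True, True))   # -> in print, under copyright
print(copyright_class(1950, True, False))  # -> orphan work (out of print, under copyright)
```

The third branch is the one the decade of litigation was about: by the Copyright Office estimate quoted above, it may cover a sixth to a quarter of all published works.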
How many books is that? No one knows for sure because no one can say with any certainty exactly how many total books there are. The statistic depends on how you define “book,” which isn’t as easy as it sounds. In 2010 a Google engineer named Leonid Taycher wrote a blog post that examined Google Books’ metadata and concluded that the number (then) was about 130 million. Others looked at this work and called it “bunk.” The actual number is probably somewhat lower than Taycher’s figure yet considerably higher than Google Books’ current 25 million-plus.
Some large chunk of that large number, then, are “orphan works.” And until recently, they weren’t much of an issue. You could borrow them from a library or find them in a used bookstore, and that was that. But once Google Books proposed to scan them all and make them available to the internet, everyone seemed to want a piece of them.
The legal battle that ensued was, essentially, a custody fight over these orphans, in which Google, publishers, and authors each sought to control the process of ushering them into a new home for the digital age. The three parties eventually agreed on a grand compromise known as the Google Books Settlement, under which Google would go ahead and make the orphan works available in their entirety and set aside money to compensate rights holders who stepped forward. But in 2011, a federal judge rejected the settlement, ruling in favor of advocates who feared it would forever ensconce a private for-profit company as the registrar and toll collector of the universe’s library.
Once the settlement collapsed, Google went back to its scanning, and publishers pursued the burgeoning business of selling e-books, which had leapfrogged Google’s lead in the future-of-books race due to the success of Amazon’s Kindle. But the Authors Guild continued to press its lawsuit, charging that Google’s arrogation of the right to scan and index books without the permission of copyright holders was illegal. Google is wealthy, but not so wealthy that it could ignore the threat of multi-billion dollar copyright infringement penalties (thousands of dollars per book for millions of books). This was the proceeding that dragged on until the Supreme Court put it out of its misery last year — establishing once and for all that Google had a fair-use right to catalogue books and provide brief excerpts (“snippets”) in search results, just as it did with web pages.
That ruling represents a foundational achievement for the future of online research—Google’s and everyone else’s. “It’s now established precedent — everyone benefits,” says Erin Simon, Google Books’ product counsel today. “This is going to be in textbooks. It’s supremely important for understanding what fair use means.” (Simon also notes with a chuckle that when the suit was originally filed, she hadn’t yet started law school.)
The Authors Guild may have lost in court, but it believes the fight was worth it. Google “did it wrong from the beginning,” says James Gleick, president of the Guild’s board. “They plowed ahead without involving the creative community on whose backs they were building this new thing. The big companies have a droit du seigneur attitude toward creative work. They think, ‘We are the masters of the universe now.’ They should have just licensed the books instead.”
You’d think a Supreme Court victory would have meant a renewal of energy for Google Books: Rev up the scanners — full speed ahead! By all the evidence, that has not been the case. Partly that’s because the database is so huge already. “We have a fixed budget that we’re spending,” says Jaskiewicz. “At the beginning, we were scanning everything on every shelf. At some point we started getting a lot of duplicates.” Today Google gives its partner libraries “pick lists” instead.
There are plenty of other explanations for the dampening of Google’s ardor: The bad taste left from the lawsuits. The rise of shiny and exciting new ventures with more immediate payoffs. And also: the dawning realization that Scanning All The Books, however useful, might not change the world in any fundamental way.
To many bibliophiles, Google’s self-appointment as universal librarian never made sense: That role properly belonged to some public institution. Once Google popularized the notion that Scanning All The Books was a feasible undertaking, others lined up to tackle it. Brewster Kahle’s Internet Archive, which stores historical snapshots of the whole web, already had its own scanning operation. The Digital Public Library of America grew out of meetings at Harvard’s Berkman Center beginning in 2010 and now serves as a clearinghouse and consortium for the digital collections of many libraries and institutions.
When Google partnered with university libraries to scan their collections, it had agreed to give them each a copy of the scanning data, and in 2011 the HathiTrust began organizing and sharing those files. (It had to fend off the Authors Guild in court, too.) HathiTrust has 125 member organizations and institutions who “believe that we can better steward research and cultural heritage by working together than alone or by leaving it to an organization like Google,” says Mike Furlough, the trust’s director. And of course there’s the Library of Congress itself, whose new leader, Carla Hayden, has committed to opening up public access to its collections through digitization.
In a sense each of these outfits is a competitor to Google Books. But in reality, Google is so far ahead that none of them is likely to catch up. The consensus among observers is that it cost Google several hundred million dollars to build Google Books, and nobody else is going to spend that kind of money to perform the feat a second time.
Still, the nonprofits have a strength Google lacks: They’re not subject to the changing priorities of a gigantic technology corporation. They have a focused commitment around books, unencumbered by distractions like running one of the largest advertising businesses in the world or managing a smartphone ecosystem. Unlike Google, they’re not going to lose interest in seeking new ways to connect readers with books that might, a la Kafka, melt a frozen mind.
In popular mythology, interminable lawsuits turn into hungry maelstroms that drown the participants. (The archetype is Dickens’ Jarndyce v. Jarndyce from Bleak House, the generations-spanning estate fight whose legal fees eat up all the assets at stake.) In the tech business, court battles like the celebrated antitrust suit that plagued IBM for years tend to pinion giant corporations and provide new competitors with an opening to lap an incumbent. Google itself rose to dominate search while Microsoft was busy defending itself from the Justice Department.
Yet the Books fight was never as central to Google’s corporate being as that kind of all-consuming conflict. And it wasn’t all a waste, either. It taught Google something valuable.
As the Authors Guild’s Gleick points out, Google started Books with a “better ask forgiveness than permission” attitude that’s common today in the world of startups. In a sense, the company behaved like the Uber of intellectual property — a kind of read-sharing service — while expecting to be seen the way it saw itself, as a beneficent pantheon of wizards serving the entire human species. It was naive, and the stubborn opposition it aroused came as a shock.
But Google took away a lesson that helped it immeasurably as it grew and gained power: Engineering is great, but it’s not the answer to all problems. Sometimes you have to play politics, too — consult stakeholders, line up allies, compromise with rivals. As a result, Google assembled a crew of lobbyists and lawyers and approached other similar challenges — like navigating YouTube’s rights maze — with greater care and better results. It grew up. It came to understand that it could shoot for the moon, but it wouldn’t always get there.
It’s possible that Google might someday take another run at solving the orphan works problem. But it looks like it’s going to wait for others to take the lead. “I don’t know that there’s anything that we could do without a different legal framework,” says Jaskiewicz.
As I worked on this piece, I kept thinking back to a book I’d read a few years ago called Mr. Penumbra’s 24-Hour Bookstore, a whimsical, nerdy novel by Robin Sloan. It’s about a secret society dedicated to solving a centuries-old Name of the Rose-style mystery that’s rooted in bookmaking and typography. Google plays a critical supporting role in Penumbra, as the protagonist attempts to unravel the riddle at the story’s heart. As it turns out, even the company’s unrivaled informational prowess isn’t enough to do the trick. That takes a chance encounter between the protagonist and a particular book that provides an illuminating insight. It takes, in the phrase with which Sloan closes his tale, “exactly the right book, at exactly the right time.”
Penumbra reminds us that Google’s engineering mindset isn’t omnipotent. Breaking a challenge into approachable pieces, turning it into data, and applying efficient routines is a powerful way to work. It can carry you a good distance toward a “library of utopia,” but it won’t get you there.
And even if you get there, it isn’t utopia, anyway. The hard labor is still ahead. That’s because when you turn a book into data, you make it easy to find quotes and search snippets, but you don’t make it fundamentally easier to do the work of reading the book — that irreplaceable experience of allowing one’s own mind to be temporarily inhabited by the voice of another person.
To date, the full experience of reading a book requires human beings at both ends. An index like Google Books helps us find and analyze texts but, so far, making use of them is still our job. Maybe the quest to digitize all books was bound to end in disappointment, with no grand epiphany.
Like many tech-friendly bibliophiles, Sloan says he uses Google Books a lot, but is sad that it isn’t continuing to evolve and amaze us. “I wish it was a big glittering beautiful useful thing that was growing and getting more interesting all the time,” he says. He also wonders: We know Google can’t legally make its millions of books available for anyone to read in full — but what if it made them available for machines to read?
Machine-learning tools that analyze texts in new ways are advancing quickly today, Sloan notes, and “the culture around it has a real Homebrew Computer Club or early web feel to it right now.” But to progress, researchers need big troves of data to feed their programs.
“If Google could find a way to take that corpus, sliced and diced by genre, topic, time period, all the ways you can divide it, and make that available to machine-learning researchers and hobbyists at universities and out in the wild, I’ll bet there’s some really interesting work that could come out of that. Nobody knows what,” Sloan says. He assumes Google is already doing this internally. Jaskiewicz and others at Google would not say.
Maybe, when some neural network of the future achieves self-awareness and finds itself paralyzed by Kafka-esque existential doubts, it will find solace, as so many of us do, in finding exactly the right book to shatter its psychic ice. Or maybe, unlike us, it will be able to read all the books we’ve scanned — really read them, in a way that makes sense of them. What would it do then?
LARB presents an excerpt from Geert Lovink’s latest book, Sad by Design: On Platform Nihilism, which was released this month by Pluto Press.
¤
“Solitary tears are not wasted.” — René Char
“I dreamt about autocorrect last night.” — Darcie Wilder
“The personal is impersonal.”  — Mark Fisher
Try and dream, if you can, of a mourning app. The mobile has come dangerously close to our psychic bone, to the point where the two can no longer be separated. If only my phone could gently weep. McLuhan’s “extensions of man” has imploded right into the exhausted self. Social media and the psyche have fused, turning daily life into a “social reality” that — much like artificial and virtual reality — is overtaking our perception of the world and its inhabitants. Social reality is a corporate hybrid between handheld media and the psychic structure of the user. It’s a distributed form of social ranking that can no longer be reduced to the interests of state and corporate platforms. As online subjects, we too are complicit, far too deeply involved. Likes and followers define your social status. But what happens when nothing can motivate you anymore, when all the self-optimization techniques fail and you begin to carefully avoid these forms of emotional analytics? Compared to others your ranking is low — and this makes you sad.
Omnipresent social media places a claim on our elapsed time, our fractured lives. We’re all sad in our very own way. As there are no lulls or quiet moments anymore, the result is fatigue, depletion, and loss of energy. We’re becoming obsessed with waiting. How long have you been forgotten by your loved ones? Time, meticulously measured on every app, tells us right to our face. Chronos hurts. Should I post something to attract attention and show I’m still here? Nobody likes me anymore. As the random messages keep relentlessly piling in, there’s no way to halt them, to take a moment and think it all through.
Delacroix once declared that every day which is not noted is like a day that does not exist. Diary writing used to fulfil that task. Elements of early blog culture tried to update the diary form for the online realm, but that moment has now passed. Unlike the blog entries of the Web 2.0 era, social media have surpassed the summary stage of the diary in a desperate attempt to keep up with the real-time regime. Instagram Stories, for example, bring back the nostalgia of an unfolding chain of events — and then disappear at the end of the day, like a revenge act, a satire of ancient sentiments gone by. Storage will make the pain permanent. Better forget about it and move on.
In the online context, sadness appears as a short moment of indecisiveness, a flash that opens up the possibility of a reflection. The frequently used “sad” label is a vehicle, a strange attractor to enter the liquid mess called social media. Sadness is a container. Each and every situation can potentially be qualified as sad. Through this mild form of suffering we enter the blues of being in the world. When something’s sad, things around it become gray. You trust the machine because you feel you’re in control of it. You want to go from zero to hero. But then your propped-up ego implodes and the failure of self-esteem becomes apparent again.
The price of self-control in an age of instant gratification is high. We long to revolt against the restless zombie inside us, but we don’t know how. Our psychic armor is thin and eroded from within, open to behavioral modifications. Sadness arises at the point when we’re exhausted by the online world. After yet another app session in which we failed to make a date, purchased a ticket, and did a quick round of videos, the post-dopamine mood hits us hard. The sheer busyness and self-importance of the world makes you feel joyless. After a dive into the network, we’re drained and feel socially awkward. The swiping finger is tired, and we have to stop.
Sadness has neighboring feelings we can check out. There is the sense of worthlessness, blankness, joylessness, the fear of accelerating boredom, the feeling of nothingness, plain self-hatred while trying to get off drug dependency, those lapses of self-esteem, the laying low in the mornings, those moments of being overtaken by a sense of dread and alienation, up to your neck in crippling anxiety, there is the self-violence, panic attacks, and deep despondency before we cycle all the way back to recurring despair. We can go into the deep emotional territory of the Russian toska. Or we can think of online sadness as part of that moment of cosmic loneliness Camus imagined after God created the earth. I wish that every chat were never ending. But what do you do when your inability to respond takes over? You’re heartbroken and delete the session. After yet another stretch of compulsory engagement with those cruel Likes, silly comments, empty text messages, detached emails, and vacuous selfies, you feel empty and indifferent. You hover for a moment, vaguely unsatisfied. You want to stay calm, yet start to lose your edge, disgusted by your own Facebook Memories. But what’s this message that just came in? Strange. Did he respond?
Evidence that sadness today is designed is overwhelming. Take the social reality of WhatsApp. The gray and blue tick marks alongside each message in the app may seem a trivial detail, but let’s not ignore the mass anxiety it’s causing. Forget being ignored. Forget pretending you didn’t read a friend’s text. Some thought that this feature already existed, but in fact two gray tick marks signify only that a message was sent and received — not read. Even if you know what the double tick syndrome is about, it still incites jealousy, anxiety, and suspicion. It may be possible that ignorance is bliss, that by intentionally not knowing whether the person has seen or received the message, your relationship will improve. The bare-all nature of social media causes rifts between lovers who would rather not have this information. But in the information age, this does not sit well with the social pressure to be “on social,” as the Italians call it.
We should be careful to distinguish sadness from anomalies such as suicide, depression, and burnout. Everything and everyone can be called sad, but not everyone is depressed. Much like boredom, sadness is not a medical condition (though never say never because everything can be turned into one). No matter how brief and mild, sadness is the default mental state of the online billions. Its original intensity gets dissipated. It seeps out, becoming a general atmosphere, a chronic background condition. Occasionally — for a brief moment — we feel the loss. A seething rage emerges. After checking for the 10th time what someone said on Instagram, the pain of the social makes us feel miserable, and we put the phone away. Am I suffering from the phantom vibration syndrome? Wouldn’t it be nice if we were offline? Why’s life so tragic? He blocked me. At night, you read through the thread again. Do we need to quit again, to go cold turkey again? Others are supposed to move us, to arouse us, and yet we don’t feel anything anymore. The heart is frozen.
Social media anxiety has found its literary expressions, even if these take decidedly different forms than the despair on display in Franz Kafka’s letters to Felice Bauer. The willingness to publicly perform your own mental health is now a viable strategy in our attention economy. Take L.A. writer Melissa Broder, whose So Sad Today “twitterature” benefited from her previous literary activities as a poet. Broder is the contemporary expert in matters of apathy, sorrow, and uselessness. During one afternoon she can feel compulsive about cheesecakes, show her true self as an online exhibitionist, be lonely out in public, babble and then cry, go on about her short attention span, hate everything, and desire “to fuck up life.” In between taking care of her sick husband and the obligatory meeting with Santa Monica socialites, there are always more “insatiable spiritual holes” to be filled. The more we intensify events, the sadder we are once they’re over. The moment we leave, the urge for the next experiential high arises. As phone and life can no longer be separated, neither can we distinguish between real and virtual, fact or fiction, data or poetry. Broder’s polyamorous lifestyle is an integral part of the precarious condition. Instead of empathy, the cold despair invites us to see the larger picture of a society in permanent anxiety. If anything, Broder embodies Slavoj Žižek’s courage of hopelessness: “Forget the light at the end of the tunnel — it’s actually the headlight of a train about to hit us.”
Once the excitement has worn off, we seek distance, searching for mental detachment. The wish for “anti-experience” arises, as Mark Greif has described it. The reduction of feeling is an essential part of what he calls “the anaesthetic ideology.” If experience is the “habit of creating isolated moments within raw occurrence in order to save and recount them,” the desire to anaesthetize experience is a kind of immune response against “the stimulations of another modern novelty, the total aesthetic environment.”
Most of the time your eyes are glued to a screen, as if it’s now or never. As Gloria Estefan summarized the FOMO condition: “The sad truth is that opportunity doesn’t knock twice.” Then, you stand up and walk away from the intrusions. The fear of missing out backfires, the social battery is empty and you put the phone aside. This is the moment sadness arises. It’s all been too much, the intake has been pulverized and you shut down for a moment, poisoning him with your unanswered messages. According to Greif, “the hallmark of the conversion to anti-experience is a lowered threshold for eventfulness.” A Facebook event is the one you’re interested in, but do not attend. We observe others around us, yet are no longer part of the conversation: “They are nature’s creatures, in the full grace of modernity. The sad truth is that you still want to live in their world. It just somehow seems this world has changed to exile you.” You leave the online arena; you need to rest. This is an inverse movement from the constant quest for experience. That is, until we turn our heads away, grab the phone, swipe, and text back. God only knows what I’d be without the app.
Anxieties that go untreated build up to a breaking point. Yet unlike burnout, sadness is a continuous state of mind. Sadness pops up the second events start to fade away — and now you’re down in the rabbit hole once more. The perpetual now can no longer be captured and leaves us isolated, a scattered set of online subjects. What happens when the soul is caught in the permanent present? Is this what Franco Berardi calls the “slow cancellation of the future”? By scrolling, swiping, and flipping, we hungry ghosts try to fill the existential emptiness, frantically searching for a determining sign — and failing. When the phone hurts and you cry together, that’s technological sadness. “I miss your voice. Call, don’t text.”
We overcome sadness not through happiness, but rather, as Andrew Culp insisted, through a hatred of this world. Sadness occurs in situations where the stagnant “becoming” has turned into a blatant lie. We suffer, and there’s no form of absurdism that can offer an escape. Public access to a 21st-century version of Dadaism has been blocked. The absence of surrealism hurts. What could our social fantasies look like? Are legal constructs such as creative commons and cooperatives all we can come up with? It seems we’re trapped in smoothness, skimming a surface littered with impressions and notifications. The collective imaginary is on hold. What’s worse, this banality itself is seamless, offering no indicators of its dangers and distortions. As a result, we’ve become subdued. Has the possibility of myth become technologically impossible? Instead of creatively externalizing our inner shipwrecks, we project our need for strangeness on humanized robots. The digital is neither new nor old, but — to use Culp’s phrase — it will become cataclysmic when smooth services fall apart into tragic ruins. Faced with the limited possibilities of the individual domain, we cannot positively identify with the tragic manifestation of the collective being called social media. We can neither return to mysticism nor to positivism. The naïve act of communication is lost — and this is why we cry.
¤
Geert Lovink is a media theorist and internet critic and the author of Zero Comments, Networks Without a Cause, Social Media Abyss, and Sad by Design: On Platform Nihilism. He founded the Institute of Network Cultures at the Amsterdam University of Applied Sciences and teaches at the European Graduate School. He stopped using Facebook in 2010.
The post This Is Why We Cry: From “Sad by Design: On Platform Nihilism” appeared first on Los Angeles Review of Books.
thegloober · 6 years
Pretty low level, pretty big deal: Apache Kafka and Confluent Open Source go mainstream
Apache Kafka, the open-source distributed messaging system, has steadily carved a foothold as the de facto real-time standard for brokering messages in scale-out environments. And if you think you have seen this opener before, it’s because you have.
Also: Pulsar graduates to being an Apache top-level project
That's because the same opener was used by fellow ZDNet contributor Tony Baer in his July piece commenting on the Kafka usage survey; you've probably read something along these lines elsewhere, or had that feeling yourself. Yes, Kafka is on most whiteboards, but mostly on the whiteboards of early adopters: that was the gist of Baer's analysis.
With Kafka Summit kicking off today in San Francisco, we took the opportunity for a chat with Jay Kreps, Kafka co-creator and Confluent CEO, on all things Kafka, as well as the broader landscape.
Going mainstream
Kreps indicated his belief that in the last year Kafka has actually gone mainstream. As evidence to back this claim, he cited use cases at four of the five biggest banks in the US, as well as the Bank of Canada: "These are 200 year-old organizations, and they don't just jump at the first technology out of Silicon Valley. We are going mainstream in a big way," Kreps asserted, while also mentioning big retail use cases.
While we have no reason to question these use cases, it's hard to assess whether this translates to adoption in the majority of the market as well. Traditionally, big finance and retail are at the forefront of real-time use case adoption.
Still, it may take a while for this to spill over, so it depends on what one considers “mainstream.” Looking at Kafka Summit, however, we see a mix of Confluent staff and household names, which is the norm for events of this magnitude.
But what is driving this adoption? Something pretty low level, which is a pretty big deal, according to Kreps: The ability to integrate disparate systems via messaging, and to do this at scale and in real time. It’s not that this is a novel idea – messaging has been around for a while and it’s the key premise of Enterprise Service Bus (ESB) solutions for years.
Conceptually, Kafka is not all that different. The difference, Kreps said, is that older systems were not able to handle the scale that Kafka can: “We can scale to trillions of messages. New style, cloud data systems are just better at this, such techniques did not exist before. We benefited as we came around a bit later.”
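The partitioned-log design behind that scale claim can be illustrated with a toy sketch. This is plain Python with no real Kafka client, and all names are illustrative: the real producer hashes keys with murmur2, while the sketch uses `zlib.crc32`. The point is the mechanism, not the implementation: messages with the same key always land in the same partition, so the log can be spread across many brokers while per-key ordering is preserved.

```python
# Toy sketch of Kafka-style key-based partitioning (illustrative only;
# the real client hashes keys with murmur2, we use zlib.crc32 here).
import zlib


class ToyLog:
    """An append-only log split into a fixed number of partitions."""

    def __init__(self, num_partitions=3):
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key: bytes) -> int:
        # Same key -> same partition, which preserves per-key ordering.
        return zlib.crc32(key) % len(self.partitions)

    def append(self, key: bytes, value: str) -> int:
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p


log = ToyLog(num_partitions=3)
for i in range(4):
    log.append(b"account-42", f"txn-{i}")  # all four land in one partition

p = log.partition_for(b"account-42")
print([v for _, v in log.partitions[p]])  # per-key order preserved
```

Scaling out is then a matter of adding partitions and consumers, since each partition can be read independently.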
Going cloud and real-time
The cloud is something Kreps emphasized, and the discussion around the latest developments in the field was centered around it. The recent Cloudera – Hortonworks merger, for example, touches upon this as well, according to Kreps.
“It was a smart move. These were two companies competing on the same product, which makes the competition more fierce, ironically. You’d think it’s people with different views that compete more fiercely, but it’s actually people with similar views. That really showed also in the business model,” Kreps said.
Kreps believes that this competition slowed down progress in core Hadoop, as the need for differentiation resulted in more attention toward edge features. Case in point, he noted: HDFS, Hadoop's file system, historically a key component of its value proposition, is no longer the most economical way to store large amounts of data; cloud storage now is.
This could also be interpreted as a sign of moving away from the batch processing that Hadoop started from and toward real-time processing. Although Hadoop has gradually grown into a full ecosystem, including streaming engines, the majority of its use cases are still batch-oriented, Kreps believes. How this will evolve, time will tell.
The cloud is gaining gravity in terms of data, and data-infrastructure platforms need to work both there and on premise. (Image: ktsimage, Getty Images/iStockphoto)
Despite Kreps pointing to the cloud as a center of gravity, and Hadoop actually moving toward it in the last couple of years, Confluent is not going to pursue a cloud-only policy. As opposed to data science workloads, which can be hosted either on premise or in the cloud, the kind of data infrastructure Kafka provides must work in both, argued Kreps.
Since many organizations still have huge investments in software and infrastructure built over years in their data centers, any move to the cloud will be gradual. Confluent’s hosted version of Kafka plus proprietary extensions will continue to work seamlessly with on-premise Kafka or Confluent open source, said Kreps. He also emphasized Kafka support for Kubernetes, noting that any stateful data system has to put in some effort to make this work.
Streaming coopetition and real-time machine learning
In terms of differentiation from other streaming platforms, Kreps pointed out that these are mostly geared toward analytics, while Kafka is infrastructure on which operational systems can be, and are, built. Asked whether Kafka might also move in the analytics direction, Kreps gave no such indication, and questioned the applicability of real-time machine learning (ML):
"What is the use of a real-time machine learning platform? When I was in school, ironically the focus of my advisors was real-time ML — ironically, because ML was not very popular back then, let alone real-time ML.
We were struggling to name a mainstream production system using real-time ML. And the idea of having an ML algorithm retrain itself in real time is not necessarily positive. Most of the time, the effort is to have enough checks and balances in place to make sure ML really works even when working with batch data.
And if you look at ML algorithms built by people who build databases and infrastructure, they are never as good, which is normal. There is a separate ecosystem for data science, and the best stuff is separate from the big infrastructure projects.
The reality is that Spark machine learning is mostly used for offline ML. Streaming brings together all the data needed for this, and Kafka works with other streaming platforms, too.”
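The batch-versus-online distinction Kreps draws can be made concrete with a toy example. This is plain Python, not tied to any real ML library, and the "model" is just a running mean standing in for something trained: an online learner updates its estimate one event at a time as records stream in, while a batch job recomputes over the accumulated data. For this simple statistic the two regimes agree exactly, which hints at why batch retraining with proper checks is usually preferred: you get the same answer, with far more opportunity to validate before deploying.

```python
# Toy contrast between online (streaming) and batch model updates,
# using a running mean as a stand-in for a "model". Illustrative only.

class OnlineMean:
    """Updates incrementally, one event at a time, as a stream would."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def update(self, x: float) -> float:
        self.n += 1
        self.mean += (x - self.mean) / self.n  # incremental-mean update
        return self.mean


def batch_mean(xs):
    """Recomputes from scratch over the accumulated batch."""
    return sum(xs) / len(xs)


stream = [3.0, 5.0, 4.0, 8.0]
model = OnlineMean()
for x in stream:
    model.update(x)

print(model.mean, batch_mean(stream))  # same answer, different regime
```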
Kafka is a key element of the streaming landscape, but it is also complementary to other streaming platforms.
More often than not, Kafka seems to be mentioned in the same breath, or whiteboard, with a number of other systems, including streaming ones. Although some might say this means it will be hard for Kafka to come into its own, its position in those architectures also means it’s equally hard to take it out of the equation.
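Why Kafka is hard to take out of those whiteboard architectures also comes down to a mechanism worth sketching: each downstream system tracks its own offset into the same shared log, so producers never need to know who is reading, and a new consumer can be added without disturbing the others. The sketch below is a toy in plain Python, with hypothetical consumer names, not the real consumer-group protocol.

```python
# Toy sketch of consumer offsets over a shared log: each consumer reads
# at its own pace, and adding one never affects the others. Illustrative only.

log = ["evt-0", "evt-1", "evt-2", "evt-3"]
offsets = {"search-indexer": 0, "fraud-checker": 0}  # hypothetical consumers


def poll(consumer: str, max_records: int = 2):
    """Return the next records for this consumer and advance its offset."""
    start = offsets[consumer]
    records = log[start:start + max_records]
    offsets[consumer] = start + len(records)
    return records


print(poll("search-indexer"))    # ['evt-0', 'evt-1']
print(poll("fraud-checker", 4))  # ['evt-0' .. 'evt-3'], at its own pace
print(poll("search-indexer"))    # ['evt-2', 'evt-3']
```

Because the log is durable and offsets belong to consumers, a new system can even replay history from offset zero, which is what makes the equation hard to remove Kafka from.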
Although no big announcement is reserved for this Kafka Summit, Kafka and Confluent have had a few of those in the last year — KSQL and version 5.0 being the most prominent — and seem to be well on the way to the mainstream.
Previous and related coverage:
Confluent release adds enterprise, developer, IoT savvy to Apache Kafka
Confluent, the company founded by the creators of streaming data platform Apache Kafka, is announcing a new release today. Confluent Platform 5.0, based on yesterday’s release of open source Kafka 2.0, adds enterprise security, new disaster recovery capabilities, lots of developer features, and important IoT support.
Hortonworks ups its Kafka Game
Ahead of the Strata conference next month, Hortonworks is focusing on streaming data as it introduces a new Kafka management tool and adds some refinements to its DataFlow product.
Kafka is establishing its toehold
Data pipelines were the headline from the third annual survey of Apache Kafka use. Behind anecdotal evidence of a growing user base, Kafka is still at the early adopter stage and skills remain hard to find.
Confluent brings fully-managed Kafka to the Google Cloud Platform
The partnership between Confluent and Google extends the Kafka ecosystem, making it easier to consume with Google Cloud services for machine learning, analytics and more.
Source: https://bloghyped.com/pretty-low-level-pretty-big-deal-apache-kafka-and-confluent-open-source-go-mainstream/