#you can check if you’re invested in the soap opera that is my blog
great-tusk · 7 months ago
adding two f/os but they immediately go to main. I’m like YOU HAVE BEEN PROMOTED! YOU ARE NOW TWO OF MY ELITE EMPLOYEES 😍😍😍.
tinyozlion · 2 years ago
Parsing Gundam Wing: 
A Field Guide to a Classic Anime
So, you’re watching Gundam Wing.
Maybe you’re seeing it for the first time, maybe you’re returning to it as a fan of yore, maybe, like me, you’re trying to introduce it to a friend. Perhaps you are watching the English dub, or with English subtitles– doesn’t matter. There’s only one script for both. You’re trying to make it through the first episode, and oh baby, oh baby– What is this pacing? Who are these people? Why do they talk like that? What the fuck is a battle seed? Yes, the robots are cool, but the rest seems like a cursed soap opera. You are understandably hesitant to continue.
 This is normal. 
Take my hand. Shh, shh, come to my arms. You are safe now. I’m here to help. Welcome, brave explorer, to a classic 90's anime that once took America by storm. It’s very good. No, really, I promise, it’s very good.
Now, I can’t promise that you’ll love it! If the Space Opera And Ethics Course With Hot People and Robot Fights genre isn't for you, that's fine, take some hummus and go in peace my friend.
But the rest of you… we are brothers now.
Gundam Wing was lightning in a bottle when it first came out in North America, but even if that particular explosive debut will never come again, I think there’s still a wealth of enjoyment to be had for new fans– especially if like me you love stories that have a lot of depth, a lot of crunch, a lot of rewatch value, a lot of iterative shipping possibilities.
I realize that we live in an era of total media saturation; the notion of investing time in ANYTHING new these days is a tough sell. There are infinitely many things to watch, read, play, listen to-- so how do you justify getting into something that’s kinda old and a little janky? With some parts that haven’t aged as well as others? Something that maybe takes a bit of extra participation to get into?
...As a spoonful of medicine to help ease the media-fatigue, let me share with you something that I've found true in my life: rough edges invite you to participate in creation.
Stories that invite (or demand, or beg) you to participate in them have their own unique value, completely distinct from the value of those that satisfy you immediately; they will reward you for investing in them your time, your creativity, your curiosity.
And I think this is one of those stories! I genuinely want more people to play in this sandbox with me! So, here I am, making a blog about it.
--This is my guidebook to getting into Gundam Wing. It is intended to be a sort of companion, a primer-- maybe a walkthrough? An invitation? to this complex and juicy series.
This won't be a fan wiki-- those already exist, thank goodness, and people can check those out for themselves-- rather, I'd like to make this a repository of context and insight, and to fill in those troublesome missing pieces in the show that are big stumbling blocks to understanding what's going on!
I want to give you what I didn’t have when I started this series back in anno domini two-thousand-and-naught:
All the damn information.
Naturally, there will be more drawings, memes, crabs, very stupid gifs, and probably some crying throughout. And you have my word that I will try and present this in a comprehensive, inclusive, character-agnostic way.
...That said, I am not stealthy! I cannot possibly disguise the fact that I’ve given exponentially more thought to certain characters than to others. You will almost certainly be able to tell when you’ve encountered my Blorbos, my Special Interests--
(Shh, Tinylion! Back in your teapot!)
--But! Look into my eyes. Here is an Absolute, Unwavering Truth about me:
I love, or at least love something about, Every. Single. Character in Gundam Wing, and I will do my absolute damnedest to give them each their due diligence, because they're worth it, and so are you.
Thank you for coming, I hope you enjoy whatever you find here ♥
~ Wesley, and to a lesser extent, TinyLion
lillianfromaccounting · 8 years ago
Hey there! What are the fanfics that stuck with you, your all time favorites that need to be read and why?
Hi there nonny. This is going to be a really long post and I actually have a fic rec post I’ve been working on for two months+ now, so I’ll post that later (I want to link fics and I can’t do that at work, since most of the blogs I follow are nsfw). I’ve seen a few of these ‘which fics do you recommend’ posts floating around and I just want to mention how fic enjoyment is ultimately subjective. What I enjoy might not float your boat, etc. so please keep that in mind.
I also realized why I keep going back to certain authors, and a lot of it has to do with the fact that I trust their work. I’ve seen a lot of writing posts where it’s like “don’t be like others, write different things, set yourself apart” etc. etc. etc. and that’s wonderful if that’s what you want to do as a writer, but as a reader, I say give me 1000 coffee shop AUs because that’s what I’m comfortable with. That’s what I want when I escape–the safety and comfort of knowing that I will most likely enjoy this piece in the five or ten minutes I have to myself today. It’s not that I don’t like to read outside my comfort zone–I do, but it’s more of an emotional/mental investment for me, which is why I would readily read/reblog a coffee shop au fic before diving into the first chapter of that amazing multi-chapter fic everyone keeps reblogging. (and yes, I’ve been passive aggressively shamed for “reading the same things over and over” so I felt I need to put this here)
I was kinda hesitant to answer this ask at first because I’m not sure if you’re looking to expand your reading selection or trying to stir up shit or looking for validation or what not, so I’m sorry if I ultimately disappointed you if you were expecting to see a certain fic/author or yourself on the list and you’re not here.
On that note, some of my all-time favorites that have stuck with me and why (sorry about the lack of links, most of the authors have their masterlists in their profiles):
I will read almost anything by @fvckingavengers @just-call-me-mrs-captain @bovaria @avengersandchill @knittingknerdy because of what I said above with trust. I trust these authors and I know what I’m getting into. It does not mean these authors are predictable, because even with JCMMC’s latest fic “Funny Thing About Perfect” I knew exactly what to expect, but I was still surprised by some elements of it. I trust @fvckingavengers’ smut. A/B/O isn’t my jam but I will read her one A/B/O fic because I trust her POV and I trust that I will come out of the fic okay. I have read Laur’s fics ad nauseam to the point where I can probably quote some of them. I trust @bovaria’s fluffy AUs, but moreso, I trust that her fic will be a journey of cliffhangers. I trust the experience of reading her fics. Same for @avengersandchill. I don’t always know which direction her fics will turn, but I trust her to take me on the journey. I trust @knittingknerdy’s storytelling and I will in fact read 1000 coffee shop AUs by Nadine (go read all her coffee shop AUs). I just realized that this group of writers could almost be categorized as the AU group. I love AUs.
Writers whom I love but I have to be in the mood to read their fics: @imagine-assembling-the-avengers Bonnie is one of the queens of angst in the Marvel fanfic universe. I love, love, love her stuff. She has inspired me so much. She writes *all* the Marvel characters so well. The other queen of angst is @writingcreatingstorytelling I love Ann’s writing and while I’ve taken a huge step away from RPFs, I still love her Natris universe. Being as these are angst queens, I know that I will cry when I read their works (this is not a bad thing). I just have to emotionally prep myself for it. Their fics really are beautiful and you should check them out.
All-around medal goes to @emilyevanston I will be the first to admit that I have not read all of her stuff yet. She writes amazing fics but most of them are longer form and require some emotional investment on my part. I absolutely LOVE her Playing It Safe fic, because she fixes all the issues I have with that movie. She writes just about everything: fluff, smut, angst, RPF, Marvel, steve, bucky, stucky, and so much more. She writes about heavy life themes but also really light stuff and everything in between. Go check out her masterlist because there’s a good chance that there is something there that you might like.
Fluffy and occasionally smutty, @avenger-nerd-mom writes fun fics that explore emotional consequences of various relationship situations. She’s written plus-sized characters, original stuff, and lots and lots of Chris. I loved her characterization of Chris and Emery so much in her Georgia on My Mind series that I almost quit writing, because I thought that nothing I could come up with would be better than that story, and I meant it in terms of writing for myself, not that I needed to write a better story than her. I meant there was nothing that I could write for myself, in my own little brain, that would be better than that Chris story. (I’ve since changed my mind, because I realized I needed other elements for myself, but there was a time when that fic met all my Chris delusional fantasy needs.) She’s currently writing what looks like an amazing collab with @devikafernando called Educating Thalia that you should check out! (unfortunately, professor/student is one of my handful of squicks so I won’t be reading it)
@sfdce also writes a hodgepodge of Chris and Steve fics that range from super duper floofy to super duper heavy material. I haven’t read all of Chris’ stuff, but every one that I’ve read has always left an impression.
@master-of-duct-tape is the first Chris RPF that I read and her smut has actually inspired me to explore certain things in real life (sorry if that’s TMI). also I started writing smut because of her encouragement. I go back to her Insomnia series from time to time and I absolutely LOVE her Johnny Storm fic. Her stuff is a little darker compared to the stuff I’ve listed above, so check her tags and warnings.
I read an article yesterday that reminded me of @evansscruff’s fics. She writes Indian female characters and I absolutely love how she weaves the culture into her fics. As an Asian-American, I wholly appreciate the often common internal conflict of figuring out one’s identity and place in the world while balancing the various cultures, plus one’s favorite celebrity. :-)
These are my Chris & Chris’ characters girls: @beccaheartschrisevans @thelookingglassalice @ariallane @heather-lynn @chrisevans-imagines @daisykane535 Each of these authors brings something different to the table. Their fics are more like a long-term series or even like soap operas, but there’s amazing character growth and development through them.
@steveandyou writes beautiful, beautiful fluffy Steve literature. I love it all.
@angryschnauzer I love her fic Taking In Strays, which I’m not caught up on yet, but it’s because the emotions and life situations she writes about are so raw. Like I can only read one chapter at a time because of the emotional investment.
@thesuperhero-sessions is amazing. absolutely amazing premise and the characters are so on point. i am behind but i love all of it.
@bucky-is-my-precious writes a really long fic called Giving You the Choice and it’s been a really fun journey that I’ve followed the past two years.
@katiekeysburg writes spy/action stuff for various fandoms (Agent Carter, Guardians of the Galaxy, The Man from UNCLE, etc.) and while she is also my bff irl, I’m so glad that she started writing and I’m so proud of all the fics she’s written! They’re really good, especially if you like spy stuff.
I know I’m missing people (sorry!). I’ll try to finish my longer fic rec post with links and more authors later. (yeah that post is longer than this one)
Also, there are a ton of writers that I haven’t even started reading yet. I have about 200 posts in my drafts of saved fics to read. I haven’t even begun to dip into the Hiddles fandom of fics yet.
Like I said, I have reasons why I read certain fics/authors. Most of them are not perfect and might not be your cup of tea, but I still love them. If you are just looking to expand your reading selection, I hope you find something that you like.
isearchgoood · 5 years ago
Better Content Through NLP (Natural Language Processing) - Whiteboard Friday
Posted by RuthBurrReedy
Gone are the days of optimizing content solely for search engines. For modern SEO, your content needs to please both robots and humans. But how do you know that what you're writing can check the boxes for both man and machine?
In today's Whiteboard Friday, Ruth Burr Reedy focuses on part of her recent MozCon 2019 talk and teaches us all about how Google uses NLP (natural language processing) to truly understand content, plus how you can harness that knowledge to better optimize what you write for people and bots alike.
Video Transcription
Howdy, Moz fans. I'm Ruth Burr Reedy, and I am the Vice President of Strategy at UpBuild, a boutique technical marketing agency specializing in technical SEO and advanced web analytics. I recently spoke at MozCon on a basic framework for SEO and for approaching changes to our industry, one that thinks about SEO in light of the fact that we are humans who are marketing to humans, but we are using a machine as the intermediary.
Those videos will be available online at some point. [Editor's note: that point is now!] But today I wanted to talk about one point from my talk that I found really interesting and that has kind of changed the way that I approach content creation, and that is the idea that writing content that is easier for Google, a robot, to understand can actually make you a better writer and help you write better content for humans. It is a win-win. 
The relationships between entities, words, and how people search
To parse content and understand what it is about, Google is spending a lot of time, a lot of energy, and a lot of money on things like neural matching and natural language processing, which seek to understand basically, when people talk, what are they talking about?
This goes along with the evolution of search to be more conversational. But there are a lot of times when someone is searching, but they don't totally know what they want, and Google still wants them to get what they want because that's how Google makes money. They are spending a lot of time trying to understand the relationships between entities and between words and how people use words to search.
The example that Danny Sullivan gave online, that I think is a really great example, is if someone is experiencing the soap opera effect on their TV. If you've ever seen a soap opera, you've noticed that they look kind of weird. Someone might be experiencing that, and not knowing what that's called they can't Google soap opera effect because they don't know about it.
They might search something like, "Why does my TV look funny?" Neural matching helps Google understand that when somebody is searching "Why does my TV look funny?" one possible answer might be the soap opera effect. So they can serve up that result, and people are happy. 
Understanding salience
As we're thinking about natural language processing, a core component of natural language processing is understanding salience.
Salience, content, and entities
Salience is a one-word way to sum up to what extent is this piece of content about this specific entity? At this point Google is really good at extracting entities from a piece of content. Entities are basically nouns, people, places, things, proper nouns, regular nouns.
Entities are things, people, etc., numbers, things like that. Google is really good at taking those out and saying, "Okay, here are all of the entities that are contained within this piece of content." Salience attempts to understand how they're related to each other, because what Google is really trying to understand when they're crawling a page is: What is this page about, and is this a good example of a page about this topic?
Salience really goes into the second piece. To what extent is any given entity the topic of a piece of content? It's often amazing the degree to which a piece of content that a person has created is not actually about anything. I think we've all experienced that.
You're searching and you come to a page and you're like, "This was too vague. This was too broad. This said that it was about one thing, but it was actually about something else. I didn't find what I needed. This wasn't good information for me." As marketers, we're often on the other side of that, trying to get our clients to say what their product actually does on their website or say, "I know you think that you created a guide to Instagram for the holidays. But you actually wrote one paragraph about the holidays and then seven paragraphs about your new Instagram tool. This is not actually a blog post about Instagram for the holidays. It's a piece of content about your tool." These are the kinds of battles that we fight as marketers. 
Natural Language Processing (NLP) APIs
Fortunately, there are now a number of different APIs that you can use to understand natural language processing: 
IBM has one: https://www.ibm.com/watson/services/natural-language-understanding/ 
Google actually has a natural language processing API that's right here on https://cloud.google.com/natural-language/
Is it as sophisticated as what they're using on their own stuff? Probably not. But you can test it out. Put in a piece of content and see (a) what entities Google is able to extract from it, and (b) how salient Google feels each of these entities is to the piece of content as a whole. Again, to what degree is this piece of content about this thing?
So this natural language processing API, which you can try for free (and it's actually not that expensive for an API if you want to build a tool with it), will assign each entity that it can extract a salience score between 0 and 1, saying, "Okay, how sure are we that this piece of content is about this thing versus just containing it?"
So the higher or the closer you get to 1, the more confident the tool is that this piece of content is about this thing. 0.9 would be really, really good. 0.01 means it's there, but they're not sure how well it's related. 
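To make the score concrete, here is a minimal Python sketch of reading salience out of an entity-analysis result. It is only an illustration under stated assumptions: the sample data is made up (like the whiteboard numbers), it assumes a response shaped like the Cloud Natural Language `analyzeEntities` output (a list of entities, each with a `name`, a `type`, and a `salience` between 0 and 1), and the helper name `rank_entities` is hypothetical.

```python
from typing import Any

# Made-up sample, shaped like an entity-analysis response. The field names
# are an assumption; check the current API reference before relying on them.
SAMPLE_RESPONSE: dict[str, Any] = {
    "entities": [
        {"name": "butter", "type": "CONSUMER_GOOD", "salience": 0.05},
        {"name": "chocolate chip cookies", "type": "WORK_OF_ART", "salience": 0.62},
        {"name": "cookie", "type": "OTHER", "salience": 0.11},
        {"name": "350", "type": "NUMBER", "salience": 0.0},
    ]
}


def rank_entities(response: dict[str, Any]) -> list[tuple[str, float]]:
    """Return (name, salience) pairs, most salient entity first."""
    entities = response.get("entities", [])
    pairs = [(e["name"], float(e.get("salience", 0.0))) for e in entities]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


ranked = rank_entities(SAMPLE_RESPONSE)
for name, salience in ranked:
    print(f"{salience:.2f}  {name}")
```

The first row of the output is the entity the tool considers the page to be most "about"; everything below it is present in the content but less central.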
A delicious example of how salience and entities work
The example I have here (this is not taken from a real piece of content; these numbers are made up, it's just an example) is if you had a chocolate chip cookie recipe, you would want chocolate cookies or chocolate chip cookies recipe, chocolate chip cookies, something like that to be the number one entity, the most salient entity, and you would want it to have a pretty high salience score.
You would want the tool to feel pretty confident, yes, this piece of content is about this topic. But what you can also see is the other entities it's extracting and to what degree they are also salient to the topic. So you can see things like if you have a chocolate chip cookie recipe, you would expect to see things like cookie, butter, sugar, 350, which is the temperature you heat your oven, all of the different things that come together to make a chocolate chip cookie recipe.
But I think that it's really, really important for us as SEOs to understand that salience is the future of related keywords. We're beyond the time when to optimize for chocolate chip cookie recipe, we would also be looking for things like chocolate recipe, chocolate chips, chocolate cookie recipe, things like that. Stems, variants, TF-IDF, these are all older methodologies for understanding what a piece of content is about.
Instead what we need to understand is what are the entities that Google, using its vast body of knowledge, using things like Freebase, using large portions of the internet, where is Google seeing these entities co-occur at such a rate that they feel reasonably confident that a piece of content on one entity in order to be salient to that entity would include these other entities?
Using an expert is the best way to create content that's salient to a topic
So chocolate chip cookie recipe, we're now also making sure we're adding things like butter, flour, sugar. This is actually really easy to do if you actually have a chocolate chip cookie recipe to put up there. This is I think what we're going to start seeing as a content trend in SEO is that the best way to create content that is salient to a topic is to have an actual expert in that topic create that content.
Somebody with deep knowledge of a topic is naturally going to include co-occurring terms, because they know how to create something that's about what it's supposed to be about. I think what we're going to start seeing is that people are going to have to start paying more for content marketing, frankly. Unfortunately, a lot of companies seem to think that content marketing is and should be cheap.
Content marketers, I feel you on that. It sucks, and it's no longer the case. We need to start investing in content and investing in experts to create that content so that they can create that deep, rich, salient content that everybody really needs. 
How can you use this API to improve your own SEO? 
One of the things that I like to do with this kind of information is look at — and this is something that I've done for years, just not in this context — but a prime optimization target in general is pages that rank for a topic, but they rank on page 2.
What this often means is that Google understands that that keyword is a topic of the page, but it doesn't necessarily understand that it is a good piece of content on that topic, that the page is actually solely about that content, that it's a good resource. In other words, the signal is there, but it's weak.
What you can do is take content that ranks but not well, run it through this natural language API or another natural language processing tool, and look at how the entities are extracted and how Google is determining that they're related to each other. Sometimes it might be that you need to do some disambiguation. So in this example, you'll notice that while chocolate cookies is called a work of art, and I agree, cookie here is actually called other.
This is because cookie means more than one thing. There's cookies, the baked good, but then there's also cookies, the packet of data. Both of those are legitimate uses of the word "cookie." Words have multiple meanings. If you notice that Google, that this natural language processing API is having trouble correctly classifying your entities, that's a good time to go in and do some disambiguation.
Make sure that the terms surrounding that term are clearly saying, "No, I mean the baked good, not the software piece of data." That's a really great way to kind of bump up your salience. Look at whether or not you have a strong salience score for your primary entity. You'd be amazed at how many pieces of content you can plug into this tool and the top, most salient entity is still only like a 0.01, a 0.14.
A lot of times the API is like "I think this is what it's about," but it's not sure. This is a great time to go in and bump up that content, make it more robust, and look at ways that you can make those entities easier to both extract and to relate to each other. This brings me to my second point, which is my new favorite thing in the world.
Writing for humans and writing for machines, you can now do both at the same time. You no longer have to, and you really haven't had to do this in a long time, but the idea that you might keyword stuff or otherwise create content for Google that your users might not see or care about is way, way, way over.
Now you can create content for Google that also is better for users, because the tenets of machine readability and human readability are moving closer and closer together. 
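The page-2 audit described in this section (look for a weak primary entity, and for entities the tool could not classify) can be sketched in a few lines of Python. This is a rough illustration, not a feature of the API: the function name `audit_entities`, the 0.3 threshold, and the sample numbers are all assumptions chosen for the example, and the entity dicts are assumed to carry `name`, `type`, and `salience` fields.

```python
def audit_entities(entities: list[dict], min_primary: float = 0.3) -> list[str]:
    """Return plain-language warnings for a list of extracted entities.

    min_primary is an arbitrary cutoff picked for illustration only.
    """
    if not entities:
        return ["No entities were extracted."]

    warnings = []

    # Weak signal: even the most salient entity scores low.
    top = max(entities, key=lambda e: e.get("salience", 0.0))
    if top.get("salience", 0.0) < min_primary:
        warnings.append(
            f"Primary entity '{top['name']}' has low salience "
            f"({top.get('salience', 0.0):.2f}); the page may not be clearly about it."
        )

    # Possible ambiguity: the tool could not place the entity in a category.
    for entity in entities:
        if entity.get("type") in ("OTHER", "UNKNOWN"):
            warnings.append(
                f"'{entity['name']}' is typed {entity['type']}; "
                "consider disambiguating it with surrounding terms."
            )
    return warnings


# Made-up numbers echoing the whiteboard example.
SAMPLE = [
    {"name": "chocolate cookies", "type": "WORK_OF_ART", "salience": 0.14},
    {"name": "cookie", "type": "OTHER", "salience": 0.01},
]

warnings = audit_entities(SAMPLE)
for warning in warnings:
    print(warning)
```

A warning here is a prompt to reread the page, not a score to chase; the fix is still to make the content clearer about its topic.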
Tips for writing for human and machine readability:
What I've done here is I did some research not on natural language processing, but on writing for human readability, that is advice from writers, from writing experts on how to write better, clearer, easier to read, easier to understand content. Then I pulled out the pieces of advice that also work as pieces of advice for writing for natural language processing. So natural language processing, again, is the process by which Google or really anything that might be processing language tries to understand how entities are related to each other within a given body of content.
Short, simple sentences
Short, simple sentences. Write simply. Don't use a lot of flowery language. Short sentences and try to keep it to one idea per sentence. 
One idea per sentence
If you're running on, if you've got a lot of different clauses, if you're using a lot of pronouns and it's becoming confusing what you're talking about, that's not great for readers.
It also makes it harder for machines to parse your content. 
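One rough way to spot run-on sentences in a draft is to count words per sentence. The Python sketch below does that with a simple punctuation-based split. The function name `long_sentences` and the 20-word limit are assumptions for illustration, and the splitting is naive (abbreviations such as "e.g." will confuse it), so treat the result as a hint rather than a rule.

```python
import re


def long_sentences(text: str, max_words: int = 20) -> list[str]:
    """Return the sentences in text that contain more than max_words words."""
    # Naive split: break after ., ! or ? when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if len(s.split()) > max_words]


draft = (
    "Bake the cookies at 350 degrees. "
    "When my grandmother, who loved baking, first showed me her recipe, "
    "which she had learned from her own mother, she said that it depends "
    "on many things."
)

flagged = long_sentences(draft)
for sentence in flagged:
    print(f"{len(sentence.split())} words: {sentence}")
```

The short first sentence passes; the second, with its stacked clauses, is the one worth splitting up.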
Connect questions to answers
Then closely connecting questions to answers. So don't say, "What is the best temperature to bake cookies? Well, let me tell you a story about my grandmother and my childhood," and 500 words later here's the answer. Connect questions to answers. 
What all three of those readability tips have in common is they boil down to reducing the semantic distance between entities.
If you want natural language processing to understand that two entities in your content are closely related, move them closer together in the sentence. Move the words closer together. Reduce the clutter, reduce the fluff, reduce the number of semantic hops that a robot might have to take between one entity and another to understand the relationship, and you've now created content that is more readable because it's shorter and easier to skim, but also easier for a robot to parse and understand.
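As a very crude proxy for that distance, you can count how many words sit between two terms in a draft. The Python sketch below does only that. It is an assumption-laden illustration: "semantic distance" in the talk is not defined as a word count, this is not how any search engine measures it, and the helper name `word_gap` is hypothetical.

```python
import re
from typing import Optional


def word_gap(text: str, first: str, second: str) -> Optional[int]:
    """Count the words between the first mention of `first` and the next
    mention of `second` after it. Returns None if either term is missing."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    first_words = first.lower().split()
    second_words = second.lower().split()

    def find(term: list[str], start: int = 0) -> Optional[int]:
        # Index of the first occurrence of the term at or after `start`.
        for i in range(start, len(words) - len(term) + 1):
            if words[i:i + len(term)] == term:
                return i
        return None

    first_at = find(first_words)
    if first_at is None:
        return None
    after_first = first_at + len(first_words)
    second_at = find(second_words, after_first)
    if second_at is None:
        return None
    return second_at - after_first


far = (
    "What is the best temperature to bake cookies? Well, let me tell you "
    "a story about my grandmother and my childhood. The answer is 350 degrees."
)
near = "The best temperature to bake cookies is 350 degrees."

print(word_gap(far, "cookies", "350"))
print(word_gap(near, "cookies", "350"))
```

The rewritten sentence puts the two terms almost next to each other, while the storytelling version pushes them many words apart, which is the pattern the tips above are meant to avoid.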
Be specific first, then explain nuance
Going back to the example of "What is the best temperature to bake chocolate chip cookies at?" Now the real answer to what is the best temperature to bake chocolate cookies is it depends. Hello. Hi, I'm an SEO, and I just answered a question with it depends. It does depend.
That is true, and that is real, but it is not a good answer. It is also not the kind of thing that a robot could extract and reproduce in, for example, voice search or a featured snippet. If somebody says, "Okay, Google, what is a good temperature to bake cookies at?" and Google says, "It depends," that helps nobody even though it's true. So in order to write for both machine and human readability, be specific first and then you can explain nuance.
Then you can go into the details. So a better, just as correct answer to "What is the temperature to bake chocolate chip cookies?" is the best temperature to bake chocolate chip cookies is usually between 325 and 425 degrees, depending on your altitude and how crisp you like your cookie. That is just as true as it depends and, in fact, means the same thing as it depends, but it's a lot more specific.
It's a lot more precise. It uses real numbers. It provides a real answer. I've shortened the distance between the question and the answer. I didn't say it depends first. I said it depends at the end. That's the kind of thing that you can do to improve readability and understanding for both humans and machines.
Get to the point (don't bury the lede)
Get to the point. Don't bury the lede. All of you journalists who tried to become content marketers, and then everybody in content marketing said, "Oh, you need to wait till the end to get to your point or they won't read the whole thing," and you were like, "Don't bury the lede," you are correct. For those of you who aren't familiar with journalism speak, not burying the lede basically means get to the point upfront, at the top.
Include all the information that somebody would really need to get from that piece of content. If they don't read anything else, they read that one paragraph and they've gotten the gist. Then people who want to go deep can go deep. That's how people actually like to consume content, and surprisingly it doesn't mean they won't read the content. It just means they don't have to read it if they don't have time, if they need a quick answer.
The same is true with machines. Get to the point upfront. Make it clear right away what the primary entity, the primary topic, the primary focus of your content is and then get into the details. You'll have a much better structured piece of content that's easier to parse on all sides. 
Avoid jargon and "marketing speak"
Avoid jargon. Avoid marketing speak. It's terrible and very hard to understand. You see this a lot. I'm going back again to the example of getting your clients to say what their products do. If you work with a lot of B2B companies, you will often run into this. Yes, but what does it do? It provides solutions to streamline the workflow and blah, blah. Okay, what does it do? This is the kind of thing that can be really, really hard for companies to get out of their own heads about, but it's so important for users, for machines.
Avoid jargon. Avoid marketing speak. Not to get too tautological, but the more esoteric a word is, the less commonly it's used. That's actually what esoteric means. What that means is the less commonly a word is used, the less likely it is that Google is going to understand its semantic relationships to other entities.
Keep it simple. Be specific. Say what you mean. Wipe out all of the jargon. By wiping out jargon and kind of marketing speak and kind of the fluff that can happen in your content, you're also, once again, reducing the semantic distances between entities, making them easier to parse. 
Organize your information to match the user journey
Organize it and map it out to the user journey. Think about the information somebody might need and the order in which they might need it. 
Break out subtopics with headings
Then break it out with subheadings. This is like very, very basic writing advice, and yet you all aren't doing it. So if you're not going to do it for your users, do it for machines. 
Format lists with bullets or numbers
You can also really impact skimmability for users by breaking out lists with bullets or numbers.
The great thing about that is that breaking out a list with bullets or numbers also makes information easier for a robot to parse and extract. If a lot of these tips seem like they're the same tips that you would use to get featured snippets, they are, because featured snippets are actually a pretty good indicator that you're creating content that a robot can find, parse, understand, and extract, and that's what you want.
So if you're targeting featured snippets, you're probably already doing a lot of these things, good job. 
Grammar and spelling count!
The last thing, which I shouldn't have to say, but I'm going to say is that grammar and spelling and punctuation and things like that absolutely do count. They count to users. They don't count to all users, but they count to users. They also count to search engines.
Things like grammar, spelling, and punctuation are very, very easy signals for a machine to find and parse. Google has been specific in things like the "Quality Rater Guidelines" that a well-written, well-structured, well-spelled, grammatically correct document shows signs of authoritativeness. I'm not saying that having a greatly spelled document is going to mean that you immediately rocket to the top of the results.
I am saying that if you're not on that stuff, it's probably going to hurt you. So take the time to make sure everything is nice and tidy. You can use vernacular English. You don't have to be perfect "AP Style Guide" all the time. But make sure that you are formatting things properly from a grammatical standpoint as well as a technical standpoint. What I love about all of this, this is just good writing.
This is good writing. It's easy to understand. It's easy to parse. It's still so hard, especially in the marketing world, to get out of that world of jargon, to get to the point, to stop writing 2,000 words because we think we need 2,000 words, to really think about are we creating content that's about what we think it's about.
Use these tools to understand how readable, parsable, and understandable your content is
So my hope for the SEO world and for you is that you can use these tools not just to think about how to dial in the perfect keyword density or whatever to get an almost perfect score on the salience in the natural language processing API. What I'm hoping is that you will use these tools to help yourself understand how readable, how parsable, and how understandable your content is, how much your content is about what you say it's about and what you think it's about so you can create better stuff for users.
It makes the internet a better place, and it will probably make you some money as well. So these are my thoughts. I'd love to hear in the comments if you're using the natural language processing API now, if you've built a tool with it, if you want to build a tool with it, what do you think about this, how do you use this, how has it gone. Tell me all about it. Holla atcha girl.
Have a great Friday.
Video transcription by Speechpad.com
via Blogger https://ift.tt/2OepZEp
0 notes
ductrungnguyen87 · 5 years ago
Text
Better Content Through NLP (Natural Language Processing) - Whiteboard Friday
Posted by RuthBurrReedy
Gone are the days of optimizing content solely for search engines. For modern SEO, your content needs to please both robots and humans. But how do you know that what you're writing can check the boxes for both man and machine?
In today's Whiteboard Friday, Ruth Burr Reedy focuses on part of her recent MozCon 2019 talk and teaches us all about how Google uses NLP (natural language processing) to truly understand content, plus how you can harness that knowledge to better optimize what you write for people and bots alike.
Video Transcription
Howdy, Moz fans. I'm Ruth Burr Reedy, and I am the Vice President of Strategy at UpBuild, a boutique technical marketing agency specializing in technical SEO and advanced web analytics. I recently spoke at MozCon on a basic framework for SEO and approaching changes to our industry that thinks about SEO in the light of we are humans who are marketing to humans, but we are using a machine as the intermediary.
Those videos will be available online at some point. [Editor's note: that point is now!] But today I wanted to talk about one point from my talk that I found really interesting and that has kind of changed the way that I approach content creation, and that is the idea that writing content that is easier for Google, a robot, to understand can actually make you a better writer and help you write better content for humans. It is a win-win. 
The relationships between entities, words, and how people search
To get better at parsing content and understanding what content is about, Google is spending a lot of time and a lot of energy and a lot of money on things like neural matching and natural language processing, which seek to understand basically, when people talk, what are they talking about?
This goes along with the evolution of search to be more conversational. But there are a lot of times when someone is searching, but they don't totally know what they want, and Google still wants them to get what they want because that's how Google makes money. They are spending a lot of time trying to understand the relationships between entities and between words and how people use words to search.
The example that Danny Sullivan gave online, that I think is a really great example, is if someone is experiencing the soap opera effect on their TV. If you've ever seen a soap opera, you've noticed that they look kind of weird. Someone might be experiencing that and, not knowing what it's called, they can't Google "soap opera effect" because they don't know the term.
They might search something like, "Why does my TV look funny?" Neural matching helps Google understand that when somebody is searching "Why does my TV look funny?" one possible answer might be the soap opera effect. So they can serve up that result, and people are happy. 
Understanding salience
As we're thinking about natural language processing, a core component of natural language processing is understanding salience.
Salience, content, and entities
Salience is a one-word way to sum up: to what extent is this piece of content about this specific entity? At this point Google is really good at extracting entities from a piece of content. Entities are basically nouns: people, places, things, proper nouns, regular nouns, and also numbers and things like that.
Google is really good at taking those out and saying, "Okay, here are all of the entities that are contained within this piece of content." Salience attempts to understand how they're related to each other, because what Google is really trying to understand when they're crawling a page is: What is this page about, and is this a good example of a page about this topic?
Salience really goes into the second piece. To what extent is any given entity the topic of a piece of content? It's often amazing the degree to which a piece of content that a person has created is not actually about anything. I think we've all experienced that.
You're searching and you come to a page and you're like, "This was too vague. This was too broad. This said that it was about one thing, but it was actually about something else. I didn't find what I needed. This wasn't good information for me." As marketers, we're often on the other side of that, trying to get our clients to say what their product actually does on their website or say, "I know you think that you created a guide to Instagram for the holidays. But you actually wrote one paragraph about the holidays and then seven paragraphs about your new Instagram tool. This is not actually a blog post about Instagram for the holidays. It's a piece of content about your tool." These are the kinds of battles that we fight as marketers. 
Natural Language Processing (NLP) APIs
Fortunately, there are now a number of different APIs that you can use to understand natural language processing: 
IBM has one: https://www.ibm.com/watson/services/natural-language-understanding/ 
Google actually has a natural language processing API that's right here on https://cloud.google.com/natural-language/
Is it as sophisticated as what they're using on their own stuff? Probably not. But you can test it out. Put in a piece of content and see (a) what entities Google is able to extract from it, and (b) how salient Google feels each of these entities is to the piece of content as a whole. Again, to what degree is this piece of content about this thing?
So this natural language processing API, which you can try for free and which is actually not that expensive for an API if you want to build a tool with it, will assign each entity that it can extract a salience score between 0 and 1, saying, "Okay, how sure are we that this piece of content is about this thing versus just containing it?"
So the higher or the closer you get to 1, the more confident the tool is that this piece of content is about this thing. 0.9 would be really, really good. 0.01 means it's there, but they're not sure how well it's related. 
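To make those scores concrete, here is a minimal Python sketch that ranks entities by salience. It assumes a response shaped roughly like what the Natural Language API's entity analysis returns: a list of entities, each with a name, a type, and a salience between 0 and 1. The sample data is hand-written with made-up numbers, and `rank_entities` is a hypothetical helper for illustration, not part of any Google library.

```python
# Hand-written sample shaped like an entity-analysis response.
# The numbers are made up for illustration; this is not real API output.
SAMPLE_RESPONSE = {
    "entities": [
        {"name": "butter", "type": "OTHER", "salience": 0.03},
        {"name": "chocolate chip cookies", "type": "WORK_OF_ART", "salience": 0.62},
        {"name": "cookie", "type": "OTHER", "salience": 0.11},
        {"name": "350", "type": "NUMBER", "salience": 0.0},
    ]
}


def rank_entities(response):
    """Return (name, type, salience) tuples, most salient first."""
    entities = response.get("entities", [])
    ranked = sorted(entities, key=lambda e: e.get("salience", 0.0), reverse=True)
    return [(e["name"], e["type"], e.get("salience", 0.0)) for e in ranked]


if __name__ == "__main__":
    for name, kind, salience in rank_entities(SAMPLE_RESPONSE):
        print(f"{salience:.2f}  {kind:<12} {name}")
```

If the entity you think the page is about does not come out at the top of this list, that is the first thing to look into.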
A delicious example of how salience and entities work
The example I have here is not taken from a real piece of content (these numbers are made up, it's just an example). If you had a chocolate chip cookie recipe, you would want "chocolate cookies" or "chocolate chip cookies recipe" or "chocolate chip cookies," something like that, to be the number one entity, the most salient entity, and you would want it to have a pretty high salience score.
You would want the tool to feel pretty confident, yes, this piece of content is about this topic. But what you can also see is the other entities it's extracting and to what degree they are also salient to the topic. So you can see things like if you have a chocolate chip cookie recipe, you would expect to see things like cookie, butter, sugar, 350, which is the temperature you heat your oven, all of the different things that come together to make a chocolate chip cookie recipe.
But I think that it's really, really important for us as SEOs to understand that salience is the future of related keywords. We're beyond the time when to optimize for chocolate chip cookie recipe, we would also be looking for things like chocolate recipe, chocolate chips, chocolate cookie recipe, things like that. Stems, variants, TF-IDF, these are all older methodologies for understanding what a piece of content is about.
Instead, what we need to understand is which entities Google, using its vast body of knowledge, using things like Freebase, using large portions of the internet, sees co-occurring at such a rate that it feels reasonably confident that a piece of content on one entity, in order to be salient to that entity, would include these other entities.
Using an expert is the best way to create content that's salient to a topic
So for a chocolate chip cookie recipe, we're now also making sure we're adding things like butter, flour, sugar. This is actually really easy to do if you actually have a chocolate chip cookie recipe to put up there. What I think we're going to start seeing as a content trend in SEO is that the best way to create content that is salient to a topic is to have an actual expert in that topic create that content.
Somebody with deep knowledge of a topic is naturally going to include co-occurring terms, because they know how to create something that's about what it's supposed to be about. I think what we're going to start seeing is that people are going to have to start paying more for content marketing, frankly. Unfortunately, a lot of companies seem to think that content marketing is and should be cheap.
Content marketers, I feel you on that. It sucks, and it's no longer the case. We need to start investing in content and investing in experts to create that content so that they can create that deep, rich, salient content that everybody really needs. 
How can you use this API to improve your own SEO? 
One of the things that I like to do with this kind of information (and this is something that I've done for years, just not in this context) is look at pages that rank for a topic, but rank on page 2. Those are a prime optimization target in general.
What this often means is that Google understands that that keyword is a topic of the page, but it doesn't necessarily understand that it is a good piece of content on that topic, that the page is actually solely about that content, that it's a good resource. In other words, the signal is there, but it's weak.
What you can do is take content that ranks but not well, run it through this natural language API or another natural language processing tool, and look at how the entities are extracted and how Google is determining that they're related to each other. Sometimes it might be that you need to do some disambiguation. So in this example, you'll notice that while "chocolate cookies" is classified as a work of art, and I agree, "cookie" here is actually classified as other.
This is because cookie means more than one thing. There's cookies, the baked good, but then there's also cookies, the packet of data. Both of those are legitimate uses of the word "cookie." Words have multiple meanings. If you notice that Google, that this natural language processing API is having trouble correctly classifying your entities, that's a good time to go in and do some disambiguation.
Make sure that the terms surrounding that term are clearly saying, "No, I mean the baked good, not the packet of data." That's a really great way to kind of bump up your salience. Look at whether or not you have a strong salience score for your primary entity. You'd be amazed at how many pieces of content you can plug into this tool and the top, most salient entity is still only like a 0.01, a 0.14.
A lot of times the API is like "I think this is what it's about," but it's not sure. This is a great time to go in and bump up that content, make it more robust, and look at ways that you can make those entities easier to both extract and to relate to each other. This brings me to my second point, which is my new favorite thing in the world.
Writing for humans and writing for machines: you can now do both at the same time. The idea that you might keyword stuff or otherwise create content for Google that your users might not see or care about is way, way, way over, and really has been for a long time.
Now you can create content for Google that also is better for users, because the tenets of machine readability and human readability are moving closer and closer together. 
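The page-2 audit described above can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions: the entity list is hand-written in the same name/type/salience shape as before, `audit_entities` is a hypothetical helper, and the 0.2 cutoff is an arbitrary number chosen for the example, not a threshold Google publishes.

```python
WEAK_SALIENCE = 0.2  # hypothetical cutoff; tune it for your own content

# Hand-written example page; the numbers are made up.
SAMPLE_PAGE = [
    {"name": "chocolate cookies", "type": "WORK_OF_ART", "salience": 0.14},
    {"name": "cookie", "type": "OTHER", "salience": 0.05},
]


def audit_entities(entities, target, weak=WEAK_SALIENCE):
    """Return plain-English warnings for one page.

    `entities` is a list of dicts with "name", "type", and "salience" keys;
    `target` is the topic the page is supposed to be about.
    """
    if not entities:
        return ["no entities extracted"]
    warnings = []
    top = max(entities, key=lambda e: e["salience"])
    if top["name"].lower() != target.lower():
        warnings.append(f"most salient entity is '{top['name']}', not '{target}'")
    if top["salience"] < weak:
        warnings.append(f"top salience is only {top['salience']:.2f}")
    for entity in entities:
        if entity["type"] == "OTHER":
            warnings.append(f"'{entity['name']}' is typed OTHER; consider disambiguating")
    return warnings


if __name__ == "__main__":
    for warning in audit_entities(SAMPLE_PAGE, "chocolate cookies"):
        print(warning)
```

Not every entity typed as other needs fixing; the warning is just a prompt to check whether the surrounding words make the intended meaning clear.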
Tips for writing for human and machine readability:
What I've done here is I did some research not on natural language processing, but on writing for human readability, that is, advice from writers and writing experts on how to write better, clearer, easier to read, easier to understand content. Then I pulled out the pieces of advice that also work as pieces of advice for writing for natural language processing. So natural language processing, again, is the process by which Google or really anything that might be processing language tries to understand how entities are related to each other within a given body of content.
Short, simple sentences
Short, simple sentences. Write simply. Don't use a lot of flowery language. Use short sentences, and try to keep it to one idea per sentence.
One idea per sentence
If you're running on, if you've got a lot of different clauses, if you're using a lot of pronouns and it's becoming confusing what you're talking about, that's not great for readers.
It also makes it harder for machines to parse your content. 
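One way to act on this is a quick check for run-on sentences. The sketch below is a rough Python illustration: `long_sentences` is a hypothetical helper, the 20-word limit is an arbitrary assumption, and the sentence splitting is deliberately naive.

```python
import re

MAX_WORDS = 20  # hypothetical limit; pick what suits your style guide


def long_sentences(text, max_words=MAX_WORDS):
    """Return (word_count, sentence) pairs for sentences over the limit."""
    # Naive split on ., !, or ? followed by whitespace; fine for a rough check.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    flagged = []
    for sentence in sentences:
        count = len(sentence.split())
        if count > max_words:
            flagged.append((count, sentence))
    return flagged


if __name__ == "__main__":
    draft = (
        "Write simply. If you're running on, if you've got a lot of different "
        "clauses, if you're using a lot of pronouns and it's becoming confusing "
        "what you're talking about, that's not great for readers."
    )
    for count, sentence in long_sentences(draft):
        print(f"{count} words: {sentence}")
```

Word count is only a proxy. A long sentence can still be clear, but a flagged one is worth rereading to see whether it carries more than one idea.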
Connect questions to answers
Then closely connecting questions to answers. So don't say, "What is the best temperature to bake cookies? Well, let me tell you a story about my grandmother and my childhood," and 500 words later here's the answer. Connect questions to answers. 
What all three of those readability tips have in common is they boil down to reducing the semantic distance between entities.
If you want natural language processing to understand that two entities in your content are closely related, move them closer together in the sentence. Move the words closer together. Reduce the clutter, reduce the fluff, reduce the number of semantic hops that a robot might have to take between one entity and another to understand the relationship, and you've now created content that is more readable because it's shorter and easier to skim, but also easier for a robot to parse and understand.
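As a rough illustration of moving entities closer together, the Python sketch below counts the words that sit between two terms in a sentence. To be clear about the assumptions: `token_distance` is a hypothetical helper, it only handles single-word terms, and word distance is a crude stand-in for semantic distance, not how Google or any NLP API actually measures the relationship.

```python
import re


def token_distance(sentence, first, second):
    """Count the words between the first occurrences of two single-word terms.

    Returns None if either term is missing from the sentence.
    """
    tokens = re.findall(r"[a-z0-9']+", sentence.lower())
    try:
        i = tokens.index(first.lower())
        j = tokens.index(second.lower())
    except ValueError:
        return None
    return max(abs(i - j) - 1, 0)


if __name__ == "__main__":
    tight = "Bake cookies at 350 degrees."
    loose = ("Cookies, which my grandmother always loved, "
             "should go into an oven heated to 350.")
    print(token_distance(tight, "cookies", "350"))
    print(token_distance(loose, "cookies", "350"))
```

Comparing a draft sentence against a tightened rewrite this way gives a quick sense of how much clutter sits between the two things you want the reader, and the robot, to connect.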
Be specific first, then explain nuance
Going back to the example of "What is the best temperature to bake chocolate chip cookies at?" Now the real answer to what is the best temperature to bake chocolate chip cookies is "it depends." Hello. Hi, I'm an SEO, and I just answered a question with "it depends." It does depend.
That is true, and that is real, but it is not a good answer. It is also not the kind of thing that a robot could extract and reproduce in, for example, voice search or a featured snippet. If somebody says, "Okay, Google, what is a good temperature to bake cookies at?" and Google says, "It depends," that helps nobody even though it's true. So in order to write for both machine and human readability, be specific first and then you can explain nuance.
Then you can go into the details. So a better, just as correct answer to "What is the temperature to bake chocolate chip cookies?" is the best temperature to bake chocolate chip cookies is usually between 325 and 425 degrees, depending on your altitude and how crisp you like your cookie. That is just as true as it depends and, in fact, means the same thing as it depends, but it's a lot more specific.
It's a lot more precise. It uses real numbers. It provides a real answer. I've shortened the distance between the question and the answer. I didn't say it depends first. I said it depends at the end. That's the kind of thing that you can do to improve readability and understanding for both humans and machines.
Get to the point (don't bury the lede)
Get to the point. Don't bury the lede. All of you journalists who tried to become content marketers, when everybody in content marketing said, "Oh, you need to wait till the end to get to your point or they won't read the whole thing," and you were like, "Don't bury the lede," you are correct. For those of you who aren't familiar with journalism speak, not burying the lede basically means get to the point upfront, at the top.
Include all the information that somebody would really need to get from that piece of content. If they don't read anything else, they read that one paragraph and they've gotten the gist. Then people who want to go deep can go deep. That's how people actually like to consume content, and surprisingly it doesn't mean they won't read the content. It just means they don't have to read it if they don't have time, if they need a quick answer.
The same is true with machines. Get to the point upfront. Make it clear right away what the primary entity, the primary topic, the primary focus of your content is and then get into the details. You'll have a much better structured piece of content that's easier to parse on all sides. 
Avoid jargon and "marketing speak"
Avoid jargon. Avoid marketing speak. It's terrible and very hard to understand, and you see it a lot. I'm going back again to the example of getting your clients to say what their products do. If you work with a lot of B2B companies, you will often run into this. "Yes, but what does it do?" "It provides solutions to streamline the workflow and blah, blah." "Okay, what does it do?" This is the kind of thing that can be really, really hard for companies to get out of their own heads about, but it's so important for users and for machines.
Avoid jargon. Avoid marketing speak. Not to get too tautological, but the more esoteric a word is, the less commonly it's used. That's actually what esoteric means. What that means is the less commonly a word is used, the less likely it is that Google is going to understand its semantic relationships to other entities.
Keep it simple. Be specific. Say what you mean. Wipe out all of the jargon. By wiping out jargon, marketing speak, and the fluff that can happen in your content, you're also, once again, reducing the semantic distances between entities, making them easier to parse.
Organize your information to match the user journey
Organize it and map it out to the user journey. Think about the information somebody might need and the order in which they might need it. 
Break out subtopics with headings
Then break it out with subheadings. This is like very, very basic writing advice, and yet you all aren't doing it. So if you're not going to do it for your users, do it for machines. 
Format lists with bullets or numbers
You can also really impact skimmability for users by breaking out lists with bullets or numbers.
The great thing about that is that breaking out a list with bullets or numbers also makes information easier for a robot to parse and extract. If a lot of these tips seem like they're the same tips that you would use to get featured snippets, they are, because featured snippets are actually a pretty good indicator that you're creating content that a robot can find, parse, understand, and extract, and that's what you want.
So if you're targeting featured snippets, you're probably already doing a lot of these things, good job. 
Grammar and spelling count!
The last thing, which I shouldn't have to say, but I'm going to say is that grammar and spelling and punctuation and things like that absolutely do count. They count to users. They don't count to all users, but they count to users. They also count to search engines.
Things like grammar, spelling, and punctuation are very, very easy signals for a machine to find and parse. Google has been specific, in things like the "Quality Rater Guidelines," that a well-written, well-structured, well-spelled, grammatically correct document shows signs of authoritativeness. I'm not saying that having a perfectly spelled document is going to mean that you immediately rocket to the top of the results.
I am saying that if you're not on that stuff, it's probably going to hurt you. So take the time to make sure everything is nice and tidy. You can use vernacular English. You don't have to be perfect "AP Style Guide" all the time. But make sure that you are formatting things properly from a grammatical standpoint as well as a technical standpoint. What I love about all of this, this is just good writing.
This is good writing. It's easy to understand. It's easy to parse. It's still so hard, especially in the marketing world, to get out of that world of jargon, to get to the point, to stop writing 2,000 words because we think we need 2,000 words, to really think about are we creating content that's about what we think it's about.
Use these tools to understand how readable, parsable, and understandable your content is
So my hope for the SEO world and for you is that you can use these tools not just to think about how to dial in the perfect keyword density or whatever to get an almost perfect score on the salience in the natural language processing API. What I'm hoping is that you will use these tools to help yourself understand how readable, how parsable, and how understandable your content is, how much your content is about what you say it's about and what you think it's about so you can create better stuff for users.
It makes the internet a better place, and it will probably make you some money as well. So these are my thoughts. I'd love to hear in the comments if you're using the natural language processing API now, if you've built a tool with it, if you want to build a tool with it, what do you think about this, how do you use this, how has it gone. Tell me all about it. Holla atcha girl.
Have a great Friday.
Video transcription by Speechpad.com
0 notes
lakelandseo · 5 years ago
Text
Better Content Through NLP (Natural Language Processing) - Whiteboard Friday
Posted by RuthBurrReedy
Gone are the days of optimizing content solely for search engines. For modern SEO, your content needs to please both robots and humans. But how do you know that what you're writing can check the boxes for both man and machine?
In today's Whiteboard Friday, Ruth Burr Reedy focuses on part of her recent MozCon 2019 talk and teaches us all about how Google uses NLP (natural language processing) to truly understand content, plus how you can harness that knowledge to better optimize what you write for people and bots alike.
Video Transcription
Howdy, Moz fans. I'm Ruth Burr Reedy, and I am the Vice President of Strategy at UpBuild, a boutique technical marketing agency specializing in technical SEO and advanced web analytics. I recently spoke at MozCon on a basic framework for SEO and approaching changes to our industry that thinks about SEO in the light of we are humans who are marketing to humans, but we are using a machine as the intermediary.
Those videos will be available online at some point. [Editor's note: that point is now!] But today I wanted to talk about one point from my talk that I found really interesting and that has kind of changed the way that I approach content creation, and that is the idea that writing content that is easier for Google, a robot, to understand can actually make you a better writer and help you write better content for humans. It is a win-win. 
The relationships between entities, words, and how people search
To get better at parsing content and understanding what content is about, Google is spending a lot of time and a lot of energy and a lot of money on things like neural matching and natural language processing, which seek to understand basically, when people talk, what are they talking about?
This goes along with the evolution of search to be more conversational. But there are a lot of times when someone is searching, but they don't totally know what they want, and Google still wants them to get what they want because that's how Google makes money. They are spending a lot of time trying to understand the relationships between entities and between words and how people use words to search.
The example that Danny Sullivan gave online, that I think is a really great example, is if someone is experiencing the soap opera effect on their TV. If you've ever seen a soap opera, you've noticed that they look kind of weird. Someone might be experiencing that and, not knowing what it's called, they can't Google "soap opera effect" because they don't know the term.
They might search something like, "Why does my TV look funny?" Neural matching helps Google understand that when somebody is searching "Why does my TV look funny?" one possible answer might be the soap opera effect. So they can serve up that result, and people are happy. 
Understanding salience
As we're thinking about natural language processing, a core component of natural language processing is understanding salience.
Salience, content, and entities
Salience is a one-word way to sum up: to what extent is this piece of content about this specific entity? At this point Google is really good at extracting entities from a piece of content. Entities are basically nouns: people, places, things, proper nouns, regular nouns, and also numbers and things like that.
Google is really good at taking those out and saying, "Okay, here are all of the entities that are contained within this piece of content." Salience attempts to understand how they're related to each other, because what Google is really trying to understand when they're crawling a page is: What is this page about, and is this a good example of a page about this topic?
Salience really goes into the second piece. To what extent is any given entity the topic of a piece of content? It's often amazing the degree to which a piece of content that a person has created is not actually about anything. I think we've all experienced that.
You're searching and you come to a page and you're like, "This was too vague. This was too broad. This said that it was about one thing, but it was actually about something else. I didn't find what I needed. This wasn't good information for me." As marketers, we're often on the other side of that, trying to get our clients to say what their product actually does on their website or say, "I know you think that you created a guide to Instagram for the holidays. But you actually wrote one paragraph about the holidays and then seven paragraphs about your new Instagram tool. This is not actually a blog post about Instagram for the holidays. It's a piece of content about your tool." These are the kinds of battles that we fight as marketers. 
Natural Language Processing (NLP) APIs
Fortunately, there are now a number of different APIs that you can use to understand natural language processing: 
IBM has one: https://www.ibm.com/watson/services/natural-language-understanding/ 
Google actually has a natural language processing API that's right here on https://cloud.google.com/natural-language/
Is it as sophisticated as what they're using on their own stuff? Probably not. But you can test it out. Put in a piece of content and see (a) what entities Google is able to extract from it, and (b) how salient Google feels each of these entities is to the piece of content as a whole. Again, to what degree is this piece of content about this thing?
So this natural language processing API, which you can try for free and which is actually not that expensive for an API if you want to build a tool with it, will assign each entity that it can extract a salience score between 0 and 1, saying, "Okay, how sure are we that this piece of content is about this thing versus just containing it?"
So the higher or the closer you get to 1, the more confident the tool is that this piece of content is about this thing. 0.9 would be really, really good. 0.01 means it's there, but they're not sure how well it's related. 
A delicious example of how salience and entities work
The example I have here is not taken from a real piece of content (these numbers are made up, it's just an example). If you had a chocolate chip cookie recipe, you would want "chocolate cookies" or "chocolate chip cookies recipe" or "chocolate chip cookies," something like that, to be the number one entity, the most salient entity, and you would want it to have a pretty high salience score.
You would want the tool to feel pretty confident, yes, this piece of content is about this topic. But what you can also see is the other entities it's extracting and to what degree they are also salient to the topic. So you can see things like if you have a chocolate chip cookie recipe, you would expect to see things like cookie, butter, sugar, 350, which is the temperature you heat your oven, all of the different things that come together to make a chocolate chip cookie recipe.
But I think that it's really, really important for us as SEOs to understand that salience is the future of related keywords. We're beyond the time when to optimize for chocolate chip cookie recipe, we would also be looking for things like chocolate recipe, chocolate chips, chocolate cookie recipe, things like that. Stems, variants, TF-IDF, these are all older methodologies for understanding what a piece of content is about.
Instead, what we need to understand is which entities Google, using its vast body of knowledge, using things like Freebase, using large portions of the internet, sees co-occurring at such a rate that it feels reasonably confident that a piece of content on one entity, in order to be salient to that entity, would include these other entities.
Using an expert is the best way to create content that's salient to a topic
So for a chocolate chip cookie recipe, we're now also making sure we're adding things like butter, flour, sugar. This is actually really easy to do if you actually have a chocolate chip cookie recipe to put up there. What I think we're going to start seeing as a content trend in SEO is that the best way to create content that is salient to a topic is to have an actual expert in that topic create that content.
Somebody with deep knowledge of a topic is naturally going to include co-occurring terms, because they know how to create something that's about what it's supposed to be about. I think what we're going to start seeing is that people are going to have to start paying more for content marketing, frankly. Unfortunately, a lot of companies seem to think that content marketing is and should be cheap.
Content marketers, I feel you on that. It sucks, and it's no longer the case. We need to start investing in content and investing in experts to create that content so that they can create that deep, rich, salient content that everybody really needs. 
How can you use this API to improve your own SEO? 
One of the things that I like to do with this kind of information is look at — and this is something that I've done for years, just not in this context — but a prime optimization target in general is pages that rank for a topic, but they rank on page 2.
What this often means is that Google understands that that keyword is a topic of the page, but it doesn't necessarily understand that it is a good piece of content on that topic, that the page is actually solely about that content, that it's a good resource. In other words, the signal is there, but it's weak.
What you can do is take content that ranks but not well, run it through this natural language API or another natural language processing tool, and look at how the entities are extracted and how Google is determining that they're related to each other. Sometimes it might be that you need to do some disambiguation. So in this example, you'll notice that while chocolate cookies is called a work of art, and I agree, cookie here is actually called other.
This is because cookie means more than one thing. There's cookies, the baked good, but then there's also cookies, the packet of data. Both of those are legitimate uses of the word "cookie." Words have multiple meanings. If you notice that Google, that this natural language processing API is having trouble correctly classifying your entities, that's a good time to go in and do some disambiguation.
Make sure that the terms surrounding that term are clearly saying, "No, I mean the baked good, not the software piece of data." That's a really great way to kind of bump up your salience. Look at whether or not you have a strong salience score for your primary entity. You'd be amazed at how many pieces of content you can plug into this tool and the top, most salient entity is still only like a 0.01, a 0.14.
A lot of times the API is like "I think this is what it's about," but it's not sure. This is a great time to go in and bump up that content, make it more robust, and look at ways that you can make those entities easier to both extract and to relate to each other. This brings me to my second point, which is my new favorite thing in the world.
Writing for humans and writing for machines, you can now do both at the same time. You no longer have to, and you really haven't had to do this in a long time, but the idea that you might keyword stuff or otherwise create content for Google that your users might not see or care about is way, way, way over.
Now you can create content for Google that also is better for users, because the tenets of machine readability and human readability are moving closer and closer together. 
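The page-2 check described in this section can be sketched in a few lines of Python. This assumes you already have the extracted entities as a list of dicts with `name`, `type`, and `salience` keys (the shape of the API's entity analysis output); `audit_entities` and the 0.3 threshold are illustrative assumptions, not values from Google.

```python
def audit_entities(entities, primary_name, min_salience=0.3):
    """Return plain-text warnings about weak or ambiguous entities.

    min_salience is an arbitrary cutoff chosen for illustration.
    """
    if not entities:
        return ["no entities extracted"]

    warnings = []
    # The most salient entity should be the topic you meant to write about.
    top = max(entities, key=lambda e: e.get("salience", 0.0))
    top_score = top.get("salience", 0.0)
    if top["name"].lower() != primary_name.lower():
        warnings.append(f"top entity is '{top['name']}', expected '{primary_name}'")
    if top_score < min_salience:
        warnings.append(f"top salience {top_score:.2f} is below {min_salience}")

    # Entities typed OTHER may need clearer surrounding context.
    for entity in entities:
        if entity.get("type") == "OTHER":
            warnings.append(f"'{entity['name']}' is typed OTHER; consider disambiguating")
    return warnings


# Made-up values echoing the cookie example.
sample_entities = [
    {"name": "chocolate cookies", "type": "WORK_OF_ART", "salience": 0.14},
    {"name": "cookie", "type": "OTHER", "salience": 0.01},
]

for warning in audit_entities(sample_entities, "chocolate cookies"):
    print(warning)
```

A page that produces warnings like these is a candidate for the kind of tightening and disambiguation described above.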
Tips for writing for human and machine readability:
What I've done here is I did some research not on natural language processing, but on writing for human readability, that is advice from writers, from writing experts on how to write better, clearer, easier to read, easier to understand content. Then I pulled out the pieces of advice that also work as pieces of advice for writing for natural language processing. So natural language processing, again, is the process by which Google or really anything that might be processing language tries to understand how entities are related to each other within a given body of content.
Short, simple sentences
Short, simple sentences. Write simply. Don't use a lot of flowery language. Short sentences and try to keep it to one idea per sentence. 
One idea per sentence
If you're running on, if you've got a lot of different clauses, if you're using a lot of pronouns and it's becoming confusing what you're talking about, that's not great for readers.
It also makes it harder for machines to parse your content. 
Connect questions to answers
Then closely connecting questions to answers. So don't say, "What is the best temperature to bake cookies? Well, let me tell you a story about my grandmother and my childhood," and 500 words later here's the answer. Connect questions to answers. 
What all three of those readability tips have in common is they boil down to reducing the semantic distance between entities.
If you want natural language processing to understand that two entities in your content are closely related, move them closer together in the sentence. Move the words closer together. Reduce the clutter, reduce the fluff, reduce the number of semantic hops that a robot might have to take between one entity and another to understand the relationship, and you've now created content that is more readable because it's shorter and easier to skim, but also easier for a robot to parse and understand.
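One rough way to see this in your own drafts is to count how many words sit between two terms. The sketch below is a crude proxy, assuming single-word terms and simple word counting; it is not how Google or any NLP API measures semantic distance, and `token_distance` is a hypothetical helper.

```python
import re


def token_distance(text, term_a, term_b):
    """Smallest gap, in word positions, between any mention of two terms.

    Handles single-word terms only. Returns None if either term is absent.
    """
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    positions_a = [i for i, tok in enumerate(tokens) if tok == term_a.lower()]
    positions_b = [i for i, tok in enumerate(tokens) if tok == term_b.lower()]
    if not positions_a or not positions_b:
        return None
    return min(abs(a - b) for a in positions_a for b in positions_b)


buried = (
    "What temperature should you bake cookies at? Well, let me tell you "
    "a story about my grandmother. Anyway, 350 degrees works."
)
direct = "Bake cookies at 350 degrees."

print(token_distance(buried, "cookies", "350"))
print(token_distance(direct, "cookies", "350"))
```

The direct version puts the two terms much closer together, which is the effect the three tips above are aiming for.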
Be specific first, then explain nuance
Going back to the example of "What is the best temperature to bake chocolate chip cookies at?" Now the real answer to what is the best temperature to bake chocolate chip cookies is "it depends." Hello. Hi, I'm an SEO, and I just answered a question with "it depends." It does depend.
That is true, and that is real, but it is not a good answer. It is also not the kind of thing that a robot could extract and reproduce in, for example, voice search or a featured snippet. If somebody says, "Okay, Google, what is a good temperature to bake cookies at?" and Google says, "It depends," that helps nobody even though it's true. So in order to write for both machine and human readability, be specific first and then you can explain nuance.
Then you can go into the details. So a better, just as correct answer to "What is the temperature to bake chocolate chip cookies?" is the best temperature to bake chocolate chip cookies is usually between 325 and 425 degrees, depending on your altitude and how crisp you like your cookie. That is just as true as it depends and, in fact, means the same thing as it depends, but it's a lot more specific.
It's a lot more precise. It uses real numbers. It provides a real answer. I've shortened the distance between the question and the answer. I didn't say it depends first. I said it depends at the end. That's the kind of thing that you can do to improve readability and understanding for both humans and machines.
Get to the point (don't bury the lede)
Get to the point. Don't bury the lede. All of you journalists who tried to become content marketers, and then everybody in content marketing said, "Oh, you need to wait till the end to get to your point or they won't read the whole thing," and you were like, "Don't bury the lede," you are correct. For those of you who aren't familiar with journalism speak, not burying the lede basically means get to the point upfront, at the top.
Include all the information that somebody would really need to get from that piece of content. If they don't read anything else, they read that one paragraph and they've gotten the gist. Then people who want to go deep can go deep. That's how people actually like to consume content, and surprisingly it doesn't mean they won't read the content. It just means they don't have to read it if they don't have time, if they need a quick answer.
The same is true with machines. Get to the point upfront. Make it clear right away what the primary entity, the primary topic, the primary focus of your content is and then get into the details. You'll have a much better structured piece of content that's easier to parse on all sides. 
Avoid jargon and "marketing speak"
Avoid jargon. Avoid marketing speak. It's terrible and very hard to understand, and you see it a lot. I'm going back again to the example of getting your clients to say what their products do. If you work with a lot of B2B companies, you will often run into this. "Yes, but what does it do?" "It provides solutions to streamline the workflow and blah, blah." "Okay, what does it do?" This is the kind of thing that can be really, really hard for companies to get out of their own heads about, but it's so important for users and for machines.
Avoid jargon. Avoid marketing speak. Not to get too tautological, but the more esoteric a word is, the less commonly it's used. That's actually what esoteric means. What that means is the less commonly a word is used, the less likely it is that Google is going to understand its semantic relationships to other entities.
Keep it simple. Be specific. Say what you mean. Wipe out all of the jargon. By wiping out jargon and kind of marketing speak and kind of the fluff that can happen in your content, you're also, once again, reducing the semantic distances between entities, making them easier to parse. 
Organize your information to match the user journey
Organize it and map it out to the user journey. Think about the information somebody might need and the order in which they might need it. 
Break out subtopics with headings
Then break it out with subheadings. This is like very, very basic writing advice, and yet you all aren't doing it. So if you're not going to do it for your users, do it for machines. 
Format lists with bullets or numbers
You can also really impact skimmability for users by breaking out lists with bullets or numbers.
The great thing about that is that breaking out a list with bullets or numbers also makes information easier for a robot to parse and extract. If a lot of these tips seem like they're the same tips that you would use to get featured snippets, they are, because featured snippets are actually a pretty good indicator that you're creating content that a robot can find, parse, understand, and extract, and that's what you want.
So if you're targeting featured snippets, you're probably already doing a lot of these things, good job. 
Grammar and spelling count!
The last thing, which I shouldn't have to say, but I'm going to say is that grammar and spelling and punctuation and things like that absolutely do count. They count to users. They don't count to all users, but they count to users. They also count to search engines.
Things like grammar, spelling, and punctuation are very, very easy signals for a machine to find and parse. Google has been specific in things like the "Quality Rater Guidelines" that a well-written, well-structured, well-spelled, grammatically correct document shows signs of authoritativeness. I'm not saying that having a greatly spelled document is going to mean that you immediately rocket to the top of the results.
I am saying that if you're not on that stuff, it's probably going to hurt you. So take the time to make sure everything is nice and tidy. You can use vernacular English. You don't have to be perfect "AP Style Guide" all the time. But make sure that you are formatting things properly from a grammatical standpoint as well as a technical standpoint. What I love about all of this, this is just good writing.
This is good writing. It's easy to understand. It's easy to parse. It's still so hard, especially in the marketing world, to get out of that world of jargon, to get to the point, to stop writing 2,000 words because we think we need 2,000 words, to really think about are we creating content that's about what we think it's about.
Use these tools to understand how readable, parsable, and understandable your content is
So my hope for the SEO world and for you is that you can use these tools not just to think about how to dial in the perfect keyword density or whatever to get an almost perfect score on the salience in the natural language processing API. What I'm hoping is that you will use these tools to help yourself understand how readable, how parsable, and how understandable your content is, how much your content is about what you say it's about and what you think it's about so you can create better stuff for users.
It makes the internet a better place, and it will probably make you some money as well. So these are my thoughts. I'd love to hear in the comments if you're using the natural language processing API now, if you've built a tool with it, if you want to build a tool with it, what do you think about this, how do you use this, how has it gone. Tell me all about it. Holla atcha girl.
Have a great Friday.
Video transcription by Speechpad.com
nutrifami · 5 years ago
Better Content Through NLP (Natural Language Processing) - Whiteboard Friday
Posted by RuthBurrReedy
Gone are the days of optimizing content solely for search engines. For modern SEO, your content needs to please both robots and humans. But how do you know that what you're writing can check the boxes for both man and machine?
In today's Whiteboard Friday, Ruth Burr Reedy focuses on part of her recent MozCon 2019 talk and teaches us all about how Google uses NLP (natural language processing) to truly understand content, plus how you can harness that knowledge to better optimize what you write for people and bots alike.
Video Transcription
Howdy, Moz fans. I'm Ruth Burr Reedy, and I am the Vice President of Strategy at UpBuild, a boutique technical marketing agency specializing in technical SEO and advanced web analytics. I recently spoke at MozCon on a basic framework for SEO and for approaching changes to our industry, one that thinks about SEO in this light: we are humans who are marketing to humans, but we are using a machine as the intermediary.
Those videos will be available online at some point. [Editor's note: that point is now!] But today I wanted to talk about one point from my talk that I found really interesting and that has kind of changed the way that I approach content creation, and that is the idea that writing content that is easier for Google, a robot, to understand can actually make you a better writer and help you write better content for humans. It is a win-win. 
The relationships between entities, words, and how people search
To understand how Google is currently approaching parsing content and understanding what content is about, it helps to know that Google is spending a lot of time and a lot of energy and a lot of money on things like neural matching and natural language processing, which seek to understand basically when people talk, what are they talking about?
This goes along with the evolution of search to be more conversational. But there are a lot of times when someone is searching, but they don't totally know what they want, and Google still wants them to get what they want because that's how Google makes money. They are spending a lot of time trying to understand the relationships between entities and between words and how people use words to search.
The example that Danny Sullivan gave online, that I think is a really great example, is if someone is experiencing the soap opera effect on their TV. If you've ever seen a soap opera, you've noticed that they look kind of weird. Someone might be experiencing that, and not knowing what that's called they can't Google soap opera effect because they don't know about it.
They might search something like, "Why does my TV look funny?" Neural matching helps Google understand that when somebody is searching "Why does my TV look funny?" one possible answer might be the soap opera effect. So they can serve up that result, and people are happy. 
Understanding salience
As we're thinking about natural language processing, a core component of natural language processing is understanding salience.
Salience, content, and entities
Salience is a one-word way to sum up to what extent is this piece of content about this specific entity? At this point Google is really good at extracting entities from a piece of content. Entities are basically nouns, people, places, things, proper nouns, regular nouns.
Entities are things, people, etc., numbers, things like that. Google is really good at taking those out and saying, "Okay, here are all of the entities that are contained within this piece of content." Salience attempts to understand how they're related to each other, because what Google is really trying to understand when they're crawling a page is: What is this page about, and is this a good example of a page about this topic?
Salience really goes into the second piece. To what extent is any given entity the topic of a piece of content? It's often amazing the degree to which a piece of content that a person has created is not actually about anything. I think we've all experienced that.
You're searching and you come to a page and you're like, "This was too vague. This was too broad. This said that it was about one thing, but it was actually about something else. I didn't find what I needed. This wasn't good information for me." As marketers, we're often on the other side of that, trying to get our clients to say what their product actually does on their website or say, "I know you think that you created a guide to Instagram for the holidays. But you actually wrote one paragraph about the holidays and then seven paragraphs about your new Instagram tool. This is not actually a blog post about Instagram for the holidays. It's a piece of content about your tool." These are the kinds of battles that we fight as marketers. 
Natural Language Processing (NLP) APIs
Fortunately, there are now a number of different APIs that you can use to understand natural language processing: 
IBM has one: https://www.ibm.com/watson/services/natural-language-understanding/ 
Google actually has a natural language processing API that's right here on https://cloud.google.com/natural-language/
Is it as sophisticated as what they're using on their own stuff? Probably not. But you can test it out. Put in a piece of content and see (a) what entities Google is able to extract from it, and (b) how salient Google feels each of these entities is to the piece of content as a whole. Again, to what degree is this piece of content about this thing?
So this natural language processing API, which you can try for free and which is actually not that expensive for an API if you want to build a tool with it, will assign each entity that it can extract a salience score between 0 and 1, saying, "Okay, how sure are we that this piece of content is about this thing versus just containing it?"
So the higher or the closer you get to 1, the more confident the tool is that this piece of content is about this thing. 0.9 would be really, really good. 0.01 means it's there, but they're not sure how well it's related. 
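To make the 0-to-1 score concrete, here is a minimal Python sketch that sorts extracted entities by salience. It assumes the JSON shape returned by the API's entity analysis (an `entities` list whose items carry `name`, `type`, and `salience`); the sample values are made up for illustration, and `rank_by_salience` is a hypothetical helper, not part of any Google library. Google's documentation describes salience as how central an entity is to the whole document, which is close to the "about this thing versus just containing it" idea above.

```python
# Response-shaped sample data; every number here is made up.
sample_response = {
    "entities": [
        {"name": "butter", "type": "CONSUMER_GOOD", "salience": 0.05},
        {"name": "chocolate chip cookies", "type": "WORK_OF_ART", "salience": 0.62},
        {"name": "cookie", "type": "OTHER", "salience": 0.11},
        {"name": "sugar", "type": "CONSUMER_GOOD", "salience": 0.04},
        {"name": "350", "type": "NUMBER", "salience": 0.0},
    ]
}


def rank_by_salience(response):
    """Return (name, type, salience) tuples, most salient entity first."""
    entities = response.get("entities", [])
    ranked = sorted(entities, key=lambda e: e.get("salience", 0.0), reverse=True)
    return [(e["name"], e["type"], e.get("salience", 0.0)) for e in ranked]


for name, kind, score in rank_by_salience(sample_response):
    print(f"{score:.2f}  {kind:<14} {name}")
```

The first row of the output is the entity the tool considers most central, which is the one you would want to match your intended topic.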
A delicious example of how salience and entities work
The example I have here, and this is not taken from a real piece of content (these numbers are made up, it's just an example), is a chocolate chip cookie recipe. You would want "chocolate chip cookies," "chocolate chip cookie recipe," or something like that to be the number one entity, the most salient entity, and you would want it to have a pretty high salience score.
You would want the tool to feel pretty confident, yes, this piece of content is about this topic. But what you can also see is the other entities it's extracting and to what degree they are also salient to the topic. So you can see things like if you have a chocolate chip cookie recipe, you would expect to see things like cookie, butter, sugar, 350, which is the temperature you heat your oven, all of the different things that come together to make a chocolate chip cookie recipe.
But I think that it's really, really important for us as SEOs to understand that salience is the future of related keywords. We're beyond the time when to optimize for chocolate chip cookie recipe, we would also be looking for things like chocolate recipe, chocolate chips, chocolate cookie recipe, things like that. Stems, variants, TF-IDF, these are all older methodologies for understanding what a piece of content is about.
Instead what we need to understand is what are the entities that Google, using its vast body of knowledge, using things like Freebase, using large portions of the internet, where is Google seeing these entities co-occur at such a rate that they feel reasonably confident that a piece of content on one entity in order to be salient to that entity would include these other entities?
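As a toy illustration of co-occurrence, the Python sketch below counts which words show up in documents that mention a target term. The three-document corpus, the stopword list, and `cooccurring_terms` are all made up for this example; Google's actual methods are not public and are presumably far more sophisticated than word counting.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"a", "and", "the", "with", "in", "of", "to", "for", "then", "until"}


def cooccurring_terms(documents, target):
    """Count how many documents mentioning `target` also mention each other word."""
    counts = Counter()
    for doc in documents:
        words = set(re.findall(r"[a-z]+", doc.lower()))
        if target in words:
            counts.update(words - STOPWORDS - {target})
    return counts


corpus = [
    "Cream the butter and sugar, then fold in flour for the cookie dough.",
    "This cookie recipe starts with butter, sugar, and chocolate chips.",
    "Clear your browser cache to fix the login page.",
]

counts = cooccurring_terms(corpus, "cookie")
print(counts.most_common(5))
```

Words like "butter" and "sugar" appear alongside "cookie" in both recipe documents, while nothing from the browser document is counted, which is the intuition behind expecting those ingredients in a page that is really about cookies.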
Using an expert is the best way to create content that's salient to a topic
So chocolate chip cookie recipe, we're now also making sure we're adding things like butter, flour, sugar. This is actually really easy to do if you actually have a chocolate chip cookie recipe to put up there. This is I think what we're going to start seeing as a content trend in SEO is that the best way to create content that is salient to a topic is to have an actual expert in that topic create that content.
Somebody with deep knowledge of a topic is naturally going to include co-occurring terms, because they know how to create something that's about what it's supposed to be about. I think what we're going to start seeing is that people are going to have to start paying more for content marketing, frankly. Unfortunately, a lot of companies seem to think that content marketing is and should be cheap.
Content marketers, I feel you on that. It sucks, and it's no longer the case. We need to start investing in content and investing in experts to create that content so that they can create that deep, rich, salient content that everybody really needs. 
How can you use this API to improve your own SEO? 
One of the things that I like to do with this kind of information is look at — and this is something that I've done for years, just not in this context — but a prime optimization target in general is pages that rank for a topic, but they rank on page 2.
What this often means is that Google understands that that keyword is a topic of the page, but it doesn't necessarily understand that it is a good piece of content on that topic, that the page is actually solely about that content, that it's a good resource. In other words, the signal is there, but it's weak.
What you can do is take content that ranks but not well, run it through this natural language API or another natural language processing tool, and look at how the entities are extracted and how Google is determining that they're related to each other. Sometimes it might be that you need to do some disambiguation. So in this example, you'll notice that while chocolate cookies is called a work of art, and I agree, cookie here is actually called other.
This is because cookie means more than one thing. There's cookies, the baked good, but then there's also cookies, the packet of data. Both of those are legitimate uses of the word "cookie." Words have multiple meanings. If you notice that Google, that this natural language processing API is having trouble correctly classifying your entities, that's a good time to go in and do some disambiguation.
Make sure that the terms surrounding that term are clearly saying, "No, I mean the baked good, not the software piece of data." That's a really great way to kind of bump up your salience. Look at whether or not you have a strong salience score for your primary entity. You'd be amazed at how many pieces of content you can plug into this tool and the top, most salient entity is still only like a 0.01, a 0.14.
A lot of times the API is like "I think this is what it's about," but it's not sure. This is a great time to go in and bump up that content, make it more robust, and look at ways that you can make those entities easier to both extract and to relate to each other. This brings me to my second point, which is my new favorite thing in the world.
Writing for humans and writing for machines, you can now do both at the same time. You no longer have to, and you really haven't had to do this in a long time, but the idea that you might keyword stuff or otherwise create content for Google that your users might not see or care about is way, way, way over.
Now you can create content for Google that also is better for users, because the tenets of machine readability and human readability are moving closer and closer together. 
Tips for writing for human and machine readability:
What I've done here is I did some research not on natural language processing, but on writing for human readability, that is advice from writers, from writing experts on how to write better, clearer, easier to read, easier to understand content. Then I pulled out the pieces of advice that also work as pieces of advice for writing for natural language processing. So natural language processing, again, is the process by which Google or really anything that might be processing language tries to understand how entities are related to each other within a given body of content.
Short, simple sentences
Short, simple sentences. Write simply. Don't use a lot of flowery language. Short sentences and try to keep it to one idea per sentence. 
One idea per sentence
If you're running on, if you've got a lot of different clauses, if you're using a lot of pronouns and it's becoming confusing what you're talking about, that's not great for readers.
It also makes it harder for machines to parse your content. 
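A quick way to spot sentences that break these two tips is to count words and pronouns per sentence. This is a minimal sketch: the thresholds (25 words, 3 pronouns), the pronoun list, and the naive sentence splitter are all assumptions chosen for illustration, not established readability standards.

```python
import re

# Small illustrative pronoun list; extend it for real use.
PRONOUNS = {"it", "they", "them", "this", "that", "these", "those", "he", "she"}


def flag_hard_sentences(text, max_words=25, max_pronouns=3):
    """Return (word_count, pronoun_count, sentence) for sentences over either limit."""
    flagged = []
    # Naive split on ., !, or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        words = re.findall(r"[A-Za-z']+", sentence)
        pronoun_count = sum(1 for word in words if word.lower() in PRONOUNS)
        if len(words) > max_words or pronoun_count > max_pronouns:
            flagged.append((len(words), pronoun_count, sentence))
    return flagged


draft = (
    "Bake the cookies at 350 degrees. "
    "If you take them out when they look done, and it seems like they are "
    "still soft, that is fine because they keep baking while they cool on it."
)

for word_count, pronoun_count, sentence in flag_hard_sentences(draft):
    print(word_count, pronoun_count, sentence)
```

Only the second sentence is flagged; splitting it and replacing the pronouns with the nouns they stand for would address both tips at once.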
Connect questions to answers
Then closely connecting questions to answers. So don't say, "What is the best temperature to bake cookies? Well, let me tell you a story about my grandmother and my childhood," and 500 words later here's the answer. Connect questions to answers. 
What all three of those readability tips have in common is they boil down to reducing the semantic distance between entities.
If you want natural language processing to understand that two entities in your content are closely related, move them closer together in the sentence. Move the words closer together. Reduce the clutter, reduce the fluff, reduce the number of semantic hops that a robot might have to take between one entity and another to understand the relationship, and you've now created content that is more readable because it's shorter and easier to skim, but also easier for a robot to parse and understand.
Be specific first, then explain nuance
Going back to the example of "What is the best temperature to bake chocolate chip cookies at?" Now the real answer to what is the best temperature to bake chocolate chip cookies is "it depends." Hello. Hi, I'm an SEO, and I just answered a question with "it depends." It does depend.
That is true, and that is real, but it is not a good answer. It is also not the kind of thing that a robot could extract and reproduce in, for example, voice search or a featured snippet. If somebody says, "Okay, Google, what is a good temperature to bake cookies at?" and Google says, "It depends," that helps nobody even though it's true. So in order to write for both machine and human readability, be specific first and then you can explain nuance.
Then you can go into the details. So a better, just as correct answer to "What is the temperature to bake chocolate chip cookies?" is the best temperature to bake chocolate chip cookies is usually between 325 and 425 degrees, depending on your altitude and how crisp you like your cookie. That is just as true as it depends and, in fact, means the same thing as it depends, but it's a lot more specific.
It's a lot more precise. It uses real numbers. It provides a real answer. I've shortened the distance between the question and the answer. I didn't say it depends first. I said it depends at the end. That's the kind of thing that you can do to improve readability and understanding for both humans and machines.
Get to the point (don't bury the lede)
Get to the point. Don't bury the lede. All of you journalists who tried to become content marketers, and then everybody in content marketing said, "Oh, you need to wait till the end to get to your point or they won't read the whole thing," and you were like, "Don't bury the lede," you are correct. For those of you who aren't familiar with journalism speak, not burying the lede basically means get to the point upfront, at the top.
Include all the information that somebody would really need to get from that piece of content. If they don't read anything else, they read that one paragraph and they've gotten the gist. Then people who want to go deep can go deep. That's how people actually like to consume content, and surprisingly it doesn't mean they won't read the content. It just means they don't have to read it if they don't have time, if they need a quick answer.
The same is true with machines. Get to the point upfront. Make it clear right away what the primary entity, the primary topic, the primary focus of your content is and then get into the details. You'll have a much better structured piece of content that's easier to parse on all sides. 
Avoid jargon and "marketing speak"
Avoid jargon. Avoid marketing speak. It's terrible and very hard to understand, and you see it a lot. I'm going back again to the example of getting your clients to say what their products do. If you work with a lot of B2B companies, you will often run into this. "Yes, but what does it do?" "It provides solutions to streamline the workflow and blah, blah." "Okay, what does it do?" This is the kind of thing that can be really, really hard for companies to get out of their own heads about, but it's so important for users and for machines.
Avoid jargon. Avoid marketing speak. Not to get too tautological, but the more esoteric a word is, the less commonly it's used. That's actually what esoteric means. What that means is the less commonly a word is used, the less likely it is that Google is going to understand its semantic relationships to other entities.
Keep it simple. Be specific. Say what you mean. Wipe out all of the jargon. By wiping out jargon and kind of marketing speak and kind of the fluff that can happen in your content, you're also, once again, reducing the semantic distances between entities, making them easier to parse. 
Organize your information to match the user journey
Organize it and map it out to the user journey. Think about the information somebody might need and the order in which they might need it. 
Break out subtopics with headings
Then break it out with subheadings. This is like very, very basic writing advice, and yet you all aren't doing it. So if you're not going to do it for your users, do it for machines. 
Format lists with bullets or numbers
You can also really impact skimmability for users by breaking out lists with bullets or numbers.
The great thing about that is that breaking out a list with bullets or numbers also makes information easier for a robot to parse and extract. If a lot of these tips seem like they're the same tips that you would use to get featured snippets, they are, because featured snippets are actually a pretty good indicator that you're creating content that a robot can find, parse, understand, and extract, and that's what you want.
So if you're targeting featured snippets, you're probably already doing a lot of these things, good job. 
Grammar and spelling count!
The last thing, which I shouldn't have to say, but I'm going to say is that grammar and spelling and punctuation and things like that absolutely do count. They count to users. They don't count to all users, but they count to users. They also count to search engines.
Things like grammar, spelling, and punctuation are very, very easy signals for a machine to find and parse. Google has been specific in things, like the "Quality Rater Guidelines,"that a well-written, well-structured, well-spelled, grammatically correct document, that these are signs of authoritativeness. I'm not saying that having a greatly spelled document is going to mean that you immediately rocket to the top of the results.
I am saying that if you're not on that stuff, it's probably going to hurt you. So take the time to make sure everything is nice and tidy. You can use vernacular English. You don't have to be perfect "AP Style Guide" all the time. But make sure that you are formatting things properly from a grammatical standpoint as well as a technical standpoint. What I love about all of this, this is just good writing.
This is good writing. It's easy to understand. It's easy to parse. It's still so hard, especially in the marketing world, to get out of that world of jargon, to get to the point, to stop writing 2,000 words because we think we need 2,000 words, to really think about are we creating content that's about what we think it's about.
Use these tools to understand how readable, parsable, and understandable your content is
So my hope for the SEO world and for you is that you can use these tools not just to think about how to dial in the perfect keyword density or whatever to get an almost perfect score on the salience in the natural language processing API. What I'm hoping is that you will use these tools to help yourself understand how readable, how parsable, and how understandable your content is, how much your content is about what you say it's about and what you think it's about so you can create better stuff for users.
It makes the internet a better place, and it will probably make you some money as well. So these are my thoughts. I'd love to hear in the comments if you're using the natural language processing API now, if you've built a tool with it, if you want to build a tool with it, what do you think about this, how do you use this, how has it gone. Tell me all about it. Holla atcha girl.
Have a great Friday.
Video transcription by Speechpad.com
Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don't have time to hunt down but want to read!
0 notes
xaydungtruonggia · 5 years ago
Text
Better Content Through NLP (Natural Language Processing) - Whiteboard Friday
Posted by RuthBurrReedy
Gone are the days of optimizing content solely for search engines. For modern SEO, your content needs to please both robots and humans. But how do you know that what you're writing can check the boxes for both man and machine?
In today's Whiteboard Friday, Ruth Burr Reedy focuses on part of her recent MozCon 2019 talk and teaches us all about how Google uses NLP (natural language processing) to truly understand content, plus how you can harness that knowledge to better optimize what you write for people and bots alike.
Video Transcription
Howdy, Moz fans. I'm Ruth Burr Reedy, and I am the Vice President of Strategy at UpBuild, a boutique technical marketing agency specializing in technical SEO and advanced web analytics. I recently spoke at MozCon on a basic framework for SEO and for approaching changes to our industry, one that thinks about SEO in this light: we are humans who are marketing to humans, but we are using a machine as the intermediary.
Those videos will be available online at some point. [Editor's note: that point is now!] But today I wanted to talk about one point from my talk that I found really interesting and that has kind of changed the way that I approach content creation, and that is the idea that writing content that is easier for Google, a robot, to understand can actually make you a better writer and help you write better content for humans. It is a win-win. 
The relationships between entities, words, and how people search
To parse content and understand what that content is about, Google is spending a lot of time and a lot of energy and a lot of money on things like neural matching and natural language processing, which seek to understand, basically, when people talk, what are they talking about?
This goes along with the evolution of search to be more conversational. But there are a lot of times when someone is searching, but they don't totally know what they want, and Google still wants them to get what they want because that's how Google makes money. They are spending a lot of time trying to understand the relationships between entities and between words and how people use words to search.
The example that Danny Sullivan gave online, that I think is a really great example, is if someone is experiencing the soap opera effect on their TV. If you've ever seen a soap opera, you've noticed that they look kind of weird. Someone might be experiencing that and, not knowing what it's called, can't Google "soap opera effect" because they don't know the term.
They might search something like, "Why does my TV look funny?" Neural matching helps Google understand that when somebody is searching "Why does my TV look funny?" one possible answer might be the soap opera effect. So they can serve up that result, and people are happy. 
Understanding salience
As we're thinking about natural language processing, a core component of natural language processing is understanding salience.
Salience, content, and entities
Salience is a one-word way to sum up the question: to what extent is this piece of content about this specific entity? At this point Google is really good at extracting entities from a piece of content. Entities are basically nouns: people, places, and things, both proper nouns and regular nouns, plus numbers and things like that.
Google is really good at taking those out and saying, "Okay, here are all of the entities that are contained within this piece of content." Salience attempts to understand how they're related to each other, because what Google is really trying to understand when they're crawling a page is: What is this page about, and is this a good example of a page about this topic?
Salience really goes into the second piece. To what extent is any given entity the topic of a piece of content? It's often amazing the degree to which a piece of content that a person has created is not actually about anything. I think we've all experienced that.
You're searching and you come to a page and you're like, "This was too vague. This was too broad. This said that it was about one thing, but it was actually about something else. I didn't find what I needed. This wasn't good information for me." As marketers, we're often on the other side of that, trying to get our clients to say what their product actually does on their website or say, "I know you think that you created a guide to Instagram for the holidays. But you actually wrote one paragraph about the holidays and then seven paragraphs about your new Instagram tool. This is not actually a blog post about Instagram for the holidays. It's a piece of content about your tool." These are the kinds of battles that we fight as marketers. 
Natural Language Processing (NLP) APIs
Fortunately, there are now a number of different APIs that you can use to understand natural language processing: 
IBM has one: https://www.ibm.com/watson/services/natural-language-understanding/ 
Google actually has a natural language processing API that's right here on https://cloud.google.com/natural-language/
Is it as sophisticated as what they're using on their own stuff? Probably not. But you can test it out. Put in a piece of content and see (a) what entities Google is able to extract from it, and (b) how salient Google feels each of these entities is to the piece of content as a whole. Again, to what degree is this piece of content about this thing?
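If you want to go past the demo box and test this from a script, here is a minimal sketch of the request you would send. It assumes the v1 REST method `documents:analyzeEntities` and its `document` / `encodingType` body fields as I understand the public reference, so check the current docs before relying on it. The function name `build_entity_request` and the sample sentence are made up for illustration, and the sketch only builds the request; it does not send anything.

```python
import json

# Assumed endpoint for entity analysis in the v1 REST API; verify against current docs.
ANALYZE_ENTITIES_URL = "https://language.googleapis.com/v1/documents:analyzeEntities"


def build_entity_request(text: str) -> dict:
    """Return the JSON body for an entity-analysis call on plain text.

    Hypothetical helper: it only shapes the payload, it makes no network call.
    """
    return {
        "document": {"type": "PLAIN_TEXT", "content": text},
        "encodingType": "UTF8",
    }


if __name__ == "__main__":
    body = build_entity_request(
        "Cream the butter and sugar, then fold in the chocolate chips."
    )
    # POST this body to ANALYZE_ENTITIES_URL with your own API key or OAuth token.
    print(ANALYZE_ENTITIES_URL)
    print(json.dumps(body, indent=2))
```

Keeping the payload builder separate from the HTTP call makes it easy to swap in whichever client or auth method you already use.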
So this natural language processing API, which you can try for free and which is actually not that expensive for an API if you want to build a tool with it, will assign each entity that it can extract a salience score between 0 and 1, saying, "Okay, how sure are we that this piece of content is about this thing versus just containing it?"
So the higher or the closer you get to 1, the more confident the tool is that this piece of content is about this thing. 0.9 would be really, really good. 0.01 means it's there, but they're not sure how well it's related. 
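To make the 0-to-1 score concrete, here is a small sketch that sorts the entities in a response by salience. The response dictionary is invented for illustration (the numbers are made up, just like the whiteboard example), and the field names `entities`, `name`, `type`, and `salience` are assumed to match the v1 entity-analysis response. `rank_by_salience` is a hypothetical helper, not part of any library.

```python
# Invented sample shaped like an entity-analysis response; the scores are not real.
SAMPLE_RESPONSE = {
    "entities": [
        {"name": "butter", "type": "CONSUMER_GOOD", "salience": 0.08},
        {"name": "chocolate chip cookies", "type": "WORK_OF_ART", "salience": 0.62},
        {"name": "cookie", "type": "OTHER", "salience": 0.11},
        {"name": "350", "type": "NUMBER", "salience": 0.0},
    ],
    "language": "en",
}


def rank_by_salience(response: dict) -> list:
    """Return the response's entities sorted from most to least salient."""
    entities = response.get("entities", [])
    return sorted(entities, key=lambda e: e.get("salience", 0.0), reverse=True)


if __name__ == "__main__":
    for entity in rank_by_salience(SAMPLE_RESPONSE):
        print(f"{entity['salience']:.2f}  {entity['type']:<14} {entity['name']}")
```

The entity at the top of that list is what the tool thinks the content is about; everything below it is "contained in" the content to a lesser degree.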
A delicious example of how salience and entities work
The example I have here (and this is not taken from a real piece of content; these numbers are made up, it's just an example) is if you had a chocolate chip cookie recipe, you would want "chocolate chip cookies," "chocolate chip cookie recipe," or something like that to be the number one entity, the most salient entity, and you would want it to have a pretty high salience score.
You would want the tool to feel pretty confident, yes, this piece of content is about this topic. But what you can also see is the other entities it's extracting and to what degree they are also salient to the topic. So you can see things like if you have a chocolate chip cookie recipe, you would expect to see things like cookie, butter, sugar, 350, which is the temperature you heat your oven, all of the different things that come together to make a chocolate chip cookie recipe.
But I think that it's really, really important for us as SEOs to understand that salience is the future of related keywords. We're beyond the time when to optimize for chocolate chip cookie recipe, we would also be looking for things like chocolate recipe, chocolate chips, chocolate cookie recipe, things like that. Stems, variants, TF-IDF, these are all older methodologies for understanding what a piece of content is about.
Instead, what we need to understand is which entities Google sees together. Using its vast body of knowledge, using things like Freebase, using large portions of the internet, where is Google seeing these entities co-occur at such a rate that it feels reasonably confident that a piece of content, in order to be salient to one entity, would include these other entities?
Using an expert is the best way to create content that's salient to a topic
So for a chocolate chip cookie recipe, we're now also making sure we're adding things like butter, flour, sugar. This is actually really easy to do if you actually have a chocolate chip cookie recipe to put up there. What I think we're going to start seeing as a content trend in SEO is that the best way to create content that is salient to a topic is to have an actual expert in that topic create that content.
Somebody with deep knowledge of a topic is naturally going to include co-occurring terms, because they know how to create something that's about what it's supposed to be about. I think what we're going to start seeing is that people are going to have to start paying more for content marketing, frankly. Unfortunately, a lot of companies seem to think that content marketing is and should be cheap.
Content marketers, I feel you on that. It sucks, and it's no longer the case. We need to start investing in content and investing in experts to create that content so that they can create that deep, rich, salient content that everybody really needs. 
How can you use this API to improve your own SEO? 
One of the things that I like to do with this kind of information (and this is something that I've done for years, just not in this context) is look at what is a prime optimization target in general: pages that rank for a topic, but rank on page 2.
What this often means is that Google understands that that keyword is a topic of the page, but it doesn't necessarily understand that it is a good piece of content on that topic, that the page is actually solely about that content, that it's a good resource. In other words, the signal is there, but it's weak.
What you can do is take content that ranks but not well, run it through this natural language API or another natural language processing tool, and look at how the entities are extracted and how Google is determining that they're related to each other. Sometimes it might be that you need to do some disambiguation. So in this example, you'll notice that while chocolate cookies is called a work of art, and I agree, cookie here is actually called other.
This is because cookie means more than one thing. There's cookies, the baked good, but then there's also cookies, the packet of data. Both of those are legitimate uses of the word "cookie." Words have multiple meanings. If you notice that Google, that this natural language processing API is having trouble correctly classifying your entities, that's a good time to go in and do some disambiguation.
Make sure that the terms surrounding that term are clearly saying, "No, I mean the baked good, not the software piece of data." That's a really great way to kind of bump up your salience. Look at whether or not you have a strong salience score for your primary entity. You'd be amazed at how many pieces of content you can plug into this tool and the top, most salient entity is still only like a 0.01, a 0.14.
A lot of times the API is like "I think this is what it's about," but it's not sure. This is a great time to go in and bump up that content, make it more robust, and look at ways that you can make those entities easier to both extract and to relate to each other. This brings me to my second point, which is my new favorite thing in the world.
Writing for humans and writing for machines: you can now do both at the same time. You really haven't had to do this in a long time, but the idea that you might keyword stuff or otherwise create content for Google that your users might not see or care about is way, way, way over.
Now you can create content for Google that also is better for users, because the tenets of machine readability and human readability are moving closer and closer together. 
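The page-2 audit described earlier in this section can be sketched as a simple check over the extracted entities: is the primary entity there, is its salience weak, and is anything typed as OTHER that might need disambiguation? This is a rough sketch, not a scoring rule from Google. The function name `audit_entities` and the 0.3 threshold are assumptions picked for illustration, and the input is assumed to be a list of entities with `name`, `type`, and `salience` fields.

```python
def audit_entities(entities, primary_name, min_salience=0.3):
    """Return plain-language warnings for one analyzed page.

    min_salience is an arbitrary illustrative cutoff, not an official threshold.
    """
    warnings = []
    by_name = {e["name"].lower(): e for e in entities}
    primary = by_name.get(primary_name.lower())

    if primary is None:
        warnings.append(f"'{primary_name}' was not extracted as an entity at all.")
    elif primary["salience"] < min_salience:
        warnings.append(
            f"'{primary_name}' has weak salience ({primary['salience']:.2f}); "
            "the signal is there, but it's weak."
        )

    # Entities the tool could not classify are candidates for disambiguation.
    for entity in entities:
        if entity["type"] == "OTHER":
            warnings.append(
                f"'{entity['name']}' is typed OTHER; consider clarifying "
                "the surrounding terms."
            )
    return warnings


if __name__ == "__main__":
    page_entities = [
        {"name": "cookie", "type": "OTHER", "salience": 0.14},
        {"name": "butter", "type": "CONSUMER_GOOD", "salience": 0.05},
    ]
    for warning in audit_entities(page_entities, "cookie"):
        print(warning)
```

Run it over the pages that rank on page 2 and you get a quick to-do list of where to bump up the content or clear up an ambiguous term.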
Tips for writing for human and machine readability:
What I've done here is I did some research not on natural language processing, but on writing for human readability, that is, advice from writers, from writing experts on how to write better, clearer, easier to read, easier to understand content. Then I pulled out the pieces of advice that also work as pieces of advice for writing for natural language processing. So natural language processing, again, is the process by which Google or really anything that might be processing language tries to understand how entities are related to each other within a given body of content.
Short, simple sentences
Short, simple sentences. Write simply. Don't use a lot of flowery language. Use short sentences, and try to keep it to one idea per sentence. 
One idea per sentence
If you're running on, if you've got a lot of different clauses, if you're using a lot of pronouns and it's becoming confusing what you're talking about, that's not great for readers.
It also makes it harder for machines to parse your content. 
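One quick way to catch run-on sentences before a reader or a robot has to is to flag anything over a word limit. This is a rough sketch: the 20-word limit is an arbitrary assumption, the sentence splitting is naive (it will trip on abbreviations like "e.g."), and `long_sentences` is a hypothetical helper name.

```python
import re


def long_sentences(text, max_words=20):
    """Return the sentences in text that have more than max_words words.

    Splits naively on '.', '!' or '?' followed by whitespace.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [s for s in sentences if len(s.split()) > max_words]


if __name__ == "__main__":
    draft = (
        "Bake the cookies at 350 degrees. "
        "If you are running on and you have a lot of different clauses and a lot "
        "of pronouns, it becomes confusing what you are talking about, which is "
        "not great for readers."
    )
    for sentence in long_sentences(draft):
        print(len(sentence.split()), "words:", sentence)
```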
Connect questions to answers
Then closely connecting questions to answers. So don't say, "What is the best temperature to bake cookies? Well, let me tell you a story about my grandmother and my childhood," and 500 words later here's the answer. Connect questions to answers. 
What all three of those readability tips have in common is they boil down to reducing the semantic distance between entities.
If you want natural language processing to understand that two entities in your content are closely related, move them closer together in the sentence. Move the words closer together. Reduce the clutter, reduce the fluff, reduce the number of semantic hops that a robot might have to take between one entity and another to understand the relationship, and you've now created content that is more readable because it's shorter and easier to skim, but also easier for a robot to parse and understand.
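As a very crude stand-in for that idea, you can count how many words sit between two terms in a sentence. To be clear, this is only a surface-level proxy I am assuming for illustration; it is not how Google or any NLP API measures semantic distance, and `token_gap` is a hypothetical helper that only handles single-word terms.

```python
import re


def token_gap(sentence, term_a, term_b):
    """Count the tokens between the first occurrences of two single-word terms.

    Returns None if either term is missing from the sentence.
    """
    tokens = re.findall(r"[a-z0-9']+", sentence.lower())
    a, b = term_a.lower(), term_b.lower()
    if a not in tokens or b not in tokens:
        return None
    return abs(tokens.index(a) - tokens.index(b)) - 1


if __name__ == "__main__":
    far = ("The temperature, as my grandmother always told me when I was "
           "a child, for cookies is 350.")
    near = "Bake cookies at a temperature of 350."
    print("far: ", token_gap(far, "temperature", "350"))
    print("near:", token_gap(near, "temperature", "350"))
```

The second sentence says the same thing with far fewer words between the two entities, which is the kind of tightening these three tips are pushing you toward.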
Be specific first, then explain nuance
Going back to the example of "What is the best temperature to bake chocolate chip cookies at?" Now the real answer to what is the best temperature to bake chocolate chip cookies is "it depends." Hello. Hi, I'm an SEO, and I just answered a question with "it depends." It does depend.
That is true, and that is real, but it is not a good answer. It is also not the kind of thing that a robot could extract and reproduce in, for example, voice search or a featured snippet. If somebody says, "Okay, Google, what is a good temperature to bake cookies at?" and Google says, "It depends," that helps nobody even though it's true. So in order to write for both machine and human readability, be specific first and then you can explain nuance.
Then you can go into the details. So a better, just as correct answer to "What is the temperature to bake chocolate chip cookies?" is "The best temperature to bake chocolate chip cookies is usually between 325 and 425 degrees, depending on your altitude and how crisp you like your cookie." That is just as true as "it depends" and, in fact, means the same thing as "it depends," but it's a lot more specific.
It's a lot more precise. It uses real numbers. It provides a real answer. I've shortened the distance between the question and the answer. I didn't say it depends first. I said it depends at the end. That's the kind of thing that you can do to improve readability and understanding for both humans and machines.
Get to the point (don't bury the lede)
Get to the point. Don't bury the lede. All of you journalists who tried to become content marketers, when everybody in content marketing said, "Oh, you need to wait till the end to get to your point or they won't read the whole thing," and you were like, "Don't bury the lede," you are correct. For those of you who aren't familiar with journalism speak, not burying the lede basically means get to the point upfront, at the top.
Include all the information that somebody would really need to get from that piece of content. If they don't read anything else, they read that one paragraph and they've gotten the gist. Then people who want to go deep can go deep. That's how people actually like to consume content, and surprisingly it doesn't mean they won't read the content. It just means they don't have to read it if they don't have time, if they need a quick answer.
The same is true with machines. Get to the point upfront. Make it clear right away what the primary entity, the primary topic, the primary focus of your content is and then get into the details. You'll have a much better structured piece of content that's easier to parse on all sides. 
Avoid jargon and "marketing speak"
Avoid jargon. Avoid marketing speak. It is terrible and very hard to understand, and you see it a lot. I'm going back again to the example of getting your clients to say what their products do. If you work with a lot of B2B companies, you will often run into this. "Yes, but what does it do?" "It provides solutions to streamline the workflow and blah, blah." "Okay, what does it do?" This is the kind of thing that can be really, really hard for companies to get out of their own heads about, but it's so important for users and for machines.
Avoid jargon. Avoid marketing speak. Not to get too tautological, but the more esoteric a word is, the less commonly it's used. That's actually what esoteric means. What that means is the less commonly a word is used, the less likely it is that Google is going to understand its semantic relationships to other entities.
Keep it simple. Be specific. Say what you mean. Wipe out all of the jargon. By wiping out jargon and kind of marketing speak and kind of the fluff that can happen in your content, you're also, once again, reducing the semantic distances between entities, making them easier to parse. 
Organize your information to match the user journey
Organize it and map it out to the user journey. Think about the information somebody might need and the order in which they might need it. 
Break out subtopics with headings
Then break it out with subheadings. This is like very, very basic writing advice, and yet you all aren't doing it. So if you're not going to do it for your users, do it for machines. 
Format lists with bullets or numbers
You can also really impact skimmability for users by breaking out lists with bullets or numbers.
The great thing about that is that breaking out a list with bullets or numbers also makes information easier for a robot to parse and extract. If a lot of these tips seem like they're the same tips that you would use to get featured snippets, they are, because featured snippets are actually a pretty good indicator that you're creating content that a robot can find, parse, understand, and extract, and that's what you want.
So if you're targeting featured snippets, you're probably already doing a lot of these things, good job. 
Grammar and spelling count!
The last thing, which I shouldn't have to say, but I'm going to say is that grammar and spelling and punctuation and things like that absolutely do count. They count to users. They don't count to all users, but they count to users. They also count to search engines.
Things like grammar, spelling, and punctuation are very, very easy signals for a machine to find and parse. Google has been specific in things like the "Quality Rater Guidelines" that a well-written, well-structured, well-spelled, grammatically correct document shows signs of authoritativeness. I'm not saying that having a perfectly spelled document is going to mean that you immediately rocket to the top of the results.
I am saying that if you're not on top of that stuff, it's probably going to hurt you. So take the time to make sure everything is nice and tidy. You can use vernacular English. You don't have to be perfect "AP Style Guide" all the time. But make sure that you are formatting things properly from a grammatical standpoint as well as a technical standpoint. What I love about all of this is that this is just good writing.
This is good writing. It's easy to understand. It's easy to parse. It's still so hard, especially in the marketing world, to get out of that world of jargon, to get to the point, to stop writing 2,000 words because we think we need 2,000 words, to really think about are we creating content that's about what we think it's about.
Use these tools to understand how readable, parsable, and understandable your content is
So my hope for the SEO world and for you is that you can use these tools not just to think about how to dial in the perfect keyword density or whatever to get an almost perfect score on the salience in the natural language processing API. What I'm hoping is that you will use these tools to help yourself understand how readable, how parsable, and how understandable your content is, how much your content is about what you say it's about and what you think it's about so you can create better stuff for users.
It makes the internet a better place, and it will probably make you some money as well. So these are my thoughts. I'd love to hear in the comments if you're using the natural language processing API now, if you've built a tool with it, if you want to build a tool with it, what do you think about this, how do you use this, how has it gone. Tell me all about it. Holla atcha girl.
Have a great Friday.
Video transcription by Speechpad.com
0 notes
Make sure that the terms surrounding that term are clearly saying, "No, I mean the baked good, not the software piece of data." That's a really great way to kind of bump up your salience. Look at whether or not you have a strong salience score for your primary entity. You'd be amazed at how many pieces of content you can plug into this tool and the top, most salient entity is still only like a 0.01, a 0.14.
A lot of times the API is like "I think this is what it's about," but it's not sure. This is a great time to go in and bump up that content, make it more robust, and look at ways that you can make those entities easier to both extract and to relate to each other. This brings me to my second point, which is my new favorite thing in the world.
Writing for humans and writing for machines, you can now do both at the same time. You no longer have to, and you really haven't had to do this in a long time, but the idea that you might keyword stuff or otherwise create content for Google that your users might not see or care about is way, way, way over.
Now you can create content for Google that also is better for users, because the tenets of machine readability and human readability are moving closer and closer together. 
Tips for writing for human and machine readability:
What I've done here is I did some research not on natural language processing, but on writing for human readability, that is advice from writers, from writing experts on how to write better, clearer, easier to read, easier to understand content. Then I pulled out the pieces of advice that also work as pieces of advice for writing for natural language processing. So natural language processing, again, is the process by which Google or really anything that might be processing language tries to understand how entities are related to each other within a given body of content.
Short, simple sentences
Short, simple sentences. Write simply. Don't use a lot of flowery language. Short sentences and try to keep it to one idea per sentence. 
One idea per sentence
If you're running on, if you've got a lot of different clauses, if you're using a lot of pronouns and it's becoming confusing what you're talking about, that's not great for readers.
It also makes it harder for machines to parse your content. 
Connect questions to answers
Then closely connecting questions to answers. So don't say, "What is the best temperature to bake cookies? Well, let me tell you a story about my grandmother and my childhood," and 500 words later here's the answer. Connect questions to answers. 
What all three of those readability tips have in common is they boil down to reducing the semantic distance between entities.
If you want natural language processing to understand that two entities in your content are closely related, move them closer together in the sentence. Move the words closer together. Reduce the clutter, reduce the fluff, reduce the number of semantic hops that a robot might have to take between one entity and another to understand the relationship, and you've now created content that is more readable because it's shorter and easier to skim, but also easier for a robot to parse and understand.
Be specific first, then explain nuance
Going back to the example of "What is the best temperature to bake chocolate chip cookies at?" Now the real answer to what is the best temperature to bake chocolate chip cookies is it depends. Hello. Hi, I'm an SEO, and I just answered a question with it depends. It does depend.
That is true, and that is real, but it is not a good answer. It is also not the kind of thing that a robot could extract and reproduce in, for example, voice search or a featured snippet. If somebody says, "Okay, Google, what is a good temperature to bake cookies at?" and Google says, "It depends," that helps nobody even though it's true. So in order to write for both machine and human readability, be specific first and then you can explain nuance.
Then you can go into the details. So a better, just as correct answer to "What is the temperature to bake chocolate chip cookies?" is the best temperature to bake chocolate chip cookies is usually between 325 and 425 degrees, depending on your altitude and how crisp you like your cookie. That is just as true as it depends and, in fact, means the same thing as it depends, but it's a lot more specific.
It's a lot more precise. It uses real numbers. It provides a real answer. I've shortened the distance between the question and the answer. I didn't say it depends first. I said it depends at the end. That's the kind of thing that you can do to improve readability and understanding for both humans and machines.
Get to the point (don't bury the lede)
Get to the point. Don't bury the lede. All of you journalists who try to become content marketers, and then everybody in content marketing said, "Oh, you need to wait till the end to get to your point or they won't read the whole thing," and you were like, "Don't bury the lede," you are correct. For those of you who aren't familiar with journalism speak, not burying the lede basically means get to the point upfront, at the top.
Include all the information that somebody would really need to get from that piece of content. If they don't read anything else, they read that one paragraph and they've gotten the gist. Then people who want to go deep can go deep. That's how people actually like to consume content, and surprisingly it doesn't mean they won't read the content. It just means they don't have to read it if they don't have time, if they need a quick answer.
The same is true with machines. Get to the point upfront. Make it clear right away what the primary entity, the primary topic, the primary focus of your content is and then get into the details. You'll have a much better structured piece of content that's easier to parse on all sides. 
Avoid jargon and "marketing speak"
Avoid jargon. Avoid marketing speak. Not only is it terrible, it's also very hard to understand. You see this a lot. I'm going back again to the example of getting your clients to say what their products do. If you work with a lot of B2B companies, you will often run into this. Yes, but what does it do? It provides solutions to streamline the workflow and blah, blah. Okay, what does it do? This is the kind of thing that can be really, really hard for companies to get out of their own heads about, but it's so important for users, for machines.
Avoid jargon. Avoid marketing speak. Not to get too tautological, but the more esoteric a word is, the less commonly it's used. That's actually what esoteric means. What that means is the less commonly a word is used, the less likely it is that Google is going to understand its semantic relationships to other entities.
Keep it simple. Be specific. Say what you mean. Wipe out all of the jargon. By wiping out jargon and kind of marketing speak and kind of the fluff that can happen in your content, you're also, once again, reducing the semantic distances between entities, making them easier to parse. 
Organize your information to match the user journey
Organize it and map it out to the user journey. Think about the information somebody might need and the order in which they might need it. 
Break out subtopics with headings
Then break it out with subheadings. This is like very, very basic writing advice, and yet you all aren't doing it. So if you're not going to do it for your users, do it for machines. 
Format lists with bullets or numbers
You can also really impact skimmability for users by breaking out lists with bullets or numbers.
The great thing about that is that breaking out a list with bullets or numbers also makes information easier for a robot to parse and extract. If a lot of these tips seem like they're the same tips that you would use to get featured snippets, they are, because featured snippets are actually a pretty good indicator that you're creating content that a robot can find, parse, understand, and extract, and that's what you want.
So if you're targeting featured snippets, you're probably already doing a lot of these things, good job. 
Grammar and spelling count!
The last thing, which I shouldn't have to say, but I'm going to say is that grammar and spelling and punctuation and things like that absolutely do count. They count to users. They don't count to all users, but they count to users. They also count to search engines.
Things like grammar, spelling, and punctuation are very, very easy signals for a machine to find and parse. Google has been specific, in things like the "Quality Rater Guidelines," that a well-written, well-structured, well-spelled, grammatically correct document shows signs of authoritativeness. I'm not saying that having a greatly spelled document is going to mean that you immediately rocket to the top of the results.
I am saying that if you're not on that stuff, it's probably going to hurt you. So take the time to make sure everything is nice and tidy. You can use vernacular English. You don't have to be perfect "AP Style Guide" all the time. But make sure that you are formatting things properly from a grammatical standpoint as well as a technical standpoint. What I love about all of this, this is just good writing.
This is good writing. It's easy to understand. It's easy to parse. It's still so hard, especially in the marketing world, to get out of that world of jargon, to get to the point, to stop writing 2,000 words because we think we need 2,000 words, to really think about are we creating content that's about what we think it's about.
Use these tools to understand how readable, parsable, and understandable your content is
So my hope for the SEO world and for you is that you can use these tools not just to think about how to dial in the perfect keyword density or whatever to get an almost perfect score on the salience in the natural language processing API. What I'm hoping is that you will use these tools to help yourself understand how readable, how parsable, and how understandable your content is, how much your content is about what you say it's about and what you think it's about so you can create better stuff for users.
It makes the internet a better place, and it will probably make you some money as well. So these are my thoughts. I'd love to hear in the comments if you're using the natural language processing API now, if you've built a tool with it, if you want to build a tool with it, what do you think about this, how do you use this, how has it gone. Tell me all about it. Holla atcha girl.
Have a great Friday.
Video transcription by Speechpad.com
noithatotoaz · 5 years ago
Better Content Through NLP (Natural Language Processing) - Whiteboard Friday
Posted by RuthBurrReedy
Gone are the days of optimizing content solely for search engines. For modern SEO, your content needs to please both robots and humans. But how do you know that what you're writing can check the boxes for both man and machine?
In today's Whiteboard Friday, Ruth Burr Reedy focuses on part of her recent MozCon 2019 talk and teaches us all about how Google uses NLP (natural language processing) to truly understand content, plus how you can harness that knowledge to better optimize what you write for people and bots alike.
Video Transcription
Howdy, Moz fans. I'm Ruth Burr Reedy, and I am the Vice President of Strategy at UpBuild, a boutique technical marketing agency specializing in technical SEO and advanced web analytics. I recently spoke at MozCon on a basic framework for SEO, and for approaching changes to our industry, that thinks about SEO in this light: we are humans who are marketing to humans, but we are using a machine as the intermediary.
Those videos will be available online at some point. [Editor's note: that point is now!] But today I wanted to talk about one point from my talk that I found really interesting and that has kind of changed the way that I approach content creation, and that is the idea that writing content that is easier for Google, a robot, to understand can actually make you a better writer and help you write better content for humans. It is a win-win. 
The relationships between entities, words, and how people search
To understand how Google is currently approaching parsing content and understanding what content is about, Google is spending a lot of time and a lot of energy and a lot of money on things like neural matching and natural language processing, which seek to understand basically when people talk, what are they talking about?
This goes along with the evolution of search to be more conversational. But there are a lot of times when someone is searching, but they don't totally know what they want, and Google still wants them to get what they want because that's how Google makes money. They are spending a lot of time trying to understand the relationships between entities and between words and how people use words to search.
The example that Danny Sullivan gave online, that I think is a really great example, is if someone is experiencing the soap opera effect on their TV. If you've ever seen a soap opera, you've noticed that they look kind of weird. Someone might be experiencing that, and not knowing what that's called they can't Google soap opera effect because they don't know about it.
They might search something like, "Why does my TV look funny?" Neural matching helps Google understand that when somebody is searching "Why does my TV look funny?" one possible answer might be the soap opera effect. So they can serve up that result, and people are happy. 
Understanding salience
As we're thinking about natural language processing, a core component of natural language processing is understanding salience.
Salience, content, and entities
Salience is a one-word way to sum up to what extent is this piece of content about this specific entity? At this point Google is really good at extracting entities from a piece of content. Entities are basically nouns, people, places, things, proper nouns, regular nouns.
Entities are things, people, etc., numbers, things like that. Google is really good at taking those out and saying, "Okay, here are all of the entities that are contained within this piece of content." Salience attempts to understand how they're related to each other, because what Google is really trying to understand when they're crawling a page is: What is this page about, and is this a good example of a page about this topic?
Salience really goes into the second piece. To what extent is any given entity the topic of a piece of content? It's often amazing the degree to which a piece of content that a person has created is not actually about anything. I think we've all experienced that.
You're searching and you come to a page and you're like, "This was too vague. This was too broad. This said that it was about one thing, but it was actually about something else. I didn't find what I needed. This wasn't good information for me." As marketers, we're often on the other side of that, trying to get our clients to say what their product actually does on their website or say, "I know you think that you created a guide to Instagram for the holidays. But you actually wrote one paragraph about the holidays and then seven paragraphs about your new Instagram tool. This is not actually a blog post about Instagram for the holidays. It's a piece of content about your tool." These are the kinds of battles that we fight as marketers. 
Natural Language Processing (NLP) APIs
Fortunately, there are now a number of different APIs that you can use to understand natural language processing: 
IBM has one: https://www.ibm.com/watson/services/natural-language-understanding/ 
Google actually has a natural language processing API that's right here on https://cloud.google.com/natural-language/
Is it as sophisticated as what they're using on their own stuff? Probably not. But you can test it out. Put in a piece of content and see (a) what entities Google is able to extract from it, and (b) how salient Google feels each of these entities is to the piece of content as a whole. Again, to what degree is this piece of content about this thing?
So this natural language processing API, which you can try for free (and it's actually not that expensive for an API if you want to build a tool with it), will assign each entity that it can extract a salience score between 0 and 1, saying, "Okay, how sure are we that this piece of content is about this thing versus just containing it?"
So the higher or the closer you get to 1, the more confident the tool is that this piece of content is about this thing. 0.9 would be really, really good. 0.01 means it's there, but they're not sure how well it's related. 
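To make the scoring concrete, here is a minimal Python sketch that ranks entities by salience. It does not call any API: `sample_response` is a hand-written dictionary, and its field names (`entities`, `name`, `type`, `salience`) are my assumption about the general shape of an entity-analysis result. The numbers are invented, just like the ones on the whiteboard, and `rank_by_salience` is a hypothetical helper name.

```python
# Hypothetical, hand-written data shaped like an entity-analysis result.
# The salience numbers are invented for illustration only.
sample_response = {
    "entities": [
        {"name": "butter", "type": "CONSUMER_GOOD", "salience": 0.05},
        {"name": "chocolate chip cookies", "type": "WORK_OF_ART", "salience": 0.62},
        {"name": "cookie", "type": "OTHER", "salience": 0.11},
        {"name": "sugar", "type": "CONSUMER_GOOD", "salience": 0.04},
        {"name": "350", "type": "NUMBER", "salience": 0.0},
    ]
}


def rank_by_salience(response):
    """Return (name, type, salience) tuples, most salient entity first."""
    entities = response.get("entities", [])
    ranked = sorted(entities, key=lambda e: e.get("salience", 0.0), reverse=True)
    return [(e["name"], e["type"], e.get("salience", 0.0)) for e in ranked]


ranked = rank_by_salience(sample_response)
for name, kind, score in ranked:
    # Print a small table: score, entity type, entity name.
    print(f"{score:.2f}  {kind:<14} {name}")
```

The entity at the top of that list is the one the tool thinks the content is about. If it is not the topic you intended, that is the first thing to fix.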
A delicious example of how salience and entities work
The example I have here (this is not taken from a real piece of content; these numbers are made up, it's just an example) is a chocolate chip cookie recipe. You would want "chocolate cookies," "chocolate chip cookies recipe," "chocolate chip cookies," or something like that to be the number one entity, the most salient entity, and you would want it to have a pretty high salience score.
You would want the tool to feel pretty confident, yes, this piece of content is about this topic. But what you can also see is the other entities it's extracting and to what degree they are also salient to the topic. So you can see things like if you have a chocolate chip cookie recipe, you would expect to see things like cookie, butter, sugar, 350, which is the temperature you heat your oven, all of the different things that come together to make a chocolate chip cookie recipe.
But I think that it's really, really important for us as SEOs to understand that salience is the future of related keywords. We're beyond the time when to optimize for chocolate chip cookie recipe, we would also be looking for things like chocolate recipe, chocolate chips, chocolate cookie recipe, things like that. Stems, variants, TF-IDF, these are all older methodologies for understanding what a piece of content is about.
Instead what we need to understand is what are the entities that Google, using its vast body of knowledge, using things like Freebase, using large portions of the internet, where is Google seeing these entities co-occur at such a rate that they feel reasonably confident that a piece of content on one entity in order to be salient to that entity would include these other entities?
Using an expert is the best way to create content that's salient to a topic
So chocolate chip cookie recipe, we're now also making sure we're adding things like butter, flour, sugar. This is actually really easy to do if you actually have a chocolate chip cookie recipe to put up there. This is I think what we're going to start seeing as a content trend in SEO is that the best way to create content that is salient to a topic is to have an actual expert in that topic create that content.
Somebody with deep knowledge of a topic is naturally going to include co-occurring terms, because they know how to create something that's about what it's supposed to be about. I think what we're going to start seeing is that people are going to have to start paying more for content marketing, frankly. Unfortunately, a lot of companies seem to think that content marketing is and should be cheap.
Content marketers, I feel you on that. It sucks, and it's no longer the case. We need to start investing in content and investing in experts to create that content so that they can create that deep, rich, salient content that everybody really needs. 
How can you use this API to improve your own SEO? 
One of the things that I like to do with this kind of information is look at — and this is something that I've done for years, just not in this context — but a prime optimization target in general is pages that rank for a topic, but they rank on page 2.
What this often means is that Google understands that that keyword is a topic of the page, but it doesn't necessarily understand that it is a good piece of content on that topic, that the page is actually solely about that content, that it's a good resource. In other words, the signal is there, but it's weak.
What you can do is take content that ranks but not well, run it through this natural language API or another natural language processing tool, and look at how the entities are extracted and how Google is determining that they're related to each other. Sometimes it might be that you need to do some disambiguation. So in this example, you'll notice that while chocolate cookies is called a work of art, and I agree, cookie here is actually called other.
This is because cookie means more than one thing. There's cookies, the baked good, but then there's also cookies, the packet of data. Both of those are legitimate uses of the word "cookie." Words have multiple meanings. If you notice that Google, that this natural language processing API is having trouble correctly classifying your entities, that's a good time to go in and do some disambiguation.
Make sure that the terms surrounding that term are clearly saying, "No, I mean the baked good, not the software piece of data." That's a really great way to kind of bump up your salience. Look at whether or not you have a strong salience score for your primary entity. You'd be amazed at how many pieces of content you can plug into this tool and the top, most salient entity is still only like a 0.01, a 0.14.
A lot of times the API is like "I think this is what it's about," but it's not sure. This is a great time to go in and bump up that content, make it more robust, and look at ways that you can make those entities easier to both extract and to relate to each other. This brings me to my second point, which is my new favorite thing in the world.
Writing for humans and writing for machines, you can now do both at the same time. You no longer have to, and you really haven't had to do this in a long time, but the idea that you might keyword stuff or otherwise create content for Google that your users might not see or care about is way, way, way over.
Now you can create content for Google that also is better for users, because the tenets of machine readability and human readability are moving closer and closer together. 
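The audit of pages that rank but not well, described earlier in this section, can be roughed out in a few lines of Python. This is only a sketch under stated assumptions: the entity dictionaries are hand-written, `audit_entities` is a hypothetical helper, and the 0.3 cut-off is an arbitrary number I picked for illustration, not a threshold from Google or from the talk.

```python
WEAK_PRIMARY_THRESHOLD = 0.3  # hypothetical cut-off; tune it against your own pages


def audit_entities(entities, expected_topic, threshold=WEAK_PRIMARY_THRESHOLD):
    """Return plain-language warnings for a list of entity dictionaries."""
    if not entities:
        return ["no entities extracted"]

    warnings = []
    top = max(entities, key=lambda e: e.get("salience", 0.0))

    # Is the most salient entity the topic we meant to write about?
    if expected_topic.lower() not in top["name"].lower():
        warnings.append(f"top entity is '{top['name']}', not '{expected_topic}'")

    # Is the tool confident about that top entity?
    if top.get("salience", 0.0) < threshold:
        warnings.append(f"weak primary salience: {top.get('salience', 0.0):.2f}")

    # Entities typed OTHER may need clearer surrounding context.
    for entity in entities:
        if entity.get("type") == "OTHER":
            warnings.append(f"'{entity['name']}' typed OTHER; consider disambiguating")

    return warnings


# Invented numbers echoing the whiteboard example.
page_two_entities = [
    {"name": "chocolate cookies", "type": "WORK_OF_ART", "salience": 0.14},
    {"name": "cookie", "type": "OTHER", "salience": 0.01},
]

audit = audit_entities(page_two_entities, "chocolate cookies")
for warning in audit:
    print(warning)
```

A warning here is a prompt to reread the page, not a verdict. The point is to find content where the signal is there, but weak.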
Tips for writing for human and machine readability:
What I've done here is I did some research not on natural language processing, but on writing for human readability, that is advice from writers, from writing experts on how to write better, clearer, easier to read, easier to understand content. Then I pulled out the pieces of advice that also work as pieces of advice for writing for natural language processing. So natural language processing, again, is the process by which Google or really anything that might be processing language tries to understand how entities are related to each other within a given body of content.
Short, simple sentences
Short, simple sentences. Write simply. Don't use a lot of flowery language. Short sentences and try to keep it to one idea per sentence. 
One idea per sentence
If you're running on, if you've got a lot of different clauses, if you're using a lot of pronouns and it's becoming confusing what you're talking about, that's not great for readers.
It also makes it harder for machines to parse your content. 
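If you want a quick way to spot run-on sentences in a draft, something like the following Python sketch can help. It is a crude check, not a readability standard: the 25-word limit is a hypothetical number, `long_sentences` is a made-up helper name, and splitting on end punctuation will misjudge abbreviations and decimals.

```python
import re

MAX_WORDS = 25  # hypothetical limit; pick whatever suits your style guide


def long_sentences(text, max_words=MAX_WORDS):
    """Return (word_count, sentence) pairs for sentences longer than max_words."""
    # Naive split: break after ., ! or ? when followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    flagged = []
    for sentence in sentences:
        count = len(sentence.split())
        if count > max_words:
            flagged.append((count, sentence))
    return flagged


draft = (
    "Preheat the oven. "
    "Then, once you have creamed the butter and the sugar together and added the eggs, "
    "which should be at room temperature, fold in the flour and the chips and scoop it "
    "onto a tray. "
    "Bake for ten minutes."
)

flagged = long_sentences(draft)
for count, sentence in flagged:
    print(count, sentence)
```

Anything this flags is a candidate for splitting into two or three sentences with one idea each.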
Connect questions to answers
Then closely connecting questions to answers. So don't say, "What is the best temperature to bake cookies? Well, let me tell you a story about my grandmother and my childhood," and 500 words later here's the answer. Connect questions to answers. 
What all three of those readability tips have in common is they boil down to reducing the semantic distance between entities.
If you want natural language processing to understand that two entities in your content are closely related, move them closer together in the sentence. Move the words closer together. Reduce the clutter, reduce the fluff, reduce the number of semantic hops that a robot might have to take between one entity and another to understand the relationship, and you've now created content that is more readable because it's shorter and easier to skim, but also easier for a robot to parse and understand.
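One rough way to see this on your own copy is to count the words sitting between two terms. The Python sketch below does only that. Raw word distance is my simplification, a crude stand-in for the semantic distance described above, and it is not a claim about how Google or any NLP API measures relationships. `token_gap` is a hypothetical helper and it only handles single-word terms.

```python
import re


def token_gap(sentence, term_a, term_b):
    """Count the words between the first occurrences of two single-word terms.

    Returns None if either term does not appear in the sentence.
    """
    words = re.findall(r"[a-z0-9']+", sentence.lower())
    try:
        i = words.index(term_a.lower())
        j = words.index(term_b.lower())
    except ValueError:
        return None
    return max(abs(i - j) - 1, 0)


rambling = (
    "Cookies, which my grandmother always made on Sundays after church, "
    "should go into an oven set to 350 degrees."
)
direct = "Bake cookies at 350 degrees."

rambling_gap = token_gap(rambling, "cookies", "350")
direct_gap = token_gap(direct, "cookies", "350")
print("rambling:", rambling_gap)
print("direct:  ", direct_gap)
```

Both sentences say the same thing about the oven, but the direct one puts far fewer words between the two entities, which is the whole idea behind cutting clutter.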
Be specific first, then explain nuance
Going back to the example of "What is the best temperature to bake chocolate chip cookies at?" Now the real answer to what is the best temperature to bake chocolate chip cookies is it depends. Hello. Hi, I'm an SEO, and I just answered a question with it depends. It does depend.
That is true, and that is real, but it is not a good answer. It is also not the kind of thing that a robot could extract and reproduce in, for example, voice search or a featured snippet. If somebody says, "Okay, Google, what is a good temperature to bake cookies at?" and Google says, "It depends," that helps nobody even though it's true. So in order to write for both machine and human readability, be specific first and then you can explain nuance.
Then you can go into the details. So a better, just as correct answer to "What is the temperature to bake chocolate chip cookies?" is the best temperature to bake chocolate chip cookies is usually between 325 and 425 degrees, depending on your altitude and how crisp you like your cookie. That is just as true as it depends and, in fact, means the same thing as it depends, but it's a lot more specific.
It's a lot more precise. It uses real numbers. It provides a real answer. I've shortened the distance between the question and the answer. I didn't say it depends first. I said it depends at the end. That's the kind of thing that you can do to improve readability and understanding for both humans and machines.
Get to the point (don't bury the lede)
Get to the point. Don't bury the lede. All of you journalists who try to become content marketers, and then everybody in content marketing said, "Oh, you need to wait till the end to get to your point or they won't read the whole thing," and you were like, "Don't bury the lede," you are correct. For those of you who aren't familiar with journalism speak, not burying the lede basically means get to the point upfront, at the top.
Include all the information that somebody would really need to get from that piece of content. If they don't read anything else, they read that one paragraph and they've gotten the gist. Then people who want to go deep can go deep. That's how people actually like to consume content, and surprisingly it doesn't mean they won't read the content. It just means they don't have to read it if they don't have time, if they need a quick answer.
The same is true with machines. Get to the point upfront. Make it clear right away what the primary entity, the primary topic, the primary focus of your content is and then get into the details. You'll have a much better structured piece of content that's easier to parse on all sides. 
Avoid jargon and "marketing speak"
Avoid jargon. Avoid marketing speak. Not only is it terrible, it's also very hard to understand. You see this a lot. I'm going back again to the example of getting your clients to say what their products do. If you work with a lot of B2B companies, you will often run into this. Yes, but what does it do? It provides solutions to streamline the workflow and blah, blah. Okay, what does it do? This is the kind of thing that can be really, really hard for companies to get out of their own heads about, but it's so important for users, for machines.
Avoid jargon. Avoid marketing speak. Not to get too tautological, but the more esoteric a word is, the less commonly it's used. That's actually what esoteric means. What that means is the less commonly a word is used, the less likely it is that Google is going to understand its semantic relationships to other entities.
Keep it simple. Be specific. Say what you mean. Wipe out all of the jargon. By wiping out jargon and kind of marketing speak and kind of the fluff that can happen in your content, you're also, once again, reducing the semantic distances between entities, making them easier to parse. 
Organize your information to match the user journey
Organize it and map it out to the user journey. Think about the information somebody might need and the order in which they might need it. 
Break out subtopics with headings
Then break it out with subheadings. This is like very, very basic writing advice, and yet you all aren't doing it. So if you're not going to do it for your users, do it for machines. 
Format lists with bullets or numbers
You can also really impact skimmability for users by breaking out lists with bullets or numbers.
The great thing about that is that breaking out a list with bullets or numbers also makes information easier for a robot to parse and extract. If a lot of these tips seem like they're the same tips that you would use to get featured snippets, they are, because featured snippets are actually a pretty good indicator that you're creating content that a robot can find, parse, understand, and extract, and that's what you want.
So if you're targeting featured snippets, you're probably already doing a lot of these things, good job. 
Grammar and spelling count!
The last thing, which I shouldn't have to say, but I'm going to say is that grammar and spelling and punctuation and things like that absolutely do count. They count to users. They don't count to all users, but they count to users. They also count to search engines.
Things like grammar, spelling, and punctuation are very, very easy signals for a machine to find and parse. Google has been specific in things, like the "Quality Rater Guidelines,"that a well-written, well-structured, well-spelled, grammatically correct document, that these are signs of authoritativeness. I'm not saying that having a greatly spelled document is going to mean that you immediately rocket to the top of the results.
I am saying that if you're not on that stuff, it's probably going to hurt you. So take the time to make sure everything is nice and tidy. You can use vernacular English. You don't have to be perfect "AP Style Guide" all the time. But make sure that you are formatting things properly from a grammatical standpoint as well as a technical standpoint. What I love about all of this, this is just good writing.
This is good writing. It's easy to understand. It's easy to parse. It's still so hard, especially in the marketing world, to get out of that world of jargon, to get to the point, to stop writing 2,000 words because we think we need 2,000 words, to really think about are we creating content that's about what we think it's about.
Use these tools to understand how readable, parsable, and understandable your content is
So my hope for the SEO world and for you is that you can use these tools not just to think about how to dial in the perfect keyword density or whatever to get an almost perfect score on the salience in the natural language processing API. What I'm hoping is that you will use these tools to help yourself understand how readable, how parsable, and how understandable your content is, how much your content is about what you say it's about and what you think it's about so you can create better stuff for users.
It makes the internet a better place, and it will probably make you some money as well. So these are my thoughts. I'd love to hear in the comments if you're using the natural language processing API now, if you've built a tool with it, if you want to build a tool with it, what do you think about this, how do you use this, how has it gone. Tell me all about it. Holla atcha girl.
Have a great Friday.
Video transcription by Speechpad.com
Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don't have time to hunt down but want to read!
0 notes
bfxenon · 5 years ago
Text
Better Content Through NLP (Natural Language Processing) - Whiteboard Friday
Posted by RuthBurrReedy
Gone are the days of optimizing content solely for search engines. For modern SEO, your content needs to please both robots and humans. But how do you know that what you're writing can check the boxes for both man and machine?
In today's Whiteboard Friday, Ruth Burr Reedy focuses on part of her recent MozCon 2019 talk and teaches us all about how Google uses NLP (natural language processing) to truly understand content, plus how you can harness that knowledge to better optimize what you write for people and bots alike.
Video Transcription
Howdy, Moz fans. I'm Ruth Burr Reedy, and I am the Vice President of Strategy at UpBuild, a boutique technical marketing agency specializing in technical SEO and advanced web analytics. I recently spoke at MozCon on a basic framework for SEO and for approaching changes to our industry, one that thinks about SEO in light of the fact that we are humans who are marketing to humans, but we are using a machine as the intermediary.
Those videos will be available online at some point. [Editor's note: that point is now!] But today I wanted to talk about one point from my talk that I found really interesting and that has kind of changed the way that I approach content creation, and that is the idea that writing content that is easier for Google, a robot, to understand can actually make you a better writer and help you write better content for humans. It is a win-win. 
The relationships between entities, words, and how people search
To parse content and understand what it is about, Google is spending a lot of time and a lot of energy and a lot of money on things like neural matching and natural language processing, which seek to understand, basically, when people talk, what are they talking about?
This goes along with the evolution of search to be more conversational. But there are a lot of times when someone is searching, but they don't totally know what they want, and Google still wants them to get what they want because that's how Google makes money. They are spending a lot of time trying to understand the relationships between entities and between words and how people use words to search.
The example that Danny Sullivan gave online, that I think is a really great example, is if someone is experiencing the soap opera effect on their TV. If you've ever seen a soap opera, you've noticed that they look kind of weird. Someone might be experiencing that, and not knowing what that's called they can't Google soap opera effect because they don't know about it.
They might search something like, "Why does my TV look funny?" Neural matching helps Google understand that when somebody is searching "Why does my TV look funny?" one possible answer might be the soap opera effect. So they can serve up that result, and people are happy. 
Understanding salience
As we're thinking about natural language processing, a core component of natural language processing is understanding salience.
Salience, content, and entities
Salience is a one-word way to sum up the question "To what extent is this piece of content about this specific entity?" At this point Google is really good at extracting entities from a piece of content. Entities are basically nouns: people, places, things, proper nouns, regular nouns.
Entities are things, people, etc., numbers, things like that. Google is really good at taking those out and saying, "Okay, here are all of the entities that are contained within this piece of content." Salience attempts to understand how they're related to each other, because what Google is really trying to understand when they're crawling a page is: What is this page about, and is this a good example of a page about this topic?
Salience really goes into the second piece. To what extent is any given entity the topic of a piece of content? It's often amazing the degree to which a piece of content that a person has created is not actually about anything. I think we've all experienced that.
You're searching and you come to a page and you're like, "This was too vague. This was too broad. This said that it was about one thing, but it was actually about something else. I didn't find what I needed. This wasn't good information for me." As marketers, we're often on the other side of that, trying to get our clients to say what their product actually does on their website or say, "I know you think that you created a guide to Instagram for the holidays. But you actually wrote one paragraph about the holidays and then seven paragraphs about your new Instagram tool. This is not actually a blog post about Instagram for the holidays. It's a piece of content about your tool." These are the kinds of battles that we fight as marketers. 
Natural Language Processing (NLP) APIs
Fortunately, there are now a number of different APIs that you can use to understand natural language processing: 
IBM has one: https://www.ibm.com/watson/services/natural-language-understanding/ 
Google actually has a natural language processing API that's right here on https://cloud.google.com/natural-language/
Is it as sophisticated as what they're using on their own stuff? Probably not. But you can test it out. Put in a piece of content and see (a) what entities Google is able to extract from it, and (b) how salient Google feels each of these entities is to the piece of content as a whole. Again, to what degree is this piece of content about this thing?
So this natural language processing API, which you can try for free (and it's actually not that expensive for an API if you want to build a tool with it), will assign each entity that it can extract a salience score between 0 and 1, saying, "Okay, how sure are we that this piece of content is about this thing versus just containing it?"
So the higher or the closer you get to 1, the more confident the tool is that this piece of content is about this thing. 0.9 would be really, really good. 0.01 means it's there, but they're not sure how well it's related. 
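To make the 0-to-1 score concrete, here is a minimal Python sketch of the request body and of how you might read the result. It assumes the public REST method `documents:analyzeEntities` and its documented response shape (a list of `entities`, each with `name`, `type`, and `salience`); the helper names are hypothetical, the sample response is hand-written with made-up scores, and a real call would also need credentials, which are left out here.

```python
import json

# Public REST endpoint for entity analysis; a real call also needs credentials.
ANALYZE_ENTITIES_URL = "https://language.googleapis.com/v1/documents:analyzeEntities"


def build_request_body(text):
    """Build the JSON body for a plain-text analyzeEntities request."""
    return {
        "document": {"type": "PLAIN_TEXT", "content": text},
        "encodingType": "UTF8",
    }


def rank_by_salience(response):
    """Return (name, type, salience) tuples, most salient entity first."""
    entities = response.get("entities", [])
    ranked = sorted(entities, key=lambda e: e.get("salience", 0.0), reverse=True)
    return [(e["name"], e.get("type", "UNKNOWN"), e.get("salience", 0.0)) for e in ranked]


# Hand-written sample response with made-up scores (not real API output).
SAMPLE_RESPONSE = {
    "entities": [
        {"name": "butter", "type": "OTHER", "salience": 0.05},
        {"name": "chocolate chip cookies", "type": "WORK_OF_ART", "salience": 0.72},
        {"name": "sugar", "type": "OTHER", "salience": 0.04},
    ]
}

if __name__ == "__main__":
    print(json.dumps(build_request_body("Cream the butter and sugar."), indent=2))
    for name, entity_type, salience in rank_by_salience(SAMPLE_RESPONSE):
        print(f"{salience:.2f}  {entity_type:<12} {name}")
```

Sorting by salience puts the entity the tool is most confident about at the top, which is the first thing to check against what you think the page is about.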
A delicious example of how salience and entities work
The example I have here is not taken from a real piece of content (these numbers are made up, it's just an example). If you had a chocolate chip cookie recipe, you would want "chocolate cookies" or "chocolate chip cookies recipe" or "chocolate chip cookies," something like that, to be the number one entity, the most salient entity, and you would want it to have a pretty high salience score.
You would want the tool to feel pretty confident, yes, this piece of content is about this topic. But what you can also see is the other entities it's extracting and to what degree they are also salient to the topic. So you can see things like if you have a chocolate chip cookie recipe, you would expect to see things like cookie, butter, sugar, 350, which is the temperature you heat your oven, all of the different things that come together to make a chocolate chip cookie recipe.
But I think that it's really, really important for us as SEOs to understand that salience is the future of related keywords. We're beyond the time when to optimize for chocolate chip cookie recipe, we would also be looking for things like chocolate recipe, chocolate chips, chocolate cookie recipe, things like that. Stems, variants, TF-IDF, these are all older methodologies for understanding what a piece of content is about.
Instead what we need to understand is what are the entities that Google, using its vast body of knowledge, using things like Freebase, using large portions of the internet, where is Google seeing these entities co-occur at such a rate that they feel reasonably confident that a piece of content on one entity in order to be salient to that entity would include these other entities?
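As a toy illustration of that co-occurrence idea, the Python sketch below counts how often other entities show up on pages that mention a target entity. This is not how Google computes anything; it assumes you already have a set of entity names per page, and the function name and the page data are made up for the example.

```python
from collections import Counter


def cooccurrence_rates(pages, target):
    """Share of pages mentioning `target` that also mention each other entity."""
    pages_with_target = [set(page) for page in pages if target in page]
    if not pages_with_target:
        return {}
    counts = Counter(
        name for page in pages_with_target for name in page if name != target
    )
    total = len(pages_with_target)
    return {name: count / total for name, count in counts.items()}


# Made-up entity sets for five hypothetical pages.
PAGES = [
    {"chocolate chip cookies", "butter", "sugar", "350"},
    {"chocolate chip cookies", "butter", "sugar"},
    {"chocolate chip cookies", "butter", "flour", "350"},
    {"chocolate chip cookies", "butter", "sugar", "flour"},
    {"browser cookies", "privacy"},
]

if __name__ == "__main__":
    rates = cooccurrence_rates(PAGES, "chocolate chip cookies")
    for name, rate in sorted(rates.items(), key=lambda item: -item[1]):
        print(f"{rate:.2f}  {name}")
```

Entities with a high rate are the ones a reader (or a robot) would reasonably expect to find on a page that is really about the target.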
Using an expert is the best way to create content that's salient to a topic
So for a chocolate chip cookie recipe, we're now also making sure we're adding things like butter, flour, sugar. This is actually really easy to do if you actually have a chocolate chip cookie recipe to put up there. What I think we're going to start seeing as a content trend in SEO is that the best way to create content that is salient to a topic is to have an actual expert in that topic create that content.
Somebody with deep knowledge of a topic is naturally going to include co-occurring terms, because they know how to create something that's about what it's supposed to be about. I think what we're going to start seeing is that people are going to have to start paying more for content marketing, frankly. Unfortunately, a lot of companies seem to think that content marketing is and should be cheap.
Content marketers, I feel you on that. It sucks, and it's no longer the case. We need to start investing in content and investing in experts to create that content so that they can create that deep, rich, salient content that everybody really needs. 
How can you use this API to improve your own SEO? 
One of the things that I like to do with this kind of information is look at pages that rank for a topic, but rank on page 2. This is something that I've done for years, just not in this context: those pages are a prime optimization target in general.
What this often means is that Google understands that that keyword is a topic of the page, but it doesn't necessarily understand that it is a good piece of content on that topic, that the page is actually solely about that content, that it's a good resource. In other words, the signal is there, but it's weak.
What you can do is take content that ranks but not well, run it through this natural language API or another natural language processing tool, and look at how the entities are extracted and how Google is determining that they're related to each other. Sometimes it might be that you need to do some disambiguation. So in this example, you'll notice that while chocolate cookies is called a work of art, and I agree, cookie here is actually called other.
This is because cookie means more than one thing. There's cookies, the baked good, but then there's also cookies, the packet of data. Both of those are legitimate uses of the word "cookie." Words have multiple meanings. If you notice that Google, that this natural language processing API is having trouble correctly classifying your entities, that's a good time to go in and do some disambiguation.
Make sure that the terms surrounding that term are clearly saying, "No, I mean the baked good, not the software piece of data." That's a really great way to kind of bump up your salience. Look at whether or not you have a strong salience score for your primary entity. You'd be amazed at how many pieces of content you can plug into this tool and the top, most salient entity is still only like a 0.01, a 0.14.
A lot of times the API is like "I think this is what it's about," but it's not sure. This is a great time to go in and bump up that content, make it more robust, and look at ways that you can make those entities easier to both extract and to relate to each other.
This brings me to my second point, which is my new favorite thing in the world. Writing for humans and writing for machines, you can now do both at the same time. You no longer have to choose, and you really haven't had to in a long time: the idea that you might keyword stuff or otherwise create content for Google that your users might not see or care about is way, way, way over.
Now you can create content for Google that also is better for users, because the tenets of machine readability and human readability are moving closer and closer together. 
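Going back to the audit of pages that rank but not well: the two checks described there (a weak score for the primary entity, and entities classified as "other") can be sketched in Python like this. The function name is hypothetical, the 0.5 threshold is an arbitrary assumption rather than any official cutoff, and the input is assumed to be the `entities` list from an entity-analysis response.

```python
def audit_entities(entities, primary_name, min_salience=0.5):
    """Return a list of human-readable warnings for one page's entities."""
    if not entities:
        return ["no entities extracted"]
    warnings = []
    top = max(entities, key=lambda e: e.get("salience", 0.0))
    top_salience = top.get("salience", 0.0)
    if top["name"].lower() != primary_name.lower():
        warnings.append(f"most salient entity is '{top['name']}', not '{primary_name}'")
    if top_salience < min_salience:  # threshold is an assumption, tune to taste
        warnings.append(f"top salience is only {top_salience:.2f}")
    for entity in entities:
        if entity.get("type") == "OTHER":
            warnings.append(f"'{entity['name']}' is typed OTHER; consider disambiguating")
    return warnings


# Made-up entities echoing the whiteboard example (not real API output).
EXAMPLE_ENTITIES = [
    {"name": "chocolate cookies", "type": "WORK_OF_ART", "salience": 0.14},
    {"name": "cookie", "type": "OTHER", "salience": 0.01},
]

if __name__ == "__main__":
    for warning in audit_entities(EXAMPLE_ENTITIES, "chocolate cookies"):
        print(warning)
```

Run against a batch of page-2 pages, a report like this gives you a short list of places to tighten the content or clarify an ambiguous term.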
Tips for writing for human and machine readability:
What I've done here is I did some research not on natural language processing, but on writing for human readability, that is, advice from writers and writing experts on how to write better, clearer, easier to read, easier to understand content. Then I pulled out the pieces of advice that also work as pieces of advice for writing for natural language processing.
So natural language processing, again, is the process by which Google or really anything that might be processing language tries to understand how entities are related to each other within a given body of content.
Short, simple sentences
Short, simple sentences. Write simply. Don't use a lot of flowery language. Use short sentences, and try to keep it to one idea per sentence. 
One idea per sentence
If you're running on, if you've got a lot of different clauses, if you're using a lot of pronouns and it's becoming confusing what you're talking about, that's not great for readers.
It also makes it harder for machines to parse your content. 
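One rough way to spot run-on sentences is simply to count words. The Python sketch below flags sentences over a word limit; the function name is hypothetical, the 20-word limit is an arbitrary assumption, and the sentence splitting is deliberately naive (it will trip over abbreviations like "e.g.").

```python
import re


def long_sentences(text, max_words=20):
    """Return the sentences in `text` that have more than `max_words` words."""
    # Naive split: break after ., ! or ? when followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [s for s in sentences if len(s.split()) > max_words]


if __name__ == "__main__":
    draft = (
        "Cream the butter and sugar. "
        "Then, once that is done and you have had a moment to think about it, "
        "which my grandmother always said was the most important part of baking, "
        "add the eggs."
    )
    for sentence in long_sentences(draft):
        print(f"{len(sentence.split())} words: {sentence}")
```

A flagged sentence is not automatically wrong, but it is a good candidate for splitting into one idea per sentence.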
Connect questions to answers
Then closely connecting questions to answers. So don't say, "What is the best temperature to bake cookies? Well, let me tell you a story about my grandmother and my childhood," and 500 words later here's the answer. Connect questions to answers. 
What all three of those readability tips have in common is they boil down to reducing the semantic distance between entities.
If you want natural language processing to understand that two entities in your content are closely related, move them closer together in the sentence. Move the words closer together. Reduce the clutter, reduce the fluff, reduce the number of semantic hops that a robot might have to take between one entity and another to understand the relationship, and you've now created content that is more readable because it's shorter and easier to skim, but also easier for a robot to parse and understand.
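As a crude stand-in for that distance, you can count the words sitting between two terms in a sentence. The Python sketch below does only that; it is a hypothetical helper for single-word terms, and word count is just a proxy here, not how any natural language processing system actually measures relatedness.

```python
import re


def word_gap(sentence, first, second):
    """Count the words between the first mentions of two single-word terms.

    Returns None if either term is missing from the sentence.
    """
    words = re.findall(r"[a-z0-9']+", sentence.lower())
    first, second = first.lower(), second.lower()
    if first not in words or second not in words:
        return None
    return abs(words.index(first) - words.index(second)) - 1


if __name__ == "__main__":
    tight = "The best temperature to bake cookies is 350 degrees."
    loose = "Cookies remind me of my grandmother, who always set the oven to 350."
    print(word_gap(tight, "cookies", "350"))
    print(word_gap(loose, "cookies", "350"))
```

The smaller the gap, the less clutter a reader or a parser has to get through to connect the two terms.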
Be specific first, then explain nuance
Going back to the example of "What is the best temperature to bake chocolate chip cookies at?" Now the real answer to what is the best temperature to bake chocolate cookies is it depends. Hello. Hi, I'm an SEO, and I just answered a question with it depends. It does depend.
That is true, and that is real, but it is not a good answer. It is also not the kind of thing that a robot could extract and reproduce in, for example, voice search or a featured snippet. If somebody says, "Okay, Google, what is a good temperature to bake cookies at?" and Google says, "It depends," that helps nobody even though it's true. So in order to write for both machine and human readability, be specific first and then you can explain nuance.
Then you can go into the details. So a better, just as correct answer to "What is the temperature to bake chocolate chip cookies?" is "The best temperature to bake chocolate chip cookies is usually between 325 and 425 degrees, depending on your altitude and how crisp you like your cookie." That is just as true as it depends and, in fact, means the same thing as it depends, but it's a lot more specific.
It's a lot more precise. It uses real numbers. It provides a real answer. I've shortened the distance between the question and the answer. I didn't say it depends first. I said it depends at the end. That's the kind of thing that you can do to improve readability and understanding for both humans and machines.
Get to the point (don't bury the lede)
Get to the point. Don't bury the lede. All of you journalists who tried to become content marketers, and then everybody in content marketing said, "Oh, you need to wait till the end to get to your point or they won't read the whole thing," and you were like, "Don't bury the lede," you are correct. For those of you who aren't familiar with journalism speak, not burying the lede basically means get to the point upfront, at the top.
Include all the information that somebody would really need to get from that piece of content. If they don't read anything else, they read that one paragraph and they've gotten the gist. Then people who want to go deep can go deep. That's how people actually like to consume content, and surprisingly it doesn't mean they won't read the content. It just means they don't have to read it if they don't have time, if they need a quick answer.
The same is true with machines. Get to the point upfront. Make it clear right away what the primary entity, the primary topic, the primary focus of your content is and then get into the details. You'll have a much better structured piece of content that's easier to parse on all sides. 
Avoid jargon and "marketing speak"
Avoid jargon. Avoid marketing speak. Not only is it terrible, it's very hard to understand. You see this a lot. I'm going back again to the example of getting your clients to say what their products do. If you work with a lot of B2B companies, you will often run into this. "Yes, but what does it do?" "It provides solutions to streamline the workflow and blah, blah." "Okay, what does it do?" This is the kind of thing that can be really, really hard for companies to get out of their own heads about, but it's so important for users and for machines.
Avoid jargon. Avoid marketing speak. Not to get too tautological, but the more esoteric a word is, the less commonly it's used. That's actually what esoteric means. What that means is the less commonly a word is used, the less likely it is that Google is going to understand its semantic relationships to other entities.
Keep it simple. Be specific. Say what you mean. Wipe out all of the jargon. By wiping out jargon and kind of marketing speak and kind of the fluff that can happen in your content, you're also, once again, reducing the semantic distances between entities, making them easier to parse. 
Organize your information to match the user journey
Organize it and map it out to the user journey. Think about the information somebody might need and the order in which they might need it. 
Break out subtopics with headings
Then break it out with subheadings. This is like very, very basic writing advice, and yet you all aren't doing it. So if you're not going to do it for your users, do it for machines. 
Format lists with bullets or numbers
You can also really impact skimmability for users by breaking out lists with bullets or numbers.
The great thing about that is that breaking out a list with bullets or numbers also makes information easier for a robot to parse and extract. If a lot of these tips seem like they're the same tips that you would use to get featured snippets, they are, because featured snippets are actually a pretty good indicator that you're creating content that a robot can find, parse, understand, and extract, and that's what you want.
So if you're targeting featured snippets, you're probably already doing a lot of these things, good job. 
Grammar and spelling count!
The last thing, which I shouldn't have to say, but I'm going to say, is that grammar and spelling and punctuation and things like that absolutely do count. They count to users. They don't count to all users, but they count to users. They also count to search engines.
Things like grammar, spelling, and punctuation are very, very easy signals for a machine to find and parse. Google has been specific, in things like the "Quality Rater Guidelines," that a well-written, well-structured, well-spelled, grammatically correct document is a sign of authoritativeness. I'm not saying that having a well-spelled document is going to mean that you immediately rocket to the top of the results.
I am saying that if you're not on that stuff, it's probably going to hurt you. So take the time to make sure everything is nice and tidy. You can use vernacular English. You don't have to be perfect "AP Style Guide" all the time. But make sure that you are formatting things properly from a grammatical standpoint as well as a technical standpoint. What I love about all of this, this is just good writing.
This is good writing. It's easy to understand. It's easy to parse. It's still so hard, especially in the marketing world, to get out of that world of jargon, to get to the point, to stop writing 2,000 words because we think we need 2,000 words, to really think about are we creating content that's about what we think it's about.
Use these tools to understand how readable, parsable, and understandable your content is
So my hope for the SEO world and for you is that you can use these tools not just to think about how to dial in the perfect keyword density or whatever to get an almost perfect score on the salience in the natural language processing API. What I'm hoping is that you will use these tools to help yourself understand how readable, how parsable, and how understandable your content is, how much your content is about what you say it's about and what you think it's about so you can create better stuff for users.
It makes the internet a better place, and it will probably make you some money as well. So these are my thoughts. I'd love to hear in the comments if you're using the natural language processing API now, if you've built a tool with it, if you want to build a tool with it, what do you think about this, how do you use this, how has it gone. Tell me all about it. Holla atcha girl.
Have a great Friday.
Video transcription by Speechpad.com
0 notes
drummcarpentry · 5 years ago
Text
Make sure that the terms surrounding that term are clearly saying, "No, I mean the baked good, not the software piece of data." That's a really great way to kind of bump up your salience. Look at whether or not you have a strong salience score for your primary entity. You'd be amazed at how many pieces of content you can plug into this tool and the top, most salient entity is still only like a 0.01, a 0.14.
A lot of times the API is like "I think this is what it's about," but it's not sure. This is a great time to go in and bump up that content, make it more robust, and look at ways that you can make those entities easier to both extract and to relate to each other. This brings me to my second point, which is my new favorite thing in the world.
Writing for humans and writing for machines, you can now do both at the same time. You no longer have to, and you really haven't had to do this in a long time, but the idea that you might keyword stuff or otherwise create content for Google that your users might not see or care about is way, way, way over.
Now you can create content for Google that also is better for users, because the tenets of machine readability and human readability are moving closer and closer together. 
Tips for writing for human and machine readability:
What I've done here is I did some research not on natural language processing, but on writing for human readability, that is advice from writers, from writing experts on how to write better, clearer, easier to read, easier to understand content. Then I pulled out the pieces of advice that also work as pieces of advice for writing for natural language processing. So natural language processing, again, is the process by which Google or really anything that might be processing language tries to understand how entities are related to each other within a given body of content.
Short, simple sentences
Short, simple sentences. Write simply. Don't use a lot of flowery language. Short sentences and try to keep it to one idea per sentence. 
One idea per sentence
If you're running on, if you've got a lot of different clauses, if you're using a lot of pronouns and it's becoming confusing what you're talking about, that's not great for readers.
It also makes it harder for machines to parse your content. 
Connect questions to answers
Then closely connecting questions to answers. So don't say, "What is the best temperature to bake cookies? Well, let me tell you a story about my grandmother and my childhood," and 500 words later here's the answer. Connect questions to answers. 
What all three of those readability tips have in common is they boil down to reducing the semantic distance between entities.
If you want natural language processing to understand that two entities in your content are closely related, move them closer together in the sentence. Move the words closer together. Reduce the clutter, reduce the fluff, reduce the number of semantic hops that a robot might have to take between one entity and another to understand the relationship, and you've now created content that is more readable because it's shorter and easier to skim, but also easier for a robot to parse and understand.
Be specific first, then explain nuance
Going back to the example of "What is the best temperature to bake chocolate chip cookies at?" Now the real answer to what is the best temperature to bake chocolate cookies is it depends. Hello. Hi, I'm an SEO, and I just answered a question with it depends. It does depend.
That is true, and that is real, but it is not a good answer. It is also not the kind of thing that a robot could extract and reproduce in, for example, voice search or a featured snippet. If somebody says, "Okay, Google, what is a good temperature to bake cookies at?" and Google says, "It depends," that helps nobody even though it's true. So in order to write for both machine and human readability, be specific first and then you can explain nuance.
Then you can go into the details. So a better, just as correct answer to "What is the temperature to bake chocolate chip cookies?" is the best temperature to bake chocolate chip cookies is usually between 325 and 425 degrees, depending on your altitude and how crisp you like your cookie. That is just as true as it depends and, in fact, means the same thing as it depends, but it's a lot more specific.
It's a lot more precise. It uses real numbers. It provides a real answer. I've shortened the distance between the question and the answer. I didn't say it depends first. I said it depends at the end. That's the kind of thing that you can do to improve readability and understanding for both humans and machines.
Get to the point (don't bury the lede)
Get to the point. Don't bury the lede. All of you journalists who tried to become content marketers, and then everybody in content marketing said, "Oh, you need to wait till the end to get to your point or they won't read the whole thing," and you were like, "Don't bury the lede," you are correct. For those of you who aren't familiar with journalism speak, not burying the lede basically means get to the point upfront, at the top.
Include all the information that somebody would really need to get from that piece of content. If they don't read anything else, they read that one paragraph and they've gotten the gist. Then people who want to go deep can go deep. That's how people actually like to consume content, and surprisingly it doesn't mean they won't read the content. It just means they don't have to read it if they don't have time, if they need a quick answer.
The same is true with machines. Get to the point upfront. Make it clear right away what the primary entity, the primary topic, the primary focus of your content is and then get into the details. You'll have a much better structured piece of content that's easier to parse on all sides. 
Avoid jargon and "marketing speak"
Avoid jargon. Avoid marketing speak. It's terrible and very hard to understand, and you see it a lot. I'm going back again to the example of getting your clients to say what their products do. If you work with a lot of B2B companies, you will often run into this. Yes, but what does it do? It provides solutions to streamline the workflow and blah, blah. Okay, what does it do? This is the kind of thing that can be really, really hard for companies to get out of their own heads about, but it's so important for users and for machines.
Avoid jargon. Avoid marketing speak. Not to get too tautological, but the more esoteric a word is, the less commonly it's used. That's actually what esoteric means. What that means is the less commonly a word is used, the less likely it is that Google is going to understand its semantic relationships to other entities.
Keep it simple. Be specific. Say what you mean. Wipe out all of the jargon. By wiping out jargon and kind of marketing speak and kind of the fluff that can happen in your content, you're also, once again, reducing the semantic distances between entities, making them easier to parse. 
Organize your information to match the user journey
Organize it and map it out to the user journey. Think about the information somebody might need and the order in which they might need it. 
Break out subtopics with headings
Then break it out with subheadings. This is like very, very basic writing advice, and yet you all aren't doing it. So if you're not going to do it for your users, do it for machines. 
Format lists with bullets or numbers
You can also really impact skimmability for users by breaking out lists with bullets or numbers.
The great thing about that is that breaking out a list with bullets or numbers also makes information easier for a robot to parse and extract. If a lot of these tips seem like they're the same tips that you would use to get featured snippets, they are, because featured snippets are actually a pretty good indicator that you're creating content that a robot can find, parse, understand, and extract, and that's what you want.
So if you're targeting featured snippets, you're probably already doing a lot of these things, good job. 
Grammar and spelling count!
The last thing, which I shouldn't have to say, but I'm going to say is that grammar and spelling and punctuation and things like that absolutely do count. They count to users. They don't count to all users, but they count to users. They also count to search engines.
Things like grammar, spelling, and punctuation are very, very easy signals for a machine to find and parse. Google has been specific in things like the "Quality Rater Guidelines" that a well-written, well-structured, well-spelled, grammatically correct document shows signs of authoritativeness. I'm not saying that having a well-spelled document is going to mean that you immediately rocket to the top of the results.
I am saying that if you're not on that stuff, it's probably going to hurt you. So take the time to make sure everything is nice and tidy. You can use vernacular English. You don't have to be perfect "AP Style Guide" all the time. But make sure that you are formatting things properly from a grammatical standpoint as well as a technical standpoint. What I love about all of this, this is just good writing.
This is good writing. It's easy to understand. It's easy to parse. It's still so hard, especially in the marketing world, to get out of that world of jargon, to get to the point, to stop writing 2,000 words because we think we need 2,000 words, to really think about are we creating content that's about what we think it's about.
Use these tools to understand how readable, parsable, and understandable your content is
So my hope for the SEO world and for you is that you can use these tools not just to think about how to dial in the perfect keyword density or whatever to get an almost perfect score on the salience in the natural language processing API. What I'm hoping is that you will use these tools to help yourself understand how readable, how parsable, and how understandable your content is, how much your content is about what you say it's about and what you think it's about so you can create better stuff for users.
It makes the internet a better place, and it will probably make you some money as well. So these are my thoughts. I'd love to hear in the comments if you're using the natural language processing API now, if you've built a tool with it, if you want to build a tool with it, what do you think about this, how do you use this, how has it gone. Tell me all about it. Holla atcha girl.
Have a great Friday.
Video transcription by Speechpad.com
theinjectlikes2 · 5 years ago
Better Content Through NLP (Natural Language Processing) - Whiteboard Friday
Posted by RuthBurrReedy
Gone are the days of optimizing content solely for search engines. For modern SEO, your content needs to please both robots and humans. But how do you know that what you're writing can check the boxes for both man and machine?
In today's Whiteboard Friday, Ruth Burr Reedy focuses on part of her recent MozCon 2019 talk and teaches us all about how Google uses NLP (natural language processing) to truly understand content, plus how you can harness that knowledge to better optimize what you write for people and bots alike.
Video Transcription
Howdy, Moz fans. I'm Ruth Burr Reedy, and I am the Vice President of Strategy at UpBuild, a boutique technical marketing agency specializing in technical SEO and advanced web analytics. I recently spoke at MozCon on a basic framework for SEO and approaching changes to our industry that thinks about SEO in the light of we are humans who are marketing to humans, but we are using a machine as the intermediary.
Those videos will be available online at some point. [Editor's note: that point is now!] But today I wanted to talk about one point from my talk that I found really interesting and that has kind of changed the way that I approach content creation, and that is the idea that writing content that is easier for Google, a robot, to understand can actually make you a better writer and help you write better content for humans. It is a win-win. 
The relationships between entities, words, and how people search
Google is spending a lot of time and a lot of energy and a lot of money on how it parses content and understands what content is about, on things like neural matching and natural language processing, which seek to understand basically when people talk, what are they talking about?
This goes along with the evolution of search to be more conversational. But there are a lot of times when someone is searching, but they don't totally know what they want, and Google still wants them to get what they want because that's how Google makes money. They are spending a lot of time trying to understand the relationships between entities and between words and how people use words to search.
The example that Danny Sullivan gave online, that I think is a really great example, is if someone is experiencing the soap opera effect on their TV. If you've ever seen a soap opera, you've noticed that they look kind of weird. Someone might be experiencing that, and not knowing what that's called they can't Google soap opera effect because they don't know about it.
They might search something like, "Why does my TV look funny?" Neural matching helps Google understand that when somebody is searching "Why does my TV look funny?" one possible answer might be the soap opera effect. So they can serve up that result, and people are happy. 
Understanding salience
As we're thinking about natural language processing, a core component of natural language processing is understanding salience.
Salience, content, and entities
Salience is a one-word way to sum up to what extent is this piece of content about this specific entity? At this point Google is really good at extracting entities from a piece of content. Entities are basically nouns, people, places, things, proper nouns, regular nouns.
Entities are things, people, etc., numbers, things like that. Google is really good at taking those out and saying, "Okay, here are all of the entities that are contained within this piece of content." Salience attempts to understand how they're related to each other, because what Google is really trying to understand when they're crawling a page is: What is this page about, and is this a good example of a page about this topic?
Salience really goes into the second piece. To what extent is any given entity the topic of a piece of content? It's often amazing the degree to which a piece of content that a person has created is not actually about anything. I think we've all experienced that.
You're searching and you come to a page and you're like, "This was too vague. This was too broad. This said that it was about one thing, but it was actually about something else. I didn't find what I needed. This wasn't good information for me." As marketers, we're often on the other side of that, trying to get our clients to say what their product actually does on their website or say, "I know you think that you created a guide to Instagram for the holidays. But you actually wrote one paragraph about the holidays and then seven paragraphs about your new Instagram tool. This is not actually a blog post about Instagram for the holidays. It's a piece of content about your tool." These are the kinds of battles that we fight as marketers. 
Natural Language Processing (NLP) APIs
Fortunately, there are now a number of different APIs that you can use to understand natural language processing: 
IBM has one: https://www.ibm.com/watson/services/natural-language-understanding/ 
Google actually has a natural language processing API that's right here on https://cloud.google.com/natural-language/
Is it as sophisticated as what they're using on their own stuff? Probably not. But you can test it out. Put in a piece of content and see (a) what entities Google is able to extract from it, and (b) how salient Google feels each of these entities is to the piece of content as a whole. Again, to what degree is this piece of content about this thing?
So this natural language processing API, which you can try for free and which is actually not that expensive for an API if you want to build a tool with it, will assign each entity that it can extract a salience score between 0 and 1, saying, "Okay, how sure are we that this piece of content is about this thing versus just containing it?"
So the higher or the closer you get to 1, the more confident the tool is that this piece of content is about this thing. 0.9 would be really, really good. 0.01 means it's there, but they're not sure how well it's related. 
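To make the score concrete, here is a minimal Python sketch that ranks extracted entities by salience. It does not call the API. It assumes a response shaped like the JSON the entity analysis returns (a list of entities, each with a name, a type, and a salience), and the sample values below are made up for illustration, as are the names `sample_response` and `rank_entities`.

```python
# Hypothetical response, shaped like the JSON returned by the
# Natural Language API's entity analysis. All values are made up.
sample_response = {
    "entities": [
        {"name": "butter", "type": "OTHER", "salience": 0.05},
        {"name": "chocolate chip cookies", "type": "WORK_OF_ART", "salience": 0.62},
        {"name": "cookie", "type": "OTHER", "salience": 0.11},
        {"name": "sugar", "type": "OTHER", "salience": 0.03},
    ]
}


def rank_entities(response):
    """Return (name, type, salience) tuples, most salient first."""
    entities = response.get("entities", [])
    ranked = sorted(entities, key=lambda e: e.get("salience", 0.0), reverse=True)
    return [
        (e["name"], e.get("type", "UNKNOWN"), e.get("salience", 0.0))
        for e in ranked
    ]


if __name__ == "__main__":
    # Print one entity per line: score, type, name.
    for name, entity_type, salience in rank_entities(sample_response):
        print(f"{salience:.2f}  {entity_type:<12}  {name}")
```

The first row of the ranked output is the entity the tool thinks the content is most about, which is the one you would compare against your intended topic.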
A delicious example of how salience and entities work
The example I have here (not taken from a real piece of content; the numbers are made up) is a chocolate chip cookie recipe. You would want "chocolate chip cookies," "chocolate chip cookie recipe," or something like that to be the number one entity, the most salient entity, and you would want it to have a pretty high salience score.
You would want the tool to feel pretty confident, yes, this piece of content is about this topic. But what you can also see is the other entities it's extracting and to what degree they are also salient to the topic. So you can see things like if you have a chocolate chip cookie recipe, you would expect to see things like cookie, butter, sugar, 350, which is the temperature you heat your oven, all of the different things that come together to make a chocolate chip cookie recipe.
But I think that it's really, really important for us as SEOs to understand that salience is the future of related keywords. We're beyond the time when to optimize for chocolate chip cookie recipe, we would also be looking for things like chocolate recipe, chocolate chips, chocolate cookie recipe, things like that. Stems, variants, TF-IDF, these are all older methodologies for understanding what a piece of content is about.
Instead what we need to understand is what are the entities that Google, using its vast body of knowledge, using things like Freebase, using large portions of the internet, where is Google seeing these entities co-occur at such a rate that they feel reasonably confident that a piece of content on one entity in order to be salient to that entity would include these other entities?
Using an expert is the best way to create content that's salient to a topic
So chocolate chip cookie recipe, we're now also making sure we're adding things like butter, flour, sugar. This is actually really easy to do if you actually have a chocolate chip cookie recipe to put up there. This is I think what we're going to start seeing as a content trend in SEO is that the best way to create content that is salient to a topic is to have an actual expert in that topic create that content.
Somebody with deep knowledge of a topic is naturally going to include co-occurring terms, because they know how to create something that's about what it's supposed to be about. I think what we're going to start seeing is that people are going to have to start paying more for content marketing, frankly. Unfortunately, a lot of companies seem to think that content marketing is and should be cheap.
Content marketers, I feel you on that. It sucks, and it's no longer the case. We need to start investing in content and investing in experts to create that content so that they can create that deep, rich, salient content that everybody really needs. 
How can you use this API to improve your own SEO? 
One of the things that I like to do with this kind of information is look at — and this is something that I've done for years, just not in this context — but a prime optimization target in general is pages that rank for a topic, but they rank on page 2.
What this often means is that Google understands that that keyword is a topic of the page, but it doesn't necessarily understand that it is a good piece of content on that topic, that the page is actually solely about that content, that it's a good resource. In other words, the signal is there, but it's weak.
What you can do is take content that ranks but not well, run it through this natural language API or another natural language processing tool, and look at how the entities are extracted and how Google is determining that they're related to each other. Sometimes it might be that you need to do some disambiguation. So in this example, you'll notice that while chocolate cookies is called a work of art, and I agree, cookie here is actually called other.
This is because cookie means more than one thing. There's cookies, the baked good, but then there's also cookies, the packet of data. Both of those are legitimate uses of the word "cookie." Words have multiple meanings. If you notice that Google, that this natural language processing API is having trouble correctly classifying your entities, that's a good time to go in and do some disambiguation.
Make sure that the terms surrounding that term are clearly saying, "No, I mean the baked good, not the software piece of data." That's a really great way to kind of bump up your salience. Look at whether or not you have a strong salience score for your primary entity. You'd be amazed at how many pieces of content you can plug into this tool and the top, most salient entity is still only like a 0.01, a 0.14.
A lot of times the API is like "I think this is what it's about," but it's not sure. This is a great time to go in and bump up that content, make it more robust, and look at ways that you can make those entities easier to both extract and to relate to each other. This brings me to my second point, which is my new favorite thing in the world.
Writing for humans and writing for machines, you can now do both at the same time. You no longer have to, and you really haven't had to do this in a long time, but the idea that you might keyword stuff or otherwise create content for Google that your users might not see or care about is way, way, way over.
Now you can create content for Google that also is better for users, because the tenets of machine readability and human readability are moving closer and closer together. 
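The page-2 check described above (is the top entity weak, and are any entities classified as "other"?) could be sketched roughly like this in Python. This is an illustration, not an official workflow: the function name `audit_entities`, the 0.3 cutoff, and the sample data are all assumptions, and an entity typed OTHER is not always a problem, only a prompt to look closer.

```python
def audit_entities(response, min_top_salience=0.3,
                   ambiguous_types=("OTHER", "UNKNOWN")):
    """Flag a weak primary entity and entities the API could not classify.

    min_top_salience is an arbitrary cutoff chosen for illustration;
    pick a value that makes sense for your own content.
    """
    entities = sorted(
        response.get("entities", []),
        key=lambda e: e.get("salience", 0.0),
        reverse=True,
    )
    if not entities:
        return {"top_entity": None, "weak_primary": True, "ambiguous": []}
    top = entities[0]
    return {
        "top_entity": top["name"],
        "weak_primary": top.get("salience", 0.0) < min_top_salience,
        "ambiguous": [
            e["name"] for e in entities if e.get("type") in ambiguous_types
        ],
    }


# Made-up entity data for a page that ranks, but not well.
page_two_result = {
    "entities": [
        {"name": "chocolate cookies", "type": "WORK_OF_ART", "salience": 0.14},
        {"name": "cookie", "type": "OTHER", "salience": 0.09},
        {"name": "sugar", "type": "CONSUMER_GOOD", "salience": 0.02},
    ]
}

if __name__ == "__main__":
    print(audit_entities(page_two_result))
```

A page flagged with a weak primary entity is a candidate for making the content more robust; a page with ambiguous entities is a candidate for disambiguation.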
Tips for writing for human and machine readability:
What I've done here is I did some research not on natural language processing, but on writing for human readability, that is advice from writers, from writing experts on how to write better, clearer, easier to read, easier to understand content. Then I pulled out the pieces of advice that also work as pieces of advice for writing for natural language processing. So natural language processing, again, is the process by which Google or really anything that might be processing language tries to understand how entities are related to each other within a given body of content.
Short, simple sentences
Short, simple sentences. Write simply. Don't use a lot of flowery language. Short sentences and try to keep it to one idea per sentence. 
One idea per sentence
If you're running on, if you've got a lot of different clauses, if you're using a lot of pronouns and it's becoming confusing what you're talking about, that's not great for readers.
It also makes it harder for machines to parse your content. 
Connect questions to answers
Then closely connecting questions to answers. So don't say, "What is the best temperature to bake cookies? Well, let me tell you a story about my grandmother and my childhood," and 500 words later here's the answer. Connect questions to answers. 
What all three of those readability tips have in common is they boil down to reducing the semantic distance between entities.
If you want natural language processing to understand that two entities in your content are closely related, move them closer together in the sentence. Move the words closer together. Reduce the clutter, reduce the fluff, reduce the number of semantic hops that a robot might have to take between one entity and another to understand the relationship, and you've now created content that is more readable because it's shorter and easier to skim, but also easier for a robot to parse and understand.
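As a rough illustration of "moving the words closer together," the Python sketch below counts how many word positions separate two terms in a piece of text. Word distance is only a crude stand-in for semantic distance, not how Google or any NLP API measures it. The function name `min_token_gap` and both sample passages are made up for this example, and it only handles single-word terms.

```python
import re


def min_token_gap(text, term_a, term_b):
    """Smallest distance, in word positions, between two single-word terms.

    Adjacent words have a distance of 1. Returns None if either term
    does not appear in the text.
    """
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    positions_a = [i for i, t in enumerate(tokens) if t == term_a.lower()]
    positions_b = [i for i, t in enumerate(tokens) if t == term_b.lower()]
    if not positions_a or not positions_b:
        return None
    return min(abs(a - b) for a in positions_a for b in positions_b)


# The answer is buried behind a story.
rambling = (
    "What is the best temperature to bake cookies? Well, let me tell you "
    "a story about my grandmother and my childhood. Anyway, set the oven to 350."
)
# The answer sits right next to the question's subject.
direct = "Bake cookies at 350 degrees."

if __name__ == "__main__":
    print("rambling:", min_token_gap(rambling, "cookies", "350"))
    print("direct:", min_token_gap(direct, "cookies", "350"))
```

The direct version puts "cookies" and "350" only a couple of words apart, which is the kind of tightening the three tips above are aiming for.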
Be specific first, then explain nuance
Going back to the example of "What is the best temperature to bake chocolate chip cookies at?" Now the real answer to what is the best temperature to bake chocolate cookies is it depends. Hello. Hi, I'm an SEO, and I just answered a question with it depends. It does depend.
That is true, and that is real, but it is not a good answer. It is also not the kind of thing that a robot could extract and reproduce in, for example, voice search or a featured snippet. If somebody says, "Okay, Google, what is a good temperature to bake cookies at?" and Google says, "It depends," that helps nobody even though it's true. So in order to write for both machine and human readability, be specific first and then you can explain nuance.
Then you can go into the details. So a better, just as correct answer to "What is the temperature to bake chocolate chip cookies?" is the best temperature to bake chocolate chip cookies is usually between 325 and 425 degrees, depending on your altitude and how crisp you like your cookie. That is just as true as it depends and, in fact, means the same thing as it depends, but it's a lot more specific.
It's a lot more precise. It uses real numbers. It provides a real answer. I've shortened the distance between the question and the answer. I didn't say it depends first. I said it depends at the end. That's the kind of thing that you can do to improve readability and understanding for both humans and machines.
Get to the point (don't bury the lede)
Get to the point. Don't bury the lede. All of you journalists who tried to become content marketers, and then everybody in content marketing said, "Oh, you need to wait till the end to get to your point or they won't read the whole thing," and you were like, "Don't bury the lede," you are correct. For those of you who aren't familiar with journalism speak, not burying the lede basically means get to the point upfront, at the top.
Include all the information that somebody would really need to get from that piece of content. If they don't read anything else, they read that one paragraph and they've gotten the gist. Then people who want to go deep can go deep. That's how people actually like to consume content, and surprisingly it doesn't mean they won't read the content. It just means they don't have to read it if they don't have time, if they need a quick answer.
The same is true with machines. Get to the point upfront. Make it clear right away what the primary entity, the primary topic, the primary focus of your content is and then get into the details. You'll have a much better structured piece of content that's easier to parse on all sides. 
Avoid jargon and "marketing speak"
Avoid jargon. Avoid marketing speak. It's terrible and very hard to understand, and you see it a lot. I'm going back again to the example of getting your clients to say what their products do. If you work with a lot of B2B companies, you will often run into this. Yes, but what does it do? It provides solutions to streamline the workflow and blah, blah. Okay, what does it do? This is the kind of thing that can be really, really hard for companies to get out of their own heads about, but it's so important for users and for machines.
Avoid jargon. Avoid marketing speak. Not to get too tautological, but the more esoteric a word is, the less commonly it's used. That's actually what esoteric means. What that means is the less commonly a word is used, the less likely it is that Google is going to understand its semantic relationships to other entities.
Keep it simple. Be specific. Say what you mean. Wipe out all of the jargon. By wiping out jargon and kind of marketing speak and kind of the fluff that can happen in your content, you're also, once again, reducing the semantic distances between entities, making them easier to parse. 
Organize your information to match the user journey
Organize it and map it out to the user journey. Think about the information somebody might need and the order in which they might need it. 
Break out subtopics with headings
Then break it out with subheadings. This is like very, very basic writing advice, and yet you all aren't doing it. So if you're not going to do it for your users, do it for machines. 
Format lists with bullets or numbers
You can also really impact skimmability for users by breaking out lists with bullets or numbers.
The great thing about that is that breaking out a list with bullets or numbers also makes information easier for a robot to parse and extract. If a lot of these tips seem like they're the same tips that you would use to get featured snippets, they are, because featured snippets are actually a pretty good indicator that you're creating content that a robot can find, parse, understand, and extract, and that's what you want.
So if you're targeting featured snippets, you're probably already doing a lot of these things, good job. 
Grammar and spelling count!
The last thing, which I shouldn't have to say, but I'm going to say is that grammar and spelling and punctuation and things like that absolutely do count. They count to users. They don't count to all users, but they count to users. They also count to search engines.
Things like grammar, spelling, and punctuation are very, very easy signals for a machine to find and parse. Google has been specific, in things like the "Quality Rater Guidelines," that a well-written, well-structured, well-spelled, grammatically correct document shows signs of authoritativeness. I'm not saying that having a well-spelled document is going to mean that you immediately rocket to the top of the results.
I am saying that if you're not on that stuff, it's probably going to hurt you. So take the time to make sure everything is nice and tidy. You can use vernacular English. You don't have to be perfect "AP Style Guide" all the time. But make sure that you are formatting things properly from a grammatical standpoint as well as a technical standpoint. What I love about all of this is that it's just good writing.
This is good writing. It's easy to understand. It's easy to parse. It's still so hard, especially in the marketing world, to get out of that world of jargon, to get to the point, to stop writing 2,000 words because we think we need 2,000 words, to really think about are we creating content that's about what we think it's about.
Use these tools to understand how readable, parsable, and understandable your content is
So my hope for the SEO world and for you is that you can use these tools not just to think about how to dial in the perfect keyword density or whatever to get an almost perfect score on the salience in the natural language processing API. What I'm hoping is that you will use these tools to help yourself understand how readable, how parsable, and how understandable your content is, how much your content is about what you say it's about and what you think it's about so you can create better stuff for users.
It makes the internet a better place, and it will probably make you some money as well. So these are my thoughts. I'd love to hear in the comments if you're using the natural language processing API now, if you've built a tool with it, if you want to build a tool with it, what do you think about this, how do you use this, how has it gone. Tell me all about it. Holla atcha girl.
Have a great Friday.
Video transcription by Speechpad.com
from The Moz Blog https://ift.tt/2XAOsqm via IFTTT
0 notes
gamebazu · 5 years ago
Text
Better Content Through NLP (Natural Language Processing) - Whiteboard Friday
Posted by RuthBurrReedy
Gone are the days of optimizing content solely for search engines. For modern SEO, your content needs to please both robots and humans. But how do you know that what you're writing can check the boxes for both man and machine?
In today's Whiteboard Friday, Ruth Burr Reedy focuses on part of her recent MozCon 2019 talk and teaches us all about how Google uses NLP (natural language processing) to truly understand content, plus how you can harness that knowledge to better optimize what you write for people and bots alike.
Video Transcription
Howdy, Moz fans. I'm Ruth Burr Reedy, and I am the Vice President of Strategy at UpBuild, a boutique technical marketing agency specializing in technical SEO and advanced web analytics. I recently spoke at MozCon on a basic framework for SEO and approaching changes to our industry that thinks about SEO in the light of we are humans who are marketing to humans, but we are using a machine as the intermediary.
Those videos will be available online at some point. [Editor's note: that point is now!] But today I wanted to talk about one point from my talk that I found really interesting and that has kind of changed the way that I approach content creation, and that is the idea that writing content that is easier for Google, a robot, to understand can actually make you a better writer and help you write better content for humans. It is a win-win. 
The relationships between entities, words, and how people search
To understand how Google is currently approaching parsing content and understanding what content is about, Google is spending a lot of time and a lot of energy and a lot of money on things like neural matching and natural language processing, which seek to understand basically when people talk, what are they talking about?
This goes along with the evolution of search to be more conversational. But there are a lot of times when someone is searching, but they don't totally know what they want, and Google still wants them to get what they want because that's how Google makes money. They are spending a lot of time trying to understand the relationships between entities and between words and how people use words to search.
The example that Danny Sullivan gave online, that I think is a really great example, is if someone is experiencing the soap opera effect on their TV. If you've ever seen a soap opera, you've noticed that they look kind of weird. Someone might be experiencing that, and not knowing what that's called they can't Google soap opera effect because they don't know about it.
They might search something like, "Why does my TV look funny?" Neural matching helps Google understand that when somebody is searching "Why does my TV look funny?" one possible answer might be the soap opera effect. So they can serve up that result, and people are happy. 
Understanding salience
As we're thinking about natural language processing, a core component of natural language processing is understanding salience.
Salience, content, and entities
Salience is a one-word way to sum up to what extent is this piece of content about this specific entity? At this point Google is really good at extracting entities from a piece of content. Entities are basically nouns, people, places, things, proper nouns, regular nouns.
Entities are things, people, etc., numbers, things like that. Google is really good at taking those out and saying, "Okay, here are all of the entities that are contained within this piece of content." Salience attempts to understand how they're related to each other, because what Google is really trying to understand when they're crawling a page is: What is this page about, and is this a good example of a page about this topic?
Salience really goes into the second piece. To what extent is any given entity the topic of a piece of content? It's often amazing the degree to which a piece of content that a person has created is not actually about anything. I think we've all experienced that.
You're searching and you come to a page and you're like, "This was too vague. This was too broad. This said that it was about one thing, but it was actually about something else. I didn't find what I needed. This wasn't good information for me." As marketers, we're often on the other side of that, trying to get our clients to say what their product actually does on their website or say, "I know you think that you created a guide to Instagram for the holidays. But you actually wrote one paragraph about the holidays and then seven paragraphs about your new Instagram tool. This is not actually a blog post about Instagram for the holidays. It's a piece of content about your tool." These are the kinds of battles that we fight as marketers. 
Natural Language Processing (NLP) APIs
Fortunately, there are now a number of different APIs that you can use to understand natural language processing: 
IBM has one: https://www.ibm.com/watson/services/natural-language-understanding/ 
Google actually has a natural language processing API that's right here on https://cloud.google.com/natural-language/
Is it as sophisticated as what they're using on their own stuff? Probably not. But you can test it out. Put in a piece of content and see (a) what entities Google is able to extract from it, and (b) how salient Google feels each of these entities is to the piece of content as a whole. Again, to what degree is this piece of content about this thing?
So this natural language processing API, which you can try for free and which is actually not that expensive for an API if you want to build a tool with it, will assign each entity that it can extract a salience score between 0 and 1, saying, "Okay, how sure are we that this piece of content is about this thing versus just containing it?"
So the higher or the closer you get to 1, the more confident the tool is that this piece of content is about this thing. 0.9 would be really, really good. 0.01 means it's there, but they're not sure how well it's related. 
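If you want to go past the free demo and build a tool, the first step is sending your content to the entity analysis endpoint. Here is a minimal sketch of the request body, assuming the public v1 REST endpoint of the Cloud Natural Language API. The helper name `build_entities_request` is made up for illustration, and the sketch only builds the JSON. It does not send anything, because a real call also needs credentials from your own Google Cloud project.

```python
import json

# Assumed endpoint: v1 entity analysis in the Cloud Natural Language REST API.
ANALYZE_ENTITIES_URL = "https://language.googleapis.com/v1/documents:analyzeEntities"


def build_entities_request(text: str) -> str:
    """Return the JSON body for an entity analysis request on plain text.

    Hypothetical helper for illustration; it builds the payload only.
    """
    body = {
        "document": {"type": "PLAIN_TEXT", "content": text},
        "encodingType": "UTF8",
    }
    return json.dumps(body)


if __name__ == "__main__":
    # Print the payload you would POST to ANALYZE_ENTITIES_URL.
    print(build_entities_request("Cream the butter and sugar, then fold in the chocolate chips."))
```

The response to a request like this is where the entity list and the 0 to 1 salience scores come from.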
A delicious example of how salience and entities work
The example I have here (this is not taken from a real piece of content; these numbers are made up, it's just an example) is a chocolate chip cookie recipe. You would want "chocolate chip cookies," "chocolate chip cookie recipe," or something like that to be the number one entity, the most salient entity, and you would want it to have a pretty high salience score.
You would want the tool to feel pretty confident, yes, this piece of content is about this topic. But what you can also see is the other entities it's extracting and to what degree they are also salient to the topic. So you can see things like if you have a chocolate chip cookie recipe, you would expect to see things like cookie, butter, sugar, 350, which is the temperature you heat your oven, all of the different things that come together to make a chocolate chip cookie recipe.
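To make the cookie example concrete, here is a small sketch that ranks entities by salience. The data is made up, just like the numbers on the whiteboard. It is only shaped like an entity analysis response (a list of entities, each with a name, a type, and a salience), and `rank_by_salience` is a hypothetical helper name.

```python
# Made-up data shaped like an entity analysis response; not real API output.
SAMPLE_RESPONSE = {
    "entities": [
        {"name": "butter", "type": "CONSUMER_GOOD", "salience": 0.06},
        {"name": "chocolate chip cookies", "type": "WORK_OF_ART", "salience": 0.71},
        {"name": "cookie", "type": "OTHER", "salience": 0.09},
        {"name": "sugar", "type": "CONSUMER_GOOD", "salience": 0.05},
        {"name": "350", "type": "NUMBER", "salience": 0.0},
    ]
}


def rank_by_salience(response):
    """Return the entities sorted from most to least salient."""
    entities = response.get("entities", [])
    return sorted(entities, key=lambda e: e.get("salience", 0.0), reverse=True)


if __name__ == "__main__":
    for entity in rank_by_salience(SAMPLE_RESPONSE):
        print(f"{entity['salience']:.2f}  {entity['type']:<14} {entity['name']}")
```

In a healthy recipe page you would hope to see the recipe itself at the top of that list, with the ingredients and the oven temperature trailing behind it.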
But I think that it's really, really important for us as SEOs to understand that salience is the future of related keywords. We're beyond the time when to optimize for chocolate chip cookie recipe, we would also be looking for things like chocolate recipe, chocolate chips, chocolate cookie recipe, things like that. Stems, variants, TF-IDF, these are all older methodologies for understanding what a piece of content is about.
Instead what we need to understand is what are the entities that Google, using its vast body of knowledge, using things like Freebase, using large portions of the internet, where is Google seeing these entities co-occur at such a rate that they feel reasonably confident that a piece of content on one entity in order to be salient to that entity would include these other entities?
Using an expert is the best way to create content that's salient to a topic
So for a chocolate chip cookie recipe, we're now also making sure we're adding things like butter, flour, sugar. This is actually really easy to do if you actually have a chocolate chip cookie recipe to put up there. What I think we're going to start seeing as a content trend in SEO is that the best way to create content that is salient to a topic is to have an actual expert in that topic create that content.
Somebody with deep knowledge of a topic is naturally going to include co-occurring terms, because they know how to create something that's about what it's supposed to be about. I think what we're going to start seeing is that people are going to have to start paying more for content marketing, frankly. Unfortunately, a lot of companies seem to think that content marketing is and should be cheap.
Content marketers, I feel you on that. It sucks, and it's no longer the case. We need to start investing in content and investing in experts to create that content so that they can create that deep, rich, salient content that everybody really needs. 
How can you use this API to improve your own SEO? 
One of the things that I like to do with this kind of information is look at — and this is something that I've done for years, just not in this context — but a prime optimization target in general is pages that rank for a topic, but they rank on page 2.
What this often means is that Google understands that that keyword is a topic of the page, but it doesn't necessarily understand that it is a good piece of content on that topic, that the page is actually solely about that content, that it's a good resource. In other words, the signal is there, but it's weak.
What you can do is take content that ranks but not well, run it through this natural language API or another natural language processing tool, and look at how the entities are extracted and how Google is determining that they're related to each other. Sometimes it might be that you need to do some disambiguation. So in this example, you'll notice that while chocolate cookies is called a work of art, and I agree, cookie here is actually called other.
This is because cookie means more than one thing. There's cookies, the baked good, but then there's also cookies, the packet of data. Both of those are legitimate uses of the word "cookie." Words have multiple meanings. If you notice that Google, that this natural language processing API is having trouble correctly classifying your entities, that's a good time to go in and do some disambiguation.
Make sure that the terms surrounding that term are clearly saying, "No, I mean the baked good, not the software piece of data." That's a really great way to kind of bump up your salience. Look at whether or not you have a strong salience score for your primary entity. You'd be amazed at how many pieces of content you can plug into this tool and the top, most salient entity is still only like a 0.01, a 0.14.
A lot of times the API is like "I think this is what it's about," but it's not sure. This is a great time to go in and bump up that content, make it more robust, and look at ways that you can make those entities easier to both extract and to relate to each other. This brings me to my second point, which is my new favorite thing in the world.
Writing for humans and writing for machines, you can now do both at the same time. You no longer have to, and you really haven't had to do this in a long time, but the idea that you might keyword stuff or otherwise create content for Google that your users might not see or care about is way, way, way over.
Now you can create content for Google that also is better for users, because the tenets of machine readability and human readability are moving closer and closer together. 
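The two checks from the page 2 audit above (is the primary entity's score weak, and is anything classified as "other") are easy to automate once you have the entity list. This is a rough sketch, not an official method. The function name `audit_entities` and the 0.3 threshold are assumptions picked for illustration, and the input is assumed to be a list of entities with `name`, `type`, and `salience` fields.

```python
def audit_entities(entities, min_primary_salience=0.3):
    """Return plain-language warnings for a page's extracted entities.

    The 0.3 threshold is an arbitrary starting point, not a documented cutoff.
    """
    if not entities:
        return ["no entities extracted"]

    warnings = []

    # The primary entity is simply the one with the highest salience.
    primary = max(entities, key=lambda e: e.get("salience", 0.0))
    primary_score = primary.get("salience", 0.0)
    if primary_score < min_primary_salience:
        warnings.append(f"weak primary entity: {primary['name']} ({primary_score:.2f})")

    # A type of OTHER can be a hint that the term needs disambiguation.
    for entity in entities:
        if entity.get("type") == "OTHER":
            warnings.append(f"ambiguous type for: {entity['name']}")

    return warnings


if __name__ == "__main__":
    page = [{"name": "cookie", "type": "OTHER", "salience": 0.14}]
    for warning in audit_entities(page):
        print(warning)
```

Treat the warnings as a to-do list for a human editor: beef up the content around a weak primary entity, and add clarifying terms around anything flagged as ambiguous.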
Tips for writing for human and machine readability:
What I've done here is I did some research not on natural language processing, but on writing for human readability, that is, advice from writers and writing experts on how to write better, clearer, easier to read, easier to understand content. Then I pulled out the pieces of advice that also work as pieces of advice for writing for natural language processing. So natural language processing, again, is the process by which Google or really anything that might be processing language tries to understand how entities are related to each other within a given body of content.
Short, simple sentences
Short, simple sentences. Write simply. Don't use a lot of flowery language. Use short sentences, and try to keep each one to a single idea.
One idea per sentence
If you're running on, if you've got a lot of different clauses, if you're using a lot of pronouns and it's becoming confusing what you're talking about, that's not great for readers.
It also makes it harder for machines to parse your content. 
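One quick way to spot run-ons is to flag any sentence over a word limit. Here is a minimal sketch; the helper name `long_sentences` and the 20-word limit are assumptions for illustration, and the sentence splitting is deliberately naive (it just breaks after a period, question mark, or exclamation point).

```python
import re


def long_sentences(text, max_words=20):
    """Return the sentences in `text` that have more than `max_words` words."""
    # Naive split: break after ., ! or ? when followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [s for s in sentences if len(s.split()) > max_words]


if __name__ == "__main__":
    draft = (
        "Bake the cookies at 350. "
        "If you're running on and you've got a lot of different clauses and a lot of "
        "pronouns, it becomes confusing what it is that you are actually talking about."
    )
    for sentence in long_sentences(draft):
        print(sentence)
```

A flagged sentence isn't automatically bad, but it's a good candidate for splitting into one idea per sentence.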
Connect questions to answers
Then closely connecting questions to answers. So don't say, "What is the best temperature to bake cookies? Well, let me tell you a story about my grandmother and my childhood," and 500 words later here's the answer. Connect questions to answers. 
What all three of those readability tips have in common is they boil down to reducing the semantic distance between entities.
If you want natural language processing to understand that two entities in your content are closely related, move them closer together in the sentence. Move the words closer together. Reduce the clutter, reduce the fluff, reduce the number of semantic hops that a robot might have to take between one entity and another to understand the relationship, and you've now created content that is more readable because it's shorter and easier to skim, but also easier for a robot to parse and understand.
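As a very crude stand-in for "semantic distance," you can count how many words sit between two terms. This sketch is only an illustration of the idea of moving words closer together. It is not how Google or any NLP API measures relationships, `token_distance` is a made-up helper, and it only handles single-word terms.

```python
import re


def token_distance(text, term_a, term_b):
    """Return the smallest gap, in word positions, between two single-word terms.

    Returns None if either term is missing from the text.
    """
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    positions_a = [i for i, t in enumerate(tokens) if t == term_a.lower()]
    positions_b = [i for i, t in enumerate(tokens) if t == term_b.lower()]
    if not positions_a or not positions_b:
        return None
    return min(abs(i - j) for i in positions_a for j in positions_b)


if __name__ == "__main__":
    tight = "Bake cookies at 350 degrees."
    rambling = (
        "What is the best temperature to bake cookies? Well, my grandmother "
        "always said the oven matters, so try 350."
    )
    print(token_distance(tight, "cookies", "350"))
    print(token_distance(rambling, "cookies", "350"))
```

The tight sentence keeps "cookies" and "350" a couple of words apart, while the rambling version puts a whole grandmother between them.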
Be specific first, then explain nuance
Going back to the example of "What is the best temperature to bake chocolate chip cookies at?" Now the real answer to what is the best temperature to bake chocolate chip cookies is it depends. Hello. Hi, I'm an SEO, and I just answered a question with it depends. It does depend.
That is true, and that is real, but it is not a good answer. It is also not the kind of thing that a robot could extract and reproduce in, for example, voice search or a featured snippet. If somebody says, "Okay, Google, what is a good temperature to bake cookies at?" and Google says, "It depends," that helps nobody even though it's true. So in order to write for both machine and human readability, be specific first and then you can explain nuance.
Then you can go into the details. So a better, just as correct answer to "What is the temperature to bake chocolate chip cookies?" is the best temperature to bake chocolate chip cookies is usually between 325 and 425 degrees, depending on your altitude and how crisp you like your cookie. That is just as true as it depends and, in fact, means the same thing as it depends, but it's a lot more specific.
It's a lot more precise. It uses real numbers. It provides a real answer. I've shortened the distance between the question and the answer. I didn't say it depends first. I said it depends at the end. That's the kind of thing that you can do to improve readability and understanding for both humans and machines.
Get to the point (don't bury the lede)
Get to the point. Don't bury the lede. All of you journalists who tried to become content marketers, and then everybody in content marketing said, "Oh, you need to wait till the end to get to your point or they won't read the whole thing," and you were like, "Don't bury the lede," you are correct. For those of you who aren't familiar with journalism speak, not burying the lede basically means get to the point upfront, at the top.
Include all the information that somebody would really need to get from that piece of content. If they don't read anything else, they read that one paragraph and they've gotten the gist. Then people who want to go deep can go deep. That's how people actually like to consume content, and surprisingly it doesn't mean they won't read the content. It just means they don't have to read it if they don't have time, if they need a quick answer.
The same is true with machines. Get to the point upfront. Make it clear right away what the primary entity, the primary topic, the primary focus of your content is and then get into the details. You'll have a much better structured piece of content that's easier to parse on all sides. 
Avoid jargon and "marketing speak"
Avoid jargon. Avoid marketing speak. Not only is it terrible and very hard to understand, but you also see it a lot. I'm going back again to the example of getting your clients to say what their products do. If you work with a lot of B2B companies, you will often run into this. Yes, but what does it do? It provides solutions to streamline the workflow and blah, blah. Okay, what does it do? This is the kind of thing that can be really, really hard for companies to get out of their own heads about, but it's so important for users and for machines.
Avoid jargon. Avoid marketing speak. Not to get too tautological, but the more esoteric a word is, the less commonly it's used. That's actually what esoteric means. What that means is the less commonly a word is used, the less likely it is that Google is going to understand its semantic relationships to other entities.
Keep it simple. Be specific. Say what you mean. Wipe out all of the jargon. By wiping out jargon and kind of marketing speak and kind of the fluff that can happen in your content, you're also, once again, reducing the semantic distances between entities, making them easier to parse. 
Organize your information to match the user journey
Organize it and map it out to the user journey. Think about the information somebody might need and the order in which they might need it. 
Break out subtopics with headings
Then break it out with subheadings. This is like very, very basic writing advice, and yet you all aren't doing it. So if you're not going to do it for your users, do it for machines. 
Format lists with bullets or numbers
You can also really impact skimmability for users by breaking out lists with bullets or numbers.
The great thing about that is that breaking out a list with bullets or numbers also makes information easier for a robot to parse and extract. If a lot of these tips seem like they're the same tips that you would use to get featured snippets, they are, because featured snippets are actually a pretty good indicator that you're creating content that a robot can find, parse, understand, and extract, and that's what you want.
So if you're targeting featured snippets, you're probably already doing a lot of these things, good job. 
Grammar and spelling count!
The last thing, which I shouldn't have to say, but I'm going to say is that grammar and spelling and punctuation and things like that absolutely do count. They count to users. They don't count to all users, but they count to users. They also count to search engines.
Things like grammar, spelling, and punctuation are very, very easy signals for a machine to find and parse. Google has been specific, in things like the "Quality Rater Guidelines," that a well-written, well-structured, well-spelled, grammatically correct document shows signs of authoritativeness. I'm not saying that having a well-spelled document is going to mean that you immediately rocket to the top of the results.
I am saying that if you're not on that stuff, it's probably going to hurt you. So take the time to make sure everything is nice and tidy. You can use vernacular English. You don't have to be perfect "AP Style Guide" all the time. But make sure that you are formatting things properly from a grammatical standpoint as well as a technical standpoint. What I love about all of this is that it's just good writing.
This is good writing. It's easy to understand. It's easy to parse. It's still so hard, especially in the marketing world, to get out of that world of jargon, to get to the point, to stop writing 2,000 words because we think we need 2,000 words, to really think about are we creating content that's about what we think it's about.
Use these tools to understand how readable, parsable, and understandable your content is
So my hope for the SEO world and for you is that you can use these tools not just to think about how to dial in the perfect keyword density or whatever to get an almost perfect score on the salience in the natural language processing API. What I'm hoping is that you will use these tools to help yourself understand how readable, how parsable, and how understandable your content is, how much your content is about what you say it's about and what you think it's about so you can create better stuff for users.
It makes the internet a better place, and it will probably make you some money as well. So these are my thoughts. I'd love to hear in the comments if you're using the natural language processing API now, if you've built a tool with it, if you want to build a tool with it, what do you think about this, how do you use this, how has it gone. Tell me all about it. Holla atcha girl.
Have a great Friday.
Video transcription by Speechpad.com
0 notes
epackingvietnam · 5 years ago
Text
Better Content Through NLP (Natural Language Processing) - Whiteboard Friday
Posted by RuthBurrReedy
Gone are the days of optimizing content solely for search engines. For modern SEO, your content needs to please both robots and humans. But how do you know that what you're writing can check the boxes for both man and machine?
In today's Whiteboard Friday, Ruth Burr Reedy focuses on part of her recent MozCon 2019 talk and teaches us all about how Google uses NLP (natural language processing) to truly understand content, plus how you can harness that knowledge to better optimize what you write for people and bots alike.
Video Transcription
Howdy, Moz fans. I'm Ruth Burr Reedy, and I am the Vice President of Strategy at UpBuild, a boutique technical marketing agency specializing in technical SEO and advanced web analytics. I recently spoke at MozCon on a basic framework for SEO and approaching changes to our industry that thinks about SEO in the light of we are humans who are marketing to humans, but we are using a machine as the intermediary.
Those videos will be available online at some point. [Editor's note: that point is now!] But today I wanted to talk about one point from my talk that I found really interesting and that has kind of changed the way that I approach content creation, and that is the idea that writing content that is easier for Google, a robot, to understand can actually make you a better writer and help you write better content for humans. It is a win-win. 
The relationships between entities, words, and how people search
To understand how Google is currently approaching parsing content and understanding what content is about, Google is spending a lot of time and a lot of energy and a lot of money on things like neural matching and natural language processing, which seek to understand basically when people talk, what are they talking about?
This goes along with the evolution of search to be more conversational. But there are a lot of times when someone is searching, but they don't totally know what they want, and Google still wants them to get what they want because that's how Google makes money. They are spending a lot of time trying to understand the relationships between entities and between words and how people use words to search.
The example that Danny Sullivan gave online, that I think is a really great example, is if someone is experiencing the soap opera effect on their TV. If you've ever seen a soap opera, you've noticed that they look kind of weird. Someone might be experiencing that, and not knowing what that's called they can't Google soap opera effect because they don't know about it.
They might search something like, "Why does my TV look funny?" Neural matching helps Google understand that when somebody is searching "Why does my TV look funny?" one possible answer might be the soap opera effect. So they can serve up that result, and people are happy. 
Understanding salience
As we're thinking about natural language processing, a core component of natural language processing is understanding salience.
Salience, content, and entities
Salience is a one-word way to sum up to what extent is this piece of content about this specific entity? At this point Google is really good at extracting entities from a piece of content. Entities are basically nouns, people, places, things, proper nouns, regular nouns.
Entities are things, people, etc., numbers, things like that. Google is really good at taking those out and saying, "Okay, here are all of the entities that are contained within this piece of content." Salience attempts to understand how they're related to each other, because what Google is really trying to understand when they're crawling a page is: What is this page about, and is this a good example of a page about this topic?
Salience really goes into the second piece. To what extent is any given entity the topic of a piece of content? It's often amazing the degree to which a piece of content that a person has created is not actually about anything. I think we've all experienced that.
You're searching and you come to a page and you're like, "This was too vague. This was too broad. This said that it was about one thing, but it was actually about something else. I didn't find what I needed. This wasn't good information for me." As marketers, we're often on the other side of that, trying to get our clients to say what their product actually does on their website or say, "I know you think that you created a guide to Instagram for the holidays. But you actually wrote one paragraph about the holidays and then seven paragraphs about your new Instagram tool. This is not actually a blog post about Instagram for the holidays. It's a piece of content about your tool." These are the kinds of battles that we fight as marketers. 
Natural Language Processing (NLP) APIs
Fortunately, there are now a number of different APIs that you can use to understand natural language processing: 
IBM has one: https://www.ibm.com/watson/services/natural-language-understanding/ 
Google actually has a natural language processing API that's right here on https://cloud.google.com/natural-language/
Is it as sophisticated as what they're using on their own stuff? Probably not. But you can test it out. Put in a piece of content and see (a) what entities Google is able to extract from it, and (b) how salient Google feels each of these entities is to the piece of content as a whole. Again, to what degree is this piece of content about this thing?
So this natural language processing API, which you can try for free and which is actually not that expensive for an API if you want to build a tool with it, will assign each entity that it can extract a salience score between 0 and 1, saying, "Okay, how sure are we that this piece of content is about this thing versus just containing it?"
So the higher or the closer you get to 1, the more confident the tool is that this piece of content is about this thing. 0.9 would be really, really good. 0.01 means it's there, but they're not sure how well it's related. 
A delicious example of how salience and entities work
The example I have here (this is not taken from a real piece of content; these numbers are made up, it's just an example) is a chocolate chip cookie recipe. You would want "chocolate chip cookies," "chocolate chip cookie recipe," or something like that to be the number one entity, the most salient entity, and you would want it to have a pretty high salience score.
You would want the tool to feel pretty confident, yes, this piece of content is about this topic. But what you can also see is the other entities it's extracting and to what degree they are also salient to the topic. So you can see things like if you have a chocolate chip cookie recipe, you would expect to see things like cookie, butter, sugar, 350, which is the temperature you heat your oven, all of the different things that come together to make a chocolate chip cookie recipe.
But I think that it's really, really important for us as SEOs to understand that salience is the future of related keywords. We're beyond the time when to optimize for chocolate chip cookie recipe, we would also be looking for things like chocolate recipe, chocolate chips, chocolate cookie recipe, things like that. Stems, variants, TF-IDF, these are all older methodologies for understanding what a piece of content is about.
Instead what we need to understand is what are the entities that Google, using its vast body of knowledge, using things like Freebase, using large portions of the internet, where is Google seeing these entities co-occur at such a rate that they feel reasonably confident that a piece of content on one entity in order to be salient to that entity would include these other entities?
Using an expert is the best way to create content that's salient to a topic
So chocolate chip cookie recipe, we're now also making sure we're adding things like butter, flour, sugar. This is actually really easy to do if you actually have a chocolate chip cookie recipe to put up there. This is I think what we're going to start seeing as a content trend in SEO is that the best way to create content that is salient to a topic is to have an actual expert in that topic create that content.
Somebody with deep knowledge of a topic is naturally going to include co-occurring terms, because they know how to create something that's about what it's supposed to be about. I think what we're going to start seeing is that people are going to have to start paying more for content marketing, frankly. Unfortunately, a lot of companies seem to think that content marketing is and should be cheap.
Content marketers, I feel you on that. It sucks, and it's no longer the case. We need to start investing in content and investing in experts to create that content so that they can create that deep, rich, salient content that everybody really needs. 
How can you use this API to improve your own SEO? 
One of the things that I like to do with this kind of information is look at — and this is something that I've done for years, just not in this context — but a prime optimization target in general is pages that rank for a topic, but they rank on page 2.
What this often means is that Google understands that that keyword is a topic of the page, but it doesn't necessarily understand that it is a good piece of content on that topic, that the page is actually solely about that content, that it's a good resource. In other words, the signal is there, but it's weak.
What you can do is take content that ranks but not well, run it through this natural language API or another natural language processing tool, and look at how the entities are extracted and how Google is determining that they're related to each other. Sometimes it might be that you need to do some disambiguation. So in this example, you'll notice that while chocolate cookies is called a work of art, and I agree, cookie here is actually called other.
This is because cookie means more than one thing. There's cookies, the baked good, but then there's also cookies, the packet of data. Both of those are legitimate uses of the word "cookie." Words have multiple meanings. If you notice that Google, that this natural language processing API is having trouble correctly classifying your entities, that's a good time to go in and do some disambiguation.
Make sure that the terms surrounding that term are clearly saying, "No, I mean the baked good, not the software piece of data." That's a really great way to kind of bump up your salience. Look at whether or not you have a strong salience score for your primary entity. You'd be amazed at how many pieces of content you can plug into this tool and the top, most salient entity is still only like a 0.01, a 0.14.
A lot of times the API is like "I think this is what it's about," but it's not sure. This is a great time to go in and bump up that content, make it more robust, and look at ways that you can make those entities easier to both extract and to relate to each other. This brings me to my second point, which is my new favorite thing in the world.
Writing for humans and writing for machines, you can now do both at the same time. You no longer have to, and you really haven't had to do this in a long time, but the idea that you might keyword stuff or otherwise create content for Google that your users might not see or care about is way, way, way over.
Now you can create content for Google that also is better for users, because the tenets of machine readability and human readability are moving closer and closer together. 
Tips for writing for human and machine readability:
What I've done here is I did some research not on natural language processing, but on writing for human readability, that is advice from writers, from writing experts on how to write better, clearer, easier to read, easier to understand content. Then I pulled out the pieces of advice that also work as pieces of advice for writing for natural language processing. So natural language processing, again, is the process by which Google or really anything that might be processing language tries to understand how entities are related to each other within a given body of content.
Short, simple sentences
Short, simple sentences. Write simply. Don't use a lot of flowery language. Short sentences and try to keep it to one idea per sentence. 
One idea per sentence
If you're running on, if you've got a lot of different clauses, if you're using a lot of pronouns and it's becoming confusing what you're talking about, that's not great for readers.
It also makes it harder for machines to parse your content. 
Connect questions to answers
Then closely connecting questions to answers. So don't say, "What is the best temperature to bake cookies? Well, let me tell you a story about my grandmother and my childhood," and 500 words later here's the answer. Connect questions to answers. 
What all three of those readability tips have in common is they boil down to reducing the semantic distance between entities.
If you want natural language processing to understand that two entities in your content are closely related, move them closer together in the sentence. Move the words closer together. Reduce the clutter, reduce the fluff, reduce the number of semantic hops that a robot might have to take between one entity and another to understand the relationship, and you've now created content that is more readable because it's shorter and easier to skim, but also easier for a robot to parse and understand.
Be specific first, then explain nuance
Going back to the example of "What is the best temperature to bake chocolate chip cookies at?" Now the real answer to what is the best temperature to bake chocolate chip cookies is it depends. Hello. Hi, I'm an SEO, and I just answered a question with it depends. It does depend.
That is true, and that is real, but it is not a good answer. It is also not the kind of thing that a robot could extract and reproduce in, for example, voice search or a featured snippet. If somebody says, "Okay, Google, what is a good temperature to bake cookies at?" and Google says, "It depends," that helps nobody even though it's true. So in order to write for both machine and human readability, be specific first and then you can explain nuance.
Then you can go into the details. So a better, just as correct answer to "What is the temperature to bake chocolate chip cookies?" is the best temperature to bake chocolate chip cookies is usually between 325 and 425 degrees, depending on your altitude and how crisp you like your cookie. That is just as true as it depends and, in fact, means the same thing as it depends, but it's a lot more specific.
It's a lot more precise. It uses real numbers. It provides a real answer. I've shortened the distance between the question and the answer. I didn't say it depends first. I said it depends at the end. That's the kind of thing that you can do to improve readability and understanding for both humans and machines.
Get to the point (don't bury the lede)
Get to the point. Don't bury the lede. All of you journalists who tried to become content marketers, and then everybody in content marketing said, "Oh, you need to wait till the end to get to your point or they won't read the whole thing," and you were like, "Don't bury the lede," you are correct. For those of you who aren't familiar with journalism speak, not burying the lede basically means get to the point upfront, at the top.
Include all the information that somebody would really need to get from that piece of content. If they don't read anything else, they read that one paragraph and they've gotten the gist. Then people who want to go deep can go deep. That's how people actually like to consume content, and surprisingly it doesn't mean they won't read the content. It just means they don't have to read it if they don't have time, if they need a quick answer.
The same is true with machines. Get to the point upfront. Make it clear right away what the primary entity, the primary topic, the primary focus of your content is and then get into the details. You'll have a much better structured piece of content that's easier to parse on all sides. 
Avoid jargon and "marketing speak"
Avoid jargon. Avoid marketing speak. It's terrible and very hard to understand, and you see it a lot. I'm going back again to the example of getting your clients to say what their products do. If you work with a lot of B2B companies, you will often run into this. Yes, but what does it do? It provides solutions to streamline the workflow and blah, blah. Okay, what does it do? This is the kind of thing that can be really, really hard for companies to get out of their own heads about, but it's so important for users, for machines.
Avoid jargon. Avoid marketing speak. Not to get too tautological, but the more esoteric a word is, the less commonly it's used. That's actually what esoteric means. What that means is the less commonly a word is used, the less likely it is that Google is going to understand its semantic relationships to other entities.
Keep it simple. Be specific. Say what you mean. Wipe out all of the jargon. By wiping out jargon and kind of marketing speak and kind of the fluff that can happen in your content, you're also, once again, reducing the semantic distances between entities, making them easier to parse. 
Organize your information to match the user journey
Organize it and map it out to the user journey. Think about the information somebody might need and the order in which they might need it. 
Break out subtopics with headings
Then break it out with subheadings. This is like very, very basic writing advice, and yet you all aren't doing it. So if you're not going to do it for your users, do it for machines. 
Format lists with bullets or numbers
You can also really impact skimmability for users by breaking out lists with bullets or numbers.
The great thing about that is that breaking out a list with bullets or numbers also makes information easier for a robot to parse and extract. If a lot of these tips seem like they're the same tips that you would use to get featured snippets, they are, because featured snippets are actually a pretty good indicator that you're creating content that a robot can find, parse, understand, and extract, and that's what you want.
So if you're targeting featured snippets, you're probably already doing a lot of these things, good job. 
Grammar and spelling count!
The last thing, which I shouldn't have to say, but I'm going to say is that grammar and spelling and punctuation and things like that absolutely do count. They count to users. They don't count to all users, but they count to users. They also count to search engines.
Things like grammar, spelling, and punctuation are very, very easy signals for a machine to find and parse. Google has been specific in things like the "Quality Rater Guidelines" that a well-written, well-structured, well-spelled, grammatically correct document shows signs of authoritativeness. I'm not saying that having a greatly spelled document is going to mean that you immediately rocket to the top of the results.
I am saying that if you're not on that stuff, it's probably going to hurt you. So take the time to make sure everything is nice and tidy. You can use vernacular English. You don't have to be perfect "AP Style Guide" all the time. But make sure that you are formatting things properly from a grammatical standpoint as well as a technical standpoint. What I love about all of this, this is just good writing.
This is good writing. It's easy to understand. It's easy to parse. It's still so hard, especially in the marketing world, to get out of that world of jargon, to get to the point, to stop writing 2,000 words because we think we need 2,000 words, to really think about are we creating content that's about what we think it's about.
Use these tools to understand how readable, parsable, and understandable your content is
So my hope for the SEO world and for you is that you can use these tools not just to think about how to dial in the perfect keyword density or whatever to get an almost perfect score on the salience in the natural language processing API. What I'm hoping is that you will use these tools to help yourself understand how readable, how parsable, and how understandable your content is, how much your content is about what you say it's about and what you think it's about so you can create better stuff for users.
It makes the internet a better place, and it will probably make you some money as well. So these are my thoughts. I'd love to hear in the comments if you're using the natural language processing API now, if you've built a tool with it, if you want to build a tool with it, what do you think about this, how do you use this, how has it gone. Tell me all about it. Holla atcha girl.
Have a great Friday.
Video transcription by Speechpad.com
localwebmgmt · 5 years ago
Better Content Through NLP (Natural Language Processing) - Whiteboard Friday
Posted by RuthBurrReedy
Gone are the days of optimizing content solely for search engines. For modern SEO, your content needs to please both robots and humans. But how do you know that what you're writing can check the boxes for both man and machine?
In today's Whiteboard Friday, Ruth Burr Reedy focuses on part of her recent MozCon 2019 talk and teaches us all about how Google uses NLP (natural language processing) to truly understand content, plus how you can harness that knowledge to better optimize what you write for people and bots alike.
Video Transcription
Howdy, Moz fans. I'm Ruth Burr Reedy, and I am the Vice President of Strategy at UpBuild, a boutique technical marketing agency specializing in technical SEO and advanced web analytics. I recently spoke at MozCon on a basic framework for SEO and for approaching changes to our industry, one that thinks about SEO in this light: we are humans who are marketing to humans, but we are using a machine as the intermediary.
Those videos will be available online at some point. [Editor's note: that point is now!] But today I wanted to talk about one point from my talk that I found really interesting and that has kind of changed the way that I approach content creation, and that is the idea that writing content that is easier for Google, a robot, to understand can actually make you a better writer and help you write better content for humans. It is a win-win. 
The relationships between entities, words, and how people search
Google is spending a lot of time and a lot of energy and a lot of money on how it parses content and understands what content is about, on things like neural matching and natural language processing, which seek to understand basically when people talk, what are they talking about?
This goes along with the evolution of search to be more conversational. But there are a lot of times when someone is searching, but they don't totally know what they want, and Google still wants them to get what they want because that's how Google makes money. They are spending a lot of time trying to understand the relationships between entities and between words and how people use words to search.
The example that Danny Sullivan gave online, that I think is a really great example, is if someone is experiencing the soap opera effect on their TV. If you've ever seen a soap opera, you've noticed that they look kind of weird. Someone might be experiencing that, and not knowing what that's called they can't Google soap opera effect because they don't know about it.
They might search something like, "Why does my TV look funny?" Neural matching helps Google understand that when somebody is searching "Why does my TV look funny?" one possible answer might be the soap opera effect. So they can serve up that result, and people are happy. 
Understanding salience
As we're thinking about natural language processing, a core component of natural language processing is understanding salience.
Salience, content, and entities
Salience is a one-word way to sum up to what extent is this piece of content about this specific entity? At this point Google is really good at extracting entities from a piece of content. Entities are basically nouns, people, places, things, proper nouns, regular nouns.
Entities are things, people, etc., numbers, things like that. Google is really good at taking those out and saying, "Okay, here are all of the entities that are contained within this piece of content." Salience attempts to understand how they're related to each other, because what Google is really trying to understand when they're crawling a page is: What is this page about, and is this a good example of a page about this topic?
Salience really goes into the second piece. To what extent is any given entity the topic of a piece of content? It's often amazing the degree to which a piece of content that a person has created is not actually about anything. I think we've all experienced that.
You're searching and you come to a page and you're like, "This was too vague. This was too broad. This said that it was about one thing, but it was actually about something else. I didn't find what I needed. This wasn't good information for me." As marketers, we're often on the other side of that, trying to get our clients to say what their product actually does on their website or say, "I know you think that you created a guide to Instagram for the holidays. But you actually wrote one paragraph about the holidays and then seven paragraphs about your new Instagram tool. This is not actually a blog post about Instagram for the holidays. It's a piece of content about your tool." These are the kinds of battles that we fight as marketers. 
Natural Language Processing (NLP) APIs
Fortunately, there are now a number of different APIs that you can use to understand natural language processing: 
IBM has one: https://www.ibm.com/watson/services/natural-language-understanding/ 
Google actually has a natural language processing API that's right here on https://cloud.google.com/natural-language/
Is it as sophisticated as what they're using on their own stuff? Probably not. But you can test it out. Put in a piece of content and see (a) what entities Google is able to extract from it, and (b) how salient Google feels each of these entities is to the piece of content as a whole. Again, to what degree is this piece of content about this thing?
So this natural language processing API, which you can try for free and which is actually not that expensive for an API if you want to build a tool with it, will assign each entity that it can extract a salience score between 0 and 1, saying, "Okay, how sure are we that this piece of content is about this thing versus just containing it?"
So the higher or the closer you get to 1, the more confident the tool is that this piece of content is about this thing. 0.9 would be really, really good. 0.01 means it's there, but they're not sure how well it's related. 
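To make the scoring concrete, here is a minimal Python sketch that sorts extracted entities by salience. It assumes you already have the API's output as a list of entities, each with a name, a type, and a salience between 0 and 1. The helper name `rank_entities` is hypothetical, and the sample values are made up for illustration, not real API output.

```python
def rank_entities(entities):
    """Return entities sorted from most salient to least salient."""
    return sorted(entities, key=lambda e: e["salience"], reverse=True)


# Made-up sample data (assumption): shaped like a list of extracted entities.
sample_entities = [
    {"name": "butter", "type": "OTHER", "salience": 0.05},
    {"name": "chocolate chip cookies", "type": "WORK_OF_ART", "salience": 0.62},
    {"name": "sugar", "type": "OTHER", "salience": 0.04},
    {"name": "cookie", "type": "OTHER", "salience": 0.11},
]

# Print one line per entity, highest salience first.
for entity in rank_entities(sample_entities):
    print(f'{entity["salience"]:.2f}  {entity["type"]:<12} {entity["name"]}')
```

The entity at the top of that list is what the tool thinks the content is most about, so that is the first score to look at.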
A delicious example of how salience and entities work
The example I have here is not taken from a real piece of content; these numbers are made up, it's just an example. If you had a chocolate chip cookie recipe, you would want chocolate cookies or chocolate chip cookies recipe, chocolate chip cookies, something like that to be the number one entity, the most salient entity, and you would want it to have a pretty high salience score.
You would want the tool to feel pretty confident, yes, this piece of content is about this topic. But what you can also see is the other entities it's extracting and to what degree they are also salient to the topic. So you can see things like if you have a chocolate chip cookie recipe, you would expect to see things like cookie, butter, sugar, 350, which is the temperature you heat your oven, all of the different things that come together to make a chocolate chip cookie recipe.
But I think that it's really, really important for us as SEOs to understand that salience is the future of related keywords. We're beyond the time when to optimize for chocolate chip cookie recipe, we would also be looking for things like chocolate recipe, chocolate chips, chocolate cookie recipe, things like that. Stems, variants, TF-IDF, these are all older methodologies for understanding what a piece of content is about.
Instead what we need to understand is what are the entities that Google, using its vast body of knowledge, using things like Freebase, using large portions of the internet, where is Google seeing these entities co-occur at such a rate that they feel reasonably confident that a piece of content on one entity in order to be salient to that entity would include these other entities?
Using an expert is the best way to create content that's salient to a topic
So chocolate chip cookie recipe, we're now also making sure we're adding things like butter, flour, sugar. This is actually really easy to do if you actually have a chocolate chip cookie recipe to put up there. This is I think what we're going to start seeing as a content trend in SEO is that the best way to create content that is salient to a topic is to have an actual expert in that topic create that content.
Somebody with deep knowledge of a topic is naturally going to include co-occurring terms, because they know how to create something that's about what it's supposed to be about. I think what we're going to start seeing is that people are going to have to start paying more for content marketing, frankly. Unfortunately, a lot of companies seem to think that content marketing is and should be cheap.
Content marketers, I feel you on that. It sucks, and it's no longer the case. We need to start investing in content and investing in experts to create that content so that they can create that deep, rich, salient content that everybody really needs. 
How can you use this API to improve your own SEO? 
One of the things that I like to do with this kind of information is look at — and this is something that I've done for years, just not in this context — but a prime optimization target in general is pages that rank for a topic, but they rank on page 2.
What this often means is that Google understands that that keyword is a topic of the page, but it doesn't necessarily understand that it is a good piece of content on that topic, that the page is actually solely about that content, that it's a good resource. In other words, the signal is there, but it's weak.
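Finding those page 2 pages is easy to script. The sketch below assumes a rank-tracking export loaded as a list of rows with `keyword`, `url`, and `position` fields, and assumes ten results per page, so page 2 means positions 11 through 20. The field names and the function name `page_two_candidates` are hypothetical, and the rows are invented.

```python
def page_two_candidates(rankings, first=11, last=20):
    """Return rows ranking on page 2, best position first.

    Assumes ten organic results per page, so page 2 is positions 11-20.
    """
    hits = [row for row in rankings if first <= row["position"] <= last]
    return sorted(hits, key=lambda row: row["position"])


# Invented example rows (assumption), not real ranking data.
sample_rankings = [
    {"keyword": "chocolate chip cookie recipe", "url": "/cookies", "position": 14},
    {"keyword": "best baking sheets", "url": "/sheets", "position": 3},
    {"keyword": "cookie baking temperature", "url": "/temperature", "position": 11},
    {"keyword": "how to cream butter", "url": "/butter", "position": 37},
]

for row in page_two_candidates(sample_rankings):
    print(row["position"], row["keyword"], row["url"])
```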
What you can do is take content that ranks but not well, run it through this natural language API or another natural language processing tool, and look at how the entities are extracted and how Google is determining that they're related to each other. Sometimes it might be that you need to do some disambiguation. So in this example, you'll notice that while chocolate cookies is called a work of art, and I agree, cookie here is actually called other.
This is because cookie means more than one thing. There's cookies, the baked good, but then there's also cookies, the packet of data. Both of those are legitimate uses of the word "cookie." Words have multiple meanings. If you notice that Google, that this natural language processing API is having trouble correctly classifying your entities, that's a good time to go in and do some disambiguation.
Make sure that the terms surrounding that term are clearly saying, "No, I mean the baked good, not the software piece of data." That's a really great way to kind of bump up your salience. Look at whether or not you have a strong salience score for your primary entity. You'd be amazed at how many pieces of content you can plug into this tool and the top, most salient entity is still only like a 0.01, a 0.14.
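Both checks, a weak top score and an entity that got a vague classification, can be sketched in a few lines of Python. This assumes the same list-of-entities shape as before (name, type, salience). The 0.3 cutoff is an arbitrary assumption, not a threshold from Google, and treating every `OTHER` entity as a disambiguation candidate is deliberately noisy, since plenty of ordinary nouns get that type. `audit_entities` is a hypothetical helper name.

```python
def audit_entities(entities, min_top_salience=0.3):
    """Return plain-text warnings about weak or possibly ambiguous entities.

    min_top_salience is an assumed cutoff chosen for illustration only.
    """
    if not entities:
        return ["No entities were extracted."]

    warnings = []

    # Check 1: is the most salient entity actually a strong signal?
    top = max(entities, key=lambda e: e["salience"])
    if top["salience"] < min_top_salience:
        warnings.append(
            f'Top entity "{top["name"]}" has low salience ({top["salience"]:.2f}).'
        )

    # Check 2: entities typed OTHER may need clearer surrounding context.
    for entity in entities:
        if entity["type"] == "OTHER":
            warnings.append(f'"{entity["name"]}" is typed OTHER; check for ambiguity.')

    return warnings


# Made-up sample data (assumption), not real API output.
weak_page = [
    {"name": "cookie", "type": "OTHER", "salience": 0.14},
    {"name": "oven", "type": "CONSUMER_GOOD", "salience": 0.01},
]

for warning in audit_entities(weak_page):
    print(warning)
```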
A lot of times the API is like "I think this is what it's about," but it's not sure. This is a great time to go in and bump up that content, make it more robust, and look at ways that you can make those entities easier to both extract and to relate to each other. This brings me to my second point, which is my new favorite thing in the world.
Writing for humans and writing for machines, you can now do both at the same time. You no longer have to, and you really haven't had to do this in a long time, but the idea that you might keyword stuff or otherwise create content for Google that your users might not see or care about is way, way, way over.
Now you can create content for Google that also is better for users, because the tenets of machine readability and human readability are moving closer and closer together. 
Tips for writing for human and machine readability:
What I've done here is I did some research not on natural language processing, but on writing for human readability, that is advice from writers, from writing experts on how to write better, clearer, easier to read, easier to understand content. Then I pulled out the pieces of advice that also work as pieces of advice for writing for natural language processing. So natural language processing, again, is the process by which Google or really anything that might be processing language tries to understand how entities are related to each other within a given body of content.
Short, simple sentences
Short, simple sentences. Write simply. Don't use a lot of flowery language. Short sentences and try to keep it to one idea per sentence. 
One idea per sentence
If you're running on, if you've got a lot of different clauses, if you're using a lot of pronouns and it's becoming confusing what you're talking about, that's not great for readers.
It also makes it harder for machines to parse your content. 
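A quick way to spot run-ons is to flag sentences over a word limit. This is a rough sketch only: the 20-word limit is an assumption, not a rule from the talk, the sentence splitting is naive (it breaks after `.`, `!`, or `?` and will trip on abbreviations), and `long_sentences` is a hypothetical helper name.

```python
import re


def long_sentences(text, max_words=20):
    """Return the sentences in text that contain more than max_words words."""
    # Naive split: break on whitespace that follows ., ! or ?
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if len(s.split()) > max_words]


draft = (
    "Write simply. "
    "If you're running on, if you've got a lot of different clauses, if you're "
    "using a lot of pronouns and it's becoming confusing what you're talking "
    "about, that's not great for readers."
)

for sentence in long_sentences(draft):
    print(len(sentence.split()), "words:", sentence)
```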
Connect questions to answers
Then closely connecting questions to answers. So don't say, "What is the best temperature to bake cookies? Well, let me tell you a story about my grandmother and my childhood," and 500 words later here's the answer. Connect questions to answers. 
What all three of those readability tips have in common is they boil down to reducing the semantic distance between entities.
If you want natural language processing to understand that two entities in your content are closely related, move them closer together in the sentence. Move the words closer together. Reduce the clutter, reduce the fluff, reduce the number of semantic hops that a robot might have to take between one entity and another to understand the relationship, and you've now created content that is more readable because it's shorter and easier to skim, but also easier for a robot to parse and understand.
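As a very crude stand-in for semantic distance, you can count how many words sit between two terms. To be clear about the assumptions: this is not how Google or any NLP API measures relatedness, it only handles single-word terms, and `word_distance` is a hypothetical helper. It just makes the "move them closer together" idea measurable.

```python
import re


def word_distance(text, term_a, term_b):
    """Return the smallest gap, in words, between two single-word terms.

    Returns None if either term does not appear in the text.
    """
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    positions_a = [i for i, tok in enumerate(tokens) if tok == term_a.lower()]
    positions_b = [i for i, tok in enumerate(tokens) if tok == term_b.lower()]
    if not positions_a or not positions_b:
        return None
    return min(abs(a - b) for a in positions_a for b in positions_b)


# Two invented example passages: one buries the answer, one states it directly.
buried = (
    "What temperature should cookies bake at? Well, let me tell you a story "
    "about my grandmother and my childhood before I mention 350 degrees."
)
direct = "Bake cookies at 350 degrees."

print("buried:", word_distance(buried, "cookies", "350"))
print("direct:", word_distance(direct, "cookies", "350"))
```

The direct version puts the question term and the answer a couple of words apart, while the buried one makes a reader, or a robot, travel much further.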
Be specific first, then explain nuance
Going back to the example of "What is the best temperature to bake chocolate chip cookies at?" Now the real answer to what is the best temperature to bake chocolate chip cookies is it depends. Hello. Hi, I'm an SEO, and I just answered a question with it depends. It does depend.
That is true, and that is real, but it is not a good answer. It is also not the kind of thing that a robot could extract and reproduce in, for example, voice search or a featured snippet. If somebody says, "Okay, Google, what is a good temperature to bake cookies at?" and Google says, "It depends," that helps nobody even though it's true. So in order to write for both machine and human readability, be specific first and then you can explain nuance.
Then you can go into the details. So a better, just as correct answer to "What is the temperature to bake chocolate chip cookies?" is the best temperature to bake chocolate chip cookies is usually between 325 and 425 degrees, depending on your altitude and how crisp you like your cookie. That is just as true as it depends and, in fact, means the same thing as it depends, but it's a lot more specific.
It's a lot more precise. It uses real numbers. It provides a real answer. I've shortened the distance between the question and the answer. I didn't say it depends first. I said it depends at the end. That's the kind of thing that you can do to improve readability and understanding for both humans and machines.
Get to the point (don't bury the lede)
Get to the point. Don't bury the lede. All of you journalists who tried to become content marketers, and then everybody in content marketing said, "Oh, you need to wait till the end to get to your point or they won't read the whole thing," and you were like, "Don't bury the lede," you are correct. For those of you who aren't familiar with journalism speak, not burying the lede basically means get to the point upfront, at the top.
Include all the information that somebody would really need to get from that piece of content. If they don't read anything else, they read that one paragraph and they've gotten the gist. Then people who want to go deep can go deep. That's how people actually like to consume content, and surprisingly it doesn't mean they won't read the content. It just means they don't have to read it if they don't have time, if they need a quick answer.
The same is true with machines. Get to the point upfront. Make it clear right away what the primary entity, the primary topic, the primary focus of your content is and then get into the details. You'll have a much better structured piece of content that's easier to parse on all sides. 
Avoid jargon and "marketing speak"
Avoid jargon. Avoid marketing speak. It's terrible and very hard to understand, and you see it a lot. I'm going back again to the example of getting your clients to say what their products do. If you work with a lot of B2B companies, you will often run into this. Yes, but what does it do? It provides solutions to streamline the workflow and blah, blah. Okay, what does it do? This is the kind of thing that can be really, really hard for companies to get out of their own heads about, but it's so important for users, for machines.
Avoid jargon. Avoid marketing speak. Not to get too tautological, but the more esoteric a word is, the less commonly it's used. That's actually what esoteric means. What that means is the less commonly a word is used, the less likely it is that Google is going to understand its semantic relationships to other entities.
Keep it simple. Be specific. Say what you mean. Wipe out all of the jargon. By wiping out jargon and kind of marketing speak and kind of the fluff that can happen in your content, you're also, once again, reducing the semantic distances between entities, making them easier to parse. 
Organize your information to match the user journey
Organize it and map it out to the user journey. Think about the information somebody might need and the order in which they might need it. 
Break out subtopics with headings
Then break it out with subheadings. This is like very, very basic writing advice, and yet you all aren't doing it. So if you're not going to do it for your users, do it for machines. 
Format lists with bullets or numbers
You can also really impact skimmability for users by breaking out lists with bullets or numbers.
The great thing about that is that breaking out a list with bullets or numbers also makes information easier for a robot to parse and extract. If a lot of these tips seem like they're the same tips that you would use to get featured snippets, they are, because featured snippets are actually a pretty good indicator that you're creating content that a robot can find, parse, understand, and extract, and that's what you want.
So if you're targeting featured snippets, you're probably already doing a lot of these things, good job. 
Grammar and spelling count!
The last thing, which I shouldn't have to say but I'm going to say, is that grammar and spelling and punctuation and things like that absolutely do count. They count to users. They don't count to all users, but they count to users. They also count to search engines.
Things like grammar, spelling, and punctuation are very, very easy signals for a machine to find and parse. Google has been specific in things like the "Quality Rater Guidelines" that a well-written, well-structured, well-spelled, grammatically correct document shows signs of authoritativeness. I'm not saying that having a perfectly spelled document is going to mean that you immediately rocket to the top of the results.
I am saying that if you're not on that stuff, it's probably going to hurt you. So take the time to make sure everything is nice and tidy. You can use vernacular English. You don't have to be perfect "AP Style Guide" all the time. But make sure that you are formatting things properly from a grammatical standpoint as well as a technical standpoint. What I love about all of this, this is just good writing.
This is good writing. It's easy to understand. It's easy to parse. It's still so hard, especially in the marketing world, to get out of that world of jargon, to get to the point, to stop writing 2,000 words because we think we need 2,000 words, to really think about are we creating content that's about what we think it's about.
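For a rough sanity check on "easy to parse," here is a small sketch that counts sentences and words and reports average sentence length and the share of long words. This is a crude heuristic I'm assuming for illustration. It is not how Google scores content, and the 10-letter cutoff for a "long word" is arbitrary.

```python
import re

def readability_stats(text):
    """Crude readability numbers: sentence count, word count, and two ratios."""
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    # Arbitrary cutoff: treat words of 10+ letters as "long".
    long_words = [w for w in words if len(w) >= 10]
    return {
        "sentences": len(sentences),
        "words": len(words),
        "avg_words_per_sentence": len(words) / len(sentences) if sentences else 0.0,
        "long_word_share": len(long_words) / len(words) if words else 0.0,
    }

stats = readability_stats("Keep it simple. Be specific. Say what you mean.")
print(stats)
```

Run it on a page before and after you edit. If the average sentence length and the long-word share both drop, you're probably moving in the right direction.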
Use these tools to understand how readable, parsable, and understandable your content is
So my hope for the SEO world and for you is that you can use these tools not just to think about how to dial in the perfect keyword density or whatever to get an almost perfect score on the salience in the natural language processing API. What I'm hoping is that you will use these tools to help yourself understand how readable, how parsable, and how understandable your content is, how much your content is about what you say it's about and what you think it's about so you can create better stuff for users.
It makes the internet a better place, and it will probably make you some money as well. So these are my thoughts. I'd love to hear in the comments if you're using the natural language processing API now, if you've built a tool with it, if you want to build a tool with it, what do you think about this, how do you use this, how has it gone. Tell me all about it. Holla atcha girl.
Have a great Friday.
Video transcription by Speechpad.com