#//this goes both for the spambots AND actual people
Explore tagged Tumblr posts
mechahero · 1 year ago
Text
//Trying to look through tags only to be hit with pictures of people's chests and junk
youtube
9 notes · View notes
curmemini · 2 years ago
Note
The most bizarre thing for me about the whole AO3 anti-censorship situation is how people are so against the removal of dangerously problematic fan fiction… but like, if it were any other medium of fandom content being problematic (ex. Racist fanart) those same people would be tripping over themselves to call out the artist and act outraged at how offensive and abhorrent the content was.
Like, clearly these people comprehend that there is a line that art can cross where its artistic value doesn’t outweigh the problematic content that it contains, but as soon as people bring up fan fiction in that regard… they conveniently forget? 🤔
So true Bestie Anon. Fanfiction and Fanart are both treated extremely differently in fandom spaces to begin with, on far less serious issues than racism and abuse of minors (like commissions and audience building). Which does have its roots in early fandom site-history... but times have changed now and it's stupid to keep clutching the same pearls of change for fear of a total site wipe. The cognitive dissonance here is a little... wow. In part, I'm wondering if it is a difference between those who put themselves into a narrative versus those who are passive outsiders when reading. The written word pulls people in in a completely different way than art does, after all, and by its nature it encourages people to spend time within the world of the fic... and where fanart does the same, I think, it is also treated as more... it is easy to scroll past fanart that doesn't engage you right away, and engaging with a piece of art beyond the act of looking at it is a choice. Consuming fanfic requires you to sit down with it. That's a tangential thought though I guess. Excuse me, I'm excited, it's been such a long time since I had an ask that wasn't a spambot. Anyhow, I firmly believe there is a line where a n y art can cross the line into becoming worthless. The eroticization of minors is far past that line. It is easier for people with half a brain to reconize a child is a child when they are looking at them, whereas the concept of the word of a child is ambiguous and can be "correct" if they try to jump through enough hoops. I'd imagine the same goes for the trinity of -isms (racism, sexism, homophobism (joke)) where racism in fanart gets called out when its otherwise ignored in fanfic because there is an actual representation that is going on. It's like whatever. Some of these people are past the point of needing to touch grass and need to eat some.
1 note · View note
tearlessrain · 4 years ago
Text
OC Interview Meme
Tagged by @mercurypilgrim... a while ago (thanks! I wasn’t ignoring this I just left it in my drafts and forgot about it. shoutout to the ray bans spambot that made me check my mentions and remember)
Pick three companions who know your OC/muse well. Answer the questions from at least one of their companions points of view. Replace anywhere it says Khatte with your OC’s name. Name the three companions who will be answering here:
1. Kaliyo
2. Vector
3. Dr. Lokin
Are they ready to be candid with their responses? Don’t worry, this is totally private. Khatte will never read it.
1. First Impressions. What was the first impression you had of Khatte?
Kaliyo: “I thought he was a notorious pirate who'd be fun to mess around with for a while. Yeah.”
Vector: “His song was... discordant, but he was more courteous to us than he needed to be. The phosphorescence in the caverns caught in the ends of his fur like unfamiliar constellations.”
Lokin: “My first impression was that he was a reckless fool who would be dead before the year was out. I was wrong about the last one, at least.”
2. Khatte walks into a bar. No, it’s not a joke - what does he order? If you give him a credit for the jukebox, what kind of music would he put on?
Kaliyo: “Usually he just sits there with his virgin whatever and acts drunk so people will tell him stuff. If he is drinking... he's all about efficiency, like every other aspect of his life. And I’m not letting him near the jukebox, I don’t know where he got his taste in music but it makes whatever room he’s in feel like some kind of stuffy cocktail lounge.”
Vector: “If he isn’t there for work, most likely whatever’s strongest. He has a certain lack of regard for both taste and his own health that we find concerning. We all have our own ways of living with ourselves. We’ve attempted to introduce him to the concept of moderation – and quality – but the results have been... mixed.”
Lokin: “He can hold a very intelligent conversation about liquor, but in actual practice he doesn’t drink it for the taste. As his doctor I really can’t endorse it, considering how colorful his medical charts are already, but I’ve seen agents develop far worse coping mechanisms. As for music, he has surprisingly old-fashioned tastes. I’ve heard music while living aboard his ship that I hadn’t heard in decades. It always drove Kaliyo up the wall, of course, and I'm certain he plays it on purpose sometimes to keep her away.”
3. How does Khatte spend a day off from work?
Kaliyo: “Off work? He doesn’t take days off work, if he looks like he’s not doing anything important that means you should be running in the other direction.”
Vector: “At death’s door, presumably. We can’t think of anything else that would make him stop working that long. If he has a few spare hours he has a habit of disappearing, getting himself spectacularly drunk, and returning with obvious injuries that he somehow expects nobody to notice. He does manage to make time for us, though, and we appreciate the effort. Particularly since we do not share his appreciation for colorful nightlife.”
Lokin: “He gets restless with too much free time; if he doesn’t have work to do he’ll find it or make it. If he’s not doing that he’ll be wandering around the less reputable sectors of the nearest populated planet to find new and inventive ways to get himself killed. It’s none of my business, really, but it runs through our kolto supplies and I do wish he’d just take up knitting.”
4. What silly superstitions or funny traditions does he observe?
Kaliyo: “He doesn’t really have superstitions. Just a lot of paranoia.”
Vector: “We have never seen him sit with his back to a door, though we wouldn’t call it superstition so much as very well-ingrained habit.”
Lokin: “He goes out of his way to be polite to the maintenance droid, of all things. Sometimes I catch him talking to it in the middle of the night. I’m sure he’d stop if he realized anyone knew, so I’ve never said anything about it. No idea what he says to it, but it seems a touch less nervous around him than Seneschal-series droids usually are, so perhaps there’s something to it.”
5. What does Khatte wear to bed? And just how do you know that?
Kaliyo: “I don’t know what happens if he actually makes it to a bed and I don’t care, found him passed out on the floor of the main cabin in leather pants once, though.”
Vector: “He doesn’t, and he has both strong opinions and a complete lack of shame about it. We suspect the only reason he wears clothes while awake is the risk of arrest on most civilized worlds, and the risk of mutiny aboard the ship.”
Lokin: “When you live with so many personalities on a crowded ship, it’s usually best to know as little about one’s crewmates’ personal habits as possible. That being said, I regret to confirm that he sleeps nude, and he’s not the type to waste time pulling on trousers in the event of an emergency alert.”
6. Your favorite memory of Khatte?
Kaliyo: “Ooh, how do I pick? Maybe when I found out he was lying about who he was and I got shanghaied into working for the Empire? Or the time he suddenly grew a moral compass for two minutes and let that double-crossing ‘friend’ of mine slip out from under me? No, wait, definitely the time the hairball almost killed me and started treating me like some kind of prisoner because I got double-crossed by the Wheezer. Fun times. I’m thrilled to be working with him again. Really.”
Vector: “We can pinpoint the exact moment he realized that we wouldn’t leave him for someone... easier to love, as he puts it. It took him a very long time to believe that.”
Lokin: “You know, I find myself missing those ridiculous conversations we used to have. It was only a security measure at the time, but there are very few people in this galaxy I can tolerate, let alone enjoy talking to. I really should find some time to catch up with him again.”
7. A time you very nearly almost kissed Khatte?
Kaliyo: “Way back on Hutta, but unlike the entire rest of the planet’s population, probably including the Hutts, I apparently don’t meet his standards. He has awful taste, his loss.”
Vector: “‘Almost’ is a somewhat leading phrase, but still apt, in his case. He has... a complicated relationship with physical intimacy.”
Kaliyo: “Right, I also rank below ‘creepy bug guy’ somehow. Love it.”
Lokin: “Ah... no. For several reasons.”
8. Vacation time! Where do you take Khatte for some R&R?
Kaliyo: “Look, every time I try to take him somewhere fun he ends up yelling at me. Someone else can deal with it.”
Vector: “We always regretted that he was unable to join us on our pilgrimage to seek out the lost colonies, all those years ago. He has asked us about it often, but we can only describe what our own senses told us. It would be impossible for us to know an environment by scent in as much detail as he could, or imagine the feeling of steam vents against fur, instead of skin. We would like to retrace our path one day, with him at our side.”
Lokin: “I suspect I’d have to sedate him to get him to relax around me. He’s got better survival instincts than that. Not that I’m any threat to him at the moment, of course, but he certainly knows that I could be. Perhaps I’d take him tea-tasting, just to see how long he would put up with it.”
9. Khatte’s sense of humor – is it dry, immature, sarcastic, self-deprecating, physical, witty, dark, or…?
Kaliyo: “He’s fast with it, really good at the whole charming witty banter thing, but there’s nothing behind it, you know? It’s a front. One second he’s all smiles and quips and flirting, the next he’s standing over a dead body with a smoking blaster in his hand like it’s nothing. It’s eerie.”
Vector: “We appreciate his wit, most of the time, but we’re fairly certain his sarcasm could peel paint off the hull of a starship if he feels threatened. It’s best not to let him get to that point, unless you want to be very accurately insulted.”
Lokin: “Even when I had an exceptionally low opinion of him I appreciated his talent for wordplay, but you’d never believe it if you met him in a cantina. I’ve seen him hold his own against half a squadron of drunk soldiers. It was appalling. Still, adaptability is key in this line of work and he does have that. But yes, I wonder if the Commander has given any thought to simply letting Cipher insult the Eternal Fleet to death. It might be worth a try.”
-
Tagging literally any mutual with SWTOR OCs and time on their hands, all of you, go nuts. Even if we’re not mutuals, just say I tagged you if you want to do it.
3 notes · View notes
hub-pub-bub · 6 years ago
Link
Putting together an author website can be slightly overwhelming. There are so many other sites to compare yourself to, so many things you don’t want to forget, and how often are you supposed to update it? How do you even design it? What do author websites need? Where do you find the time? And do you really need a blog? (You don’t.)
Whether your website is sleek like Ashley Poston‘s, intricately illustrated like Roshani Chokshi’s, or merely functional like Suzanne Collins‘s, here are five things that that every author needs to have on their author website.
Because apparently my own house is out of order I'm gonna say it here. If you're a writer you need to have a website. Here's what goes on it: Your name Your beautiful face A list of books you done wroted A bio, shorter is better An email contact form Newsletter signup THAT'S IT
— DongWon (@dongwon)
March 29, 2018
1. Contact information
Even the largest, most successful author needs contact information on their website, yet often its buried in hard-to-find places or just not included at all. Contact information—usually in the form of an email address—ensures that booksellers, press, librarians and fans have a way of contacting you.
Authors who feel nervous or unsafe about giving public access to an email address—a particular issue for authors of color who get an exceptional amount of hatemail for just existing—can make a separate email just for authorial duties (either through your website host or a service like Gmail) or they can list the emails of their publicist, literary agent, or author assistant as a buffer.
While many authors use contact forms, and they certainly do cut down on spam, I personally find them difficult to navigate, and its almost impossible to follow up on an email sent through a generic contact form. My preference is just a listed email address, even if its broken up in a way that spambots can’t decipher.
Looking for an example to model your contact page after? Try Zoraida Córdova’s!
2. Links to your social media
If a fan Googles your name, what will be the first two things to come up? Odds are that it’s your website and your Twitter handle. But there’s more social media sites than Twitter, and they’re much more difficult to find via Google, especially if your usernames are all different.
Depending on your layout, social media links can be featured prominently on a landing page, along the sidebar of blog posts, besides a search function if you use a blog, or listed on the about or contact pages. Much like your contact page, they should be easy to find. Angie Thomas’s website is a great example: she features them prominently on the landing page and along the header of every other page.
Remember to use alt-tags to label each site if you choose to hyperlink logos for fans who may have a hard time “reading” images.
3. Newsletter sign-up form
Unlike blogs or social media, newsletters are more guaranteed to be read (as they’re sitting in an email inbox) and fans who sign up for a newsletter are way more likely to buy a book. Authors who don’t have a newsletter should get one – and any author who does needs a prominent way to have fans sign up on their website.
Newsletter sign-ups, much like social media links, can be displayed on your landing page, in a footer, or along a sidebar. My personal favorite method is to include a newsletter sign-up on your landing page and then link to the sign-up page in the menu that’s featured across the site. While more subtle of an approach for those who skin the landing page, it’s statistically more likely to rope you newsletter subscribers that will turn into actual viable sales. (Who knew?)
Susan Dennard implements both a landing page and a pop-up (not my preferred method, but an effective one) to grab newsletter subscribers on her website. Her newsletter doubles as both a writer’s newsletter and her author newsletter, however, so she markets hers a little bit differently; Kate Elliott’s sign-up form is another example that works.
4. Updated information about your book
‘Updated’ is the key word here, especially when it comes to soon-to-release books. Do you have covers? Summaries? Blurbs? Release dates? Order links? Order links to all available places to buy and not just Amazon? A link to the Goodreads page? Links to any important coverage or reviews?
Keeping your book pages updated are vital for author websites, as finding out more about a book is the main reason people go to websites – but so often, especially pre-publication, it can be easy to forget to keep them updated in the whirlwind of other things that need to be done. Immediately updating any new covers, release dates, and ordering information is a must—you don’t want your own site to be behind on the times!—but, unless there’s another major update, try to make sure your information is up to date a) once a month pre-pub and b) once every few months post-pub.
Victoria Schwab’s website includes a book page with information about all her available titles and encourages fans to buy signed copies from her local independent bookstore, so I’m particularly fond of it.
5. A bio—preferably with a press kit
Whether it’s a blogger, a bookstagrammer, a member of the press, or an eager fan, having a press kit on your website guarantees a place where information on you can be found and used on other sites. (Especially if you include it in an easy-to-find location like your “about” page.)
Press kits should include everything on your book page—summaries, order links, blurbs, release dates—and all of your social media links along with an author bio and downloadable high quality images of your cover(s) and author photo. I recommend having the text available in a .doc or .rtf downloadable file rather than a .pdf for ease of the person downloading, along with the images a downloadable .jpgs. You should also include all of your contact information, including your publicist’s email.
Beth Revis has a beautiful press kit on her website that’s easy-to-find on her contact page.
While author websites can certainly include more than what’s mentioned above, those are five musts for any author’s website. Getting that information up and keeping it updated will keep your readers happy—and keep them coming back for more.
Nicole Brinkley
Tumblr media
Nicole Brinkley is a bookseller who loves dragons and plants. The rest changes without notice. Learn more.
2 notes · View notes
tamphong2888 · 6 years ago
Text
Uncovering SEO Opportunities via Log Files
Posted by RobinRozhon
I use web crawlers on a daily basis. While they are very useful, they only imitate search engine crawlers’ behavior, which means you aren’t always getting the full picture.
The only tool that can give you a real overview of how search engines crawl your site are log files. Despite this, many people are still obsessed with crawl budget — the number of URLs Googlebot can and wants to crawl.
Log file analysis may discover URLs on your site that you had no idea about but that search engines are crawling anyway — a major waste of Google server resources (Google Webmaster Blog):
“Wasting server resources on pages like these will drain crawl activity from pages that do actually have value, which may cause a significant delay in discovering great content on a site.”
While it’s a fascinating topic, the fact is that most sites don’t need to worry that much about crawl budget —an observation shared by John Mueller (Webmaster Trends Analyst at Google) quite a few times already.
There’s still a huge value in analyzing logs produced from those crawls, though. It will show what pages Google is crawling and if anything needs to be fixed.
When you know exactly what your log files are telling you, you’ll gain valuable insights about how Google crawls and views your site, which means you can optimize for this data to increase traffic. And the bigger the site, the greater the impact fixing these issues will have.
What are server logs?
A log file is a recording of everything that goes in and out of a server. Think of it as a ledger of requests made by crawlers and real users. You can see exactly what resources Google is crawling on your site.
You can also see what errors need your attention. For instance, one of the issues we uncovered with our analysis was that our CMS created two URLs for each page and Google discovered both. This led to duplicate content issues because two URLs with the same content was competing against each other.
Analyzing logs is not rocket science — the logic is the same as when working with tables in Excel or Google Sheets. The hardest part is getting access to them — exporting and filtering that data.
Looking at a log file for the first time may also feel somewhat daunting because when you open one, you see something like this:
Calm down and take a closer look at a single line:
66.249.65.107 - - [08/Dec/2017:04:54:20 -0400] "GET /contact/ HTTP/1.1" 200 11179 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
You’ll quickly recognize that:
66.249.65.107 is the IP address (who)
[08/Dec/2017:04:54:20 -0400] is the Timestamp (when)
GET is the Method
/contact/ is the Requested URL (what)
200 is the Status Code (result)
11179 is the Bytes Transferred (size)
“-” is the Referrer URL (source) — it’s empty because this request was made by a crawler
Mozilla/5.0 (compatible; Googlebot/2.1; +http://bit.ly/eSXNch) is the User Agent (signature) — this is user agent of Googlebot (Desktop)
Once you know what each line is composed of, it’s not so scary. It’s just a lot of information. But that’s where the next step comes in handy.
Tools you can use
There are many tools you can choose from that will help you analyze your log files. I won’t give you a full run-down of available ones, but it’s important to know the difference between static and real-time tools.
Static — This only analyzes a static file. You can’t extend the time frame. Want to analyze another period? You need to request a new log file. My favourite tool for analyzing static log files is Power BI.
Real-time — Gives you direct access to logs. I really like open source ELK Stack (Elasticsearch, Logstash, and Kibana). It takes a moderate effort to implement it but once the stack is ready, it allows me changing the time frame based on my needs without needing to contact our developers.
Start analyzing
Don’t just dive into logs with a hope to find something — start asking questions. If you don’t formulate your questions at the beginning, you will end up in a rabbit hole with no direction and no real insights.
Here are a few samples of questions I use at the start of my analysis:
Which search engines crawl my website?
Which URLs are crawled most often?
Which content types are crawled most often?
Which status codes are returned?
If you see that Google is crawling non-existing pages (404), you can start asking which of those requested URLs return 404 status code.
Order the list by the number of requests, evaluate the ones with the highest number to find the pages with the highest priority (the more requests, the higher priority), and consider whether to redirect that URL or do any other action.
If you use a CDN or cache server, you need to get that data as well to get the full picture.
Segment your data
Grouping data into segments provides aggregate numbers that give you the big picture. This makes it easier to spot trends you might have missed by looking only at individual URLs. You can locate problematic sections and drill down if needed.
There are various ways to group URLs:
Group by content type (single product pages vs. category pages)
Group by language (English pages vs. French pages)
Group by storefront (Canadian store vs. US store)
Group by file format (JS vs. images vs. CSS)
Don’t forget to slice your data by user-agent. Looking at Google Desktop, Google Smartphone, and Bing all together won’t surface any useful insights.
Monitor behavior changes over time
Your site changes over time, which means so will crawlers’ behavior. Googlebot often decreases or increases the crawl rate based on factors such as a page’s speed, internal link structure, and the existence of crawl traps.
It’s a good idea to check in with your log files throughout the year or when executing website changes. I look at logs almost on a weekly basis when releasing significant changes for large websites.
By analyzing server logs twice a year, at the very least, you’ll surface changes in crawler’s behavior.
Watch for spoofing
Spambots and scrapers don’t like being blocked, so they may fake their identity — they leverage Googlebot’s user agent to avoid spam filters.
To verify if a web crawler accessing your server really is Googlebot, you can run a reverse DNS lookup and then a forward DNS lookup. More on this topic can be found in Google Webmaster Help Center.
Merge logs with other data sources
While it’s no necessary to connect to other data sources, doing so will unlock another level of insight and context that regular log analysis might not be able to give you. An ability to easily connect multiple datasets and extract insights from them is the main reason why Power BI is my tool of choice, but you can use any tool that you’re familiar with (e.g. Tableau).
Blend server logs with multiple other sources such as Google Analytics data, keyword ranking, sitemaps, crawl data, and start asking questions like:
What pages are not included in the sitemap.xml but are crawled extensively?
What pages are included in the Sitemap.xml file but are not crawled?
Are revenue-driving pages crawled often?
Is the majority of crawled pages indexable?
You may be surprised by the insights you’ll uncover that can help strengthen your SEO strategy. For instance, discovering that almost 70 percent of Googlebot requests are for pages that are not indexable is an insight you can act on.
You can see more examples of blending log files with other data sources in my post about advanced log analysis.
Use logs to debug Google Analytics
Don’t think of server logs as just another SEO tool. Logs are also an invaluable source of information that can help pinpoint technical errors before they become a larger problem.
Last year, Google Analytics reported a drop in organic traffic for our branded search queries. But our keyword tracking tool, STAT Search Analytics, and other tools showed no movement that would have warranted the drop. So, what was going on?
Server logs helped us understand the situation: There was no real drop in traffic. It was our newly deployed WAF (Web Application Firewall) that was overriding the referrer, which caused some organic traffic to be incorrectly classified as direct traffic in Google Analytics.
Using log files in conjunction with keyword tracking in STAT helped us uncover the whole story and diagnose this issue quickly.
Putting it all together
Log analysis is a must-do, especially once you start working with large websites.
My advice is to start with segmenting data and monitoring changes over time. Once you feel ready, explore the possibilities of blending logs with your crawl data or Google Analytics. That’s where great insights are hidden.
Want more?
Ready to learn how to get cracking and tracking some more? Reach out and request a demo get your very own tailored walkthrough of STAT.
Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don't have time to hunt down but want to read!
1 note · View note
crtter · 3 years ago
Note
Enter chapter 2, where Spamton comes in! Retaking control of Kris, you and your classmate go back to the Dark World, this time in the town library’s computer lab, to save two other classmates of yours that were sucked into it. This is a different Dark World, though. It’s computer themed, and all of its denizens are anthropomorphic computer programs and accessories.
Spamton, as you probably know, looks a lot like a puppet, but he’s actually the personification of a spambot. You learn that apparently he used to be quite the prestigious car salesman (or, as he calls it and the meme goes, a “big shot”) something that’s no longer the case: he currently lives inside a dumpster in an alley and both his behavior and speech are hard to parse, especially because he can’t help but mix certain words and phrases up with bits and pieces of ads when he talks.
Spamton ends up making you agree with striking a nebulous deal with him in which he promises to make you a big shot if you retrieve something for him. If you do go along with it, it’s revealed that what he’s after is a robotic body that was hidden away in a basement. Basically, he wants you to upload his consciousness to it via floppy disk. If you do it, he becomes Spamton NEO, a much taller, more powerful robot version of himself and chapter 2’s optional boss fight.
Turns out that Spamton, just like Kris, is being controlled by somebody! (A fact that manifests itself in literal strings attached to him in NEO form) Eventually, we’re able to unlock his full story: he used to be a very unsuccessful adbot -because well, very few people buy stuff spam e-mails tell them to- but he started talking with a mysterious voice through a phone and said voice gave him advice for him to become a very successful salesman. One day, however, the voice disappeared and he lost everything he had and worse: he was no longer fully in control of his actions.
Just like Jevil, he eventually became erratic and obsessed with the idea that, by inhabiting the basement robot, he’d regain control of himself and be able to break into what he calls “heaven”, which seems to be the Light World itself. That isn’t the case, though, and through battling him he’s eventually pacified and joins Kris as a battle item, promising to lend them his strength to hopefully help them “lose their strings”, something he couldn’t do. He’s quite the tragic little dude.
Tumblr media
i’m never gonna play deltarune but i’m a secondhand spamton enjoyer can u explain why spamton turns into spamton neo?
Alright OK so! Idk how familiar you’re with like… the world of Deltarune in itself so I’ll give you a summary of it to contextualize Spamton. You probably know about Undertale through internet osmosis so I’ll skip explaining it but if you don’t, you can find a summary of it fairly easily online!
Deltarune starts as what’s seemingly a spin-off of Undertale. You control a different human kid than in Undertale, a teen this time, who’s the only human living in a town populated by monsters. Most of them are monsters you met in Undertale, including their own adoptive family (though no one seems to have any recollection of the events of Undertale)
Along with a monster classmate, you’re eventually sent to a parallel universe called Dark World by entering an abandoned classroom in their school. Dark World is populated by anthropomorphized versions of everyday objects called Darkners. Together with your classmate and a friendly Darkner, you’re tasked to seal a Dark Fountain, a geyser of dark energy that’s threatening to upset the balance between the Dark and Light worlds. Before the end of the game you can meet a very strong Darkner who acts as an optional boss fight: Jevil, a former court jester who went mad after some kind of revelation he went through and believes the world is just a game.
At the end of chapter 1 though, you’re left with a rather uneasy revelation yourself: the human you’re controlling, Kris, seems aware of YOU, the player, controlling them and is able to free themselves from you for a short while. And this is getting too long so I’m continuing the rest on a reblog.
21 notes · View notes
fyrapartnersearch · 7 years ago
Text
MERFOLK AND ALIENS
Okay, hi! I'm Mal (27, transmale, GMT, for all it matters) and I got a few various calls for RP here! Firstly, my OC stuff -
I have a real craving to revive an old plot I played a while back -
Giant fuck-off armoured spacewhale aliens arrive on earth without warning, and naturally, everyone panics. Earth military mobilised, and the aliens were driven off, big whoop, everyone celebrates, countries of the world are damaged but draw closer together as a result. (Also everyone is considerably more wary and watching the skies all the more closely.)
A couple of months later, an alien craft is captured, with a very different-looking creature. This one is human-sized and apparently more peaceable and goes into captivity without a fight. (Obviously, this is not the entire story.)
Ideally, I'd want you to play the earth scientist charged with interrogating and studying the secondary creature. (Intial language barriers and xenobiology fascination on both sides would make me so happy, please <3)
---
Also, I'm really looking for people to play in my merfolk verse! (Xenobiology notes here) I'm particularly looking for human characters, again, and I would really love someone to help me expand my idea of nomadic merfolk who travel with cetacean pods, who aren't mentioned above because I don't feel like I have enough on them, yet. Please PLEASE PLEASE come interested and asking questions!
---
I usually try to at least check in with my RP partner daily, and it would be cool if that was mutual. I don't necessarily expect daily posting, especially if you're sick or busy, or if I'm sick or busy - hell, I had one partner where WEEKS would go by between our posts and that was awesome because we'd just read back and know what we were doing - but I love talking to my RP partners and getting to know them as friends! 
Nothing would make me happier than someone who doesn't hear from me for a few days and drops a line to check that I'm still breathing. I do it all the time for my current partners, and while sometimes I've dropped a thread because I can't think of what to post/ I've run into a block, it would be much easier to get back on the proverbial equine if I knew that my partner was still interested, too! Instead of just radio silence and me having to make the first move all the time. 
Dude, I will TELL you if I don't want to play with you any more. If I've gone silent it's usually just spoons, and knowing someone else gives a shit helps A LOT. 
Also, PLEASE APPROACH ME WITH YOUR OWN PLOT BUNNIES AND IDEAS. Nothing wears me down faster than someone going 'I dunno, what do you wanna do'. This trip oughta be a Fusion, not just two dumbasses who don't know their chompers from their excretion chute XD 
Anyways, if any of that interested you and you think you can handle this hot mess, contact me at: 
or Skype: lilacsofthedead (PLEASE tell me where you found my handle in your contacting message – I have bad experiences with spambots)
I only RP via gdocs or email, but I REALLY love getting to know you over IM, and I find I actually click with people better in realtime, you know?
Good hunting!
2 notes · View notes
toothextract · 6 years ago
Text
Uncovering SEO Opportunities via Log Files
Posted by RobinRozhon
I use web crawlers on a daily basis. While they are very useful, they only imitate search engine crawlers’ behavior, which means you aren’t always getting the full picture.
The only tool that can give you a real overview of how search engines crawl your site are log files. Despite this, many people are still obsessed with crawl budget — the number of URLs Googlebot can and wants to crawl.
Log file analysis may discover URLs on your site that you had no idea about but that search engines are crawling anyway — a major waste of Google server resources (Google Webmaster Blog):
“Wasting server resources on pages like these will drain crawl activity from pages that do actually have value, which may cause a significant delay in discovering great content on a site.”
While it’s a fascinating topic, the fact is that most sites don’t need to worry that much about crawl budget —an observation shared by John Mueller (Webmaster Trends Analyst at Google) quite a few times already.
There’s still a huge value in analyzing logs produced from those crawls, though. It will show what pages Google is crawling and if anything needs to be fixed.
When you know exactly what your log files are telling you, you’ll gain valuable insights about how Google crawls and views your site, which means you can optimize for this data to increase traffic. And the bigger the site, the greater the impact fixing these issues will have.
Tumblr media
What are server logs?
A log file is a recording of everything that goes in and out of a server. Think of it as a ledger of requests made by crawlers and real users. You can see exactly what resources Google is crawling on your site.
You can also see what errors need your attention. For instance, one of the issues we uncovered with our analysis was that our CMS created two URLs for each page and Google discovered both. This led to duplicate content issues because two URLs with the same content was competing against each other.
Analyzing logs is not rocket science — the logic is the same as when working with tables in Excel or Google Sheets. The hardest part is getting access to them — exporting and filtering that data.
Looking at a log file for the first time may also feel somewhat daunting because when you open one, you see something like this:
Tumblr media
Calm down and take a closer look at a single line:
66.249.65.107 - - [08/Dec/2017:04:54:20 -0400] "GET /contact/ HTTP/1.1" 200 11179 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
You’ll quickly recognize that:
66.249.65.107 is the IP address (who)
[08/Dec/2017:04:54:20 -0400] is the Timestamp (when)
GET is the Method
/contact/ is the Requested URL (what)
200 is the Status Code (result)
11179 is the Bytes Transferred (size)
“-” is the Referrer URL (source) — it’s empty because this request was made by a crawler
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) is the User Agent (signature) — this is user agent of Googlebot (Desktop)
Once you know what each line is composed of, it’s not so scary. It’s just a lot of information. But that’s where the next step comes in handy.
Tools you can use
There are many tools you can choose from that will help you analyze your log files. I won’t give you a full run-down of available ones, but it’s important to know the difference between static and real-time tools.
Static — This only analyzes a static file. You can’t extend the time frame. Want to analyze another period? You need to request a new log file. My favourite tool for analyzing static log files is Power BI.
Real-time — Gives you direct access to logs. I really like open source ELK Stack (Elasticsearch, Logstash, and Kibana). It takes a moderate effort to implement it but once the stack is ready, it allows me changing the time frame based on my needs without needing to contact our developers.
Start analyzing
Don’t just dive into logs with a hope to find something — start asking questions. If you don’t formulate your questions at the beginning, you will end up in a rabbit hole with no direction and no real insights.
Here are a few samples of questions I use at the start of my analysis:
Which search engines crawl my website?
Which URLs are crawled most often?
Which content types are crawled most often?
Which status codes are returned?
If you see that Google is crawling non-existing pages (404), you can start asking which of those requested URLs return 404 status code.
Order the list by the number of requests, evaluate the ones with the highest number to find the pages with the highest priority (the more requests, the higher priority), and consider whether to redirect that URL or do any other action.
Tumblr media
If you use a CDN or cache server, you need to get that data as well to get the full picture.
Segment your data
Grouping data into segments provides aggregate numbers that give you the big picture. This makes it easier to spot trends you might have missed by looking only at individual URLs. You can locate problematic sections and drill down if needed.
There are various ways to group URLs:
Group by content type (single product pages vs. category pages)
Group by language (English pages vs. French pages)
Group by storefront (Canadian store vs. US store)
Group by file format (JS vs. images vs. CSS)
Don’t forget to slice your data by user-agent. Looking at Google Desktop, Google Smartphone, and Bing all together won’t surface any useful insights.
Monitor behavior changes over time
Your site changes over time, which means so will crawlers’ behavior. Googlebot often decreases or increases the crawl rate based on factors such as a page’s speed, internal link structure, and the existence of crawl traps.
It’s a good idea to check in with your log files throughout the year or when executing website changes. I look at logs almost on a weekly basis when releasing significant changes for large websites.
By analyzing server logs twice a year, at the very least, you’ll surface changes in crawler’s behavior.
Watch for spoofing
Spambots and scrapers don’t like being blocked, so they may fake their identity — they leverage Googlebot’s user agent to avoid spam filters.
To verify if a web crawler accessing your server really is Googlebot, you can run a reverse DNS lookup and then a forward DNS lookup. More on this topic can be found in Google Webmaster Help Center.
Merge logs with other data sources
While it’s no necessary to connect to other data sources, doing so will unlock another level of insight and context that regular log analysis might not be able to give you. An ability to easily connect multiple datasets and extract insights from them is the main reason why Power BI is my tool of choice, but you can use any tool that you’re familiar with (e.g. Tableau).
Tumblr media
Blend server logs with multiple other sources such as Google Analytics data, keyword ranking, sitemaps, crawl data, and start asking questions like:
What pages are not included in the sitemap.xml but are crawled extensively?
What pages are included in the Sitemap.xml file but are not crawled?
Are revenue-driving pages crawled often?
Is the majority of crawled pages indexable?
You may be surprised by the insights you’ll uncover that can help strengthen your SEO strategy. For instance, discovering that almost 70 percent of Googlebot requests are for pages that are not indexable is an insight you can act on.
Tumblr media
You can see more examples of blending log files with other data sources in my post about advanced log analysis.
Use logs to debug Google Analytics
Don’t think of server logs as just another SEO tool. Logs are also an invaluable source of information that can help pinpoint technical errors before they become a larger problem.
Last year, Google Analytics reported a drop in organic traffic for our branded search queries. But our keyword tracking tool, STAT Search Analytics, and other tools showed no movement that would have warranted the drop. So, what was going on?
Server logs helped us understand the situation: There was no real drop in traffic. It was our newly deployed WAF (Web Application Firewall) that was overriding the referrer, which caused some organic traffic to be incorrectly classified as direct traffic in Google Analytics.
Using log files in conjunction with keyword tracking in STAT helped us uncover the whole story and diagnose this issue quickly.
Putting it all together
Log analysis is a must-do, especially once you start working with large websites.
My advice is to start with segmenting data and monitoring changes over time. Once you feel ready, explore the possibilities of blending logs with your crawl data or Google Analytics. That’s where great insights are hidden.
Want more?
Ready to learn how to get cracking and tracking some more? Reach out and request a demo get your very own tailored walkthrough of STAT.
Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!
from https://dentistry01.wordpress.com/2019/01/23/uncovering-seo-opportunities-via-log-files/
0 notes
atakportal · 6 years ago
Photo
Tumblr media
New Post has been published on https://adz.cloud/2019/01/23/uncovering-seo-opportunities-via-log-files/
Uncovering SEO Opportunities via Log Files
I use web crawlers on a daily basis. While they are very useful, they only imitate search engine crawlers’ behavior, which means you aren’t always getting the full picture.
The only tool that can give you a real overview of how search engines crawl your site are log files. Despite this, many people are still obsessed with crawl budget — the number of URLs Googlebot can and wants to crawl.
Log file analysis may discover URLs on your site that you had no idea about but that search engines are crawling anyway — a major waste of Google server resources (Google Webmaster Blog):
“Wasting server resources on pages like these will drain crawl activity from pages that do actually have value, which may cause a significant delay in discovering great content on a site.”
While it’s a fascinating topic, the fact is that most sites don’t need to worry that much about crawl budget —an observation shared by John Mueller (Webmaster Trends Analyst at Google) quite a few times already.
There’s still a huge value in analyzing logs produced from those crawls, though. It will show what pages Google is crawling and if anything needs to be fixed.
When you know exactly what your log files are telling you, you’ll gain valuable insights about how Google crawls and views your site, which means you can optimize for this data to increase traffic. And the bigger the site, the greater the impact fixing these issues will have.
What are server logs?
A log file is a recording of everything that goes in and out of a server. Think of it as a ledger of requests made by crawlers and real users. You can see exactly what resources Google is crawling on your site.
You can also see what errors need your attention. For instance, one of the issues we uncovered with our analysis was that our CMS created two URLs for each page and Google discovered both. This led to duplicate content issues because two URLs with the same content was competing against each other.
Analyzing logs is not rocket science — the logic is the same as when working with tables in Excel or Google Sheets. The hardest part is getting access to them — exporting and filtering that data.
Looking at a log file for the first time may also feel somewhat daunting because when you open one, you see something like this:
Calm down and take a closer look at a single line:
66.249.65.107 - - [08/Dec/2017:04:54:20 -0400] "GET /contact/ HTTP/1.1" 200 11179 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
You’ll quickly recognize that:
66.249.65.107 is the IP address (who)
[08/Dec/2017:04:54:20 -0400] is the Timestamp (when)
GET is the Method
/contact/ is the Requested URL (what)
200 is the Status Code (result)
11179 is the Bytes Transferred (size)
“-” is the Referrer URL (source) — it’s empty because this request was made by a crawler
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) is the User Agent (signature) — this is user agent of Googlebot (Desktop)
Once you know what each line is composed of, it’s not so scary. It’s just a lot of information. But that’s where the next step comes in handy.
Tools you can use
There are many tools you can choose from that will help you analyze your log files. I won’t give you a full run-down of available ones, but it’s important to know the difference between static and real-time tools.
Static — This only analyzes a static file. You can’t extend the time frame. Want to analyze another period? You need to request a new log file. My favourite tool for analyzing static log files is Power BI.
Real-time — Gives you direct access to logs. I really like open source ELK Stack (Elasticsearch, Logstash, and Kibana). It takes a moderate effort to implement it but once the stack is ready, it allows me changing the time frame based on my needs without needing to contact our developers.
Start analyzing
Don’t just dive into logs with a hope to find something — start asking questions. If you don’t formulate your questions at the beginning, you will end up in a rabbit hole with no direction and no real insights.
Here are a few samples of questions I use at the start of my analysis:
Which search engines crawl my website?
Which URLs are crawled most often?
Which content types are crawled most often?
Which status codes are returned?
If you see that Google is crawling non-existing pages (404), you can start asking which of those requested URLs return 404 status code.
Order the list by the number of requests, evaluate the ones with the highest number to find the pages with the highest priority (the more requests, the higher priority), and consider whether to redirect that URL or do any other action.
If you use a CDN or cache server, you need to get that data as well to get the full picture.
Segment your data
Grouping data into segments provides aggregate numbers that give you the big picture. This makes it easier to spot trends you might have missed by looking only at individual URLs. You can locate problematic sections and drill down if needed.
There are various ways to group URLs:
Group by content type (single product pages vs. category pages)
Group by language (English pages vs. French pages)
Group by storefront (Canadian store vs. US store)
Group by file format (JS vs. images vs. CSS)
Don’t forget to slice your data by user-agent. Looking at Google Desktop, Google Smartphone, and Bing all together won’t surface any useful insights.
Monitor behavior changes over time
Your site changes over time, which means so will crawlers’ behavior. Googlebot often decreases or increases the crawl rate based on factors such as a page’s speed, internal link structure, and the existence of crawl traps.
It’s a good idea to check in with your log files throughout the year or when executing website changes. I look at logs almost on a weekly basis when releasing significant changes for large websites.
By analyzing server logs twice a year, at the very least, you’ll surface changes in crawler’s behavior.
Watch for spoofing
Spambots and scrapers don’t like being blocked, so they may fake their identity — they leverage Googlebot’s user agent to avoid spam filters.
To verify if a web crawler accessing your server really is Googlebot, you can run a reverse DNS lookup and then a forward DNS lookup. More on this topic can be found in Google Webmaster Help Center.
Merge logs with other data sources
While it’s no necessary to connect to other data sources, doing so will unlock another level of insight and context that regular log analysis might not be able to give you. An ability to easily connect multiple datasets and extract insights from them is the main reason why Power BI is my tool of choice, but you can use any tool that you’re familiar with (e.g. Tableau).
Blend server logs with multiple other sources such as Google Analytics data, keyword ranking, sitemaps, crawl data, and start asking questions like:
What pages are not included in the sitemap.xml but are crawled extensively?
What pages are included in the Sitemap.xml file but are not crawled?
Are revenue-driving pages crawled often?
Is the majority of crawled pages indexable?
You may be surprised by the insights you’ll uncover that can help strengthen your SEO strategy. For instance, discovering that almost 70 percent of Googlebot requests are for pages that are not indexable is an insight you can act on.
You can see more examples of blending log files with other data sources in my post about advanced log analysis.
Use logs to debug Google Analytics
Don’t think of server logs as just another SEO tool. Logs are also an invaluable source of information that can help pinpoint technical errors before they become a larger problem.
Last year, Google Analytics reported a drop in organic traffic for our branded search queries. But our keyword tracking tool, STAT Search Analytics, and other tools showed no movement that would have warranted the drop. So, what was going on?
Server logs helped us understand the situation: There was no real drop in traffic. It was our newly deployed WAF (Web Application Firewall) that was overriding the referrer, which caused some organic traffic to be incorrectly classified as direct traffic in Google Analytics.
Using log files in conjunction with keyword tracking in STAT helped us uncover the whole story and diagnose this issue quickly.
Putting it all together
Log analysis is a must-do, especially once you start working with large websites.
My advice is to start with segmenting data and monitoring changes over time. Once you feel ready, explore the possibilities of blending logs with your crawl data or Google Analytics. That’s where great insights are hidden.
Want more?
Ready to learn how to get cracking and tracking some more? Reach out and request a demo to get your very own tailored walkthrough of STAT.
Source link
0 notes
tigrerramon · 8 years ago
Text
Unexpectedly tagged by @agi92 - thanks!
I’m sorry it took so long, I’ve been busy ;~;
Rules: You have to tag followers you want to know better.
Gender: cis girl
Star Sign: Virgo
Height: about 5′6″? I hope that I’ve calculated it correctly ;-;
Sexual Orientation: asexual
Hogwarts House: Ravenclaw ^^
Favourite Colour: purple <3 and black
Favourite Animal: owl
Average hours of sleep: 8, no more, no less (unless it’s holidays, ofc)
Cat or dog person: Really depends on which animal I currently have at home, since I like both about the same. In this moment my dog annoys me a bit, so I’m going to say cat XD Favourite fictional characters: John Watson (BBC), Sherlock Holmes (ACD), Apollo Justice, Miles Edgeworth, Shane (Asagao Academy), Fluttershy, Amethyst, Kyle Broflovski, king Asgore, Kyubey, Tsunami (Wings of Fire), the list goes on and on...
Number of blankets I sleep with: 1, I don’t want to get tangled up and die
Favourite singer/band: Rammstein and In This Moment (edgy, I know)
Dream trip: I always wanted to visit Scandinavia, especially Sweden! Recently I’ve also been interested in Canada tho. It seems like such a lovely, calm place~
Dream job: I wanted to be a writer when I was little. Now I see that this will be rather impossible, but hey, at least I can dream...
When was this blog made: In the early 2015 and it’s been kept empty ever since XD
Number of followers: 28, most of them spambots. I don’t know why this few actual people that follow me do so, but I’m glad that you’re here~
What made you decide to create this blog: I’ve seen some pretty fanarts in here and decided to make an account to keep track of them... and also to see if this whole Tumblr thing was really as bad as everyone said. It was, but before I realized it, I was alredy knee-deep into this mess and couldn’t get out. The blog itself was supposed to hold my thoughts, rants, works etc., but since I’m super anxious about publishing pretty much anything, it never did. Oh well. I’ll tag: all of my followers that are actually human (idk if any of you has alredy done that, plz don’t hit me if you did ;^;) - @datdarklawxbarnhamfangirl, @nuclear-brachy, @shenanigans0830, @cloudy0103 
Sorry for any mistakes, my terrible English is not improved by the late hour...
14 notes · View notes
readersforum · 6 years ago
Text
Uncovering SEO Opportunities via Log Files
New Post has been published on http://www.readersforum.tk/uncovering-seo-opportunities-via-log-files/
Uncovering SEO Opportunities via Log Files
Posted by RobinRozhon
I use web crawlers on a daily basis. While they are very useful, they only imitate search engine crawlers’ behavior, which means you aren’t always getting the full picture.
The only tool that can give you a real overview of how search engines crawl your site are log files. Despite this, many people are still obsessed with crawl budget — the number of URLs Googlebot can and wants to crawl.
Log file analysis may discover URLs on your site that you had no idea about but that search engines are crawling anyway — a major waste of Google server resources (Google Webmaster Blog):
“Wasting server resources on pages like these will drain crawl activity from pages that do actually have value, which may cause a significant delay in discovering great content on a site.”
While it’s a fascinating topic, the fact is that most sites don’t need to worry that much about crawl budget —an observation shared by John Mueller (Webmaster Trends Analyst at Google) quite a few times already.
There’s still a huge value in analyzing logs produced from those crawls, though. It will show what pages Google is crawling and if anything needs to be fixed.
When you know exactly what your log files are telling you, you’ll gain valuable insights about how Google crawls and views your site, which means you can optimize for this data to increase traffic. And the bigger the site, the greater the impact fixing these issues will have.
What are server logs?
A log file is a recording of everything that goes in and out of a server. Think of it as a ledger of requests made by crawlers and real users. You can see exactly what resources Google is crawling on your site.
You can also see what errors need your attention. For instance, one of the issues we uncovered with our analysis was that our CMS created two URLs for each page and Google discovered both. This led to duplicate content issues because two URLs with the same content was competing against each other.
Analyzing logs is not rocket science — the logic is the same as when working with tables in Excel or Google Sheets. The hardest part is getting access to them — exporting and filtering that data.
Looking at a log file for the first time may also feel somewhat daunting because when you open one, you see something like this:
Calm down and take a closer look at a single line:
66.249.65.107 - - [08/Dec/2017:04:54:20 -0400] "GET /contact/ HTTP/1.1" 200 11179 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
You’ll quickly recognize that:
66.249.65.107 is the IP address (who)
[08/Dec/2017:04:54:20 -0400] is the Timestamp (when)
GET is the Method
/contact/ is the Requested URL (what)
200 is the Status Code (result)
11179 is the Bytes Transferred (size)
“-” is the Referrer URL (source) — it’s empty because this request was made by a crawler
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) is the User Agent (signature) — this is user agent of Googlebot (Desktop)
Once you know what each line is composed of, it’s not so scary. It’s just a lot of information. But that’s where the next step comes in handy.
Tools you can use
There are many tools you can choose from that will help you analyze your log files. I won’t give you a full run-down of available ones, but it’s important to know the difference between static and real-time tools.
Static — This only analyzes a static file. You can’t extend the time frame. Want to analyze another period? You need to request a new log file. My favourite tool for analyzing static log files is Power BI.
Real-time — Gives you direct access to logs. I really like open source ELK Stack (Elasticsearch, Logstash, and Kibana). It takes a moderate effort to implement it but once the stack is ready, it allows me changing the time frame based on my needs without needing to contact our developers.
Start analyzing
Don’t just dive into logs with a hope to find something — start asking questions. If you don’t formulate your questions at the beginning, you will end up in a rabbit hole with no direction and no real insights.
Here are a few samples of questions I use at the start of my analysis:
Which search engines crawl my website?
Which URLs are crawled most often?
Which content types are crawled most often?
Which status codes are returned?
If you see that Google is crawling non-existing pages (404), you can start asking which of those requested URLs return 404 status code.
Order the list by the number of requests, evaluate the ones with the highest number to find the pages with the highest priority (the more requests, the higher priority), and consider whether to redirect that URL or do any other action.
If you use a CDN or cache server, you need to get that data as well to get the full picture.
Segment your data
Grouping data into segments provides aggregate numbers that give you the big picture. This makes it easier to spot trends you might have missed by looking only at individual URLs. You can locate problematic sections and drill down if needed.
There are various ways to group URLs:
Group by content type (single product pages vs. category pages)
Group by language (English pages vs. French pages)
Group by storefront (Canadian store vs. US store)
Group by file format (JS vs. images vs. CSS)
Don’t forget to slice your data by user-agent. Looking at Google Desktop, Google Smartphone, and Bing all together won’t surface any useful insights.
Monitor behavior changes over time
Your site changes over time, which means so will crawlers’ behavior. Googlebot often decreases or increases the crawl rate based on factors such as a page’s speed, internal link structure, and the existence of crawl traps.
It’s a good idea to check in with your log files throughout the year or when executing website changes. I look at logs almost on a weekly basis when releasing significant changes for large websites.
By analyzing server logs twice a year, at the very least, you’ll surface changes in crawler’s behavior.
Watch for spoofing
Spambots and scrapers don’t like being blocked, so they may fake their identity — they leverage Googlebot’s user agent to avoid spam filters.
To verify if a web crawler accessing your server really is Googlebot, you can run a reverse DNS lookup and then a forward DNS lookup. More on this topic can be found in Google Webmaster Help Center.
Merge logs with other data sources
While it’s no necessary to connect to other data sources, doing so will unlock another level of insight and context that regular log analysis might not be able to give you. An ability to easily connect multiple datasets and extract insights from them is the main reason why Power BI is my tool of choice, but you can use any tool that you’re familiar with (e.g. Tableau).
Blend server logs with multiple other sources such as Google Analytics data, keyword ranking, sitemaps, crawl data, and start asking questions like:
What pages are not included in the sitemap.xml but are crawled extensively?
What pages are included in the Sitemap.xml file but are not crawled?
Are revenue-driving pages crawled often?
Is the majority of crawled pages indexable?
You may be surprised by the insights you’ll uncover that can help strengthen your SEO strategy. For instance, discovering that almost 70 percent of Googlebot requests are for pages that are not indexable is an insight you can act on.
You can see more examples of blending log files with other data sources in my post about advanced log analysis.
Use logs to debug Google Analytics
Don’t think of server logs as just another SEO tool. Logs are also an invaluable source of information that can help pinpoint technical errors before they become a larger problem.
Last year, Google Analytics reported a drop in organic traffic for our branded search queries. But our keyword tracking tool, STAT Search Analytics, and other tools showed no movement that would have warranted the drop. So, what was going on?
Server logs helped us understand the situation: There was no real drop in traffic. It was our newly deployed WAF (Web Application Firewall) that was overriding the referrer, which caused some organic traffic to be incorrectly classified as direct traffic in Google Analytics.
Using log files in conjunction with keyword tracking in STAT helped us uncover the whole story and diagnose this issue quickly.
Putting it all together
Log analysis is a must-do, especially once you start working with large websites.
My advice is to start with segmenting data and monitoring changes over time. Once you feel ready, explore the possibilities of blending logs with your crawl data or Google Analytics. That’s where great insights are hidden.
Want more?
Ready to learn how to get cracking and tracking some more? Reach out and request a demo to get your very own tailored walkthrough of STAT.
Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don’t have time to hunt down but want to read!
0 notes
tranlinhlan · 6 years ago
Text
Uncovering SEO Opportunities via Log Files
Posted by RobinRozhon
I use web crawlers on a daily basis. While they are very useful, they only imitate search engine crawlers’ behavior, which means you aren’t always getting the full picture.
The only tool that can give you a real overview of how search engines crawl your site are log files. Despite this, many people are still obsessed with crawl budget — the number of URLs Googlebot can and wants to crawl.
Log file analysis may discover URLs on your site that you had no idea about but that search engines are crawling anyway — a major waste of Google server resources (Google Webmaster Blog):
“Wasting server resources on pages like these will drain crawl activity from pages that do actually have value, which may cause a significant delay in discovering great content on a site.”
While it’s a fascinating topic, the fact is that most sites don’t need to worry that much about crawl budget —an observation shared by John Mueller (Webmaster Trends Analyst at Google) quite a few times already.
There’s still a huge value in analyzing logs produced from those crawls, though. It will show what pages Google is crawling and if anything needs to be fixed.
When you know exactly what your log files are telling you, you’ll gain valuable insights about how Google crawls and views your site, which means you can optimize for this data to increase traffic. And the bigger the site, the greater the impact fixing these issues will have.
What are server logs?
A log file is a recording of everything that goes in and out of a server. Think of it as a ledger of requests made by crawlers and real users. You can see exactly what resources Google is crawling on your site.
You can also see what errors need your attention. For instance, one of the issues we uncovered with our analysis was that our CMS created two URLs for each page and Google discovered both. This led to duplicate content issues because two URLs with the same content was competing against each other.
Analyzing logs is not rocket science — the logic is the same as when working with tables in Excel or Google Sheets. The hardest part is getting access to them — exporting and filtering that data.
Looking at a log file for the first time may also feel somewhat daunting because when you open one, you see something like this:
Calm down and take a closer look at a single line:
66.249.65.107 - - [08/Dec/2017:04:54:20 -0400] "GET /contact/ HTTP/1.1" 200 11179 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
You’ll quickly recognize that:
66.249.65.107 is the IP address (who)
[08/Dec/2017:04:54:20 -0400] is the Timestamp (when)
GET is the Method
/contact/ is the Requested URL (what)
200 is the Status Code (result)
11179 is the Bytes Transferred (size)
“-” is the Referrer URL (source) — it’s empty because this request was made by a crawler
Mozilla/5.0 (compatible; Googlebot/2.1; +http://bit.ly/2Uadn11) is the User Agent (signature) — this is user agent of Googlebot (Desktop)
Once you know what each line is composed of, it’s not so scary. It’s just a lot of information. But that’s where the next step comes in handy.
Tools you can use
There are many tools you can choose from that will help you analyze your log files. I won’t give you a full run-down of available ones, but it’s important to know the difference between static and real-time tools.
Static — This only analyzes a static file. You can’t extend the time frame. Want to analyze another period? You need to request a new log file. My favourite tool for analyzing static log files is Power BI.
Real-time — Gives you direct access to logs. I really like open source ELK Stack (Elasticsearch, Logstash, and Kibana). It takes a moderate effort to implement it but once the stack is ready, it allows me changing the time frame based on my needs without needing to contact our developers.
Start analyzing
Don’t just dive into logs with a hope to find something — start asking questions. If you don’t formulate your questions at the beginning, you will end up in a rabbit hole with no direction and no real insights.
Here are a few samples of questions I use at the start of my analysis:
Which search engines crawl my website?
Which URLs are crawled most often?
Which content types are crawled most often?
Which status codes are returned?
If you see that Google is crawling non-existing pages (404), you can start asking which of those requested URLs return 404 status code.
Order the list by the number of requests, evaluate the ones with the highest number to find the pages with the highest priority (the more requests, the higher priority), and consider whether to redirect that URL or do any other action.
If you use a CDN or cache server, you need to get that data as well to get the full picture.
Segment your data
Grouping data into segments provides aggregate numbers that give you the big picture. This makes it easier to spot trends you might have missed by looking only at individual URLs. You can locate problematic sections and drill down if needed.
There are various ways to group URLs:
Group by content type (single product pages vs. category pages)
Group by language (English pages vs. French pages)
Group by storefront (Canadian store vs. US store)
Group by file format (JS vs. images vs. CSS)
Don’t forget to slice your data by user-agent. Looking at Google Desktop, Google Smartphone, and Bing all together won’t surface any useful insights.
Monitor behavior changes over time
Your site changes over time, which means so will crawlers’ behavior. Googlebot often decreases or increases the crawl rate based on factors such as a page’s speed, internal link structure, and the existence of crawl traps.
It’s a good idea to check in with your log files throughout the year or when executing website changes. I look at logs almost on a weekly basis when releasing significant changes for large websites.
By analyzing server logs twice a year, at the very least, you’ll surface changes in crawler’s behavior.
Watch for spoofing
Spambots and scrapers don’t like being blocked, so they may fake their identity — they leverage Googlebot’s user agent to avoid spam filters.
To verify if a web crawler accessing your server really is Googlebot, you can run a reverse DNS lookup and then a forward DNS lookup. More on this topic can be found in Google Webmaster Help Center.
Merge logs with other data sources
While it’s no necessary to connect to other data sources, doing so will unlock another level of insight and context that regular log analysis might not be able to give you. An ability to easily connect multiple datasets and extract insights from them is the main reason why Power BI is my tool of choice, but you can use any tool that you’re familiar with (e.g. Tableau).
Blend server logs with multiple other sources such as Google Analytics data, keyword ranking, sitemaps, crawl data, and start asking questions like:
What pages are not included in the sitemap.xml but are crawled extensively?
What pages are included in the Sitemap.xml file but are not crawled?
Are revenue-driving pages crawled often?
Is the majority of crawled pages indexable?
You may be surprised by the insights you’ll uncover that can help strengthen your SEO strategy. For instance, discovering that almost 70 percent of Googlebot requests are for pages that are not indexable is an insight you can act on.
You can see more examples of blending log files with other data sources in my post about advanced log analysis.
Use logs to debug Google Analytics
Don’t think of server logs as just another SEO tool. Logs are also an invaluable source of information that can help pinpoint technical errors before they become a larger problem.
Last year, Google Analytics reported a drop in organic traffic for our branded search queries. But our keyword tracking tool, STAT Search Analytics, and other tools showed no movement that would have warranted the drop. So, what was going on?
Server logs helped us understand the situation: There was no real drop in traffic. It was our newly deployed WAF (Web Application Firewall) that was overriding the referrer, which caused some organic traffic to be incorrectly classified as direct traffic in Google Analytics.
Using log files in conjunction with keyword tracking in STAT helped us uncover the whole story and diagnose this issue quickly.
Putting it all together
Log analysis is a must-do, especially once you start working with large websites.
My advice is to start with segmenting data and monitoring changes over time. Once you feel ready, explore the possibilities of blending logs with your crawl data or Google Analytics. That’s where great insights are hidden.
Want more?
Ready to learn how to get cracking and tracking some more? Reach out and request a demo get your very own tailored walkthrough of STAT.
Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don't have time to hunt down but want to read!
via Blogger http://bit.ly/2CD2HRA
0 notes
isearchgoood · 6 years ago
Text
Uncovering SEO Opportunities via Log Files
Posted by RobinRozhon
I use web crawlers on a daily basis. While they are very useful, they only imitate search engine crawlers’ behavior, which means you aren’t always getting the full picture.
The only tool that can give you a real overview of how search engines crawl your site are log files. Despite this, many people are still obsessed with crawl budget — the number of URLs Googlebot can and wants to crawl.
Log file analysis may discover URLs on your site that you had no idea about but that search engines are crawling anyway — a major waste of Google server resources (Google Webmaster Blog):
“Wasting server resources on pages like these will drain crawl activity from pages that do actually have value, which may cause a significant delay in discovering great content on a site.”
While it’s a fascinating topic, the fact is that most sites don’t need to worry that much about crawl budget —an observation shared by John Mueller (Webmaster Trends Analyst at Google) quite a few times already.
There’s still a huge value in analyzing logs produced from those crawls, though. It will show what pages Google is crawling and if anything needs to be fixed.
When you know exactly what your log files are telling you, you’ll gain valuable insights about how Google crawls and views your site, which means you can optimize for this data to increase traffic. And the bigger the site, the greater the impact fixing these issues will have.
What are server logs?
A log file is a recording of everything that goes in and out of a server. Think of it as a ledger of requests made by crawlers and real users. You can see exactly what resources Google is crawling on your site.
You can also see what errors need your attention. For instance, one of the issues we uncovered with our analysis was that our CMS created two URLs for each page and Google discovered both. This led to duplicate content issues because two URLs with the same content was competing against each other.
Analyzing logs is not rocket science — the logic is the same as when working with tables in Excel or Google Sheets. The hardest part is getting access to them — exporting and filtering that data.
Looking at a log file for the first time may also feel somewhat daunting because when you open one, you see something like this:
Calm down and take a closer look at a single line:
66.249.65.107 - - [08/Dec/2017:04:54:20 -0400] "GET /contact/ HTTP/1.1" 200 11179 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
You’ll quickly recognize that:
66.249.65.107 is the IP address (who)
[08/Dec/2017:04:54:20 -0400] is the Timestamp (when)
GET is the Method
/contact/ is the Requested URL (what)
200 is the Status Code (result)
11179 is the Bytes Transferred (size)
“-” is the Referrer URL (source) — it’s empty because this request was made by a crawler
Mozilla/5.0 (compatible; Googlebot/2.1; +http://bit.ly/eSXNch) is the User Agent (signature) — this is user agent of Googlebot (Desktop)
Once you know what each line is composed of, it’s not so scary. It’s just a lot of information. But that’s where the next step comes in handy.
Tools you can use
There are many tools you can choose from that will help you analyze your log files. I won’t give you a full run-down of available ones, but it’s important to know the difference between static and real-time tools.
Static — This only analyzes a static file. You can’t extend the time frame. Want to analyze another period? You need to request a new log file. My favourite tool for analyzing static log files is Power BI.
Real-time — Gives you direct access to logs. I really like open source ELK Stack (Elasticsearch, Logstash, and Kibana). It takes a moderate effort to implement it but once the stack is ready, it allows me changing the time frame based on my needs without needing to contact our developers.
Start analyzing
Don’t just dive into logs with a hope to find something — start asking questions. If you don’t formulate your questions at the beginning, you will end up in a rabbit hole with no direction and no real insights.
Here are a few samples of questions I use at the start of my analysis:
Which search engines crawl my website?
Which URLs are crawled most often?
Which content types are crawled most often?
Which status codes are returned?
If you see that Google is crawling non-existing pages (404), you can start asking which of those requested URLs return 404 status code.
Order the list by the number of requests, evaluate the ones with the highest number to find the pages with the highest priority (the more requests, the higher priority), and consider whether to redirect that URL or do any other action.
If you use a CDN or cache server, you need to get that data as well to get the full picture.
Segment your data
Grouping data into segments provides aggregate numbers that give you the big picture. This makes it easier to spot trends you might have missed by looking only at individual URLs. You can locate problematic sections and drill down if needed.
There are various ways to group URLs:
Group by content type (single product pages vs. category pages)
Group by language (English pages vs. French pages)
Group by storefront (Canadian store vs. US store)
Group by file format (JS vs. images vs. CSS)
Don’t forget to slice your data by user-agent. Looking at Google Desktop, Google Smartphone, and Bing all together won’t surface any useful insights.
Monitor behavior changes over time
Your site changes over time, which means so will crawlers’ behavior. Googlebot often decreases or increases the crawl rate based on factors such as a page’s speed, internal link structure, and the existence of crawl traps.
It’s a good idea to check in with your log files throughout the year or when executing website changes. I look at logs almost on a weekly basis when releasing significant changes for large websites.
By analyzing server logs twice a year, at the very least, you’ll surface changes in crawler’s behavior.
Watch for spoofing
Spambots and scrapers don’t like being blocked, so they may fake their identity — they leverage Googlebot’s user agent to avoid spam filters.
To verify if a web crawler accessing your server really is Googlebot, you can run a reverse DNS lookup and then a forward DNS lookup. More on this topic can be found in Google Webmaster Help Center.
Merge logs with other data sources
While it’s no necessary to connect to other data sources, doing so will unlock another level of insight and context that regular log analysis might not be able to give you. An ability to easily connect multiple datasets and extract insights from them is the main reason why Power BI is my tool of choice, but you can use any tool that you’re familiar with (e.g. Tableau).
Blend server logs with multiple other sources such as Google Analytics data, keyword ranking, sitemaps, crawl data, and start asking questions like:
What pages are not included in the sitemap.xml but are crawled extensively?
What pages are included in the Sitemap.xml file but are not crawled?
Are revenue-driving pages crawled often?
Is the majority of crawled pages indexable?
You may be surprised by the insights you’ll uncover that can help strengthen your SEO strategy. For instance, discovering that almost 70 percent of Googlebot requests are for pages that are not indexable is an insight you can act on.
You can see more examples of blending log files with other data sources in my post about advanced log analysis.
Use logs to debug Google Analytics
Don’t think of server logs as just another SEO tool. Logs are also an invaluable source of information that can help pinpoint technical errors before they become a larger problem.
Last year, Google Analytics reported a drop in organic traffic for our branded search queries. But our keyword tracking tool, STAT Search Analytics, and other tools showed no movement that would have warranted the drop. So, what was going on?
Server logs helped us understand the situation: There was no real drop in traffic. It was our newly deployed WAF (Web Application Firewall) that was overriding the referrer, which caused some organic traffic to be incorrectly classified as direct traffic in Google Analytics.
Using log files in conjunction with keyword tracking in STAT helped us uncover the whole story and diagnose this issue quickly.
Putting it all together
Log analysis is a must-do, especially once you start working with large websites.
My advice is to start with segmenting data and monitoring changes over time. Once you feel ready, explore the possibilities of blending logs with your crawl data or Google Analytics. That’s where great insights are hidden.
Want more?
Ready to learn how to get cracking and tracking some more? Reach out and request a demo get your very own tailored walkthrough of STAT.
Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don't have time to hunt down but want to read!
via Blogger http://bit.ly/2MuSBGF #blogger #bloggingtips #bloggerlife #bloggersgetsocial #ontheblog #writersofinstagram #writingprompt #instapoetry #writerscommunity #writersofig #writersblock #writerlife #writtenword #instawriters #spilledink #wordgasm #creativewriting #poetsofinstagram #blackoutpoetry #poetsofig
0 notes
tracisimpson · 6 years ago
Text
Uncovering SEO Opportunities via Log Files
Posted by RobinRozhon
I use web crawlers on a daily basis. While they are very useful, they only imitate search engine crawlers’ behavior, which means you aren’t always getting the full picture.
The only tool that can give you a real overview of how search engines crawl your site are log files. Despite this, many people are still obsessed with crawl budget — the number of URLs Googlebot can and wants to crawl.
Log file analysis may discover URLs on your site that you had no idea about but that search engines are crawling anyway — a major waste of Google server resources (Google Webmaster Blog):
“Wasting server resources on pages like these will drain crawl activity from pages that do actually have value, which may cause a significant delay in discovering great content on a site.”
While it’s a fascinating topic, the fact is that most sites don’t need to worry that much about crawl budget —an observation shared by John Mueller (Webmaster Trends Analyst at Google) quite a few times already.
There’s still a huge value in analyzing logs produced from those crawls, though. It will show what pages Google is crawling and if anything needs to be fixed.
When you know exactly what your log files are telling you, you’ll gain valuable insights about how Google crawls and views your site, which means you can optimize for this data to increase traffic. And the bigger the site, the greater the impact fixing these issues will have.
What are server logs?
A log file is a recording of everything that goes in and out of a server. Think of it as a ledger of requests made by crawlers and real users. You can see exactly what resources Google is crawling on your site.
You can also see what errors need your attention. For instance, one of the issues we uncovered with our analysis was that our CMS created two URLs for each page and Google discovered both. This led to duplicate content issues because two URLs with the same content was competing against each other.
Analyzing logs is not rocket science — the logic is the same as when working with tables in Excel or Google Sheets. The hardest part is getting access to them — exporting and filtering that data.
Looking at a log file for the first time may also feel somewhat daunting because when you open one, you see something like this:
Calm down and take a closer look at a single line:
66.249.65.107 - - [08/Dec/2017:04:54:20 -0400] "GET /contact/ HTTP/1.1" 200 11179 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
You’ll quickly recognize that:
66.249.65.107 is the IP address (who)
[08/Dec/2017:04:54:20 -0400] is the Timestamp (when)
GET is the Method
/contact/ is the Requested URL (what)
200 is the Status Code (result)
11179 is the Bytes Transferred (size)
“-” is the Referrer URL (source) — it’s empty because this request was made by a crawler
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) is the User Agent (signature) — this is user agent of Googlebot (Desktop)
Once you know what each line is composed of, it’s not so scary. It’s just a lot of information. But that’s where the next step comes in handy.
Tools you can use
There are many tools you can choose from that will help you analyze your log files. I won’t give you a full run-down of available ones, but it’s important to know the difference between static and real-time tools.
Static — This only analyzes a static file. You can’t extend the time frame. Want to analyze another period? You need to request a new log file. My favourite tool for analyzing static log files is Power BI.
Real-time — Gives you direct access to logs. I really like open source ELK Stack (Elasticsearch, Logstash, and Kibana). It takes a moderate effort to implement it but once the stack is ready, it allows me changing the time frame based on my needs without needing to contact our developers.
Start analyzing
Don’t just dive into logs with a hope to find something — start asking questions. If you don’t formulate your questions at the beginning, you will end up in a rabbit hole with no direction and no real insights.
Here are a few samples of questions I use at the start of my analysis:
Which search engines crawl my website?
Which URLs are crawled most often?
Which content types are crawled most often?
Which status codes are returned?
If you see that Google is crawling non-existing pages (404), you can start asking which of those requested URLs return 404 status code.
Order the list by the number of requests, evaluate the ones with the highest number to find the pages with the highest priority (the more requests, the higher priority), and consider whether to redirect that URL or do any other action.
If you use a CDN or cache server, you need to get that data as well to get the full picture.
Segment your data
Grouping data into segments provides aggregate numbers that give you the big picture. This makes it easier to spot trends you might have missed by looking only at individual URLs. You can locate problematic sections and drill down if needed.
There are various ways to group URLs:
Group by content type (single product pages vs. category pages)
Group by language (English pages vs. French pages)
Group by storefront (Canadian store vs. US store)
Group by file format (JS vs. images vs. CSS)
Don’t forget to slice your data by user-agent. Looking at Google Desktop, Google Smartphone, and Bing all together won’t surface any useful insights.
Monitor behavior changes over time
Your site changes over time, which means so will crawlers’ behavior. Googlebot often decreases or increases the crawl rate based on factors such as a page’s speed, internal link structure, and the existence of crawl traps.
It’s a good idea to check in with your log files throughout the year or when executing website changes. I look at logs almost on a weekly basis when releasing significant changes for large websites.
By analyzing server logs twice a year, at the very least, you’ll surface changes in crawler’s behavior.
Watch for spoofing
Spambots and scrapers don’t like being blocked, so they may fake their identity — they leverage Googlebot’s user agent to avoid spam filters.
To verify if a web crawler accessing your server really is Googlebot, you can run a reverse DNS lookup and then a forward DNS lookup. More on this topic can be found in Google Webmaster Help Center.
Merge logs with other data sources
While it’s no necessary to connect to other data sources, doing so will unlock another level of insight and context that regular log analysis might not be able to give you. An ability to easily connect multiple datasets and extract insights from them is the main reason why Power BI is my tool of choice, but you can use any tool that you’re familiar with (e.g. Tableau).
Blend server logs with multiple other sources such as Google Analytics data, keyword ranking, sitemaps, crawl data, and start asking questions like:
What pages are not included in the sitemap.xml but are crawled extensively?
What pages are included in the Sitemap.xml file but are not crawled?
Are revenue-driving pages crawled often?
Is the majority of crawled pages indexable?
You may be surprised by the insights you’ll uncover that can help strengthen your SEO strategy. For instance, discovering that almost 70 percent of Googlebot requests are for pages that are not indexable is an insight you can act on.
You can see more examples of blending log files with other data sources in my post about advanced log analysis.
Use logs to debug Google Analytics
Don’t think of server logs as just another SEO tool. Logs are also an invaluable source of information that can help pinpoint technical errors before they become a larger problem.
Last year, Google Analytics reported a drop in organic traffic for our branded search queries. But our keyword tracking tool, STAT Search Analytics, and other tools showed no movement that would have warranted the drop. So, what was going on?
Server logs helped us understand the situation: There was no real drop in traffic. It was our newly deployed WAF (Web Application Firewall) that was overriding the referrer, which caused some organic traffic to be incorrectly classified as direct traffic in Google Analytics.
Using log files in conjunction with keyword tracking in STAT helped us uncover the whole story and diagnose this issue quickly.
Putting it all together
Log analysis is a must-do, especially once you start working with large websites.
My advice is to start with segmenting data and monitoring changes over time. Once you feel ready, explore the possibilities of blending logs with your crawl data or Google Analytics. That’s where great insights are hidden.
Want more?
Ready to learn how to get cracking and tracking some more? Reach out and request a demo get your very own tailored walkthrough of STAT.
Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don't have time to hunt down but want to read!
0 notes
lawrenceseitz22 · 6 years ago
Text
Uncovering SEO Opportunities via Log Files
Posted by RobinRozhon
I use web crawlers on a daily basis. While they are very useful, they only imitate search engine crawlers’ behavior, which means you aren’t always getting the full picture.
The only tool that can give you a real overview of how search engines crawl your site are log files. Despite this, many people are still obsessed with crawl budget — the number of URLs Googlebot can and wants to crawl.
Log file analysis may discover URLs on your site that you had no idea about but that search engines are crawling anyway — a major waste of Google server resources (Google Webmaster Blog):
“Wasting server resources on pages like these will drain crawl activity from pages that do actually have value, which may cause a significant delay in discovering great content on a site.”
While it’s a fascinating topic, the fact is that most sites don’t need to worry that much about crawl budget —an observation shared by John Mueller (Webmaster Trends Analyst at Google) quite a few times already.
There’s still a huge value in analyzing logs produced from those crawls, though. It will show what pages Google is crawling and if anything needs to be fixed.
When you know exactly what your log files are telling you, you’ll gain valuable insights about how Google crawls and views your site, which means you can optimize for this data to increase traffic. And the bigger the site, the greater the impact fixing these issues will have.
What are server logs?
A log file is a recording of everything that goes in and out of a server. Think of it as a ledger of requests made by crawlers and real users. You can see exactly what resources Google is crawling on your site.
You can also see what errors need your attention. For instance, one of the issues we uncovered with our analysis was that our CMS created two URLs for each page and Google discovered both. This led to duplicate content issues because two URLs with the same content was competing against each other.
Analyzing logs is not rocket science — the logic is the same as when working with tables in Excel or Google Sheets. The hardest part is getting access to them — exporting and filtering that data.
Looking at a log file for the first time may also feel somewhat daunting because when you open one, you see something like this:
Calm down and take a closer look at a single line:
66.249.65.107 - - [08/Dec/2017:04:54:20 -0400] "GET /contact/ HTTP/1.1" 200 11179 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
You’ll quickly recognize that:
66.249.65.107 is the IP address (who)
[08/Dec/2017:04:54:20 -0400] is the Timestamp (when)
GET is the Method
/contact/ is the Requested URL (what)
200 is the Status Code (result)
11179 is the Bytes Transferred (size)
“-” is the Referrer URL (source) — it’s empty because this request was made by a crawler
Mozilla/5.0 (compatible; Googlebot/2.1; +http://bit.ly/eSXNch) is the User Agent (signature) — this is user agent of Googlebot (Desktop)
Once you know what each line is composed of, it’s not so scary. It’s just a lot of information. But that’s where the next step comes in handy.
Tools you can use
There are many tools you can choose from that will help you analyze your log files. I won’t give you a full run-down of available ones, but it’s important to know the difference between static and real-time tools.
Static — This only analyzes a static file. You can’t extend the time frame. Want to analyze another period? You need to request a new log file. My favourite tool for analyzing static log files is Power BI.
Real-time — Gives you direct access to logs. I really like open source ELK Stack (Elasticsearch, Logstash, and Kibana). It takes a moderate effort to implement it but once the stack is ready, it allows me changing the time frame based on my needs without needing to contact our developers.
Start analyzing
Don’t just dive into logs with a hope to find something — start asking questions. If you don’t formulate your questions at the beginning, you will end up in a rabbit hole with no direction and no real insights.
Here are a few samples of questions I use at the start of my analysis:
Which search engines crawl my website?
Which URLs are crawled most often?
Which content types are crawled most often?
Which status codes are returned?
If you see that Google is crawling non-existing pages (404), you can start asking which of those requested URLs return 404 status code.
Order the list by the number of requests, evaluate the ones with the highest number to find the pages with the highest priority (the more requests, the higher priority), and consider whether to redirect that URL or do any other action.
If you use a CDN or cache server, you need to get that data as well to get the full picture.
Segment your data
Grouping data into segments provides aggregate numbers that give you the big picture. This makes it easier to spot trends you might have missed by looking only at individual URLs. You can locate problematic sections and drill down if needed.
There are various ways to group URLs:
Group by content type (single product pages vs. category pages)
Group by language (English pages vs. French pages)
Group by storefront (Canadian store vs. US store)
Group by file format (JS vs. images vs. CSS)
Don’t forget to slice your data by user-agent. Looking at Google Desktop, Google Smartphone, and Bing all together won’t surface any useful insights.
Monitor behavior changes over time
Your site changes over time, which means so will crawlers’ behavior. Googlebot often decreases or increases the crawl rate based on factors such as a page’s speed, internal link structure, and the existence of crawl traps.
It’s a good idea to check in with your log files throughout the year or when executing website changes. I look at logs almost on a weekly basis when releasing significant changes for large websites.
By analyzing server logs twice a year, at the very least, you’ll surface changes in crawler’s behavior.
Watch for spoofing
Spambots and scrapers don’t like being blocked, so they may fake their identity — they leverage Googlebot’s user agent to avoid spam filters.
To verify if a web crawler accessing your server really is Googlebot, you can run a reverse DNS lookup and then a forward DNS lookup. More on this topic can be found in Google Webmaster Help Center.
Merge logs with other data sources
While it’s no necessary to connect to other data sources, doing so will unlock another level of insight and context that regular log analysis might not be able to give you. An ability to easily connect multiple datasets and extract insights from them is the main reason why Power BI is my tool of choice, but you can use any tool that you’re familiar with (e.g. Tableau).
Blend server logs with multiple other sources such as Google Analytics data, keyword ranking, sitemaps, crawl data, and start asking questions like:
What pages are not included in the sitemap.xml but are crawled extensively?
What pages are included in the Sitemap.xml file but are not crawled?
Are revenue-driving pages crawled often?
Is the majority of crawled pages indexable?
You may be surprised by the insights you’ll uncover that can help strengthen your SEO strategy. For instance, discovering that almost 70 percent of Googlebot requests are for pages that are not indexable is an insight you can act on.
You can see more examples of blending log files with other data sources in my post about advanced log analysis.
Use logs to debug Google Analytics
Don’t think of server logs as just another SEO tool. Logs are also an invaluable source of information that can help pinpoint technical errors before they become a larger problem.
Last year, Google Analytics reported a drop in organic traffic for our branded search queries. But our keyword tracking tool, STAT Search Analytics, and other tools showed no movement that would have warranted the drop. So, what was going on?
Server logs helped us understand the situation: There was no real drop in traffic. It was our newly deployed WAF (Web Application Firewall) that was overriding the referrer, which caused some organic traffic to be incorrectly classified as direct traffic in Google Analytics.
Using log files in conjunction with keyword tracking in STAT helped us uncover the whole story and diagnose this issue quickly.
Putting it all together
Log analysis is a must-do, especially once you start working with large websites.
My advice is to start with segmenting data and monitoring changes over time. Once you feel ready, explore the possibilities of blending logs with your crawl data or Google Analytics. That’s where great insights are hidden.
Want more?
Ready to learn how to get cracking and tracking some more? Reach out and request a demo get your very own tailored walkthrough of STAT.
Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don't have time to hunt down but want to read!
from Blogger http://bit.ly/2TbHGnS via IFTTT
0 notes
swunlimitednj · 6 years ago
Text
Uncovering SEO Opportunities via Log Files
Posted by RobinRozhon
I use web crawlers on a daily basis. While they are very useful, they only imitate search engine crawlers’ behavior, which means you aren’t always getting the full picture.
The only tool that can give you a real overview of how search engines crawl your site are log files. Despite this, many people are still obsessed with crawl budget — the number of URLs Googlebot can and wants to crawl.
Log file analysis may discover URLs on your site that you had no idea about but that search engines are crawling anyway — a major waste of Google server resources (Google Webmaster Blog):
“Wasting server resources on pages like these will drain crawl activity from pages that do actually have value, which may cause a significant delay in discovering great content on a site.”
While it’s a fascinating topic, the fact is that most sites don’t need to worry that much about crawl budget —an observation shared by John Mueller (Webmaster Trends Analyst at Google) quite a few times already.
There’s still a huge value in analyzing logs produced from those crawls, though. It will show what pages Google is crawling and if anything needs to be fixed.
When you know exactly what your log files are telling you, you’ll gain valuable insights about how Google crawls and views your site, which means you can optimize for this data to increase traffic. And the bigger the site, the greater the impact fixing these issues will have.
What are server logs?
A log file is a recording of everything that goes in and out of a server. Think of it as a ledger of requests made by crawlers and real users. You can see exactly what resources Google is crawling on your site.
You can also see what errors need your attention. For instance, one of the issues we uncovered with our analysis was that our CMS created two URLs for each page and Google discovered both. This led to duplicate content issues because two URLs with the same content was competing against each other.
Analyzing logs is not rocket science — the logic is the same as when working with tables in Excel or Google Sheets. The hardest part is getting access to them — exporting and filtering that data.
Looking at a log file for the first time may also feel somewhat daunting because when you open one, you see something like this:
Calm down and take a closer look at a single line:
66.249.65.107 - - [08/Dec/2017:04:54:20 -0400] "GET /contact/ HTTP/1.1" 200 11179 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
You’ll quickly recognize that:
66.249.65.107 is the IP address (who)
[08/Dec/2017:04:54:20 -0400] is the Timestamp (when)
GET is the Method
/contact/ is the Requested URL (what)
200 is the Status Code (result)
11179 is the Bytes Transferred (size)
“-” is the Referrer URL (source) — it’s empty because this request was made by a crawler
Mozilla/5.0 (compatible; Googlebot/2.1; +http://bit.ly/eSXNch) is the User Agent (signature) — this is user agent of Googlebot (Desktop)
Once you know what each line is composed of, it’s not so scary. It’s just a lot of information. But that’s where the next step comes in handy.
Tools you can use
There are many tools you can choose from that will help you analyze your log files. I won’t give you a full run-down of available ones, but it’s important to know the difference between static and real-time tools.
Static — This only analyzes a static file. You can’t extend the time frame. Want to analyze another period? You need to request a new log file. My favourite tool for analyzing static log files is Power BI.
Real-time — Gives you direct access to logs. I really like open source ELK Stack (Elasticsearch, Logstash, and Kibana). It takes a moderate effort to implement it but once the stack is ready, it allows me changing the time frame based on my needs without needing to contact our developers.
Start analyzing
Don’t just dive into logs with a hope to find something — start asking questions. If you don’t formulate your questions at the beginning, you will end up in a rabbit hole with no direction and no real insights.
Here are a few samples of questions I use at the start of my analysis:
Which search engines crawl my website?
Which URLs are crawled most often?
Which content types are crawled most often?
Which status codes are returned?
If you see that Google is crawling non-existing pages (404), you can start asking which of those requested URLs return 404 status code.
Order the list by the number of requests, evaluate the ones with the highest number to find the pages with the highest priority (the more requests, the higher priority), and consider whether to redirect that URL or do any other action.
If you use a CDN or cache server, you need to get that data as well to get the full picture.
Segment your data
Grouping data into segments provides aggregate numbers that give you the big picture. This makes it easier to spot trends you might have missed by looking only at individual URLs. You can locate problematic sections and drill down if needed.
There are various ways to group URLs:
Group by content type (single product pages vs. category pages)
Group by language (English pages vs. French pages)
Group by storefront (Canadian store vs. US store)
Group by file format (JS vs. images vs. CSS)
Don’t forget to slice your data by user-agent. Looking at Google Desktop, Google Smartphone, and Bing all together won’t surface any useful insights.
Monitor behavior changes over time
Your site changes over time, which means so will crawlers’ behavior. Googlebot often decreases or increases the crawl rate based on factors such as a page’s speed, internal link structure, and the existence of crawl traps.
It’s a good idea to check in with your log files throughout the year or when executing website changes. I look at logs almost on a weekly basis when releasing significant changes for large websites.
By analyzing server logs twice a year, at the very least, you’ll surface changes in crawler’s behavior.
Watch for spoofing
Spambots and scrapers don’t like being blocked, so they may fake their identity — they leverage Googlebot’s user agent to avoid spam filters.
To verify if a web crawler accessing your server really is Googlebot, you can run a reverse DNS lookup and then a forward DNS lookup. More on this topic can be found in Google Webmaster Help Center.
Merge logs with other data sources
While it’s no necessary to connect to other data sources, doing so will unlock another level of insight and context that regular log analysis might not be able to give you. An ability to easily connect multiple datasets and extract insights from them is the main reason why Power BI is my tool of choice, but you can use any tool that you’re familiar with (e.g. Tableau).
Blend server logs with multiple other sources such as Google Analytics data, keyword ranking, sitemaps, crawl data, and start asking questions like:
What pages are not included in the sitemap.xml but are crawled extensively?
What pages are included in the Sitemap.xml file but are not crawled?
Are revenue-driving pages crawled often?
Is the majority of crawled pages indexable?
You may be surprised by the insights you’ll uncover that can help strengthen your SEO strategy. For instance, discovering that almost 70 percent of Googlebot requests are for pages that are not indexable is an insight you can act on.
You can see more examples of blending log files with other data sources in my post about advanced log analysis.
Use logs to debug Google Analytics
Don’t think of server logs as just another SEO tool. Logs are also an invaluable source of information that can help pinpoint technical errors before they become a larger problem.
Last year, Google Analytics reported a drop in organic traffic for our branded search queries. But our keyword tracking tool, STAT Search Analytics, and other tools showed no movement that would have warranted the drop. So, what was going on?
Server logs helped us understand the situation: There was no real drop in traffic. It was our newly deployed WAF (Web Application Firewall) that was overriding the referrer, which caused some organic traffic to be incorrectly classified as direct traffic in Google Analytics.
Using log files in conjunction with keyword tracking in STAT helped us uncover the whole story and diagnose this issue quickly.
Putting it all together
Log analysis is a must-do, especially once you start working with large websites.
My advice is to start with segmenting data and monitoring changes over time. Once you feel ready, explore the possibilities of blending logs with your crawl data or Google Analytics. That’s where great insights are hidden.
Want more?
Ready to learn how to get cracking and tracking some more? Reach out and request a demo get your very own tailored walkthrough of STAT.
Sign up for The Moz Top 10, a semimonthly mailer updating you on the top ten hottest pieces of SEO news, tips, and rad links uncovered by the Moz team. Think of it as your exclusive digest of stuff you don't have time to hunt down but want to read!
from Blogger http://bit.ly/2Mqe17S via SW Unlimited
0 notes