#mirko's introduction post
mirkodoesstuff · 6 months ago
Text
WELCOME TO MY LITTLE CORNER ON THE INTERNET, VISITOR!
☠୧⍤⃝ 𒋨𒆙𒆜𒄆𒁎 ☠୧⍤⃝ 𒋨𒆙𒆜𒄆𒁎
Tumblr media
☠୧⍤⃝ 𒋨𒆙𒆜𒄆𒁎 ☠୧⍤⃝ 𒋨𒆙𒆜𒄆𒁎
Hello! My name’s Mirkoslavec, but you can refer to me as Mirko, Mirk, or a secret third option (or a completely different name, if we’re close). I'm 19 years old and my birthday is on June 8th!
☠୧⍤⃝ 𒋨𒆙𒆜𒄆𒁎 ☠୧⍤⃝ 𒋨𒆙𒆜𒄆𒁎
Ever since I was young, I’ve been into various things: drawing, writing, creating headcanons, reading, sleeping (this one is obvious) and, obviously, wanting to be original. I started drawing and writing as a little kid, and though most of my old stories were either lost to time or deleted because I was embarrassed by them, I still improved over time, and the same goes for my drawings. Unlike the stories, though, most of my old drawings are still saved on my pen drive - though most of them are from 2019, so only about 5 years ago. If I ever decide to post some of my old art here, I’ll do it in the form of redraws, so I won’t feel ashamed of myself.
I’m really into various topics as well, mainly darker ones: death, loss of loved ones, addiction, dark thoughts, dealing with various mental problems, discovering a loved one’s hidden past in the most unexpected way, torture, politics (as in learning why things are the way they are), and history (because I just love learning about the past, especially the Mediaeval Times and the last century). Don’t expect me to get political, though - I just want to get away from the real world and enjoy every free moment I have, and I know how heated people can get over that topic.
☠୧⍤⃝ 𒋨𒆙𒆜𒄆𒁎 ☠୧⍤⃝ 𒋨𒆙𒆜𒄆𒁎
The mascot of this blog is a character called “Miroslav Bochto”, who seems, at first, to be just a normal anthropomorphised food object (there is an object version of him), but actually isn’t an object at all. His reference, in his “used” form, is here:
Tumblr media
Miroslav will mainly appear in “personal” art - art celebrating various holidays (vacations, Christmas, Halloween), art for rants and reviews, and so on.
Of course, there’s more than one “main” OC, but Miroslav is the main one, as he was created specifically to represent me.
☠୧⍤⃝ 𒋨𒆙𒆜𒄆𒁎 ☠୧⍤⃝ 𒋨𒆙𒆜𒄆𒁎
I currently know three languages - Polish (my native language), English (pretty good) and German (basic knowledge) - and I wish I could learn Russian, Greek, Spanish, Portuguese and Icelandic.
Another thing I want to learn soon is how to code and create 3D models. I have my reasons. With coding I could actually create a game on my own - either a fangame for one of the fandoms I belong to, or my own original game. I’m aware that means not only learning how to code, but also learning how to make a good game: how to create events that get players hooked, how to keep the game from being unfairly hard or unfairly easy, and so on. With modelling, it’s a mix between “I want to create games” and “I want to create cool stuff that isn’t only 2D” - at first I might focus on creating 3D models purely for fun, not for any games, unless… something changes.
☠୧⍤⃝ 𒋨𒆙𒆜𒄆𒁎 ☠୧⍤⃝ 𒋨𒆙𒆜𒄆𒁎
Tumblr media
☠୧⍤⃝ 𒋨𒆙𒆜𒄆𒁎 ☠୧⍤⃝ 𒋨𒆙𒆜𒄆𒁎
I’m in many fandoms, so expect to see drawings, headcanons, opinions etc. for more than just one fandom.
I belong to:
Object Show Community (BFDI, II, LOTS, BOTO; I plan to watch more object shows)
Five Nights at Freddy’s (and fangames, such as Five Nights at Candy’s, Playtime with Percy, Those Nights at Rachel’s, Dayshift at Freddy’s, The Joy of Creation, Popgoes, etc.)
Warrior Cats
Cartoon communities (The Simpsons, Gravity Falls)
The Sims
Omori
Skyrim
Terraria
Stardew Valley
OCs (as in the original universe I have for my OCs)
Undertale (and its AUs)
Doki Doki Literature Club
Anthropomorphised animals
Brawl Stars
And many more that I don’t remember right now - if I remember some, I’ll either add them here or just upload a drawing from a fandom I didn’t list!
☠୧⍤⃝ 𒋨𒆙𒆜𒄆𒁎 ☠୧⍤⃝ 𒋨𒆙𒆜𒄆𒁎
Tumblr media
☠୧⍤⃝ 𒋨𒆙𒆜𒄆𒁎 ☠୧⍤⃝ 𒋨𒆙𒆜𒄆𒁎
The tags I plan to use are divided into four categories, since I want the blog to be easy to navigate not only for me, but for anyone reading it.
The first category is “Personal tags” - tags that indicate what a post is about.
#mirko draws - obviously, a post featuring a drawing.
#mirko redraws - same as above, but for redrawing either an old piece of mine, a scene from a game/movie/comic, or something else
#mirko’s opinions - used when I share an opinion on various things
#mirko rants - as the name says, used when I rant about something
#mirko rambles - used when I ramble about something, such as a newly released episode, news about various games, etc.
#mirko reviews - indicates that the post is a review of something (a game, an episode of a TV series, or a webseries)
#mirko reblogs - as the name suggests, used when I reblog stuff from other people.
#mirko's answers - tag used for asks
#mirko vents - used only for venting (either through art or text… or both)
#mirko’s special announcements - only used when I want to announce something special
#mirko’s introductory post - only used for the post you’re reading right now
#mirko’s headcanons - used on posts featuring my headcanons for various characters
#mirko’s comics - a comic created by me.
#mirko updates stories - used when a story, either on AO3 or Wattpad, gets updated
#shitposting time - a tag for the various shitposts that will occasionally be posted, related to various events.
#mirko’s designs - designs created by me
The second category is “Fandom-related” tags, used only for projects related to fandoms I’m part of.
#the rising moon universe - used for all posts related to a BFDI AU: “what if the BFDI characters were anthropomorphised, and there was more to it?”
#goikian stories - used for all posts about an AU based on beta BFDI content (such as the Firey comic series, Total Firey Island, Total Firey Island Points, etc.)
#along came a bubble - used for all posts about a BFDI AU in which Bubble snapped at everyone who mistreated her.
#among the clouds - used for all posts about a BFDI AU in which TV has to deal with a heavy loss of his own
#battle cats - used for all posts about a BFDI x Warrior Cats AU.
#paltronics’ experiments - used for all posts about a PWP AU in which Paltronics decided to use technology to create an updated cast of the original characters, adding more to it.
#percy’s afterparty - an AU in which, many years after the incident that changed the poodle’s life completely, Percy is forced to confront the forgotten past
#the playhouse of damned ones - an AU in which the playhouse was abandoned for a reason unknown to the public, and Nick decides to see what happened in there
#our playtime - some sort of OMORI x PWP x BFDI (woah) AU
The third category is “OCs”-related.
#miroslav and friends - a series of stories/comics/drawings etc. featuring my persona, Miroslav, and his gang in various situations
#lights in darkness - a series of stories/comics/drawings etc. featuring my original characters (still object show-related) living on an island and having lots of adventures.
#the forest seven - OCs that belong to the Forest in Goikian Stories
#the forest guards - a group of OCs (both objects and cats) that have guarded the Evil Forest for many years
#wolkrows - a group of demonic creatures with two forms - a “hidden” one, and the real, blob-like form
The fourth category is “things I did for people”-related.
#mirko’s art for people - used if the art was requested by someone
#mirko’s paid work - used if the art was bought by someone
#mirko’s gift - used if the art was a gift for someone
#mirko’s part of art-trade - used if the art was part of an art trade with someone
#mirko’s part of collab - used for my part of a collab
☠୧⍤⃝ 𒋨𒆙𒆜𒄆𒁎 ☠୧⍤⃝ 𒋨𒆙𒆜𒄆𒁎
Tumblr media
☠୧⍤⃝ 𒋨𒆙𒆜𒄆𒁎 ☠୧⍤⃝ 𒋨𒆙𒆜𒄆𒁎
Naturally, the tags in the fourth category won’t be used very often, for various reasons - school, being busy with life, or anything else.
Requests - CLOSED
C0mmissi0ns - CLOSED
Art-Trades - CLOSED
Collabs - CLOSED
☠୧⍤⃝ 𒋨𒆙𒆜𒄆𒁎 ☠୧⍤⃝ 𒋨𒆙𒆜𒄆𒁎
Tumblr media
☠୧⍤⃝ 𒋨𒆙𒆜𒄆𒁎 ☠୧⍤⃝ 𒋨𒆙𒆜𒄆𒁎
You know, I wish my blog could be kid-friendly, however… me being me makes that impossible, so I require everyone visiting my blog to be at least 15 years old! Under 15, do not interact!
Another thing I want to say is that on my blog you can find themes such as:
Gore,
Death,
Torture,
Addictions,
Nightmares,
Dealing with mental problems,
Memory loss,
Paranormal activities,
Repressed memories,
Mysterious pasts,
Breakups/bad romances,
Loss of friends/family etc.,
Kidnappings,
Body Horror,
Disturbing lore,
Suggestive themes,
So, if you’re uncomfortable with any of the topics above, don’t follow me, and don’t try to force me to stop talking about or making stories based on those topics just because you dislike them!
All of those topics will have warnings.
You can expect some “lighter” stuff here too, because even people like me need some fluff, right?
☠୧⍤⃝ 𒋨𒆙𒆜𒄆𒁎 ☠୧⍤⃝ 𒋨𒆙𒆜𒄆𒁎
Tumblr media
☠୧⍤⃝ 𒋨𒆙𒆜𒄆𒁎 ☠୧⍤⃝ 𒋨𒆙𒆜𒄆𒁎
The list of my other social media:
Gamejolt - Mirkoslavec
Discord - mirkoslavec
Toyhouse - Mirkoslavec
Deviantart - Mirkoslavec 
Cara - soon
Hope you enjoy your stay here!
3 notes
mirkobloom77 · 8 months ago
Text
💚✨ Introduction post yay ✨💚
🔹 Hello I’m Mirko
🔹 I’m Mexican :) 🇲🇽
🔹 I speak Spanish (first language) and English ofc
🔹 My age is unimportant
🔹 My gender is unknown
🔹 Any pronouns, any gendered terms, go mad go crazy
🔹 I love getting asks! I may not answer cuz I either forgor or didn’t see them, but you’re all appreciated
🔹 I post art sometimes :)
🔹 In this blog I will not tolerate any form of Islamophobia, Antisemitism, Racism, LGBTQ-phobia or any other comments of this type. TERFs and Zionists are also unwanted here, for the record.
💚✨ Tags ✨💚
#Mirko’s art stuff = My art
(Also link to my picrew… it’s got cowies…)
#save & #save save save save = stuff I like and/or want to save
#gofundme = gofundmes ofc
💚✨ Commissions ✨💚
I’m currently doing commissions for Palestine! You can find the full post on it down below<3 It already includes some art examples, but feel free to go through #Mirko’s art stuff to see more.
25 notes
danihawks · 10 days ago
Text
DabiHawks A.U./ Chapter 6
Back at the apartment, Mirko was panicking. She had been watching Feathers when she got the message on her phone from Jeanist.
"Shit!" she practically yelled, leaping up off the floor where she had been sitting and playing with Feathers. She had been trying to teach him to say her name, but Feathers couldn't pronounce the "r" and kept calling her Miko.

"Ooooo, your daddy is not gonna be happy when he gets home," Mirko said out loud.

"Shit," Feathers repeated, innocently sweet-sounding.

"No, no no no, no no. We don't say that word," Mirko said, shaking her finger at Feathers.

"No," Feathers repeated back to her before putting his finger in his mouth.

Mirko walked over to the kitchenette in Keigo's apartment, throwing things together. "We need to distract your daddy when he gets home, so Uncle Jeanist is coming over and we're gonna make snacks for him and Daddy," Mirko said, as if talking to Feathers. Feathers struggled to get up, trying to crawl and follow Mirko into the kitchen. Mirko picked him up so he wouldn't get his hands dirty crawling all over the floor, since he had been putting his hands in his mouth recently, and brought him over to the counter with her so she could keep an eye on him while she put together some trail mix or popcorn or something. She had to think - what would be a comfort food? Hmmmm…

"Coooky!!" Feathers said, putting his hands up in the air excitedly.

"Cookies! That's a great idea!!" Mirko said, ruffling Feathers' hair. Feathers laughed before flopping over on the counter, knocking a bag of flour over onto himself.

Jeanist let himself in using the key he had, walking into the disaster scene. Feathers began to cry on the counter. "Are you trying to bake the baby?" Jeanist said, entering the apartment and closing the door behind him. Mirko quickly wiped the flour off of Feathers' face with a wet dish towel so it wouldn't get in his eyes and sting, as Jeanist picked him up and placed him in the kitchen sink to wash him off. Feathers continued to cry before opening his eyes to see Jeanist and smiled, letting out a happy, baby-like screech. Jeanist smiled, patting Feathers on the head.
Once he finished washing the flour off Feathers, Jeanist took him into the nursery to get him changed. Mirko panicked, quickly throwing together the ingredients for chocolate chip cookies in a small bowl she found in the cupboard. She used the little flour left in the bag and on the counter, mixed it all together, poured the mix onto a small flat pan and placed it in the oven. While the cookies baked, Mirko got to work cleaning the counter, finishing by putting the dirty bowl she had used in the sink.
-Everyone, I have been waiting so long to introduce this chapter, since it's Feathers's main introduction. Now I can post some character designs-
2 notes
ladysirenity23 · 2 years ago
Text
Before I start the Liyue arc in the Sagau headcanons…
Here are some stories I want to post on Wattpad, so choose for me what to post first from these introductions :]] I’ll do a poll after
"𝘔𝘰𝘮𝘮𝘺 𝘐𝘴𝘴𝘶𝘦𝘴"
You sought to kill your mother's former protégé when he returns as a teacher at your old high school, ignorant of the people who'll be affected by your petty revenge
X fem reader
Warnings:
child-neglect;
intense mentions of gore;
sadistic mannerisms;
toxic parent reader;
Very slow redemption ;
Love interests;
All Might
Endeavour
Mirko
Hawks
Fat Gum
Midnight
Eraserhead
Present Mic
Vlad king
Overhaul
Oc
+
"voleuse de coeurs"
Settling down in your homeland was never the life you wished for, so you go on a trip all over Teyvat! Learning about the mysteries of each nation, escaping each and every crime you've committed - and on the plus side, you may have stolen a couple of hearts along the way, and an archon's Gnosis while you're at it! Wait-
X fem reader
Love interests;
Aether
Lumine
Kaeya
Diluc
Childe
Rosaria
Eula
Amber
Kazuha
Ayaka
Thoma
Xiao
Razor
Bennett
Fischl
Sucrose
Albedo
Ayato
Chongyun
Ganyu
Scaramouche
Keqing
Xingqiu
Sara
Itto
Dainsleif
Oc
+
"Ang 𝕮𝖆𝖗𝖎ñ𝖔𝖘𝖆"
In which a once-sealed goddess now roams the world of Teyvat with two companions whom she tried to kill upon meeting them.
X fem reader
Love interests ;
Venti
Zhongli
Baal
Dainsleif
Xiao
Ganyu
Kaeya
Aether
Lumine
Tartaglia
Scaramouche
Diluc
Baizhu
Oc
+
38 notes
unlikelyjedi · 2 years ago
Text
My Hero Academia Pride Headcanons (Pro Hero Edition)
If you'd like to see my headcanons for the 1-A kids plus Shinsou, it's linked here!!!
This is all about the Pro Heroes!!
Except, not all the Pro Heroes because that would be ridiculously long, and there are a lot of Pros I don't care about or just plain don't know what their sexuality would be.
You think I know the romantic trysts of a washing machine??
You think I know who Best Jeanist likes?? Mans too busy dying and resurrecting over and over to go on any dates.
So, this list will talk about only the Pros I'm interested in covering.
Disclaimer: These are my own thoughts and opinions. This is how I’m choosing to engage with this media in this post. There are other ways to engage with a chosen media and neither way of engagement invalidates the other. Art is subjective. Fandom is ultimately for fun! Don’t take me too seriously!
Now that the long introduction is out of the way:
Yagi Toshinori | All Might (he/him): Bisexual
Not only did All Might have an American Romance™️ with David Shield, not only did he have an on again/off again with Sir Nighteye, but I'm here to convince you that he is married to his Good Friend™️ Detective Tsukauchi Naomasa.
Go on, tell me I'm wrong. Why does this random police officer know the OFA secret when not even David Shield does, huh??? Tell me. You can't. They're married.
Oh and some people ship All Might with Midoriya's mom and that's valid, I guess. You do your thing, people!
I just don't think All Might is straight. I gave him the Bi-label arbitrarily. He could be pan or gay or something else feasibly. That's the fun thing about headcanons. you can make it the Fuck up!
(but I do like to be realistic within a certain parameter to how the character is written and I just think All Might and the Detective are sus)
Todoroki Enji | Endeavor (he/him): Straight
Some people write Endeavor as homophobic, and while I don't think that's an out-there interpretation, I think it's more nuanced.
I don't think he's homophobic in the outright hate sense. I think he's obsessed with power and lineage, and that manifests in his control of Shouto specifically.
So, if Fuyumi brought home a girlfriend or Natsuo brought home a boyfriend, it wouldn't be a big deal.
But Shouto bringing a boyfriend would be an issue because Endeavor wants the Ultimate Hero. To do what he couldn't do. That would be hard if his masterpiece brought a boy home.
This is beyond the scope of this discussion, but I do think Endeavor is allowed to atone for his mistakes and abuse. I don't believe he's afforded forgiveness from anyone he's hurt, and he absolutely deserves consequences, but Enji himself is allowed to do better and want to do better.
Where it's relevant to this discussion is I believe he'd drop that mindset against Shouto once he realized it was an overreach of his power and hurting his son. He'd eventually get to a point where he's not as controlling and not as much of an asshole.
Takami Keigo | Hawks (he/him): Bisexual
He's got the charisma! The flamboyance! The Trauma! He's Pro Hero Hawks! All the women and men swoon for him and he's got his pick of the market!
That is, if he actually had time. No, he's too busy making sure his city's safe. Too busy being held in a cage by a combination of Hero Commission conditioning and his own ideals.
Oh and let's not forget that he's spending all his off-hours at the behest of his side-job as a LOV spy.
It's a good thing one of their members is hot. Literally and figuratively. At least it gives him something to look at while he's trying to keep himself together, and oh no, he's caught feelings-
Usagiyama Rumi | Mirko (she/her): Sapphic
She's a lesbian. Most of the time. She likes girls. Loves them. Would date them.
So why is she sleeping with wanted criminal Shigaraki Tomura-
ijessbest on insta made me like this cursed ship so much!! To me, at least, it's a purely physical thing for both parties.
I think she's romantically and physically attracted to women, but also physically attracted to men?? That's my idea, anyway.
Aizawa Shouta | Eraserhead (he/him): Gay
Aizawa would be happy with just his cats and coffee, really, but he just had to like men. And specifically the loudest blond on the planet.
Aizawa absolutely had a crush on Oboro in high school. It's not clear what would've happened if he'd survived. Maybe he would have ended up with Oboro. Maybe they'd have split up. Maybe he was always meant to end up with Hizashi. Aizawa doesn't dwell on it. What happened happened and he wouldn't trade what he has now with Hizashi for anything.
Yamada Hizashi | Present Mic (he/him): Bisexual
On the opposite side of things, Hizashi laid eyes on the transfer student from the support course and was immediately smitten. He was not subtle about it either. He made Shouta a playlist, a playlist, and Shouta said "thanks, but I don't use Spotify" and Hizashi cried for weeks. Nemuri won't let him live it down even to this day.
Years later, when Shouta asked him out, he nearly busted both their eardrums from the shock of it. He really thought all this time it had been unrequited, and he wasn't about to even attempt to bring up the idea of a relationship after Oboro.
Now they're happily married and Hizashi makes them do cute couple-y things together all the time. But not in front of the kids.
They're actually pretty subtle about it. Not even Midoriya picked up on it, and he picks up on everything. He only realized after Aizawa adopted Shinsou and noticed Hizashi also treating Shinsou like his own.
Kayama Nemuri | Midnight (she/her): Bisexual
Mostly men-attracted, but that doesn't make her any less bi. I also like the idea of her quirk working best on people who are attracted to her, not necessarily guys. I personally ship her with Ms. Joke actually.
Fukukado Emi | Ms. Joke (she/her): Lesbian
Emi (a lesbian) continually asks Aizawa (a gay man) to marry her because she finds it very funny! She likes annoying the shit out of Aizawa. But she likes women. When Emi asked Mic for his friend's number, Mic was about to have a fit before Emi explained it was Midnight's number she wanted. Mic quickly got with the program and (not-so-subtly) encouraged their relationship along.
Nishiya Shinji | Kamui Woods (he/him): Pansexual
As my Pan sister says: Everyone's eligible but none have applied. (or so he thinks)
I don't think he's had much luck with dating, despite being a pro. Is it because of his costume? Will he ever find love? He just hopes it's someone he likes and not someone egotistical like Mt. Lady. Could you imagine him dating her-
Tatsuma Ryuuko | Ryukyu (she/her): Lesbian
Lesbian Dragon Lady!!! Why, you may ask? Vibes. Would marry the Lesbian Dragon Lady.
Sakamata Kugo | Gang Orca (he/him): Aro/Ace
Despite his harsh exterior, he's really a people person, but not a relationship person. He'd love to make a queer-platonic connection and maybe raise a kid one day.
Takeyama Yu | Mt. Lady (she/her): Pansexual
Her ass is out for all genders. You know, like, "Nice to meet your assquaintance"?? No?? (I'm not tryna sexualize her I promise)
I think she's much more worried about ratings and numbers than a relationship. And if she did decide to start dating, she'd have her pick of the pool. She might like people regardless of gender, but she also has standards. She'd never settle for someone like Kamui Woods. Could you imagine her dating him-
Toyomitsu Taishiro | Fat Gum (he/him): Pansexual
He just loves who he loves. Comfortable in his sexuality and his body image. Truly someone to look up to.
Chatora Yawara | Tiger (he/him): Trans
Actually Canon!!! Tiger is a trans man!! It's so nice to see him open and visible! Even if it's just a small blurb. He's treated with respect by the narrative and I appreciate that!
That's it for the Pros!! Next time I'll tackle characters I've missed including some from the LOV and some students in other classes!
Until Next Time!!!
30 notes
dekusheroacademia · 3 years ago
Text
How I would have fixed the pacing of BNHA
Okay, one of my main criticisms of BNHA is that the pacing seems all over the place, with characters being sidelined, then thrown into focus, then sidelined again, and with the story now rushing to the ending. So: this is how I would have fixed it, distributing the current plot over two years of school time, with the final war happening at the beginning of the third year, just in time for the students to go back to normalcy.
YEAR ONE
Year one's pace is pretty much normal for a school story. Everything can easily be kept the same, with the end of year 1 being the Overhaul arc.
The only thing I would add is a smaller arc focusing on the hero world, plus some more classes - in particular a class about heroes in the rest of the world, so we can learn how big of a deal Star and Stripes is, and a small villain arc about discrimination against heteromorphic quirks, maybe focusing on Shoji. This can be added as build-up to the Stain arc, with villains feeling emboldened by Stain's presence; a fringe of them can be anti-heteromorph, or anti-quirk in general.
Pepper in some information about how companies work with agencies, throw in Redestro's company - and everything else would work the same.
Year one would end with the Overhaul arc (post-license) and Deku vs. Kacchan 2 (plus Brava and Gentle Criminal), so year two would start from their new status quo.
YEAR TWO
Because of the licensing exam and the way the kids have been thrown at actual villains like Overhaul, the focus can shift to the Hero Public Safety Commission, with the introduction of Hawks and Mirko, plus the whole Endeavor-as-the-new-number-one plot.
Something about the impossibility of organizing a tournament for safety reasons, with class 2-A vs. class 2-B as an alternative, plus Shinsou's exams.
At the same time, Redestro can keep working as a villain, and Hawks can do his thing as a spy. This would let us learn more about the Hero Public Safety Commission and give a spot to Mirko. Maybe they want to hire her too, but she is distrustful. This would introduce us to heroes who went rogue against the Commission (e.g. Nagant) and would explain why Mirko doesn't want to take interns.
The rest of the second year can be divided into more classes, and more subplots/arcs can be added. For example, a "festival-like" subplot where we see Mirio adapting to being quirkless (we could get some insight into Deku and Aoyama here) and Eri coming to the realization that she doesn't want to be scared anymore and actually wants to help him recover his quirk. This could happen in parallel with the internships, so we could have the whole Endeavor dinner and internship (and Hawks' investigation), Aizawa visiting the prison and reflecting on his past while training Shinsou and Eri, and actual… other internships.
They could face some smugglers (working for Redestro) as a major villain arc (with unlocking another quirk, plus actual focus on training Catch a Kacchan), and the second year would end with the League conquering the Liberation Front and the start of the war.
This means the second year would end with the start of the war, Bakugou sacrificing himself and Deku leaving. Because Eri and Mirio were given focus, his sudden quirk recovery would also not come out of nowhere.
YEAR THREE
The Deku vigilante arc can be post-year-two, basically during the break. So we could see the students actually reacting to his letter, the parents moving in, and more about what the new world is like.
Maybe even the fall of the Commission, with their dirty secrets coming out thanks to the prisoners escaping. This would make Nagant a major plotline instead of a one-chapter fridged woman. Mirko could even be thrown in here, and we could see her starting to want to work with the kids and being vindicated in her distrust of the Commission.
The rest could literally be the same, with Deku coming back - plus a moment of downtime before the war. For example, something about Mirio, Nejire and Tamaki reassuring Eri, etc. During this downtime, while everyone is resting and recuperating, we can start to see that Aoyama does not seem that happy, so the sudden revelation would not seem out of nowhere.
I think I am okay with no more downtime from here on, as it looks like they all have plans for the villains; but because of the surprise elements, those are better revealed through flashbacks, as Hori probably will do.
The end of the war, whatever it is, could mean the beginning of normalcy again and their third year of school.
As year one is particularly dense, it would also be easy to move the license exam and Overhaul directly into year two.
12 notes
iidascalves · 4 years ago
Text
First patrol (Hawks x reader)
So I got a little carried away writing the beginning of this one, but I just REALLY love Mirko. I wasn’t sure what to use as the reader’s quirk, so I just went with the ability to create telekinetic force fields of energy in different shapes and shit. Also, (h/n) will mean your hero name. Once I finished writing this I decided it was a little long, so I split it into two parts. I guess this first part can be considered a various x reader lol. I’ll post part 2 soon! I’m having a lot of fun with these, so please don’t be shy to send requests or asks! Thanks :)
----------------------------------------------------------------------------------------
“THAT’S NOT FUCKING FAIR” Bakugou screeched in the common room.
“I literally don’t know what you want me to say.” You stared blankly at Bakugou as he practically foamed at the mouth. His hands began to emit smoke.
“Kacchan, calm down! (Y/n), I’m happy you got such a great opportunity!” Izuku tried to congratulate you while holding a death grip on Bakugou’s arm. “You and Mirko will make a great duo!” You smiled at his reassurance and braced yourself for his detailed mutterings about the specifics of both your and your future mentor’s quirks.
“Thank you. I’m excited but nervous.” You shifted in your seat while your hands were in tight fists. “I’m excited to prove myself.”
“I’ve met Mirko before. She’ll enjoy working with you, I’m sure of it.” Todoroki spoke for the first time all evening from the dinner table as he slurped cold soba. You honestly had forgotten he was there.
“Oh yeah! Your father and Mirko team up sometimes, right?” Izuku mentioned as he turned on the couch to face Todoroki.
“Yes.” Todoroki took a slurp of soba before continuing. “If you run into him, be wary. He’s more concerned about his reputation than a rookie looking for guidance or protection. That’s why Hawks does his own thing most of the time. My dad can’t be bothered with anyone else.”
“I’m sure (y/n) will be in good hands with Mirko.” Izuku tried to ease the tension in the room. As Todoroki is a man of few words, it’s rare for him to share things like this. You decided you should head to bed to prepare for your long day tomorrow.
“Alright guys. Thanks for chatting with me. I’m off to bed.” After replies of good night and wishes of luck, you tried to sleep off the anxiety until tomorrow.
_____________________________________________________________
“Ready to rumble, (y/n)?!” Mirko enthusiastically greeted you when you entered her office.
“Yes Ma’am! Thank you for letting me join you today!” You bowed to Mirko and straightened up as you heard her walking toward you.
“No need to be so formal!” Mirko gave you a big slap on the back as she passed you. With your back aching and stinging, you closely followed her to the elevator. “I don’t take just anyone out to patrol with me, (Y/n). You got something special, kid.” She gave you a large smile as the elevator door closed. You were thrilled to finally start your internship, with your idol none the less.
“Thank you, Rumi. It means a lot coming from you.” You tried to calm the reddening of your face as you two descended to the lobby of her agency.
“Don’t sweat it! And remember that on the street I’m Mirko. Right, (h/n)?” Mirko smiled at you as the elevator rang.
_____________________________________________________
After a few hours of patrol, you and Mirko still hadn’t had any calls or serious confrontations. Although paparazzi and other media outlets seemed to follow you both everywhere, they only took pictures from a distance so as not to interfere. “Sorry that this is such a quiet day. I wanted to see you in action!” Mirko began chatting with you as you two walked.
“No, it’s alright. Something is bound to come up anyway, right?” You smiled and continued to survey your surroundings. A teenage boy ran up.
“You’re Mirko, right?” His face was a deep red.
“The one and only! Want a picture or something?” Mirko smiled at the boy. His head whipped around before his eyes frantically landed on you.
“Hi. Can you take our picture, please?” You held up the fan’s phone to take a picture with Mirko.
“1,2,3, smile!” You continued taking a few pictures until Mirko put her hand up to her ear intercom. You handed the phone back to the guy and awaited news. Mirko nodded at you after coming off the intercom.
“Let’s go. No time to waste.” Mirko turned serious as she dashed off to the lower part of town. You used your quirk to manifest a board to ride on in order to keep up.
As unfamiliar buildings flew past, you couldn’t recall seeing the surrounding landmarks on the sheet of information Mirko gave you about your sector.
“Mirko, are we close?” You grew anxious and unsure as you approached the scene.
“Yeah,” Mirko grinned as she gained momentum by swinging off a lamppost. “Stay sharp. This is uncharted territory for you.” You nodded and picked up speed, feeling the wind press against you.
Finally, you saw the scene you were summoned to. A monstrous villain was holding a car, with a family trapped inside, above his head. You didn’t recognize the villain; he was most likely an angry civilian who had snapped. The villain towered about thirty feet above you, screaming angrily, the veins on his neck and arms bulging and strained. It was obvious this guy had never used his quirk like this before.
“You think he used an enhancer?” You kept your eyes glued to the car the villain gripped.
“Probably.” Mirko’s smirk wavered and her brows furrowed. “Bunch of bastards have been juicing up and wrecking shit recently.” The villain began to shake the car and screech in anger.
“I’ll get the car, you get the guy?” You asked Mirko as your eyes focused in on the car and you activated your quirk.
“Read my mind. Just give me a boost.” Mirko smirked and slid a foot back in preparation to jump. “Let’s go.” Mirko leapt sideways, causing the villain to whip his head in her direction. You raised your left hand and manifested a platform under the car. Your right arm shot out as you made a small platform about seven feet in the air for Mirko to vault off of. Your eyes remained on the car as you heard Mirko’s feet pound on the platform, and you saw a swift white streak knock the villain out from under the vehicle. While Mirko repeatedly kicked the villain into submission, you lowered the car with the clamoring family to the ground. You ran to the car and escorted each member to the side, where a small crowd had gathered. You turned to see Mirko with the villain in a suffocating leg triangle. The villain’s screeching quieted and his body began to lose muscle and shrink.
“Mirko, should we take him in for questioning?” You pulled handcuffs out of your pocket and placed them into Mirko’s outstretched hand.
A gust of wind passed behind you making the hairs on the back of your neck stand up. You also felt an intense warmth behind you. “We can take him off your hands. You’re in our jurisdiction after all.” You spun around to see Endeavor and Hawks. Your hands clenched and your chest tightened at the sight of the two top heroes.
“Number 1 and 2, always a pleasure.” Mirko hauled the villain to his feet. “So what if we’re in your jurisdiction? You guys didn’t get here fast enough. That’s why we were called.” Mirko smirked.
“Mirko,” Endeavor began to speak, “we were being briefed on an important future mission. Our delay was expected, so they called you and, uhh… Shouto’s classmate.”
“’Shouto’s classmate’ is not the name of my intern, Endeavor.” Mirko put a hand on her hip and raised a brow at the number one hero. Her ears perked up at the arrival of an idea. “How about this: we walk this jerk to the precinct and do introductions over some lunch?”
“As long as the place has chicken.” Hawks smiled at Mirko. You wanted to admire his handsome features, but decided against it out of fear of embarrassment if he caught you. “Endeavor treats, since he was the reason we’re late!” Endeavor crossed his arms and sighed. He then began walking in the direction of the precinct. Mirko and Hawks shared a laugh, and Mirko began hauling the villain along behind Endeavor. You paused before following, your eyes still trained on Endeavor. You wondered if he would have cooperated at all if it had been you arguing with him instead of Mirko. Hell, he hadn’t even bothered to learn your name, even though you’d been friends with Shouto for the past year.
“So what’s your deal, kid?” Hawks was suddenly walking by your side. You tensed at his sudden presence and looked ahead towards Mirko.
“My deal?” You glanced at him to see if his eyes were still on you, eyes briefly meeting before your head turned.
“Yeah. Does Endeavor spook you or something?”
“No.” You could feel your face getting warm. “He’s just intimidating, I guess. And hearing what Shouto has to say about him doesn’t really help.” You didn’t like being questioned like this.
“I get that. He’s a shitty dad.” Hawks stretched as you two walked. “He’s also a pretty difficult guy to get to know. He’s starting to change for the better though. But his social skills are still shit.” Hawks looked over at you to make sure his remark made you smile. He knew if he kept talking you’d loosen up and get more comfortable. “How’s your first patrol going?”
You glanced at him and smiled. “I can’t complain about lunch with the top two heroes.” Hawks laughed.
“Yeah, I guess. I’d say you’re doing pretty well for your first time. Mirko doesn’t team up with just anyone, you know.” Your face got even warmer as you became flustered once again.
“I’m mainly only good for defense and rescue.” You looked away from Hawks and started to fidget with your hands.
“Don’t be modest, kid. I saw you rescue that family back there.” Your face was on fire upon hearing his praise. “Also saw you kick ass at the sports festival. If it were up to me, I’d have you do more offense training.”
“T-thanks,” you said shyly as you scanned your surroundings for something to distract you from your embarrassment. Things remained pretty quiet as you continued the walk to the precinct.
diopup · 3 years ago
Text
【Le Introduction】
Hello all! °˖✧◝(⁰▿⁰)◜✧˖° I used tumblr back in the day but forgot about it for years and remembered it existed again. Slowly getting back in the swing of things on top of learning all the new updates that have been added on! Here are some fun facts as you explore this dumpster fire that is my tumblr page:
mostly based around sharing/posting content related to JJBA
but I share/post other content such as Inuyasha, MHA, HxH, and others from time to time
totally a dog irl don't worry about it
fav character(s) of all time 5ever: dio (jjba), doppio/diavolo (jjba), jotaro (jjba), mikitaka (jjba), sesshomaru (inuyasha), koga (inuyasha), sango (inuyasha), hisoka (HxH), chrollo (HxH), mirko (mha), jin (yyh), koto (yyh), rengoku (demon slayer), and melon (beastars)
if you happen to not like any of the characters I listed above for what I post that is totally okay but please be respectful
here mostly to have fun and share common interests w/ the wonderful folks on this website
my fav food of all time is peanut butter & raspberry jelly sammiches
Thank you for reading! Please enjoy (◡ᴥ◡) ♡
daredgeek · 4 years ago
Text
BNHA Hamilton!AU(?) part 1
However! I will flesh out the idea, since I have a few thoughts and maybe someone actually wants to write this ^^
Genre-wise I think it would be angst (obviously) and fluff in between, but also drama, ‘cus yeah, Hawks going to the LOV
Since the reader is Mirko’s sibling, I guess they would have a quirk similar to hers, rabbit-related I mean. Or maybe if the reader is an adoptive sibling or something they could have something completely different.
The chapters would probably base loosely on separate songs, but not all of them, since idk what you could write to some of them
Alexander Hamilton/My Shot:
Basically a recap on Hawks’ background story. How his childhood went and his way to become a hero. The chapter would end with him being a well-liked hero, maybe before he became number 2. Also including the plan of him becoming a spy in the LOV and him accepting the mission.
The Schuyler Sisters:
Introduction to Mirko and the reader and the bond between the siblings. Maybe they hang out together, maybe Mirko has a day off or something and they have a nice day in the city, going shopping, going to an arcade, or some other nonsense. I imagine the reader being a bit introverted, or at least less extroverted than Mirko.
Right Hand Man:
Going a little into detail about the relationship between Hawks and Endeavor. Maybe the events of chapter 186 to 192.
A Winter’s Ball/Helpless:
Possibly the announcement of the Japanese Hero Billboard Charts, which the reader is dragged to by Mirko (not really, they also want to support their sister, but nobody needs to know that).
Aaron Burr, Sir:
Hawks meeting up with Dabi to get into the LOV as a spy and describing the things he has to do to be seen as worthy of the league.
The Story of Tonight (+ Reprise):
Hawks being accepted into the inner circle of the LOV and getting to know them a little better. Dabi (maybe?) and Hawks talk alone at some point, where Dabi (again, maybe?) tells Hawks that he knows about the reader and that, if Hawks were to ever betray them, he will make them (the reader) hurt. He (Dabi(?)) will search for them, will find them and will kill them. That’s where the angst comes in, if that wasn’t clear ^^’‘
- - - - - - - - - - - - - - - - - - - - - - - - -
More ideas coming later (and this post probably being edited too)
Also, yes, the songs are supposed to be out of order
laceymorganwrites · 5 years ago
Text
Boku No Hero Academia characters on TikTok
Class 1-A:
Izuku: wholesome pep talks
Kirishima: memes, loves the POV videos, he’s always more confident after watching them
Bakugo: refuses to partake in any trend and leaves mean comments on any trend videos
Jiro: posts videos of her music
Denki: memes with all of his friends, leaves nice comments on Jiro’s videos
Momo: all the DIY stuff
Todoroki: doesn’t have the app, but Midoriya always makes him watch TikToks of the heroes and their friends
Aoyama: the classic lip sync, but he’s surprisingly good at it, will also take TikToks of others if they ask him, he always gets the best angles
Fumikage: takes videos of himself sitting on a chair with death metal blasting in the background
Tsuyu: loves playing with the cute filters
Mineta: is blocked from the fucking app
Shoji: Tsuyu teaches him about filters and he loves the triple screen
Ochaco: does collabs with her friends
Iida: will spend hours on the app reporting people who post ‘inappropriate’ videos
Mina: meme squad with Denki and Kiri, she always comes up with the funniest ideas
Sero: duet king
Ojiro: does videos of him holding things with his tail
Hagakure: fashion account
Sato: food videos
Koda: animal videos
Other students:
Shinsou: refuses to duet with Denki, but secretly watches all of his videos
Monoma: leaves hate comments on class 1-A’s videos and starts a fight with Bakugo and Kendo in the comments every time
Tetsutetsu: sports challenges, like the 100 push-ups etc.
Kendo: is only on the app to keep Monoma in check
Ibara: her praying with a filter and choir music playing in the background
Mirio: random videos of Tamaki and Nejire, random videos of Sir Nighteye, does challenges and duets with his friends and Bubble Girl, does sports challenges
Tamaki: only there for support, doesn’t take videos himself, but likes watching other videos
Nejire: fashion account, leaves positive comments everywhere
Mei: promoting her babies
Teachers:
All Might: doesn’t get the app at first, after an introduction from Mic he does pep talks
Aizawa: doesn’t have the app but makes involuntary guest appearances on Shinsou’s, Mic’s and class 1-A’s videos
Mic: YELLING
Midnight: 18+ stickers everywhere, and all she does is hold her whip and wink at you (Denki once accidentally liked one of her videos)
Nezu: shares his philosophical thoughts
Vlad: takes videos of his class like a proud dad
Heroes:
Endeavor: doesn’t have the app, but gets memed a lot by Hawks
Hawks: memes Endeavor a lot and occasionally makes slow-motion videos of his quirk
Ms Joke: tells jokes obv
Best Jeanist: fashion account obv (yes, he does collabs and duets with Nejire and Hagakure)
Mirko: sports challenges and feminism
Mount Lady: sexy sexy videos, gimme likes
Fatgum: comments on Kiri’s and Mirio’s videos (only the ones with Tamaki in them) like a proud dad. Kirishima sometimes does funny and wholesome videos with him and Tamaki at the internship
Villains:
Shigaraki: dissolves things and people comment: ‘satisfying’
Dabi: only on there for the cosplayers and dms
Toga: is one of the cosplayers, fake blood warning but it’s not fake
Magne: fashion account, duets with Toga and Twice
Spinner: Stain cosplayer
Twice: does all kinds of stuff, the trends, fucks around with filters etc.
Giran: promotes his weapons etc.
Mr Compress: posts random videos of the league
Overhaul: roasts people’s dirty rooms
Hi, welcome to the end of this post, here’s my TikTok: @lmorgan_cosplay, so if you like cringe, cursed videos and bad cosplay, ya know the drill <3
oupacademic · 6 years ago
Photo
Every year the third Thursday in November marks World Philosophy Day, UNESCO's collaborative 'initiative towards building inclusive societies, tolerance and peace’. To celebrate, we’ve curated a reading list of books and online resources on social and political philosophy, ranging from authority and democracy to human rights, as well as historical texts by philosophers who shaped the modern world. Browse the entries and start reading today.
Celebrate philosophy and explore our collection for more blog posts, articles and reading suggestions. Follow us @OUPPhilosophy on Twitter.
Knowledge and Truth in Plato: Stepping Past the Shadow of Socrates, by Catherine Rowett
Philosophy, Rhetoric, and Thomas Hobbes, by Timothy Raylor
John Locke: Literary and Historical Writings, by J.R. Milton
Adam Smith: A Very Short Introduction, by Christopher J. Berry
Thomas Paine: Britain, America, and France in the Age of Enlightenment and Revolution, by J. C. D. Clark
The Social and Political Philosophy of Mary Wollstonecraft, by Sandrine Berges and Alan M. S. J. Coffee
Differences: Rereading Beauvoir and Irigaray, edited by Emily Anne Parker and Anne van Leeuwen
Thinking the Impossible: French Philosophy Since 1960, by Gary Gutting
‘Confucian Political Philosophy’ by George Klosko in The Oxford Handbook of the History of Political Philosophy
‘Adam Smith’s Libertarian Paternalism’ by James R. Otteson in The Oxford Handbook of Freedom
‘Sophists, Epicureans, and Stoics’ by Mirko Canevaro and Benjamin Gray in The Hellenistic Reception of Classical Athenian Democracy and Political Thought from The Oxford Scholarship Online
‘In Defense of Uncivil Disobedience’ by Candice Delmas in A Duty to Resist: When Disobedience Should Be Uncivil from The Oxford Scholarship Online
‘A chastened individualism? Existentialism and social thought’ in Existentialism: A Very Short Introduction by Thomas Flynn from Very Short Introductions
‘Looking At Rights’ in Human Rights: A Very Short Introduction by Andrew Clapham from Very Short Introductions
‘Why do we need political philosophy?’ by David Miller in Political Philosophy: A Very Short Introduction from Very Short Introductions
srasamua · 6 years ago
Text
Using Python to recover SEO site traffic (Part three)
When you incorporate machine learning techniques to speed up SEO recovery, the results can be amazing.
This is the third and last installment in our series on using Python to speed up SEO traffic recovery. In part one, I explained how our unique approach, which we call “winners vs losers,” helps us quickly narrow down the pages losing traffic to find the main reason for the drop. In part two, we improved on our initial approach by manually grouping pages using regular expressions, which is very useful when you have sites with thousands or millions of pages, as is typically the case with ecommerce sites. In part three, we will learn something really exciting. We will learn to automatically group pages using machine learning.
As mentioned before, you can find the code used in part one, two and three in this Google Colab notebook.
Let’s get started.
URL matching vs content matching
When we grouped pages manually in part two, we benefited from the fact that the URL groups had clear patterns (collections, products, and so on), but it is often the case that there are no patterns in the URL. For example, Yahoo Stores’ sites use a flat URL structure with no directory paths. Our manual approach wouldn’t work in this case.
Fortunately, it is possible to group pages by their contents, because most page templates have different content structures. They serve different user needs, so their structures need to differ.
How can we organize pages by their content? We can use DOM element selectors for this. We will specifically use XPaths.
For example, I can use the presence of a big product image to know the page is a product detail page. I can grab the product image address in the document (its XPath) by right-clicking on it in Chrome and choosing “Inspect,” then right-clicking to copy the XPath.
We can identify other page groups by finding page elements that are unique to them. However, note that while this would allow us to group Yahoo Store-type sites, it would still be a manual process to create the groups.
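As a toy illustration of the selector idea, here is a minimal sketch using Python's standard library; the element ids are hypothetical, not from any real store template, and a real page would need an XPath copied from DevTools plus a proper HTML parser like lxml.

```python
import xml.etree.ElementTree as ET

# Toy page: a product-style template with one big "hero" image.
# The ids below are made up for illustration.
doc = ET.fromstring(
    "<html><body>"
    "<div id='main'><img id='hero' src='/img/product-large.jpg'/></div>"
    "</body></html>"
)
# If the product-image selector matches, treat the page as a product detail page.
matches = doc.findall(".//div[@id='main']/img[@id='hero']")
print("product page" if matches else "something else")
```

The same check with a different selector per template would classify the other page groups, but, as noted above, picking those selectors is still manual work.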
A scientist’s bottom-up approach
In order to group pages automatically, we need to use a statistical approach. In other words, we need to find patterns in the data that we can use to cluster similar pages together because they share similar statistics. This is a perfect problem for machine learning algorithms.
BloomReach, a digital experience platform vendor, shared their machine learning solution to this problem. To summarize it, they first manually selected cleaned features from the HTML tags like class IDs, CSS style sheet names, and the others. Then, they automatically grouped pages based on the presence and variability of these features. In their tests, they achieved around 90% accuracy, which is pretty good.
When you give problems like this to scientists and engineers with no domain expertise, they will generally come up with complicated, bottom-up solutions. The scientist will say, “Here is the data I have, let me try different computer science ideas I know until I find a good solution.”
One of the reasons I advocate practitioners learn programming is that you can start solving problems using your domain expertise and find shortcuts like the one I will share next.
Hamlet’s observation and a simpler solution
For most ecommerce sites, most page templates include images (and input elements), and those generally change in quantity and size.
I decided to test the quantity and size of images, plus the number of input elements, as my feature set. We were able to achieve 97.5% accuracy in our tests. This is a much simpler and more effective approach for this specific problem. All of this is possible because I didn’t start with the data I could access, but with a simpler domain-level observation.
I am not trying to say my approach is superior, as they have tested theirs in millions of pages and I’ve only tested this on a few thousand. My point is that as a practitioner you should learn this stuff so you can contribute your own expertise and creativity.
Now let’s get to the fun part: writing some machine learning code in Python!
Collecting training data
We need training data to build a model. This training data needs to come pre-labeled with “correct” answers so that the model can learn from the correct answers and make its own predictions on unseen data.
In our case, as discussed above, we’ll use our intuition that most product pages have one or more large images on the page, and most category type pages have many smaller images on the page.
What’s more, product pages typically have more form elements than category pages (for filling in quantity, color, and more).
Unfortunately, crawling a web page for this data requires knowledge of web browser automation, and image manipulation, which are outside the scope of this post. Feel free to study this GitHub gist we put together to learn more.
Here we load the raw data already collected.
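The actual loading code lives in the Colab notebook linked above; a minimal stand-in with made-up columns, mirroring the form_counts and img_counts frames described under Feature engineering below, might look like this:

```python
import io
import pandas as pd

# Hypothetical sample of the collected data (the real files are in the notebook).
# form_counts: one row per URL, with counts of form and input elements.
form_counts = pd.read_csv(io.StringIO(
    "url,form_count,input_count\n"
    "/product/123,2,5\n"
    "/collection/shoes,1,1\n"
))
# img_counts: one row per image, with its file size and dimensions.
img_counts = pd.read_csv(io.StringIO(
    "url,img_size,height,width\n"
    "/product/123,95000,600,600\n"
    "/product/123,4000,80,80\n"
    "/collection/shoes,12000,200,200\n"
))
print(form_counts.shape, img_counts.shape)  # (2, 3) (3, 4)
```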
Feature engineering
Each row of the form_counts data frame above corresponds to a single URL and provides a count of both form elements, and input elements contained on that page.
Meanwhile, in the img_counts data frame, each row corresponds to a single image from a particular page. Each image has an associated file size, height, and width. Pages are more than likely to have multiple images each, and so there are many rows corresponding to each URL.
It is often the case that HTML documents don’t include explicit image dimensions. We are using a little trick to compensate for this: we capture the size of the image files, which is roughly proportional to the product of the width and height of the images.
We want our image counts and image file sizes to be treated as categorical features, not numerical ones. When a numerical feature, say new visitors, increases it generally implies improvement, but we don’t want bigger images to imply improvement. A common technique to do this is called one-hot encoding.
Most site pages can have an arbitrary number of images. We are going to further process our dataset by bucketing images into 50 groups. This technique is called “binning”.
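A minimal sketch of the binning and one-hot encoding steps; the column names are mine, and I bucket into 2 groups here instead of the 50 used on the real dataset, just to keep the output readable.

```python
import pandas as pd

# Toy image data; img_size is the file-size proxy for image dimensions.
img_counts = pd.DataFrame({
    "url": ["/product/123", "/product/123", "/collection/shoes"],
    "img_size": [95000, 4000, 12000],
})
# Binning: bucket the continuous file sizes into a fixed number of groups.
img_counts["size_bin"] = pd.cut(img_counts["img_size"], bins=2, labels=False)
# One-hot encoding: expand the bin labels into 0/1 columns so the model
# treats them as categories rather than magnitudes.
one_hot = pd.get_dummies(img_counts["size_bin"], prefix="size_bin")
print(img_counts["size_bin"].tolist(), one_hot.shape)  # [1, 0, 0] (3, 2)
```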
Here is what our processed data set looks like.
Adding ground truth labels
As we already have correct labels from our manual regex approach, we can use them to create the correct labels to feed the model.
We also need to split our dataset randomly into a training set and a test set. This allows us to train the machine learning model on one set of data, and test it on another set that it’s never seen before. We do this to prevent our model from simply “memorizing” the training data and doing terribly on new, unseen data. You can check it out at the link given below:
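The split itself is one call in scikit-learn; here is a quick sketch with placeholder X and y standing in for the engineered features and the regex-derived labels.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder features and labels; in the real pipeline X holds the one-hot
# image/form features and y the page groups from the part-two regexes.
X = np.arange(40).reshape(20, 2)
y = np.array([0, 1] * 10)
# Hold out 25% of the pages as the unseen test set; stratify keeps the
# label proportions the same in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y)
print(len(X_train), len(X_test))  # 15 5
```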
Model training and grid search
Finally, the good stuff!
All the steps above, the data collection and preparation, are generally the hardest part to code. The machine learning code is generally quite simple.
We’re using the well-known scikit-learn Python library to train a number of popular models using a bunch of standard hyperparameters (settings for fine-tuning a model). scikit-learn will run through all of them to find the best one. We simply need to feed the X variables (our engineered features from above) and the Y variables (the correct labels) to each model, call the .fit() function and voila!
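Here is a hedged sketch of that grid-search loop on synthetic data; the real features, labels, and full hyperparameter grids are in the Colab notebook, and the grid below is deliberately tiny.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

# Synthetic stand-in for the engineered features and labels.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
models = {
    "linear_svm": (LinearSVC(), {"C": [0.1, 1, 10]}),
    "logistic": (LogisticRegression(max_iter=1000), {"C": [0.1, 1, 10]}),
}
best = {}
for name, (model, grid) in models.items():
    # GridSearchCV tries every hyperparameter combination with
    # cross-validation and keeps the best-scoring one.
    search = GridSearchCV(model, grid, cv=3)
    search.fit(X, y)
    best[name] = search.best_score_
print({name: round(score, 3) for name, score in best.items()})
```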
Evaluating performance
After running the grid search, we find our winning model to be the Linear SVM (0.974), with Logistic regression (0.968) coming in at a close second. Even with such high accuracy, a machine learning model will make mistakes. If it doesn’t make any mistakes, then there is definitely something wrong with the code.
In order to understand where the model performs best and worst, we will use another useful machine learning tool, the confusion matrix.
When looking at a confusion matrix, focus on the diagonal squares. The counts there are correct predictions and the counts outside are failures. In the confusion matrix above we can quickly see that the model does really well labeling products, but terribly labeling pages that are neither products nor categories. Intuitively, we can assume that such pages would not have consistent image usage.
Here is the code to put together the confusion matrix:
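A minimal stand-in, with made-up labels and predictions, shows the shape of that code (the article's own snippet is in the linked notebook):

```python
from sklearn.metrics import confusion_matrix

# Made-up test labels and predictions, just to show the matrix layout.
y_test      = ["product", "product", "category", "category", "other"]
predictions = ["product", "product", "category", "product", "category"]
labels = ["product", "category", "other"]
cm = confusion_matrix(y_test, predictions, labels=labels)
print(cm)  # rows are true labels, columns are predicted labels
```

Here every true product lands on the diagonal, while the misclassified category and other pages show up off-diagonal.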
Finally, here is the code to plot the model evaluation:
Resources to learn more
You might be thinking that this is a lot of work to just tell page groups, and you are right!
Mirko Obkircher commented in my article for part two that there is a much simpler approach, which is to have your client set up a Google Analytics data layer with the page group type. Very smart recommendation, Mirko!
I am using this example for illustration purposes. What if the issue requires a deeper exploratory investigation? If you already started the analysis using Python, your creativity and knowledge are the only limits.
If you want to jump onto the machine learning bandwagon, here are some resources I recommend to learn more:
Attend a PyData event. I got motivated to learn data science after attending the event they host in New York.
Hands-On Introduction To Scikit-learn (sklearn)
Scikit Learn Cheat Sheet
Efficiently Searching Optimal Tuning Parameters
If you are starting from scratch and want to learn fast, I’ve heard good things about Data Camp.
Got any tips or queries? Share them in the comments.
Hamlet Batista is the CEO and founder of RankSense, an agile SEO platform for online retailers and manufacturers. He can be found on Twitter @hamletbatista.
The post Using Python to recover SEO site traffic (Part three) appeared first on Search Engine Watch.
from Digital Marketing News https://searchenginewatch.com/2019/04/17/using-python-to-recover-seo-site-traffic-part-three/
alanajacksontx · 6 years ago
Text
Using Python to recover SEO site traffic (Part three)
When you incorporate machine learning techniques to speed up SEO recovery, the results can be amazing.
This is the third and last installment from our series on using Python to speed SEO traffic recovery. In part one, I explained how our unique approach, that we call “winners vs losers” helps us quickly narrow down the pages losing traffic to find the main reason for the drop. In part two, we improved on our initial approach to manually group pages using regular expressions, which is very useful when you have sites with thousands or millions of pages, which is typically the case with ecommerce sites. In part three, we will learn something really exciting. We will learn to automatically group pages using machine learning.
As mentioned before, you can find the code used in part one, two and three in this Google Colab notebook.
Let’s get started.
URL matching vs content matching
When we grouped pages manually in part two, we benefited from the fact the URLs groups had clear patterns (collections, products, and the others) but it is often the case where there are no patterns in the URL. For example, Yahoo Stores’ sites use a flat URL structure with no directory paths. Our manual approach wouldn’t work in this case.
Fortunately, it is possible to group pages by their contents because most page templates have different content structures. They serve different user needs, so that needs to be the case.
How can we organize pages by their content? We can use DOM element selectors for this. We will specifically use XPaths.
For example, I can use the presence of a big product image to know the page is a product detail page. I can grab the product image address in the document (its XPath) by right-clicking on it in Chrome and choosing “Inspect,” then right-clicking to copy the XPath.
We can identify other page groups by finding page elements that are unique to them. However, note that while this would allow us to group Yahoo Store-type sites, it would still be a manual process to create the groups.
A scientist’s bottom-up approach
In order to group pages automatically, we need to use a statistical approach. In other words, we need to find patterns in the data that we can use to cluster similar pages together because they share similar statistics. This is a perfect problem for machine learning algorithms.
BloomReach, a digital experience platform vendor, shared their machine learning solution to this problem. To summarize it, they first manually selected cleaned features from the HTML tags like class IDs, CSS style sheet names, and the others. Then, they automatically grouped pages based on the presence and variability of these features. In their tests, they achieved around 90% accuracy, which is pretty good.
When you give problems like this to scientists and engineers with no domain expertise, they will generally come up with complicated, bottom-up solutions. The scientist will say, “Here is the data I have, let me try different computer science ideas I know until I find a good solution.”
One of the reasons I advocate practitioners learn programming is that you can start solving problems using your domain expertise and find shortcuts like the one I will share next.
Hamlet’s observation and a simpler solution
For most ecommerce sites, most page templates include images (and input elements), and those generally change in quantity and size.
I decided to test the quantity and size of images, and the number of input elements as my features set. We were able to achieve 97.5% accuracy in our tests. This is a much simpler and effective approach for this specific problem. All of this is possible because I didn’t start with the data I could access, but with a simpler domain-level observation.
I am not trying to say my approach is superior, as they have tested theirs in millions of pages and I’ve only tested this on a few thousand. My point is that as a practitioner you should learn this stuff so you can contribute your own expertise and creativity.
Now let’s get to the fun part and get to code some machine learning code in Python!
Collecting training data
We need training data to build a model. This training data needs to come pre-labeled with “correct” answers so that the model can learn from the correct answers and make its own predictions on unseen data.
In our case, as discussed above, we’ll use our intuition that most product pages have one or more large images on the page, and most category type pages have many smaller images on the page.
What’s more, product pages typically have more form elements than category pages (for filling in quantity, color, and more).
Unfortunately, crawling a web page for this data requires knowledge of web browser automation, and image manipulation, which are outside the scope of this post. Feel free to study this GitHub gist we put together to learn more.
Here we load the raw data already collected.
Feature engineering
Each row of the form_counts data frame above corresponds to a single URL and provides a count of both form elements, and input elements contained on that page.
Meanwhile, in the img_counts data frame, each row corresponds to a single image from a particular page. Each image has an associated file size, height, and width. Pages are more than likely to have multiple images on each page, and so there are many rows corresponding to each URL.
It is often the case that HTML documents don’t include explicit image dimensions. We are using a little trick to compensate for this. We are capturing the size of the image files, which would be proportional to the multiplication of the width and the length of the images.
We want our image counts and image file sizes to be treated as categorical features, not numerical ones. When a numerical feature, say new visitors, increases it generally implies improvement, but we don’t want bigger images to imply improvement. A common technique to do this is called one-hot encoding.
Most site pages can have an arbitrary number of images. We are going to further process our dataset by bucketing images into 50 groups. This technique is called “binning”.
Here is what our processed data set looks like.
Adding ground truth labels
As we already have correct labels from our manual regex approach, we can use them to create the correct labels to feed the model.
We also need to split our dataset randomly into a training set and a test set. This allows us to train the machine learning model on one set of data, and test it on another set that it’s never seen before. We do this to prevent our model from simply “memorizing” the training data and doing terribly on new, unseen data. You can check it out at the link given below:
Model training and grid search
Finally, the good stuff!
All the steps above, the data collection and preparation, are generally the hardest part to code. The machine learning code is generally quite simple.
We’re using the well-known scikit-learn Python library to train a number of popular models with a range of standard hyperparameters (the settings that fine-tune a model). Scikit-learn will run through all of them to find the best one; we simply feed the X variables (our engineered features above) and the Y variables (the correct labels) to each model, call the .fit() method, and voila!
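Here is a compact sketch of that loop with GridSearchCV over two of the candidate models. The model choices match the winners reported below, but the hyperparameter grids and the synthetic data are assumptions for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression

# Stand-in for the engineered features (X) and regex-derived labels (Y)
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Each candidate model gets a small grid of standard hyperparameters
candidates = {
    "linear_svm": (LinearSVC(dual=False), {"C": [0.1, 1, 10]}),
    "logreg": (LogisticRegression(max_iter=1000), {"C": [0.1, 1, 10]}),
}

best_scores = {}
for name, (model, grid) in candidates.items():
    search = GridSearchCV(model, grid, cv=3)
    search.fit(X, y)  # the .fit() call from the post
    best_scores[name] = search.best_score_
print(best_scores)
```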
Evaluating performance
After running the grid search, we find our winning model to be the Linear SVM (0.974), with logistic regression (0.968) a close second. Even with such high accuracy, a machine learning model will make mistakes. If it doesn’t make any mistakes, then there is definitely something wrong with the code.
In order to understand where the model performs best and worst, we will use another useful machine learning tool, the confusion matrix.
When looking at a confusion matrix, focus on the diagonal squares: the counts there are correct predictions, and the counts outside are failures. In the confusion matrix above we can quickly see that the model does really well labeling products, but terribly on pages that are neither products nor categories. Intuitively, we can assume that such pages do not have consistent image usage.
Here is the code to put together the confusion matrix:
Finally, here is the code to plot the model evaluation:
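The original post embeds its own code for these two steps; as a self-contained sketch, both can be done with scikit-learn’s confusion_matrix and ConfusionMatrixDisplay. The labels below are made up for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

# Hypothetical true vs predicted page groups from the test set
y_true = ["product", "product", "category", "other", "product", "category"]
y_pred = ["product", "product", "category", "product", "product", "other"]

labels = ["category", "other", "product"]
cm = confusion_matrix(y_true, y_pred, labels=labels)
print(cm)  # diagonal entries are the correct predictions

# Plot the matrix and save it to a file
disp = ConfusionMatrixDisplay(cm, display_labels=labels)
disp.plot()
plt.savefig("confusion_matrix.png")
```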
Resources to learn more
You might be thinking that this is a lot of work just to tell page groups apart, and you are right!
Mirko Obkircher commented in my article for part two that there is a much simpler approach, which is to have your client set up a Google Analytics data layer with the page group type. Very smart recommendation, Mirko!
I am using this example for illustration purposes. What if the issue requires a deeper exploratory investigation? If you already started the analysis using Python, your creativity and knowledge are the only limits.
If you want to jump onto the machine learning bandwagon, here are some resources I recommend to learn more:
Attend a PyData event. I got motivated to learn data science after attending the one they host in New York.
Hands-On Introduction To Scikit-learn (sklearn)
Scikit Learn Cheat Sheet
Efficiently Searching Optimal Tuning Parameters
If you are starting from scratch and want to learn fast, I’ve heard good things about DataCamp.
Got any tips or queries? Share them in the comments.
Hamlet Batista is the CEO and founder of RankSense, an agile SEO platform for online retailers and manufacturers. He can be found on Twitter @hamletbatista.
The post Using Python to recover SEO site traffic (Part three) appeared first on Search Engine Watch.
from IM Tips And Tricks https://searchenginewatch.com/2019/04/17/using-python-to-recover-seo-site-traffic-part-three/ from Rising Phoenix SEO https://risingphxseo.tumblr.com/post/184297809275