#I like writing alphabet stuff but dang I really start running out of ideas the lower I go
Falling Devil Yandere Alphabet
WARNINGS: GENDER NOT SPECIFIED + YANDERE THEMES + HUMAN READER + MENTIONS OF CANNIBALISM + NOT PROOFREAD + AT SOME POINT I JUST GAVE UP + OOC I THINK OOPS
NOTES: I literally watched Shrek, Hotel Transylvania, and Ninja turtles 1 and 2 while making this. It’s ridiculous how easily I get distracted when writing but whatever.
A = Affection (How do they show their love and affection? How intense would it get?)
She may have existed since the beginning of time (I forgot whether this was implied or theorized), but I don't think even someone like her is immune to influence, least of all from emotions as powerful as love or obsession, or in this instance, both...
Thus, she finds herself engulfed in the boundless depths of her obsessive love for you, a sensation she embraces wholeheartedly. She revels in the overpowering emotions she feels, expressing her affection through tangible acts of care—such as babying you. You could say she embodies the role of a devoted butler or caregiver. While she also yearns to demonstrate her affection by preparing meals for you, she respects your aversion to the idea of consuming human food—since you’re literally a human yourself.
B = Blood (How messy are they willing to get when it comes to their darling?)
For someone typically so polite and mature, she won't hesitate to spill blood in your name if necessary.
Key word: If
If anyone or anything dares to come between you and her or threatens your safety, she will paint every corner of the earth with their blood and guts. There is no force on earth, in heaven, or in hell that she will allow to endanger you or her relationship with you. So, if you wish to avoid a gruesome bloodshed, you better remove that person yourself before she takes matters into her own hands. With that being said, she takes both your safety and the sanctity of your relationship with her extremely seriously.
C = Cruelty (How would they treat their darling once abducted? Would they mock them?)
No, she would not mock or make fun of you in any way. There's not much to elaborate on here. Just know that she won't make you feel bad for being so easily abducted by her. On the contrary, she’ll only express her pleasure and satisfaction that you’re finally with her. She'll gently caress your teary cheeks, her touch tender but completely dismissive of your tears, focusing solely on the joy she feels at finally having you all to herself.
D = Darling (Aside from abduction, would they do anything against their darling's will?)
I have a strong feeling that she likes to baby you a lot. Do you know what this means?
This means that she'll hover over you, constantly attending to your every need as if you're incapable of managing on your own. Each morning, instead of allowing you the autonomy to dress yourself, she takes charge, dressing you in outfits she deems the cutest. It's akin to a child's intense fixation on their cherished doll, with her playing the role of the child and you, unwittingly, cast as the doll. It's a twisted dynamic where her sense of power is derived from your perceived reliance on her, blurring the lines between caregiver and captor.
To add to that, her control extends beyond mere clothing choices. Understanding your human needs, she recognizes the necessity of regular meals for your well-being. However, unfortunately for you, her culinary expertise lies in cooking humans for sustenance. Despite this, she will persistently coax you into sampling her cuisine, assuring you that it is as delectable as any other meat. But don’t worry, she refrains from actually preparing the meal unless you explicitly express a desire to try it. And should you muster the courage to taste her food, it's important that you maintain a facade of enjoyment, for the consequences of expressing your dislike for the dish she so lovingly prepared for you are dire. In essence, she wouldn’t force you to eat her dishes—but if you express wanting to try them, you better finish it all.
E = Exposed (How much of their heart do they bare to their darling? How vulnerable are they when it comes to their darling?)
She's not particularly emotive, nor does she wear her feelings on her sleeve. It's not that she's ashamed of her emotions—she simply tends to maintain a cool and composed demeanor most of the time. However, since she's fallen deeply in love with you, she has no qualms about revealing the extent of her love. The only emotion she may hesitate to display around you is her anger. Nevertheless, she'll willingly show you her vulnerability.
F = Fight (How would they feel if their darling fought back?)
Honestly? The first few times you rebel against her, she'll stay composed, not lashing out at you. However, if you start fighting against her consistently…
She’ll be offended. Very offended.
She's given you all her patience, tolerance, love, and support—and this is how you repay her? By infuriating her with your constant disobedience? How disrespectful of you! She isn't one to tolerate such defiance. You've tested her limits, and now you must face the consequences. Prepare for some very strict scolding from the chef herself.
G = Game (Is this a game to them? How much would they enjoy watching their darling try to escape?)
Absolutely not. This isn't a game to her, not in the slightest. She doesn't joke about this; her relationship with you is far from anything resembling a game. She regards the relationship with the utmost importance and doesn't even entertain the thought of trivializing it as such.
You're sorely mistaken if you think she derives any amusement from watching you attempt to escape her loving embrace. Because spoiler alert: she doesn't. I'll concede that, depending on the ingenuity of your escape plan, she might be slightly impressed. However, her predominant emotions will be disappointment and frustration at your decision to flee from her and the love she offers.
H = Hell (What would be their darling's worst experience with them?)
Anytime you’re eating.
She's the sole creator of your meals—breakfast, lunch, dinner, you name it. After all, she's literally hell's chef—whose culinary prowess could possibly rival hers? Admittedly, her dishes typically feature ingredients of the human variety, but just because you're human doesn't mean you can't partake! However, since she respects your reluctance to try them, she prepares normal food for you (steak, spaghetti, soup, or whatever). Surprisingly, despite her expertise in cooking humans, her skills extend seamlessly to crafting dishes that humans typically enjoy, something you’re very grateful for! Otherwise, if not for that, you’re sure she would’ve made you a cannibal long ago.
The reason I consider this your worst experience with her is because every single time she insists on coaxing you into trying her food, persuading you to try something different, and promising it will delight your taste buds. It gets to the point of annoyance. Once, in an attempt to end her relentless persuasion, you reluctantly accept—only to discover she had prepared a rather huge meal made entirely of human flesh. You're left with the choice of either eating it all or giving it a try but then spitting it out, leading to voicing your dislike of the food she so lovingly made, which ultimately results in her extreme frustration—which then escalates to the point of endangering your life, making it an unforgettable and traumatizing ordeal.
I feel like this part didn’t make much sense. Oops.
I = Ideals (What kind of future do they have in mind for/ with their darling?)
Her vision of a future with you isn't unreasonable, at least not in her eyes. All she desires is your total and complete obedience, love, and loyalty. She envisions a time when you abandon every trace of rebellion and disobedience from your soul, mind, and body, allowing her to love and care for you without resistance. She wants to choose your outfits, pamper you with her affection, follow you everywhere, and shower your face with kisses. The ultimate fulfillment of her dream would be you finally letting go of your human morals and indulging in the delicious meals she prepares: limbs and body parts of humans.
J = Jealousy (Do they get jealous? Do they lash out or find a way to cope?)
When our beloved chef is jealous, she maintains a veneer of professionalism. Well, kind of. She confronts the source of her jealousy without hesitation, but there's an unmistakable edge to her demeanor. You can sense her annoyance, even if she tries to mask it. Think of it like this: she addresses the situation directly, yet her words and actions carry a subtle, simmering irritation that makes her feelings clear.
"Excuse me, this is my partner," she'll assert calmly, her tone tinged with a hint of annoyance, her closed eyes twitching slightly in suppressed anger.
Now, depending on the response of the other person, things can unfold in one of two ways: she unleashes her wrath, using her powers to invert gravity on the offender, therefore making that person fall upwards into the sky, inevitably plunging them into the very door that leads to hell. Or she’ll opt for a warning, her tone conveying the gravity of the situation without resorting to immediate punishment.
K = Kisses (How do they act around or with their darling?)
She remains the same: polite and composed. However, some wouldn't be mistaken if they claimed to notice a faint smile on her face whenever you're around—a silent expression of her pleasure and relief at having you near. Regarding possession, it's likely evident to most people. Depending on the situation, like when she's on the hunt for humans who have the ingredients required for whatever dish she’s working on, she might have her arm firmly around your waist or wrist—perhaps to the extent that you're lifted slightly off the ground, unable to touch it anymore. She's tall, very tall, and she's aware that there will come a point where you can't keep up with her anymore, so she occasionally employs one of those long arms of hers to lift you off the ground, if only just slightly.
L = Love letters (How would they go about courting or approaching their darling?)
She wouldn’t. She strikes me as someone who watches from afar.
But the moment she realizes the intensity of her feelings for you, she won't hesitate for a second. She'll approach you directly and declare, "You belong to me now."
No matter your answer, you're hers now.
M = Mask (Are their true colors drastically different from the way they act around everyone else?)
Absolutely not. She couldn't care less about who witnesses her overwhelming love and obsession for you. If anyone were to dare to call her out on it, she'd simply turn her head in their direction, her eyes closed as always, and dismiss them, urging them to give you and her some space.
N = Naughty (How would they punish their darling?)
Scolding. That's her method of discipline. She'll be as stern as necessary during scoldings, but physical harm is off the table. Still, she's strict and unforgiving in her approach. You might initially believe that a mere scolding won't affect you much, but trust me, when she scolds you, you'll find yourself feeling surprisingly remorseful. She has a way of making you question yourself, leaving you wondering why you feel so guilty after her scolding is over. In short, you may underestimate the impact of her scolding at first, but eventually, it starts to hit you.
O = Oppression (How many rights would they take away from their darling?)
Honestly, she wouldn't strip away many rights from you. The only ones she'll take are your right to privacy and your right to choose your own outfits, as she insists on selecting and dressing you herself. If you dare to voice complaints, she'll simply brush them aside, reminding you that she's seen countless human bodies—after all, she literally cooks humans.
P = Patience (How patient are they with their darling?)
She's patient enough to tolerate a few complaints from you, but insults are where she draws the line. Just keep yourself in her good graces, and the chances of her becoming angry with you decrease significantly. By staying on her good side, I mean offering her your complete and unwavering obedience, love, and loyalty.
Q = Quit (If their darling dies, leaves, or successfully escapes, would they ever be able to move on?)
If you were to meet your demise, she'd make it her mission to avenge you and ensure that whoever was responsible for your death pays the ultimate price. She'll deliver a punishment far worse than hell itself. So rest assured, if you were to meet such a fate, she'd be seeking revenge in your name. Until she looks in every corner and crevice of the earth to avenge your death, she won’t be able to move on.
The only time I can see her actually moving on from your death is if you were to die under natural circumstances.
If you were to escape, resentment would begin to fester and swell within her heart at your sudden departure. The idea of her beloved leaving her so abruptly would fuel an ever-growing sense of bitterness within her. Each passing moment would only serve to intensify this gnawing feeling of resentment, pulling at her heartstrings with increasing force day by day. Consequently, she won't be able to move on until she finds you again. However, her motivation for seeking you out would no longer stem from her immense love but from the overwhelming resentment and anger that now consumes her.
R = Regret (Would they ever feel guilty about abducting their darling? Would they ever let their darling go?)
No, she doesn't experience a single iota of guilt when she abducts you. Guilt isn't an emotion she's familiar with, nor does she see a reason to entertain it. After all, why should she feel guilty? She now has the person she loves most closer to her than ever, and if anything, she's pleased by the outcome.
Moreover, she won't ever release her hold on you. You'll never be granted freedom from her grasp until the day you draw your final breath, which will likely occur due to natural circumstances such as old age, a heart attack, or any other form of disease.
S = Stigma (What brought about this side of them? childhood, curiosity, etc?)
Curiosity.
As a devil who typically sees humans as nothing more than necessary ingredients for her dishes, she doesn't usually care or think much of them. That is, until she meets you. You must have caught her eye because there was something undeniably different about you, something that sparked her curiosity and ultimately caused her to fall for you—hard. I'm talking about the kind of fall where she face-plants the floor type hard.
T = Tears (How do they feel about seeing their darling scream, cry, and/or isolate themselves?)
She still doesn't feel a drop of guilt when she sees you cry, isolate yourself, or scream. The closest she ever comes to guilt is disappointment—and guilt and disappointment aren't even in the same realm, so you can already imagine how that goes. However, she won't feel disappointment toward you unless you scream; in that case, her disappointment is directed at you. If you isolate yourself or cry, she'll be disappointed in herself for making you feel this way—I guess you could say this is her own way of feeling "guilt." In those instances, she'll try to comfort you—so that's a small consolation. But if you scream, she’ll simply tell you to knock it off.
U = Unique (Would they do anything different from the classic yandere?)
My mind is actually so blank at this point that I cannot think, so please just take this: She won't ever harm you, physically or mentally.
V = Vice (What weakness can their darling exploit in order to escape?)
Remember all the times I mentioned how she desperately wants you to try her food, but you refuse because eating human meat would make you a cannibal? Well, here's a potential advantage: if you tell her you want to try her dishes, you’ll surprise her. She'll enthusiastically prepare the meal, and that's your moment—it's either now or never. While she's cooking, you have your chance to make a run for it. But be cautious; she's a long-legged woman with enough stamina to catch up to you within mere seconds. So, act wisely.
W = Wit's end (Would they ever hurt their darling?)
No.
Not sure if this counts, but you know how she only targets humans if they're necessary ingredients for her dishes? Well, here's the great news for you: you're not and will never be an option as an ingredient for her. You are the only human she will never, ever consider attacking, regardless of whether you possess one of the ingredients she's seeking. Even if all the ingredients for her dish happen to be everything that you are, she will never turn to you as a food source. She'll scour every corner and crevice of the earth if necessary, but she'll never resort to cooking you. (This part was supposed to go into the unique section, but I changed my mind because it felt wrong. So if this doesn’t make sense, sorry💪😞)
Yeah, in essence, she'll never inflict harm upon you. While your worst experiences with her may have been near-death encounters, she'll never actually harm you. The most she'll do in terms of hurting you is exert a firm grip. That's it.
X = Xoanon (How much would they revere or worship their darling? To what length would they go to win their darling over?)
She's not a worshipper, but her actions might suggest otherwise. The way she cares for you feels akin to that of a devoted caregiver. She insists on dressing you herself, feeding you, and accompanying you everywhere. It's understandable if you interpret her actions as worship, but to her, it's not about worship—it's simply about loving you that much.
And no, she's not going to any extraordinary lengths to win you over because in her mind, you're already hers. There's no need for her to go to such lengths when you're already with her.
Y = Yearn (How long do they pine after their darling before they snap?)
Y’know how I just mentioned that in her mind, you're already hers? Well, I doubt you share the same sentiment, do you? She's not oblivious, and she's well aware that you likely don't reciprocate her feelings. If your feelings toward her remain unchanged even with all her persuasions, she'll remain surprisingly patient for quite some time. What I'm getting at is that her endurance in trying to win you over during the relationship is remarkable. I can't envision her snapping at you, no matter how much time passes as she endeavors to capture your heart.
Hoping this part made sense 😗
Z = Zenith (Would they ever break their darling?)
Like I've hammered home countless times, no—not mentally, not physically, but perhaps emotionally. Remember when I mentioned how her scolding can make you question whether you're truly ungrateful for her love? Well, emotions play a significant role in that scolding too. She doesn't necessarily toy with your emotions, but she aims for you to feel at least a tiny tinge of guilt for every instance you've angered or disappointed her.
#chainsaw man#csm#falling devil#csm x reader#csm x you#falling devil x reader#falling devil x you#yandere x reader#yandere x you#yandere csm#csm falling devil#yandere alphabet#the way you can see my writing gradually decrease each section actually bothers me#I like writing alphabet stuff but dang I really start running out of ideas the lower I go#anyways I hope this was at least somewhat in character
11/11/11 tag game
Answer 11 questions, make 11 new questions, tag 11 persons!
I was tagged by @waterfallwritings for this! Thank you, your questions were really interesting and fun to answer! o(^▽^)o
(Sorry if I got a bit lengthy, it was just so nice to do something not university related after exams!)
1. How do you come up with ideas for your WIPs?
The heavy artillery from the get go, eh? *cracks knuckles* Okay, to be honest, I'm not sure. I've never really thought of it, they're just there, clamoring for attention (plot bunnies are my best ally and worst enemy). I definitely have bouts of very intense inspiration and days when I just,, can't. Even if I know where the scene is going, how it's going, and why, the words aren't there. Or they're all wrong. (This is when I default to writing ugly-crying emotional breakdowns or sex. Likely both.)
Working out a story is a game of association laced with concepts and core elements for me. Like this: dragons (core element) + mountains (association) + tribe/clan (concept) + shapeshifting (association/concept) + relocation/settlers (core element). And that's basically my dragon wip.
Eld's story is based on a Doctor Who quote "demons run when a good man goes to war". Ren and Kuro grew up with me; at some point they just started acting on their own - I just throw shit at them and see what shakes loose at this point. (They have five kids! How???? did that?? happen???)
(I'm a sucker for prompts. My brain can see a single word and just, run off with it hollering in glee.)
2. How do you get past gaps in the plot?
Urrrrgh, I have to get past them??
I struggle, is what I do. Typically I let it sit, soundly on the back-burner in my mind, until I've mulled through my story to the point where the hole is gone. (This takes months, and with my sci-fi wip I ended up rewriting the dang thing completely at the third draft after eight years of working on it. Scrapping it was painful.)
Or I try a different angle. Sometimes it works.
3. What motivates you to keep writing?
I love writing. There's really no more significant reason than that. Writing allows me to express myself, create and explore worlds and characters who wouldn't exist otherwise. And it lets me just exist without any layers. When I've been hurting, writing has helped me get the pain out with no more than tears.
And I love words and languages; the way we have about 10 different words to say "snow" (partly because Swedish meshes several words into one, but still) and maybe 2 (3?) for heat. That there are groups of languages with the same ancestors that are so close; how absolutely amazingly different they can be (I just learned "y" is not considered a vowel in English and I'm???? Completely blown. What. What do you mean it's not a vowel. Are you sure???). And languages with different alphabets and ones that use pictures to represent ideas instead of sounds! And sign languages!!
And idioms! It's so cool how idioms can carry words of wisdom, caution and reassurance, and rarely can be translated (classical examples from Swedish "There's no danger on the roof" and "The rain is standing like sticks in the ground") because they lose their connections to the cultures they are used in.
The universes in my head are as full of life as the real world and not nearly as anxiety-inducing. I have stories to tell. And you know that feeling when you’re in the zone and everything is flowing and you’re writing 10′000 words in a go? That.
4. Do you do any other kind of creative writing?
I dabble in poetry? Like, very sporadically and with mixed results. I have a friend into slam poetry who opened my eyes to it, too.
(Would fanfiction go here too?)
5. Do you have any other creative hobbies besides writing?
Urngh, yeah, too many. If I’m not reading, my hands need to be moving or I’m an unhappy bean. Though, writing is the only thing I never put down. Ever.
Okay, so, I draw (badly), both on paper and digitally. Mostly landscapes. I also try to make house sketches/plans. And I paint (a bit better than I draw), prefer oils or acrylics over water colors. My partner and I also paint miniature models when there is time.
I also crochet and knit, and I love origami. I roleplay (Dungeons & Dragons, whenever the DMs have time), and I play the violin (and piano) and write simple music for myself.
I garden if there's time in the spring and during summer, and I absolutely love these little fairy-gardens that have been popping up everywhere. On that note, I have more houseplants than I have space for.
I'm also thinking of starting up a little thing making bracelets and bead strings for fidgeting. I needed some kind of stim toy to be able to focus and I wanted something silent with many different sensations to keep me entertained. I hunted around a bit but eventually made my own and they turned out pretty nice!
(I also like to bake, especially pies and breads.)
6. What do you do when you’re stuck on a scene and don’t know how to get it out / write it?
I slam the key words in. And then I ignore it until it stops fighting back so much.
Or I backtrack. Sometimes I've written myself into a corner unknowingly.
Sometimes I drop a wip that's giving me grief and work on another, or I use word/idea prompts to get me started.
7. How do you decide how to end your WIP?
God, please tell me because I don't hecking know. Should I do an epilogue? Should I leave it open/ambiguous? Should I just cut it off and leave the next step to the reader? Should there be a "true" ending, with goodbyes (actual or metaphorical)?
Urrrrrrrrgh. Good Lord, endings.
8. When in the process of writing do you decide how it's going to end? Or do you kind of just wait til you get there?
Either I know from the start, before I write the first words, or I wait. Which tends to mean frustrating the hell out of myself. I have started to go through my wips (whether original or fanfiction) and give them all bare-bones outlines, because not having endings is a big problem for me.
9. Why did you decide to join writeblr?
Basically when I decided I had had enough of the "join to see more" button or the "sensitive material" warning. And when I realized there was a really nice writing community here I could maybe become a part of. (A major reason was actually @concerningwolves' advice posts.)
10. What’s your favourite food?
(CW: Maybe skip if you’re vegetarian/vegan/you’d rather not read about meat.)
Chinese deep-fried chicken with sweet-and-sour sauce (not the spicy chili kind, the actual pineapple and tomato juice based kind) with rice. No question about it.
Mom's "blodbröd med fläsk" is a close runner up though, but we only eat it once a year, at the midwinter solstice. It's homemade Swedish tunnbröd (hard thin-bread) with blood instead of water in it that you dip in boiling water to make it soft, with white sauce, and fried, thoroughly salted pork.
(Believe me, some country-side Swedes in the northern parts are still pretty pagan about the sun coming back, me included. It's a big deal when you go between no night/darkness and then very little/no sun.)
11. If you had to kill off a character in your WIP, who would it be and why?
People are dying right and left in most of them already, since three include large-scale wars, so there's no shortage there.
But if I had to choose a main-character or a directly supporting character? (MY BABIES! NO.)
I think Ren, from the sci-fi wip, because he would be free from both responsibility and physical and mental pain. (My boi is a wreck.) It wouldn't be unlikely either. But at this point it would destroy my story! 😂 Less story-destroying would be their foster-guardian Sandra. It would still force me to write a completely new arc, but it would be do-able.
Although, regarding the fantasy wip Firestorm, Kebarock dying in their war would crush Sunling. That could be done without losing the plot entirely. Hmmm.
Puh, that was a lot of thinking! Okay, I'll be tagging.. @concerningwolves @weaver-of-fantasies-and-fables @adorhauer @focusdumbass @sleepy-and-anxious @els-writes @meteorwrites @sebastian-writer @telvivere @thescribesloft and @aceymichaelis No obligation to do this of course! <3 (And if I tagged you and you’d rather not be tagged in games, I apologize, please let me know)
And here are your questions if you want to:
1. What about your wip makes you smile?
2. What's the hardest decision you've had to make in regards to a wip?
3. What text font do you prefer writing in? Or do you write by hand?
4. Are there pets in your wip? If not, what pet might your character(s) keep?
5. What AU would you love to see/write for your wip?
6. Is there any type of music/a song in particular that you associate with your wip?
7. Are you a night owl or an early bird/When do you write?
8. Favorite beverage?
9. Where do you prefer to write? At home? In a library? On the bus/train?
10. What are your first 3 to 5 associations with the word 'writing'? Why those?
11. What do you do when you're bored?
Hope you enjoy! o(^◇^)o
#11/11/11 tag#11 asks tag game#tag game#ask game#writing related#writing community#writeblr community
Ever wonder about that mysterious Content-Type tag? You know, the one you’re supposed to put in HTML and you never quite know what it should be?
Did you ever get an email from your friends in Bulgaria with the subject line “???? ?????? ??? ????”?
I’ve been dismayed to discover just how many software developers aren’t really completely up to speed on the mysterious world of character sets, encodings, Unicode, all that stuff. A couple of years ago, a beta tester for FogBUGZ was wondering whether it could handle incoming email in Japanese. Japanese? They have email in Japanese? I had no idea. When I looked closely at the commercial ActiveX control we were using to parse MIME email messages, we discovered it was doing exactly the wrong thing with character sets, so we actually had to write heroic code to undo the wrong conversion it had done and redo it correctly. When I looked into another commercial library, it, too, had a completely broken character code implementation. I corresponded with the developer of that package and he sort of thought they “couldn’t do anything about it.” Like many programmers, he just wished it would all blow over somehow.
But it won’t. When I discovered that the popular web development tool PHP has almost complete ignorance of character encoding issues, blithely using 8 bits for characters, making it darn near impossible to develop good international web applications, I thought, enough is enough.
So I have an announcement to make: if you are a programmer working in 2003 and you don’t know the basics of characters, character sets, encodings, and Unicode, and I catch you, I’m going to punish you by making you peel onions for 6 months in a submarine. I swear I will.
And one more thing:
IT’S NOT THAT HARD.
In this article I’ll fill you in on exactly what every working programmer should know. All that stuff about “plain text = ascii = characters are 8 bits” is not only wrong, it’s hopelessly wrong, and if you’re still programming that way, you’re not much better than a medical doctor who doesn’t believe in germs. Please do not write another line of code until you finish reading this article.
Before I get started, I should warn you that if you are one of those rare people who knows about internationalization, you are going to find my entire discussion a little bit oversimplified. I’m really just trying to set a minimum bar here so that everyone can understand what’s going on and can write code that has a hope of working with text in any language other than the subset of English that doesn’t include words with accents. And I should warn you that character handling is only a tiny portion of what it takes to create software that works internationally, but I can only write about one thing at a time so today it’s character sets.
A Historical Perspective
The easiest way to understand this stuff is to go chronologically.
You probably think I’m going to talk about very old character sets like EBCDIC here. Well, I won’t. EBCDIC is not relevant to your life. We don’t have to go that far back in time.
Back in the semi-olden days, when Unix was being invented and K&R were writing The C Programming Language, everything was very simple. EBCDIC was on its way out. The only characters that mattered were good old unaccented English letters, and we had a code for them called ASCII which was able to represent every character using a number between 32 and 127. Space was 32, the letter “A” was 65, etc. This could conveniently be stored in 7 bits. Most computers in those days were using 8-bit bytes, so not only could you store every possible ASCII character, but you had a whole bit to spare, which, if you were wicked, you could use for your own devious purposes: the dim bulbs at WordStar actually turned on the high bit to indicate the last letter in a word, condemning WordStar to English text only. Codes below 32 were called unprintable and were used for cussing. Just kidding. They were used for control characters, like 7 which made your computer beep and 12 which caused the current page of paper to go flying out of the printer and a new one to be fed in.
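To make the 7-bit arithmetic concrete, here is a small Python sketch. The two helper functions are hypothetical names invented for illustration; they imitate the "high bit marks the last letter of a word" trick described above and are not a reconstruction of WordStar's real file format.

```python
# ASCII facts from the paragraph above.
assert ord(" ") == 32 and ord("A") == 65
assert (127).bit_length() == 7            # every ASCII code fits in 7 bits
assert ord("\a") == 7 and ord("\f") == 12  # beep and form feed control codes


def mark_word_ends(text: str) -> bytes:
    """Encode ASCII text, setting the spare eighth bit on each word's last letter.

    Hypothetical helper: illustrates the trick, not WordStar's actual format.
    """
    data = bytearray(text.encode("ascii"))  # raises on any non-ASCII character
    for i in range(len(data)):
        is_letter = chr(data[i]).isalpha()
        next_is_letter = i + 1 < len(data) and chr(data[i + 1]).isalpha()
        if is_letter and not next_is_letter:
            data[i] |= 0x80                 # turn on the high bit
    return bytes(data)


def strip_high_bit(data: bytes) -> str:
    """Recover the plain ASCII text by masking every byte down to 7 bits."""
    return bytes(b & 0x7F for b in data).decode("ascii")


marked = mark_word_ends("hi yo")
print(marked)                  # b'h\xe9 y\xef'
print(strip_high_bit(marked))  # hi yo
```

The scheme only works because nothing in the text needs the eighth bit for itself: `text.encode("ascii")` fails on the first accented letter, which is the "English text only" limitation in miniature.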
And all was good, assuming you were an English speaker.
Because bytes have room for up to eight bits, lots of people got to thinking, “gosh, we can use the codes 128-255 for our own purposes.” The trouble was, lots of people had this idea at the same time, and they had their own ideas of what should go where in the space from 128 to 255. The IBM-PC had something that came to be known as the OEM character set which provided some accented characters for European languages and a bunch of line drawing characters… horizontal bars, vertical bars, horizontal bars with little dingle-dangles dangling off the right side, etc., and you could use these line drawing characters to make spiffy boxes and lines on the screen, which you can still see running on the 8088 computer at your dry cleaners’. In fact as soon as people started buying PCs outside of America all kinds of different OEM character sets were dreamed up, which all used the top 128 characters for their own purposes. For example on some PCs the character code 130 would display as é, but on computers sold in Israel it was the Hebrew letter Gimel (ג), so when Americans would send their résumés to Israel they would arrive as rגsumגs. In many cases, such as Russian, there were lots of different ideas of what to do with the upper-128 characters, so you couldn’t even reliably interchange Russian documents.
Eventually this OEM free-for-all got codified in the ANSI standard. In the ANSI standard, everybody agreed on what to do below 128, which was pretty much the same as ASCII, but there were lots of different ways to handle the characters from 128 and on up, depending on where you lived. These different systems were called code pages. So for example in Israel DOS used a code page called 862, while Greek users used 737. They were the same below 128 but different from 128 up, where all the funny letters resided. The national versions of MS-DOS had dozens of these code pages, handling everything from English to Icelandic and they even had a few “multilingual” code pages that could do Esperanto and Galician on the same computer! Wow! But getting, say, Hebrew and Greek on the same computer was a complete impossibility unless you wrote your own custom program that displayed everything using bitmapped graphics, because Hebrew and Greek required different code pages with different interpretations of the high numbers.
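The résumé mishap is easy to reproduce. Here is a minimal sketch using Python's built-in codecs, assuming that cp437 (the original US IBM PC code page) stands in for "some PCs" and cp862 for the Hebrew DOS code page mentioned above. The variable names are mine, purely for illustration.

```python
# One byte, two code pages, two different letters.
raw = bytes([130])

as_us_oem = raw.decode("cp437")   # U+00E9, e with acute accent
as_hebrew = raw.decode("cp862")   # U+05D2, Hebrew letter Gimel

# A résumé typed on a US PC and opened on a Hebrew one.
sent = "r\u00e9sum\u00e9s".encode("cp437")
received = sent.decode("cp862")

# Print code points rather than glyphs so the output is plain ASCII.
print([f"U+{ord(ch):04X}" for ch in received])

# Below 128 the two code pages agree, so unaccented English survives.
assert "resumes".encode("cp437") == "resumes".encode("cp862")
```

The bytes never changed in transit; only the table used to interpret them did.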
Meanwhile, in Asia, even more crazy things were going on to take into account the fact that Asian alphabets have thousands of letters, which were never going to fit into 8 bits. This was usually solved by the messy system called DBCS, the “double byte character set” in which some letters were stored in one byte and others took two. It was easy to move forward in a string, but dang near impossible to move backwards. Programmers were encouraged not to use s++ and s-- to move backwards and forwards, but instead to call functions such as Windows’ AnsiNext and AnsiPrev which knew how to deal with the whole mess.
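To make the one-byte-or-two business concrete, here is a small sketch using Shift JIS. That choice is my assumption; the text above doesn't name a particular DBCS encoding, and the sample characters are picked only for illustration.

```python
# Shift JIS mixes one-byte and two-byte characters in the same string.
text = "a\u3042b"                      # 'a', hiragana A, 'b'
data = text.encode("shift_jis")
sizes = [len(ch.encode("shift_jis")) for ch in text]
print(sizes)      # [1, 2, 1]
print(len(data))  # 4

# Why walking backwards is hard: the second byte of katakana SO is 0x5C,
# the same value as an ASCII backslash. Looking at that byte alone, you
# cannot tell whether it is a whole character or the tail end of one.
so_bytes = "\u30bd".encode("shift_jis")
print(so_bytes[1] == ord("\\"))  # True
```

Going forward you always know whether you are on a lead byte; going backward you have to rescan from some known-good position, which is roughly what helpers like AnsiPrev had to do for you.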
But still, most people just pretended that a byte was a character and a character was 8 bits and as long as you never moved a string from one computer to another, or spoke more than one language, it would sort of always work. But of course, as soon as the Internet happened, it became quite commonplace to move strings from one computer to another, and the whole mess came tumbling down. Luckily, Unicode had been invented.
Unicode
Unicode was a brave effort to create a single character set that included every reasonable writing system on the planet and some make-believe ones like Klingon, too. Some people are under the misconception that Unicode is simply a 16-bit code where each character takes 16 bits and therefore there are 65,536 possible characters. This is not, actually, correct. It is the single most common myth about Unicode, so if you thought that, don’t feel bad.
In fact, Unicode has a different way of thinking about characters, and you have to understand the Unicode way of thinking of things or nothing will make sense.
Until now, we’ve assumed that a letter maps to some bits which you can store on disk or in memory:
A -> 0100 0001
In Unicode, a letter maps to something called a code point which is still just a theoretical concept. How that code point is represented in memory or on disk is a whole nuther story.
In Unicode, the letter A is a platonic ideal. It’s just floating in heaven:
A
This platonic A is different from B, and different from a, but the same as A and A and A, whatever font each one happens to be set in. The idea that A in a Times New Roman font is the same character as the A in a Helvetica font, but different from “a” in lower case, does not seem very controversial, but in some languages just figuring out what a letter is can cause controversy. Is the German letter ß a real letter or just a fancy way of writing ss? If a letter’s shape changes at the end of the word, is that a different letter? Hebrew says yes, Arabic says no. Anyway, the smart people at the Unicode consortium have been figuring this out for the last decade or so, accompanied by a great deal of highly political debate, and you don’t have to worry about it. They’ve figured it all out already.
Every platonic letter in every alphabet is assigned a magic number by the Unicode consortium which is written like this: U+0639. This magic number is called a code point. The U+ means “Unicode” and the numbers are hexadecimal. U+0639 is the Arabic letter Ain. The English letter A would be U+0041. You can find them all using the charmap utility on Windows 2000/XP or visiting the Unicode web site.
There is no real limit on the number of letters that Unicode can define and in fact they have gone beyond 65,536 so not every unicode letter can really be squeezed into two bytes, but that was a myth anyway.
OK, so say we have a string:
Hello
which, in Unicode, corresponds to these five code points:
U+0048 U+0065 U+006C U+006C U+006F.
Just a bunch of code points. Numbers, really. We haven’t yet said anything about how to store this in memory or represent it in an email message.
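If you want to see the code points for yourself, here is a quick sketch in Python, where a str is already a sequence of code points. The helper name code_points is made up for this example.

```python
import unicodedata

def code_points(text):
    """Return each character of text in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]

print(code_points("Hello"))
# ['U+0048', 'U+0065', 'U+006C', 'U+006C', 'U+006F']

# Going the other way: look up the character that a code point names.
print(unicodedata.name(chr(0x0639)))  # ARABIC LETTER AIN
print(chr(0x0041))                    # A
```

Notice that nothing here says how many bytes anything takes. These are still just numbers.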
Encodings
That’s where encodings come in.
The earliest idea for Unicode encoding, which led to the myth about the two bytes, was, hey, let’s just store those numbers in two bytes each. So Hello becomes
00 48 00 65 00 6C 00 6C 00 6F
Right? Not so fast! Couldn’t it also be:
48 00 65 00 6C 00 6C 00 6F 00 ?
Well, technically, yes, I do believe it could, and, in fact, early implementors wanted to be able to store their Unicode code points in high-endian or low-endian mode, whichever their particular CPU was fastest at, and lo, it was evening and it was morning and there were already two ways to store Unicode. So the people were forced to come up with the bizarre convention of storing a FE FF at the beginning of every Unicode string; this is called a Unicode Byte Order Mark and if you are swapping your high and low bytes it will look like a FF FE and the person reading your string will know that they have to swap every other byte. Phew. Not every Unicode string in the wild has a byte order mark at the beginning.
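Both byte orders, and the byte order mark, can be poked at directly. This is a sketch using Python's standard utf-16-be and utf-16-le codecs as stand-ins for the two storage orders described above; hex_bytes is a throwaway helper of my own.

```python
import codecs

def hex_bytes(data):
    """Format bytes as space-separated two-digit hex."""
    return " ".join(f"{b:02X}" for b in data)

big = "Hello".encode("utf-16-be")
little = "Hello".encode("utf-16-le")
print(hex_bytes(big))     # 00 48 00 65 00 6C 00 6C 00 6F
print(hex_bytes(little))  # 48 00 65 00 6C 00 6C 00 6F 00

# The byte order mark, as written by each kind of machine.
print(hex_bytes(codecs.BOM_UTF16_BE))  # FE FF
print(hex_bytes(codecs.BOM_UTF16_LE))  # FF FE

# The generic "utf-16" decoder reads the mark and picks the right order.
from_big = (codecs.BOM_UTF16_BE + big).decode("utf-16")
from_little = (codecs.BOM_UTF16_LE + little).decode("utf-16")
print(from_big == from_little == "Hello")  # True
```

Without the mark, a decoder has to be told the byte order or guess it, which is exactly the problem with strings in the wild that leave it off.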
For a while it seemed like that might be good enough, but programmers were complaining. “Look at all those zeros!” they said, since they were Americans and they were looking at English text which rarely used code points above U+00FF. Also they were liberal hippies in California who wanted to conserve (sneer). If they were Texans they wouldn’t have minded guzzling twice the number of bytes. But those Californian wimps couldn’t bear the idea of doubling the amount of storage it took for strings, and anyway, there were already all these doggone documents out there using various ANSI and DBCS character sets and who’s going to convert them all? Moi? For this reason alone most people decided to ignore Unicode for several years and in the meantime things got worse.
Thus was invented the brilliant concept of UTF-8. UTF-8 was another system for storing your string of Unicode code points, those magic U+ numbers, in memory using 8 bit bytes. In UTF-8, every code point from 0-127 is stored in a single byte. Only code points 128 and above are stored using 2, 3, or more bytes: the original design allowed up to 6, though the current standard (RFC 3629) caps it at 4.
This has the neat side effect that English text looks exactly the same in UTF-8 as it did in ASCII, so Americans don’t even notice anything wrong. Only the rest of the world has to jump through hoops. Specifically, Hello, which was U+0048 U+0065 U+006C U+006C U+006F, will be stored as 48 65 6C 6C 6F, which, behold! is the same as it was stored in ASCII, and ANSI, and every OEM character set on the planet. Now, if you are so bold as to use accented letters or Greek letters or Klingon letters, you’ll have to use several bytes to store a single code point, but the Americans will never notice. (UTF-8 also has the nice property that ignorant old string-processing code that wants to use a single 0 byte as the null-terminator will not truncate strings).
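Here is a rough sketch of that variable-width behaviour using Python's utf-8 codec. The sample characters are my own picks, one from each size class I could think of.

```python
# English text is byte-for-byte identical in ASCII and UTF-8.
hello_utf8 = "Hello".encode("utf-8")
print(hello_utf8 == "Hello".encode("ascii"))          # True
print(" ".join(f"{b:02X}" for b in hello_utf8))       # 48 65 6C 6C 6F

# Everything else pays by the byte.
samples = {
    "A": "A",                 # U+0041
    "e-acute": "\u00e9",      # U+00E9
    "gimel": "\u05d2",        # U+05D2
    "euro": "\u20ac",         # U+20AC
    "emoji": "\U0001F600",    # U+1F600
}
lengths = {name: len(ch.encode("utf-8")) for name, ch in samples.items()}
print(lengths)
# {'A': 1, 'e-acute': 2, 'gimel': 2, 'euro': 3, 'emoji': 4}

# No stray zero bytes, unlike the two-bytes-per-character encodings.
print(0 in "Hello".encode("utf-8"))      # False
print(0 in "Hello".encode("utf-16-le"))  # True
```

That last pair of lines is the null-terminator point: old C string code can carry UTF-8 around without chopping it off at the first zero.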
So far I’ve told you three ways of encoding Unicode. The traditional store-it-in-two-byte methods are called UCS-2 (because it has two bytes) or UTF-16 (because it has 16 bits), and you still have to figure out if it’s high-endian UCS-2 or low-endian UCS-2. And there’s the popular new UTF-8 standard which has the nice property of also working respectably if you have the happy coincidence of English text and braindead programs that are completely unaware that there is anything other than ASCII.
There are actually a bunch of other ways of encoding Unicode. There’s something called UTF-7, which is a lot like UTF-8 but guarantees that the high bit will always be zero, so that if you have to pass Unicode through some kind of draconian police-state email system that thinks 7 bits are quite enough, thank you, it can still squeeze through unscathed. There’s UCS-4, which stores each code point in 4 bytes, which has the nice property that every single code point can be stored in the same number of bytes, but, golly, even the Texans wouldn’t be so bold as to waste that much memory.
And in fact now that you’re thinking of things in terms of platonic ideal letters which are represented by Unicode code points, those unicode code points can be encoded in any old-school encoding scheme, too! For example, you could encode the Unicode string for Hello (U+0048 U+0065 U+006C U+006C U+006F) in ASCII, or the old OEM Greek Encoding, or the Hebrew ANSI Encoding, or any of several hundred encodings that have been invented so far, with one catch: some of the letters might not show up! If there’s no equivalent for the Unicode code point you’re trying to represent in the encoding you’re trying to represent it in, you usually get a little question mark: ? or, if you’re really good, a box. Which did you get? -> �
There are hundreds of traditional encodings which can only store some code points correctly and change all the other code points into question marks. Some popular encodings of English text are Windows-1252 (the Windows 9x standard for Western European languages) and ISO-8859-1, aka Latin-1 (also useful for any Western European language). But try to store Russian or Hebrew letters in these encodings and you get a bunch of question marks. UTF 7, 8, 16, and 32 all have the nice property of being able to store any code point correctly.
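You can watch the question marks appear. This is a sketch using Python's codecs, assuming cp737 and cp1255 as stand-ins for the OEM Greek and Hebrew ANSI encodings, and the Russian word is just a sample of mine. Note that errors="replace" is something you have to ask for; by default Python refuses rather than silently substituting.

```python
# Hello fits in every one of the old encodings, with identical bytes.
for name in ("ascii", "cp737", "cp1255", "iso-8859-1", "cp1252"):
    assert "Hello".encode(name) == b"Hello"

# Russian does not fit in Latin-1.
russian = "\u041f\u0440\u0438\u0432\u0435\u0442"   # "Privet" in Cyrillic
lossy = russian.encode("iso-8859-1", errors="replace")
print(lossy)  # b'??????'

# Strict mode (the default) raises instead of losing data quietly.
try:
    russian.encode("iso-8859-1")
    refused = False
except UnicodeEncodeError:
    refused = True
print(refused)  # True

# The UTF encodings round-trip any code point.
for name in ("utf-7", "utf-8", "utf-16", "utf-32"):
    assert russian.encode(name).decode(name) == russian
```

Once the text has been turned into question marks there is no getting it back, which is why the lossy path is worth avoiding rather than cleaning up after.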
The Single Most Important Fact About Encodings
If you completely forget everything I just explained, please remember one extremely important fact. It does not make sense to have a string without knowing what encoding it uses. You can no longer stick your head in the sand and pretend that “plain” text is ASCII.
There Ain’t No Such Thing As Plain Text.
If you have a string, in memory, in a file, or in an email message, you have to know what encoding it is in or you cannot interpret it or display it to users correctly.
Almost every stupid “my website looks like gibberish” or “she can’t read my emails when I use accents” problem comes down to one naive programmer who didn’t understand the simple fact that if you don’t tell me whether a particular string is encoded using UTF-8 or ASCII or ISO 8859-1 (Latin 1) or Windows 1252 (Western European), you simply cannot display it correctly or even figure out where it ends. There are over a hundred encodings and above code point 127, all bets are off.
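Here is what "all bets are off" looks like in practice: a sketch that takes the same two bytes and reads them under several encodings. The sample character is my choice.

```python
data = "\u00e9".encode("utf-8")   # e-acute, stored as the bytes C3 A9

as_utf8 = data.decode("utf-8")          # one character: U+00E9
as_latin1 = data.decode("iso-8859-1")   # two characters: U+00C3 U+00A9
as_cp1252 = data.decode("cp1252")       # the same two-character gibberish

print(len(as_utf8), len(as_latin1), len(as_cp1252))  # 1 2 2

# ASCII has no opinion at all about bytes above 127.
try:
    data.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False
print(ascii_ok)  # False
```

Nothing in the bytes themselves says which reading is right. That information has to travel alongside the string.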
How do we preserve this information about what encoding a string uses? Well, there are standard ways to do this. For an email message, you are expected to have a string in the header of the form
Content-Type: text/plain; charset="UTF-8"
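On the receiving end, a program reads that header before it touches the body. A minimal sketch with Python's standard email package follows; the message and body here are invented for the example, and real mail also involves transfer encodings that this ignores.

```python
from email import message_from_string

raw_headers = 'Content-Type: text/plain; charset="UTF-8"\n\n'
msg = message_from_string(raw_headers)

# The library hands back the charset parameter, lowercased.
charset = msg.get_content_charset()
print(charset)  # utf-8

# Hypothetical body bytes, as they might arrive off the wire.
body_bytes = "na\u00efve caf\u00e9".encode("utf-8")
body_text = body_bytes.decode(charset)
print(len(body_bytes), len(body_text))  # 12 10
```

Twelve bytes, ten characters: the header is the only thing telling the reader how to get from one to the other.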
For a web page, the original idea was that the web server would return a similar Content-Type http header along with the web page itself — not in the HTML itself, but as one of the response headers that are sent before the HTML page.
This causes problems. Suppose you have a big web server with lots of sites and hundreds of pages contributed by lots of people in lots of different languages and all using whatever encoding their copy of Microsoft FrontPage saw fit to generate. The web server itself wouldn’t really know what encoding each file was written in, so it couldn’t send the Content-Type header.
It would be convenient if you could put the Content-Type of the HTML file right in the HTML file itself, using some kind of special tag. Of course this drove purists crazy… how can you read the HTML file until you know what encoding it’s in?! Luckily, almost every encoding in common use does the same thing with characters between 32 and 127, so you can always get this far on the HTML page without starting to use funny letters:
<html> <head> <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
But that meta tag really has to be the very first thing in the <head> section because as soon as the web browser sees this tag it’s going to stop parsing the page and start over after reinterpreting the whole page using the encoding you specified.
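The trick of reading far enough to find the meta tag can be sketched in a few lines. This is emphatically not how any real browser does it; sniff_meta_charset, the regular expression, and the 1024-byte window are all my own simplifications, and the sample page is made up.

```python
import re

# Search the raw bytes: the tag itself only uses characters below 128,
# which nearly every common encoding stores the same way.
META_CHARSET = re.compile(rb'<meta[^>]*charset=["\']?([A-Za-z0-9_\-]+)',
                          re.IGNORECASE)

def sniff_meta_charset(html_bytes, window=1024):
    """Return the charset declared in an early meta tag, or None."""
    match = META_CHARSET.search(html_bytes[:window])
    return match.group(1).decode("ascii").lower() if match else None

# A hypothetical Hebrew page saved in the Windows Hebrew code page.
page = ('<html> <head> <meta http-equiv="Content-Type" '
        'content="text/html; charset=windows-1255"> '
        '<title>\u05e9\u05dc\u05d5\u05dd</title>').encode("cp1255")

declared = sniff_meta_charset(page)
print(declared)  # windows-1255

# Only now is it safe to decode the whole page.
html = page.decode(declared)
print("\u05e9\u05dc\u05d5\u05dd" in html)  # True
```

Returning None when nothing is declared is deliberate: at that point you are guessing, and the caller should know it.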
What do web browsers do if they don’t find any Content-Type, either in the http headers or the meta tag? Internet Explorer actually does something quite interesting: it tries to guess, based on the frequency in which various bytes appear in typical text in typical encodings of various languages, what language and encoding was used. Because the various old 8 bit code pages tended to put their national letters in different ranges between 128 and 255, and because every human language has a different characteristic histogram of letter usage, this actually has a chance of working. It’s truly weird, but it does seem to work often enough that naïve web-page writers who never knew they needed a Content-Type header look at their page in a web browser and it looks ok, until one day, they write something that doesn’t exactly conform to the letter-frequency-distribution of their native language, and Internet Explorer decides it’s Korean and displays it thusly, proving, I think, the point that Postel’s Law about being “conservative in what you emit and liberal in what you accept” is quite frankly not a good engineering principle. Anyway, what does the poor reader of this website, which was written in Bulgarian but appears to be Korean (and not even cohesive Korean), do? He uses the View | Encoding menu and tries a bunch of different encodings (there are at least a dozen for Eastern European languages) until the picture comes in clearer. If he knew to do that, which most people don’t.
For the latest version of CityDesk, the web site management software published by my company, we decided to do everything internally in UCS-2 (two byte) Unicode, which is what Visual Basic, COM, and Windows NT/2000/XP use as their native string type. In C++ code we just declare strings as wchar_t (“wide char”) instead of char and use the wcs functions instead of the str functions (for example wcscat and wcslen instead of strcat and strlen). To create a literal UCS-2 string in C code you just put an L before it as so: L"Hello".
When CityDesk publishes the web page, it converts it to UTF-8 encoding, which has been well supported by web browsers for many years. That’s the way all 29 language versions of Joel on Software are encoded and I have not yet heard a single person who has had any trouble viewing them.
This article is getting rather long, and I can’t possibly cover everything there is to know about character encodings and Unicode, but I hope that if you’ve read this far, you know enough to go back to programming, using antibiotics instead of leeches and spells, a task to which I will leave you now.