crudcomic
C.R.U.D.
17 posts
Comics Regarding Unpopular Databases
crudcomic · 12 years ago
Political Durability
Salvatore Sanfilippo wrote a post yesterday concerning his own take on sexism in IT. Although I personally found the post merely sophomoric, and largely predicated on straw men (straw women?), it kicked off a firestorm on the Twitters, complete with people actually claiming they would stop using Redis in response. I'm not sure which is stupider, so I'll instead just make fun of them both. Then, I'll throw Oracle in just for good measure, because it's an easy target.
crudcomic · 13 years ago
Highly Available?
This is my last retcon of an existing comic, more appropriate for print. I promise the next one will be all new material.
I would like, if I may, to share my thoughts on the term "highly available", for a likely controversial reason. The word available, the "A" in CAP, is a binary thing when it comes to distributed database architecture. Either the database is available (always serves requests, as long as a node can be connected to) or it may be unavailable (certain nodes will be unable, or will flatly refuse, to serve requests in some cases). That's the crux of my distaste: you can no more be highly available than you can be highly true.
Strangely, consistency has the same constraint, but this seems more commonly understood. The term "eventually consistent" skirts around it by admitting it's not consistent, it just attempts to be. It's consistent enough, and everyone accepts that. But "highly available" does not imply less than perfectly available, but rather moreso. Worse still, "highly available" has been increasingly marketed as synonymous with low latency, when really the two are only loosely related.
Given the ACID-BASE comparison, I actually prefer BA (basically available). HASE would be very misleading (and kills the chemistry joke, to boot). So what's the alternative? How about high yield... "yield" being the percentage of requests that receive a response. It implies nothing but the truth: that your "high yield" database is no more available than my "low yield" one.
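To make "yield" concrete, here is a minimal sketch that treats it exactly as defined above: the percentage of requests that receive a response. The function name and the sample request logs are made up for illustration; they don't come from any particular database.

```python
def request_yield(responses):
    """Return the percentage of requests that received a response.

    `responses` is a list of booleans, one per request:
    True if the request was answered, False if it was not.
    """
    if not responses:
        raise ValueError("need at least one request to compute yield")
    answered = sum(1 for ok in responses if ok)
    return 100.0 * answered / len(responses)


# Hypothetical request logs for two databases.
high_yield_log = [True] * 999 + [False]   # 1 dropped request in 1000
low_yield_log = [True] * 9 + [False]      # 1 dropped request in 10

high = request_yield(high_yield_log)
low = request_yield(low_yield_log)
print(high, low)
```

Both logs describe a system that sometimes fails to answer, so neither is "available" in the binary CAP sense. Yield just tells you how often.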
crudcomic · 13 years ago
From the Department of Redundancy Department
This is a comic about redundancy. It's also redundant since it's a redone version of my second comic. The difference is, this shall appear in my upcoming book, Seven Databases in Seven Weeks.
Something you need to understand about replication: it exists to improve read yield, uptime, and recovery. That's about it. What it won't do is increase write availability or allow you to scale horizontally like you can with sharded data. I just wanted to make this clear. It seems a common misunderstanding about replication that "distributed" equals "scalable".
There are largely two strategies in the task of horizontally scaling data: replication and sharding (sharding existing to stretch resources by dividing computation across several machines or, in weird cases, CPUs). This is why I hate the word distributed... it's too vague to be useful.
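A toy sketch of the difference, assuming four in-memory "nodes" that are really just Python dicts. The hash-modulo placement in `shard_for` is a hypothetical scheme chosen for brevity, not how Mongo or Riak actually place data.

```python
import hashlib


def shard_for(key, num_nodes):
    """Pick one node for a key using a stable hash (hypothetical scheme)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes


def write_replicated(nodes, key, value):
    """Replication: every node stores every key, so every node takes every write."""
    for node in nodes:
        node[key] = value


def write_sharded(nodes, key, value):
    """Sharding: exactly one node stores each key, so writes are divided up."""
    nodes[shard_for(key, len(nodes))][key] = value


keys = ["user:%d" % i for i in range(1000)]
replicated = [dict() for _ in range(4)]
sharded = [dict() for _ in range(4)]

for k in keys:
    write_replicated(replicated, k, "v")
    write_sharded(sharded, k, "v")

replicated_sizes = [len(node) for node in replicated]
sharded_sizes = [len(node) for node in sharded]
print("replicated:", replicated_sizes)
print("sharded:   ", sharded_sizes)
```

Each replicated node ends up holding all 1000 keys, so adding nodes buys more copies to read from but no extra write capacity. The sharded nodes split the same 1000 keys between them, which is the part that actually stretches resources.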
Mongo supports both and manages them quite deliberately. Contrast this to something like Riak which shards and replicates by default, and you must be very deliberate to turn one off. I won't venture to say which is better, because it's largely driven by use-case. But what I will venture to say is that one is much easier to manage operationally. Then again, Riak doesn't have a company to manage it for you.
crudcomic · 13 years ago
Ulterior Motives
I was proud of the internet today. The anti-SOPA boycotts were epic, forcing 18 senators and counting to drop their support for this nightmare. That's good news for everyone who lives on the internet, which I'm pretty sure is everyone (based on a polling of folks within shouting distance... which seemed sciency enough). But I can't help feeling some SOPA blackout sites were in it for other reasons. This doesn't faze me, per se, since I don't care about the motives of anyone who helps me out... not really.
In other news, Amazon DynamoDB was announced today, which should make anyone considering a run at a Riak DBaaS crap their pants good and proper. Not that we don't all pine for a Riak host, and there are certainly improvements made against the basic Dynamo offerings (Riak being of that ilk), but it's hard to compete with Amazon's hardware. In case you were unaware, their Dynamo servers will run on a vast field of SSDs (SSD stands for "Sexy State Drive"). Even God is jealous of this storage power.
Amazon threw down the gauntlet à la mode. Good thing too, since 100% of all people hate EBS.
In related news, I look forward to "Sloppy Quorum" working its way into common vernacular.
crudcomic · 13 years ago
Happy Halloween
Remember to back up your data, or your phone will ring at 2am and you'll die seven days later - or worse, you'll have to explain to your customers why their data is missing.
crudcomic · 13 years ago
In Defense of DBaaS
Expect more of this behavior. Much… much more. This is why I use a hosted solution. I’m "That Guy". I certainly know how to mapreduce, design data structures, index correctly, write efficient queries, and cetera. Such operations are always on my mind like a brainslug. That said, I know as much about NoSQL administration as I know about speleology (suffice it to say, I'm cripplingly claustrophobic).
Here is a fact: managing data servers is a discipline in and of itself. The learning curve isn’t steep, but the heights are vertiginous. Simply installing and using a database does not make you a database expert, any more than getting a new drum kit for Christmas makes you a percussionist.
Offloading data storage to a third party might feel risky to those of us raised on a steady diet of Linux ownership. But there’s that old African chestnut that says: “If you want to go quickly, go alone. If you want to go far, go together.” Ironically, offloading will help you launch quicker and go farther. If you’re worried about vendor lock, or whatever, don’t. There is almost universally more than one provider per database implementation, and you can always take your toys and go elsewhere.
But the best case I can make is this: I like to sleep at night - let someone else worry about scaling out, and applying 2am zero-day security patches. I'll see you in the morning.
crudcomic · 13 years ago
The Downside of Crowdsourcing
Update: After only 3 hours of uptime, I took my Riak server down. It was hacked (whitehat, thankfully) by Aphyr. I love you, Internets. I'd say how he did it, but he's promised to do a write-up. I don't want to spoil the fun.
Update 2: Well, here's the fun.
I actually did this: http://databevy.com:8098. Feel free to play... explore! Just please keep the penis pics to a minimum.
People ask what I find so intriguing about Riak, especially in light of its considerable downsides and I find it hard to answer. It’s not merely its architecture, or tweakable consistency, but the fact that it speaks web more than any database out there. All interaction is an HTTP request or response (like the web) - in the truest sense of the word all. Sure, at its heart Riak is just a key/value store, in the same way that at its core The Beatles were just a band. Riak keys values by URL (like the web). Values are just base64 encoded data (like the web), meaning that if you store an image with a MIME type of jpeg, you GET the URL, and Riak returns a jpeg.
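The "keys values by URL" idea can be sketched with a toy in-memory model. This is only a stand-in for the behavior described above, not Riak's actual API: the `ToyWebStore` class, the `/riak/photos/wedding.jpg` path, and the fake payload are all assumptions for illustration.

```python
class ToyWebStore:
    """In-memory stand-in for a key/value store addressed by URL path."""

    def __init__(self):
        # Maps a URL path to a (content type, raw bytes) pair.
        self._objects = {}

    def put(self, path, body, content_type):
        """Store the bytes under the path, remembering their MIME type."""
        self._objects[path] = (content_type, bytes(body))

    def get(self, path):
        """Return (headers, body) the way an HTTP response would."""
        content_type, body = self._objects[path]
        return {"Content-Type": content_type}, body


store = ToyWebStore()
jpeg_bytes = b"\xff\xd8\xff\xe0 fake jpeg payload"  # not a real image

store.put("/riak/photos/wedding.jpg", jpeg_bytes, "image/jpeg")
headers, body = store.get("/riak/photos/wedding.jpg")
print(headers["Content-Type"], len(body))
```

The store never inspects the value. It hands back the same bytes with the same MIME type, and whatever is on the other end of the URL decides what to do with them.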
You point out that other DBs have HTTP interfaces, but it’s not the same. I can’t hit a URL in Couch and get anything other than a document, even if I encoded an image inside… it still requires some intermediary to decode that image and get the juicy picture underneath (although, to be fair, Couch has this intermediary built in, called inline attachments. But let's be clear... it is a workaround. The web doesn't operate on the "attachment" principle). Riak has no such decoding, because it knows fuckall about documents. Actually, let me back up… Riak did know nothing of documents as of two weeks ago. The new 1.0 release has support for secondary indexes, so pushing in JSON data means you can query that data directly. So… I guess I should scratch a "downside" off the list (though, to throw a bone to Couch, Riak's understanding of documents is very, very weak by comparison).
crudcomic · 13 years ago
Relational Management Systems
I accept that we spend most of our lives talking over one another. And though I can never match my wife’s unbridled enthusiasm for bridal events, at least she tries to listen when I geek out. I could reasonably argue that my interests are intrinsically more stimulating, even with the understanding that one's worldview is irreparably colored by egocentrism. But having seen the logistics of wedding dress construction firsthand I may have to rethink my stance. I’m pretty sure those things contain girders, and what’s the tensile strength of ribbon, anyway? It's hard not to be awed by the sheer physicality of a 110 pound Atlasette hauling around 220 pounds of lace.
But on the other hand, I can store wedding photos in the Riak cloud in a world-spanning network consisting of billions of dollars of research and several pinnacles of human achievement. Yeah… yeah. I tried, honey. But high technology is just way cooler.
crudcomic · 13 years ago
Differences
In marketing scenarios, one finds that the more similar two products are, the more granular differentials are in focus. As James Carse says: Belief is always belief against something. There are no large followings of people proclaiming the sun will rise - because there's no one who doubts it. This reductio ad absurdum collapses many products down to the point nearing identity; save some minor salient feature, which is then magnified to inappropriate proportions. But in actuality, Burger King is just McDonald's, replacing the creepy clown with a creepy king; Lutheranism is just Catholicism without the Pope.
It's from this vantage we can view the classic OSS DB feud: MySQL v. PostgreSQL. Forget that they’re both ACID compliant, (mostly standard) SQL RDBMS, with over a decade of continuous production use and development. MySQL has swappable storage engines (MyISAM, InnoDB, etc), PostgreSQL allows customized contributions (GiST indexes, or cube-datatypes). On the other hand, MySQL sucks at multi-core processors, while PostgreSQL’s scale-out is not as mature as MySQL's. MySQL can be friendlier to non-expert users (selecting non grouped-by fields won’t fail, and it’s always had an amazing console), and PostgreSQL can claim to adhere more to (at least the spirit of) the standards.
I offer this topic to the sacrifice, because just last week I conversed with a chap to whom PostgreSQL was a poor misunderstood genius, locked away in the closet, whilst MySQL was just the dumb quarterback, gallivanting around town and sharing his herpes. Being the database votary I am, I marvel at the ongoing polemic. I had hoped the rise of NoSQL would temper this debate (common enemies being the greatest of peacemakers), but it seems too deeply embedded - like bones.
So let me be the condescending adult and say: You’re both beautiful children, in your own special way. You are 99% of the same. Now please stop fighting. At least we can all agree, you’re both better than Oracle.
crudcomic · 13 years ago
Blasé
I was on the road last week, so this week's comic was originally created and uploaded to my backup grid on an airplane. I literally pushed data to the cloud amongst the clouds.
So as I sat in my chair in the sky, with a portable supercomputer and drawing in $5000 Photoshop software by way of an augmented reality pen, I was surprised to find myself unimpressed. It was downright mundane. That’s how I know we’re in the future. We suffer from an ennui once reserved for spoiled sons of Dukes.
So I scrapped the work, and instead drew a flowchart on a napkin using a sharpie. Technology must remain compelling, lest it make us complacent.
crudcomic · 13 years ago
Define "Any"
The introductory idea is always law - it becomes the basis for future comparison. Humans have a general aversion to learning anything twice, so old ideas tend to be sticky. They must be argued against. Traditionalists can always make the fair point: “Well, it’s worked thus far…”. Longevity is a powerful statement.
So it's therefore no surprise that graph databases have taken so long to reach widespread adoption. Although they map more naturally to the style of object-oriented programming espoused by the most popular modern frameworks (Rails, Django, JEE), they trend against the current reign of RDBMS terror.
My wish is, as node.js continues its NASCAResque rise (get it?... fast), one of the current MVC/MTV contenders (express.js, bricks.js, batman.js) will see the light. Please use Neo4j as the default data layer. Hell, I'd take OrientDB-Graph, GraphDB, HypergraphDB... even FlockDB (just kidding... no).
crudcomic · 13 years ago
In Defense of Sophistication
I know Arel is old news to everyone, so this strip must seem about as timely as a Google Wave joke. But for someone like me who professionally death marches in the Rails 2 ghetto, I yearn for the urbane. It’s on my brain too often to be healthy.
ActiveRecord always felt hindsighted, and Arel dragged it into the 21st century. Now if only database constraints were managed, I’d be slightly less suicidal in the morning. Avoiding DB constraints to keep parity with SQLite3 is, as the French say, le retarded. It’s like keeping the highway speed limit at 10 MPH because some people want to drive golf carts.
My general belief is this: Arel is awesome, but like chocolate or reading Reddit, going overboard is a Bad Thing™. Adding custom Arel operations, just so you can write Ruby code, won’t make this PostgreSQL snippet any clearer (find the closest distance genre points within a bounding cube). And really, the SQL won’t port anyway.
SELECT *, cube_distance(genres, '1,0,0') dist FROM movies WHERE cube_enlarge('(1,0,0)'::cube, 2, 3) @> genres ORDER BY dist;
TL;DR: Arel is awesome, but SQL is still awesomer.
crudcomic · 13 years ago
Couch + Mongo = BFF?
I’m very nearly embarrassed by the play on words here. Nearly. I really just wanted to draw Candy Spelling's house.
Someone please cite the law that decrees a single project shan’t use both Mongo and Couch—beyond the case that people just don’t. Their sweet spots fall distant enough that any union should breed a reasonable synergy—yet they are often cast in adversarial roles (despite clear protests from those-in-the-know). I like to believe the so-called haters are mythical—mere fanboys unable to see the other clearly.
You may believe I overstate their differences. But I remain uncertain of any evidence that they both: (A) store documents, and (B) implement mapreduce, is exculpatory. To the contrary, my left brain points out that data migration between them is so trivial, they should attract magnetically.
Couch pwns (as the kids say) at replication and availability (just embed the fucker… how much more available can you get?). But if a case exists for how it scales x-box-huge while remaining consistent and easy to query (mapreduce ain’t simple) vis-à-vis Mongo, I’ve yet to hear it. Couch progress sidles on, but Mongo is currently easy to shard (read: scale) ad astra, and supports ad-hoc queries out of the box. It still makes me giddy. On the flip-side, good luck dealing with master-master replication in Mongo, something Couch handles beautifully. This behavior is perfect for synchronizing nodes across unstable networks. “Durable-Availability” is Couch’s middle name. “Easy-Consistency” is Mongo’s. (I know, right? Those are weird middle names…)
It may be a fool’s errand, but I’d love to see a Couch/Mongo hybrid - a huge Couch cluster to synch millions of devices, then data replicated to a Mongo backend for normalizing and reporting.
TL;DR: if you’re Craigslist, use Mongo. If you’re Doctors Without Borders mobile clinics, use Couch. But if you’re Foursquare, I’d consider using both.
crudcomic · 13 years ago
Redis is Right-sized
Today's strip was inspired by 7db7w co-author Jim, who pointed out that though Redis is like Memcached only moreso, it's considerably more Volvo than M1A1.
Let me flip my hole cards pre-flop by admitting that I love Redis. I was a member of that mistaken mass who swallowed the myth of the moribund key/value discipline. KVs are in fact not limited to mere set/get operations. Little extras like atomicity, durability, and complex value queries needn't inherently break an implementation of its familial speed requirement.
One year ago I was warned that Redis performed poorly, advice which I conveniently ignored, based on the reality that it did perform in practice. After some investigation, I decided my sources must all subscribe to the same Caching for Dummies newsletter. Bad benchmarks are amongst the worst sins unrecognized by the Catholic church, second only, probably, to believing them. There exist real reasons to avoid Redis (lack of consistent hashing, for one) without concocting false ones. Do not let rumors of Memcached outperforming Redis be your reason for skipping out (even if they were true).
That's good advice, as a matter of law;
Eric Redmond
crudcomic · 13 years ago
On Neo4j and Graph-based Arguments
A Salutation
So you’ve stumbled upon this site. Although unlikely to LOL in the descriptive sense, you may nod knowingly strip-wise, which places you in quite a charmed circle. Congratulations, you are a data nerd. Considering our shared inquisitive natures lends me minor predictive powers over our collective states of mind, here are what your questions might very well be, preemptively answered:
What’s all this then?
This non-serial series of comics holds the database field as its center of gravity. Though, ownership being what it is, I reserve rights to break orbit.
So you only cover the database subgenre?
To call this a comic subgenre may be too permissive, or rather, too insulting to truer subgenres of such stature as technobilly and mathcore. This is new ground. Just call me the Penny Arcade of database-cum-programmer-web-comics, albeit a scalar amalgam: I am Gaycho. No, wait…
Why?
What began as a few scattered drops for an upcoming book, is coalescing T-1000-style into a solid, steely, sexy form.
How are these comics getting better?
I’ve done strips for years, but this Wacom tablet is a bitch to wrap one's head around (and by head, I mean hands).
I know we all hope this continues. The Venn Diagram of database experts and humor is a sad intersection, so minor as to be tangential. I hope to fatten this class like an Irish Home Ec.
Stay tuned;
Eric Redmond
crudcomic · 13 years ago
Stick with Postgres
crudcomic · 14 years ago
Riak On Availability