tech-and-software - Tumblr blog

tech-and-software · 8 years ago

Backing up DynamoDB without a big cost

I have a DynamoDB table for a project, which holds only about 800 items. I used Data Pipeline to take a daily snapshot to S3, so I can roll back manually to older data if I need to.

Data Pipeline sets up a daily process which launches a cluster of high-powered EC2 instances and runs a Hadoop job on them to extract the data.

For a table this size, that’s hugely inefficient. The cost per month was about $23 (stupidly I didn’t do anything about it for a while). OK, not a huge amount, but annoying enough. One of the bills shows 60 m3.xlarge instance hours ($17.58) plus 60 hours of EMR ($4.20). Over a 30-day month that’s about 2 billed instance hours per day to extract 800 items from DynamoDB into a file and put it on S3. Bah.

So I wrote a tiny script to do it instead, which runs as a Lambda function. This costs $0. It doesn’t have any of the advantages of the Hadoop job (there’s no rate limiting and no parallelisation). It also holds the whole scan result in memory before writing to S3, and a single Scan call returns at most 1 MB of data, so a bigger table would need paginating with LastEvaluatedKey. But for my use case and current table size it’s fine. It doesn’t have to spin up and spin down a bunch of m3.xlarge instances every day, and I’ve saved $23 per month. If my table grows I’ll revisit the script, but as it stands it’s OK.

Here’s the script, for Lambda with Node.js. You’ll need to adapt it for your own needs, obviously, but it’s straightforward. The result of dynamodb.scan().promise() gets passed to s3.putObject(). I chain the promise from that, then call the Lambda callback with null to indicate success.

/* jshint esversion: 6, node: true */
'use strict';

const AWS = require('aws-sdk');
AWS.config.update({region: 'eu-west-1'});

const dynamodb = new AWS.DynamoDB();
const s3 = new AWS.S3();

const s3bucketName = 'TARGET_BUCKET_HERE';
const tableNames = {
    'en_GB': 'TABLE_NAME_HERE'
};

var params = {
    TableName: tableNames.en_GB,
    Select: 'ALL_ATTRIBUTES'
};

exports.handler = (event, context, callback) => {
    console.log('Received event:', JSON.stringify(event, null, 2));

    dynamodb.scan(params).promise().then(function (res) {
        return s3.putObject({
            Bucket: s3bucketName,
            Key: params.TableName,
            Body: JSON.stringify(res)
        }).promise();
    }).then(function () {
        callback(null);
    }).catch(function (err) {
        callback(new Error(err));
    });
};
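
If the table ever outgrows a single Scan response (DynamoDB caps each one at 1 MB), the scan step could be swapped for a loop along these lines. This is a sketch rather than part of the original script: scanAllItems is a made-up helper name, and it assumes a client with the same aws-sdk v2 shape used above, i.e. scan(params).promise().

```javascript
'use strict';

// Hypothetical helper: collects every item in a table by following
// LastEvaluatedKey from one Scan page to the next.
// `dynamodb` is assumed to expose scan(params).promise(), as aws-sdk v2 does.
async function scanAllItems(dynamodb, tableName) {
    const items = [];
    let startKey; // undefined for the first page

    do {
        const params = { TableName: tableName, Select: 'ALL_ATTRIBUTES' };
        if (startKey) {
            // Resume where the previous page stopped.
            params.ExclusiveStartKey = startKey;
        }

        const page = await dynamodb.scan(params).promise();
        for (const item of page.Items) {
            items.push(item);
        }

        // DynamoDB omits LastEvaluatedKey once the final page has been read.
        startKey = page.LastEvaluatedKey;
    } while (startKey);

    return items;
}
```

In the handler, scanAllItems(dynamodb, params.TableName) would take the place of dynamodb.scan(params).promise(), and the resolved array is what would get stringified into the S3 object. It still holds everything in memory, so it only pushes the limit out rather than removing it.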

#aws #dynamodb #backup #nodejs #lambda #emr

tech-and-software · 8 years ago

Elastic beanstalk - nginx proxy error

I used Elastic Beanstalk from AWS to run the API of a recent project. The application was nodejs, one of the environments Beanstalk supports out of the box. The advantage is that it configures the EC2 instances, autoscaling group and load balancer, deploys your app and monitors the fleet automatically, so you don’t have to muck around setting that up.

Beanstalk configures nginx to run in front of nodejs, where it can serve up static files or proxy through to nodejs for dynamic content. The nodejs app is zipped up and uploaded to S3, where Beanstalk picks it up and installs it.

However, I was getting a proxy error when I deployed, which initially stumped me, and nothing specific turned up on Stack Overflow.

I figured it out eventually - I had generated the nodejs app using express-generator. This places app.js at the top level of the app. Beanstalk was running that, as I had left the Node command in the Beanstalk application configuration blank. app.js only builds and exports the Express app without ever calling listen, so it exits immediately, and Beanstalk was repeatedly re-running it. Meanwhile nginx can’t proxy incoming HTTP requests through to the nodejs server, as it’s not running, so it returns the proxy error to clients.

The app is meant to be started via the bin/www script which express-generator creates, and npm start runs that script. So the fix was just to change the Node command to npm start in the Beanstalk config. I noticed afterwards that the Beanstalk Nodejs documentation states that if you leave this blank, the default is to try app.js, then server.js, then npm start, in that order. As app.js was there, but exits immediately, Beanstalk was getting stuck on that.
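
To see why running app.js directly gets nowhere: in an express-generator project, app.js only exports the app, and bin/www is the piece that creates the HTTP server and listens. Here’s a rough stand-in for that split using only Node’s built-in http module. The names app and start are illustrative, and this isn’t the generated code.

```javascript
'use strict';
const http = require('http');

// Plays the role of app.js: just a request handler.
// Nothing here opens a port, so a file containing only this would exit at once.
function app(req, res) {
    res.writeHead(200, { 'Content-Type': 'text/plain' });
    res.end('ok');
}

// Plays the role of bin/www: wraps the handler in a server and listens.
// The open listening socket is what keeps the Node process alive.
function start(handler, port) {
    const server = http.createServer(handler);
    server.listen(port);
    return server;
}

// What "npm start" ends up doing, in spirit:
// start(app, process.env.PORT || 3000);
```

With the last line left commented out, node runs the file and exits straight away, which is the behaviour Beanstalk kept hitting. Uncommenting it keeps the process up so nginx has something to proxy to.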

The setting is the Node command field in the software configuration section of the AWS console for Beanstalk.

#aws #elastic beanstalk #nodejs #express #nginx

tech-and-software · 8 years ago

A javascript concordance script

For Guvnut UK I needed a concordance of the Tory manifesto in order to create the list of most used terms. The words appear as pre-built search terms on the search page before a search has been run. I had a quick search on the net but nothing seemed appropriate, so I figured I’d knock one up. There were one or two online concordance generators but they didn’t seem to work.

I wrote a nodejs script, which I’ve put on GitHub here as a gist called concordance.js. As you can see it’s short. The input is the raw text from the manifesto, which I extracted from the PDF by just copying and pasting into a text editor. Each line of STDIN is read, then a regexp splits it into words on whitespace, and each word is lowercased and stored in an array. An allWords object is built, with one property per word. If a word already exists in it then the count for that word is incremented. Finally, all the words are printed to STDOUT, one per line, with the count.
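
The steps above can be sketched roughly as follows. This is a reconstruction from the description, not the gist itself: the function names are made up, and I’ve assumed a comma between word and count. It uses Object.create(null) for allWords so that a word like "constructor" doesn’t collide with a built-in object property.

```javascript
'use strict';
const readline = require('readline');

// Splits one line on whitespace, lowercases each word and bumps its count.
// Punctuation is not stripped, so "tax," and "tax" count as different words.
function addLine(allWords, line) {
    const words = line
        .split(/\s+/)
        .filter(Boolean) // drop the empty string left by leading whitespace
        .map((word) => word.toLowerCase());

    for (const word of words) {
        allWords[word] = (allWords[word] || 0) + 1;
    }
    return allWords;
}

// Formats the counts as one "word,count" row per word.
function toCsv(allWords) {
    return Object.keys(allWords)
        .map((word) => `${word},${allWords[word]}`)
        .join('\n');
}

// Reads `input` line by line, then writes the rows to `output` at end of input.
function run(input, output) {
    const allWords = Object.create(null); // no inherited properties
    const rl = readline.createInterface({ input });
    rl.on('line', (line) => addLine(allWords, line));
    rl.on('close', () => output.write(toCsv(allWords) + '\n'));
}
```

Adding run(process.stdin, process.stdout); at the bottom and piping the manifesto text in on STDIN would print the rows to STDOUT.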

This outputs what’s effectively a CSV, which I imported into Google Docs and sorted by word count. I had to tidy up the resulting words a bit and pick a count threshold for which words to include. To create the object which drives the search page (it’s generated from a pug template), I wrote another script which converts the final word list into a static JavaScript object embedded in the page, holding the link label and target for each term.
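
That second script could look something like this. It’s a guess at the shape rather than the real thing: buildSearchTerms, the minCount threshold, the label/target property names and the /search?q= link format are all assumptions for illustration.

```javascript
'use strict';

// Turns "word,count" rows into [{ label, target }] entries, most frequent first.
// Rows below minCount, and rows that can't be parsed, are dropped.
function buildSearchTerms(csvText, minCount) {
    return csvText
        .split('\n')
        .map((row) => row.trim())
        .filter(Boolean)
        .map((row) => {
            // Split on the last comma so the count is always the final field.
            const comma = row.lastIndexOf(',');
            if (comma < 1) {
                return { word: '', count: NaN };
            }
            return {
                word: row.slice(0, comma),
                count: Number(row.slice(comma + 1))
            };
        })
        .filter((entry) => entry.word && entry.count >= minCount)
        .sort((a, b) => b.count - a.count)
        .map((entry) => ({
            label: entry.word,
            // Assumed link format; the real search page may differ.
            target: '/search?q=' + encodeURIComponent(entry.word)
        }));
}
```

JSON.stringify(buildSearchTerms(csvText, 10)) would then give a literal that a template could embed in the page as a static object.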

It’s not very high tech but did the job. I reused the script afterwards to create another search page for Guvnut US, derived from Trump’s campaign promises, which you can see here.

#concordance #javascript #software #nodejs
