#it'd sure be nice if tumblr suggested the most recently used tags
div-divington · 11 months ago
Text
[five images]
Federal Bureau of Control
--> the Research Sector
5 notes
blech · 6 months ago
Text
tumblr-backup and datasette
I've been using tumblr_backup, a script that replicates the old Tumblr backup format, for a while. I use it both to back up my main blog and the likes I've accumulated; it turns out the likes outnumber my posts by more than two to one.
Sadly, there isn't an 'archive' view of likes, so I have no idea what's in there from way back in 2010, when I first used Tumblr heavily. Heck, even getting back to 2021 is hard. Pulling the data down to manipulate locally seems wise.
I was never quite sure it'd backed up all of my likes, and it turns out that a change to the API was in fact limiting it to the most recent 1,000 entries. Luckily, someone else noticed this well before I did, and a new version, tumblr-backup, not only exists, but is a Python package, which made it easy to install and run. (You do need an API key.)
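Getting set up is quick. A minimal sketch, assuming the PyPI package name and the --set-api-key flag from the project's README as I read it (verify against the current docs):
pip install tumblr-backup
# register an app at https://www.tumblr.com/oauth/apps and copy its consumer ("API") key
tumblr-backup --set-api-key YOUR_CONSUMER_KEY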
I ran it using this invocation, which saved the JSON source of each post (-j), saved likes (-l), didn't download images (-k), skipped the first 1,000 entries (-s 1000), and output to the directory 'likes/full' (-O):
tumblr-backup -j -k -l -s 1000 blech -O likes/full 
This gave me over 12,000 files in likes/full/json, one per like. This is great, but a database is nice for querying. Luckily, jq exists:
jq -s 'map(.)' likes/full/json/*.json > likes/full/likes.json
This slurps (-s) every JSON file into a single array (the map(.) is just an identity pass over that array) and saves it to a new JSON file, likes.json. I then did a follow-up to convert it into the newline-delimited form that sqlite-utils ingests:
jq -c '.[]' likes/full/likes.json > likes/full/likes-nl.json
A smart reader can probably combine those into a single invocation; something like the sketch below.
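Since jq with -c prints each input value compactly on its own line, feeding it the per-like files directly should produce the newline-delimited file in one step, skipping the intermediate array (untested, but plain jq behaviour):
jq -c '.' likes/full/json/*.json > likes/full/likes-nl.json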
Using Simon Willison's sqlite-utils package, I could then load all of them into a database (with --alter because the keys of each JSON file vary, so the initial column setup is incomplete):
sqlite-utils insert likes/full/likes.db lines likes/full/likes-nl.json --nl --alter
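To sanity-check the import, sqlite-utils can count rows per table (the 'lines' table name here matches the insert command above):
sqlite-utils tables likes/full/likes.db --counts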
This can then be fed into Willison's Datasette for a nice web UI to query it (this serves the database at http://localhost:8002):
datasette serve --port 8002 likes/full/likes.db
There are a lot of columns cluttering up the view, so I'd suggest this subset; it also sorts the post with the most notes (likes, reblogs, and comments combined) to the top:
select rowid, id, short_url, slug, blog_name, date, timestamp, liked_timestamp,
       caption, format, note_count, state, summary, tags, type
from lines
order by note_count desc
limit 101
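And since the original itch was not knowing what I liked back in 2010, a per-year tally is a natural follow-up. A sketch, assuming liked_timestamp is a Unix epoch as the Tumblr API returns it:
select strftime('%Y', datetime(liked_timestamp, 'unixepoch')) as year, count(*) as likes
from lines
group by year
order by year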
Happy excavating!
2 notes