Here are the Kodama performances I recorded: https://drive.google.com/drive/folders/1PHmXYrsjcdZXyXOAB3RlycfEQxI95Kab?usp=sharing
Note that some of them only play in one channel due to an error in the code at the time of recording.
[Embedded Vimeo video]
This video acts as both a tutorial and demonstration of a Kodama performance.
The first 2 minutes show how to run Kodama.
The rest of the video is a slightly abridged but otherwise unaltered performance of Kodama.
Additionally, I show which YouTube videos I played to Kodama during the performance.
Skip to 11:00 to see Kodama respond to the user whistling into the microphone (given the dataset of sounds collected from the YouTube videos).
To be very clear: the performance starts at the 2:17 mark.
[Embedded video]
Here is a quick video of a Kodama performance, edited to look and sound nice. The plotting of the samples is sped up (fitting an hour's worth of dataset changes into a minute). The accompanying audio is not sped up; it is a slightly spruced-up version of the audio from the performance, cut to fit into a minute.
For this particular performance, Kodama was listening to ambient albums, studio a cappellas on YouTube, and ASMR.
How this Blog Works
I kept a work log in the form of a text document during development of my project. I did not put it up on this blog until a few days before the deadline, which is why the submission dates of the blog posts do not correspond to the dates of the work log entries.
Each post in this blog is taken directly from an entry in my work log (which can be viewed on the GitHub), with notes added as I double-check the entries.
The project's GitHub: https://github.com/ratmother/Kodama
May 2020
This month has mostly been about writing the final report, doing research and finalizing the code. I've made many small(ish) changes to the code, such as fixing the graphical plotting issue (fixed version pictured) that I did not notice until recently: I was not clearing the subplots, so sample points which had been removed still appeared on the graph. I also added reverb effects to the SC code, and the output sounds a couple of magnitudes better. Additionally, I added a longer onset to longer sounds in SC for a more relaxed effect. The audio output is very much like I had wanted it to be early on, an ambient echoing spirit sound, but I am somewhat disappointed that I did not make it more robust. If I carry on developing Kodama after submitting it for marking, I will focus on a compositional system which takes the audio features of the input audio buffer and uses them to make more interesting choices: perhaps an audio sequencer which is aware of the recent sequence of input audio features and decides on the placement of sounds in the sequence based on this; a granulation process to soften or stretch sounds when the input audio is detected as soft and slow; pitch-shifting sounds to match the input pitch; speeding up or slowing down sounds to match the BPM of the input; and reducing or increasing transients to match the input, all based on the input audio buffer's features.
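For clarity, here is a minimal sketch of the kind of subplot-clearing fix described above; it assumes one matplotlib subplot per dim-thread, and all names are illustrative rather than taken from the actual Kodama code.

```python
# Hypothetical sketch of the subplot-clearing fix: clear each axis before
# re-plotting so samples pruned from the dataset no longer linger on the graph.
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 4)  # one subplot per dim-thread (illustrative)

def redraw(axes, datasets):
    """datasets: list of (xs, ys) point arrays, one per dim-thread."""
    for ax, (xs, ys) in zip(axes, datasets):
        ax.clear()               # without this, removed points keep appearing
        ax.scatter(xs, ys, s=8)
    plt.pause(0.01)              # refresh the interactive figure
```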
While writing the final report I did notice silly 'features' of my code that did not make sense, which I promptly changed. For example, input samples which pass the distance check and are stored caused the nearest neighbours of that input sample not to be sent over OSC; my reasoning was that, because they are more distant than the nearest neighbours of input samples which did not pass the distance check, they might sound more inaccurate. However, this inaccuracy is desirable for creative moments in a Kodama performance.
April 2020
Added a file removal queue. In prior iterations I would remove files immediately upon detecting that they were to be pruned from the dataset. This caused problems with the audio generation system, which queues files for playing: some of those files ended up deleted before use. The queue allows up to 100 files to be stored and deletes older files past the 100 mark, which gives the audio generation system plenty of time to catch up. By sending SC arrays containing lists of [file name, mortality, size, etc.] I can control SC completely from Python without relying on SC to manage the performance. Everything is working as expected, but there is work to be done on the performative aspect. Because of my cheap microphone it is mostly noise I hear coming back, which is unpleasant; I'll try to find a way to alleviate this. SC plays the correct sound files without error, but the effect is a delayed response due to the queueing of files, and this needs to be changed.
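A rough sketch of how such a deferred removal queue could look; the names and the use of a deque are my illustration, not necessarily the actual Kodama implementation.

```python
# Minimal sketch of a deferred file-removal queue: pruned files are queued
# rather than deleted immediately, and only the oldest ones past a 100-file
# limit are removed from disk, giving the audio side time to finish playing
# anything it already queued.
import os
from collections import deque

removal_queue = deque()

def queue_for_removal(path, limit=100):
    removal_queue.append(path)
    while len(removal_queue) > limit:
        old = removal_queue.popleft()   # oldest pruned file
        if os.path.exists(old):
            os.remove(old)              # safe to delete by now
```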
I have changed the way cluster pruning works. Instead of pruning samples which are not 'cluster centres', a sample's mortality is now based on the probability of it being inside a cluster. Samples outside of a cluster accumulate higher mortality values and have an increased chance of being pruned. This is a theoretical improvement because it allows whole clusters to stabilize, rather than the somewhat arbitrary focus on cluster centres, which did not seem to form stable clusters in my observations. I have also colour-coded the cluster classes that HDBSCAN discovers; the alpha channel corresponds to the sample's cluster-membership probability. (Pictured)
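A hedged sketch of the probability-based mortality idea, assuming the hdbscan package; the increment rate and pruning rule are illustrative guesses rather than the actual code.

```python
# Hypothetical sketch: samples with a low cluster-membership probability
# (including noise points, label -1, which get probability 0) accumulate
# mortality and become more likely to be pruned.
import numpy as np
import hdbscan

def update_mortality(features, mortality, rate=0.1):
    """features: (n, d) array; mortality: (n,) float array, updated in place."""
    clusterer = hdbscan.HDBSCAN(min_cluster_size=5).fit(features)
    outside = 1.0 - clusterer.probabilities_        # 1.0 for noise points
    mortality += rate * outside                     # weakly-clustered samples age faster
    prune_prob = mortality / (mortality.max() + 1e-9)
    return np.random.rand(len(mortality)) < prune_prob   # boolean prune mask
```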
Note (made May 28th): If I recall correctly, discussions with my supervisor revolved around helping me with SuperCollider coding at this point, because I was (and still am, really) quite unfamiliar with SuperCollider. Luckily, my supervisor is good with SC and was able to provide me with simple code examples that I built upon.
March 2020
I've changed the way audio is recorded. Instead of soundfile recording audio of a fixed length each cycle and waiting for the recording to finish before starting, the system is now much more efficient: for example, dim-threads targeting samples 1 second long used to wait 3 seconds (if 3-second dim-threads were present) before starting their thread; now they only wait 1 second. This is achieved by using a ring buffer, something I had used in the past but removed due to bugs. I fixed those bugs by making the 'blocksize' in sounddevice equal to the sample rate, and by making the code revolve around each dim-thread reading the dataset independently. Another key change was putting the audio stream onto its own thread, allowing the dim-threads to rely solely on the ring buffer. In prior iterations of the code, a 'cycle' referred to the function 'parcelization' (now called 'consolidate'), which would start the recording process and read the dataset dim-wise; this function now acts as a waste manager. Each dim-thread decides which samples are to be deleted or kept, and a DBSCAN pruning function is also called in consolidate, working in much the same way as the dim-threads and deciding what is kept or discarded. Consolidate reads the dataset, finds samples which are marked as 'deleted', and permanently removes them.
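A rough sketch of the ring-buffer arrangement described above, assuming the sounddevice library; the buffer length, names, and the lock-free read are simplifications for illustration, not the actual Kodama code.

```python
# The stream runs on its own (callback) thread and writes into a ring buffer;
# each dim-thread just reads the most recent N seconds when it needs them.
import numpy as np
import sounddevice as sd

SR = 44100
ring = np.zeros(SR * 8, dtype=np.float32)    # 8 seconds of mono audio
write_pos = 0

def callback(indata, frames, time, status):
    global write_pos
    mono = indata[:, 0]
    idx = (write_pos + np.arange(frames)) % len(ring)
    ring[idx] = mono                          # overwrite the oldest samples
    write_pos = (write_pos + frames) % len(ring)

def latest(seconds):
    """Called by a dim-thread: return the most recent `seconds` of audio."""
    n = int(SR * seconds)
    idx = (write_pos - n + np.arange(n)) % len(ring)
    return ring[idx]

# blocksize equal to the sample rate, as mentioned above
stream = sd.InputStream(samplerate=SR, channels=1, blocksize=SR, callback=callback)
stream.start()                                # audio I/O now runs on its own thread
```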
I have also reworked the immortality system; it now works on probabilities and the scale of the dataset. Instead of cluster centres being granted immunity, cluster centres' 'immortality' parameter is left alone, whereas non-cluster centres have theirs added to. As the dataset grows, the immortality value is scaled in proportion with the dataset; importantly, this scaling is applied to every sample, not just new ones: e.g. a sample whose immortality reached 0.1 after not being a cluster centre for one cycle can become 10.0 later in the running of the system, even if it has been at a cluster centre ever since. Because every sample's mortality scales with the dataset, we can control how large the dataset should be without limiting Kodama's ability to take in new information and lose old information which is no longer relevant.
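A loose reconstruction of the scaling idea, not the exact code; the data layout, increment value, and scaling rule are my assumptions.

```python
# Sketch: each cycle, non-cluster-centre samples gain mortality, and every
# sample's mortality is rescaled in proportion to how much the dataset has
# grown, so the dataset size stays controllable without freezing old samples.
def scale_mortality(samples, prev_size):
    size = len(samples)
    scale = size / max(prev_size, 1)
    for s in samples:
        if not s.get("is_centre", False):
            s["mortality"] += 0.1        # illustrative increment
        s["mortality"] *= scale          # applied to every sample, old and new
    return size                          # becomes prev_size next cycle
```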
All the code which deals with the emotion system or the openFrameworks-Python communication has been removed from my GitHub. I had clearly changed direction months ago, but this makes it official.
I updated the plotting function to plot all the dim-threads instead of just one. (Pictured)
February 2020
I've added an immortality parameter to each sample: if the sample is not selected as a cluster centre by DBSCAN (a clustering algorithm which finds clusters of varying sizes and is quite good at ignoring noise), then on each cycle it loses some immortality and can be pruned. This allows for better control of the number of samples going in, while still allowing Kodama to retain samples at cluster centres for as long as they are 'relevant'. After the previous discussions with my supervisor, I decided to focus this month on testing the system and making sure it was accurately selecting sounds. To do this I needed to use matplotlib to see what the DBSCAN clusters look like. After graphing the clusters I found a problem: they were far too close to each other. This was due to a bug in how I was pruning input samples (it is supposed to only accept input samples which are distant from the nearest dataset samples, and it was doing the opposite). Additionally, my SC code was not good, as I do not know how to write efficient SC code; it wasn't threadsafe and was unstable. Luckily my supervisor does, and he was able to provide me with a better foundation for my SC code. He suggested that I separate the dim-thread samples using chars, but I couldn't manage to get that to work due to how I set up the communication. My plan is to have SC simply read the buffer length and deduce how to play it from its size, rather than having Python sort the samples by size for it. We also discussed adding feature weighting to the audio feature extraction function.
Pictured is the plotted DBSCAN data; red circles indicate cluster centres.
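An illustrative sketch of the immortality decay, assuming scikit-learn's DBSCAN (whose 'core samples' are what I take the cluster centres to be here); the decay rate and eps value are placeholders.

```python
# Core samples keep their immortality; everything else loses a little each
# cycle and is dropped once it runs out.
from sklearn.cluster import DBSCAN

def prune_cycle(features, immortality, decay=0.1, eps=0.5):
    """features: (n, d) array; immortality: list of floats, one per sample."""
    db = DBSCAN(eps=eps, min_samples=5).fit(features)
    core = set(db.core_sample_indices_)
    keep = []
    for i in range(len(features)):
        if i not in core:
            immortality[i] -= decay      # non-core samples slowly become prunable
        keep.append(immortality[i] > 0.0)
    return keep                          # mask of samples to retain
```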
Jan. 2020
Attempted to move to Windows to use faster audio information retrieval algorithms and certain audio-generative C++ code, or even Max MSP; it did not work out and was far more troublesome than necessary. I moved back to Linux, and I've decided to play the output audio in SuperCollider (SC) and drop the C++ and openFrameworks side of the code. Technically the Python-wrapped Essentia is C++, but that is under the hood. Additionally, I've implemented OSC (originally to communicate with Max MSP) and put it onto its own thread. I collect all the nearest dataset entries, append their file locations to a list, and send it via OSC to SC to be played randomly. This is just to test how well the system is working in terms of accuracy and speed. I plan on having the audio output come from SC, with Python as the 'brain'.
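A hedged sketch of the OSC hand-off, assuming the python-osc package and sclang listening on its default port 57120; the OSC address name is made up for illustration.

```python
# Send the file paths of the nearest dataset entries to SuperCollider,
# which receives the list and picks files to play at random.
from pythonosc.udp_client import SimpleUDPClient

client = SimpleUDPClient("127.0.0.1", 57120)   # sclang's default language port

def send_nearest(file_paths):
    client.send_message("/kodama/nearest", file_paths)
```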
Note (made 28th May): I had begun to put a lot of work into the project at this point while adjusting to moving back to London, which caused me to forgo my usual extensive work logging. I had missed my first session with my supervisor due to the timing of my return to London. I recall we did meet in December and January and discussed the need for better observation of what my code was actually doing, which led me down the path of graphical plotting with matplotlib; this proved very helpful. He mentioned that I should make sure that my sound-matching algorithm is accurate. I did by-ear tests of recall and continued testing Kodama's sound-matching capabilities by ear throughout development.
My supervisor also gave me relevant papers to look at for inspiration and guidance, such as the 'Real-Time Corpus-Based Concatenative Synthesis with CataRT' paper (online here: http://recherche.ircam.fr/anasyn/schwarz/publications/dafx2006/catart-dafx2006-long.pdf) and 'MASOM: A Musical Agent Architecture based on Self-Organizing Maps, Affective Computing, and Variable Markov Models' (online here: https://www.researchgate.net/publication/317357259_MASOM_A_Musical_Agent_Architecture_based_on_Self-Organizing_Maps_Affective_Computing_and_Variable_Markov_Models). I wrote about these papers in my preliminary project report.
My supervisor also suggested I work on the Kodama theme and look at different ways to take inspiration from Japanese cinema, specifically 'ghostly' sounds. This made me consider studying Kodama and Japanese culture for artistic inspiration, which actually resulted in a philosophical and social-theory-based investigation into Shintoism and animism as applied to smart technology, rather than cinematic sounds.
Dec. 30th 2019 – Jan. 2020
As mentioned in the previous entry, I will now explain some of the changes I've made to the system, as well as new ones I've added in this period. I have reworked how I segment audio into slices and extract features from those slices to find the nearest neighbour in the dataset. Now only one slice size (called 'dim' in my code, short for dimension; each data entry is sorted by the size of its slices) from the input can be saved to the dataset per audio buffer cycle, and it is saved only if all of its nearest neighbours are far enough away. I have replaced the feature extraction library I was using, Librosa, with Essentia. Essentia is a C++ audio analysis library with Python bindings, and it is much, much faster than Librosa; in my testing I've observed speed increases of around 40-60%. For more information about speeds see: https://www.ntnu.edu/documents/1001201110/1266017954/DAFx-15_submission_43_v2.pdf, which is a helpful guide to the available audio feature extraction toolboxes. I still use Librosa to split the audio into chunks initially, but this aspect of the code is likely to change in the future. Additionally, I have added threading to the Python scripts and have observed speed increases of around 40%, despite Python being limited by the global interpreter lock (GIL). I think this is because the feature extraction and machine learning libraries do much of their processing in languages other than Python (probably C++), which frees the GIL for another thread to run. Here is some pseudo-code for a better understanding of the threading I've deployed (segment-type being the 'type' of segmentation, e.g. whether the audio is split into 8 slices or 2 slices before feature extraction is computed on each slice):
INPUT → [Threader 1, Thread 1: Audio segmentation]
    (Organize dataset D by segment-type / begin the feature extraction process on the input)
FOR N segment-types:
    [Threader 2, Thread N (segment-type)] ... (processes audio features) → [N][input_data]
WHEN Thread 1 is DONE:
    // Find the nearest entries of D[n] to N[n].
    [Threader 1, Thread 2] [Threader 1, Thread 3] [Threader 1, Thread 4] ...
    (N[1] nearest to D[1]) (N[2] nearest to D[2]) (N[3] nearest to D[3]) ...
    (If N[n] has no nearby neighbours in D[n], we add N[n] to D[n])
// D[n] and N[n] refer to lists of data entries sorted by segment-type.
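Below is a simplified Python sketch of the same pattern; the segment, extract, and nearest callables are stand-ins, not the real Kodama functions.

```python
import threading

def process_input(audio, segment_types, dataset, segment, extract, nearest):
    # Stand-in callables (hypothetical):
    #   segment(audio, n)        -> list of slices for segment-type n
    #   extract(slice)           -> feature vector
    #   nearest(D_n, features_n) -> nearest dataset entries, or None if all are too far
    features = {}

    def extract_for(n):
        # The heavy lifting happens in C/C++ under the hood, which releases
        # the GIL and lets these threads genuinely overlap.
        features[n] = [extract(s) for s in segment(audio, n)]

    workers = [threading.Thread(target=extract_for, args=(n,)) for n in segment_types]
    for w in workers:
        w.start()
    for w in workers:
        w.join()

    results = {}
    for n in segment_types:
        results[n] = nearest(dataset[n], features[n])   # compare N[n] with D[n]
        if results[n] is None:                          # no close neighbours found
            dataset[n].extend(features[n])              # so add N[n] to D[n]
    return results
```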
16th Nov. – Dec. 30th 2019
I got a lot done in this period and made significant changes to the data pipeline. It is mostly documented in the changes made on GitHub rather than in this work log, so I am going to compare the changes against the previous versions on GitHub in another entry when I get the time. Quick summary: I added my own method of segmenting the audio data in both the dataset and the input; the scripts now split audio into chunks of various sizes and extract features from the chunks. In the retrieval part of the system it compares the PCA (using feature vectors) as before, but instead of whole files or buffers it now only compares chunks of a similar size and retrieves the audio data of the nearest chunks. This is very expensive computationally, especially for 'real time' analysis, so I did a lot of optimizing (to the best of my ability; it's still quite slow). I've found this is more accurate than before and has more interesting results, e.g. Kodama listens to pop music and brings up a selection of differently sized singing samples, drum samples, and samples which sound similar to the music's content. I still don't know if it is 'transform-independent', for lack of a better term, i.e. whether the features I'm using really capture the essence of the sound content rather than the room, distance, and space.
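A sketch of the per-size comparison described above, using scikit-learn for the PCA and nearest-neighbour steps; all names and parameter values are illustrative rather than the actual pipeline.

```python
# Audio is split into chunks at several sizes, and input chunks are only ever
# compared against dataset chunks of the same size.
from sklearn.decomposition import PCA
from sklearn.neighbors import NearestNeighbors

def split(audio, chunk_len):
    n = len(audio) // chunk_len
    return [audio[i * chunk_len:(i + 1) * chunk_len] for i in range(n)]

def build_index(dataset_features):
    """dataset_features: {chunk_len: (n_chunks, n_features) array}."""
    index = {}
    for size, feats in dataset_features.items():
        pca = PCA(n_components=2).fit(feats)
        nn = NearestNeighbors(n_neighbors=5).fit(pca.transform(feats))
        index[size] = (pca, nn)
    return index

def query(index, size, input_features):
    pca, nn = index[size]
    dist, idx = nn.kneighbors(pca.transform(input_features))
    return dist, idx        # nearest dataset chunks of the same size
```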
16th Nov. 2019
Well, it wasn't 'working perfectly': pyaudio was mangling the audio. I recorded the audio data saved to the buffer out to a .wav and it was pure glitchy noise. To fix this, I increased the buffer length in the pyaudio stream. Things now do actually work perfectly, although the microphone quality is still impressing itself onto the results far too much. Today I've added a new feature: the script will now record 'abnormal' sounds. Sounds which are above a preset distance threshold (what is checked is the collective distance of the microphone input's PCA coordinates from its 5 nearest sounds, and this is done for each ring buffer) have their features saved and the audio rendered to a .wav file. I've also added 5 more buffers, each capturing a different length of audio, to give the system both short-term and long-term audio signals as inputs to extract features from and then compare PCA coordinates with the dataset. The result works and is interesting: the system seems to 'ease' into the microphone and ignores the original dataset after a while, because the microphone has qualities which are not present in the original dataset but are present in the abnormal sounds (which are added to the dataset). The system is very responsive to its environment, including its sensory organ, the microphone.
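A sketch of the 'abnormal sound' check as I understand it, per ring buffer, using scikit-learn and the soundfile package; the threshold and names are placeholders, not the actual code.

```python
# If the input's PCA coordinates are collectively far from its 5 nearest
# dataset sounds, its features are stored and the audio is written to a .wav.
import soundfile as sf
from sklearn.neighbors import NearestNeighbors

def maybe_store(coords, dataset_coords, audio, sr, path, threshold):
    nn = NearestNeighbors(n_neighbors=5).fit(dataset_coords)
    dist, _ = nn.kneighbors([coords])
    if dist.sum() > threshold:          # collectively distant -> abnormal
        sf.write(path, audio, sr)       # render the buffer to disk
        return True                     # caller adds coords to the dataset
    return False
```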
15th Nov. 2019
More testing has revealed that the erroneous selections of (not) similar sounds were due to my low-quality cheap microphone: the mic is extremely bass-heavy and has a ringing high pitch as well as mild-to-loud noise. This was a great find! I was wondering why my system would so often choose 808s, noise, and high-pitched piano sounds. To find out that it was actually working perfectly was great. In fact, its selection of similar sounds diagnosed the problems with the microphone fairly accurately, neat! It is likely to become more accurate at low volumes (as noted yesterday) because the microphone's signal issues would be less prominent. I could EQ out the problematic frequencies, but I'm a music producer, why don't I have a good microphone yet? No more cheaping out on microphones for me.
24th Oct. - 14th Nov. 2019
I waited to get a microphone before further testing (of audio_comparative_analysis.py) and have been testing the system's ability to correctly select sounds similar to the microphone input for a few days. It wasn't easy to convert microphone input into something which librosa can use; I had to do a lot of digging and found this post, https://blog.francoismaillet.com/epic-celebration/, which allowed me to rethink my method of microphone-to-buffer-to-librosa. I took the ring-buffer-in-callback idea from there. Testing shows that the system detects things generally well now, whereas before it was chaotic and impossible to test, but further testing with a bigger dataset is required. The system will play drum-like sounds when listening to drummers and human sounds when listening to humans, and, as predicted and intended, it will also play sounds that are not immediately or obviously related to what it is listening to but share qualities with it.
It was a bit of a shot in the dark; I didn't know if the audio information I was storing in the buffer would come out the same as audio files loaded in via librosa.load(). But because the epic-celebration blog post does something similar to what I'm doing, I hoped that whatever the blogger isn't showing about their system wouldn't be a problem, and it doesn't seem to be.
A few problems I could see occurring in the future are:
How exactly does the difference in length between the audio files affect the system? The epic-celebration system requires similarly timed inputs because its entire dataset uses the same amount of time. What am I sacrificing (or frankly completely missing) by ignoring time? I specifically chose a non-time-based MFCC feature calculation to avoid time; it simply averages all the MFCCs in a file into a set number of coefficients (see the sketch after this list), but I do not know how this might affect the overall system.
The system might be sensitive to different microphones.
It seemed like the system factored in loudness too much; when I turned down the mic input it seemed to become more accurate. It might not be a bad idea to add more information to the data so that it's not just using MFCCs, or perhaps to even out the loudness of the data somehow.
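Regarding the first point above, here is a sketch of the time-averaged MFCC feature, assuming librosa; the number of coefficients is a placeholder.

```python
# MFCC frames are averaged over time, so every file or buffer, whatever its
# length, is reduced to one fixed-size vector.
import librosa
import numpy as np

def mfcc_feature(path, n_mfcc=20):
    y, sr = librosa.load(path)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, n_frames)
    return np.mean(mfcc, axis=1)                            # time is averaged away
```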
24th Oct. 2019
I've fixed up the audio analysis script so that it saves out feature data using pickle and reads it back with pickle (avoiding having to process large directories of sounds more than once), and it works perfectly. Even with a far larger number of files than before, the real-time PCA analysis is pretty quick. Obviously the initial feature extraction takes much longer, but with the saving-out feature that's totally acceptable. openFrameworks' soundplayer is responsive and can play multiple files at the same time, and the interaction between the script and openFrameworks is very performative thanks to the simplicity of the information being exchanged. Right now, the system plays sounds similar to the microphone input (which is static noise at the moment due to some problems with my computer's microphone), 5 sounds at a time with each key press. I'll need to start creating the system which governs which of the nearest sounds to play based on their synapse information.
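A minimal sketch of the pickle caching described above; the cache file name and the extract callable are illustrative.

```python
# Extract features once, save them out, and load the cached file on later runs.
import os
import pickle

def load_or_extract(sound_dir, extract, cache="features.pkl"):
    if os.path.exists(cache):
        with open(cache, "rb") as f:
            return pickle.load(f)        # skip re-analysing the whole directory
    features = {name: extract(os.path.join(sound_dir, name))
                for name in os.listdir(sound_dir)}
    with open(cache, "wb") as f:
        pickle.dump(features, f)
    return features
```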
23rd Oct. 2019
I've added the following today: C++ reads the list of sound files saved out from Python and creates a synapse for each one, placing them all into a 'synapses' vector (the index-to-file mapping in this vector is the same as the dataset's index-to-file mapping in Python). It also reads the Python audio analysis results, which are a set of indexes (the indexes of the sound files nearest to the microphone input; this was accomplished last week and discussed in previous entries). C++ then plays the synapses that the indexes point to (plays their file paths and collects expression data from the user(s)). Effectively, C++ plays the sound files (synapses) known to be nearest in similarity to the incoming microphone input, and records user expression data while they play.
I've mostly finalized what's been in the works for a few weeks now, adding for loops and vectors and connecting the audio analysis Python script to C++ properly. Now to test how much it can handle. I also need to get a working microphone to properly test the user experience when it comes to audio-reflection. In the meantime, I can still start to work on the 'recommendation' system for the synapses (checking synapses' expression values, e.g. playing what is known to cause happiness/movement out of the nearest sounds to the mic input).
22nd Oct. 2019
Cleaned up code and added comments in preparation for some changes. First, I want to try to move the PCA/nearest-neighbour algorithms into C++, so I can use them with the synapses, as it is too convoluted to move data back and forth between Python and C++.
OK, so I've decided to use Essentia to accomplish this goal, but I've run into problems with building the library and using it; I've opened an issue here: https://github.com/MTG/essentia/issues/921 ... sadly, in the process of installing ffmpeg my audio_analysis Python script broke and gives me a cryptic error to do with ALSA. There aren't many replies to a lot of the issues, which is worrying.
Later in the day I fixed the cryptic ALSA error by changing the device index. Essentia's MonoLoader still doesn't work. I've decided against using Essentia for now, given the difficulty of using it and, frankly, the unlikelihood of receiving help, plus I have what I want working in Python.