
6 Evaluation draft


TO DO:

Add:

  • tests
  • Future development

Trim (shorten):

  • goals
  • scope
  • process

Evaluation (1000)

In overview, the goal of this project was to construct an accessible, creative Python workflow that provides ML-informed capabilities to a digital audio environment of choice, with a main focus on granular synthesis implementations within such environments.

Goals

Notably, the research aims were centered on two main areas. Firstly, how ML tools relate to experimental methods of sound making in terms of their accessibility, mostly taking into account technical requirements, prerequisite technical knowledge and the learning curve. Secondly, the technical efficiency and quality of each part of the assembled system and of its output. Encouragingly, this work has largely met the first of these aims, as the conceived pipeline is easy to use for training and generation and has the potential to be embedded within various audio programming software. Training results are small in size, which makes them simple to share. Pre-trained examples are included in the repo, making them ready to use with the Max MSP implementation of a synthesizer provided alongside. Crucially, even given its many technical shortcomings, the system proved to function as a substantially interesting musicking tool in both studio and live performance contexts, largely thanks to that accompanying synthesizer. This being said, the individual parts of the processing leave a lot of room for improvement, and their difficulty was significantly underestimated during the project planning phase. The reasons for this are discussed in more detail in the paragraph which follows.
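
To make the point about small, shareable training results concrete, the sketch below shows what serializing such an artefact could look like, assuming the trained result amounts to a set of grain-cluster centroids plus a label-transition matrix. The file layout and function names are illustrative assumptions, not the project's actual format.

```python
# Illustrative sketch only: assumes the trained artefact is a set of grain-cluster
# centroids plus a first-order transition matrix over grain labels. The layout and
# names below are hypothetical, not the project's actual format.
import json
import numpy as np

def save_model(centroids: np.ndarray, transitions: np.ndarray, path: str) -> None:
    """Serialize a small trained artefact to JSON so it can be shared or loaded
    from an audio programming environment (e.g. via Max's js object)."""
    payload = {
        "centroids": centroids.tolist(),      # (n_clusters, n_features)
        "transitions": transitions.tolist(),  # (n_clusters, n_clusters)
    }
    with open(path, "w") as f:
        json.dump(payload, f)

def load_model(path: str):
    with open(path) as f:
        payload = json.load(f)
    return np.array(payload["centroids"]), np.array(payload["transitions"])

if __name__ == "__main__":
    # Example: 16 clusters x 13 MFCC features stays in the kilobyte range as JSON.
    centroids = np.random.rand(16, 13)
    transitions = np.random.rand(16, 16)
    transitions /= transitions.sum(axis=1, keepdims=True)  # rows sum to 1
    save_model(centroids, transitions, "model.json")
```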

Scope

The foundation for this project was an existing pipeline of audio analysis and classification derived from previous work with MIR. Its earlier parts were considered sufficient to drive the planned new features. The intention was that partial but well-planned refactoring efforts would have to be undertaken, mostly with real-time use and simplicity of the interface in mind. From an optimization point of view, audio segmentation and analysis algorithms are known to perform well on personal computer hardware. Tools such as CataRT, which at the time of release perhaps required a substantial amount of computing power, can now run without much issue on an average consumer laptop. Therefore, most difficulties were expected to occur in the design of an appropriate model architecture for modeling the time series of grain labels. Such an architecture would also have to be simple and lightweight enough to be appropriate for personal use, compared to the deep, multi-layer networks contemporarily used for audio generation. What emerged during project development is that not enough importance was given to ensuring each of the components is adequately robust. Multiple benchmarks forced the work back into tracing errors through the preceding parts of the process. Without an optimal choice of extracted features, it is problematic to judge the performance of different clustering algorithms, as the outcome depends heavily on the former. Inaccurately assigned classes in turn create confusion when evaluating generation outcomes, which then affects the quality of synthesis, and so on. Each of the functionalities included is complex in a domain of its own; a standalone endeavor in crafting a sturdy method for either segmentation, feature extraction or unsupervised classification of varied types of data could itself prove to be a significant challenge. Comparably, implementing an example of a granular synthesizer in Max MSP, to be coupled with the generation script, turned out to be a relatively straightforward process. This is perhaps due to my higher level of expertise and intuition in that programming environment, as most of the obstacles encountered were related to the specifics of the embedded Gen~ DSP language.
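
To illustrate this dependency chain, the sketch below compresses the whole pipeline into a few steps, assuming librosa and scikit-learn stand in for the project's own segmentation, feature-extraction and clustering code. It is meant to show how errors propagate between stages, not to reproduce the actual implementation.

```python
# A compressed sketch of the dependency chain described above: segmentation ->
# features -> clustering -> label-sequence model -> generation. librosa and
# scikit-learn are stand-ins, not the project's actual code.
import numpy as np
import librosa
from sklearn.cluster import KMeans

def grains_to_label_model(path: str, n_clusters: int = 8):
    y, sr = librosa.load(path, sr=None)

    # 1. Segmentation: a poor onset threshold here already skews everything below.
    onsets = librosa.onset.onset_detect(y=y, sr=sr, units="samples")
    grains = [g for g in np.split(y, onsets) if len(g) > 0]

    # 2. Feature extraction: one mean MFCC vector per grain.
    feats = np.array([librosa.feature.mfcc(y=g, sr=sr, n_mfcc=13).mean(axis=1)
                      for g in grains])

    # 3. Unsupervised classification: cluster grains into label classes.
    labels = KMeans(n_clusters=n_clusters, random_state=0).fit_predict(feats)

    # 4. Label-sequence model: a first-order transition matrix, a lightweight
    #    stand-in for the sequence model discussed in the text.
    trans = np.zeros((n_clusters, n_clusters))
    for a, b in zip(labels[:-1], labels[1:]):
        trans[a, b] += 1
    trans = (trans + 1e-6) / (trans + 1e-6).sum(axis=1, keepdims=True)
    return grains, labels, trans

def generate_label_sequence(trans: np.ndarray, start: int, length: int) -> list[int]:
    """5. Generation: sample new grain labels; output quality is bounded by
    every stage above, which is exactly the backtracking problem described."""
    seq = [start]
    for _ in range(length - 1):
        seq.append(int(np.random.choice(len(trans), p=trans[seq[-1]])))
    return seq
```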

Tests

Multiple user tests of different kinds took place, including:

  • in-cloud training user tests (accessibility)
  • real-time use user tests (accessibility and quality of output)
  • three attempts at instrumental use in real-world performance situations

Process

I had no pre-thought structure apart from the vague plan created as part of my project proposal, but rather a clear idea that I was following. I began by sketching out the planned structure of the project, which I then attempted to implement, advised to do so by my supervisor Dave Griffiths. Considering the timeframe of the project, I decided to organize it using Git and hosted a repository on GitHub after revising some knowledge about project management with Dave. Starting out mainly driven by the inspiration to synthesize different ML methods was certainly a good and motivating beginning. Over time, the project proved to be a significant, though disorganized, learning experience. A variety of ideas were conceived and confronted during its evolution, and a multitude of techniques was attempted, whether in implementing parts of the pipeline from scratch, adapting code from existing examples, or acquiring essential data science skills such as proficiency with NumPy and tensor operations. At the same time, successfully designing a workflow employing this many involved methods turned out to be beyond my current level of expertise in audio ML. During the recent benchmarks and attempts at improving the technical weaknesses of the project, it became increasingly obvious that, considering the scale of the problems I was aiming to solve, I could instead have attempted to implement and adapt one of the workflows I had chosen as inspiration for this project. This could prospectively have moved me closer to accomplishing the technical goals I set for myself, while also providing a more structured framework for my efforts.

Problem-solving

Most of the mentioned new features were implemented rather smoothly, with anticipated issues turning up later on, during further elaboration. In such cases, I would usually open one of the multiple repository issues, which helped me plan better in the short term and keep a better overview of the project and how it was progressing. I generally tend to focus on a single objective until it is solved, which becomes problematic when more complex problems are encountered, as they can easily consume large amounts of time without much effect. Dave recommended I limit the time spent working on a specific issue, no matter how crucial, to about 90-120 minutes. Such an approach, when applied, appropriately reduced the number of hours spent on harder tasks, which I was often able to solve with a fresher mind when returning a few days later. Other observations:

  • not many approachable materials exist
  • some cross-referencing across relevant projects can be done

Development

  • re-implementation of respective parts of the workflow
  • possibly involving reusing parts of existing projects
  • classification on its own might be enough
  • token generation would be much easier for MIDI, so that is another possible direction (see the sketch below)
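
As a note on the last point, the sketch below shows why MIDI token generation is comparatively easy: notes are already discrete events, so a pitch/duration stream maps directly onto a small token vocabulary. The token names and the sixteenth-note resolution are illustrative assumptions, not a planned format.

```python
# Minimal sketch of MIDI tokenization: notes are already discrete, so a
# (pitch, duration) stream maps directly onto a small token vocabulary.
# Token names and the per-beat resolution are illustrative assumptions.

def notes_to_tokens(notes: list[tuple[int, float]], steps_per_beat: int = 4) -> list[str]:
    """notes: list of (MIDI pitch, duration in beats)."""
    tokens = []
    for pitch, dur in notes:
        tokens.append(f"NOTE_{pitch}")
        tokens.append(f"DUR_{max(1, round(dur * steps_per_beat))}")
    return tokens

def tokens_to_notes(tokens: list[str], steps_per_beat: int = 4) -> list[tuple[int, float]]:
    notes = []
    for note_tok, dur_tok in zip(tokens[0::2], tokens[1::2]):
        pitch = int(note_tok.removeprefix("NOTE_"))
        dur = int(dur_tok.removeprefix("DUR_")) / steps_per_beat
        notes.append((pitch, dur))
    return notes

# Example: a C major arpeggio in quarter notes.
print(notes_to_tokens([(60, 1.0), (64, 1.0), (67, 1.0)]))
# ['NOTE_60', 'DUR_4', 'NOTE_64', 'DUR_4', 'NOTE_67', 'DUR_4']
```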