These are my experiment notes, ordered from most recent to oldest.

Core Questions

Can we get transformers to do diarisation?
Can we create synthetic combinations of TTS datasets to experiment with diarisation difficulty?
- Will this help generalise to different domains?
What happens as we increase the difficulty of the data with interruptions, noise, number of speakers, and speaker loudness?
What if we speed up the audio (just for compute efficiency) or slow it down? How would this affect the results?
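A rough sketch of how the synthetic mixtures could be built, assuming each TTS utterance is a mono list of float samples at a shared sample rate. Every name here (`mix_utterances`, `Segment`, all the parameters) is hypothetical. The difficulty knobs map onto the questions above: `overlap_prob` for interruptions, `gain_db_range` for speaker loudness, `snr_db` for noise, and the number of speakers is however many distinct labels are passed in. It returns the mixture plus reference segments to score against.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Segment:
    """Reference label: who speaks, from `start` to `end` (seconds)."""
    speaker: str
    start: float
    end: float


def db_to_gain(db):
    """Convert a decibel offset to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def mix_utterances(utterances, sample_rate=16000, overlap_prob=0.0,
                   max_overlap_s=1.0, gain_db_range=(0.0, 0.0),
                   snr_db=None, seed=0):
    """Lay (speaker, samples) utterances on one mono timeline (hypothetical helper).

    overlap_prob: chance an utterance starts before the previous one ends (an interruption).
    max_overlap_s: upper bound on how far back an interruption can start.
    gain_db_range: per-utterance loudness offset in dB, sampled uniformly.
    snr_db: if set, add white Gaussian noise at this signal-to-noise ratio.
    Returns (mixture, reference segments). Expects at least one non-empty utterance.
    """
    rng = random.Random(seed)
    placed = []  # (speaker, start_sample, scaled_samples)
    cursor = 0   # furthest end sample placed so far
    for speaker, samples in utterances:
        start = cursor
        if placed and rng.random() < overlap_prob:
            # Interruption: start up to max_overlap_s early, but not before the previous utterance starts.
            back = rng.randint(1, max(1, int(max_overlap_s * sample_rate)))
            start = max(placed[-1][1], cursor - back)
        gain = db_to_gain(rng.uniform(*gain_db_range))
        placed.append((speaker, start, [s * gain for s in samples]))
        cursor = max(cursor, start + len(samples))

    # Sum all utterances; overlapping regions add together.
    mixture = [0.0] * cursor
    for _, start, samples in placed:
        for i, s in enumerate(samples):
            mixture[start + i] += s

    if snr_db is not None:
        # Scale noise so that signal_power / noise_power = 10^(snr_db / 10).
        signal_power = sum(x * x for x in mixture) / len(mixture)
        noise_std = math.sqrt(signal_power / (10.0 ** (snr_db / 10.0)))
        mixture = [x + rng.gauss(0.0, noise_std) for x in mixture]

    segments = [Segment(spk, start / sample_rate, (start + len(s)) / sample_rate)
                for spk, start, s in placed]
    return mixture, segments
```

For example, `mix_utterances([("A", a), ("B", b)], overlap_prob=0.5, snr_db=20.0)` gives a two-speaker track where B has a 50% chance of cutting into A, with noise 20 dB below the speech.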
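One cheap way to try the speed question, sketched under assumptions: resample the waveform by linear interpolation so that a factor of 2 halves the number of samples (and roughly the number of frames the model sees), then divide every reference timestamp by the same factor. This naive resampling also shifts pitch; a pitch-preserving time stretch would be a different experiment. `change_speed`, `rescale_segments` and `Segment` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    speaker: str
    start: float  # seconds
    end: float    # seconds


def change_speed(samples, factor):
    """Play `samples` `factor` times faster via linear-interpolation resampling.

    factor > 1 speeds up (fewer samples, higher pitch); factor < 1 slows down.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    out_len = int(len(samples) / factor)
    out = []
    for n in range(out_len):
        pos = n * factor          # fractional position in the original signal
        i = int(pos)
        frac = pos - i
        nxt = samples[i + 1] if i + 1 < len(samples) else samples[i]
        out.append(samples[i] * (1.0 - frac) + nxt * frac)
    return out


def rescale_segments(segments, factor):
    """Map reference timestamps onto the new timeline: t' = t / factor."""
    return [Segment(s.speaker, s.start / factor, s.end / factor) for s in segments]
```

The same `factor` must be applied to both audio and labels; predictions on the sped-up audio can be mapped back by multiplying timestamps by `factor` before scoring against the original references.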

Ideas (Not implemented yet)

Optimal inference parameter search
- Run through a variety of configs and samples with known outputs and figure out the best combinations
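The parameter search could start as a plain grid search. A minimal sketch, assuming the real model is wrapped as `run_fn(input, **config)` and that some per-sample error (lower is better) is available. A proper diarisation error rate would need an optimal speaker mapping, so `frame_error` and `toy_diariser` here are only hypothetical stand-ins to make it runnable.

```python
import itertools
from statistics import mean


def grid_search(param_grid, samples, run_fn, score_fn):
    """Evaluate every config in `param_grid` on samples with known outputs.

    param_grid: dict of parameter name -> list of candidate values.
    samples: list of (input, reference) pairs.
    run_fn(input, **config) -> prediction.
    score_fn(prediction, reference) -> error, lower is better.
    Returns [(mean_error, config), ...] sorted best first.
    """
    names = sorted(param_grid)
    results = []
    for values in itertools.product(*(param_grid[n] for n in names)):
        config = dict(zip(names, values))
        errors = [score_fn(run_fn(x, **config), ref) for x, ref in samples]
        results.append((mean(errors), config))
    results.sort(key=lambda r: r[0])
    return results


def frame_error(pred, ref):
    """Hypothetical metric: fraction of frames with the wrong speaker label."""
    return sum(p != r for p, r in zip(pred, ref)) / len(ref)


def toy_diariser(frame_scores, threshold):
    """Hypothetical stand-in model: speaker 1 if score > threshold, else speaker 0."""
    return [1 if s > threshold else 0 for s in frame_scores]
```

The grid grows multiplicatively with each parameter, so random search over the same space may be cheaper once there are more than a few parameters.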
Training with CFG

February 21, 2024

Akroates: Greek for "listener".

February 19, 2024

February 18, 2024