November 29, 2020

Takeaways from Interspeech 2020, Part 1: attending a virtual conference

Théodore Bluche

Principal Machine Learning Scientist, Sonos Voice Experience

Clément Doire

Senior Signal Processing and Machine Learning Scientist, Sonos Voice Experience

Clément Doumouro

Senior Machine Learning Scientist, Sonos Voice Experience

Alice Coucke

Head of Machine Learning Research, Voice Experience


Last month, from October 25th to 29th, the Interspeech conference, one of the most prominent conferences in speech processing, took place. Initially planned to happen in Shanghai, China, it was converted to a fully virtual conference due to the ongoing pandemic. We attended the conference to learn more about the current trends in speech processing systems and to present a paper we submitted on keyword spotting applications.

In this series of two posts, we would like to share our impressions and takeaways from this very instructive event. Part 1 focuses on our experience attending virtual conferences, which are probably going to keep happening in the near future, and discusses the advantages, disadvantages and opportunities we see in this new format.

Part 2 of this series describes some scientific highlights in the field of speech recognition drawn from papers presented at the conference. We also take this opportunity to tell you more about the paper we presented this year, entitled “Predicting detection filters for small footprint open-vocabulary keyword spotting”.

Despite the current global pandemic, Interspeech is one of five machine learning conferences we have attended this year: it came after ICASSP (International Conference on Acoustics, Speech, & Signal Processing) in May, ICML (International Conference on Machine Learning) in July, and ISMIR (the annual conference of the International Society for Music Information Retrieval) in October, and just before ADC (Audio Developer Conference) in November. All of them were converted to fully virtual events. Adjusting to a virtual conference is surely not an easy task. While conferences are great occasions to take some time to present and receive detailed information about the latest research in a given field, they also offer several other opportunities for career growth. With the extensive use of arXiv in Machine Learning, conferences are no longer the epicenter of discovery for original research. Instead, physical conferences offer, or rather offered, a space to ask questions, network, have informal chats, bump into former colleagues, and even make friends who share similar professional interests.

Virtual conferences represent a wonderful opportunity to reinvent and rethink the sharing of scientific knowledge and the interaction between researchers they generate. After attending a few this year, we would like to share our main takeaways, hoping this will be useful to the community. Here are the advantages and disadvantages of the virtual conferences we have seen so far, with specific examples from Interspeech 2020.


Make your own program - Anyone who has previously attended a multi-track conference knows the feeling of reviewing the program, reading the abstract of each paper that will be presented in a given session, and making a mental timetable in order to fit in each talk that piques one's interest. Inevitably, you also know the feeling of missing out on interesting presentations, and going home with a list of papers to check later. With online conferences, you can virtually attend all the presentations and not miss anything. Alternatively, you may choose to attend single paper presentations instead of whole sessions. One of the biggest benefits of this online system is that you can rearrange the program to customize your sessions.

Take your time - The presentations are given as videos that you can play, pause, fast-forward, screenshot, etc. In other words, note-taking has been made a lot easier, and you can attend the presentation at your own pace. You also choose when the next presentation starts, so you might decide to go read the paper, or check out that nice reference the presenter just talked about.

No time and space limitation - It goes without saying that, in this online setup, you are no longer limited to the times defined in a fixed program. Some of the content (keynotes, Q&A sessions, ...) is of course still scheduled at a given hour, which should now be compatible with many time zones. To attend the paper presentations, though, you may choose to wake up at 6 AM or to watch them in the afternoon. It is also possible to attend the conference from anywhere in the world, in the comfort of your home or in the office. This has several advantages. Think of conferences like NeurIPS (Conference on Neural Information Processing Systems), which have recently become so popular that the tickets sell out faster than for a pop star's show, and which have therefore resorted to a lottery system for allocating these tickets. With online conferences, not limited by the size of a physical venue, anyone can attend from anywhere. This might be easier for people with disabilities, caring responsibilities, or visa issues for some venues, not to mention the cost savings on travel and accommodation. One last important thing to note is that scientific conferences come with an environmental cost: reducing air travel is also a responsible thing to do in the context of climate change.

How to make the most of it

FOMO is time-consuming - Perhaps the biggest downside of being able to see everything is that you are tempted to do so. Without a schedule, there is nothing limiting the number of 15-minute presentations you watch or the number of times you press the "pause" button in the middle of one. You're only limited by the amount of time you are ready to allocate, which, if you're passionate, can be a lot more than at a physical conference. With everything available asynchronously, the temptation is high to try to see everything, which, of course, is not humanly possible in one week. That is why we recommend first deciding on the time you want to allocate to attending the conference before browsing through the program, and then trying to stick to it. Adding around 50% to the duration of each presentation might be a good idea because, let's face it, you will hit pause! Of course, this requires information and resources to be readily available on the online platforms, which so far have only scratched the surface of what could be done.
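The budgeting advice above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `talks_within_budget` and its parameters are hypothetical, the 15-minute default comes from the presentation length mentioned above, and the 50% padding is the rough rule of thumb suggested here rather than a measured value.

```python
import math


def talks_within_budget(budget_hours, talk_minutes=15.0, pause_overhead=0.5):
    """Estimate how many recorded talks fit in a fixed time budget.

    Each talk is padded by `pause_overhead` (0.5 means 50% extra time)
    to account for pausing, note-taking, and following up on references.
    All names and defaults here are illustrative assumptions.
    """
    if budget_hours < 0 or talk_minutes <= 0 or pause_overhead < 0:
        raise ValueError("budget and overhead must be non-negative, talk length positive")
    # Padded length of one talk: 15 min * 1.5 = 22.5 min with the defaults.
    padded_minutes = talk_minutes * (1.0 + pause_overhead)
    # Only whole talks count, so round down.
    return math.floor(budget_hours * 60.0 / padded_minutes)


if __name__ == "__main__":
    # A 10-hour budget with the default 15-minute talks and 50% padding.
    print(talks_within_budget(10))
```

With these defaults, a 10-hour budget leaves room for 26 talks, rather than the 40 that the raw 15-minute length would suggest, which is a useful reality check before browsing the program.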

Delayed Q&A and interactions - Since you are watching the presentations whenever you want, the authors may not be available to answer your questions right away. Although the organizers (of both ICASSP and Interspeech) did make sure there were dedicated Q&A sessions, the interaction becomes less natural and you potentially have to keep a lot of information in mind to be ready to engage in the conversation during the dedicated slot. Moreover, in in-person Q&A sessions, the author could jump back to a given slide, present backup slides, or point to a specific part of a poster. This might be more difficult to do online. The live interaction you would get at a poster or a coffee break is also naturally a lot more difficult to achieve in an online setting. In that regard, there is much room for improvement for online conference platforms. ISMIR 2020 and ADC 2020, for instance, had set up chat rooms on Slack and Discord with one channel for each presentation, where participants were able to put questions directly to the authors during the whole week. A couple of days ago, NeurIPS 2020 (from December 6th to 12th) announced their efforts towards addressing the known issues with virtual conferences. These include hosting poster sessions on Gather Town, a video chat where people are given an avatar and can walk around in a simulated 2D interactive world.

How to (randomly) meet someone - The most prominent drawback of a virtual conference is the lack of live interaction, which reduces the opportunities to network easily. In physical conferences, it was easier to get someone's attention at a coffee break, to get a chance to chat with that researcher you admire, to be introduced to people you might have an interest in talking to, to randomly bump into a former colleague that you didn't know would be at the conference, or even to make friends that you will meet again at the next conference. This is, of course, also the most difficult point to address, since live and random interactions have also vanished from all aspects of our locked-down lives. That being said, there was some effort at ICASSP to organize social events over Zoom. ADC used a platform called Remo to recreate the experience of being at a physical event by having random video chats around "rooms" and "tables". Besides, if you've been to NeurIPS before the pandemic, you probably noticed it was becoming difficult to meet someone in particular even in physical venues with thousands of attendees.

How was Interspeech 2020?

In addition to the conference proceedings, Interspeech attendees had access to the 15-minute video presentation of every accepted paper, to live keynotes and Q&A sessions on Zoom and to a 90-second highlight video for each paper.

Presentation videos - Having a long oral presentation for every paper, as opposed to physical conferences where most papers only have a poster presentation, is actually rather nice. For the author, it is an opportunity to explain their thesis and contributions in a structured and organized way, and to reach a larger audience. For the attendee, it provides better access to more content and information, by allowing one to avoid the missed sessions or over-crowded poster sessions. On the Interspeech website, there was a page with all the videos of the conference, which was quite convenient to minimize the number of clicks to get to a particular one, and the number of opened tabs in the browser.

Live Q&A sessions - The Q&A sessions were live Zoom webinars of about an hour each, organized in parallel tracks. The chairs played the highlight video and allowed for a few minutes of questions for each of the ten papers in turn. This was a great way to remind the audience of the content of the paper, which nicely addresses one of the delayed Q&A issues mentioned above. However, it was still necessary to read the paper or watch the full presentation beforehand to get the most out of these sessions, which was hard to do, especially since they were scheduled early in the week. Notifying attendees in advance would have been a good idea, knowing that everything was available online one week before the conference. The same goes for the authors, who were not informed of how highlight videos would be used. To go a bit further, it would have been nice to have a format more closely resembling that of poster sessions. For instance, one can imagine adding a virtual conference room for each paper, allowing authors to share their screen and attendees to drop in and out and receive a brief presentation.

Highlight videos - 90-second highlight videos for papers selected for (what used to be) poster presentations were accessible on the online platform, but unlike the long presentations, you had to click on each paper to see it. Highlight videos are a very good idea. They let you quickly decide whether you want to know more about a given paper and help you select which long presentations to watch. They help with planning your conference week, and allow you to do the kind of window-shopping you would do in a poster session. A concatenated video containing all the highlights of one session would have been truly awesome to help you decide whether to attend a session or not, or to select which papers in that session you are more interested in. The main issue, as mentioned previously, was the lack of clear information given to the authors about how these videos would be used. Some of the highlight videos were merely the outline of the full talk, or an introduction, and did not always highlight the main contributions or findings. With clearer instructions for presenters and some guidance on how to make a good highlight video, these videos are probably a good feature to keep in future virtual conferences.

This concludes Part 1 of our series on Interspeech 2020: attending virtual conferences in the pandemic era. We hope it was useful! Part 2 will focus on the scientific highlights of the conference, so feel free to have a look! It is going to take a while before we can physically attend conferences again, and the way scientists interact and share knowledge needs to be completely reinvented. While there is much room for improvement, the organizers of future conferences are thinking hard about these issues and are already proposing innovative solutions. We look forward to attending more virtual events in the future and witnessing our community's progress!


