Hi everyone, and welcome to today's webinar on the state of voice in the contact center industry. Just to let everyone know, we're recording the webinar and we'll be sending out a link to it later on this week. My name is Alex Fleming, I'm the marketing manager at Speechmatics, and along with Haley from Red Box, who'll introduce herself in a second, we'll be giving an overview of the state of voice in the contact center market and pulling out a few of the findings from a recent report we've published. For those of you who aren't familiar with Speechmatics, we're a machine learning company that specializes in speech recognition. In its simplest form, we take voice and transform it into text. We do this with fantastic levels of accuracy, in over 30 different languages, on premise or in the cloud, to unlock the value of voice. This can be used in a variety of use cases, from straight-up transcription to captioning, and especially in the contact center, where organizations can use Speechmatics as a tool to extract the value of voice and enrich their business processes. I'll just hand over to Haley to introduce herself and Red Box.

Hi everybody. As Alex said, I'm product marketing manager here at Red Box. Just a brief introduction to Red Box: we're a leading dedicated voice specialist. The Red Box platform enables the capture of organization-wide voice conversations for 55-plus UC and telephony platforms, we ensure that our customers have full data sovereignty, and through our open APIs and a best-of-breed partner ecosystem we're really helping organizations to maximize their voice data. So in today's webinar we'll be looking at the rise in AI and how voice technology has become more mainstream, more relevant and able to add value in real-world applications. But there are a number of challenges that contact centers need to overcome, and we'll also discuss these points.

So, speech tech has been talked about a lot in recent years, but I think it's now reached a point where
it can deliver on the promises made by applications like sentiment analysis and voice bots. We want to talk about how Speechmatics' speech-to-text and Red Box's advanced call recording can really drive a shift in the market, our predictions for the future of the industry with regard to the applications of speech tech, the areas where it can deliver a real difference, and the benefits it can offer. Please feel free throughout the webinar to submit any questions that you might have; we'll have a segment at the end where we'll try to address them.

So, if voice is the lifeblood of the contact center, we're going to look at how important it is within the organization. The rise in AI means that voice technology has become more mainstream. Gartner recently reported that automatic speech recognition is now at a level where it can be used to drive productivity and has transcended the hype that was previously associated with this technology. It's reached a level of maturity where organizations can use ASR reliably and accurately to transcribe voice interactions and unlock the insight held within voice. This means that they're able to use this rich source of data, and the information that can be derived from it, for things like supporting agents and augmenting engagement and workload. This is done through the deployment of AI-enabled tools and solutions capable of dealing with natural language interactions, i.e. reacting and responding with voice. These can take the load off agents, enabling them to focus more on interactions that can benefit solely from human engagement, and it can be done by transcribing calls and utilizing the data to fuel these capabilities, delivering a deeper level of understanding about products, services and the voice of the customer. So speech tech can ultimately deliver insight in any environment that voice is involved with, and doing this automatically and seamlessly empowers the existing technologies within a solution stack, with the intention of reducing cost and customer churn.
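To make the speech-to-text step concrete: ASR services typically return a structured payload of timed, scored words, which downstream tools then consume. The JSON shape below is purely an illustrative assumption (each vendor, Speechmatics included, defines its own schema); this is a minimal sketch of flattening such a payload into plain text for NLP tools:

```python
import json

# Hypothetical payload shape for a timed transcript. Real ASR services
# each define their own schema; this is only a sketch.
SAMPLE = json.dumps({
    "results": [
        {"word": "my", "start": 0.10, "confidence": 0.98},
        {"word": "card", "start": 0.42, "confidence": 0.95},
        {"word": "was", "start": 0.77, "confidence": 0.99},
        {"word": "declined", "start": 0.95, "confidence": 0.91},
    ]
})

def to_text(payload: str) -> str:
    """Flatten a timed-word payload into plain text for downstream NLP tools."""
    words = json.loads(payload)["results"]
    return " ".join(w["word"] for w in sorted(words, key=lambda w: w["start"]))

print(to_text(SAMPLE))  # my card was declined
```

Because the words carry timestamps and confidences, the same payload can also drive keyword alerts or agent prompting, not just flat text.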
ASR is now a practical tool that can be used to underpin business objectives. For example, contact center organizations have huge amounts of call data and see the potential that this data can deliver to them. As part of the recent research that we put together and published in a number of places, and which you can find on our website, we found that 78% of contact center professionals that have adopted speech technology find it to be valuable or very valuable to their organization, and 80% of contact center professionals have either considered or are currently implementing a voice strategy for their business, either now or in the next five years. But it can be a challenge for companies to extract the value that's held within this resource.

Thank you, Alex. As Alex was just touching on there, voice is very rarely stored in one cohesive place across organizations. Data, and voice data in particular, tends to be held in very siloed environments, and historically this has made it very difficult to analyze, especially at scale and in an unstructured format. We're finding that call centers are increasingly looking at differentiation: they're trying to drive business improvements, to maximize their revenue, to reduce customer churn, and to improve employee satisfaction and productivity through personalizing customer journeys, task automation, and utilizing voice technology and AI tools, and many organizations are already seeing encouraging results. Voice is a unique and rich data set, and by combining this voice data with ASR tools, AI and analytics, you can really start to deliver highly valuable and actionable insights. Through data sovereignty, organizations are able to utilize the tools best suited to achieving their own goals without the restriction of being locked into one product or one application stack. With the rise of AI and ML tools, these really do rely on high-quality audio coupled with highly accurate transcriptions to deliver on the promise of those actionable insights. Many organizations have already tried to adopt these tools, but through compressed or compromised audio quality they're unable to realize the true value within their voice data. Text-based and omnichannel sources deliver a level of insight, but voice is an especially salient data point, as it conveys intent, sentiment, emotion, action and context.

So, as Haley said, high-quality recording of calls and transcription should really be at the core of any solution. High-quality audio optimizes the accuracy of transcription, and while ASR solutions can be engineered to deal with challenging audio, the
honest fact is that the better the audio quality, the better the chance of delivering quality transcription, and in turn the better the outcomes of the other tools that are fueled by this transcription. Quality at the early stages of voice capture is essential to empower downstream tools and processes like analytics: to get the best possible outcome from NLP tools, you need the best possible input. In a lot of cases, organizations have already heavily invested in NLP tools and are already able to derive a great deal of insight from their omnichannel sources. These channels aim to better replicate the way in which consumers interact with each other, and contact centers have attempted to replicate this in the way that people are able to interact with their businesses. Text-based channels like text bots, email, social media and instant messaging, while delivering new and more convenient ways to engage, don't have the richness of data that, like Haley said, voice can deliver, and so the insight that can be derived from these sources is less than their voice-based equivalents. The recent adoption of NLP tools has demonstrated that organizations have made both a technical and a financial commitment to better understand the interactions of their customers, and ASR enables these tools to leverage voice in a text-based format. High-quality call recording, as I said, is essential to deliver on some of the important touch points that were highlighted in our research. These points included highly accurate transcription, at high speed and in real time, at a competitive price. In addition, it was also identified that for contact centers, real time delivers more value than transcribing pre-recorded media files. This is totally understandable when you consider that a great deal of value can be leveraged from delivering services on the fly, which is the way that a lot of these calls happen with agents. On-premise deployments also strengthen real-time use cases.
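Since transcription quality keeps coming up, it may help to show how it is usually quantified. Word error rate (WER) is word-level edit distance (substitutions, insertions and deletions) divided by the number of reference words; a minimal sketch, not any vendor's scoring tool:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Levenshtein distance over words divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("i read a red book", "i red a red book"))  # 0.2
```

Noisier audio tends to produce more substitutions and deletions, which is exactly what pushes this number up and degrades every tool fed by the transcript.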
While data transiting to third-party cloud solutions is of course rapid with today's internet, this process opens up a potential risk of interception and means that real-time services could be affected by latency. It also requires contact centers to trust the third-party cloud provider that they're investing in. On-premise solutions mitigate both of these issues, as well as ensuring that data remains within the organization's secure and trusted environment.

The combination of voice-to-text means that contact centers can drive the performance and the capabilities of the existing tools within their solution stack. It's not enough these days to just record conversations anymore: recordings are expensive to store due to their size, and hard to interact with and extract insight from due to their format, and in discovery use cases, where specific content of a call is needed, that discovery is time-consuming and expensive. As Haley said, the storage of recordings is also very disparate, and so can be very challenging to interact with. With the adoption of speech tech becoming more and more popular, organizations are focused on unlocking the value of these recordings, not only to better understand their customers and how to improve their services, but also to leverage new speech-based interfaces to deliver innovative solutions and maintain relevance in their market. At the moment, speech technology and speech solutions do enable organizations to differentiate from their competition, but in my opinion this will be quite short-lived, and I think we'll soon approach a time where speech technology within these markets is a standard and not a point of difference.

Our research discovered that voice technology helps to uplift the following use cases in particular. IVR: transcription has the potential to eliminate touch-tone menu navigation, instead utilizing the voice for the navigation journey. Contact centers can also augment their workforce through the utilization of synthetic agents powered by automatic speech recognition, delivering a hybrid of both man and machine. This delivers operational efficiencies to the contact center, with a capability to deflect high-volume, low-skill jobs from agents while also delivering improved customer service and a reduction in call wait times; a perfect use case of this would be password resets. Compliance: the
discoverability of, interaction with, and exposure of content within calls is key to monitoring compliance. Organizations are increasingly looking for tools and solutions that not only capture calls but can also transform them into text, to enable them to be included and used by existing NLP tools. Speech tech isn't one-size-fits-all: standard language models coupled with the ability to customize vocabulary and terms enable accurate recognition of industry-specific terminology, and the ability to add words quickly and easily is also essential to compliance use cases. Finally, one I just want to highlight: analytics. I think it's a relatively popular one in this space when it comes to the benefits of speech tech. Organizations need to be able to gather, triage, understand and then action data, and analytics and NLP tools are key to this process. With contact centers and organizations already seeing the value in NLP, and having already developed these solutions for text-based inputs like bots, email, social and so on, as we mentioned, they're now looking for new ways to involve voice in the solutions that are already deployed. From the research, the benefits of speech tech also cover operational efficiencies. Contact centers are focused more and more on their ability to uplift their customer experience while minimizing cost, and real-time prompting of agents and easy accessibility of repositories of content enable this. Moreover, it's this real-time capability of speech tech, and the subsequent identification of keywords within transcripts, that enables relevant topics to be delivered to agents automatically, like a dynamic knowledge base. Sometimes the simplest applications of speech tech can actually be the most effective: just transcribing calls means the human agents don't have to, which reduces the time spent away from callers and means that agents can process more calls and add value where only they can. All of these elements have an impact on customer experience that ASR can add value
to.

Thanks, Alex. Just moving forward from that and looking at how to leverage voice data: to truly understand the voice of your customers, organizations need to analyze every conversation from every touch point within the customer journey. The powerful combination of voice, ASR and AI can support multiple disciplines in any organization. These tools can be utilized to underpin compliance initiatives and script adherence, and voice data can really shine a light on why some sales representatives outperform the rest of the team, or highlight upsell or cross-selling opportunities through keywords or product trends. And, as Alex has spoken about, it can help to underpin customer experience strategies: you can start to reduce churn, increase the average sales value per customer, increase loyalty and, of course, reduce complaints. As the old saying goes, the data you get out is only as good as the data that you put in, so high-quality audio coupled with high-quality speech recognition has to be at the core of any organization's voice strategy.

So, with organizations focused on gaining insight and looking to uplift the experience of their customers, speech tech is able to add value wherever voice exists, and, like Haley mentioned, the ability to have all of that voice available to you in a single location, to be able to provide all of the insight contained within it, is hugely important. Voice is a rich and vibrant source of insight, but unlocking that insight is really key. Even before this, it's essential to preserve the quality of the calls to ensure they're in optimal condition; this then optimizes the accuracy of the ASR and the quality of the tools that can be fed by the text. It's for this reason that the ability to capture calls in high quality, from anywhere, is vital to get a holistic and comprehensive picture of interactions. Contact centers are looking to voice technology to unlock previously untouched voice data from calls, to provide agents with the training opportunities they require. Transcripts can be used to identify how issues have been solved in the past, to improve the customer journey and reduce interaction times. Transcripts of calls also enable easy access to data, which can be used in compliance auditing to accelerate dispute resolution and ultimately achieve a better customer experience. The financial sector is one of the most regulated in the world, and organizations are motivated to understand and protect the data that is exposed to
them. For this reason, there's never been a better time to invest in and deploy tools that deliver better insight into data provided by customers, securely. When it comes to data, often the focus is on what it contains, but what's also important is where it's stored and the location of tools like ASR. The on-premise deployment capability of speech-to-text means that organizations are not abstracted from their data, which remains within their secure environment, further protecting both the customer and the organization.

Thanks, Alex. So, the way organizations are looking to use their voice data has changed significantly. Many organizations, particularly those that are regulated and have compliance requirements they have to adhere to, have already invested heavily in order to meet those regulations and remain compliant, but they are now looking at how they can utilize that investment. Initially, organizations were looking at optimizing their workforce to start to drive efficiencies and performance improvements. Customer experience has since risen to the top of organizations' agendas, as consumers have become much more savvy, with higher expectations of the service levels they're receiving. Detailed business intelligence and analytics, coupled with understanding the conversations across the enterprise and all touch points within a customer journey, are now key to organizations not only maintaining but also looking to extend their market positions. Organizations are proactively looking to move to data-driven processes and decision making, and ultimately moving through into digital transformation.

So, we've had a number of questions come through. Please feel free to keep adding to them as we address the questions we've got so far, but we'll start off. I hope Haley won't mind, but I'll take the first one that's come through, which is: what about different, non-UK English accents? It's a really good question. English is a huge language that's been very broadly
adopted throughout the world, but you're right, there are multiple accents and dialects, so when it comes to providing speech recognition on those it can be quite challenging. Speechmatics has actually taken quite a unique approach to this, which is that we have a single English language pack, which we call Global English, and as part of that language pack we've trained on a great number of accents and dialects. What this enables us to do is provide best-quality accuracy when it comes to transcription across a broad range of accents and dialects. There are a number of benefits this offers, especially in the contact centre space, where you have multiple people having a conversation. If one person has a different accent to another, previously you might have had to deploy multiple English language packs to get the accuracy, which of course means deploying two separate instances of those language models, and you then have the cost and time of doing that twice. With Speechmatics Global English, you only have to do it once, which provides great optimization within these use cases. So it's a great question, and we would encourage anyone who is interested in our Global English to have a go and get in touch with us. If you want to try it out, please put as many accents to it as you like; we would be very interested to hear your feedback.

So, we also had a number of questions that came in ahead of the webinar. Haley, as I took the first one, have you got any questions that came in that you would like to answer? Yes, one came in beforehand, and that was talking about concerns around legacy telephony. I think a key thing to highlight, as I did at the beginning, is that at Red Box we are quite unique in that we're able to capture voice from a vast number of telephony integrations, legacy and new. I think I gave the figure: it's over fifty-five telephony integrations that we're able to capture that voice
from, and then, through the way our Red Box solution works, our open APIs and our partnership with Speechmatics, we're able to push that data to the AI, ML and analytics tools that you're looking to utilize. So certainly please don't feel that you're limited or can't adopt this technology to help improve the processes within your contact center environment, and by all means get in touch and we can have a more detailed conversation regarding the telephony integration that you're utilizing. Yeah, I think that's a great point. For people who feel there might be a barrier to adoption, working with Red Box means that barrier just doesn't exist, so it's a great way of providing those operational efficiencies, which is so important.

Another question that came in talked about the growth rate and adoption of this technology within the contact center industry. We addressed it a little bit earlier: 82% of contact center professionals have either considered or are currently implementing a voice strategy for their businesses, either now or in the next five years. So I think that the growth rate is increasing, especially with people like Gartner talking about how speech recognition has hit that level of maturity. For me, ASR has been really high quality for a couple of years now, specifically in English, which is a very widely adopted language. As I mentioned, Speechmatics actually supports over 30 languages now, so we can deliver these kinds of services in a massive range of languages, and it really means that contact centres now have this fantastic tool that they can trust and utilize to really extract the value they've got within their call data. It also means, and I think we found a stat in the research, that 46 percent said that
speech tech gave them a competitive advantage. But as I mentioned, I think we're getting to a point where the maturity of speech technology is at a level where it can add huge amounts of value, and so we're in a situation where soon I think it almost won't offer a level of differentiation; it will be a must-have for organisations to really extract that data. I'll throw it back to Haley so she can have a look at any other questions she's got.

Yeah, just to touch on that point really, Alex: voice is, and always will be, the most powerful form of communication, and going back to my last slide, it's all about that evolution. We've come so far from just recording calls for compliance, for trying to minimize risk and looking at surveillance, and we're now looking at how we can proactively use our voice conversations, and conversations across organizations, to really help inform us of what's happening and what conversations are taking place within those organisations. I share the same opinion as Alex; I certainly feel that this is definitely becoming the norm, and people actually want to really leverage their voice data now.

Thanks, Haley. Yeah, I think it's a growing market, and people are really coming around to understanding and leveraging these kinds of tools, which is really powerful for them. Another question that's just come in is about what you do in the case where multiple languages are used in contact centers; the example given is India, where you might have English and then a local language like Hindi. The benefit is that Speechmatics supports both of those languages, so that's good. There currently isn't, I suppose, a streamlined solution to flip automatic speech recognition from one language to another, but you can run both of those language packs in parallel, which means that after you've recorded the call, those outputs can be knitted together to give a holistic transcript. It also goes back to using Global English: we've got a number of accents, Indian accents included, within there, so you can get quite good recognition from that. But yes, it is a challenge of using automatic speech recognition from a deployment and usage point of view; there's nothing stopping you running those recordings in parallel, transcribing them, and then adding them together. Haley, do you have any others? I've got another one that kind of leads on from that.
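The "knitting together" of parallel language-pack outputs described above can be sketched as a simple timestamp merge. The `(start, label, text)` segment shape and the sample data here are illustrative assumptions, not a vendor schema:

```python
def knit(*transcripts):
    """Interleave per-language-pack (or per-channel) segment lists by start
    time to produce one holistic transcript. Each segment is a
    (start_seconds, label, text) tuple; the shape is illustrative only."""
    merged = sorted((seg for t in transcripts for seg in t),
                    key=lambda seg: seg[0])
    return " ".join(f"[{label}] {text}" for _, label, text in merged)

# Hypothetical outputs from an English pack and a Hindi pack run in parallel.
english = [(0.0, "en", "hello how can I help"), (9.5, "en", "resetting it now")]
hindi = [(4.2, "hi", "mera password kaam nahin kar raha")]
print(knit(english, hindi))
# [en] hello how can I help [hi] mera password kaam nahin kar raha [en] resetting it now
```

The same interleave-by-time idea applies to channel diarization, where the labels would be speaker names rather than languages.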
Yeah, you go ahead, that's great. Cool, so another question that has come through is: is there a way to name speakers in a transcript? The short answer to that is yes. We have a feature called channel diarization, which means that if you are recording on multiple channels, say an agent on channel one and the caller on channel two, you can label those channels, that data can feed through into the transcript, and through what we call diarization you can identify those two speakers. Haley, did you have any more? I've got a few more questions my end. I think that was all that had come in specifically around the voice capture piece, so yeah, feel free to delve a little bit deeper into ASR. Okay, cool. One of the other questions that came through was: how close to real-time transcription can we really get? It's a really interesting question, and there will always be a trade-off in real-time ASR between latency and accuracy, but when I talk about latency, I'm talking single seconds, and normally under five seconds. As part of our real-time solution, we have latency as a configurable parameter, which means you can set it to whatever you want. Obviously there will be a trade-off: if you allow an increased amount of latency, you'll get better accuracy, but as I said, at less than five seconds you'll get really good accuracy. What this enables you to do is uplift accuracy, and what I mean by that is that we provide contextual understanding within our speech recognition. A really good example that we tend to use quite often is if I said the words "I read a red book": by configuring a certain amount of latency, the ASR engine can go back and understand, okay, Alex is talking about reading, but he's also talking about the color, so it can go back and, what we call, rescore the transcript output to spell "read", r-e-a-d, as in reading
the book, but also understand that he's talking about a red, r-e-d, book. So you can use that latency, which as I said is totally configurable, to uplift your accuracy, so it's a really powerful tool when using real time. And we understand that real time is really fantastic in the contact center: agents are having conversations in real time, and they might need support when it comes to training or understanding. Being able to have real-time ASR, going into a text-based format that can feed the other tools within the workflow, lets you identify keywords. It could be a new product or service: this customer is talking about a product or service I've only just rolled out, my agents might not have all the information, so I can automatically prompt them with it. So having that real-time capability is really powerful. At the same time, that's not to say that pre-recorded content isn't equally valuable. We did a bit of research and found that around 30% of agent time is spent typing up notes, which have the possibility of being inaccurate; people are only human, and they might miss something that actually could be really important to that conversation. So being able to populate an interaction history autonomously means you can free up your agents to add value where they can, or deal with more calls, and that's really important.

A question came in about how you measure accuracy. At Speechmatics we've done quite a lot of work in regards to trying to educate the market about what we call accuracy, or what you call accuracy, because I expect people on this call will probably have different understandings of accuracy and what it means to them. Word error rate is a big thing: it's a globally understood metric that we try to use to benchmark, and it is used by a lot of people, so it means you're on a level playing field. So in regards to accuracy, we often mean word error rate, but there are lots of other things
to consider. I was just talking about interaction history, so if you take that use case: in the last year we've rolled out advanced punctuation, and the benefit that gives is that you can add full stops, commas, question marks and exclamation marks, which means that if I'm an agent reviewing the call history or interaction history of a caller, I can very easily understand and read that piece of text because it's properly punctuated. Having every single word perfect but having no punctuation actually makes text very difficult to engage with, so when it comes to accuracy, it might be that accuracy means an accurate representation of speech through proper punctuation. I think everyone who's using ASR needs to think: what are the KPIs, what are the things that are really important to me, and how do I deliver this? When evaluating ASR providers, have those really strongly in your mind to understand what it is that adds most value to you, and don't just assume that the lowest word error rate is the one that wins.

Hi Alex, just before you continue: a couple of attendees have actually raised their hands, so it's just to say, if you do have a question, please type it into the question pane so that we can see it and answer it for you. Sorry, Alex, I don't know if you've got any more that came in before, while these attendees perhaps type their questions in. Yeah, I don't have any new questions, just a follow-up query about running multiple languages: in a case where you want to run multiple language packs, you would have to set those up beforehand. It sounds like quite a specific use case, so if you've raised the question about running multiple language packs, it's probably best in this forum to take it offline. There should be some links on how to get in touch with Speechmatics, so if you want to raise that
outside of the webinar, that's cool; we'll get back to you as soon as we can with a more tailored and specific answer to your question. I've just been informed the best email address to get us on is hello@speechmatics.com.

"How accurate is Speechmatics?" This is always an interesting question, and the way that I always go after this question is: I don't want to tell you how accurate Speechmatics is, I want you to test it and tell me. We know that every use case is different, and every application of the language is different. What we do is try to give you all the tools you need to increase the accuracy as best you can. One of the things that we have is custom dictionaries: if you have specific vocabulary that might not be included in our language packs, we make it very easy to add those specific terms within our API, which means you can quite easily bias the model towards those terms. One of the examples I use quite often is "Brexit": when that term started popping up in the early stages a couple of years ago, it obviously didn't fall within a lot of ASR models; as it got more readily adopted it did, and it fell into those models, but when people wanted accurate transcription of it before then, they could use a custom dictionary to add that word, and it would automatically give them that recognition. It's also really good for brand names: if you want to make sure that the name of your company or brand falls within the transcription, it's important that you can use those kinds of tools to deliver that. So yes, we deliver all of the capabilities, hopefully, to enable you to uplift your accuracy depending on your use case.

Okay, I've seen a couple of questions come in around GDPR and privacy, and I think in this forum that's actually quite a detailed conversation, because there are different areas: there's the Red Box side, in terms of sensitive data and making sure that's not captured, and then there's how information goes through into the ASR. So again, as Alex said to the person that asked that question, it may be a
very specific use case that you're looking at, so let's take it out of this forum where we can actually have more of a conversation with each other, and we can certainly go into that detail with you.

Thanks, Haley. I just want to address one of the questions that came in regarding differentiating between multiple people. I talked about channel diarization: we support diarization of up to eight different channels, so if you've got a conference call with eight different inputs, you can tag all of those; we support up to eight at the moment, I believe. Does anyone else have any questions they want to raise at the moment? No? Okay, so I think if there are no more questions... one has come in really quickly, regarding whether Speechmatics can understand English irrespective of the accent. I wouldn't say irrespective of the accent, but our Global English has a huge number of accents and dialects that we train on to give the best possible chance of delivering the best possible accuracy across as many accents and dialects as possible. If you have a specific accent or dialect that is important or very relevant to you, then I would say please get in touch; we can enable you to try out our solution and see how it works for your use case.

I think that's actually a really nice place to end today. I'd like to thank everyone who has joined the webinar, and I'd really like to say thank you to Haley from Red Box for joining me today on the call. It was fantastic to have you, Haley, and hopefully this is the start of many more webinars together; I very much appreciate your time today. That's great, thanks for having me, I really enjoyed the session, and like you say, some great questions, so hopefully we'll speak to you all again very soon. Brilliant. Well, as I said, thank you to everyone who joined, and we'll leave it there for today. Thanks very much, everyone have a wonderful day. Thanks from Speechmatics, and thanks from Red Box.