Artificial Intelligence: With speaker separation and content summarization, the audio-and-video-to-text software "Moyin Assistant" aims to go deeper into the meeting scenario

Moyin Assistant is an efficient tool for converting audio and video to text. It helps companies keep records of audio and video meetings, and offers meeting content search, automatic meeting summaries, and audio and video editing. It suits scenarios such as study, meetings, and interviews.

Moyin Assistant does more than transcribe speech. Users can record meeting audio or video in the software, which automatically generates the text and distinguishes between speakers. A cursor marks the text at the current position, and users can do simple editing based on the timestamps of the text. Moyin Assistant also offers text search and automatic summarization, which make it easy to pick out the core content of a recording and review a meeting.

According to the Foresight Research Institute, China's smart voice industry was worth 4.86 billion yuan in 2018, with annual market growth above 25%, and the market is forecast to exceed 10 billion yuan in 2021. The industry's prospects are good. Moreover, since the outbreak of the epidemic, people have gradually been adapting to smart tools for office work and learning. The Moyin Assistant team sees voice and video conferencing as a new market opportunity, and began developing transcription software for work and learning scenarios in February 2020.
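As a rough check on these figures, the short sketch below compounds the 2018 market size over three years. It assumes simple annual compounding from 2018 to 2021, which the source does not state, and the function names are illustrative only.

```python
# Assumption: the market grows by a constant annual rate from 2018 to 2021.
BASE_2018_BILLION_YUAN = 4.86   # market size cited for 2018
TARGET_2021_BILLION_YUAN = 10.0  # forecast threshold for 2021
YEARS = 3                        # 2018 -> 2021


def project(base: float, annual_rate: float, years: int) -> float:
    """Compound `base` by `annual_rate` for `years` years."""
    return base * (1.0 + annual_rate) ** years


def required_rate(base: float, target: float, years: int) -> float:
    """Constant annual growth rate needed to reach `target` from `base`."""
    return (target / base) ** (1.0 / years) - 1.0


size_at_25_percent = project(BASE_2018_BILLION_YUAN, 0.25, YEARS)
rate_needed = required_rate(BASE_2018_BILLION_YUAN, TARGET_2021_BILLION_YUAN, YEARS)

print(f"2021 size at exactly 25% growth: {size_at_25_percent:.2f} billion yuan")
print(f"Growth needed to reach 10 billion: {rate_needed:.1%}")
```

At exactly 25% a year the market would reach about 9.49 billion yuan, so the 10 billion forecast implies growth of roughly 27% a year, which is consistent with a rate that "has exceeded 25%".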

For speech recognition accuracy, Moyin Assistant has built its own cloud-trained model. Most of the training data comes from specific scenarios such as public meetings, courses, and voice conferences, so the model is tailored to work and learning scenarios and achieves a high recognition rate there. Most recording-to-text tools on the market use general-purpose models, which must accommodate many scenarios and are not targeted at any of them. In addition, Moyin Assistant uses a Personalized Speech Recognition Engine (PASR) to build a separate voice model for each account. It adaptively learns each user's terminology and accent and improves as users proofread, so recognition accuracy rises with continued use.

Moyin Assistant's transcription supports speaker separation. Its voiceprint technology converts each voice into a fixed-dimensional voiceprint vector so that voiceprints can be compared. In the industry, speaker recognition is a cross-disciplinary problem that requires voiceprint recognition and semantic recognition algorithms to work together, and the technical barriers are high. As a result, few speech-to-text tools on the market offer this function. Moyin Assistant combines voiceprint recognition with semantic content and reaches a speaker recognition accuracy of 70%-80%. In addition, its natural language processing model understands semantics and corrects text errors, automatically fixing ungrammatical sentences and slips of the tongue to improve readability.
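To make the idea of comparing fixed-dimensional voiceprint vectors concrete, here is a minimal sketch that uses cosine similarity, a common way to compare such vectors. It is not Moyin Assistant's actual method: the three-dimensional vectors, the 0.75 threshold, and the names `cosine_similarity` and `assign_speaker` are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("voiceprint vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector carries no voiceprint information
    return dot / (norm_a * norm_b)


def assign_speaker(segment_vector, enrolled, threshold=0.75):
    """Return the enrolled speaker most similar to the segment,
    or None if no speaker reaches the (hypothetical) threshold."""
    best_name, best_score = None, -1.0
    for name, vector in enrolled.items():
        score = cosine_similarity(segment_vector, vector)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


# Toy 3-dimensional voiceprints; real systems use far more dimensions.
enrolled_speakers = {
    "speaker_1": [1.0, 0.0, 0.0],
    "speaker_2": [0.0, 1.0, 0.0],
}

print(assign_speaker([0.9, 0.1, 0.0], enrolled_speakers))  # close to speaker_1
print(assign_speaker([0.0, 0.0, 1.0], enrolled_speakers))  # matches nobody
```

A threshold lets the system report an unknown speaker instead of forcing every segment onto the nearest enrolled voice. How the product combines this score with semantic content is not described in the source.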

As for auxiliary functions, Moyin Assistant analyzes the semantics of the context, extracts the key information, and automatically generates a meeting summary. It also supports searching the transcript: a text search locates the corresponding position in the audio. Moyin Assistant can also transcribe the speech in a video and generate subtitles for it, and users can edit the video according to the subtitle content.
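Searching text to find a position in the audio can be pictured as a lookup over timestamped transcript segments. The sketch below is one simple way to do that, assuming each segment stores its start time, speaker, and text. The `Segment` structure, the sample data, and the function names are hypothetical and are not taken from the product.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float   # start time in seconds
    end: float     # end time in seconds
    speaker: str
    text: str


def search_transcript(segments, query):
    """Return the segments whose text contains `query`, ignoring case."""
    needle = query.casefold()
    return [s for s in segments if needle in s.text.casefold()]


def format_timestamp(seconds):
    """Render a time offset as MM:SS for display next to a search hit."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


# Made-up sample transcript for illustration.
sample_segments = [
    Segment(0.0, 12.5, "Speaker 1", "Let's start with the quarterly budget."),
    Segment(12.5, 75.4, "Speaker 2", "The hiring plan comes first."),
    Segment(75.4, 90.0, "Speaker 1", "Then back to the Budget review."),
]

hits = search_transcript(sample_segments, "budget")
for hit in hits:
    print(format_timestamp(hit.start), hit.speaker, hit.text)
```

Because every hit carries a start time, a player could jump straight to that point in the recording. The same timestamps could also drive subtitle generation and subtitle-based video editing.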

At present, Moyin Assistant's revenue comes mainly from corporate customers, and the product is free for individual users. In the future, it may launch a paid personal membership offering exclusive functions such as real-time transcription. The team currently has about a dozen technical staff. The founding members come from Wandoujia, Kuaishou, Huixiaoer, and other Internet companies, and all have experience in products and business services. The core AI engineers come from leading institutions such as iFLYTEK, the Chinese Academy of Sciences, Baidu, and ByteDance.

Major Internet companies have also spotted this market opportunity and are investing in online meeting recording. On November 18, Feishu launched the "Feishu Miao Ji" function at its "2020 Feishu Unlimited Future Conference". The function generates transcripts of meeting speech, distinguishes between speakers, and supports search and summary extraction.

Faced with the booming online conferencing market, the Moyin Assistant team said that the product is open and that it is willing to partner with online office and online learning platforms.

Saturday, November 28, 2020
