Tumgik
#CognitiveServices
Text
Teksun's Cognitive Services & Solutions include a wide set of tools and frameworks that allow businesses to operationalize AI services quickly and at scale. To learn more, visit: https://teksun.com/ Contact us: [email protected]
0 notes
peterorneholm · 4 years
Text
Solving "The recordings URI contains invalid data." in Azure Speech - Batch Transcription
... or "How to remove a thumbnail from an mp3 using ffmpeg?"
Update 2020-12-10: Since writing this post, the team at Microsoft have rolled out an update of the transcription service that allows embedded media, so the workaround described in this blog post won’t be necessary any more. They have also introduced a new, more powerful API (v3) for Batch transcription; an introduction is available here.
While building the site RadioText.net (which I've blogged about here) I found that a few of the files I tried to transcribe returned an undocumented error: "The recordings URI contains invalid data." and I'll describe the solution to the (at least my) problem below.
The problem
So, the problem appeared when using Azure Speech Services - Batch transcription. Speech Services is one of the AI services in Azure provided through Azure Cognitive Services. Batch Transcription allows for large-scale, asynchronous transcription of audio files.
Initiating the transcription went fine. It is done using an HTTP POST like this (which of course can vary depending on your preferences):
HTTP POST: https://westeurope.cris.ai/api/speechtotext/v2.0/Transcriptions/
{
  "Name": "RadioText - Episode 1482554",
  "Description": "RadioText",
  "RecordingsUrl": "https://STORAGEACCOUNT.blob.core.windows.net/media/Audio.mp3",
  "Locale": "sv-SE",
  // ...
}
The error is returned to you when you are querying for the transcription status:
HTTP GET: https://westeurope.cris.ai/api/speechtotext/v2.0/Transcriptions/TRANSCRIPTION-GUID
{
  "recordingsUrl": "https://STORAGEACCOUNT.blob.core.windows.net/media/Audio.mp3",
  // ...
  "statusMessage": "The recordings URI contains invalid data.",
  "status": "Failed",
  "locale": "sv-SE",
  // ...
}
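When polling from code, the failure can be picked up by reading the two fields shown in the response above. The following is a minimal C# sketch, not the code used on RadioText.net. It assumes the real response is plain JSON (without the elided "// ..." parts), the helper name ReadStatus is hypothetical, and treating statusMessage as optional is an assumption.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper: reads "status" and "statusMessage" from a
// transcription status response. No other fields are inspected.
static (string Status, string Message) ReadStatus(string json)
{
    using var doc = JsonDocument.Parse(json);
    var root = doc.RootElement;
    var statusValue = root.GetProperty("status").GetString() ?? "";
    // Assumption: statusMessage may be missing when nothing went wrong.
    var messageValue = root.TryGetProperty("statusMessage", out var m)
        ? m.GetString() ?? ""
        : "";
    return (statusValue, messageValue);
}

// Trimmed sample based on the response shown in the post.
var sample = @"{
  ""recordingsUrl"": ""https://STORAGEACCOUNT.blob.core.windows.net/media/Audio.mp3"",
  ""statusMessage"": ""The recordings URI contains invalid data."",
  ""status"": ""Failed"",
  ""locale"": ""sv-SE""
}";

var (status, message) = ReadStatus(sample);
if (status == "Failed")
{
    Console.WriteLine($"Transcription failed: {message}");
}
```

In a pipeline, a "Failed" status with this particular message could be the trigger for re-uploading an audio-only copy of the file and starting the transcription again.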
Video stream / JPG
After some investigation, it turns out that the issue, which seems to be undocumented, is that the batch service requires the audio file being transcribed to contain audio streams exclusively. The mp3 files I was trying to transcribe had an embedded video stream containing a JPG file, basically the thumbnail shown in media players. At the moment, batch transcription can't handle audio files containing such an image/video stream (confirmed with the team at Microsoft).
Solution
Until support for these media files is built into the service, it's easy to extract only the audio stream(s) ourselves using ffmpeg. The following command copies the audio stream(s) from Input.mp3 into Output.mp3, ready to use with the batch service. It's documented under Stream selection in the ffmpeg documentation.
ffmpeg -i Input.mp3 -map 0:a -codec:a copy Output.mp3
If we break it down:
ffmpeg: The utility
-i Input.mp3: Use Input.mp3 as input, could also be a URL (https://STORAGEACCOUNT.blob.core.windows.net/media/Audio.mp3).
input docs
-map 0:a: Use the audio (a) from the first input file (0)
map docs
-codec:a copy: Set the codec option for audio (a) to copy. This makes it very efficient, as nothing needs to be re-encoded.
Stream copy
Output.mp3: Output the new file that only contains the audio stream(s) into Output.mp3
There might be other options or tweaks that work for you, but I found this to be fast and work well to solve my problem.
Calling ffmpeg from CSharp
In my case, I wanted to run the above ffmpeg command as part of my transcription pipeline and therefore call it from C#. The code below is a minimal approach to do so:
using System;
using System.Diagnostics;

const string ffmpegLocation = "PATH_TO_FFMPEG.EXE";
var inputFile = "Input.mp3";
var outputFile = "Output.mp3";

var processStartInfo = new ProcessStartInfo
{
    FileName = ffmpegLocation,
    // FileName already points at ffmpeg, so the arguments start at -i.
    Arguments = $"-i \"{inputFile}\" -map 0:a -codec:a copy \"{outputFile}\"",
    UseShellExecute = false,
    RedirectStandardOutput = true,
    CreateNoWindow = true
};

using var process = Process.Start(processStartInfo);
process.WaitForExit();

// ffmpeg returns a non-zero exit code on failure.
if (process.ExitCode != 0)
{
    throw new InvalidOperationException($"ffmpeg exited with code {process.ExitCode}.");
}
Contribute
I hope this helps, and if you have any tweaks you want to share, please let me know. I'm at @PeterOrneholm.
0 notes
fboucheros · 5 years
Photo
Reading Notes #415 https://ift.tt/2wUQrfQ
0 notes
incegna · 5 years
Photo
Cognitive intelligence is the ability to reason, solve problems, apply tricks, think abstractly, comprehend complex ideas, learn quickly and learn from experience. It is not merely book learning, a narrow academic skill, or test-taking smarts. https://www.incegna.com/post/cognitive-intelligence Check our Info : www.incegna.com Reg Link for Programs : http://www.incegna.com/contact-us Follow us on Facebook : www.facebook.com/INCEGNA/? Follow us on Instagram : https://www.instagram.com/_incegna/ For Queries : [email protected] #cognitiveintelligence,#artificialintelligence,#naturallanguage,#machinelearning,#deeplearning,#neuralnetworks,#nlp,#robotics,#cognitiveservices,#emotionalintelligence,#cognitivecomputing,#humanlearning,#python https://www.instagram.com/p/B8dHLB_ArrW/?igshid=1vf4kc9dq7itp
0 notes
curelet-blog · 5 years
Photo
Microsoft Cognitive Services defined as simple to use Artificial Intelligence on top of Machine Learning. @MarkArteaga talk at #GlobalAzure @MicrosoftCanada. #ML #MachineLearning #AI #CognitiveServices #microservices #serverless #Xamarin #faas #functionsAsAService #EventGrid (at Microsoft in Education Canada) https://www.instagram.com/curelet1/p/Bwx8hgkAg3p/?utm_source=ig_tumblr_share&igshid=p9vxcs3asz01
0 notes
heathenstorm · 6 years
Photo
Dichotomy. Sometimes the Rock’n’Roll lifestyle takes a backseat to the more serious business of earning a crust. In London for a few days to brush up on the latest tech advances and press flesh with programming peers from around the world. . I hope I packed enough hand sanitiser. . #microsoft #microsoftignite #conference #excel #london #machinelearning #artificialintelligence #cognitiveservices #azure #programmer #geek #nerd #bloodceremony (at ExCeL London Exhibition and Convention Centre) https://www.instagram.com/p/BuVuM5cF0ZB/?utm_source=ig_tumblr_share&igshid=ef636owv88io
0 notes
rupicjp · 6 years
Photo
Putting together my slides for next week's Fukuten on my MacBook Pro (*´-`)♡ Office feels the same as the Windows version, so it's easy to use! I'll be talking about Cognitive Services, which I love m(_ _)m I'm starting to worry whether it will fit into 30 minutes, lol #ふくてん #fukuten #cognitiveservices #office365 #macbookpro2018 https://www.instagram.com/p/BtGHJlIH2lxWFDDhsbYE81ajOPu_4NhzRq5k3A0/?utm_source=ig_tumblr_share&igshid=1axo0vjp4k5g7
0 notes
yosoyfedert · 7 years
Photo
Geeking 🤓 #laptop #stickers #Microsoft #Linux #Channel9 #MSBuild #HoloLens #API #CognitiveServices #azurecosmosdb #SubPop #MicrosoftAdvocate #Windows10
1 note · View note
hummli · 6 years
Photo
Visiting the Microsoft headquarters in Munich and taking part in the Imagine Cup. Amazing projects and high innovation. #microsoft #imaginecup2018 #munichHack #munich #innovation #ai #networking #studySmarter #it #hackathon #worldFinals #azure #cognitiveServices #deepLearning (at Microsoft Deutschland GmbH)
0 notes
fadyanwar · 4 years
Link
#iot #ai #aiot #azure #raspberrypi #cognitiveservices #nodejs #faceapi 
1 note · View note
ngeorgeault · 7 years
Photo
It was great to speak about our vision of #collectiveintelligence and present our project. Great feedback on #writeit4me #Microsoft #cognitiveservices (at WeWork Place Ville Marie)
0 notes
elbrunoc · 4 years
Text
#Unity3D - Making a CustomVision.ai HTTP Post call to have a better #MRTK experience with #CognitiveServices @ivanatilca
Hi !
Quick post today, with mostly sample code. And, it’s all about a scenario that we faced with Ivana a couple of days ago while we were using MRTK and we were trying to use some Cognitive Services.
As of today, not all the services in Cognitive Services have official Unity3D support. In the end, it’s not a problem, we can just make an HTTP Post call, and that’s it.…
View On WordPress
0 notes
peterorneholm · 5 years
Text
Introducing RadioText.net - Transcribing news episodes from Sveriges Radio
RadioText.net is a site that transcribes news episodes from Swedish Radio and makes them accessible. It uses multiple AI-based services from Azure Cognitive Services, such as Speech-to-Text, Text Analytics, Translation, and Text-to-Speech.
By using all of the services, you can listen to "Ekot" from Swedish Radio in English :) Disclaimer: The site is primarily a technical demo, and should be treated as such.
Background
Just to give you (especially non-Swedish people) some background: Sveriges Radio (Swedish Radio) is the public service radio in Sweden, like the BBC is in the UK. Swedish Radio does produce some shows in languages like English, Finnish and Arabic - but the majority is (for natural reasons) produced in Swedish.
The main news show is called Ekot ("The Echo") and they broadcast at least once every hour and the broadcasts range from 1 minute to 50 minutes. The spoken language for Ekot is Swedish.
For some time, I've been wanting to build a public demo with the AI Services in Azure Cognitive Services, but as always with AI - you need some datasets to work with. It just so happens that Sveriges Radio has an open API with access to all of their publicly available data, including the audio archive - enabling me to work with the speech APIs.
Architecture
The site runs in Azure and is heavily dependent on Cognitive Services. It's split into two parts, Collect & Analyze and Present & Read.
Collect & Analyze
The collect & analyze part is a series of actions that will collect, transcribe, analyze and store the information about the episodes.
It's built using .NET Core 3.1 and can be hosted as an Azure function, Container or something else that can run continuously or on a set interval.
The application periodically looks for a new episode of Ekot using the Sveriges Radio open API. There is a NuGet-package available that wraps the API for .NET (disclaimer, I'm the author of that package...). Once a new episode arrives, it caches the relevant data in Cosmos DB and the media in Blob Storage.
JSON Response: https://api.sr.se/api/v2/episodes/get?id=1464731&format=json
The reason to cache the media is that the batch version of Speech-to-text requires the media to be in Blob Storage.
Once all data is available locally, it starts the asynchronous transcription using Cognitive Services Speech-to-text API. It specifically uses the batch transcription which supports transcribing longer audio files. Note that the default speech recognition only supports 15 seconds because it is (as I've understood it) more targeted towards understanding "commands".
The raw result of the transcription is stored in Blob-storage, and the most relevant information is stored in Cosmos DB.
The transcription contains the combined result (a long string of all the text) and the individual words with timestamps. A sample of such a file can be found below:
Original page at Sveriges Radio: Nyheter från Ekot 2020-03-20 06:25
Original audio: Audio.mp3
Transcription (Swedish): Transcription.json
This site only uses the combined result but could improve the user experience by utilizing the data of individual words.
All of the texts (title, description, transcription) are translated into English and Swedish (if those were not the original language of the audio) using Cognitive Services Translator Text API.
A sample can be found here: https://radiotext.net/episode/1464731
All texts mentioned above are analyzed using Cognitive Services Text Analytics API, which provides sentiment analysis, keyphrases and (most important) named entities. Named entities are a great way to filter and search the episodes by. It's better than keywords, as it's not only a word but also what kind of category it is. The result is stored in Cosmos DB.
The translated transcriptions are then converted back into audio using Cognitive Services Text-to-Speech. It produces one for English and one for Swedish. For English, there is support for the Neural Voice and I'm impressed by the quality, it's almost indistinguishable from a human. The voice for Swedish is fine, but you will hear that it's computer-generated. The generated audio is stored in Blob Storage.
Original audio: Audio.mp3
English audio (JessaNeural, en-US): Speaker_en-US-JessaNeural.mp3
Swedish audio (HedvigRUS, sv-SE): Speaker_sv-SE-HedvigRUS.mp3
Last but not least, a summary of the most relevant data from previous steps is denormalized and stored in Cosmos DB (using Table API).
Present & Read
The site that presents the data is available at https://radiotext.net/. It's built using ASP.NET Core 3.1 and is deployed as a Linux Docker container to Dockerhub and then released to an Azure App Service.
Currently, it lists all episodes and allows for in-memory filtering and search. From the listing, you can see the first part of the transcription in English and listen to the English audio.
By entering the details page, you can explore the data in multiple languages as well as the original information from the API.
Immersive reader
Immersive Reader is a tool/service that's been available for some time as part of Office, for example in OneNote. It's a great way to make reading and understanding texts easier. My wife works as a speech and language pathologist and she says that this tool is a great way to enable people to understand texts. I've incorporated the service into Radiotext to allow the user to read the news using this tool.
Primarily, it can read the text for you, and highlight the words that are currently being read:
It can also explain certain words, using pictures:
And if you are learning about grammar, it can show you grammar details like which words are nouns, verbs, and adjectives:
I hadn't used this service before, but it shows great potential for making texts more accessible. Combined with Speech-to-text, it can also make audio more accessible.
Cost
I've tried to get a grip on what it would cost to run this service, and I estimate that running all services for one episode of Ekot (5 minutes) costs roughly €0.20. That includes transcribing, translating, analyzing and generating audio for multiple languages.
Speech pricing
Translation pricing
Text analytics pricing
Also, there will be a cost for running the web, analyzer, and storage.
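As a rough back-of-the-envelope check, the per-episode estimate above can be scaled up to a day or a month. The C# sketch below is only illustrative: the linear scaling with audio length, the 24 episodes per day (Ekot broadcasts at least once every hour), the 5-minute length for every episode and the 30-day month are all assumptions, and the real pricing is split across several services with different units.

```csharp
using System;

// From the post: roughly EUR 0.20 for one 5-minute episode.
const decimal CostPerEpisode = 0.20m;
const decimal EpisodeMinutes = 5m;

// Assumption: cost scales linearly with the length of the audio.
decimal EstimateCost(decimal minutes) => minutes * CostPerEpisode / EpisodeMinutes;

// Assumptions: one 5-minute episode per hour, 30 days in a month.
const int EpisodesPerDay = 24;
const int DaysPerMonth = 30;

var perDay = EpisodesPerDay * EstimateCost(EpisodeMinutes);
var perMonth = DaysPerMonth * perDay;

Console.WriteLine($"Per day: EUR {perDay}, per month: EUR {perMonth}");
```

Under these assumptions the Cognitive Services part comes to 24 × €0.20 = €4.80 per day, or about €144 for 30 days, before the cost of the web, analyzer, and storage.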
Ideas for improvement
The current application was done to showcase and explore a few services, but it's not in any way feature complete. Here are a few ideas on the top of my mind.
Live audio transcription: Speech to text supports live audio transcription, so we could transcribe the live radio feed. Could be combined with the subtitles idea below.
Improve accuracy with Custom Speech: Using Custom Speech we could improve the accuracy of the transcriptions by training it on some common domain-specific words. For example, the jingle is often treated as words, while it should not be.
Enable subtitles: Using the timestamp data from the transcription subtitles could be generated. That would enable a scenario where we can combine the original audio with subtitles.
Multiple voices: Interviews are a natural part of a news episode. And naturally, in interviews, there are multiple people involved. The audio I'm generating now reads all texts with the same voice, so when there are conversations it sounds kind of strange. Using conversation transcription it could find out who says what and generate the audio with multiple voices.
Improve long audio: The current solution will fail when generating audio for long texts. The Long Audio API allows for that.
Handle long texts: Both translation and text analytics have limitations on the length of the texts. At the moment, the texts are cut if they are too long, but they could be split into multiple chunks and then analyzed and concatenated again.
Search using Azure Search: At the moment the "search" and "filtering" functionality is done in memory, just for demo purposes. Azure Search allows for a much better search experience and could be added for that. Unfortunately, it does not allow for automatic indexing of Cosmos DB Table API at the moment.
Custom Neural Voice: I've always wanted to be a newsreader, and using Custom Neural Voice I might be able to do so ;) Custom Neural Voice can be trained on your voice and used to generate the audio. But, even if we could do this, it doesn't mean we should. Custom Neural Voice is one (maybe the only?) service you need to apply for to be able to use. In the world of fake news, I would vote for not implementing this.
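To make the "Handle long texts" idea above a bit more concrete, here is a minimal C# sketch of splitting a text into chunks before sending it to a service. It is not part of the current application. The helper name SplitIntoChunks is hypothetical, and maxChars is a placeholder for whatever limit the service in question actually has. The sentence detection is deliberately naive (it breaks after '.', '!' or '?'), so abbreviations and decimal numbers can cause a break in the wrong place.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: splits text into chunks of at most maxChars characters,
// preferring to break right after sentence-ending punctuation.
static List<string> SplitIntoChunks(string text, int maxChars)
{
    if (maxChars <= 0) throw new ArgumentOutOfRangeException(nameof(maxChars));

    var chunks = new List<string>();
    var start = 0;
    while (start < text.Length)
    {
        var length = Math.Min(maxChars, text.Length - start);
        if (start + length < text.Length)
        {
            // Search backwards inside the window for the last sentence end.
            var lastBreak = text.LastIndexOfAny(
                new[] { '.', '!', '?' }, start + length - 1, length);
            if (lastBreak >= start) length = lastBreak - start + 1;
        }

        // Surrounding whitespace is dropped, so chunks need a separator
        // (for example a space) when they are joined again.
        var chunk = text.Substring(start, length).Trim();
        if (chunk.Length > 0) chunks.Add(chunk);
        start += length;
    }
    return chunks;
}

var parts = SplitIntoChunks("One. Two. Three.", 10);
foreach (var part in parts)
{
    Console.WriteLine(part);
}
```

Each chunk could then be translated or analyzed on its own, and the results concatenated in the original order. If no sentence end is found inside the window, the sketch falls back to a hard cut at maxChars.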
Disclaimer
This is an unofficial site, not built or supported by Sveriges Radio. It's based on the open data in their public API. It's built as a demo showcasing some technical services.
Most of the information is automatically extracted and/or translated by the AI in Azure Cognitive Services. It's based on the information provided by Swedish Radio API. It is not verified by any human and there will most likely be inaccuracies compared to the source.
All data is retrieved from the Swedish Radio Open API (Sveriges Radios Öppna API) and is Copyright © Sveriges Radio.
Try it out and contribute
The source code is available on GitHub and the Docker image is available on Dockerhub.
Hope you like it. Feel free to contribute :)
0 notes
fboucheros · 5 years
Photo
Reading Notes #382 http://bit.ly/2Zf51Yy
0 notes
Photo
Delivering integrated augmented intelligence with natural language processing, video/image analytics, & rising technologies like AR & VR to assist businesses with immersive customer experiences and beat the competition. Get the next generation of intelligent systems. Our AI & Cognitive Services -> Chat Bots -> Machine Learning -> Cognitive Services -> MR, AR & VR #artificialintelligence #machinelearning #ai #technology #datascience #bigdata #deeplearning #tech #innovation #python #iot #blockchain #chatbot #chatbotmessenger #chatbots #digital #CognitiveService #intelligence #business #teksun #teksuninformation #teksunindia #teksunusa
0 notes
curelet-blog · 5 years
Photo
The technologies landscape of mobile applications development today. From @MarkArteaga talk at #GlobalAzure @MicrosoftCanada. #ML #MachineLearning #AI #CognitiveServices #microservices #serverless #Xamarin #faas #functionsAsAService #EventGrid (at Microsoft in Education Canada) https://www.instagram.com/curelet1/p/Bwx5iIOAGnV/?utm_source=ig_tumblr_share&igshid=1bied97qzbwzk
0 notes