{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Azure OpenAI Whisper (preview) example\n",
"\n",
"This example shows how to use the Azure OpenAI Whisper model to transcribe audio files.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup\n",
"\n",
"First, we install the necessary dependencies and import the libraries we will be using."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"! pip install \"openai>=1.0.0,<2.0.0\"\n",
"! pip install python-dotenv"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import openai\n",
"import dotenv\n",
"\n",
"dotenv.load_dotenv()"
]
},
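{
"cell_type": "markdown",
"metadata": {},
"source": [
"`dotenv.load_dotenv()` reads environment variables from a `.env` file in the working directory, if one exists. As a minimal sketch (the values are placeholders), a `.env` file matching the variable names used later in this notebook would look like:\n",
"\n",
"```\n",
"AZURE_OPENAI_ENDPOINT=https://<your-resource-name>.openai.azure.com/\n",
"AZURE_OPENAI_API_KEY=<your-api-key>\n",
"```"
]
},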
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Authentication\n",
"\n",
"The Azure OpenAI service supports multiple authentication mechanisms that include API keys and Azure Active Directory token credentials."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"use_azure_active_directory = False  # Set this flag to True if you are using Azure Active Directory"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Authentication using API key\n",
"\n",
"To set up the OpenAI SDK to use an *Azure API Key*, we need to set `api_key` to a key associated with your endpoint (you can find this key in *\"Keys and Endpoints\"* under *\"Resource Management\"* in the [Azure Portal](https://portal.azure.com)). You'll also find the endpoint for your resource here."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"if not use_azure_active_directory:\n",
"    endpoint = os.environ[\"AZURE_OPENAI_ENDPOINT\"]\n",
"    api_key = os.environ[\"AZURE_OPENAI_API_KEY\"]\n",
"\n",
"    client = openai.AzureOpenAI(\n",
"        azure_endpoint=endpoint,\n",
"        api_key=api_key,\n",
"        api_version=\"2023-09-01-preview\"\n",
"    )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Authentication using Azure Active Directory\n",
"Let's now see how we can authenticate via Azure Active Directory. We'll start by installing the `azure-identity` library. This library provides the token credentials we need to authenticate and helps us build a token credential provider through the `get_bearer_token_provider` helper function. It's recommended to use `get_bearer_token_provider` over providing a static token to `AzureOpenAI` because this API will automatically cache and refresh tokens for you.\n",
"\n",
"For more information on how to set up Azure Active Directory authentication with Azure OpenAI, see the [documentation](https://learn.microsoft.com/azure/ai-services/openai/how-to/managed-identity)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"! pip install \"azure-identity>=1.15.0\""
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"from azure.identity import DefaultAzureCredential, get_bearer_token_provider\n",
"\n",
"if use_azure_active_directory:\n",
"    endpoint = os.environ[\"AZURE_OPENAI_ENDPOINT\"]\n",
"\n",
"    client = openai.AzureOpenAI(\n",
"        azure_endpoint=endpoint,\n",
"        azure_ad_token_provider=get_bearer_token_provider(DefaultAzureCredential(), \"https://cognitiveservices.azure.com/.default\"),\n",
"        api_version=\"2023-09-01-preview\"\n",
"    )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> Note: the `AzureOpenAI` client infers the following arguments from their corresponding environment variables if they are not provided:\n",
"\n",
"- `api_key` from `AZURE_OPENAI_API_KEY`\n",
"- `azure_ad_token` from `AZURE_OPENAI_AD_TOKEN`\n",
"- `api_version` from `OPENAI_API_VERSION`\n",
"- `azure_endpoint` from `AZURE_OPENAI_ENDPOINT`\n"
]
},
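{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a short illustration of that fallback behavior (not needed for the rest of this notebook), a client constructed with no explicit arguments relies entirely on the environment variables listed above:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: uncomment to construct a client purely from AZURE_OPENAI_API_KEY,\n",
"# OPENAI_API_VERSION, and AZURE_OPENAI_ENDPOINT in the environment.\n",
"# client = openai.AzureOpenAI()"
]
},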
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Deployments\n",
"\n",
"In this section we are going to create a deployment of the `whisper-1` model that we can use to transcribe audio files."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Deployments: Create in the Azure OpenAI Studio\n",
"Let's deploy a model to use with Whisper. Go to https://portal.azure.com, find your Azure OpenAI resource, and then navigate to the Azure OpenAI Studio. Click on the \"Deployments\" tab and then create a deployment for the model you want to use for transcription. The deployment name that you give the model will be used in the code below."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"deployment = \"whisper-deployment\"  # Fill in the deployment name from the portal here"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Audio transcription\n",
"\n",
"Audio transcription, or speech-to-text, is the process of converting spoken words into text. Use the `client.audio.transcriptions.create` method to transcribe an audio file stream to text.\n",
"\n",
"You can get sample audio files from the [Azure AI Speech SDK repository at GitHub](https://github.com/Azure-Samples/cognitive-services-speech-sdk/tree/master/sampledata/audiofiles)."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"# Download a sample audio file and write it to disk\n",
"import requests\n",
"\n",
"sample_audio_url = \"https://github.com/Azure-Samples/cognitive-services-speech-sdk/raw/master/sampledata/audiofiles/wikipediaOcelot.wav\"\n",
"response = requests.get(sample_audio_url)\n",
"response.raise_for_status()\n",
"with open(\"wikipediaOcelot.wav\", \"wb\") as f:\n",
"    f.write(response.content)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"with open(\"wikipediaOcelot.wav\", \"rb\") as audio_file:\n",
"    transcription = client.audio.transcriptions.create(\n",
"        file=audio_file,\n",
"        model=deployment,\n",
"    )\n",
"print(transcription.text)"
]
},
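{
"cell_type": "markdown",
"metadata": {},
"source": [
"The transcriptions endpoint also accepts optional parameters. As a sketch (the values below are illustrative, not required): `language` hints at the spoken language as an ISO-639-1 code, `prompt` can guide the spelling of uncommon words or the output style, `response_format` selects the output shape (e.g. `\"json\"`, `\"text\"`, `\"srt\"`, `\"vtt\"`), and `temperature` controls sampling."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: the same call with optional parameters. With response_format=\"text\"\n",
"# the SDK returns a plain string rather than an object with a .text attribute.\n",
"with open(\"wikipediaOcelot.wav\", \"rb\") as audio_file:\n",
"    transcription_text = client.audio.transcriptions.create(\n",
"        file=audio_file,\n",
"        model=deployment,\n",
"        language=\"en\",\n",
"        prompt=\"An encyclopedia entry about the ocelot.\",\n",
"        response_format=\"text\",\n",
"    )\n",
"print(transcription_text)"
]
}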
],
"metadata": {
"kernelspec": {
"display_name": "venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.0"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}