Elasticsearch: How to convert OpenAI code to Azure OpenAI
November 6, 2024 | by mebius
Besides OpenAI's embedding and chat completion offerings, Azure also provides a comparable OpenAI service, and both platforms come up frequently in practice. Many of the examples in our Elasticsearch Labs are written against OpenAI directly, so how do we modify them to use the OpenAI service that Azure provides?
Let's demonstrate with an example we have worked through before.
Azure OpenAI embeddings
We start from the earlier article "Elasticsearch: RAG with OpenAI and LangChain – Retrieval Augmented Generation (Part 3)", in which we used OpenAI to generate the embeddings. I created a text-embedding-ada-002 deployment on Azure OpenAI.
From the Azure portal, we need the following parameters for this embedding model:
MODEL_NAME=text-embedding-ada-002
AZURE_ENDPOINT=https://embeddings-testing1.openai.azure.com/
AZURE_API_KEY="YourEmbeddingModelKey"
AZURE_OPENAI_API_VERSION=2023-05-15
You need to obtain the information shown above. With it in hand, we make the following changes to the code:
# from langchain.embeddings import OpenAIEmbeddings
from langchain_openai import AzureOpenAIEmbeddings
As shown above, we use AzureOpenAIEmbeddings instead of the previous OpenAIEmbeddings, and we create the embeddings object as follows:
# embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)
model_name = os.getenv('MODEL_NAME')
azure_endpoint = os.getenv('AZURE_ENDPOINT')
azure_api_key = os.getenv('AZURE_API_KEY')
azure_openai_api_version = os.getenv('AZURE_OPENAI_API_VERSION')

embeddings = AzureOpenAIEmbeddings(
    model=model_name,
    azure_endpoint=azure_endpoint,
    api_key=azure_api_key,
    openai_api_version=azure_openai_api_version
)
Note the difference from the commented-out code above.
The complete code can be downloaded from https://github.com/liu-xiao-guo/semantic_search_es/blob/main/ElasticsearchStore_azure.ipynb.
def generate_embeddings(text):
    client = AzureOpenAI(
        api_key=AZURE_OPENAI_API_KEY,
        api_version=AZURE_OPENAI_API_VERSION,
        azure_endpoint=AZURE_OPENAI_ENDPOINT,
    )
    response = client.embeddings.create(
        input=text,
        model=AZURE_DEPLOYMENT_ID,
    )
    return response.data[0].embedding
sample_text = "India generally experiences a hot summer from March to June, with temperatures often exceeding 40C in central and northern regions. Monsoon season, from June to September, brings heavy rainfall, especially in the western coast and northeastern areas. Post-monsoon months, October and November, mark a transition with decreasing rainfall. Winter, from December to February, varies in temperature across the country, with colder conditions in the north and milder weather in the south. India's diverse climate is influenced by its geographical features, resulting in regional"
embeddings = generate_embeddings(sample_text)
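The embedding returned above is just a list of floats (1,536 dimensions for text-embedding-ada-002). If you want to sanity-check two embeddings locally, cosine similarity is the usual comparison; here is a minimal sketch using only the standard library (the short vectors are dummy stand-ins for real embeddings):

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Dummy 4-dimensional "embeddings"; real vectors from
# text-embedding-ada-002 have 1536 dimensions.
v1 = [0.1, 0.2, 0.3, 0.4]
v2 = [0.1, 0.2, 0.3, 0.4]
v3 = [-0.4, 0.3, -0.2, 0.1]

print(round(cosine_similarity(v1, v2), 6))  # identical vectors → 1.0
```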
Azure OpenAI chat completion
In a RAG application, we also need chat completion. You can refer to the link for details. We take the earlier article "Elasticsearch: LLM-based PDF summarizer and Q/A application using OpenAI, LangChain, and Streamlit" as our example. The previous version used OpenAI; to convert the code to Azure OpenAI, we make the following changes:
model_name = os.getenv('MODEL_NAME')
azure_embedding_endpoint = os.getenv('AZURE_EMBEDDING_ENDPOINT')
azure_embedding_api_key = os.getenv('AZURE_EMBEDDING_API_KEY')
azure_embedding_api_version = os.getenv("AZURE_EMBEDDING_API_VERSION")

# create embeddings
embeddings = AzureOpenAIEmbeddings(
    model=model_name,
    azure_endpoint=azure_embedding_endpoint,
    api_key=azure_embedding_api_key,
    openai_api_version=azure_embedding_api_version
)
These settings are for our embedding model. We define the variables we need in .env:
MODEL_NAME=text-embedding-ada-002
AZURE_EMBEDDING_ENDPOINT=https://embeddings-testing1.openai.azure.com/
AZURE_EMBEDDING_API_KEY="YourEmbeddingKey"
AZURE_EMBEDDING_API_VERSION=2023-05-15
AZURE_API_KEY="YourChatCompletionKey"
AZURE_ENDPOINT="YourEndPoint"
AZURE_API_VERSION="2023-03-15-preview"
AZURE_DEPLOYMENT_ID="YourDeploymentId"
In addition, we need to modify the chat completion part of the code:
azure_api_key = os.getenv('AZURE_API_KEY')
azure_endpoint = os.getenv('AZURE_ENDPOINT')
azure_api_version = os.getenv('AZURE_API_VERSION')
azure_deployment_id = os.getenv('AZURE_DEPLOYMENT_ID')

llm = AzureChatOpenAI(
    api_key=azure_api_key,
    api_version=azure_api_version,
    azure_endpoint=azure_endpoint,
    azure_deployment=azure_deployment_id,
)
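With settings split across two deployments (embedding vs. chat), it is easy to leave one variable out of .env. The helper below is my own addition, not part of the original app: it checks a given list of variable names and fails fast, listing every missing name at once.

```python
import os

def require_env(names):
    # Return the requested settings, or raise listing every missing name.
    missing = [n for n in names if not os.getenv(n)]
    if missing:
        raise RuntimeError("Missing .env settings: " + ", ".join(missing))
    return {n: os.environ[n] for n in names}

# Example with dummy values (real values come from your Azure resource):
for n in ["AZURE_API_KEY", "AZURE_API_VERSION", "AZURE_DEPLOYMENT_ID"]:
    os.environ.setdefault(n, "dummy")

settings = require_env(["AZURE_API_KEY", "AZURE_API_VERSION", "AZURE_DEPLOYMENT_ID"])
print(sorted(settings.keys()))
```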
We run the code as follows, and can then see the resulting screen:
streamlit run azure.py
Implementing embeddings and chat completion with Azure OpenAI
We take the earlier example "Building a generative AI application using Elasticsearch and OpenAI". In this application, the query text also needs to be converted to a vector at search time. We modify the code in app.py as follows:
.env
MODEL_NAME=text-embedding-ada-002
AZURE_EMBEDDING_ENDPOINT=https://embeddings-testing1.openai.azure.com/
AZURE_EMBEDDING_API_KEY="YourEmbeddingKey"
AZURE_EMBEDDING_API_VERSION=2023-05-15
AZURE_API_KEY="YourChatCompletionKey"
AZURE_ENDPOINT="YourEndPoint"
AZURE_API_VERSION="2023-03-15-preview"
AZURE_DEPLOYMENT_ID="YourDeploymentId"
import os
import streamlit as st
# import openai
from openai import AzureOpenAI
from elasticsearch import Elasticsearch
from dotenv import load_dotenv

# from openai import OpenAI
# openai = OpenAI()

load_dotenv()

azure_api_key = os.getenv('AZURE_API_KEY')
azure_endpoint = os.getenv('AZURE_ENDPOINT')
azure_api_version = os.getenv('AZURE_API_VERSION')
azure_deployment_id = os.getenv('AZURE_DEPLOYMENT_ID')

chat = AzureOpenAI(
    api_key=azure_api_key,
    api_version=azure_api_version,
    azure_endpoint=azure_endpoint
)

model_name = os.getenv('MODEL_NAME')
azure_embedding_endpoint = os.getenv('AZURE_EMBEDDING_ENDPOINT')
azure_embedding_api_key = os.getenv('AZURE_EMBEDDING_API_KEY')
azure_embedding_api_version = os.getenv("AZURE_EMBEDDING_API_VERSION")

embeddings = AzureOpenAI(
    api_key=azure_embedding_api_key,
    api_version=azure_embedding_api_version,
    azure_endpoint=azure_embedding_endpoint,
)

elastic_user = os.getenv('ES_USER')
elastic_password = os.getenv('ES_PASSWORD')
elastic_endpoint = os.getenv("ES_ENDPOINT")
# openai_api_key=os.getenv('OPENAI_API_KEY')
# openai.api_type = "azure"

url = f"https://{elastic_user}:{elastic_password}@{elastic_endpoint}:9200"
client = Elasticsearch(url, ca_certs="./http_ca.crt", verify_certs=True)

# Define model
EMBEDDING_MODEL = "text-embedding-ada-002"

def openai_summarize(query, response):
    context = response['hits']['hits'][0]['_source']['text']
    summary = chat.chat.completions.create(
        model=azure_deployment_id,
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Answer the following question: " + query + " by using the following text: " + context},
        ]
    )
    print(summary)
    return summary.choices[0].message.content

def search_es(query):
    # Create embedding
    question_embedding = embeddings.embeddings.create(input=query, model=EMBEDDING_MODEL)

    # Define Elasticsearch query
    response = client.search(
        index="wikipedia_vector_index",
        knn={
            "field": "content_vector",
            "query_vector": question_embedding.data[0].embedding,
            "k": 10,
            "num_candidates": 100
        }
    )
    return response

def main():
    st.title("Gen AI Application")

    # Input for user search query
    user_query = st.text_input("Enter your question:", "what is football?")

    if st.button("Search"):
        if user_query:
            st.write(f"Searching for: {user_query}")
            result = search_es(user_query)
            # print(result)
            openai_summary = openai_summarize(user_query, result)
            st.write(f"OpenAI Summary: {openai_summary}")

            # Display search results
            if result['hits']['total']['value'] > 0:
                st.write("Search Results:")
                for hit in result['hits']['hits']:
                    st.write(hit['_source']['title'])
                    st.write(hit['_source']['text'])
            else:
                st.write("No results found.")

if __name__ == "__main__":
    main()
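The knn clause passed to client.search in search_es() is just a dictionary, so its shape can be checked without a live cluster. A minimal sketch (the query vector is a dummy stand-in; wikipedia_vector_index and content_vector come from the earlier article):

```python
def build_knn_query(query_vector, field="content_vector", k=10, num_candidates=100):
    # Mirrors the knn clause used in search_es() above.
    return {
        "field": field,
        "query_vector": query_vector,
        "k": k,
        "num_candidates": num_candidates,
    }

dummy_vector = [0.0] * 1536  # text-embedding-ada-002 produces 1536-dim vectors
knn = build_knn_query(dummy_vector)
print(sorted(knn))  # ['field', 'k', 'num_candidates', 'query_vector']
```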
The complete code can be downloaded from https://github.com/liu-xiao-guo/semantic_search_es/blob/main/openai_rag_streamlit_azure.ipynb.
After running the code, we see the following page:
Configuring Azure OpenAI for the AI Assistant
We add the following configuration to Kibana's configuration file, kibana.yml:
xpack.actions.preconfigured:
  azure-open-ai:
    actionTypeId: .gen-ai
    name: Azure OpenAI GPT-4
    config:
      apiUrl: https://xxxx.openai.azure.com/openai/deployments/gpt-4-32k/chat/completions?api-version=2023-07-01-preview
      apiProvider: Azure OpenAI
    secrets:
      apiKey: YourOwnApiKey
Note that the configuration above is for Azure OpenAI chat completion.
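The apiUrl in the connector follows Azure OpenAI's REST pattern: https://<resource>.openai.azure.com/openai/deployments/<deployment>/chat/completions?api-version=<version>. A small sketch that composes it (the resource and deployment names below are placeholders):

```python
def azure_chat_completions_url(resource, deployment, api_version):
    # Builds the chat completions endpoint for an Azure OpenAI deployment.
    return (
        f"https://{resource}.openai.azure.com/openai/deployments/"
        f"{deployment}/chat/completions?api-version={api_version}"
    )

url = azure_chat_completions_url("xxxx", "gpt-4-32k", "2023-07-01-preview")
print(url)
```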
For Kibana to work properly, we must also generate encryption keys with the following command:
$ ./bin/kibana-encryption-keys generate
Kibana is currently running with legacy OpenSSL providers enabled! For details and instructions on how to disable see https://www.elastic.co/guide/en/kibana/8.15/production.html#openssl-legacy-provider
## Kibana Encryption Key Generation Utility
The 'generate' command guides you through the process of setting encryption keys for:
xpack.encryptedSavedObjects.encryptionKey
Used to encrypt stored objects such as dashboards and visualizations
https://www.elastic.co/guide/en/kibana/current/xpack-security-secure-saved-objects.html#xpack-security-secure-saved-objects
xpack.reporting.encryptionKey
Used to encrypt saved reports
https://www.elastic.co/guide/en/kibana/current/reporting-settings-kb.html#general-reporting-settings
xpack.security.encryptionKey
Used to encrypt session information
https://www.elastic.co/guide/en/kibana/current/security-settings-kb.html#security-session-and-cookie-settings
Already defined settings are ignored and can be regenerated using the --force flag. Check the documentation links for instructions on how to rotate encryption keys.
Definitions should be set in the kibana.yml used configure Kibana.
Settings:
xpack.encryptedSavedObjects.encryptionKey: f6d17019af61a5724b314b59bada9bbc
xpack.reporting.encryptionKey: d2f7ad59a387e2893f04e456c0752d7c
xpack.security.encryptionKey: 00c17aa28e4d37e49925ee4fdee589f3
Copy the generated keys into the kibana.yml file, save it, and restart Kibana.
We can see that the Azure OpenAI GPT-4 configuration is now in place.
For further reading, see "Getting started with Elastic AI Assistant for Observability and Microsoft Azure OpenAI".
In Dev Tools, create a log document like the following:
POST /logs-elastic_agent-default/_doc
{
"message": "Status(StatusCode="FailedPrecondition", Detail="Can't access cart storage. nSystem.ApplicationException: Wasn't able to connect to redis n at cartservice.cartstore.RedisCartStore.EnsureRedisConnected() in /usr/src/app/src/cartstore/RedisCartStore.cs:line 104 n at cartservice.cartstore.RedisCartStore.EmptyCartAsync(String userId) in /usr/src/app/src/cartstore/RedisCartStore.cs:line 168").",
"@timestamp": "2024-02-22T11:34:00.884Z",
"log": {
"level": "error"
},
"service": {
"name": "cartService"
},
"host": {
"name": "appserver-1"
}
}
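Because the message field embeds double quotes, hand-writing this JSON is error-prone; building the document in code and letting json.dumps handle the escaping is safer. A sketch (field values copied from the document above, with the message shortened):

```python
import json

log_doc = {
    # Shortened version of the message above; json.dumps will escape
    # the embedded double quotes for us.
    "message": 'Status(StatusCode="FailedPrecondition", Detail="Can\'t access cart storage.")',
    "@timestamp": "2024-02-22T11:34:00.884Z",
    "log": {"level": "error"},
    "service": {"name": "cartService"},
    "host": {"name": "appserver-1"},
}

body = json.dumps(log_doc)
# The inner double quotes come back escaped, ready for the POST body.
print('\\"FailedPrecondition\\"' in body)  # True
```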