Elasticsearch: How to change OpenAI code to use Azure OpenAI

November 6, 2024   |   by mebius

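Besides OpenAI itself, Azure also offers data embedding and chat completion through its Azure OpenAI service, and both platforms are in common use. Much of the code in Elasticsearch Labs is written against OpenAI. So how do we change it to use the OpenAI service provided by Azure?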

Let's start with an example from an earlier article.

Azure OpenAI embeddings

We use the earlier article "Elasticsearch: RAG with OpenAI and LangChain, Retrieval Augmented Generation (Part 3)". In that article, OpenAI generates the embeddings. Here I create a text-embedding-ada-002 model in Azure OpenAI.


On the Azure site, we need the following parameters for this embedding model:

MODEL_NAME=text-embedding-ada-002
AZURE_ENDPOINT=https://embeddings-testing1.openai.azure.com/
AZURE_API_KEY="YourEmbeddingModelKey"
AZURE_OPENAI_API_VERSION=2023-05-15

You need to obtain the values shown above. With them, we change the code as follows:

# from langchain.embeddings import OpenAIEmbeddings
from langchain_openai import AzureOpenAIEmbeddings

As shown above, we use AzureOpenAIEmbeddings instead of the previous OpenAIEmbeddings. The embeddings object is then created as follows:

# embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)

model_name = os.getenv('MODEL_NAME')
azure_endpoint = os.getenv('AZURE_ENDPOINT')
azure_api_key = os.getenv('AZURE_API_KEY')
azure_openai_api_version = os.getenv('AZURE_OPENAI_API_VERSION')

embeddings = AzureOpenAIEmbeddings(
    model=model_name,
    azure_endpoint=azure_endpoint, 
    api_key=azure_api_key,
    openai_api_version=azure_openai_api_version
)

Note how this differs from the commented-out line above.
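
Because all four values come from environment variables, and os.getenv returns None for an unset variable, a missing entry would only surface later as a failed request. The following is a minimal sketch of a fail-fast loader. The function name load_azure_settings and the REQUIRED_VARS tuple are hypothetical helpers and not part of the original code; only the variable names are taken from the parameters listed above.

```python
import os

# Variable names as listed in the parameters above.
REQUIRED_VARS = (
    "MODEL_NAME",
    "AZURE_ENDPOINT",
    "AZURE_API_KEY",
    "AZURE_OPENAI_API_VERSION",
)


def load_azure_settings(env=None):
    """Return the four Azure settings as a dict; raise if any is missing or empty.

    Hypothetical helper: `env` defaults to os.environ and can be replaced
    by a plain dict, which makes the check easy to try out.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

The returned dictionary could then supply the arguments of the AzureOpenAIEmbeddings call shown above, for example model=settings["MODEL_NAME"].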

The complete code can be downloaded from: https://github.com/liu-xiao-guo/semantic_search_es/blob/main/ElasticsearchStore_azure.ipynb

For more code, see https://github.com/elastic/elasticsearch-labs/blob/213d5af0fe919277087fe7e80793eb35ad88fa82/notebooks/integrations/azure-openai/vector-search-azure-openai-elastic.ipynb

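from openai import AzureOpenAI

# AZURE_OPENAI_API_KEY, AZURE_OPENAI_API_VERSION, AZURE_OPENAI_ENDPOINT and
# AZURE_DEPLOYMENT_ID are assumed to be defined earlier in the notebook.
def generate_embeddings(text):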
    client = AzureOpenAI(
        api_key=AZURE_OPENAI_API_KEY,
        api_version=AZURE_OPENAI_API_VERSION,
        azure_endpoint=AZURE_OPENAI_ENDPOINT,
    )

    response = client.embeddings.create(
        input=text,
        model=AZURE_DEPLOYMENT_ID,
    )

    return response.data[0].embedding


sample_text = "India generally experiences a hot summer from March to June, with temperatures often exceeding 40C in central and northern regions. Monsoon season, from June to September, brings heavy rainfall, especially in the western coast and northeastern areas. Post-monsoon months, October and November, mark a transition with decreasing rainfall. Winter, from December to February, varies in temperature across the country, with colder conditions in the north and milder weather in the south. India's diverse climate is influenced by its geographical features, resulting in regional"
embeddings = generate_embeddings(sample_text)
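
When several texts need embeddings, the embeddings endpoint also accepts a list as input. The sketch below is one possible batch variant of the function above, not code from the notebook. The name generate_embeddings_batch is hypothetical, `client` is assumed to be an AzureOpenAI instance created as shown above, and `deployment` is assumed to be the name of the embedding deployment.

```python
def generate_embeddings_batch(client, texts, deployment):
    """Embed several texts in one request and return the vectors in input order.

    Hypothetical helper: `client` is assumed to be an AzureOpenAI client and
    `deployment` the name of the embedding deployment.
    """
    response = client.embeddings.create(input=list(texts), model=deployment)
    # Each returned item carries the index of the input it belongs to;
    # sorting by it keeps the vectors aligned with `texts`.
    ordered = sorted(response.data, key=lambda item: item.index)
    return [item.embedding for item in ordered]
```

Sending one request per batch instead of one per text reduces the number of round trips, subject to the input limits of the deployment.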

Azure OpenAI chat completion

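In a RAG application we also need chat completion. See the link for reference. We take the earlier article "Elasticsearch: An LLM-based PDF summarizer and Q/A application with OpenAI, LangChain and Streamlit" as the example. The earlier version used OpenAI. To convert the code to Azure OpenAI, we make the following changes: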

      model_name = os.getenv('MODEL_NAME')
      azure_embedding_endpoint = os.getenv('AZURE_EMBEDDING_ENDPOINT')
      azure_embedding_api_key = os.getenv('AZURE_EMBEDDING_API_KEY')
      azure_embedding_api_version = os.getenv("AZURE_EMBEDDING_API_VERSION")

      # create embeddings
      embeddings = AzureOpenAIEmbeddings(
        model=model_name,
        azure_endpoint=azure_embedding_endpoint,
        api_key=azure_embedding_api_key,
        openai_api_version=azure_embedding_api_version
      )

This part covers the embedding model. The variables we need can be defined in .env:

MODEL_NAME=text-embedding-ada-002
AZURE_EMBEDDING_ENDPOINT=https://embeddings-testing1.openai.azure.com/
AZURE_EMBEDDING_API_KEY="YourEmbeddingKey"
AZURE_EMBEDDING_API_VERSION=2023-05-15

AZURE_API_KEY="YourChatCompletionKey"
AZURE_EDNPOINT="YourEndPoint"
AZURE_API_VERSION="2023-03-15-preview"
AZURE_DEPLOYMENT_ID="YourDeploymentId"

In addition, the chat completion part of the code needs to change:

      azure_api_key = os.getenv('AZURE_API_KEY')
      azure_endpoint = os.getenv('AZURE_EDNPOINT')
      azure_api_version = os.getenv('AZURE_API_VERSION')
      azure_deployment_id = os.getenv('AZURE_DEPLOYMENT_ID')
      
      llm = AzureChatOpenAI(
          api_key=azure_api_key,  
          api_version=azure_api_version,
          azure_endpoint=azure_endpoint,
          azure_deployment=azure_deployment_id,
      )       

Run the code with the following command to bring up the application:

streamlit run azure.py 


The code can be downloaded from GitHub: azure.py in the liu-xiao-guo/PDF-Summarizer-End-to-End-Project repository (main branch).

Using Azure OpenAI for both embeddings and chat completion

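We take the earlier example "Building a generative AI application with Elasticsearch and OpenAI". In this application, the query text must also be converted into a vector at search time. We modify the code of app.py as follows: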

.env

MODEL_NAME=text-embedding-ada-002
AZURE_EMBEDDING_ENDPOINT=https://embeddings-testing1.openai.azure.com/
AZURE_EMBEDDING_API_KEY="YourEmbeddingKey"
AZURE_EMBEDDING_API_VERSION=2023-05-15

AZURE_API_KEY="YourChatCompletionKey"
AZURE_EDNPOINT="YourEndPoint"
AZURE_API_VERSION="2023-03-15-preview"
AZURE_DEPLOYMENT_ID="YourDeploymentId"

app.py

import os
import streamlit as st
# import openai
from openai import AzureOpenAI
from elasticsearch import Elasticsearch
from dotenv import load_dotenv

# from openai import OpenAI

# openai = OpenAI()

load_dotenv()

azure_api_key = os.getenv('AZURE_API_KEY')
azure_endpoint = os.getenv('AZURE_EDNPOINT')
azure_api_version = os.getenv('AZURE_API_VERSION')
azure_deployment_id = os.getenv('AZURE_DEPLOYMENT_ID')

chat = AzureOpenAI(
  api_key = azure_api_key,  
  api_version = azure_api_version,
  azure_endpoint = azure_endpoint
)

model_name = os.getenv('MODEL_NAME')
azure_embedding_endpoint = os.getenv('AZURE_EMBEDDING_ENDPOINT')
azure_embedding_api_key = os.getenv('AZURE_EMBEDDING_API_KEY')
azure_embedding_api_version = os.getenv("AZURE_EMBEDDING_API_VERSION")

embeddings = AzureOpenAI(
        api_key=azure_embedding_api_key,
        api_version=azure_embedding_api_version,
        azure_endpoint=azure_embedding_endpoint,
    )

elastic_user=os.getenv('ES_USER')
elastic_password=os.getenv('ES_PASSWORD')
elastic_endpoint=os.getenv("ES_ENDPOINT")

# openai_api_key=os.getenv('OPENAI_API_KEY')

# openai.api_type = "azure"

url = f"https://{elastic_user}:{elastic_password}@{elastic_endpoint}:9200"
client = Elasticsearch(url, ca_certs = "./http_ca.crt", verify_certs = True)

# Define model
EMBEDDING_MODEL = "text-embedding-ada-002"

def openai_summarize(query, response):
    # Use the text of the top hit as the context for the answer
    context = response['hits']['hits'][0]['_source']['text']
    summary = chat.chat.completions.create(
        model=azure_deployment_id,
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Answer the following question: " + query + " by using the following text: " + context},
        ]
    )

    print(summary)
    return summary.choices[0].message.content


def search_es(query):
    # Create embedding
    question_embedding = embeddings.embeddings.create(input=query, model=EMBEDDING_MODEL)

    # Define Elasticsearch query
    response = client.search(
        index="wikipedia_vector_index",
        knn={
            "field": "content_vector",
            "query_vector": question_embedding.data[0].embedding,
            "k": 10,
            "num_candidates": 100
        }
    )
    return response


def main():
    st.title("Gen AI Application")

    # Input for user search query
    user_query = st.text_input("Enter your question:", "what is football?")

    if st.button("Search"):
        if user_query:

            st.write(f"Searching for: {user_query}")
            result = search_es(user_query)

            # print(result)
            openai_summary = openai_summarize(user_query, result)
            st.write(f"OpenAI Summary: {openai_summary}")

            # Display search results
            if result['hits']['total']['value'] > 0:
                st.write("Search Results:")
                for hit in result['hits']['hits']:
                    st.write(hit['_source']['title'])
                    st.write(hit['_source']['text'])
            else:
                st.write("No results found.")

if __name__ == "__main__":
    main()
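
One detail in the listing above: openai_summarize reads the first hit directly, and main calls it before checking whether the search returned anything, so an empty result would raise an IndexError. The following is a minimal sketch of a guard, assuming the response has the same shape as in the listing. The helper name top_hit_text is hypothetical and not part of the original code.

```python
def top_hit_text(response, field="text"):
    """Return `field` from the best hit, or None when the search matched nothing.

    Hypothetical helper: `response` is assumed to have the usual
    Elasticsearch search shape, response["hits"]["hits"][i]["_source"].
    """
    hits = response["hits"]["hits"]
    if not hits:
        return None
    return hits[0]["_source"][field]
```

With such a guard, main could skip the call to the chat model when top_hit_text returns None and go straight to the "No results found." branch.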

The final code can be downloaded from GitHub: openai_rag_streamlit_azure.ipynb in the liu-xiao-guo/semantic_search_es repository (main branch).

After running the code, the application page is displayed.


Configuring Azure OpenAI for the AI Assistant

In Kibana's configuration file kibana.yml, we add the following configuration:

xpack.actions.preconfigured:
  azure-open-ai:
    actionTypeId: .gen-ai
    name: Azure OpenAI GPT-4
    config:
      apiUrl: https://xxxx.openai.azure.com/openai/deployments/gpt-4-32k/chat/completions?api-version=2023-07-01-preview
      apiProvider: Azure OpenAI
    secrets:
      apiKey: YourOwnApiKey

Note that the configuration above is the Azure OpenAI chat completion configuration.
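
The apiUrl above combines three pieces: the Azure resource name, the deployment name, and the API version. The sketch below assembles a URL of the same shape from those parts. The function name azure_chat_completions_url is hypothetical, and the pattern is inferred only from the example URL in the configuration above.

```python
from urllib.parse import urlencode


def azure_chat_completions_url(resource, deployment, api_version):
    """Build a chat completions URL of the same shape as the apiUrl above.

    Hypothetical helper: the pattern is taken from the example configuration,
    with `resource` standing in for the xxxx placeholder.
    """
    base = (
        f"https://{resource}.openai.azure.com"
        f"/openai/deployments/{deployment}/chat/completions"
    )
    return f"{base}?{urlencode({'api-version': api_version})}"


print(azure_chat_completions_url("xxxx", "gpt-4-32k", "2023-07-01-preview"))
```

Keeping the three parts separate makes it easier to switch deployments or API versions without editing the URL by hand.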

For Kibana to work properly, we must also generate encryption keys with the following command:

$ ./bin/kibana-encryption-keys generate
Kibana is currently running with legacy OpenSSL providers enabled! For details and instructions on how to disable see https://www.elastic.co/guide/en/kibana/8.15/production.html#openssl-legacy-provider
## Kibana Encryption Key Generation Utility

The 'generate' command guides you through the process of setting encryption keys for:

xpack.encryptedSavedObjects.encryptionKey
    Used to encrypt stored objects such as dashboards and visualizations
    https://www.elastic.co/guide/en/kibana/current/xpack-security-secure-saved-objects.html#xpack-security-secure-saved-objects

xpack.reporting.encryptionKey
    Used to encrypt saved reports
    https://www.elastic.co/guide/en/kibana/current/reporting-settings-kb.html#general-reporting-settings

xpack.security.encryptionKey
    Used to encrypt session information
    https://www.elastic.co/guide/en/kibana/current/security-settings-kb.html#security-session-and-cookie-settings


Already defined settings are ignored and can be regenerated using the --force flag.  Check the documentation links for instructions on how to rotate encryption keys.
Definitions should be set in the kibana.yml used configure Kibana.

Settings:
xpack.encryptedSavedObjects.encryptionKey: f6d17019af61a5724b314b59bada9bbc
xpack.reporting.encryptionKey: d2f7ad59a387e2893f04e456c0752d7c
xpack.security.encryptionKey: 00c17aa28e4d37e49925ee4fdee589f3

Copy the keys generated above into kibana.yml, save the file, and restart Kibana.


We can see that the Azure OpenAI GPT-4 configuration is now in place.

For further reading, see "Getting started with the Elastic AI Assistant for Observability and Microsoft Azure OpenAI".

In Dev Tools, create the following log document:

POST /logs-elastic_agent-default/_doc
{
	"message": "Status(StatusCode=\"FailedPrecondition\", Detail=\"Can't access cart storage. \nSystem.ApplicationException: Wasn't able to connect to redis \n  at cartservice.cartstore.RedisCartStore.EnsureRedisConnected() in /usr/src/app/src/cartstore/RedisCartStore.cs:line 104 \n  at cartservice.cartstore.RedisCartStore.EmptyCartAsync(String userId) in /usr/src/app/src/cartstore/RedisCartStore.cs:line 168\").",
	"@timestamp": "2024-02-22T11:34:00.884Z",
	"log": {
    	"level": "error"
	},
	"service": {
    	"name": "cartService"
	},
	"host": {
    	"name": "appserver-1"
	}
}
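
The message field contains nested double quotes and line breaks, which must be escaped inside a JSON string. The sketch below builds a document of the same shape in Python and lets json.dumps handle the escaping. The function name build_log_doc is hypothetical, and the message used here is a shortened stand-in for the full stack trace above.

```python
import json


def build_log_doc(message, timestamp):
    """Assemble a log document with the same fields as the request above."""
    return {
        "message": message,
        "@timestamp": timestamp,
        "log": {"level": "error"},
        "service": {"name": "cartService"},
        "host": {"name": "appserver-1"},
    }


# Shortened stand-in for the full message; note the raw quotes and newline.
raw = (
    'Status(StatusCode="FailedPrecondition", '
    'Detail="Can\'t access cart storage.\n'
    'System.ApplicationException: Wasn\'t able to connect to redis")'
)

# json.dumps escapes the inner quotes and the newline in the output string.
body = json.dumps(build_log_doc(raw, "2024-02-22T11:34:00.884Z"), indent=2)
print(body)
```

The printed body can be pasted after the POST line in Dev Tools.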
