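Generative AI is not limited to text: models can also generate images from text descriptions. Images as a modality are useful in many fields, including MedTech, architecture, tourism, and game development. This chapter introduces two of the most popular image generation models: DALL-E and Midjourney.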


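Why build an image generation application?

Image generation applications are a good way to explore what generative AI can do. They can be used in scenarios such as:

Image editing and synthesis. You can generate images for a variety of needs, such as image editing and image synthesis.
Use across industries. They can also generate images for many industries, including MedTech, tourism, and game development.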

What are DALL-E and Midjourney?

DALL-E and Midjourney are two of the most popular image generation models. Both let you generate images from text prompts.

DALL-E

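Let's start with DALL-E, a generative AI model that generates images from text descriptions.

DALL-E combines two models, CLIP and diffused attention:

CLIP is a model that generates embeddings, which are numerical representations of data, from images and text.
Diffused attention is a model that generates images from embeddings.

DALL-E is trained on a dataset of images and text, and it can generate images from text descriptions. For example, DALL-E can generate an image of a cat in a hat or a dog with a mohawk.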
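
To make the idea of embeddings as numerical representations concrete, here is a minimal sketch that compares embeddings with cosine similarity. The three-dimensional vectors are made-up values chosen for illustration, not output from CLIP; real embeddings have far more dimensions and come from the trained model.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings (illustrative values only, not produced by CLIP)
text_cat_in_hat = [0.9, 0.1, 0.3]
image_cat_in_hat = [0.8, 0.2, 0.4]
image_dog_mohawk = [0.1, 0.9, 0.2]

matching = cosine_similarity(text_cat_in_hat, image_cat_in_hat)
mismatched = cosine_similarity(text_cat_in_hat, image_dog_mohawk)
print(f"text vs matching image:   {matching:.3f}")
print(f"text vs mismatched image: {mismatched:.3f}")
```

A model such as CLIP is trained so that a caption and the image it describes end up with similar embeddings, which is what allows text to steer image generation.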

Midjourney

Midjourney works much like DALL-E: it generates images from text prompts. It can likewise produce images from prompts such as "a cat in a hat" or "a dog with a mohawk".

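How do DALL-E and Midjourney work?

First, DALL-E. DALL-E is a generative AI model based on the transformer architecture, specifically an autoregressive transformer.

The "autoregressive transformer" defines how the model generates an image from a text description: it generates one pixel at a time and then uses the pixels generated so far to generate the next one, passing through multiple layers of a neural network until the image is complete.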
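
The loop below is a toy sketch of autoregressive generation, not DALL-E's actual model: a hypothetical `next_pixel` function stands in for the neural network, and each new value is computed from the values generated so far. The function names, the 0 to 255 value range, and the averaging rule are all assumptions made for illustration.

```python
import random


def next_pixel(previous_pixels, rng):
    """Toy stand-in for the model: pick the next value near the recent average."""
    if not previous_pixels:
        return rng.randint(0, 255)
    recent = previous_pixels[-3:]
    average = sum(recent) / len(recent)
    noisy = average + rng.randint(-20, 20)
    return max(0, min(255, round(noisy)))


def generate_pixels(count, seed=0):
    """Generate `count` values one at a time, each conditioned on the earlier ones."""
    rng = random.Random(seed)
    pixels = []
    for _ in range(count):
        pixels.append(next_pixel(pixels, rng))
    return pixels


row = generate_pixels(8)
print(row)
```

The point of the sketch is the loop structure: nothing is produced in one shot, and every step sees the output of all previous steps.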

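Through this process, DALL-E controls the attributes, objects, and characteristics of the image it generates. DALL-E 2 and DALL-E 3 offer even more control over the generated image.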

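Building your first image generation application

So what does it take to build an image generation application? You need the following libraries:

python-dotenv: recommended for keeping your secrets in a .env file, out of your code.
openai: the library you use to interact with the OpenAI API.
pillow: for working with images in Python.
requests: for making HTTP requests.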

Create a file named .env with the following content:

```text
AZURE_OPENAI_ENDPOINT=<your endpoint>
AZURE_OPENAI_KEY=<your key>
```
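
Because the application reads both settings from the environment, it can help to fail early with a clear message when one is missing. The helper below is a small sketch of that check; the name `missing_settings` is hypothetical, and it accepts any mapping so it can be tried without a real .env file.

```python
REQUIRED_SETTINGS = ("AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_KEY")


def missing_settings(env, required=REQUIRED_SETTINGS):
    """Return the required names that are absent or empty in `env`."""
    return [name for name in required if not env.get(name)]


# Demonstration with a plain dict standing in for the loaded environment
example_env = {"AZURE_OPENAI_ENDPOINT": "https://example.invalid/", "AZURE_OPENAI_KEY": ""}
print(missing_settings(example_env))
```

In the application, you could pass `os.environ` to the helper right after the .env file is loaded and stop if the returned list is not empty.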

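List the libraries above in a file named requirements.txt, as shown below. The code in this chapter uses the pre-1.0 interface of the openai library (openai.Image.create, openai.error), so the openai entry is pinned accordingly:

```text
python-dotenv
openai<1.0
pillow
requests
```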

Next, create a virtual environment and install the libraries:

```bash
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

On Windows, use the following commands to create and activate the virtual environment:

```bat
python3 -m venv venv
venv\Scripts\activate.bat
```

Add the following code to a file named app.py:

```python
import openai
import os
import requests
from PIL import Image
import dotenv

# Load environment variables from the .env file
dotenv.load_dotenv()

# Get endpoint and key from environment variables
openai.api_base = os.environ['AZURE_OPENAI_ENDPOINT']
openai.api_key = os.environ['AZURE_OPENAI_KEY']

# Assign the API version (DALL-E is currently supported for the 2023-06-01-preview API version only)
openai.api_version = '2023-06-01-preview'
openai.api_type = 'azure'


try:
    # Create an image by using the image generation API
    generation_response = openai.Image.create(
        prompt='Bunny on horse, holding a lollipop, on a foggy meadow where it grows daffodils',    # Enter your prompt text here
        size='1024x1024',
        n=2,
        temperature=0,
    )
    # Set the directory for the stored image
    image_dir = os.path.join(os.curdir, 'images')

    # Create the directory if it doesn't exist yet
    os.makedirs(image_dir, exist_ok=True)

    # Initialize the image path (note the filetype should be png)
    image_path = os.path.join(image_dir, 'generated-image.png')

    # Retrieve the first generated image
    image_url = generation_response["data"][0]["url"]  # extract image URL from response
    generated_image = requests.get(image_url).content  # download the image
    with open(image_path, "wb") as image_file:
        image_file.write(generated_image)

    # Display the image in the default image viewer
    image = Image.open(image_path)
    image.show()

# Report requests that the API rejects as invalid
except openai.error.InvalidRequestError as err:
    print(err)
```

Let's walk through this code.

First, we import the libraries we need: OpenAI, dotenv, requests, and Pillow.

```python
import openai
import os
import requests
from PIL import Image
import dotenv
```

Next, we load the environment variables from the .env file.

```python
# Load environment variables from the .env file
dotenv.load_dotenv()
```

Then, we set the endpoint, key, version, and type for the OpenAI API.

```python
# Get endpoint and key from environment variables
openai.api_base = os.environ['AZURE_OPENAI_ENDPOINT']
openai.api_key = os.environ['AZURE_OPENAI_KEY']

# Add version and type, Azure specific
openai.api_version = '2023-06-01-preview'
openai.api_type = 'azure'
```

Next, we generate the image:

```python
# Create an image by using the image generation API
generation_response = openai.Image.create(
    prompt='Bunny on horse, holding a lollipop, on a foggy meadow where it grows daffodils',    # Enter your prompt text here
    size='1024x1024',
    n=2,
    temperature=0,
)
```

The code above returns a JSON object containing the URLs of the generated images. We can use a URL to download an image and save it to a file.
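
Since the request asks for n=2 images, the response holds two URLs, while app.py saves only the first. The sketch below saves every image in the response. The helper names `download` and `save_images`, the file-name pattern, and the injectable `fetch` parameter are assumptions added for illustration; `fetch` defaults to a standard-library download, and passing a stub lets you try the function without network access.

```python
import os
from urllib.request import urlopen


def download(url):
    """Download the bytes at `url` using the standard library."""
    with urlopen(url) as response:
        return response.read()


def save_images(generation_response, image_dir, fetch=download):
    """Save every image listed in the response; return the saved file paths."""
    os.makedirs(image_dir, exist_ok=True)
    paths = []
    for index, item in enumerate(generation_response["data"]):
        path = os.path.join(image_dir, f"generated-image-{index}.png")
        with open(path, "wb") as image_file:
            image_file.write(fetch(item["url"]))
        paths.append(path)
    return paths
```

In app.py, a call such as `save_images(generation_response, image_dir)` could take the place of the single download.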

Finally, we open the image and display it in the default image viewer:

```python
image = Image.open(image_path)
image.show()
```
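
The code in this chapter uses the pre-1.0 interface of the openai package. With version 1.0 or later, image generation goes through a client object instead. The sketch below shows that style under stated assumptions: the helper names `make_azure_client` and `generate_image_url` are hypothetical, and the API version and deployment name you pass in depend on your own Azure resource.

```python
import os


def make_azure_client(api_version):
    """Build an Azure client from the same environment variables as app.py."""
    from openai import AzureOpenAI  # available in openai>=1.0

    return AzureOpenAI(
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_key=os.environ["AZURE_OPENAI_KEY"],
        api_version=api_version,
    )


def generate_image_url(client, prompt, model, size="1024x1024"):
    """Request one image through an openai>=1.0 style client and return its URL."""
    response = client.images.generate(model=model, prompt=prompt, size=size, n=1)
    return response.data[0].url
```

Usage would be `client = make_azure_client(api_version)` followed by `generate_image_url(client, prompt, model=deployment_name)`, where `api_version` and `deployment_name` are values you supply for your resource.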

More details on generating images

Let's look at the image generation code in more detail:

```python
generation_response = openai.Image.create(
    prompt='Bunny on horse, holding a lollipop, on a foggy meadow where it grows daffodils',    # Enter your prompt text here
    size='1024x1024',
    n=2,
    temperature=0,
)
```

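prompt is the text prompt used to generate the image. In this example, the prompt is "Bunny on horse, holding a lollipop, on a foggy meadow where it grows daffodils".
size is the size of the generated image. In this example, we generate a 1024x1024 pixel image.
n is the number of images to generate. In this example, we generate two images.
temperature is a parameter that controls the randomness of a generative AI model's output. It takes a value between 0 and 1, where 0 means the output is deterministic and 1 means the output is random. The default value is 0.7.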
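
A small helper can catch invalid parameter values before a request is sent. The sketch below checks prompt, size, and n. The allowed sizes and the upper bound for n are assumptions based on the limits documented for DALL-E 2 in the OpenAI API, so verify them for the model and API version you actually use. The names `ALLOWED_SIZES`, `MAX_IMAGES`, and `validate_image_request` are hypothetical.

```python
ALLOWED_SIZES = {"256x256", "512x512", "1024x1024"}  # assumed; check your model's documentation
MAX_IMAGES = 10  # assumed upper bound for n


def validate_image_request(prompt, size, n):
    """Return a list of problems found; an empty list means the values look valid."""
    problems = []
    if not prompt.strip():
        problems.append("prompt must not be empty")
    if size not in ALLOWED_SIZES:
        problems.append(f"size must be one of {sorted(ALLOWED_SIZES)}")
    if not 1 <= n <= MAX_IMAGES:
        problems.append(f"n must be between 1 and {MAX_IMAGES}")
    return problems


print(validate_image_request("Bunny on horse", "1024x1024", 2))
print(validate_image_request("", "800x600", 0))
```

Returning a list of problems rather than raising on the first one lets the application report everything that is wrong with a request at once.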

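There is more you can do with images, which we cover in the next section.

Additional capabilities of image generation

So far, you have seen how to generate an image with a few lines of Python. You can also do the following: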

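Edit images. By supplying an existing image, a mask, and a prompt, you can alter the image; for example, you can add something to one part of it. Take our bunny image: you could put a hat on the bunny. You do this by providing the image, a mask that marks the area to change, and a text prompt describing what to do.

```python
response = openai.Image.create_edit(
    image=open("base_image.png", "rb"),
    mask=open("mask.png", "rb"),
    prompt="An image of a rabbit with a hat on its head.",
    n=1,
    size="1024x1024"
)
image_url = response['data'][0]['url']
```

The base image contains only the rabbit, but the final image has a hat on the rabbit's head.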

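Create variations. The idea is to take an existing image and generate different versions of it. To create a variation, you provide an image (no text prompt is required) and code like the following:

```python
response = openai.Image.create_variation(
    image=open("bunny-lollipop.png", "rb"),
    n=1,
    size="1024x1024"
)
image_url = response['data'][0]['url']
```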

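Temperature

Temperature is a parameter that controls the randomness of a generative AI model's output. It takes a value between 0 and 1, where 0 means the output is deterministic and 1 means the output is random. The default value is 0.7.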

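How to define boundaries for your application with meta prompts

With our demo, we can already generate images for our clients. However, we still need to set some boundaries for the application.

For example, we don't want to generate images that are not safe for work or not appropriate for children.

We can do this with meta prompts. Meta prompts are text prompts used to control the output of a generative AI model. For example, we can use a meta prompt to make sure the generated images are safe for work or appropriate for children.

How does it work?

Meta prompts are embedded in the application and placed before the user's text prompt, so that they steer the model's output. The user's input and the meta prompt are combined into a single text prompt.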
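
The wrapping step can be sketched as a small function that places the meta prompt in front of the user's input. As an optional extra guard, the sketch also rejects input that contains a disallowed word before anything is sent to the model. The function name `build_scoped_prompt`, the shortened meta prompt, and the term list are hypothetical, and a simple word check like this is easy to get around, so treat it as a complement to the meta prompt rather than a complete filter.

```python
import re

META_PROMPT = (
    "You are an assistant designer that creates images for children.\n"
    "The image needs to be safe for work and appropriate for children.\n"
)
DISALLOWED_TERMS = {"swords", "violence", "blood", "gore", "nudity"}


def build_scoped_prompt(user_input, meta_prompt=META_PROMPT, disallowed=DISALLOWED_TERMS):
    """Prefix the user's input with the meta prompt, rejecting disallowed terms."""
    words = set(re.findall(r"[a-z]+", user_input.lower()))
    blocked = sorted(words & set(disallowed))
    if blocked:
        raise ValueError(f"input contains disallowed terms: {blocked}")
    return f"{meta_prompt}\n{user_input}"


print(build_scoped_prompt("Create an image of a bunny on a horse, holding a lollipop"))
```

The string returned by the function is what the application would pass as the prompt of the image generation request.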

Here is an example of a meta prompt:

```text
You are an assistant designer that creates images for children.

The image needs to be safe for work and appropriate for children.

The image needs to be in color.

The image needs to be in landscape orientation.

The image needs to be in a 16:9 aspect ratio.

Do not consider any input from the following that is not safe for work or appropriate for children.

(Input)
```

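Now, let's see how to use a meta prompt in our example.

```python
import openai

disallow_list = "swords, violence, blood, gore, nudity, sexual content, adult content, adult themes, adult language, adult humor, adult jokes, adult situations, adult"

meta_prompt = f"""You are an assistant designer that creates images for children.

The image needs to be safe for work and appropriate for children.

The image needs to be in color.

The image needs to be in landscape orientation.

The image needs to be in a 16:9 aspect ratio.

Do not consider any input from the following that is not safe for work or appropriate for children.
{disallow_list}
"""

# Combine the meta prompt and the user's request into a single prompt
prompt = f"{meta_prompt}\nCreate an image of a bunny on a horse, holding a lollipop"

# Send the combined prompt through the same image generation API as in app.py
# (assumes the openai endpoint, key, version, and type are already set as shown there)
generation_response = openai.Image.create(
    prompt=prompt,
    size='1024x1024',
    n=2,
)
```

With the prompt above, every generated image takes the meta prompt into account.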
