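Background
This week I noticed two repositories that are trending on GitHub.
I read through the source code, ran both projects locally, and dug into how they work.
draw-a-ui official demo
screenshot-to-code official demo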
user-images.githubusercontent.com/23818/28300…
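Pretty mind-blowing, right? As a front-end developer, all I could say was "wow".
How it works
There are two core ideas.
The power of gpt-4-vision-preview
Both projects rely on the vision capability of GPT-4: the image is sent to the OpenAI GPT API for visual recognition, together with a specific prompt that makes GPT output a single HTML file styled with Tailwind CSS, so the result can be generated and previewed quickly.
The single-file nature of Tailwind CSS
This is the interesting part. Because Tailwind CSS keeps styling inside the markup, it really shines in the GPT era. Imagine the output needed an HTML file plus separate css/less/sass files working together: what GPT produces would almost certainly be much worse than a single Tailwind HTML file. On top of that, now that GPT's official knowledge has been updated to April 2023, its understanding of Tailwind CSS is more complete.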
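As a rough illustration of the first point, the request sent to the vision model can be sketched as follows. The message shape (a `content` array mixing `text` and `image_url` parts) follows the OpenAI Chat Completions format. The function name `buildVisionRequest`, the `max_tokens` default, and the user message text are illustrative assumptions, not code taken from either repository.

```javascript
// Minimal sketch: build a Chat Completions request body that sends one image
// plus a system prompt to gpt-4-vision-preview.
// buildVisionRequest and its defaults are hypothetical, for illustration only.
function buildVisionRequest(systemPrompt, imageDataUrl, maxTokens = 4096) {
  return {
    model: "gpt-4-vision-preview",
    max_tokens: maxTokens, // assumed value
    messages: [
      { role: "system", content: systemPrompt },
      {
        role: "user",
        content: [
          { type: "text", text: "Turn this into a single html file using tailwind." },
          // The image travels inline as a base64 data URL.
          { type: "image_url", image_url: { url: imageDataUrl } },
        ],
      },
    ],
  };
}

const body = buildVisionRequest(
  "You are an expert tailwind developer.",
  "data:image/png;base64,iVBORw0KGgo=" // placeholder, not a real image
);
console.log(JSON.stringify(body, null, 2));
```

Everything the model needs (instructions and image) fits in one request, which is part of why a single-file output format is such a convenient target.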
Prompts
The draw-a-ui prompt
const systemPrompt = `You are an expert tailwind developer. A user will provide you with a
low-fidelity wireframe of an application and you will return
a single html file that uses tailwind to create the website. Use creative license to make the application more fleshed out.
if you need to insert an image, use placehold.co to create a placeholder image. Respond only with the html file.`;
The screenshot-to-code prompt
SYSTEM_PROMPT = """
You are an expert Tailwind developer
You take screenshots of a reference web page from the user, and then build single page apps
using Tailwind, HTML and JS.
You might also be given a screenshot of a web page that you have already built, and asked to
update it to look more like the reference image.
- Make sure the app looks exactly like the screenshot.
- Pay close attention to background color, text color, font size, font family,
padding, margin, border, etc. Match the colors and sizes exactly.
- Use the exact text from the screenshot.
- Do not add comments in the code such as "<!-- Add other navigation links as needed -->" and "<!-- ... other news items ... -->" in place of writing the full code. WRITE THE FULL CODE.
- Repeat elements as needed to match the screenshot. For example, if there are 15 items, the code should have 15 items. DO NOT LEAVE comments like "<!-- Repeat for each news item -->" or bad things will happen.
- For images, use placeholder images from https://placehold.co and include a detailed description of the image in the alt text so that an image generation AI can generate the image later.
In terms of libraries,
- Use this script to include Tailwind: <script src="https://cdn.tailwindcss.com"></script>
- You can use Google Fonts
- Font Awesome for icons: <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/5.15.3/css/all.min.css"></link>
Return only the full code in <html></html> tags.
Do not include markdown "```" or "```html" at the start or end.
"""
USER_PROMPT = """
Generate code for a web page that looks exactly like this.
"""
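As the prompts show, draw-a-ui is tuned for turning rough sketches into pages, while screenshot-to-code is tuned for reproducing existing UI screenshots. In practice draw-a-ui covers more scenarios, because it is built on **tldraw**, an open-source drawing editor: users can draw sketches and paste images on the canvas. To some extent, whatever screenshot-to-code can do, draw-a-ui can do too, but not the other way around.
Standing on the shoulders of giants
Building on draw-a-ui and borrowing from screenshot-to-code, I made some improvements and deployed the result online with Vercel. Feel free to try it. You do need to bring your own gpt-4-vision-preview key, though, and each call costs roughly $0.1-0.2 of quota.
Try it here: ai-code.akong.fun/
Here is what it looks like.
The specific improvements are as follows.
Prompt optimization
I tried draw-a-ui and liked it a lot, but as mentioned above, its prompt is geared toward hand-drawn sketches only. So I merged it with the screenshot-to-code prompt. The combined prompt can reproduce both sketches and real UI screenshots. The optimized prompt: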
const systemPrompt =
`
# Role: Tailwind CSS Developer
## Task
- Input: Screenshot(s) of a reference web page or Low-fidelity
- Output: Single HTML page using Tailwind CSS, HTML
## Guidelines
- Utilize Tailwind CSS to develop the website based on the provided screenshot or Low-fidelity
- Achieve an exact visual match to the provided screenshot or Low-fidelity
- Pay close attention to:
  - Background color
  - Text color
  - Font size
  - Font family
  - Padding
  - Margin
  - Border
- Use the precise text from the screenshot
- Avoid placeholder comments; write the full code
- Repeat elements as shown in the screenshot (e.g., if there are 15 items, include 15 items in the code)
- Use placeholder images from \`https://placehold.co\` with descriptive \`alt\` text for future image generation
## Libraries
- Include Tailwind CSS via: \`<script src="https://cdn.tailwindcss.com"></script>\`
## Deliverable
- Respond with the complete HTML code within \`<html>\` tags
- Respond with the HTML file content only
`
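Even with "Respond with the HTML file content only" in the prompt, a reply may still arrive wrapped in extra text or markdown fences. A defensive extraction step can be sketched like this. `extractHtml` is a hypothetical helper written for illustration, not a function from draw-a-ui or screenshot-to-code.

```javascript
// Minimal sketch: pull the HTML document out of a model reply.
// extractHtml is a hypothetical helper, for illustration only.
function extractHtml(reply) {
  // Optional doctype, then everything from the first <html to the last </html>.
  const match = reply.match(/(?:<!doctype html[^>]*>\s*)?<html[\s\S]*<\/html>/i);
  return match ? match[0] : null; // null means no complete document was found
}

const reply = "Here is the page:\n<!DOCTYPE html>\n<html><body>hi</body></html>\nHope this helps!";
console.log(extractHtml(reply));
```

Returning `null` instead of the raw reply lets the caller decide whether to retry or show an error rather than render stray text.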
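Calling the API from the front end
The original draw-a-ui repository is built with Next.js and makes the GPT call on the server side. After deploying to Vercel, however, the API of a regular user's deployment times out after at most 10 seconds, so calls can fail with a timeout. To get around this, I moved the GPT call into the front end.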
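A browser-side call along these lines can be sketched as follows. It assumes an OpenAI-compatible `/chat/completions` endpoint and a user-supplied key and base URL. `buildFetchArgs` and `requestCompletion` are hypothetical names, and the real project may structure this differently.

```javascript
// Minimal sketch: call an OpenAI-compatible chat endpoint straight from the browser.
// buildFetchArgs and requestCompletion are hypothetical helper names.
function buildFetchArgs(apiKey, baseUrl, body) {
  // Tolerate a trailing slash on the configured base URL.
  const url = baseUrl.replace(/\/+$/, "") + "/chat/completions";
  const init = {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: "Bearer " + apiKey,
    },
    body: JSON.stringify(body),
  };
  return [url, init];
}

// fetchImpl is injectable so the function can be exercised without a network.
async function requestCompletion(apiKey, baseUrl, body, fetchImpl = globalThis.fetch) {
  const [url, init] = buildFetchArgs(apiKey, baseUrl, body);
  const response = await fetchImpl(url, init);
  if (!response.ok) {
    throw new Error("Request failed with status " + response.status);
  }
  const data = await response.json();
  // Chat Completions responses carry the text in choices[0].message.content.
  return data.choices[0].message.content;
}
```

The trade-off is that the key lives in the browser, which is only reasonable because each user supplies their own.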
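Dynamic GPT key configuration
I added a settings panel in the lower-right corner of the page, where users can configure their own gpt-4-vision-preview key and api_base_url.
If you are using an official OpenAI key, just fill in the official API address as the base URL. There is no default value for it.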
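One way to persist such settings is sketched below. It assumes a `localStorage`-style store; where the real app keeps the key is not stated here. The storage key `ai-code-settings` and both function names are hypothetical.

```javascript
// Minimal sketch: persist the user's key and base URL in a Storage-like object.
// The storage key and both function names are hypothetical.
const SETTINGS_KEY = "ai-code-settings";

function saveSettings(storage, settings) {
  const apiKey = settings.apiKey.trim();
  const baseUrl = settings.baseUrl.trim().replace(/\/+$/, "");
  // No default base URL: both fields must be filled in explicitly.
  if (!apiKey || !baseUrl) {
    throw new Error("Both the API key and the base URL are required");
  }
  storage.setItem(SETTINGS_KEY, JSON.stringify({ apiKey, baseUrl }));
}

function loadSettings(storage) {
  const raw = storage.getItem(SETTINGS_KEY);
  return raw ? JSON.parse(raw) : null; // null means "not configured yet"
}

// In the browser, for an official OpenAI key:
// saveSettings(window.localStorage, { apiKey: "sk-...", baseUrl: "https://api.openai.com/v1" });
```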
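History
The original draw-a-ui repository has no history feature.
Given how expensive each call is, history simply had to be added, so that each $0.2 call isn't wasted.
I kept it simple and used IndexedDB for storage.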
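A history store on top of IndexedDB can be sketched as below. The database name, store name, and record shape are assumptions made for illustration. Only standard IndexedDB calls are used, and the storage functions run in a browser only.

```javascript
// Minimal sketch: keep generated pages in IndexedDB (browser only).
// Database name, store name, and record shape are hypothetical.
const DB_NAME = "ai-code-history";
const STORE = "generations";

function makeRecord(html, imageDataUrl, createdAt = Date.now()) {
  return { html, imageDataUrl, createdAt };
}

function openDb() {
  return new Promise((resolve, reject) => {
    const request = indexedDB.open(DB_NAME, 1);
    // Runs on first open (or a version bump): create the object store.
    request.onupgradeneeded = () => {
      request.result.createObjectStore(STORE, { keyPath: "id", autoIncrement: true });
    };
    request.onsuccess = () => resolve(request.result);
    request.onerror = () => reject(request.error);
  });
}

async function addRecord(record) {
  const db = await openDb();
  return new Promise((resolve, reject) => {
    const tx = db.transaction(STORE, "readwrite");
    tx.objectStore(STORE).add(record);
    tx.oncomplete = () => resolve();
    tx.onerror = () => reject(tx.error);
  });
}

async function listRecords() {
  const db = await openDb();
  return new Promise((resolve, reject) => {
    const request = db.transaction(STORE, "readonly").objectStore(STORE).getAll();
    request.onsuccess = () => resolve(request.result);
    request.onerror = () => reject(request.error);
  });
}
```

IndexedDB suits this better than `localStorage` because generated HTML plus a base64 image can quickly exceed the few megabytes `localStorage` typically allows.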
Improved PC and mobile preview
In the code preview view I added a mobile-size preview, which makes it easier to check the result.
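Switching between the two preview sizes can be sketched as follows, assuming the generated page is rendered in an iframe. The 375x667 mobile viewport and the function names are assumptions, not values taken from the project.

```javascript
// Minimal sketch: choose iframe dimensions for the preview mode.
// The 375x667 mobile viewport is an assumed value.
function previewStyle(mode) {
  return mode === "mobile"
    ? { width: "375px", height: "667px" }
    : { width: "100%", height: "100%" };
}

// Browser-only: render generated HTML in an iframe sized for the chosen mode.
function renderPreview(iframe, html, mode) {
  const { width, height } = previewStyle(mode);
  iframe.style.width = width;
  iframe.style.height = height;
  iframe.srcdoc = html; // srcdoc renders the single HTML file directly
}
```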
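Summary
I made a few changes on top of draw-a-ui so that it can be used online right away.
In the AIGC era, a single prompt can translate into real productivity, which is both exciting and anxiety-inducing.
In an age when new technologies and capabilities keep bursting onto the scene, what changes and what stays the same? How do we distill the unchanging meta-knowledge and core frameworks, and stay competitive? It is a question worth thinking hard about, and I have not found the answer yet.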