OpenAI upgrades its AI image generation tool with more realistic results and finer control over details

by arjin

26 March 2025 - 00:37

OpenAI has announced an upgraded, advanced image generation tool built on the GPT-4o model. The company says it not only produces better-looking images but also lets users specify details that match their intent more closely than before.

Because GPT-4o is a model that works through a request gradually, step by step, image generation on it allows finer control over details and part-by-part edits than DALL·E, the previous image generation tool. Examples presented by OpenAI include: specifying in detail the text that appears at each position in an image; specifying or editing images that contain both text and people; handling as many as 10-20 ordered details in a single prompt; learning from uploaded images; and drawing on its knowledge to pair text with visuals well enough to create infographics. (Notable examples appear at the end of this article.)

OpenAI says the training data for this image generation tool consists of publicly available data, along with data from partners such as Shutterstock.

The new image generation tool on GPT-4o begins rolling out today to Plus, Pro, Team, and free users through ChatGPT, with a daily limit for some plans, as with DALL·E. Enterprise and Edu will follow later. It can also be used through Sora. Those who still want the original DALL·E can access it through the custom DALL·E GPT. Developers will be able to use it through the API in the coming weeks.

Source: OpenAI

Example: an image containing text, with each word specified, and a person in the frame

magnetic poetry on a fridge in a mid century home:

Line 1: "A picture"
Line 2: "is worth"
Line 3: "a thousand words,"
Line 4: "but sometimes"
Large gap
Line 5: "in the right place"
Line 6: "can elevate"
Line 7: "its meaning."

The man is holding the words "a few" in his right hand and "words" in his left.

Example: an image with signs carrying a large amount of specified text

Create a photorealistic image of two witches in their 20s (one ash balayage, one with long wavy auburn hair) reading a street sign.

Context:
a city street in a random street in Williamsburg, NY with a pole covered entirely by numerous detailed street signs (e.g., street sweeping hours, parking permits required, vehicle classifications, towing rules), including few ridiculous signs at the middle: (paraphrase it to make these legitimate street signs)"Broom Parking for Witches Not Permitted in Zone C" and "Magic Carpet Loading and Unloading Only (15-Minute Limit)" and "Reindeer Parking by Permit Only (Dec 24–25)\n Violators will be placed on Naughty List." The signpost is on the right of a street. Do not repeat signs. Signs must be realistic.

Characters:
one witch is holding a broom and the other has a rolled-up magic carpet. They are in the foreground, back slightly turned towards the camera and head slightly tilted as they scrutinize the signs.

Composition from background to foreground:
streets + parked cars + buildings -> street sign -> witches. Characters must be closest to the camera taking the shot

In this example, the user first generates an image of a cat and adds various details, then has the same cat, with the previously specified details intact, appear in a video game scene.

Describing the scenario of an image step by step so the picture can be generated

We need evidence there is a currently present invisible elephant. Consider what an elephant is and does in the environment, then show us that, perhaps mid-process - but the elephant itself is not shown at all

Creating an infographic by specifying the desired details

Make me a professionally shot photorealistic diagram of the top selling cocktails in my bar with recipes labeled on each drink.

put the recipes on handwritten cards in front of each drink.

the cards are brown, and the text is black.

background is white

Title is "4 most popular cocktails"

Generating images that are more realistic than before

Generate a photorealistic image of farmer's market in toronto on a saturday in summer 2006, it's a beautiful late june day, people are shopping and eating sandwiches. in focus should be a young asian girl wearing denim overalls and sipping on a strawberry banana smoothie - rest can be blurred. the photo should be reminiscent of that a digital camera from 2006 would take, with a timestamp like a printed photo would have. aspect ratio should be 3:2
