OpenAI Releases GPT-5 and it is State of the Art (SOTA) across Key Coding Benchmarks

OpenAI released GPT‑5 today and it is now state-of-the-art (SOTA) across key coding benchmarks, scoring 74.9% on SWE-bench Verified and 88% on Aider polyglot.

SWE-bench Verified (tests AI models on real-world GitHub issues, evaluating their ability to generate accurate code patches):

GPT-5 with Thinking (High) scored highest at 74.9%, followed closely by Claude Opus 4.1 at 74.5%. For comparison, OpenAI o3 High scored 69.1%.

Aider Polyglot (evaluates code editing across multiple programming languages, e.g., Java, Rust, and Python):

GPT-5 dominates with 88%, a substantial lead over the competition. For comparison, OpenAI o3 came in at 81% and Grok 4 at 79.6%.

This new era of AI-assisted coding LLMs just keeps getting better every day!

Using OpenAI GPT-OSS Open Weight Local LLM Model to Develop a Moon Landing Simulation Using C# on my Alienware Aurora R11 RTX-3080 10GB Video Card

OpenAI released their new open-weights local LLM models today under the Apache License, so you can download and run them locally on your own hardware without a cloud subscription. The larger 120-billion-parameter model requires a system with an 80GB GPU (who has that!?) or a Mac M3/M4 system with at least 128GB of integrated/shared memory. The new AMD Ryzen AI Max+ 395 with 128GB can also run the 120B-parameter model locally. You can definitely run the smaller 4-bit-quantized 20-billion-parameter model locally using an Nvidia GeForce 3090, 40-series, or 50-series video card with at least 16GB of VRAM. I personally downloaded the smaller 20B-parameter model and got around 11 tokens per second on my RTX 3080 10GB using LM Studio. Part of the GPT-OSS-20B model had to be loaded into system RAM on my Alienware Aurora R11, which made it run much slower.
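To see why the 20B model spills past a 10GB card, here is a back-of-envelope VRAM estimate. This is a rough sketch, not OpenAI's published numbers: it assumes weights dominate memory use and adds my own guess of ~20% overhead for KV cache and activations.

```python
# Back-of-envelope VRAM estimate for a quantized LLM.
# Assumption (mine): weights dominate, plus ~20% overhead for
# KV cache and activations.

def vram_gb(params_billion: float, bits_per_weight: float, overhead: float = 0.20) -> float:
    """Approximate GB needed to hold the model weights plus overhead."""
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * (1 + overhead) / 1e9

# GPT-OSS-20B at 4-bit quantization:
print(f"20B @ 4-bit: ~{vram_gb(20, 4):.1f} GB")    # ~12 GB -> spills past a 10GB RTX 3080
# GPT-OSS-120B at 4-bit:
print(f"120B @ 4-bit: ~{vram_gb(120, 4):.1f} GB")  # ~72 GB -> needs an 80GB GPU or big unified memory
```

The arithmetic matches what I saw in practice: the 20B model does not quite fit in 10GB of VRAM, which is exactly why part of it ended up in my slower system RAM.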

On a system with 16GB of VRAM or 128GB of integrated memory, the free GPT-OSS LLM performs quite well on the Codeforces Competition code benchmark against OpenAI’s other subscription-based cloud LLMs:

Source: Codeforces Benchmark from OpenAI

Since I only have 10GB of VRAM on my RTX-3080, I tested GPT-OSS-20B with LM Studio and offloaded part of the model into my 64GB of system memory. I first tested it with a simple Hello World C++ application and averaged 11 tokens per second with the model split between high-speed GeForce VRAM and slower system RAM.


I then tested it by building a much larger C# Moon Landing simulation, but it ran out of context memory and did not complete the coding task.
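A quick way to predict this kind of failure is to budget tokens before starting the build. The ~4 characters-per-token ratio below is a common rule of thumb for English text and code, not an exact tokenizer count, and the specific token sizes are illustrative assumptions:

```python
# Rough check of whether a coding task fits a local model's context window.
# The chars/4 heuristic is an approximation, not a real tokenizer.

def rough_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def fits_context(prompt: str, expected_output_tokens: int, context_window: int) -> bool:
    """True if prompt plus expected generated output fits the window."""
    return rough_tokens(prompt) + expected_output_tokens <= context_window

# A Hello World request easily fits a 4,096-token window:
print(fits_context("Write a C++ Hello World program.", 200, 4096))  # True

# A full Moon Landing sim spec plus ~6,000 tokens of generated C# does not:
print(fits_context("x" * 20000, 6000, 4096))                        # False
```

Running a check like this first would have told me the Moon Landing build was going to blow past the context window before I burned the GPU time.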


I definitely need at least 16GB of VRAM, or better yet, 128GB of shared integrated memory, in my development system to get faster local GPT-OSS LLM performance along with a bigger context window to complete this AI-assisted Moon Landing C# coding task.

Microsoft also announced today that you can try the GPT-OSS local LLM models in Foundry Local or the AI Toolkit for VS Code (AITK) and start using them to build your Microsoft applications today. Let the free, local, open-weights GPT-OSS AI-assisted app and game development begin!

My next blog post will go over LLM testing on my NVIDIA Jetson Orin Nano Super 8GB Developer Kit AI edge device, using smaller open-source LLM models that fit into an 8GB edge device with a much lower power draw than my RTX-3080.


Ludus AI Agent Assisted Game Development Tools for Unreal Game Engine

In preparation for my July 2025 CGDC presentation on AI-assisted game development tools, I spent a lot of time testing the latest beta version of the Ludus AI Blueprints Toolkit for the Unreal Game Engine using Unreal 5.6.

I did my Blueprints testing using their 14-day free Pro subscription trial with 20,000 credits. They also provided additional testing credits to help with the closed beta period, which I really appreciated!

I was particularly focused on their new AI agent Blueprint generation features. They currently support Actor, Pawn, Game Mode, Level, Editor Utility, and Function Library Blueprints, with partial support for Widget (UMG) Blueprints. They are planning to add support for Material, Niagara, Behaviour Tree, Control Rig, and MetaSound in the near future. Their goal is to fully integrate Blueprint generation into the main plugin interface; once the Open Beta phase concludes, activation via a special beta code will no longer be necessary.

For my beta testing, I asked the new Ludus AI Blueprints agent to identify and fix all the load issues I was getting after upgrading an old Kidware Software Unreal Engine 4.26 RPG academic game project to the latest version of Unreal 5.6.

The Ludus AI agent examined my upgraded Unreal Engine 5.6 project, determined which Blueprints were broken, and gave me a systematic plan to fix them. I then asked Ludus AI (in Agent mode) to apply all the recommended fixes and test them for me. You can watch the 5-minute video below showing the Ludus AI agent in action:

In Agent mode, Ludus AI made all the fixes (without my help), and my RPG game project worked flawlessly in Unreal Engine 5.6. I was genuinely impressed! Below is the video I took of the fixed gameplay after Ludus AI did its Unreal Engine 5.6 Blueprint repair magic:

Back in 2022, it took me many hours to identify and fix all those Unreal 5 upgrade errors manually. The Ludus AI Blueprints beta agent did all the repair work for me in approximately 5 minutes.

I was so impressed by the beta Ludus AI agent Blueprints results that I purchased a subscription to the Ludus AI plugin myself. They offer several pricing tiers for Indie, Pro, and Enterprise customers:

They also offer a free tier with 400 credits per month so you can take a look at their plugin.

They also sell additional credit packages just in case you burn through your credit allotment during a given month.

The Ludus AI Blueprints 0.6.0 agent support moved into Open Beta on July 29, 2025, so you can now try out their new AI Blueprint beta features yourself at https://ludusengine.com/ using a Ludus AI Pro 14-day free evaluation subscription.

I plan to give you another update on my Ludus AI Assisted Unreal Game Engine Blueprint testing in September.

Local LLM Moon Landing Simulation C# Co-Development Performance Results using my Alienware Aurora R11 RTX-3080 10GB using LM Studio

For the past several months, I have been co-developing a Moon Landing simulation game using several different open-source LLM models running locally on my Alienware Aurora R11 with its GeForce RTX-3080 10GB VRAM video card. Running LLMs locally on my RTX 3080 is completely free compared to using cloud-based LLMs.

In my last blog post I shared the performance results using the open-source Ollama local LLM management system, which runs in a text-based command window. In this blog post, I will share the C# Moon Landing simulation co-development performance results using the GUI-based LM Studio application. LM Studio is not open source, but it is free for individuals to use and can be licensed for companies or schools.
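Besides the GUI chat, LM Studio can serve loaded models through an OpenAI-compatible local API (by default at http://localhost:1234/v1), which is handy for scripting these coding tests. The sketch below just builds the chat-completions request body; the model name and prompt are examples from my own testing, and you would POST the JSON with urllib or requests once the server is running:

```python
# Build an OpenAI-compatible chat-completions request for LM Studio's
# local server (default endpoint: http://localhost:1234/v1/chat/completions).
# This constructs the JSON body only; no network call is made here.
import json

def build_chat_request(model: str, prompt: str, max_tokens: int = 2048) -> str:
    payload = {
        "model": model,                                   # model loaded in LM Studio
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "stream": False,
    }
    return json.dumps(payload)

body = build_chat_request("google/gemma-3-12b", "Write a C# Lunar Lander game.")
print(body)
```

The same request shape works for every model in these tests; only the `model` string changes.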


In my personal coding tests, I loaded the google/gemma-3-12b model locally into the 10GB of VRAM on the GeForce RTX-3080 in my Alienware Aurora R11 PC, which also has 64GB of system RAM.

I tested building a C# Lunar Lander simulation game using the open-source google/gemma-3-12b model to see how the performance results compared to the Lunar Lander game I previously built using the text-based Ollama software.
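For context, the heart of a Lunar Lander sim is a tiny physics update loop. Here is the core step sketched in Python for brevity (my actual project is C#, but the math is identical): simple Euler integration with the Moon's ~1.62 m/s² surface gravity.

```python
# One physics tick of a simple lunar lander (Euler integration).
# Velocity is positive upward; thrust_accel is the engine's upward acceleration.
MOON_GRAVITY = 1.62  # m/s^2, approximate lunar surface gravity

def step(altitude: float, velocity: float, thrust_accel: float, dt: float = 0.1):
    """Advance the lander one tick of dt seconds."""
    accel = thrust_accel - MOON_GRAVITY
    velocity += accel * dt
    altitude = max(0.0, altitude + velocity * dt)  # clamp at the surface
    return altitude, velocity

# Free fall from 100 m for one second (10 ticks, engine off):
alt, vel = 100.0, 0.0
for _ in range(10):
    alt, vel = step(alt, vel, thrust_accel=0.0)
print(round(vel, 2))  # -1.62 (m/s downward after 1 s of lunar free fall)
```

A thrust acceleration of exactly 1.62 m/s² hovers the lander; anything less and it keeps falling, which is the whole game.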

Building the Lunar Lander simulation game with LM Studio using google/gemma-3-12b came in at 8.45 tokens per second:

Building the Lunar Lander simulation game using deepseek-r1-0528-qwen3-8b with LM Studio came in at 74.94 tokens per second:

As you can see below, my GeForce RTX-3080 got pretty hot running these tests. Fortunately, Dell designed the Alienware Aurora R11 to handle this kind of GPU computing heat!

I also tested the coding-specific qwen2.5-coder-14b local model using LM Studio and got 5.77 tokens per second.

I also tried loading a larger LLM, Meta’s llama-3.3-70b (70 billion parameters), in LM Studio by running part of the model on the RTX-3080’s 10GB of VRAM while offloading the rest into the 64GB of slower system RAM on my Alienware Aurora R11.

LM Studio did load the model, but I only got 0.67 tokens per second, which was totally useless. I canceled the Moon Landing C# build since it was pegging all my resources and taking far too long to actually generate the Lunar Lander C# program.
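To put these throughput numbers in perspective, here is what they mean in wall-clock time for a single generated program. The tokens-per-second figures are my measured results from above; the ~2,000-token program size is an illustrative assumption on my part:

```python
# Minutes to generate a ~2,000-token C# program at each measured throughput.
measured_tok_per_sec = {
    "gemma-3-12b": 8.45,
    "deepseek-r1-0528-qwen3-8b": 74.94,
    "qwen2.5-coder-14b": 5.77,
    "llama-3.3-70b (offloaded)": 0.67,
}

def minutes_for(tokens: int, tok_per_sec: float) -> float:
    return tokens / tok_per_sec / 60

for model, tps in measured_tok_per_sec.items():
    print(f"{model}: {minutes_for(2000, tps):.1f} min")
# llama-3.3-70b at 0.67 tok/s works out to ~49.8 minutes for one program,
# which is why I canceled the build.
```

At 74.94 tokens per second the same program takes well under a minute, which is the difference between a usable co-developer and a space heater.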


I definitely need an Nvidia GeForce GPU with a lot more memory (like the RTX-5090) or an AI-specific CPU/APU/NPU with at least 96GB of integrated RAM to use the larger llama-3.3-70b 70-billion-parameter model on a consumer development PC or laptop!

I’ll either need to upgrade to an Apple Mac M4 with 96GB of shared memory or an AMD Ryzen AI Max+ 395 laptop like the new Asus Flow Z13 with 128GB of integrated memory to run larger models like this. On the other hand, it may be best to wait until the rumored ARM-based MediaTek–Nvidia N1X SOC 128GB laptops hit the market in 2026.

The new Nvidia DGX Spark workstation, based on the Nvidia GB10 Superchip, can definitely handle these large models but it is a little too expensive for me right now and I want to be able to still play games on my AI development system.

Until then, I will continue my AI Co-Development testing throughout the summer and let you know what my conclusions are by the time I start teaching my next Unreal Engine course in-person at Northwest University this Fall.

By the way, here is the list of DeepSeek, Google, Meta, and Qwen LLMs that I tested locally using LM Studio (sorted by size):

My next set of blog posts will be dedicated to using various AI tools and LLMs within popular Game Engines like Unity, Unreal Engine and GoDot.

OpenAI o3-mini Reasoning Model Released for AI Assisted Coding

OpenAI has released o3-mini (“o” for “omni”, not zero), which is a step up in performance from o1-mini. I have been reviewing OpenAI’s performance charts for their new o3 reasoning model, and it is scoring quite well on Codeforces, software engineering (SWE-bench Verified), and LiveBench coding. We will continue testing the o1 and o3 reasoning models against our code base here at Kidware Software. The performance charts below are from OpenAI’s o3 announcement blog.


All images and notes above are from OpenAI’s January 1, 2025 o3 announcement on their blog here.