Can GenAI Write Better Outcomes Than You or I? I Put It to the Test

PHIL JACKLIN
Dec 6, 2024
4 min read

Updated: Mar 10

Many projects are not particularly outcome focused and they struggle with it. I wondered if the new saviour of the world - GenAI - could help. So here’s what I did.

I ran this experiment twice, once on ChatGPT and once on Gemini. It would be interesting to see if there were differences.

I wrote a VERY detailed prompt for both AIs, defining what an outcome was and giving examples of good outcomes and bad outcomes. I also defined benefits and outputs and suggested the AI stays away from those. I used standard definitions for benefits, outcomes and outputs as defined by the Association for Project Management. I set up the prompts so that the AI would give recommendations and ask questions, encouraging me to write better outcomes. Each time, the GenAI system would score the outcome out of 100%.

All I had to do, was write a proposed outcome and the GenAI would recommend and ask questions to get me to write the outcome better.

I tested this on two fictitious outcomes - 1 which was actually a benefit and 1 which was actually an output. When the score reached 85% or over, I asked the AI systems to accept the outcome and stop offering advice.

Here’s how they did.

The benefit

I started with ChatGPT and proposed my outcome to be “Increase market share of the home loan market by 10% in the next 12 months”. This is a benefit, not an outcome.

ChatGPT gave me a score of 80% straight off the starting blocks and didn’t pick up this was a benefit. It did suggest that it could be more customer-focused and should consider adding customer pain points that were being solved into the definition.

My next iteration developed to “Give 10% more customers access to the best home loan rates on the market over the next 12 months.” ChatGPT said this was excellent at a score of 92%.

I think it’s pretty average.

Let’s see if Gemini was any better.

We started at the same place - “Increase market share of the home loan market by 10% in the next 12 months”.

Gemini also gave me a high score of 75%, but its recommendations and questions were not as strong. It recommended I segment the outcome to a particular segment of the home loan market. It also missed the fact that this is a benefit.

“Increase market share of the first time buyer home loan market by 10% in the next 12 months”. I now scored 85%. Gemini did ask me how I was going to measure the baseline so that I could be confident when I had achieved the outcome.

Overall, neither system performed well here. Both missed that this was a benefit, not an outcome. The advice both gave was interesting, but in neither case helped me to craft a better outcome.

Gemini actually moved me away from what I wanted to achieve - I didn’t want to achieve a 10% improvement in a segment. Whilst ChatGPT wasn’t good, Gemini made my outcome worse than where I started.

I went a step further and asked Gemini if it thought the proposed outcome was an outcome or a benefit. It was pretty clear that we had an outcome.

Let’s see if they did any better on starting with an output.

The output

We’re back in ChatGPT and the prompt I gave it was “Upgrade my knowledge management system to the latest version”.

I got a score of 60% and advice that “this is more of an output than an outcome”. I was encouraged to think about what efficiency improvements I expected as a result of upgrading.

“Upgrade my knowledge management system to increase the number of people who use it”. Score of 75%. Advice to think about what specific problem or opportunity I was addressing.

“Improve productivity by increasing the number of questions resolved through the knowledge management system” scored 85%. ChatGPT did recommend I start thinking about what improvements can encourage this outcome.

Overall, I don’t think the outcome is particularly good. There’s no date set, how productivity will be measured is not defined, it’s not particularly memorable. I liked that I was encouraged to think in more output-centric ways, but the end result, whilst an improvement on where we started, is still not great.

Gemini, the title is yours for the taking - how did you do?

The same prompt of “Upgrade my knowledge management system to the latest version” achieved the same score of 60%. The advice was a little strange and recommended that I think about what benefits I can expect to achieve by upgrading the system.

“Achieve a 1,000,000,000% improvement in organisational knowledge retention by upgrading the knowledge management system.” - 75%. No mention of the fact that this is probably not achievable or realistic. Gemini did ask, again, how I will measure the baseline.

“Achieve 1,000,000,000,000 more questions answered first time from upgrading the knowledge management system” - score 85%.

Hmmmm.

A sad summary

Neither system did a great job.

Both systems were able to prompt me to think about outcomes in a way that was positive.

But neither system was able to pick up the nuances of the proposed outcomes and make strong recommendations on how to improve them.

Both did better starting with outputs than starting with benefits.

This stuff is hard and GenAI isn’t at the level where it contributes well - yet. There’s a lot of subjectivity in crafting good outcomes and it feels like a human touch is still going to get us the best result.

As an aside, I gave both my personal favourite outcome at the moment - “Put People on Mars by 2030”. ChatGPT gave this 40% and Gemini 65%. Gemini suggested that Achievability was the key component that needed work on to make the outcome better.

If you want help getting more outcomes into your projects, register for the outcome webinar.

Can GenAI Write Better Outcomes Than You or I? I Put It to the Test

The benefit

The output

A sad summary

Recent Posts

Comments