
Structured Summary: Future of Work with AI Agents

I used version 2 of my custom GPT/Gemini Gem ("paper analyst 2") to analyze the paper https://arxiv.org/pdf/2506.06576. This version identifies key questions from the paper around specific themes, rather than starting with the questions and looking for the answers within the paper.

ChatGPT provided more concise answers, but I liked the Gem version better. I also liked the Gem's interface, which let you click on a source link that opened the specific section of the PDF from which it pulled the answer.

Both of them did a decent job of explaining Figure 5. So up to this point, it was advantage Gem.

Then I asked both to explain Table 1 in the appendix (regression coefficients from the model predicting automation desire ratings). The Gem's output was way off target: for example, it gave the value of the intercept as 0.081 (instead of 2.736, as in the paper). When I asked the Gem to recheck, it claimed to have detected a mistake but then reproduced the same wrong value.

On the other hand, I could not find any glaring mistake in ChatGPT's explanation of the table. So I suppose ChatGPT (GPT-4o) won this round overall against Gemini 2.5 Flash.

Gemini - AI Agent Paper Analysis Summary
ChatGPT - AI Agents Workforce Analysis

Instead of giving up on Gemini at this point, I decided to try the same prompt and the same Gem with 2.5 Pro. I get a limited number of free credits to use every day, and this was enough to process the paper. This model did a very good job: a clear explanation, and no mistakes in the numbers that I could detect.

Gemini - AI Paper Analysis: Key Insights

So perhaps 2.5 Flash is just not good enough for this type of analysis involving numbers, or perhaps it's a subtle hint from Google nudging free users to upgrade. I don't know.