
Stress Testing Microsoft Copilot vs Claude vs ChatGPT

AI Bakeoff with Microsoft Copilot

This weekend, I ran a real-world Microsoft Copilot vs Claude vs ChatGPT bakeoff while wrapping up a lead magnet calculator. In preparation for a Microsoft call to discuss an AI Copilot rollout, I wanted some hands-on experience.

The Bakeoff Workflow

  1. Take a detailed calculator requirements doc (AI generated from source code).
  2. Recreate a simplified version in Excel via prompt.
  3. Document the structure.
  4. Translate the workflow into an executive-ready PowerPoint story.
  5. Use the output as preparation for a Copilot rollout conversation.

This would be a day of work for multiple people. The project was complete in less than an hour.


Phase One: Translating the App into Excel

The spreadsheet needed:

  • Clear input structure, supplied by a 400-line markdown file
  • Clean calculation logic
  • Organized output summary
  • Executive-ready formatting for review and sign-off

Microsoft Copilot – Copilot fragmented the logic across multiple tabs. Inputs and outputs were not logically grouped. Structural coherence was inconsistent. If an AI tool creates cleanup work, the productivity gain erodes immediately.

Claude – Claude generated a tight, single-page spreadsheet. Inputs were grouped cleanly. Calculations were centralized. Outputs were summarized clearly. It felt intentional, and the result was the best of the group.

ChatGPT – ChatGPT produced a multi-tab structure with clear separation between inputs, logic, and results. It was operationally sound and logically organized. It required slightly more navigation than Claude’s single-page approach, but the structure held.
[Screenshots: Microsoft Copilot Excel output and OpenAI (ChatGPT) Excel output]

Phase Two: Explaining the Build in PowerPoint

I have never been a fan of PowerPoint. It is a corporate time and knowledge sinkhole. My hope is that one day data and knowledge management tools paired with LLMs will force PowerPoint to evolve or go away.

PowerPoint exists as a corporate knowledge artifact that memorializes a point in time. In concept that would be a great thing if the real story and context weren’t lost in the meetings and presentations where PowerPoints are delivered. Microsoft has all of the pieces of the puzzle, so I am blown away they haven’t put it all together.

Clearly this stress test wasn’t going to be transformational to my way of working… At a minimum, I wanted to produce a single slide that would explain my app design workflow and highlight how I was using AI:

  • How the idea evolved
  • How AI accelerated development
  • Where structure improved
  • Where friction was eliminated

Microsoft Copilot – Copilot generated an image instead of an editable diagram. When shapes cannot be modified, the diagram becomes static decoration. Even the text was baked into the image, which is annoying.

Claude – Claude produced a comprehensive diagram with strong narrative flow. It mapped the journey clearly and felt cohesive. Text was editable.

ChatGPT – ChatGPT generated a simpler diagram, fully editable in PowerPoint. Less polished, more modular.
[Screenshots: Microsoft Copilot, Claude, and OpenAI PowerPoint outputs]

My findings with Microsoft Copilot so far…

Copilot’s core advantage is integration within Microsoft 365. Outlook was not part of this evaluation, but I am praying that when I get to the proof of value, it is the star of the show. The Excel and PowerPoint experience was underwhelming for creation. However, I did use Copilot to evaluate and edit my Claude-produced Excel file, and it did a great job with that task.

Adoption fails when cognitive load remains unchanged. Frustration sets in when a new tool requires more time and cognitive load than the previous solution. Without a major payoff in the form of pain reduction or value creation, it’s tough to recover.

Bottom line: Claude felt magical, and Copilot felt like something I experienced 18 months ago in ChatGPT. Still, enterprise platform alignment with Office and Azure, security, and wider distribution are real value drivers, and they could make this an acceptable solution despite that feeling of being behind.

My Strategic Criteria for Evaluating Copilot

Productivity and Communications Compression Across Microsoft 365

Copilot’s primary strategic function is to compress knowledge work inside the Microsoft ecosystem. Copilot is not designed to replace core application functions; it is designed to accelerate them.

When it comes to communication (email and Teams), my hope is that Copilot will clearly increase the velocity of information consumption and delivery. If not, the upcoming proof-of-value exercise could be short-lived.

My primary objectives as I evaluate Copilot:

  • Streamline email search (Gemini in Gmail has been a game changer)
  • Speed up response times via email
  • Shorten drafting cycles
  • Consolidate meeting summary tools into one repository
  • Speed up spreadsheet modeling
  • Automate presentation generation

Provide a Secure, Standardized AI Layer Across the Organization

Security is a major concern for every operator and executive when it comes to these AI models. Copilot provides at least one controlled AI entry point with potential access to confidential data.

My Biggest Concerns as I Continue Exploring

  • Training focused on value creation – Understanding the span of capabilities is important, but connecting business challenges to technology is where we will create value.
  • Clear use case alignment – The gap between expectations of what is possible and real feature availability is a concern I want to remove early.
  • Adoption management – If users do not adopt, it is a failure. If Copilot fails, we are going to fail fast and move on to the next alternative.

Without high-value use cases, adoption, and education, AI becomes just another data tool that blames bad data or process rather than an enabler that reduces operational drag.


Final Take

AI productivity is not about who generates prettier demos. Real AI success requires distribution of knowledge and experience across a team. Data alignment and influence are about getting a group of people rowing at the same speed and in the same direction. AI is the same data activation and knowledge delivery exercise as analytics, so I feel well equipped to take it on!

There are many other bright spots for Microsoft and AI, including the work I have done in Azure and, more recently, with the Power BI MCP Server at BIChart.

In this test, Claude shone the brightest. I am still excited to do a proper Copilot proof of value and see how it goes!

Adventures with Snowflake MCP and Semantic Views

Snowflake MCP and Claude

Last month, I had an opportunity to roll up my sleeves and start building analytics with Snowflake MCP and Snowflake Semantic Views. I wanted to see how far I could push real-world analyst and quality assurance scenarios with Tableau MCP and DataTools Pro MCP integration. The results gave me a glimpse of the future of AI/BI with real, production data. My objective was to deliver a correct, viable analysis that otherwise would have been delivered via Tableau.

The time spent modeling my data, providing crystal-clear semantics, and using data with zero ambiguity helped. The lab delivered great results, but I ended it with serious concerns over governance, trust, and quality assurance layers. This article highlights my findings and links to step-by-step tutorials.


Connecting Claude, Snowflake MCP, and Semantic Views

The first step in connecting all of the components was building my Snowflake Semantic Views. Snowflake MCP gave me the framework to orchestrate queries and interactions, and Snowflake Semantic Views gave me the lens to apply meaning. All of my work and experimentation occurred in Claude, which gave me the AI horsepower to analyze and summarize insights. To connect Snowflake to Claude, I used the official Snowflake MCP Server, installed on my desktop and configured in Claude.

Together, these tools created a working environment where I could ask questions, validate results, and build confidence in the answers I got back.


Creating Snowflake Semantic Views

For my Snowflake Semantic View setup, I spent some time researching and reading other folks’ experiences with semantic views. I highly recommend having a validated and tested semantic view before embarking on AI labs. If you don’t know what metadata to enter into your Semantic View, seek additional advice from subject matter experts. AI can fill in blanks, but it shouldn’t be trusted to invent meaning without human oversight: Why AI-Generated Meta-Data in Snowflake Semantic Views Can Be Dangerous

Bottom line… Begin with a simple and concise Snowflake semantic model. Build clearly defined dimensions and measures. Use real-world aliases, and refrain from using AI to fill in the blanks unless that is explicitly your objective. Layer on complexity once you’re comfortable with the results.
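
To make this concrete, here is a minimal sketch of what a simple, narrowly scoped semantic view can look like. All table, column, and metric names are hypothetical, and the exact clause syntax (TABLES, DIMENSIONS, METRICS, comments) may vary by Snowflake release, so treat it as a shape to adapt rather than a recipe.

```sql
-- Minimal, narrowly scoped semantic view (hypothetical names).
-- Expose only the columns the AI actually needs.
CREATE OR REPLACE SEMANTIC VIEW analytics.marketing.leads_sv
  TABLES (
    leads AS analytics.marketing.leads
      PRIMARY KEY (lead_id)
      COMMENT = 'One row per inbound lead from the calculator'
  )
  DIMENSIONS (
    leads.created_date AS created_date
      COMMENT = 'Date the lead was captured',
    leads.channel AS channel
      COMMENT = 'Acquisition channel, e.g. organic, paid, referral'
  )
  METRICS (
    leads.lead_count AS COUNT(lead_id)
      COMMENT = 'Total number of leads',
    leads.conversion_rate AS AVG(converted_flag)
      COMMENT = 'Share of leads that converted (0/1 flag)'
  )
  COMMENT = 'Narrow semantic view for lead funnel questions';
```

Real-world names and comments like these are exactly the metadata the LLM leans on when it interprets a question, which is why they are worth writing by hand.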


What Worked Well

  • Control over data access
    Thankfully, the Snowflake MCP is limited to semantic views and Cortex Search. The opportunity and value of Cortex Search cannot be overstated; I will cover that in another post. The idea of unleashing an AI agent with elevated permissions to write SQL against your entire data warehouse is a governance nightmare. Semantic Views gave me the ability to scope exactly what Claude could see and query.
  • Accuracy of results
    The top question I get during AI labs is: “Is this information correct?” I kept a validated Tableau dashboard on my other monitor to check the correctness of every answer. A direct query against the semantic view, like the sketch after this list, is another way to spot-check an answer.
  • Simple to complex questioning
    My recommendation with any LLM-powered tool is to start with high-level aggregate questions. Use these to build a shared understanding and confidence. Then, grounded in validated facts, you can drill down into more detailed questions. This approach kept me in control when the analysis moved beyond my existing knowledge and available analysis.
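
As a lightweight validation habit, the same semantic view that Claude queries can be queried directly in Snowflake to confirm an aggregate before drilling deeper. A minimal sketch, assuming the hypothetical leads_sv view from the earlier example and Snowflake’s SEMANTIC_VIEW() query syntax:

```sql
-- Spot-check an aggregate the AI just reported, scoped to the
-- same semantic view it is allowed to query (hypothetical names).
SELECT *
FROM SEMANTIC_VIEW(
  analytics.marketing.leads_sv
  DIMENSIONS leads.channel
  METRICS leads.lead_count, leads.conversion_rate
)
ORDER BY channel;
```

If the numbers match the validated Tableau dashboard, the high-level facts are grounded and the more detailed follow-up questions become easier to trust.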

Where I Got Stuck

Three challenges slowed me down:

  1. Metadata gaps – When the semantic layer lacked clarity, Claude produced ambiguous answers. It isn’t a garbage-in, garbage-out problem… It is that I have subject matter expertise that was not captured in my semantic layer or fed back in a loop to make the AI system smarter. LLM analysts feel less magical when you already know the answers. That is where adding Tableau MCP allowed a pseudo peer review to occur.
  2. Over-scoping – When I got greedy and exposed too many columns, ambiguity crept in. AI responses became less focused and harder to trust. Narrower scope = better accuracy.
  3. Context limits – I had Claude do a deep analysis dive. I also had it code a custom funnel dashboard that perfectly rendered a visual funnel with correct data. At some point, Claude explained that my context limit had been reached. My analysis hit a brick wall, and I had to start over. Claude is a general-purpose AI chatbot, but it was still disappointing to hit my stride and have to stop working.

Risks You Should Know

If you’re using AI to build your semantic layer, you need to be aware of the risks:

  • AI-generated semantics can distort meaning. It’s tempting to let an LLM fill in definitions, but without context, you’re embedding bad assumptions directly into your semantic layer: Why AI-Generated Meta-Data in Snowflake Semantic Views Can Be Dangerous
  • Do not give LLMs PII or sensitive PII. As a rule of thumb, I do not add either to semantic models. I hope that at some point we can employ Snowflake aggregation policies or masking policies here (a masking-policy sketch follows this list).
  • Governance blind spots. Connecting the Snowflake MCP requires access from your desktop. For governance, we use a personal access token for that specific Snowflake user’s account. That ensures all requests are auditable. Beyond a single user on a desktop, it’s unclear how to safely scale the MCP.
  • False confidence. Good syntax doesn’t equal good semantics. Always validate the answers against known results before you scale usage.
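
For the PII concern above, a column-level masking policy is one way to keep sensitive values out of anything the MCP connection can see, regardless of which semantic view exposes the table. A minimal sketch with hypothetical role, table, and column names:

```sql
-- Mask email addresses for every role except a privileged one
-- (hypothetical role, table, and column names).
CREATE OR REPLACE MASKING POLICY analytics.governance.email_mask
  AS (val STRING) RETURNS STRING ->
  CASE
    WHEN CURRENT_ROLE() IN ('PII_ADMIN') THEN val
    ELSE '***MASKED***'
  END;

-- Attach the policy to the column so any query path, including the
-- MCP user's personal access token session, sees masked values.
ALTER TABLE analytics.marketing.leads
  MODIFY COLUMN email
  SET MASKING POLICY analytics.governance.email_mask;
```

This does not replace the rule of thumb of keeping PII out of semantic models entirely, but it adds a second layer if a sensitive column slips in.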

Final Take

Snowflake MCP and Semantic Views are still very much experimental features. They provide a glimpse of what will be possible when the barriers to accessing governed, semantically correct data are removed.

In my case, I employed DataTools Pro for deeper metric glossary semantics and a writeback step via Zapier to capture learnings, redirections, and insights for auditing purposes. If you would like assistance setting up a lab for testing, feel free to contact us to set up a complimentary session.