An interactive guide to understanding, using, and thinking critically about Generative AI. Based on training materials by Pavol Chvala, Director, AI Innovation & Systems Development.
12 hrs
Today's agentic task horizon (50% success) — Claude Opus 4.6
4–7mo
Doubling time for agentic task capability (METR, 2026)
9–900×
Annual reduction in LLM inference cost per token
Module 1–2
How It Works
AI hierarchy, training, inference, model growth and cost trends.
Module 3
Capabilities
Agentic benchmarks, industry comparisons, and what's coming next.
Module 4
Use Cases
Create, Augment, and Collaborate — with live prompting examples.
Module 5
Prompting Techniques
Zero-shot, few-shot, role prompting, step-back, reusable prompts and more.
Module 6
Tools & Selection
Copilot, ChatGPT, Gemini, Claude, Ollama — choosing the right one.
Module 7
Adoption Journey
Learn → Iterate → Standardize. The path to an AI-enabled team.
Module 8
Agents & Automation
How agents work, real examples, risk areas and responsible principles.
Module 9
Challenges
Data awareness, Policy 46, bias, hallucinations, sustainability.
Key insight: Sharing use cases among colleagues is the fastest way to gain efficiency with GenAI tools. The technology is accelerating — nobody is an expert yet, and that's okay.
Module 1–2
How It Works
Understanding the AI landscape — from foundational concepts to how models generate content.
The AI Hierarchy
Generative AI sits inside a nested set of technologies. It's a type of AI capable of generating original content — commonly known as Gen AI, LLM, GPT, or just "the model."
Artificial Intelligence (AI) · Narrow AI · AGI
Machine Learning
Deep Learning
Generative AI · Diffusion · GANs
Large Language Models (LLMs) · BERT · GPT series
ChatGPT · Claude · Gemini · Copilot
How Does It Generate Content?
01
Training
Pattern Recognition
Machine learning identifies patterns in vast data (text, images, video, music). The result is a model.
"The cat sat on the ___" → "mat" "The cat ___ on the mat" → "sat" …do this trillions of times…
02
Preference Tuning
Alignment
The raw model is tweaked so that its outputs align with human preferences — making it helpful, harmless, and honest rather than just statistically likely.
03
Generation
Inference
"Hey model, I have the following data — what comes next?"
Every response is the model predicting the most probable continuation of your input.
Model Growth & Cost Trends
4.5×
Annual growth in training compute during the Deep Learning era (Epoch AI)
9–900×
Annual reduction in LLM inference prices, depending on task type
40×
Per-year cost reduction for mid-range capability models
2010
Start of the Deep Learning era — compute growth accelerated dramatically
Bottom line: Models are getting dramatically more capable while simultaneously getting far cheaper to use. This combination is what makes now such a pivotal moment for AI adoption.
Module 3
AI Capabilities
How capable are today's AI systems — and how fast are things moving?
Agentic Task Horizon (50% Success)
What is the longest software task an LLM can complete with 50% success? This horizon has been doubling every 4–7 months.
12 hrs
Today — Claude Opus 4.6 (METR, Mar 2026)
2026
Projected: 2-hour tasks routinely
2027
Projected: 1 full work day of autonomous tasks
2029
Projected: 1 full work month of autonomous tasks
4–7 month doubling time. Today's best model (Claude Opus 4.6) can complete a 12-hour software task with 50% reliability. At 80% reliability the horizon is currently ~1 hour 10 minutes. Capabilities are growing exponentially.
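Under the stated assumptions (a 12-hour horizon today and a 4–7 month doubling time, with 5.5 months taken as a midpoint), the projections above can be reproduced in a few lines. These are illustrative extrapolations, not forecasts.

```python
# Project the agentic task horizon forward from today's 12-hour figure,
# assuming it keeps doubling every ~5.5 months (midpoint of 4-7 months).
start_hours = 12.0
doubling_months = 5.5

def horizon_after(months):
    """Projected task horizon (hours) after `months` of exponential growth."""
    return start_hours * 2 ** (months / doubling_months)

for months in (12, 24, 36):
    print(f"+{months} mo: ~{horizon_after(months):,.0f} hours")
```

At roughly 160 working hours per month, the one-work-month milestone falls between the +24 and +36 month marks, consistent with the 2029 projection above.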
Agentic Task Horizon — Timeline
2020 — GPT-2 / GPT-3
Task horizon: ~4 seconds. Could answer simple questions reliably.
2022 — GPT-3.5
Task horizon: ~36 seconds. Could count words, simple lookups.
2023 — GPT-4
Task horizon: ~6 minutes. Could fix bugs in small Python libraries.
2024 — o1-preview / GPT-4o
Task horizon: ~1 hour. Could implement a simple web server, find facts on the web.
2025 — o3 / GPT-5 / Claude Opus 4.5
Task horizon: ~5 hours. Could train adversarially robust image models.
Today (Mar 2026) — Claude Opus 4.6 / GPT-5.2
Task horizon: 12 hours. Can implement complex protocols from multiple RFCs.
AI Agent vs. Industry Professional
OpenAI's GDPval benchmark measures how often AI agents outperform human industry professionals on work tasks. The dashed line represents parity (50%).
GPT-5.4
83%
83%
Claude Opus 4.6
78%
78%
GPT-5.2 (wins+ties)
71%
71%
Claude Opus 4.5
60%
60%
Gemini 3 Pro
54%
54%
Claude Sonnet 4.5
50%
50%
GPT-4o
12%
12%
■ Wins only · □ Wins + Ties · Source: evals.openai.com/gdpval · Dec 2025
What Else Can It Do?
March 2025
Google Gemini Robotics
Physical robots powered by multimodal AI — sorting items, following verbal instructions, performing delicate manipulation tasks in unstructured environments.
May 2025
Google Veo 3
AI video generation with cinematic quality — photorealistic scenes, consistent characters, and natural motion from a text description alone. deepmind.google/models/veo
January 2026
Google Project Genie
Interactive world generation — AI creates playable, interactive environments from images or descriptions. Games and simulations without a developer. deepmind.google/models/genie
Application of AI today: Knowledgebase assistants · Complex reasoners · Deep research · Agents acting on your behalf · Medical imaging · Credit risk assessment · Legal research automation · Adaptive learning platforms · Student support agents · Personalized tutors · Course design assistants
Module 4
Use Cases
The hardest part of AI adoption is identifying the right use cases. There are three core patterns.
Pattern 1
Create ✨
Compose net-new content from scratch. Tell me a joke · draft an email · explain how AI works · generate an image · write a song.
Pattern 2
Augment 🔧
Improve or ground existing content using your own sources. Summarize a transcript · check grammar/tone · highlight anomalies in data · review an email.
Pattern 3
Collaborate 🤝
Co-think to brainstorm, compare options, and decide. Draft a business case · analyze sales data · write executive summaries · explore trade-offs.
Prompting Example: Welcome Email
See how the quality of a prompt transforms the output — starting with too little context, then with more specifics, then using augmentation.
Your Prompt
Write me an email welcoming a new hire
AI Output — Generic & Too Long
Subject: Welcome to Our Team!
Dear [New Hire's Name], I hope this email finds you well. On behalf of the entire team, I want to extend a warm welcome to you as our newest team member…
[Full of placeholder text, generic advice, far too verbose. Not useful without significant editing.]
Your Prompt (Better)
I'm the hiring manager for a software developer position and I've selected a candidate whom I've already interviewed. Please write me a concise welcome email for Cathy, this new hire. The start date is June 1st and she will be working in EC2.
AI Output — Better But Still Generic
Subject: Welcome to Our Team, Cathy!
Dear Cathy, I am thrilled to welcome you to our team as a software developer! Your start date is June 1st, and you'll be working with our EC2 team.
[Still uses some generic filler paragraphs about company culture. The format is imposed by the AI, not you.]
Your Prompt (Best — Augment your own draft)
Hi Cathy,
Welcome to the team! We are very excited to have you. As we discussed in the interview, the start date is June 1st and the team works out of EC2. Work hours are 8:30am – 4:30pm. We will do a full onboarding and orientation on your first day so you don't need to worry about preparing anything ahead of time.
Thank you and looking forward to seeing you on June 1st!
---- Here is a welcome email for a new hire. Can you please check it for grammar and tone?
AI Output — ✅ Polished, Your Voice, Right Length
Hi Cathy,
Welcome aboard! We're thrilled to have you join our team. As we discussed during the interview process, your start date is June 1st, and our team operates out of EC2. Your work hours will be from 8:30 am to 4:30 pm. On your first day, we'll provide a comprehensive onboarding and orientation, so there's no need to prepare anything in advance.
Feel free to reach out if you have any questions or need assistance before your start date. Looking forward to meeting you in person on June 1st!
Best regards,
The lesson: The best version starts with Create (you write the draft) and ends with Augment (AI polishes it). GenAI tools do best when given examples and boundaries, and your voice is preserved. This is also why sharing use cases with colleagues accelerates learning for your entire team.
⚠️ A Warning About Accuracy
Asked to generate a chronological list of influential GenAI tools, an AI confidently produced dates — some wrong, some out of order. It correctly listed Claude 3.5 Sonnet (Jun 2024) after Claude 2 (Jul 2023), yet placed Claude 3 (Mar 2024) after Claude 3.5 Sonnet. Models can confabulate plausible-sounding but incorrect information.
Always verify AI outputs — especially dates, statistics, names, and citations.
Module 5
Prompting Techniques
Getting better results from AI tools is a skill. Here are six techniques to use immediately.
Vague prompts get vague answers. Use positive instructions rather than negatives. Specify format, length, and audience.
❌ Bad
Tell me about climate change.
Write something about AI.
Don't make it too long or too technical. Don't use jargon.
✅ Good
Summarize the main causes and effects of climate change on coastal cities with recent scientific data.
Write a 150-word blog post explaining how generative AI is being used in university classrooms in 2025.
Create a 10-slide presentation for a general audience on AI basics. Keep language simple, use visuals, and aim for a story-driven flow.
Zero-Shot
No examples
Quick and easy. High reliance on the model's own knowledge. Best for straightforward tasks.
One-Shot
One example
A bit more effort, but useful for bounding the response type and format. Guides the model with a single reference.
Few-Shot
3–5 examples
Most powerful for complex tasks. Bounds context, format, patterns. Requires quality examples. Best for structured outputs.
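The few-shot pattern is ultimately just string assembly: a handful of example input/output pairs bound the format before the real input arrives. A minimal sketch; the sentiment-labelling task and example pairs are hypothetical, not from the source.

```python
# Few-shot prompt assembly: examples first, real input last, so the
# model's continuation is bounded to the demonstrated format.
examples = [
    ("The lecture was clear and engaging.", "positive"),
    ("I couldn't follow the second half.", "negative"),
    ("It covered the syllabus material.", "neutral"),
]

def few_shot_prompt(new_input):
    lines = ["Classify the sentiment of each review as positive, negative, or neutral.", ""]
    for text, label in examples:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    # End mid-pattern so the model completes the label.
    lines.append(f"Review: {new_input}")
    lines.append("Sentiment:")
    return "\n".join(lines)

print(few_shot_prompt("Great examples, but the pace was too fast."))
```

Ending the prompt mid-pattern ("Sentiment:") is the key move: the model's most probable continuation is now a one-word label in your format.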
Tell the model who it is and what situation it's in. The more context, the more tailored the response.
Role prompting
"You are a helpful writing assistant…" | "You are a senior software developer…" | "You are a travel agent…"
Context prompting
Context: You are building a website for the University of Waterloo which will promote an event for international students arriving in September.
Suggest some design ideas for the landing page layout, keeping accessibility and mobile-first design in mind.
Get the model to answer a general question about your ask first, then feed that context into your specific question.
Step 1 — Direct attempt (often not optimal)
"You are an HR assistant. Give me a job description for a senior web developer based on the following format {attached document}"
Step 2 — Step back to the general question
"What are some of the most important skills that a senior web developer should possess, both technical and non-technical?"
Step 3 — Use the general answer as context for your specific ask
"Context: {response from Step 2, maybe modified} Give me a job description for a senior web developer based on the following format {attached document}"
If you reuse a prompt often with slight changes, define a variables section at the top.
Template
VARIABLES
{project} = "student information system upgrade"
{stakeholder} = "faculty administrators"
PROMPT
You are a senior project manager.
Provide a status update for the {project} tailored to {stakeholder}.
Highlight current progress, key risks, and upcoming milestones for the {project}.
Then recommend communication strategies to keep {stakeholder} engaged and informed throughout the remainder of the {project}.
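The VARIABLES section maps directly onto string templating: define the placeholders once, then reuse the prompt with new values. A minimal Python sketch using the same template and variables as above.

```python
# Reusable prompt: named placeholders are filled per use.
# str.format substitutes every occurrence, including repeats.
TEMPLATE = (
    "You are a senior project manager.\n"
    "Provide a status update for the {project} tailored to {stakeholder}.\n"
    "Highlight current progress, key risks, and upcoming milestones "
    "for the {project}.\n"
    "Then recommend communication strategies to keep {stakeholder} "
    "engaged and informed throughout the remainder of the {project}."
)

prompt = TEMPLATE.format(
    project="student information system upgrade",
    stakeholder="faculty administrators",
)
print(prompt)
```

Swapping in a new project or stakeholder is now a two-line change rather than a rewrite of the whole prompt.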
Prompt Generation
Ask AI to write your prompts
"I need to generate a new job description for a senior web developer. Give me 10 different prompts I can use with an LLM."
"Give me the strengths and drawbacks of these prompts."
Code
You don't need to be a developer
"I have an Excel table with columns: name, count, price. How do I highlight anomalous values?"
"I copy a file from a network share every morning. Help me write code to automate this. Now walk me through what I do with it."
Key principle: GenAI tools do well when provided with examples and boundaries (dos and don'ts). Don't just ask — show the model what success looks like.
Module 6
Tools & Selection
Choosing the right tool matters. Understanding tool types and service models will help you work smarter.
Tool Types
Assistant
Simple Q&A — text, images, music, video, code. Answers right away without extended thinking.
Reasoner
Complex multi-step thinking. "Think deeper" mode. Deep research. Slower but more accurate for hard problems.
Agent
Any combination of the above, plus the ability to act using tools. Can browse the web, run code, read files, send emails.
AI-Enabled Apps
Browsers, media generators, specialized productivity tools with AI built into an existing workflow.
Service Types
Free SaaS
Good for getting started and exploring. Typically restricted in message limits, model access, or features.
Paid SaaS
More advanced features and/or higher capacities. Access to latest models and reasoning modes.
Open Source
Free with full control over infrastructure and software. High complexity — requires technical expertise to deploy and maintain.
The Main Tools
🪟
Microsoft Copilot
✓ UW Approved
office.com/chat · Enterprise data protection applies. University-approved with restricted data support. Auto, Quick response, and Think deeper modes.
Assistant · Reasoner · Agent · Public ✓ · Restricted ✓
💬
OpenAI ChatGPT
Public Only
chatgpt.com · Current model: GPT-5.2. Modes: Instant (quick), Thinking (complex), Pro (research-grade). Message limits apply on the free tier.
Assistant · Reasoner · Agent
✨
Google Gemini
Public Only
gemini.google.com · Current model: Gemini 3. Fast, Thinking (complex problems), and Pro (advanced math/code) modes. Multimodal: image and music creation.
Assistant · Reasoner · AI-Enabled Apps
🤖
Anthropic Claude
Public Only
claude.ai · Models: Haiku 4.5 (fastest), Sonnet 4.6 (everyday tasks), Opus 4.6 (most capable). Extended thinking mode available. Currently #1 on agentic benchmarks.
Assistant · Reasoner · Agent
🦙
Ollama (Open Source)
Self-hosted
ollama.com · Run open-source models locally: Qwen3, LLaMA 3.1, Phi4, and many more. Full control — your data never leaves your machine. High technical complexity to set up.
Assistant · Reasoner · Open Source
UW Policy 46 — Data Classification Reminder: Public information → any AI system. Confidential/Restricted → University-approved systems only (Copilot). Highly-Restricted (SIN, PHI, credit card) → no AI systems. When in doubt, use Copilot.
Real example: Curricular overlap analysis using ChatGPT Agent. Given two program curricula, the agent browsed the university's course catalog, compared learning outcomes across courses, identified overlaps, and produced a structured report — all autonomously from a single prompt.
→ View the ChatGPT Agent conversation
Risk Areas & Mitigations
⚠️ Misaligned Goals
Agent optimizes the wrong objective or interprets intent incorrectly.
✅ Mitigation: Define clear objectives, constraints, and success criteria before deployment. Be specific about what the agent should not do.
⚠️ Uncontrolled Actions
Agent takes harmful or unauthorized actions without human approval.
✅ Mitigation: Limit permissions and require explicit human approval for high-impact actions. Use sandboxed environments for testing.
⚠️ Data Privacy & Security
Agent accesses or exposes sensitive information it shouldn't.
✅ Mitigation: Apply least-privilege data access principles. Continuous monitoring controls. Never feed restricted data to non-approved systems.
⚠️ Lack of Accountability
Unclear ownership when the agent causes harm or makes errors.
✅ Mitigation: Assign a named business owner, technical owner, and risk owner for every agent in production. Document decisions.
⚠️ Over-Trust & Reduced Oversight
Overreliance on the agent without sufficient human review or judgment.
✅ Mitigation: Design meaningful human review checkpoints. Train users on model limitations. Spot-check outputs regularly even when the system seems reliable.
Module 9
Challenges & Data Awareness
AI tools are exciting — but they come with real responsibilities. Understanding these challenges helps you use AI critically and safely.
🎯
Data & Accuracy
Models can hallucinate — generating confident-sounding but incorrect information. Training data reflects historical biases. Always verify outputs, especially dates, statistics, and citations.
⚖️
Ethics & Accountability
It's often impossible to explain why a model produced a specific output. Training data may include copyrighted material, and AI-generated content raises authorship questions.
Accountability · Value Alignment
🌳
Resources & Sustainability
Training and running large models consumes significant energy and water. Cost of inference is falling, but aggregate demand is rising. Sustainability is a growing concern.
Cost · Energy
UW Policy 46 — Information Classification
An AI-enabled tool is just another information system owned by a vendor. The same data rules apply.
🟢 Public
Use in ANY AI system ✓
University publications
Websites without authentication
Social media channels
University Calendar
Published RFPs
Salaries under Ontario PSSDA
🟡 Restricted / Confidential
University-approved systems only (Copilot)
Personal information (FIPPA/PIPEDA)
WatIAM username
Student/Employee Number
WatCard number
IP Address
Data that can identify a person when combined
🔴 Highly Restricted
NO AI systems — ever ✗
Social Insurance Numbers (SIN)
Protected Health Information (PHI)
Credit card / financial data
Passwords & secrets
Legal privilege information
Remember: When uncertain about data classification, default to Copilot for work tasks; it provides enterprise data protection.
Resources & Next Steps
For Work — Use Microsoft Copilot
Your university-approved, data-protected tool. Available at office.com/chat.