July 18, 2025 - AI Engineer Agent¶

🎯 Daily Goals¶

Review Anki Deck
Lumosity training
Started AI Engineering Agent Chapter

📝 What I learned:¶

Shipping Fast ¶

Key Takeaways:¶

Production is the only environment where your code, infrastructure, and customer come together to represent the objective reality. Only after that you can realize the problem. Shipping fast is very helpful for high performance team to iterate and learn fast.
Ship instrumentation and first. i.e. ship the code that observes and monitors other code first before you make changes. Thus you can be aware of the specific affect your code has over the entire system.
Be available after shipping. Get ready for on-call after shipping.
Use feature flag to support rollout and rollback. Basically it is using a boolean or config to keep track of what features is enabled. This is actaully quite powerful since it can enable gradual rollout, targeted rollout, A/B testing and act like a kill switch, even load balancing.
Document and share the plan and actions which means you need to create a documentation detailing how you are gonna rollout and rollback if the feature does not work out.

AI Engineer Book¶

Definition of Agent¶

Agent is anything that can perceive its environment and act based upon that environment. So agent is characterized by the environment and action it can perform.

Relationship between environment and action¶

The environment determines what tools a user agent can potentially use. On the other hand, the tool set that the user agent posses restricts the environment the agent can operate in.

Why agent workflow requires more powerful models?

Agent workflows usually involves multiple steps of reasoning and execution. The error for each step would compound causing a much higher error rate for the entire workflow. Additionally, with toolset AI would have more capabilities and thus its error would have much more impact.

What is the difference between AI agents and agentic AI?

The core differences between agentic AI and AI agents lies in their autonomy and decision making capabilities. AI agents usually has predefined rules and fixed routine to solve problems. It requires human input as well.

AI Agent Tools¶

Knowledge Augumentation

Examples: text, image retriever, SQL executor, web browsing

Capability Extension

Examples: calculator, code compiler, OCR, calendar, timezon converter, unit converter and translator.

Sepcial extention

If a model is text only, adding a text-to-image model as its tool can easily make it multimodal. In fact that is how ChatGPT generates pictures. It uses DALL-E internally to image generation queries.

Write Ability

Examples: Email APIs, database access, SQL executor, banking API.

As AI systems become more capable, ensuring their safety becomes increasingly important. It's essential to implement guardrails to ensure that AI agents remain both powerful and responsible.

Planning¶

The core of agent workflow is planning. Agent workflow are usually time consuming and costly. Executing the agent’s plan prematurely risks wasting time and money without delivering results. Hence, there should be a dedicated planner that takes in user queries and figure out what the user want to do and generate a plan. Then either a human judge or an AI judge could evaluate the plan and decide whether or not to proceed the costly operation. Actually Copilot agent mode has exactly the same behavior. It shows all the intended changes and asks you to validate them. You also need to decouple planning from execution for the same reason.

Intent classifier

To better help with planning, the AI agent needs to know the intent of the user query. More often than not, there would be an independent intent classifier trained to decide what the user intent is.

Gneral Process of Planning

Plan generation: come up with a detailed plan where each step is manageable.
Reflection and error correction: evalute the generated plan and fix the error if exists
Exeuction: take the action outlined in the plan.
Reflection and error correction: this time we reflection on the result of the executed plan and decide if the execution has reached our goal. If not we need to correct the error and generate a new plan.

Arguments about whether FM know planning at all

There has been ongoing debate over whether autoregressive models are truly capable of planning. Critics argue that planning fundamentally involves searching for an optimal path to a goal by evaluating and comparing different options—often requiring backtracking. Backtracking allows the system to recognize when the current path is suboptimal and to revise earlier decisions accordingly. In contrast, an autoregressive model generates output by predicting the next token based solely on previous ones, without the ability to revise earlier choices, which seems incompatible with true planning.

In response, some counterarguments suggest that autoregressive models can exhibit a form of planning by regenerating the entire sequence with different decisions if the initial plan is deemed ineffective. While this does not involve traditional backtracking, it can still lead to improved outcomes through iterative refinement.

🚀 Resources that Requires Further Study:¶

Solo developer workflow

Some useful tips:

design the UI first (use AI to speed up prototyping). For UI ideas look for competitors.
Build application that is market proven. (Already exists in the market).
Summarize competitors' pros and cons and create your own style based on your understanding.
Use MCP server to expand AI capability.
Prompt engineering: "Do not make any changes until you have 95% confidence that you know what to build ask me follow up questions until you have that confidence. Add this as cursor rule.
Be careful about creating new chat since you might loose context.
Kiro: Amazon's new AI IDE focuses on specification-driven development. It addresses a key pain point: when using prompts, it's difficult to understand the assumptions the AI makes during system design, making it hard to interpret or document the output. Kiro solves this by using specifications. Instead of relying on prompts, it guides the AI with clear, structured requirements to ensure accurate implementation. After each prompt, Kiro would first generate a spec in the .kiro folder. It would contain system design and a detailed round map of implementation. You can validate and modify it before Kiro actually modify the code. It also features hook, which is a event driven background agent. For example, you can create a hook so that when a new component is created it would generate a test file for that component. And whenever a file gets deleted, you can also create a hook to let kiro run existing tests to see if things broke.
Ken Wilder. Study his AQAL model sometime in the future.