AI in Your Cart: Ordering Groceries and the Future of Digital Assistants
I’ve been watching artificial intelligence do my grocery shopping. It starts with a list, then types each item into a supermarket website’s search bar, uses its cursor to click, and places the order. It’s a mundane task, but watching a digital ghost at work is oddly captivating.
“Are you sure it’s not a person in India?” my husband asked, peering over my shoulder.
I was trying out Operator, a new AI “agent” from OpenAI, the creators of ChatGPT. It has a similar text interface and conversational tone, but instead of just answering questions, it can perform actions like navigating the web.

Hot on the heels of large language models, AI agents are being touted as the next big thing. The appeal is clear: a digital assistant that can complete practical tasks is more compelling than one that can only respond with text. Anthropic introduced “computer use” capabilities to its Claude chatbot near the end of last year, and Perplexity and Google have built “agentic” features into their AI assistants. Other companies are developing agents for specific tasks like coding or research.
What Exactly Is an AI Agent?
There’s ongoing debate over what truly qualifies as an AI agent, but the general idea is that they need some degree of autonomy. “As soon as something is starting to execute actions outside of the chat window, then it’s gone from being a chatbot to an agent,” says Margaret Mitchell, the chief ethics scientist at AI company Hugging Face.
It’s early days. Most commercially available agents are still considered experimental—OpenAI describes Operator as a “research preview.” There are plenty of examples online of amusing errors, such as spending a lot on eggs or trying to return groceries to the store. Depending on your perspective, these agents are simply the next overhyped tech toy or the dawn of an AI future that could reshape the workforce, the internet, and how we live.
“In principle, they would be amazing because they could automate a lot of the drudgery,” says Gary Marcus, a scientist who is skeptical of large language models. “But I don’t think they will work reliably any time soon, and it’s partly an investment in hype.”
Putting Operator to the Test
I signed up for Operator to see for myself. Since there was no food in the house, grocery shopping seemed like a good first task. When asked about preferred shops or brands, I directed the agent to choose whatever was cheapest. A window then appeared, showing an internet browser. Operator started searching “UK online grocery delivery.” Its mouse cursor selected the first result, Ocado, and began searching and filtering by price. It selected products and clicked “Add to trolley.”
I was impressed with Operator’s initiative. When given only a brief item description, like “salmon” or “chicken,” it made decisions without asking tons of questions. When searching for eggs, it successfully ignored non-egg items in the special offers. My shopping list requested “a few different vegetables,” and it chose a head of broccoli, then asked if I’d like anything else. I asked it to pick two more, and it went for carrots and leeks.
Emboldened, I told it to add “a sweet treat” and watched it type “sweet treat” into the search bar. I wasn’t sure why it chose 70% chocolate—certainly not the cheapest option—but when I told it I didn’t like dark chocolate, it swapped it for a Galaxy bar.
The Limits of Autonomy
We hit a snag when Operator realized Ocado had a minimum spend. Because of this, I had to add more to the list. Next came logging in. The agent requested my help here. While users can take over the browser at any time, OpenAI says Operator is designed to ask for this “when inputting sensitive information into the browser, such as login credentials or payment information.” Although Operator usually takes constant screenshots in order to “see” what it’s doing, it is designed not to do this when a user takes control.
At checkout, I tested the waters by asking Operator to complete the payment. However, I took back the reins when it asked for my card details. I’d already given OpenAI my payment info (Operator requires a ChatGPT Pro account, which costs a significant monthly fee), but I was uncomfortable sharing this directly with the AI. Order placed, I waited for the delivery the next day.
After that, I gave Operator another task: ordering a cheeseburger and chips from a local, highly rated restaurant. It asked for my postcode, loaded the Deliveroo website, and searched for “cheeseburger.” Again, I had to log in, but since Deliveroo already had my card details stored, Operator could proceed directly to payment.
The restaurant it chose was indeed local and highly rated: as a fish and chip shop. I ended up with a passable cheeseburger and a bag of chippy-style chips. Though not exactly what I’d envisioned, it wasn’t completely off. I was mortified when I realized Operator had skipped tipping the delivery rider. I sheepishly added a generous tip after the fact.

It soon became clear that using an AI agent, in its present form, might defeat the purpose of saving time. Still, you can leave it to perform tasks in the background while you focus on other tabs. While drafting this piece, I made another request: could it book me a gel manicure at a local salon?
Operator struggled with this task. Though it accessed the booking platform Fresha, when it prompted me to log in, I realized that it had chosen an appointment a week too late and over an hour’s drive from my home in east London. After I pointed out the issues, it found a slot for the correct date but in Leicester Square—still a distance away. Only then did it ask for my location, and I realized it must not have retained this knowledge between tasks. I could have made my own booking by this point.
Agents in the Workplace
Despite current flaws, my experience with Operator feels like a glimpse of things to come. As these systems improve and decrease in cost, I could easily see them becoming embedded in everyday life. You might already write your shopping list on an app; why wouldn’t it also place the order?
Agents are also infiltrating workflows beyond the realm of a personal assistant. OpenAI’s chief executive, Sam Altman, has predicted that AI agents could “join the workforce” this year.

Software developers are among the early adopters. Coding platform GitHub recently added agentic capabilities to its AI Copilot tool. GitHub’s CEO, Thomas Dohmke, notes that developers already use some level of automated assistance. The main difference with AI agents is the level of autonomy. He says, “Instead of you just asking a question and it gives you an answer, you give it a problem and then it iterates with the code that it has access to.”
GitHub is already working on an agent with greater autonomy, called Project Padawan (a Star Wars term for a Jedi apprentice). This will allow an AI agent to work asynchronously, without constant oversight. A developer could have teams of agents reporting to them, producing code for their review. Dohmke says he doesn’t believe developers’ jobs are at risk, as their skills will be in increasing demand. “I’d argue the amount of work that AI has added to most developers’ backlog is higher than the amount of work it has taken over,” he says. AI agents could also make coding tasks, such as building an app, more accessible to non-technical people.
Outside software development, Dohmke envisions a future in which everyone has a personal Jarvis, the talking AI assistant from the Iron Man films. Your agent will learn your habits and become customized to your tastes, making it more useful. He’d use one to book holidays for his family.
However, the more autonomy agents have, the greater the risks they pose.

Mitchell, from Hugging Face, co-authored a paper warning against the development of fully autonomous agents. “Fully autonomous means that human control has been fully ceded,” she explains. Without set boundaries, a fully autonomous agent could gather information without your knowledge or behave in unexpected ways, especially if it can write its own code. It might not be a big deal if an AI agent gets your takeout order wrong, but what if it starts sharing your personal information with scam sites or posting horrific social media content under your name? High-risk workplaces could introduce particularly hazardous scenarios: imagine if it gained access to a missile command system.
Mitchell hopes technologists, legislators, and policymakers will incentivize guardrails to mitigate such incidents. For now, she foresees agentic capabilities becoming more refined for specific tasks. Soon, she says, we’ll see agents interacting with agents—your agent could work with mine to set up a meeting, for example.
This proliferation of agents could reshape the internet. Currently, much of the information online is presented in human language, but if AIs increasingly interact with websites, that could change. “We’re going to see more and more information available through the internet that is not directly human language, but is the information that would be necessary for an agent to be able to act on it,” Mitchell says.
Dohmke echoes this idea. He believes the concept of the homepage will lose importance, and interfaces will be designed with AI agents in mind. Brands might start competing for AI attention rather than human eyeballs.
One day, agents may even escape the confines of the computer. We could see AI agents embodied in robots, which would open up a world of physical tasks for them to help with. “My prediction is that we’re going to see agents that can do our laundry for us and do our dishes and make us breakfast,” says Mitchell. “Just don’t give them access to weapons.”