I’ve been pretty good at warding off the AI-scaries. But this week, I got pretty scared. Here’s how that happened.
Internal tools as the testbed
For some context, I’ve been playing around with internal tool builders lately — think Retool, or Salesforce’s ecosystem. The major downfall of these tools has been that the juice isn’t always worth the squeeze. Said less colloquially: they’re too hard to use relative to the power they give you to accomplish your task. For example, if a tool is easy to customize – like Excel – it ends up sacrificing some amount of power for that customizability. But Excel is just SO easy to use that it ends up being incredibly useful. Not all tools manage to make that tradeoff correctly. In between Excel (very customizable, powerful enough for that ease), and writing code (hard to do but very powerful!), you have a large swath of tools. When I use a tool like Greenhouse, for example, it’s much more powerful than the ATS I might make in a spreadsheet, but I’m forced to use Greenhouse’s data model that requires that each candidate can only be in the active state for a single job. It can be very frustrating to find a tool that is only 80% right — you will chafe at the 20% difference.
However, AI might improve this situation. AI might lessen the degree of the tradeoff between “power” and “customizability” by making it easier to “write code” (the best way to customize with power!)
Here’s a poor illustration of that concept:
Tools like Retool or Superblocks attempt to give less technical people more power for easy customization, but end up partially reneging on that promise. These tools put you into a very tortured interface that occasionally feels more frustrating than just writing code yourself. The result isn’t approachable enough for the layperson, yet it feels limiting for developers.
To test all of these tools, from Coda to Salesforce Lightning to Retool, my friend Karan and I implemented Stripe’s internal HR app on each of these platforms. This app is the tool used to manage the performance review, calibration, and compensation process at Stripe.
It was a very frustrating experience. Notion had the benefit of being easy to use, but I would never be able to implement the permissions and integrations I needed to make a true performance tool. I could imagine how someone might innocently fat-finger the entire list of employees and delete it without my knowledge. With Coda, I couldn’t make heads or tails of the user experience until I found out that Shishir (CEO of Coda and PM celebrity) had made a template of Google’s perf process in Coda. It’s similar enough to Stripe’s perf process[1] that I could simply “remix” what he had made. Even then, the tool lacked the power I might want from a perf tool. For example, I couldn’t set up automatic reminders as reviews came due. The 9-box calibration process involves charting employees on a grid, with one axis for current performance (taken from their performance rating) and the other for potential. With Coda, it wasn’t obvious how I could write a little script to map each performance rating to a number (1-3) and have employees auto-charted on the grid. It took a while to figure this out! And of course, I was still worried about data and permissions problems.
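For a sense of how small that “little script” could be, here is a minimal Python sketch of the rating-to-grid mapping. Everything in it is an assumption for illustration: the way five rating labels collapse into scores 1-3, the `Employee` fields, and the `place_on_grid` helper are all hypothetical, and this is not how Coda (or Stripe’s internal tool) actually does it.

```python
from dataclasses import dataclass

# Assumed bucketing of five ratings onto the grid's 1-3 performance axis.
RATING_TO_SCORE = {
    "Does Not Meet Expectations": 1,
    "Partially Meets Expectations": 1,
    "Successfully Meets Expectations": 2,
    "Exceeds Expectations": 3,
    "Greatly Exceeds Expectations": 3,
}


@dataclass
class Employee:
    name: str
    rating: str     # one of the keys in RATING_TO_SCORE
    potential: int  # 1 (low) to 3 (high), entered by the manager


def place_on_grid(employees):
    """Return a dict mapping (performance, potential) cells to lists of names."""
    grid = {(perf, pot): [] for perf in (1, 2, 3) for pot in (1, 2, 3)}
    for e in employees:
        perf = RATING_TO_SCORE[e.rating]
        grid[(perf, e.potential)].append(e.name)
    return grid


# Dummy data, purely illustrative.
team = [
    Employee("Ada", "Exceeds Expectations", 3),
    Employee("Grace", "Successfully Meets Expectations", 2),
    Employee("Linus", "Partially Meets Expectations", 2),
]
grid = place_on_grid(team)
```

Each key in `grid` is one cell of the 9 box, so a charting layer would only need to render the names in each cell.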
Retool was hard to use even though Karan’s a developer. We had to resort to ChatGPT to figure out how to make the “calibration 9 box” in Retool. We then had to drag the components around by hand to build the 9 box, which was extremely annoying (it took 10 minutes to get the 9 box drawn).
DJs, not musicians
Ultimately, all of these tools are hard to use because people want to be DJs, not musicians. When using an open-ended tool builder, users are often at a loss for what to do. Most people want to have some notion of a domain, a set of "default assumptions," and an opinionated data model as a starting point for customizable tools. We are better at "remixing" than whole-cloth building. For example, it’s easier to tweak the Salesforce data model and its apps to fit your needs than to build something new from first principles.
While templates are great — all hail Coda’s templates! — they are insufficiently opinionated and discoverable to fully solve this problem. In the Coda example, it would have been great to have the performance review template auto-suggested to me via some product UI that also gave me the chance to describe what I wanted to do. In this case, the template happened to be eerily close to what I needed, but it’s hard to imagine that every user will know how to find it and use it before getting frustrated.
Using Claude
I heard about the launch of Claude Artifacts, and thought it was worth a try — after all, I wanted to “catch ‘em all” when it came to “low code app builders.” I was genuinely blown away.
I entered some basic requirements into Claude:
Make a demo of a performance review app for an 8000 person company. The demo can simplify the requirements when necessary so that I have a fully shareable app that can run in Artifacts, but please highlight those simplifications to me proactively so that I can approve them. Remember that you do not need a backend or to save the data, since this is just a demo app. Try to be impressive to get oohs and aahs from the audience! Acknowledge and await further instructions.
Claude acknowledged my message. I sent the full requirements:
Build three views:
Employee View:
Displays past performance reviews with dummy data.
Allows the employee to submit a self-review.
Manager View:
Shows a list of direct reports with their current and proposed ratings (Does Not Meet Expectations, Partially Meets Expectations, Successfully Meets Expectations, Exceeds Expectations, Greatly Exceeds Expectations)
Allows the manager to update proposed ratings and add notes.
Have a tickbox for: Proposed Uplevel (Y/N), Flag (Expert/Talent/None)
Calibration View for Managers:
Displays a simple bar chart showing the distribution of ratings across the team, with a guideline for the idealized numbers (5% GEE, 15% EE, 60% SME, 15% PME, 5% DNM)
Lists the proposed ratings for each team member.
View aggregate stats (in a table) for every person, outlining their level, ladder, tenure, current rating, promo, flags. I should see a notes field for each employee, labeled with the “round of calibrations” (e.g. “Calibration Round 1, July 4th 2024”).
Create a visual representation of the “talent grid” where managers can drag employees to different spots on the grid. See the attached image!
Use dummy data to populate the views, so no backend or data persistence is required.
The app allows switching between different views using buttons at the top. The prototypical user here is a manager, who has to write her own self review, write employee reviews, and attend calibration.
Don’t worry about auth, permissions, or security concerns
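As an aside, the calibration guideline in that prompt boils down to a few lines of arithmetic. Here is a minimal Python sketch, assuming ratings are stored as the short codes from the guideline; the function names and the dummy team are hypothetical, and this is not the code Claude generated.

```python
from collections import Counter

# Idealized distribution taken from the prompt above.
GUIDELINE = {"GEE": 0.05, "EE": 0.15, "SME": 0.60, "PME": 0.15, "DNM": 0.05}


def rating_distribution(ratings):
    """Share of the team at each rating code (0.0 for an empty team)."""
    counts = Counter(ratings)
    total = len(ratings)
    return {code: (counts[code] / total if total else 0.0) for code in GUIDELINE}


def compare_to_guideline(ratings):
    """Actual share minus guideline share, per rating code."""
    actual = rating_distribution(ratings)
    return {code: round(actual[code] - GUIDELINE[code], 4) for code in GUIDELINE}


# Dummy team of ten proposed ratings.
proposed = ["SME"] * 6 + ["EE"] * 2 + ["GEE", "PME"]
deltas = compare_to_guideline(proposed)
```

A positive value in `deltas` means the team is over the guideline for that rating, and a negative value means it is under, which is what the bar chart would show visually.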
I attached a longer document explaining the full requirements for the HR app, including specifications around roles and permissions. I gave it this instruction:
In case it’s useful context as you design the SIMPLIFIED, DEMO version of the app, here’s the expectation for the more complex version. Remember, this is just for your context – in this design, optimize for a nice demo over capturing all of the complexity here.
Et voilà, here is my app!
It’s…remarkably close to what I need?! I will caveat that it is of course not as “functional” as what I got from Retool and Coda. It forgot about the backend integration (this is Retool’s value prop) and it certainly doesn’t begin to handle roles and permissions. Yet, it is a taste of the future. It gives me hope that the curve — the ease of customization vs. power curve — will indeed move out to the point where I can type in some natural language and get a working app out of it. It is remarkable that I could specify a set of requirements in natural language and get a real UI, with working code.
Step into the future with me. You can imagine how Anthropic might take this product further by figuring out how to integrate with the backend. The hard part about building a very stateful little app like this is ensuring the data flows to the backend correctly, and that is a very tractable problem for Anthropic. If I were the PM on this area at Anthropic, I might consider what it would take to: 1) spin up a little Postgres database for me, akin to Retool Workflows, 2) integrate with most systems of record, like Salesforce, NetSuite, Workday, and more, and 3) implement some kind of RBAC and “enterprise team view” so that I have enterprise observability.
It’s probably not something they’d even want to do. It’s very small potatoes compared to the opportunity in front of Anthropic, with billions of dollars in revenue there for the taking if they can just get enterprise chat right. But I’d implement the inverse of that roadmap if I were Retool or Coda: let people describe in natural language what they want to do, with a few helpful prompts, and build them an app where they can see all of the working code if needed.
So what?
Part of me felt exhilarated: a PM’s job is kinda turning Slack messages into products[2], and now I can turn messages into products in a new way. This is going to be amazing. That said, it’s so clearly a prototyping toy for now, and I think it would take deep, focused product development on Anthropic’s part to get this right. Most people aren’t going to write a detailed spec for this product the way I did. Anthropic would need to figure out how to prompt users the right way, via a product workflow, to get the most out of them.
The end product from Claude is far from being a working app. As such, maybe the models need some work too. I hear they’re working on it. 😛
All that said, while I was doing this, I also asked both Claude and ChatGPT to generate a to-do and supplies list for an Ocean Beach birthday bonfire that I’m throwing — neither did a good job. They forgot cake! They forgot wood!
The moral of this story is that AI will come for my job specifically, but I will become a party planner instead!
[1] Thanks, posse of Google execs that joined Stripe!
[2] I’m joking, love you, engineering partners.
I think the ability to generate a demo app is super useful, even if it's a "dummy" app. You can tweak it until you think it's representative of your vision, and then hand it off to a person or a team to reverse engineer it into a full app. It seems like it could be a huge time saver between orgs with a division between design and engineering.