OP here. It's kind of ironic that making the docs AI-friendly essentially just ends up being what good documentation is in the first place (explicit context and hierarchy, self-contained sections, precise error messages).
It's the same for SEO. Good structure, correct use of HTML elements, quick loading, good accessibility, etc. Sure, there are "tricks" to improve your SEO, but the general principles would also be good even if you were not doing SEO.
And yet in practice SEO slop garbage is SEO slop garbage. Devoid of any real meaning or purpose other than to increase rankings and metrics. Nobody cares if it’s good or useful, but it must appease the algorithm!
It's similar for writing code. Suddenly people are articulating their problems to the LLM and breaking them down into smaller sub-problems to solve...
In other words, people are discovering the value of standard software engineering practices. Which, I think, is a good thing.
It has changed how I structure my code. Out of laziness, if I can write the code in such a way that each step follows naturally from what came before, "the code just writes itself!" Except now it's literally true :D
Maybe everyone already discovered this, but I find that if I include a lot of detail in my variable names, it's much more likely to autocomplete something useful. If whatever I typed was too verbose for my liking long term, I can always clean it up later with a rename.
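A contrived sketch of what I mean (all the names here are made up):

    from datetime import datetime, timezone, timedelta

    token_expiry_utc = datetime.now(timezone.utc) + timedelta(hours=1)

    # A terse name gives the completion model little to work with:
    t = (token_expiry_utc - datetime.now(timezone.utc)).total_seconds()

    # A detailed name carries the intent, so the suggested right-hand
    # side is far more likely to be what you actually meant:
    seconds_until_token_expiry = (
        token_expiry_utc - datetime.now(timezone.utc)
    ).total_seconds()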
Related: "If an AI agent can't figure out how your API works, neither can your users" (from my employer's blog)
https://stytch.com/blog/if-an-ai-agent-cant-figure-out-how-y...
Yeah, I've started to think AI smoke tests for cognitive complexity should be a fundamental part of API/schema design now. Even if you think the LLMs are dumb, Stupidity as a Service is genuinely useful.
Is this something you have implemented in practice? Sounds like a great idea, but I have no idea how you would make it work in a structured way (or am I missing the point…?)
It's a good tool to use for code reviewing, especially if you don't have peers with Strong Opinions on it.
Which is another issue: indifference. It's hard to find people who actually care about things like API design, let alone multiple people who check each other's work. In my experience, a lot of the time people just get lazy and short-circuit the reviews to "oh, he knows what he's doing, I'm sure he thought long and hard about this".
From a docs-writing perspective, I've noticed that LLMs in their current state mostly solve the struggle of finding users who want to participate in studies, are mostly literate, and are also fundamentally incompetent.
Thank you for sharing this, it's really helpful to have this as a top-down learning resource.
I'm in the process of learning how to work with AI, and I've been homebrewing something similar with local semantic search for technical content (embedding models via Ollama, ChromaDB for indexing). I'm currently stuck at the step of making unstructured knowledge queryable, so these docs will come in handy for sure. Thanks again!
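Concretely, the pipeline I've been sketching looks roughly like this (assuming Ollama's /api/embeddings endpoint and the chromadb Python client; the model choice and section contents are placeholders):

    import chromadb
    import requests

    def embed(text: str) -> list[float]:
        # Ask a local Ollama server for an embedding (assumes the
        # nomic-embed-text model has already been pulled).
        r = requests.post(
            "http://localhost:11434/api/embeddings",
            json={"model": "nomic-embed-text", "prompt": text},
        )
        r.raise_for_status()
        return r.json()["embedding"]

    client = chromadb.PersistentClient(path="./doc-index")
    docs = client.get_or_create_collection("docs")

    # Index self-contained doc sections, one chunk per section.
    sections = {
        "auth-errors": "401 responses include a machine-readable code field...",
        "rate-limits": "Clients receive 429 with a Retry-After header...",
    }
    docs.add(
        ids=list(sections),
        documents=list(sections.values()),
        embeddings=[embed(t) for t in sections.values()],
    )

    # Query: embed the question and retrieve the closest sections.
    hits = docs.query(
        query_embeddings=[embed("why am I getting a 401?")],
        n_results=2,
    )
    print(hits["documents"])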
Now people just have a better incentive :)
"GEO[0] has entered the chat."
We see a surprising number of folks who discover our product from GenAI solutions (self-reported). I'm not aware of any great tools that help you dissect this, but I'm sure someone is working on them.
0: Generative Engine Optimization
Honest question - what do you mean? What's the better incentive?
The documentation is now not just for other people, but for your own productivity. If it weren't for the LLM, you might not bother because the knowledge is in your memory. But the LLM does not have access to that yet :)
It's a fortunate turn of events for people who like documentation.
This is also the hilarious part of "prompt engineering".
It's just effective linguistics and speech; what people have called "soft skills" forever is now, for some reason, trying to become a science.
A really effective prompt is created by developing an accurate “mental model” of the model: understanding what tools it does and doesn’t have access to, what gives it effective direction, and what leads it astray.
Otherwise known as empathy
It's a bit different though; the soft skills you mention are usually realtime or a chore that people don't like doing (writing down specifications / requirements), whereas "prompt engineering" puts people in their problem solving mental mode not dissimilar to writing code.
(assumption / personal theory)
It makes the docs human-accessible, too. There are now projects converting Apple's JS-heavy documentation sites to markdown for AI consumption.
Reminds me of that Asimov story where the main character was convinced that some public figure was a robot, and kept trying to prove it. Eventually they concluded that it was impossible to tell whether they were actually a robot "or merely a very good man."
I can divide the suggestions into two categories:
1. Stuff that the W3C already researched and defined 20 years ago to make the web better: accessibility, simple semantic HTML that works with no JS, standard formats. All the stuff most companies just plain ignored or sidelined.
2. Suggestions to work around obvious limits of current LLM tech (context size, ambiguity, etc.).
There's really nothing to say about category 1, except that a lot of people already said all this and were practically mocked for it.
Regarding category 2, it's the first stage of AI failure acceptance. "Ok, it can't reliably reason on human content. But what if we make humans write more dumb instead?"
As soon as some widget in the corner of a site wiggles to get my attention, I leave. If you/they want people to actually read their articles they shouldn't try to distract readers as soon as they start.
> As soon as some widget in the corner of a site wiggles to get my attention, I leave.
Here's a bookmarklet I found on HN years and years ago. I have it bound to a hot key so whenever a web site does something stupid like that, I can dismiss it with a keystroke.
Works about 90% of the time, and doesn't require any installation of anything.
javascript:(function()%7B(function%20()%20%7Bvar%20i%2C%20elements%20%3D%20document.querySelectorAll('body%20*')%3Bfor%20(i%20%3D%200%3B%20i%20%3C%20elements.length%3B%20i%2B%2B)%20%7Bif%20(getComputedStyle(elements%5Bi%5D).position%20%3D%3D%3D%20'fixed')%20%7Belements%5Bi%5D.parentNode.removeChild(elements%5Bi%5D)%3B%7D%7D%7D)()%7D)()
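Decoded, the core of that bookmarklet is just this:

    // Remove every element on the page whose computed position is 'fixed'.
    (function () {
        var i, elements = document.querySelectorAll('body *');
        for (i = 0; i < elements.length; i++) {
            if (getComputedStyle(elements[i]).position === 'fixed') {
                elements[i].parentNode.removeChild(elements[i]);
            }
        }
    })();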
Thanks! I do have uBlock Origin and can typically get rid of these if I need to. It's just the frustration of going to websites that ostensibly want me to read something, yet see fit to destroy my focus as soon as I try to start.
I have a similar one that removes any fixed or sticky element. It works for popovers like newsletter subscription prompts as well, as long as they don't block scrolling.
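Something along these lines (a sketch in the same shape as the bookmarklet above, extended to also match sticky elements):

    // Remove every element whose computed position is 'fixed' or 'sticky'.
    (function () {
        document.querySelectorAll('body *').forEach(function (el) {
            var pos = getComputedStyle(el).position;
            if (pos === 'fixed' || pos === 'sticky') {
                el.parentNode.removeChild(el);
            }
        });
    })();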
Don't fall into the trap of the new SEO for AIs. LLMs are just like users. https://passo.uno/writing-for-llms-ai-chatbots/
Most of the things described in this document are also good for humans. The justification is different, but the result is the same.
Or, as I like to say: if current AI cannot explain your documentation well, then it's very likely humans can't either, and your documentation sucks.
I'd even argue AI can comprehend stuff better than the average human can, so if an AI can just about comprehend your documentation, it should be made even simpler for humans.
If that was true people wouldn't title their articles "writing documentation for AI".
That’s just SEO for humans ;)
Would you care to make an actual argument about why these things aren’t good for humans too?
Because they encourage the newest way for the 0.1% to rip off and monetize us. Write for humans, not bots!
That isn't addressing my claim: what about "use semantic HTML," "provide text equivalents for visuals," or "keep layouts simple," for example, makes docs worse for humans?
It's a problem because it's only coincidentally good now. It's like saying "let's not do crime, to please our supreme leader". No crime is better for people, but what if tomorrow pleasing the supreme leader means something else? We're already so accustomed to doing things to please the supreme leader, and hey, they used to be good for us before, right?
AI will make it easy to get your documentation up for your users!
> Step one, write the documentation yourself.
> Step two, bots hit your website hundreds of times per minute.
> Step three, users never come to your site, they use OpenAI's site.
> Step four, ??? openAI profits
If your software business relies on people coming to your site to read docs then you were cooked from the start. It's about enabling your users whether they're on your site, ChatGPT, or anywhere else.
A software business is a business because it has customers, not merely users. In this case, you're left with the users, while OpenAI takes them as customers.
An effect I've found of building agents that interact with an API is that the bot serves as a smoke test for error handling. If your API doesn't return a clear and actionable error, or doesn't validate JSON fields clearly or correctly, the bot will end up finding that out and getting confused.
I can imagine a near future where CRUD endpoints are entirely tested by an AI service that tries to read the docs, navigate the endpoints, and pick up any inconsistencies and faults.
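To illustrate the kind of error shape that keeps a bot (or a human) on track, here's a minimal hypothetical validator (the field names and codes are made up):

    def validate_payload(payload: dict) -> list[dict]:
        """Return machine-readable validation errors.

        A vague response like {"error": "bad request"} leaves an agent
        guessing; naming the field, the rule, and the fix does not.
        """
        errors = []
        if "email" not in payload:
            errors.append({
                "field": "email",
                "code": "missing_field",
                "message": "Field 'email' is required.",
            })
        elif "@" not in payload["email"]:
            errors.append({
                "field": "email",
                "code": "invalid_format",
                "message": "Field 'email' must be an address like user@example.com.",
            })
        return errors

    print(validate_payload({"email": "not-an-address"}))
    # -> [{'field': 'email', 'code': 'invalid_format', 'message': "..."}]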
I need to register kap.ai if it isn't taken already (ka pai being Māori for "Well done!").
Assuming the app in question is open source, which certainly not all of them are, why would the AI read the docs, if it can read the source?
Documentation communicates intent, design decisions, and usage patterns that are often implicit or scattered throughout the source code, making it valuable even when the code is available.
Code is the final implementation, you / an AI can read the code and explain what it does, but it can't explain why it was written or when it should be used without more context.
Docs function both as a summary of the code (functionality) and of its use cases. The day AI can infer both from code perfectly is the day AI can write the dependency on the fly.
Good docs don’t fix bad apps or APIs though. I get the sense that demand for docs is a signal that there’s a deeper problem with DX. Good docs generally only exist in places where they’ve given the rest of the DX enough love in the first place, so it’s more of a mark of quality than a means to quality.
Yes, and creating documentation is an exercise in understanding the whole experience. Often, nobody on the team truly gets it.
> This guide provides best practices for creating documentation that works effectively for both human readers and AI/LLM consumption in RAG systems.
What I'd be interested in seeing is best practices for creating documentation intended only for consumption by RAG systems, with the assumption that it's much easier and cheaper to do (and corresponding best practices for prompting systems to generate optimal output for different scenarios).
How do you turn off dark mode on that site? Hurts my eyes
Thanks for the feedback. We should definitely add that. :)
In Firefox, Reader View (F9) seems to handle it well.
A bit off topic, but I've been finding myself writing "plan.txt" files for Claude Code.
1. Write plan
2. Ask Claude to review for understandability
3. Update as needed until it's clear
4. Execute the task(s) in the plan
I'm finding Claude gets much further on the first pass. And I can version the plans.
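For flavor, a hypothetical plan.txt in this style (the contents are invented for illustration):

    Goal: add rate limiting to the /search endpoint.

    Context:
    - Middleware lives in src/middleware/.
    - Limits are configured in config/limits.yaml.

    Steps:
    1. Add a token-bucket limiter keyed on API key.
    2. Return 429 with a Retry-After header when the bucket is empty.
    3. Add tests covering the limit boundary.

    Out of scope: per-user quotas, admin overrides.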
Why aren't you interested in learning to code instead?
Lol. I have 25 years of experience coding and am a CTO, I'm interested in coding faster, and focusing on higher-order problems.
I thought one of the use cases of AI was to write documentation? And I thought AI adapts to humans; now it seems it's the other way around.
AI doesn't adapt to anything. Training creates a fixed model that doesn't change until you replace it.
I wish companies would invest more in docs. It's too hard to keep the quality high if it's just another thing for engineers to do. I've seen too many cases where a small group invests lots of time and effort bringing the docs up to standard, and then another person or group comes along and starts dragging down the quality because they can't be bothered taking the time to see how and where their information fits and ensuring the formatting and styles are maintained.
Eventually the quality drops to such a level that some poor bastard spends their time bringing it all back up to standard - and the cycle repeats.
The most important characteristic of any internal documentation is trust. People need to trust it. If they trust it, they'll both read it and contribute to it. If they don't trust it they'll ignore it and leave it to rot.
Gaining that trust is really hard. The documentation needs to be safe to read, in that it won't mislead you and feed you stale information - the moment that happens, people lose trust in it.
Because the standard of internal docs at most companies is so low, employees will default to not trusting it. They have to be won over! That takes a lot of dedicated work, both in getting the documentation to a useful state and promoting it so people give it a chance.
Excellent advice. Good documentation makes a huge difference in AI-assisted software development.
Makes web scraping easier too.