The case for continuous documentation (virtuallifestyle.nl)
106 points by morchen on June 6, 2021 | 70 comments


I'm adamant that the documentation for a project should live in the same repository as the code itself. This is crucial for a number of reasons:

1. If the docs are in the same repo, a commit that changes the code can update the relevant documentation (in addition to the tests) as part of the same unit of work

2. This means it can be enforced during code review: if a developer forgets to update the docs they can be reminded before they land their PR

3. This also provides a version history for the documentation which is synchronized with the code history. This is really useful when looking at history and trying to figure out what changed when.

4. This also works great with branches, PRs and releases. New features can have their documentation developed alongside the code in a branch, which makes it easier to understand a proposed change. If your software is deployed in multiple places as multiple versions (or even just staging vs production) you have a way to view the correct documentation for each individual deployment.

5. Added together, all of this builds trust. A common problem I've seen with internal documentation is that no-one trusts it to be up-to-date. Making it part of the regular code development lifecycle can fix this.

6. If you do this, you can write automated tests that enforce aspects of your documentation! I call these documentation unit tests, and wrote about them here: https://simonwillison.net/2018/Jul/28/documentation-unit-tes... - even something as simple as a test that fails if a new API endpoint isn't mentioned in a markdown file using simple string matching can ensure no-one forgets about the docs when they add a new feature.


> a test that fails if a new API endpoint isn't mentioned [...]

That's a great idea. A related thought I had was testing assertions about architecture by looking at the graph of imports or calls.
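
A rough sketch of what such an architecture test might look like in Python, using the standard `ast` module to collect imports. The layer names (`web`, `storage`) and the sample source are invented for illustration:

```python
import ast


def imported_modules(source):
    """Return the set of top-level module names imported by a source string."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 means an absolute import; relative imports are skipped.
            found.add(node.module.split(".")[0])
    return found


# Hypothetical rule: the "domain" layer must not import the "web" layer.
DOMAIN_SOURCE = """
import json
from storage.models import Order
"""

FORBIDDEN = {"web"}


def test_domain_does_not_import_web():
    assert not (imported_modules(DOMAIN_SOURCE) & FORBIDDEN)
```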

From your blog post,

> if a change doesn’t update the relevant documentation, point that out in your review!

What you get at but don't seem to state explicitly is that finding the relevant documentation is hard, for both the author and reviewer. Finding any relevant documentation should be easy, but ideally we're finding all relevant documentation, and we need to be reaching sufficient confidence that we're missing little enough that we're able to hold back entropy enough to keep the docs useful.

Your tools address this for some cases! Doctest addresses this for other cases. From TFA here, it sounds like Swimm.io tries to address this for more cases (my gut says the article oversells it but I intend to look more closely).

To get further, an idea I've been toying with is to treat claims (implicit or explicit) in documentation as requiring citation, pointing not at sources but at tests. Ideally the test runner, when a test fails, can then surface all references to that test. In addition to highlighting portions of the docs that may need to change, this seems likely to also provide crucial context when fixing the code and/or the test.
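
To make that concrete, here is a small sketch under a big assumption: claims in the docs carry a marker such as `[test: test_name]`. That marker syntax, the file names and the test names are all hypothetical; no existing tool is implied.

```python
import re
from collections import defaultdict

# Hypothetical citation syntax: a claim in the docs ends with "[test: test_name]".
CITATION = re.compile(r"\[test:\s*([\w.]+)\]")


def index_citations(docs):
    """Map each cited test name to the (file, line number) pairs that cite it."""
    index = defaultdict(list)
    for filename, text in docs.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            for test_name in CITATION.findall(line):
                index[test_name].append((filename, lineno))
    return index


def affected_docs(failed_tests, index):
    """Return the doc locations worth reviewing for a list of failing tests."""
    return {name: index.get(name, []) for name in failed_tests}


# Made-up docs for illustration.
DOCS = {
    "auth.md": "Tokens expire after one hour. [test: test_token_expiry]\n"
               "Logins are rate limited. [test: test_rate_limit]",
    "api.md": "Expired tokens return 401. [test: test_token_expiry]",
}
```

A test runner hook could call `affected_docs` with the names of the failing tests and print the resulting file and line pairs next to the failure report.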


How do you read the in-repo documentation? Search for all files named readme.md? I have never learned about a library from documentation scattered about the repo. There's the readme at the root, and everything else is on a web page, which is a better way to organize and browse documentation.


I use documentation systems that publish the documentation from the repo to a website. Most of my projects use Sphinx and reStructuredText for this, but I recently tried MyST (Markdown for Sphinx) and I like that a lot.

Some examples:

- https://docs.datasette.io serves documentation from https://github.com/simonw/datasette/tree/main/docs - which has documentation unit tests here: https://github.com/simonw/datasette/blob/main/tests/test_doc...

- https://sqlite-utils.datasette.io/ serves from https://github.com/simonw/sqlite-utils/tree/main/docs - unit tests here: https://github.com/simonw/sqlite-utils/blob/main/tests/test_...

- https://django-sql-dashboard.datasette.io/ serves from markdown in https://github.com/simonw/django-sql-dashboard/tree/main/doc... - I don't have documentation unit tests for that yet

Those three are all hosted on https://www.readthedocs.org but I've also used this trick on web app projects that host their own documentation deployed as part of the build process.


This is a good approach, and reStructuredText is well suited for technical documentation. The only downside I see in reST is that it is mostly only readable from the standard implementation in Python, which is a custom parser, not a portable grammar that other languages could implement with any parser tool or library. There are some libraries for parsing it, but last I checked those were incomplete.

Emacs org-mode would be a great candidate as well, for things like runnable code inside the documentation as examples and export to many other formats and nice tooling for viewing the files. Unfortunately many git hosts do not support it well and render crappily.

I would recommend against switching to a non-standard Markdown dialect. Why switch away from reST, which offers all of the things necessary for a good technical documentation, to some Markdown dialect, which has many important features only bolted on?

That said, one thing I noticed happening with this approach is that people think "Oh, I have my documentation in my comments of the code! I don't need to write anything else!" And then I end up reading documentation like: "def get_a(): ..." Docstring or comment: "Get a." Wow, thanks, how helpful!

In short: There is no simple way to have good documentation, except for writing good documentation. Docstrings probably will not be sufficient, unless you write whole novels in your docstrings. Good documentation needs usage examples and the rationale for why something was done in a specific way, what kind of gotchas there are, and probably other stuff that does not come to mind right now.


"Why switch away from reST, which offers all of the things necessary for a good technical documentation, to some Markdown dialect, which has many important features only bolted on?"

Honestly, the main reason is that I've encountered developers who have an almost allergic reaction to rST - they genuinely hate writing in it, and will be deterred from writing documentation if they have to figure it out.

Custom Markdown flavours are more likely to get buy-in.


I see. Sometimes it seems that there is some kind of animosity towards any non-markdown format, as if markdown was the one and only. Ha, so far from it ... But everything else must be eradicated with some kind of hostility, it seems. The people exhibiting this kind of behavior often do not know other formats well, nor have they bothered to use another format for a while to find out what it can express. It is like children who do not want to give up their favorite toy, even when there is a need for something more capable to write technical documentation.

I have used Markdown, Pandoc Markdown, probably GitHub Markdown, probably other dialects of Markdown and I kept missing features for writing documents.

I have written a thesis in rST and it worked very well, even though Pandoc at the time did not understand rST citations and arbitrary document-internal references (neither of which standard markdown has) well enough. I had to write my own pre-parser for it. I could not have written that thesis in some markdown dialect incapable of expressing things that one simply needs in academic writing. There was Pandoc Markdown, but it felt bolted-on and rather ad hoc in comparison to rST, which brought all the things out of the box.

I remember also looking at AsciiDoc at some point. Not sure why I did not choose it.

Then I discovered Org-mode. It has been a journey to uncover more and more of org-mode's capabilities. Org-mode does not suffer from the same ecosystem split as markdown does. It does not have the bolted-on feeling that many markdown dialects leave me with. It usually has everything I need. In fact, I have written technical documents in it as well and have discovered things like literate programming. The tooling in Emacs is so good that it is really nice to work with org files. It is good that I can still export to primitive markdown for anyone not knowledgeable in org-mode, even though org files are merely human-readable plain text documents. It is unfortunate that VCS hosts are unable to render them properly.

I would really dislike going back to markdown only. It feels so limiting to me now that I have developed an aversion to writing documentation in an unsuitable format like markdown (talking standard markdown, probably CommonMark). If there is already markdown-only documentation, then well OK, I guess I can write it, although it surely will not feel great.


> Sometimes it seems that there is some kind of animosity towards any non-markdown format

Can't speak for anyone else, but for my part there is animosity towards any format, period. Documentation should be plain ASCII - or, when strictly necessary[0], UTF-8 - text, and readable as such with only human-generated ad hoc syntax such as *emphasis* or

  +------------------+
  | ascii-art tables |
  +---+--------------+
  | + | corner       |
  | | | side of cell |
  | - | bottom/top   |
  +---+--------------+
or http://links.to/whereever.written-literally. To the extent that markdown is tolerable, it's because it does not demand any obfuscation in how documentation is written.

Admittedly, I'm not necessarily a representative example of anything.


That is an interesting approach as well. What I see as advantages are: that there is no need for a specification and no need for it to be rendered, as one is to view it as plain text.

What I see as disadvantages are: It is not possible to render it properly or have support for it beyond showing plain text. Many users might write things differently from each other, which might affect how easily a reader picks up each document's style or conventions.


> It is not possible to render it properly or have support for it beyond showing plain text.

Sure you can, and indeed that's what markdown was originally supposed to do (at least as it was first described to me). The catch is that you have to treat the plain text as the authoritative version: you do not edit the plain text in order to produce changes in the rendered document; you edit the plain text for its own sake, without considering the effect on the rendered version, and the rendered version is derived from that.


> when strictly necessary[0]

Missed this and too late to edit.

0: Which historically meant "not stupid-quotes", with an addendum for stupid-dashes, stupid-ellipses, etc, but now also includes "not emoji".


I agree wholeheartedly.

The most annoying thing about Markdown is that most of the time (90%, I'd say) when it has added some kind of formatting, I was not trying to format anything at all.


This is definitely the best approach in my opinion, providing the people writing the docs are capable of contributing directly.

One of my projects[0] builds and deploys a static documentation site[1] on every push to master. The static site generator (Nanoc, in this case) imports the library and uses it to publish its own documentation. All the examples are snippets of code[2] that are both displayed as-is and eval'd into the final output.

The guide can never be out of sync with the library.

[0] https://github.com/dfe-digital/govuk_design_system_formbuild...

[1] https://govuk-form-builder.netlify.app/

[2] https://github.com/DFE-Digital/govuk_design_system_formbuild...


In the PostgreSQL codebase they have readmes scattered about and it’s great to jump into some subfolder and get the details you need in the right context.


If you write your documentation in Emacs org-mode, you could use include [1] to include files on other levels of your repository and then you could export it all to a markdown file for people who do not know about org-mode and Emacs. This would make documentation anywhere in the repository discoverable / automatically included in your resulting documentation file.

[1] https://orgmode.org/manual/Include-Files.html


The webpage can be generated from markdown files which can be in the repo.


Static site generated from the repo. Can be hosted locally or online. GitHub Pages is usually used for this use case.


If using Django there are tools like django-docs (https://django-docs.readthedocs.io/en/latest/) and the recently released django-sphinx-view (https://noumenal.es/django-sphinx-view/).


You can use the CI to publish your md files with tools like https://docusaurus.io/


Write the documentation as text files.

Include hashtags for topics, e.g. #authentication, #language, or #netscape.

Use the text file indexing and management system to browse and update the documentation.


Check out Backstage, and in particular, TechDocs: https://backstage.io/docs/features/techdocs/techdocs-overvie...


None of this works if the programmer isn't the same person that writes the docs. E.g. if you have a copy-writer come along and write/update the docs before each release, then it's not captured in the same commit/branch/etc.


That can still work, in a couple of ways.

You can have the programmer write bad documentation and file an issue for it to be improved. You can then enforce that releases don't go out until those issues have been resolved by the copywriter.

You can also implement new features in a branch with multiple authors. The branch doesn't get merged until the documentation is in good shape.


It still works ok, just not as nicely.

The copy editor updates the repo and while their changes won’t be in the same commit, they should be nearby.

So you still get the benefit of docs history, and being in the same place.


I think you could also make a case that if you've adopted these strict documentation requirements you don't decouple the functional commits from the documentation commits. Your copy editor could work in the same branch as the code changes and then you only approve the PR when the documentation is at the same level of QA as the code. Otherwise I fear separate commits or branches are the thin end of the wedge.


Sure it does, developers work with domain experts on features all the time. Perhaps you had SDETs adding some test infrastructure, or designers adding assets and layout information. Either you can have feature branches, or stubs/scaffolding for CI. I've seen both work.

The biggest issue I've seen is management and product, who seem allergic to the actual repos for some reason.


> I'm adamant that the documentation for a project should live in the same repository as the code itself. This is crucial for a number of reasons:

I have a small disagreement on that. I always feel documentation should be outside of code as it might need to be reviewed by people who do not have access to the code base. Then there is also the "Why are we doing this" part of documentation that is difficult to mark out in code. Documenting in code is great for addressing the "how are we doing this" part IMO.


The docs may live in the repo but they should definitely be published somewhere that non-GitHub users in the organization can view.

The higher level strategic stuff can absolutely live elsewhere - in my experience Google Docs or some kind of company-wide wiki often come into play here.


I was thinking more of an enterprise situation where multiple systems might be working together as part of a process. The code for each system might be in different repos in which case the documentation would also be broken. In such a context having the unit level documentation in the repo makes sense, but the overarching process documentation that is constantly evolving and referenced by multiple teams cannot be part of this repo.


I assume the OP meant, or at least how I took it and implement it, it can be in a dir called docs/ off the root of the tree. It doesn't have to be literally scattered about in the actual source, just in the same repo.

Though there are reasons to do both (some docs from the source itself, and some from docs/), but that's a different debate.


Is there any lib/way to automatically generate docs for an API/endpoint? I think it is possible to create such a generator for a GraphQL API.


You can check out Swagger and OpenAPI; there are libraries to annotate the endpoints in the code and then generate interactive docs out of that.
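
To give a feel for the idea without picking a particular library, here is a hand-rolled sketch using only the standard library. The `endpoint` decorator, the handlers and the paths are all hypothetical; real OpenAPI tooling does far more (parameters, schemas, interactive UI):

```python
import json

ROUTES = []  # filled in by the hypothetical @endpoint decorator below


def endpoint(method, path, summary):
    """Annotate a handler so it can be listed in the generated document."""
    def decorate(handler):
        ROUTES.append({"method": method.lower(), "path": path, "summary": summary})
        return handler
    return decorate


@endpoint("GET", "/users", "List users")
def list_users():
    return []


@endpoint("POST", "/users", "Create a user")
def create_user():
    return {}


def openapi_document(title, version):
    """Build a minimal OpenAPI 3.0 document from the annotated handlers."""
    paths = {}
    for route in ROUTES:
        paths.setdefault(route["path"], {})[route["method"]] = {
            "summary": route["summary"],
            "responses": {"200": {"description": "Successful response"}},
        }
    return {
        "openapi": "3.0.3",
        "info": {"title": title, "version": version},
        "paths": paths,
    }


print(json.dumps(openapi_document("Example API", "1.0.0"), indent=2))
```

The resulting JSON can be fed to a viewer that understands OpenAPI, so the interactive docs are regenerated from the code on every build.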


If it's technical docs for developers, you'll get more bang for your buck by making executable documentation first - tests, deployment automation, build automation. Make it so that each logical action needs only one step.

How do I build this? Run the build command.

How do I test this? Run the test command.

How do I run only the unit tests? Run the unit test command.

How do I start this locally? Run the start-local command.

How do these components interact? Run the contract-test command.

...

That fixes "reference" type docs better than any reference doc. There's still a place for technical guides around a code base, but short screen recordings voiced over by an experienced dev on the project navigating their IDE will beat any written guide on any metric (time to write, usefulness etc.).

If it's customer facing docs, treat them as code and host them inside the application in some way. There are few things worse than reading the wrong version of a doc.


GitHub have a pattern for this called "scripts to rule them all" - https://github.com/github/scripts-to-rule-them-all - I've not fully adopted it yet but I probably should, it looks very well thought-out.


We just abuse Make, so make test, make bootstrap, etc.


If you essentially have one live version of your program, like you do when it runs on your servers, coupled documentation is probably a good idea.

If you have multiple active versions, like you often do with software released to run on someone else's machines, coupled documentation "works," but has a big downside. Namely, it prompts people to "refactor mercilessly"/change everything all the time, together with the documentation.

When you need to maintain multiple versions at a time, having a single version of the document explaining the differences between all the live versions can somewhat curb the enthusiasm for gratuitous changes (since whoever does the changes must also maintain the increasingly long and ugly description in the single document describing all the versions.) And someone needing to work with all those versions has these differences nicely laid out and those areas not having differences also clearly visible. Whereas with multiple versions of the document you need to "diff" these versions if you want to build a mental model of what changed.

Sadly (for those agreeing with this), I presume that the above is a minority opinion.


I think this argument makes perfect sense. Personally I don't like in-line documentation. It's mostly popular among people who depend on bulky proprietary IDEs. I hate all these approaches which try to trick people into using proprietary tech.

I enjoy reading a nice documentation website maintained by the open source organization; it also gives me a touch-point with the organization which created the library. I also agree with your nuanced argument concerning incentives. Arguments related to incentives are almost always discarded by managers but they are very important.

I do think decoupling the documentation encourages people to think about documentation more carefully as a distinct and important activity. I find that in-line documentation tends to be neglected; as a developer, when you're in the middle of coding an important feature which requires your full attention, you don't want to be distracted with updating the comments all the time because it breaks your train of thought. Usually developers tell themselves that they will do it later and they often forget. Comments are often neglected in the PR review process too.

There is no way around it, you need to set aside some time to write or update the documentation as a distinct activity. There is a time for coding and there is a time for explaining.


Can you not just not write the documentation, even if it resides in the same repository, and then later make a commit to update it, as a separate task?


Yes, but why take up space in the actual source code and repo? It increases disk space usage for the library (which means more download time) and requires more scrolling when reading the code.

Also, if the library is a sub-dependency which the developer doesn't interact with directly, why should they download the documentation for it? They will never read those comments in the code anyway.


I see the value in internalizing the cost of breaking changes. But rather than just suffer a doc burden to discourage change, why not fix it with something like Stripe’s version conversions?


I'm a fan of having code samples in the documentation, and making sure (at e.g. build/test time) that those samples actually work. Given the headline, I thought the article would talk about this, but it's more of a general "why and how you should keep your documentation up to date".
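
In Python the standard `doctest` module does this. A minimal sketch, where `slugify` is just an invented example function and `failed_samples` is a small helper written for this illustration:

```python
import doctest


def slugify(title):
    """Turn a title into a URL slug.

    >>> slugify("Hello World")
    'hello-world'
    >>> slugify("  Continuous   Documentation ")
    'continuous-documentation'
    """
    return "-".join(title.lower().split())


def failed_samples(func):
    """Run the samples in func's docstring and return how many failed."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(
        func.__doc__, {func.__name__: func}, func.__name__, None, 0
    )
    return doctest.DocTestRunner(verbose=False).run(test).failed
```

A build step can fail whenever `failed_samples` returns a non-zero count. For whole files, `python -m doctest` runs the same kind of check from the command line.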


The product actually makes sure code samples stay up-to-date when the code changes.


I just saw their demo. Fresh approach.


At the least - your repo should be the main gateway to a proper WIKI. The problem with decoupled documentation is that it's the proverbial tree in a forest - no one knows it's there when it "drops".

Docs are like code - the less you write of it, the less you have to maintain. Documentation should be treated as inherently evil. The only thing worse than no documentation is documentation that is not maintained and out of date. There is nothing more infuriating than following the docs only to find out from someone later that it was antiquated. Why is it there?

There are common sense rules. Why would you have the docs on how to set up a fresh checkout NOT live with the checkout? How would I know it lives somewhere else? How would anyone update those steps if those steps were not code?


Yes, "Wrong documentation is worse than no documentation."


An example of how to do this is the documentation for the PHP framework Symfony. Code examples from the documentation are run in the CI server. If a pull request breaks a code example, then that example must also be fixed as part of the pull request.

That's a fantastic feature for a popular open-source framework, as it means the documentation remains up to date.

I'm not sure at what point it's worth the effort for an internal project, though. If you have a cultural problem with incorrect, out-of-date or missing documentation this could make things worse. I'd look for the root cause of that first (training, motivation?), before trying to enforce it with technology.


Testing your examples from documentation is actually a great idea.

As a security tester, I can't even count the number of times I've gotten API documentation with omitted info like perhaps-trivial-to-them-but-blocker-to-me how to actually authenticate against the API. If they actually had it running somewhere, that sort of thing can't be missing. (Also, the number of times I've asked for API docs in an API-only test and they go "umm, let me task someone to write that real quick"... like, what did you think I was going to work with, balloons and thin air?)

When appropriate, I'll definitely be recommending clients to include their documentation examples in testing. But they will probably ignore it like the rest of our non-high-risk advice (today's 'low' findings are tomorrow's stepping stones for ransomware).


There are still unsolved problems with documentation that I'd like to find solutions for.

Everything can be made into code, but at a certain point it's just so complicated to do that you're spending more time and money automating your docs than your applications. So until we have solutions for all that, you will have to maintain some docs manually.

For those manual docs, how do you keep them fresh? I've thought of automatically sending an email to warn that in 30 days the document would be deleted unless someone updated it, but even if people agreed to such a system, they could just update some punctuation and it would remain stale. Even blank pages, people seem to want to keep around rather than fix.

How do you navigate your docs? Search engines actually suck for the most part. Search is a hard problem to solve, and a home rolled search will usually net terrible results. On the other hand, most people don't have the time to maintain a governance structure for their docs, much less an enforcement mechanism, so the docs invariably become terribly organized.

People also seem to need training to learn how to write good docs. I know there are some trendy pages being passed around about some kind of "golden framework for docs" but they don't explain how to write them either. I know how to write docs, but I feel like I'd need to write a whole book to get it across. One thing I found really useful was Atlassian's newer Confluence page templates, which come well organized and primed with examples of how to write the docs.

As a philosophy, I really, really love GitLab's Handbook First model. Their handbook is incredibly detailed and covers pretty much their whole organization, and is fairly easy to update. I feel like this is one of the magical missing links in getting more documentation for the important things that aren't code.


I believe this to be one of the success stories of my programming language, MethodScript[0]. Early on I made the strange decision (in the sense that I’ve never seen it elsewhere) to make the documentation for each api element be part of the code itself. The documentation generator is part of the code as well, so every single build of the software is capable of generating bespoke documentation for that exact version. The website simply hosts the newest version, but you can always generate your own locally.

Of course I also enforce that contributors must add/modify documentation at the same time as the code, but that’s easy, because if you modify most of the code, the documentation is also right next to it.
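
For readers who want the general shape of the idea, here is a tiny Python sketch. It is not how MethodScript does it; the functions, the `API` list and the `VERSION` value are invented, and docstrings stand in for documentation that lives with each API element:

```python
import inspect

VERSION = "3.1"  # hypothetical version of this particular build


def add(a, b):
    """Return the sum of a and b."""
    return a + b


def largest(a, b):
    """Return the larger of a and b."""
    return max(a, b)


API = [add, largest]  # the API elements this build exposes


def generate_reference():
    """Generate reference docs for exactly the code in this build."""
    lines = [f"API reference for version {VERSION}"]
    for func in API:
        signature = inspect.signature(func)
        lines.append(f"{func.__name__}{signature}: {inspect.getdoc(func)}")
    return "\n".join(lines)
```

Because the generator ships with the code, any checkout can produce the reference for its own version.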

[0] https://methodscript.com


Documentation is great and all, but no one ever talks about when there's too much. Maybe because it's rare? At big companies I've seen "architects" churn out page after page of diagrams, design docs, runbooks, checklists, descriptions, etc. There is so much information that it becomes practically useless in aggregate, because no one is reasonably going to read it all. I'm not going to pretend that I have the patience or the attention span to have the gnostic mysteries of our Kubernetes infrastructure revealed to me.


Nothing a good search engine and proper requirements can't fix.

In theory it should work like this:

I) "User should be able to do X" <- requirement

II) "X can be achieved by performing steps A, B, and C" <- description of the implementation (high-level, user-perspective)

III) "A works by using components 1 and 2" <- technical documentation (design-level, architecture perspective)

etc.

I) generates your index (what can I even do with the software)

II) generates the documentation (how can I do it)

III) and below is for technical use only (extending, modifying, porting)

Stuff like rationales for design decisions can be structured in the same layered way.

I don't know how something like this can be extracted after the fact, but no matter the development model (waterfall/agile), a structure like this should arise naturally anyway and the absolute amount of documentation isn't a problem. Lack of proper structure, however, is.


There is no such thing as too much documentation. There is out of date documentation, inaccessible documentation, unindexed documentation, poor documentation, redundant documentation, etc... But what you described is amazingly valuable. Just because it's not valuable to you, right now, doesn't mean there's too much of it.


In such cases I wish for a "cookbook" style of additional documentation that I can search to find examples for doing what I want to do.


Someone is promoting swimm.io in a sort of sideways way? I sort of agree with the author, but the only solution presented is this unknown product.


I couldn't find any other product that solves this basic problem, happy to hear about them if they exist.



This is also the goal of Read the Docs. We even use the same wording in our docs: https://docs.readthedocs.io/en/stable/

The Python ecosystem has been building docs this way for over 10 years now, and it works great.


The article seems to skim over the complexities of docs workflows and the role of docs. In fact, docs are nowhere to be found throughout the article. What are those “docs” that the OP is talking about?

Perhaps it’d be fair to rename that article “The Case for Continuous READMEs”.


I don't agree with this at all. There are many projects such as Node.js which have excellent, up-to-date documentation on their websites (for all past versions too). This is good for the Node.js project because it forces developers to visit the website, which gives the open source project an opportunity to connect with their developers, to potentially monetize and stay independent.

On the other hand, in-code documentation is hard to follow because it's scattered all over the source code, relies on special IDEs (more corporate lock-in) and developers often forget to update the documentation anyway (even more easily than they would forget to update the website). Not to mention that it takes up a LOT of space and requires more scrolling; IMO this has a negative impact on the readability of the code. Well written code is simple enough that it doesn't need much in-line documentation.

I don't know why, but these days, when it comes to software development, I find that I disagree with 90% of all the top links that make it to the top of the HN front page. A lot of the practices which are being advocated are inefficient, bureaucratic and they seem to align with corporate interests as opposed to developer interests.

The agenda seems to be about making developers more reliant on proprietary tools, IDEs, subscription SaaS services - All at the expense of free software principles.

There is also an agenda around making developers more reliant on teams and less independent in the software development process. I remember coming across some outrageous claims such as "Good full stack developers don't exist". Also there is a push towards monorepos and other corporate structures which limit the degree of possible decentralization and autonomy of different projects and their dependencies. The shift towards static typing is also part of the trend towards centralization, de-modularization and high inter-dependency with proprietary tools and services.

It's kind of ironic that tight coupling used to be considered one of the main signs of low-quality code but this concept is barely mentioned these days and the agenda is to promote it without saying outright what is going on.


Ever wondered why the most popular package managers which have the most modules are all for dynamically typed languages? e.g. npm, RubyGems, pip... It's because dynamically typed languages are more modular since they have less rigid interfaces. With statically typed languages, there is a possibility that the type system of library Y might not correspond very elegantly with the type system of your own project X. Static typing requires stronger coupling between the project and its libraries; it's typical that projects written by different teams will follow completely different typing conventions and names (for many different reasons); this adds friction.

A very common one is when a library was written before some new Type/Interface was introduced as part of the core language and the library had invented its own abstraction which does the same thing... So the interface exposed by the library became redundant. Statically typed libraries require a lot more maintenance and this may also explain why companies are increasingly pushing for a monorepo structure, which facilitates this constant maintenance that would have been unnecessary with a dynamically typed language.


I completely agree that documentation should be part of the CI/CD and that it should be part of the code.


The only way for documentation to be part of CI (IMO) is for missing documentation to cause failed builds. There are a few ways I could think of to enforce this but is there anything off the shelf that does this?


I've been doing this for nearly three years now - it works really well.

It's not particularly sophisticated - just some tests which introspect the code and then use dumb pattern matching against the documentation text to check that different concepts from the code are mentioned at least once in the docs: https://simonwillison.net/2018/Jul/28/documentation-unit-tes...


Are you the guy who wrote this? Would love to interview you for my podcast.


Literate programming is probably the best documentation


That does not solve describing flows or patterns in the code


Oh cool, so books and websites in general are “bad” now.


Not, by this metric, if they live in the same repo as the code. When they don't, they have the same problem as any strongly coupled systems maintained across multiple repos, or you are paying the cost of keeping the two uncoupled.


This is why projects with micro service architecture are best saved in a mono repo


I think the monorepo question is complicated, but this is generally a strong element in the pros column. One exception to that is when you want decoupling for other reasons, the pressures of multiple repos can help motivate it in the day-to-day.



