
The benefit of being blind: the screen reader announces invisible characters and I could detect the invisible variable.


The next time someone tries to tell me that a true screen reader should use computer vision and machine learning (including OCR) rather than requiring applications to implement accessibility APIs, I will bring up this case.


HN exchange:

"Why can't we just, you know, direct blind users to a special protocol that structures the data appropriately and then lets them parse it however they want?"

Me: 'We did! It's called HTML! Designers just broke it!'

https://news.ycombinator.com/item?id=20224961


IMO, HTML is still closer to that ideal than anything else we have. My guess is that given a random web application and a random non-web GUI (especially if the latter is multi-platform), the web application will be more usable with a screen reader.


I'd say markdown is even better than HTML for writing generic documents, since it enforces simplicity. In particular, it forces a linear document flow and has no support for things like JS.


And now many people are excited about throwing all of that away with canvas and web assembly.


Is it possible for a developer to make canvas accessible? For example, announcing "You are on a road that runs from left to right; there is a shop above you and an inn below you", like a MUD.


Real accessibility is about presenting the same information your other users have. So, instead of you typing the description, let each of the drawn objects have its own description and make them discoverable and navigable. I think Google was trying to make Flutter 2 components accessible, but it means starting from ground zero and building the same stuff anew.


Right! But we would not need to use canvas if updating the DOM were not super slow.

I suspect the #1 reason is the layout/reflow engine, but I might be wrong. Game engines do run physics at 60fps, which is harder than CSS reflow.


HTML could have been that - or rather, it was at first - but instead of creating a more specialized solution for running rich apps, we decided to exploit HTML.

Right now we are in what I'd call the worst of both worlds, because we rely on HTML to do things it wasn't designed for, and there's no longer purity in any HTML out in the wild.


Yep. And along with the ML screen reading, they are not offering the subsidized infinite battery life and machine learning hardware needed to run the inference model.


How hard is it to program while being blind? What sort of development do you do? I understand that frontend is impossible, but what other difficulties do you face?

Are indent-based languages like Python harder than bracket-based languages?


Hi,

Front end is not entirely impossible, but doing pixel-perfect designs is. Otherwise, I know blind people who do FE, though I'm not sure how much of it is professional.

Indent based languages are actually easier. Every screen reader has a way to announce indentation in code, while brackets could be confusing if not formatted or verbose if properly announced.

My main issues are dev tools with bad accessibility. It also takes me more time to get acquainted with new code, and homophones in the source code sometimes require extra attention. Filtering through logs is also a bitch in most cases. Besides the dev tools, you can summarize the rest as bad I/O speed.


Do you have some tricks for how you handle filtering through logs? Or some ideas if there could be a tool that could help you or mitigate your most critical issue[s]?

I found filtering through logs a major pain even as a fully sighted person, so I wrote a tool to help me with that, but it's fully in a "TUI" paradigm (i.e. curses-like), so I presume it wouldn't help you much (https://github.com/akavel/up). No promises, given that the tool as-is scratched my itch, but I am honestly curious whether something similar could reduce your PITA, including whether this specific tool could be made useful for you through some minimal effort on my side.


Hi,

Usually grep saves the day. I will check your tool, but what I need is a terminal command that can recognize the meta fields in a log record and put them on a line separate from the main message. Also, it must be installed everywhere I work, which is not so easy. Putting logs in a table with filtering capabilities might be best, but that means web access to the location of the logs, which is again tricky.


The idea of my tool is not really to help with one specific way of processing logs, but rather to make it easier to fiddle with grep and other Linux CLI filtering tools by shortening the feedback loop versus the normal shell REPL. It might sound strange that the shell REPL is, in my opinion, too slow, or that this matters at all, but I found it enough of a problem that I invented a way to speed it up, and judging from the reception the tool got, it hit a nerve with quite a few people. I can try to explain more if you are interested and/or don't really understand what I'm talking about.

I tried to explain the idea in the readme, but I honestly have no idea to what extent it will be understandable to you, not least because one important way I tried to convey it to fully sighted people is an animated GIF of a terminal window showing the tool in use. As someone said, with really innovative ideas it's often necessary to push them down people's throats to make them understood; the GIF is part of that effort, and it seems quite effective, but I'm assuming by default that it is, for obvious reasons, completely inaccessible to you. I was pondering just now whether copying the animation to asciinema could make it more accessible, but as of now I have serious doubts: I don't know whether the asciinema site itself is accessible to you, and I suspect the terminal ANSI sequences generated by the library I use are "tiniest diffs", so although the result might look indistinguishable to sighted people, a screen reader might (or might not?) take them at face value and read them as a mess of random jumps and single-character changes on the screen.

That said, I'm more than happy if you understood what the tool is doing and just don't think it could be useful to you, whether as-is or with some accessibility improvement attempts. Or if you don't understand it, but don't feel like diving deeper either.


> what I need is for a terminal command that can recognize the meta fields from a log record and put them on a line separated from the main message

Isn't this the exact use-case of structured logging?

Log events have

    {timestamp, log level, log category, string message, ...arbitrary key/value pairs}
Each event is usually serialized as a single JSON line in a file.

Since it's all on one line you can still use grep, and since it's machine-readable you can pipe the grep output to anything that can parse JSON. Vanilla python3 works and tends to be part of most ops toolkits. Such tooling can split the fields onto separate lines, or into a more reader-friendly format.
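As a rough sketch of that pipeline (the field name "message" and the "meta on one line, message on the next" layout are my own assumptions, not anything prescribed by a logging standard):

```python
#!/usr/bin/env python3
# Sketch: read JSON-lines logs on stdin and print each record as two
# lines: the meta fields first, then the message on its own line.
# The field name "message" is an assumption; adjust for your format.
import json
import sys

def split_record(line: str) -> str:
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return line.rstrip("\n")  # pass non-JSON lines through untouched
    if not isinstance(record, dict):
        return line.rstrip("\n")
    message = record.pop("message", "")
    meta = " ".join(f"{key}={value}" for key, value in record.items())
    return f"{meta}\n{message}"

if __name__ == "__main__":
    for raw in sys.stdin:
        print(split_record(raw))
```

Then something like `grep ERROR app.log | python3 splitlog.py` keeps grep's one-record-per-line property on the input side while producing more listener-friendly output.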


Yes, this has been my approach in many cases, but I don't always have a say over the logging format.


I've been struggling with eye strain and have considered trying to approach development in a fashion similar to that taken by blind devs. Any suggestions for guides or overviews on how I can get set up?


Hi,

it depends on what you are working on and what you want to do. Generally, screen readers are not as good for programming as they are for plain-text stuff, so they will be a limited substitute for whatever you are using now. If you are okay with working slower, they can help you listen through code and tools' messages, providing relief for your eyes.

If you are using Windows, NVDA is the screen reader. JAWS is a bit too expensive for my taste, without any significant edge over NVDA. The built-in Narrator is still immature in my opinion. VSCode has excellent accessibility, with a dedicated and involved team. Visual Studio also has extremely good accessibility support, though I'm not using it. IntelliJ sucks. Not completely, but enough that people do not see the benefit of using it. Eclipse is not popular these days, but as far as I know it has good accessibility as well. Sublime is not accessible.

If you are on Linux, the screen reader is Orca. It does not have the same level of support as the Windows stuff, but I know people who develop on Linux boxes, so it is doable. Emacs must be good enough, because it has a self-voicing plugin and people who like and use it. As far as I know, VSCode for Linux has some accessibility features, but I don't know how they compare to Windows.

If you are on Mac, your only choice of screen reader is Apple's VoiceOver. It is good but, to my knowledge, not always perfect. I know people who use TextMate, Xcode, VSCode, and Emacs, but I don't have much feedback from there. It is totally doable though.

On Windows, I'm also using Notepad++ as a secondary editor, because it is faster and works better for large files. It is also a good note-taking tool.

We can connect offline if you need some more info.


I am very interested in how blind developers work. I have been pondering how to make computers and development more accessible. If you don't mind:

Do you have preference between CLI, TUI, or GUI dev tools?

Is highly symbolic code harder to understand using a screen reader than plain language code? By symbolic, I specifically mean any characters that are not alphanumeric.


Hi,

I don't have preferences on the interface. As long as it is accessible, I can learn to work with it. E.g. VSCode makes every effort to keep their interface accessible, and they continuously fix any reported issues.

When it comes to code, verbose is better. Abbreviations take effort to decode. I can remap some symbols to have different pronunciations, but it does not always work. E.g. I've made the SR speak the ":=" operator in Python as "assigned from", but brackets have nesting and orientation, and too many of them get nasty to listen to or follow.


It would be really cool to be able to hook into where words start and end. Then you could add a background tone with its frequency rising in pitch with each indentation level (and maybe no tone at the root level).

Oh, if only speech engines broke down the utterance process and made it more open...


Hi,

Indentation level is a solved problem, and the start and end of words is also customizable behavior. Speech engines as a whole are open to customization. There are some problems, though, that are just not easy to solve at all. It is like with regular expressions and HTML. Hooking the SR up to the language server might be an avenue of possible improvement, but the problem definition on my side is currently too vague to formulate correctly.


Thanks for the reply!

I've just realized I assumed speech-to-text and text-to-speech were similarly complex and unincentivized toward open development. (I wanted to play around with augmenting speech-to-text for some time.) TIL.

So how is indentation level typically handled?

And what other types of customizations are typically leveraged from an output-device standpoint? (Maybe there's a reference I can google for?)

Comparing the problem space to regular expressions and HTML immediately makes sense, that's a very intuitive way of putting it.

I can relate to being completely stumped about how to replace missing functionality with software; in my case, organizing information (which is impaired because of autism). What does the problem space around the text-to-speech vagueness look like?


Screen readers have the benefit of being made of two parts. One of them is the "explorer", so to call it, and the other is a synthesizer. The explorer hooks into the accessibility services and APIs of the host system and produces a text representation of the objects discovered. The synthesizer receives the text representations and maps them to sound output.

The easiest way of customization is to get between those two parts and to convert the representation through some rules, regex for example. That's how my rule with the ":=" operator works.

Indentation can be done either by announcing the number of spaces/tabs at the start of the line, or by defining how many of a symbol make up one level and assigning each level a sound that is played when the level changes. There is an option for doing both.
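A minimal sketch of that "get between the two parts" idea, as a text-transform layer in front of the synthesizer. The rule table and function name are hypothetical, not a real screen reader API, and 4-space indentation is an assumption:

```python
# Hypothetical rule layer between explorer and synthesizer: rewrite
# the text representation before it is spoken. It mirrors the examples
# above: ":=" spoken as "assigned from", indentation level announced.
import re

SPACES_PER_LEVEL = 4  # assumption: 4-space indents

RULES = [
    (re.compile(r"\s*:=\s*"), " assigned from "),
]

def to_speech(line: str) -> str:
    indent = len(line) - len(line.lstrip(" "))
    level = indent // SPACES_PER_LEVEL
    text = line.strip()
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    # announce the indentation level only when it is non-zero
    return (f"indent {level}, " if level else "") + text

print(to_speech("        total := total + 1"))
# -> indent 2, total assigned from total + 1
```

In a real screen reader this kind of rewriting is done through its dictionary/regex facilities rather than a standalone script, but the shape of the transformation is the same.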

Screen readers have APIs for extensions, or scripts, for more complex functionality. You can check those of JAWS and NVDA for examples. The APIs are rather extensive, and they allow for lots of customization, like improving support for a given program or generally modifying the SR's behavior.


I see, TIL. Thanks for the explanation.

I was thinking/imagining more along the lines of being able to drill down into phoneme pronunciation: adding micro-pauses to certain syllables or pitch-bending them based on rules, for example, or getting a firehose of machine-readable annotations for a given utterance, including the exact start and end times/samples of individual phonemes in the audio stream. You could then mix your own audio track with additional augmentations into the final output, for example using your own synthesizer to modulate background tones representing the current indentation level. Yes, ridiculously complex; but by front-loading that complexity (and winning the data accessibility fights) it would be possible to do a lot of cool stuff...

I understand some people swear by JAWS as the generally best-in-class solution, which has admittedly put me off NVDA, as I feel I'd absorb a biased sense of what's possible or how audio output software works in general. I guess I should just install NVDA already, since it's the realistic option; if I started testing stuff in JAWS and talking about it, the only reasonable assumption people could make would be that I was using a copy that had drifted ashore from the high seas, which would be kind of true...


Depending on what synthesizer you use, you might be able to get into its internals. Keep in mind that each screen reader can use different synthesizers, so both JAWS and NVDA might use eSpeak, the Windows core voices, or something totally different.

In regard to the idea that JAWS is best in class, I'm inclined to disagree. JAWS might be a bit better in MS Office applications and UIA support, but I haven't used it for years. However, NVDA has the better web story, and until recently it was the screen reader that actually worked with VS Code.


I see, I'll have to have a deeper look. (I'm on Linux, so I think my options are espeak and possibly Festival.)

Thanks very much for the perspective on NVDA. I'll definitely have to give it a go! I've been interested specifically in Web accessibility for quite a while.


Thanks for answering. What is your favorite programming language to work in? If you could use any language you wanted, what would be your top pick?


Well, this is highly subjective. I'm paid to do Python, and Node.js from time to time, and Python really rocks for me. No small part of why I like Python more is its much better tracebacks. When read in a console, it is much more pleasant to have the erroring line at the bottom, which spares me copying the entire console into Notepad++ to try to find the top of it.

That said, I know many blind devs who do Java, C#, Swift, C++ and so on. I had bad experiences with IDEs when I was starting to study software development in those languages, and it has stayed with me, but that is not universal.

If I had the choice, I would not drop Python, but I might add one of the functional languages, or Rust, for the new ways of thinking they might teach me. So far I've looked at them, but I haven't done anything serious there.


Interesting, thanks for sharing!


Hey, that's cool. Thank you.


So, it's a backdoor that only the blind can see?


T￸h￸￸i￸￸￸s c￸o￸￸m￸￸￸m￸￸￸￸e￸￸￸￸￸n￸￸￸￸￸￸t s￸h￸￸o￸￸￸u￸￸￸￸l￸￸￸￸￸d￸￸￸￸￸￸n￸￸￸￸￸￸￸'￸￸￸￸￸￸￸￸t b￸e e￸a￸￸s￸￸￸y t￸o r￸e￸￸a￸￸￸d b￸y s￸c￸￸r￸￸￸e￸￸￸￸e￸￸￸￸￸n r￸e￸￸a￸￸￸d￸￸￸￸e￸￸￸￸￸r￸￸￸￸￸￸s


With NVDA on Windows, when I read the comment normally, it's spelled out. When I read it character by character, I get "symbol FFF8" for each of the hidden Unicode characters. And when I move line by line through NVDA's linear representation of the web page, the hidden characters count against the length of the line for the purpose of word wrapping.

Narrator's behavior is weirder. If I turn on scan mode and move onto the line with the up or down arrow key, Narrator says nothing. If I read the current line with Insert+Up Arrow, Narrator spells it out like NVDA does. When moving character by character, Narrator says nothing for the hidden Unicode characters. And because Narrator doesn't do its own line wrapping but defers to the application to determine what counts as a line, the text only counts as one line.
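For sighted readers who want to check a suspicious snippet themselves, a quick sketch (my own heuristic, not an NVDA or Narrator feature) is to scan for Unicode "format" (Cf) and unassigned (Cn) code points, which cover characters like the U+FFF8 used in the comment above:

```python
# Sketch: flag characters that render as nothing (or nearly nothing)
# but that a screen reader may still announce. Cf = format characters
# (zero-width space, BOM, ...); Cn = unassigned code points (U+FFF8
# falls here). A heuristic, not an exhaustive invisibility check.
import unicodedata

def invisible_chars(text):
    """Return (index, codepoint, category) for suspicious characters."""
    found = []
    for i, ch in enumerate(text):
        category = unicodedata.category(ch)
        if category in ("Cf", "Cn"):
            found.append((i, f"U+{ord(ch):04X}", category))
    return found

print(invisible_chars("pass\u200bword"))  # -> [(4, 'U+200B', 'Cf')]
```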

Disclosure: I used to work on the Windows accessibility team at Microsoft, on Narrator among other things.


It is difficult to see on an iPhone, but it sounds fine in VoiceOver.


yep, it is not.


The benefit of being sighted is being able to use accessibility features while also being sighted.

Take a peek at those technologies sometime; they improve work comfort for everyone.


Still, it would not occur to most sighted programmers to review code using a screen reader. To me, this is another argument for having a truly diverse team (or community, in the case of an open-source project); a blind programmer who's already involved with the project would catch something like this. So in this particular case, blindness is truly not a disability.


Being able to perceive BOM markers is tantamount to a superpower in programming.



