
> Isn't that awesome?

It depends. In a world where commands produce, expect, and consume well-defined and highly structured data streams, it is actually great. It works well in Windows, but only because scripting as a concept is relatively new to the Windows world.

In UNIX, however, it is usually a mess, and data extraction using the PowerShell approach would almost never work due to spurious characters appearing in the output of a command (for any reason, really), as the UNIX style is precisely this: «cobble things together, use a sledgehammer to make everything work and move on. If it ain't broken then don't fix it». This is why running the output through «sed», searching for stable string patterns to cut the interesting parts out, and then (optionally) running them through cut/awk/et al. is the Swiss army knife.

Life has become somewhat easier recently with the advent and more widespread use of JSON and YAML (and, to a certain extent, XML before them), as we now have jq, yq, dasel, mlr (Miller), xmlto etc. to capture, for instance, JSON-formatted output and do something with it in just the same way as is possible in PowerShell, whilst also retaining the extensibility (see below) without having to rely on the availability of the source code of the producing utility/app.
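For instance, a minimal sketch of the jq approach (the JSON document and its field names are invented for illustration; a real tool would emit its own schema):

```shell
# Hypothetical JSON output from some tool, simulated here with a literal.
json='{"interfaces":[{"name":"eth0","addr":"192.0.2.10"},{"name":"lo","addr":"127.0.0.1"}]}'

# Select by field name rather than by position, so extra fields or
# reordering in future versions of the producer do not break anything.
echo "$json" | jq -r '.interfaces[] | select(.name == "eth0") | .addr'
# -> 192.0.2.10
```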

> One of the nice things about it is that you can add stuff easily. You don't have to worry that every script that parses the output of your command will now break because you added an extra column or added extra functionality.

You can only add stuff easily if you control (own) the producer of the data stream. If the producer is a third party provided script/app you don't have the source code for, I believe you still have the same breakage problem, however PowerShell experts might want to chime in and correct me.



> In UNIX, however, it is usually a mess and data extraction using the PowerShell approach would almost never work due spurious characters appearing in the output of a command

Where would they come from?

> (for any reason, really) as the UNIX style is precisely this: «cobble things together, use a sledgehammer to make everything work and move on. If it ain't broken then don't fix it». This is why running the output through a «sed» and searching for stable string patterns to cut interesting parts out and then (optionally) running them through cut/awk/et al is the Swiss army knife.

If you're doing that a lot, the code tends to be fragile. If you use cut, for instance, it breaks the second the data you're working with changes. The program decided a column needs to be 5 characters wider? Now the stuff you're looking for is not there anymore.
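A quick illustration of the difference (the ifconfig-style line is made up):

```shell
line='eth0   192.0.2.10'

# Fragile: character positions are tied to the current layout and
# silently break if a column grows wider.
echo "$line" | cut -c8-18

# Sturdier: split on whitespace and take the second field, which
# survives a change in column width.
echo "$line" | awk '{print $2}'
# -> 192.0.2.10
```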

That's how you end up with "ifconfig is old, everyone switch to ip now". At some point a program's output may be parsed by so much stuff that any change risks breaking something, and it forces it to remain static for eternity.

> You can only add stuff easily if you control (own) the producer of the data stream. If the producer is a third party provided script/app you don't have the source code for, I believe you still have the same breakage problem, however PowerShell experts might want to chime in and correct me.

No, my point is that the producer is free to improve without risking the consumers. If your command that produced IPv4 addresses adds support for IPv6, it doesn't suddenly break every script that relies on precise lengths, line numbers and columns.

You can also take somebody else's returned data and add extra stuff to it if you want to, just like you could take a bunch of JSON and modify it.


> If the producer is a third party provided script/app you don't have the source code for, I believe you still have the same breakage problem

There are a number of ways to prevent that in PowerShell, and by default you usually have to do nothing. But if we imagine that a vendor removes some param you relied on in a newer version, you can keep old scripts working by providing a so-called "proxy function" that adds it back, layering new behaviour on top of the underlying function. The same can be done with objects and properties.
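For readers without PowerShell at hand, the same idea can be sketched in shell: a wrapper function restores an interface the underlying tool dropped (`realtool` and its `--upper` flag are invented stand-ins, not real commands):

```shell
# Stand-in for the vendor's new version, which no longer supports --upper.
realtool() { printf 'name=%s\n' "$1"; }

# "Proxy function": keeps the old interface alive by re-implementing the
# removed flag on top of the underlying command.
tool() {
  if [ "$1" = "--upper" ]; then
    shift
    realtool "$@" | tr '[:lower:]' '[:upper:]'
  else
    realtool "$@"
  fi
}

tool --upper alice
# -> NAME=ALICE
```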


> That's how you end up with "ifconfig is old, everyone switch to ip now".

Let’s not confuse Unix and Linux, shall we? (I agree that any given use of cut -c is probably wrong, but this is a weird conclusion. People just use awk.)


> Where would they come from?

Logging components immediately spring to mind as the worst offenders, especially the ones that receive data points over a network in heterogeneous environments. syslog running on flavour ABC of UNIX receives input from a locally running app that has a buffer overrun; the app has accepted a longer-than-permitted input and dumps the actual log entry plus all the trailing garbage, up to the first ASCII NUL, into syslog. syslog does not care about the correctness of the received entry and is therefore unaffected; it, say, diligently dumps it into a locally stored log file. The log parser is now screwed. I can think of similar examples outside log parsers, too, such as interoperability issues between different systems.

Granted, it has become less of a problem in recent years due to the number of UNIX varieties having gone extinct and languages and frameworks considerably improving in overall quality, but it has not completely disappeared. Just a couple of years ago, a sloppy developer was dumping PDF file content (in binary!) into the log file. The logger survived, but the log parser had a severe case of indigestion.

> If you're doing that a lot, the code tends to be fragile. If you use cut for instance, it breaks the second the data you're working it changes. Program decided column needs to be 5 characters wider? Now the stuff you're looking for is not there anymore.

You are absolutely correct. This is why I do not use «cut» and instead treat all columns as variable-length patterns that can be matched with a regular expression in «sed». That approach is immune to column width changes as long as the delimiters used as start and stop characters are known. «cut» is only useful when parsing fixed-length formats, such as SWIFT MT940/MT942, where the column width is guaranteed to remain fixed. Otherwise «cut» just overcomplicates everything and makes scripts prone to unpleasant breakages.
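A minimal sketch of the delimiter-based approach (the passwd-style record is invented):

```shell
# Pull the second colon-delimited field whatever its width: the regex
# anchors on the ':' delimiters, never on character positions.
line='user:alice:1001:/home/alice'
echo "$line" | sed -E 's/^[^:]*:([^:]*):.*/\1/'
# -> alice
```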

> That's how you end up with "ifconfig is old, everyone switch to ip now". At some point a program's output may be parsed by so much stuff that any change risks breaking something, and it forces it to remain static for eternity.

The cited reason to switch to «ip» was unrelated to parsing, if I recall correctly. But otherwise you are correct: the community has a proven track record of resisting changes in output formats due to the risk of breaking gazillions of cobbled-together and band-aided shell scripts.

> No, my point is that the producer is free to improve without risking the consumers.

This is not guaranteed. If the producer changes the content of the structure or merely extends an existing structure, then consumers will continue to consume. However, if the producer decides to change the structure of the output itself, the breakage problem still persists. Changes to the content structure not infrequently occur when person A hands over to person B a piece XYZ they have been working on, and person B has a different way of doing the same thing.

> If your command that produced IPv4 addresses adds support for IPv6, it doesn't suddenly break every script that relies on precise lengths, line numbers and columns.

Anything that is tightly coupled to precisely defined things is going to break; no scripting solution is possible, I am afraid. E.g. if the script author relies on the maximum IPv4 address length (AAA.BBB.CCC.DDD) never exceeding 15 characters, or on a specific format of IPv4 addresses, then IPv6 addresses appearing in the output will certainly break the script. Again, one possible solution is to treat all values as variable-length patterns enclosed within delimiters and not try to interpret the column content.
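Sketched in shell (both input lines are invented): grab "the third whitespace-delimited field", whatever its length, instead of assuming a 15-character maximum:

```shell
# The same extraction works for both address families because it keys on
# delimiters, not on the width or shape of the address itself.
echo 'eth0 inet 192.0.2.10 up'   | awk '{print $3}'   # -> 192.0.2.10
echo 'eth0 inet6 2001:db8::1 up' | awk '{print $3}'   # -> 2001:db8::1
```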


> The log parser is now screwed. I can think of similar examples outside log parsers, too, such as interoperability issues between different systems

Use a sane system, like journald that can dump logs in JSON and doesn't require you to parse dates by hand. It can also deal with binary content fine, and can store stuff like crash dumps in the log if you want to, and provides functionality to make log parsing easy.
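For example, `journalctl -o json` emits one JSON object per record, which can then be handled with jq like any other structured data. A simplified record (MESSAGE, PRIORITY and _SYSTEMD_UNIT are real journald field names; the values are invented):

```shell
# Simulated journald record; in practice this line would come from
# `journalctl -o json`.
record='{"MESSAGE":"disk full on /var","PRIORITY":"3","_SYSTEMD_UNIT":"myapp.service"}'

# The message is an isolated field, so arbitrary content inside it
# cannot corrupt the surrounding record.
echo "$record" | jq -r '.MESSAGE'
# -> disk full on /var
```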

In any case I don't think such a problem should happen in the PowerShell model. If you have a log object, then your $log.Message can contain any arbitrary garbage you want, so you can just go and put that in a database and have it work with no trouble.

> This is not guaranteed.

Of course, but there are better and worse ways of doing things. With the object way, you can extend stuff quite easily with a minimum of care. As opposed to the Unix model, where you have to think about whether somebody, somewhere might be using cut on this, and whether fixing a typo in a descriptive text that adds a character might break something.

> Anything that is tightly coupled with precisely defines things is going to break, there is no scripting solution possible, I am afraid.

Not the best example, I admit, but I mean that in the case of something like Get-NetIPAddress in PowerShell, so long as the user is either looking for the specific thing they want or ignores stuff they don't recognize, you very much could add yet another addressing scheme without running into trouble.
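The "ignore what you don't recognize" pattern can be sketched in shell terms too (the record types and values here are invented):

```shell
# Consume only the record types you understand and silently skip the
# rest, so a future "IPvNext" line is harmless to this consumer.
printf '%s\n' 'IPv4 192.0.2.10' 'IPv6 2001:db8::1' 'IPvNext something' |
  awk '$1 == "IPv4" {print $2}'
# -> 192.0.2.10
```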

Good design helps a lot there. If you make it clear that stuff has types, and that new types may appear in the future, it's easy for the end user to write code that either ignores anything it doesn't understand or gives a good error message when it runs into it.



