Pipeline Output
If you’ve used PowerShell for very long, you know how to get the values out of a pipeline.
$values = a | b | c
Nothing too difficult there.
Where things get interesting is if you want to get data from the middle of the pipeline. In this post I’ll give you some options (some better than others) and we’ll look briefly at the performance of each.
Method #1
First, there’s the lovely and often overlooked Tee-Object cmdlet. You can pass the name of a variable (i.e., without the $) to the -Variable parameter, and the values coming into the cmdlet will be written to that variable.
For instance:
Get-ChildItem c:\ -Recurse | Select-Object -Property FullName,Length | Tee-Object -Variable Files | Sort-Object -Property Length -Descending
After this code has executed, the variable $Files will contain the filenames and lengths before they were sorted. (Note that Tee-Object’s -Append switch applies only when writing to a file with -FilePath; it can’t be combined with -Variable.)
Tee-Object is easy to use, but it’s an entire command that’s essentially not doing anything “productive” in the pipeline. If you need to get values from multiple places in the pipeline, each would add an additional Tee-Object segment to the pipeline. Yuck.
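To illustrate, here’s a sketch of what capturing at two points looks like (the variable names are mine):

```powershell
# Each capture point costs an extra Tee-Object segment in the pipeline
Get-ChildItem c:\ -Recurse |
    Tee-Object -Variable RawItems |       # $RawItems: everything found
    Select-Object -Property FullName,Length |
    Tee-Object -Variable SelectedItems |  # $SelectedItems: just the two properties
    Sort-Object -Property Length -Descending
```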
Method #2
If the commands you’re using in the pipeline are advanced functions or cmdlets (and you’re only writing advanced functions and cmdlets, right?), you can use the -OutVariable common parameter to send the output of the command to a variable. Just like with Tee-Object, you only want to use the name of the variable.
If you’re dealing with cmdlets or advanced functions, this is the easiest and most flexible solution. Getting values from multiple places would just involve adding -OutVariable parameters to the appropriate places.
Get-ChildItem c:\ -Recurse | Select-Object -Property FullName,Length -OutVariable Files | Sort-Object -Property Length -Descending
This has the benefit of one less command in the pipeline, so that’s a nice bonus. If you want to append to an existing variable, here you would use a plus (+) in front of the variable name (like +Files).
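For example, a sketch of appending with the + prefix (the paths are just for illustration):

```powershell
# The first command seeds $Files; the + prefix on the second appends to it
Get-ChildItem c:\Windows -File | Select-Object FullName,Length -OutVariable Files
Get-ChildItem $env:TEMP -File | Select-Object FullName,Length -OutVariable +Files
# $Files now contains the output of both commands
```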
Method #3
This method is simply to break the pipeline at the point you want to get the values and assign to a variable. Then, pipe the variable to the “remainder” of the pipeline. Nothing crazy. Here’s the code.
$files = Get-ChildItem c:\ -Recurse | Select-Object -Property FullName,Length
$files | Sort-Object -Property Length -Descending
If you want to append, you could use the += operator instead of the assignment operator.
If you want to capture multiple “stages” in the pipeline, you could end up with a bunch of assignments and not much pipeline left.
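Here’s a sketch of what that looks like; at this point it’s mostly assignments:

```powershell
# Two captured "stages" and only one real pipeline segment left at the end
$raw      = Get-ChildItem c:\ -Recurse
$selected = $raw | Select-Object -Property FullName,Length
$selected | Sort-Object -Property Length -Descending
```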
Method #4
This method is similar to method #3, but uses the fact that assignment statements are also expressions. It’s easier to explain after you’ve seen it, so here’s the code:
($files=Get-ChildItem c:\ -Recurse | Select-Object -Property FullName,Length) | Sort-Object -Property Length -Descending
Notice how the first part of the pipeline (and the assignment) are inside parentheses? The value of the assignment expression is the value that was assigned, so this has the benefit of getting the variable set and passing the values on to the remainder of the pipeline.
If you want to get multiple sets of values from the pipeline, you would need to nest these parenthesized assignments multiple times. Statements like this can only be used as the first part of a pipeline, so don’t try something like this:
# THIS WON'T WORK!!!!!
Get-ChildItem c:\ -Recurse | Select-Object -Property FullName,Length | ($Sortedfiles = Sort-Object -Property Length -Descending)
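What does work is nesting the parenthesized assignments at the front of the pipeline. A sketch (the variable names are mine):

```powershell
# The inner assignment captures the raw listing, the outer one captures the
# selected objects, and the outer expression's value still feeds Sort-Object
($selected = ($raw = Get-ChildItem c:\ -Recurse) |
    Select-Object -Property FullName,Length) |
    Sort-Object -Property Length -Descending
```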
Performance
I used the benchmark module from the PowerShell Gallery to measure the performance of these four techniques. I limited the number of objects to 1,000 and staged those values in a variable to isolate the pipeline code from the data gathering.
$files = dir c:\ -ErrorAction Ignore -Recurse | select-object -first 1000

$sb1 = {$files | select-object FullName,Length -OutVariable v1 | sort-object Length -Descending}
$sb2 = {$files | select-object FullName,Length | tee-object -Variable v2 | sort-object Length -Descending}
$sb3 = {$v2 = $files | select-object FullName,Length; $v2 | sort-object Length -Descending}
$sb4 = {($v2 = $files | select-object FullName,Length) | sort-object Length -Descending}

Measure-These -ScriptBlock $sb1,$sb2,$sb3,$sb4 -Count 100 | Format-Table
Title/no. Average (ms) Count   Sum (ms) Maximum (ms) Minimum (ms)
--------- ------------ ----- ---------- ------------ ------------
        1     98.60119   100   9860.119     131.7581      87.6203
        2    120.32475   100 12032.4754     150.4985     104.6586
        3    100.92144   100 10092.1436     132.2665      90.0685
        4     98.48383   100   9848.383     135.5229      84.7717
The results aren’t particularly interesting. Tee-Object is about 20% slower than the rest, but other than that they’re all about the same. I’m a little bit disappointed, but 20% isn’t that big of a price to pay for cleaner syntax and flexibility (in my opinion).
BTW, those timings are for Windows PowerShell 5.1. The numbers for PowerShell 6.0 (Core) are similar:
Title/no. Average (ms) Count  Sum (ms) Maximum (ms) Minimum (ms)
--------- ------------ ----- --------- ------------ ------------
        1    120.97498    10 1209.7498     136.1319     112.0041
        2    139.9865     10 1399.865      147.659      132.1466
        3    128.86957    10 1288.6957     148.0096     115.0421
        4    119.44978    10 1194.4978     142.9651     109.1328
Here we see slightly less spread (17%), but all of the numbers are a bit higher.
I’ll probably continue to use -OutVariable.
What about you?