It’s 10 O’Clock. Do you know where your servers are?

Ok…that’s a strange title, but let me finish before you decide it’s lame. (On a side note, I’m a dad, so my humor naturally tends to run in that direction.)

I see lots of examples in books and on the web about how to use pipeline input to functions. I’m not talking about how to implement pipeline input in your own advanced functions, but rather examples of using pipeline input with existing cmdlets.
The examples invariably look like this:

'server1','server2' | get-somethingInteresting -blah -blah2

This is a good thing. The object-oriented pipeline is in my opinion the most distinguishing feature of PowerShell, and we need to be using the pipeline in examples to keep scripters from falling back into their pre-PowerShell habits. There is an aspect of this that concerns me, though.

How many of you are dealing with a datacenter comprised of two servers? I’m guessing that if you only had two servers, you probably wouldn’t be all gung-ho about learning PowerShell, since it’s possible to manage two of almost anything without needing to resort to automation. Not to say that small environments are a bad fit for PowerShell, but just that in such a situation you probably wouldn’t have a desperate need for it.
How would you feel about typing that example in with five servers instead of two? You might do that (out of stubbornness), but if it were 100, you wouldn’t even consider doing such a thing. For that matter, what made you pick those specific two servers? Would you be likely to pick the same two a year from now? If your universe is anything like mine, you probably wouldn’t be looking at the same things next week, let alone next year.
My point is that while the example does show how to throw strings onto the pipeline to a cmdlet, and though the point of the example is the cmdlet rather than the details of the input, it feels like we’re giving a wrong impression about how things should work in the “real world”.

As an aside, I want to be very clear that I’m not dogging the PowerShell community. I feel that the PowerShell community is a very vibrant group of intelligent individuals who are very willing to share their time and effort to help get the word out about PowerShell and how we’re using it to remodel our corners of the world. We are also fortunate to have a group of people who are invested so much that they’re not only writing books about PowerShell, they’re writing good books. So to everyone who is working to make the PowerShell cosmos a better place, thanks! This is just something that has occurred to me that might help as well.

Ok…back to the soapbox.

If I’m not happy about supplying the names of servers on the pipeline like this, I must be thinking of something else. I know…we can store them in a file! The next kind of example I see is like this:

Get-Content c:\servers.txt | get-somethingInteresting -blah -blah2

This is a vast improvement in terms of real-world usage. Here, we can maintain a text file with the list of our servers and use that instead of constant strings in our script. There’s some separation happening, which is generally a good thing (when done in moderation :-)). I still see some problems with this approach:

  • Where is the file? Is it on every server? Every workstation? Anywhere I’m running scripts in scheduled tasks or scheduled jobs?
  • What does the file look like? In this example it looks like a straight list of names. What if I decide I need more information?
  • What if I don’t want all of the servers? Do I trust pattern matching and naming conventions?
  • What if the file moves? I need to change every script.
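If the flat list does start to outgrow itself, one stopgap is to promote it to a CSV and read it with Import-Csv, which at least gives you objects with columns to filter on. This is just a sketch; the file name and the Name/Role/Site columns are made up for illustration:

```powershell
# Hypothetical c:\servers.csv with the header line: Name,Role,Site
# Import-Csv turns each row into an object, so columns become filterable properties
Import-Csv c:\servers.csv |
    Where-Object { $_.Role -eq 'Web' } |
    Select-Object -ExpandProperty Name
```

It still has all of the "where does the file live" problems listed above, though, which is why the real answer comes later in this post.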

I was a developer for a long time and a DBA for a while as well. The obvious answer is to store the servers in a table! There’s good and bad to this approach as well. I obviously can store more information, and any number of servers. I can also query based on different attributes, so I can be more flexible.

  • Do I really want to manage database connections in every script?
  • What about when the SQL Server (you are using SQL Server, right?) gets replaced? I have to adjust every script again!
  • Database permissions?
  • Do I have to remember what the database schema looks like every time I write a script?

What about querying AD to get the list? That would introduce another dependency, but with AD cmdlets I should be able to do what I need. But…

  • What directory am I going to hit (probably the same one most of the time, but what about servers in disconnected domains?)
  • Am I responsible for all of the computers in all of the OUs? If not, how do I know which ones to return?
  • Does AD have the attributes I need in order to filter the list appropriately?

At this point you’re probably wondering what the right answer is. The problem is that I don’t have the answer. You’re going to use whatever organizational scheme makes the most sense to you. If your background is like mine, you’ll probably use a database. If you’ve just got a small datacenter, you might use a text file or a csv. If you’re in right with the AD folks, they’ve got another solution for you. They all work and they all have problems. You’ll figure out workarounds for the stuff you don’t like. You’re using PowerShell, so you’re not afraid.

Now for the payoff: Whatever solution you decide to use, hide it in a function.

You should have a function that you always turn to called something like “Get-XYZComputer”, where XYZ is an abbreviation for your company. When you write that function, give it parameters that will help you filter the list according to the kinds of work that you’re doing in your scripts. Some easy examples are to filter based on name (a must), on OS, the role of the server (web server, file server, etc.), or the geographical location of the server (if you have more than one datacenter). You can probably come up with several more, but it’s not too important to get them all to start with. As you use your function you’ll find that certain properties keep popping up in where-object clauses downstream from your new get-function, and that’s how you’ll know when it’s time to add a new parameter.

The insides of your function are not really important. The important thing is that you put the function in a module (or a script file) and include it using import-module or dot-sourcing in all of your scripts.
Now, you’re going to write code that looks like this:

Get-XYZComputer -ServerType Web | get-somethinginteresting

There are a couple of important things to do when you write this function. First of all, make sure it outputs objects. Server names are interesting, but PowerShell lives and breathes objects. Second, make sure that the name of the server is in a property called “Computername”. If you do this, you’ll have an easier time consuming these computer objects on the pipeline, since several cmdlets take the computername parameter from the pipeline by property name.
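To make the idea concrete, here is a minimal sketch of what such a function might look like. The CSV backing store and the Computername/Role/Site columns are assumptions for illustration; the point is the parameter surface and the object output, not the storage:

```powershell
function Get-XYZComputer {
    param(
        [string]$Name = '*',      # filter by computer name (wildcards allowed)
        [string]$ServerType,      # e.g. Web, File, SQL
        [string]$Location         # datacenter/site code
    )
    # The storage choice here (a hypothetical CSV) is the part you get to hide.
    # Swap in a database query or an AD lookup later without touching any caller.
    Import-Csv c:\data\servers.csv |
        Where-Object { $_.Computername -like $Name } |
        Where-Object { -not $ServerType -or $_.Role -eq $ServerType } |
        Where-Object { -not $Location   -or $_.Site -eq $Location }
}
```

Because each row comes through as an object with a Computername property, the output is ready to bind by property name to downstream cmdlets, which is exactly what the advice above is after.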

If you’re thinking this doesn’t apply to you because you only have five servers and have had the same ones for years, what is it that you’re managing?

  • Databases?
  • Users?
  • Folders?
  • WebSites?
  • Widgets?

If you don’t have a function or cmdlet to provide your objects, you’re in the same boat. If you do, but it doesn’t provide the kind of flexibility you want (e.g. it requires you to supply a bunch of parameters that never change, or it doesn’t give you the kind of filtering you want), you can still use this approach. By customizing the acquisition of domain objects, you’re making life easier for yourself and for anyone who needs to use your scripts in the future. By including a reference to your company in the function name, you’re making it clear that it’s custom for your environment (as opposed to using proxy functions to graft in the functionality you want). And if you decide to change how your data is stored, you just change the function.

So…do you know where your servers are? Can you use a function call to get the list without needing to worry about how your metadata is stored? If so, you’ve got another tool in your PowerShell toolbox that will serve you well. If not, what are you waiting for?
Let me know what you think.

–Mike

A PowerShell Puzzler

It has been said that you can write BASIC code in any language. When I look at PowerShell code, I tend to see a lot of code that looks like transplanted C# code. It’s easy to get confused sometimes, since C# and PowerShell syntax are similar, and when you are dealing with .NET framework objects the code is often nearly identical. Most of the time, though, the differences between the semantics are small and there aren’t a lot of surprises.

I recently found one case, however, that stumped me for a while. What makes it more painful is that I found it while conducting a PowerShell training session and was at a loss to explain it at the time. Please read the following line and try to figure out what will happen without running the code in a PowerShell session.

$services=get-wmiobject -class Win32_Service -computername localhost,NOSUCHCOMPUTER -ErrorAction STOP

.
.
.
.
You’re thinking about this, right?
.
.
.
.
.
.
Once you’ve thought about this for a few minutes, throw it in a command-line somewhere and see what it does.

The first thing (I think) that’s important to notice is that the behavior is completely different from anything that you will see in any other language (at least in my experience).

In most languages, if you have an assignment statement whose right-hand side is a function call, one of three things will happen:

  1. The assignment statement is successful (i.e. the variable will be set to the result of the function call)
  2. The function call will fail (and throw an exception), leaving the variable unchanged
  3. The assignment could fail (due to type incompatibility), leaving the variable unchanged

In PowerShell, though, we see a 4th option.

  • The function call succeeds for a while (generating output) and then fails, leaving the variable unchanged but sending output to the console (or to be captured by an enclosing scope).

Here’s what the output looks like when it’s run (note: I abbreviated some to make the command fit a line):
[Screenshot: the list of services returned from localhost]

Not shown in the screenshot is that at the end of the list of localhost services is the expected exception.

This makes sense once you remember that an assignment statement in PowerShell assigns the final result of the pipeline on the RHS to the variable on the LHS. In this case, the pipeline started generating output when it used the localhost parameter value. As is generally the case with PowerShell cmdlets, that output was not batched. When the get-wmiobject cmdlet tried to use the NOSUCHCOMPUTER value for the ComputerName parameter, it obviously failed, and since we specified -ErrorAction Stop, pipeline execution immediately terminated by throwing an exception. Since we never reached the “end” of the pipeline, the assignment never happened, but there was already output in the output stream. The rule in PowerShell is that any data in the output stream that isn’t captured (by piping it to a cmdlet, assigning it, or casting to [void]) is sent to the console, so the localhost services are sent to the console.
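You can reproduce the same behavior with a plain function, no WMI required. This is just an illustrative sketch, but it shows the "output first, fail later" shape of the puzzler:

```powershell
function Get-Partial {
    'first'               # emitted to the output stream immediately
    'second'              # ...and so is this
    throw 'late failure'  # terminating error before the "end" of the pipeline
}

$result = Get-Partial
# The exception means $result is never assigned, but 'first' and 'second'
# were already in the output stream, so they spill to the console anyway.
```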

It all makes sense if you’re wearing your PowerShell goggles (note to self—buy some PowerShell goggles), but if you’re trying to interpret PowerShell as any other language this behavior is really unexpected.

Let me know what you think. Does this interpretation make sense or is there an easier way to see what’s happening here?

-Mike

PowerShell-Specific Code Smells: Building output using +=

Before I cover this specific code smell, I should probably explain one thing. The presence of code smells doesn’t necessarily mean that the code in question isn’t functional. In the example I gave last time (the extra long method), there’s no reason to think that just because a method is a thousand lines long that it doesn’t work. There are lots of examples of code that is not optimally coded that works fine nonetheless. The focus here is that you’re causing more work: Either up-front work in that the code is longer or more complicated than necessary, or later on, when someone (maybe you?) needs to maintain the code.

With that said, we should talk about aggregating output using a collection object and the += compound assignment operator. This is such a common pattern in programming languages that it’s a hard thing not to do in PowerShell, but there are some good reasons not to. To help understand what I mean, let’s look at some sample code.

function get-sqlservices {
param($computers)
    foreach ($computer in $computers){
           # -computername is needed so each iteration actually queries $computer
           $output+=get-wmiobject -class Win32_Service -computername $computer -filter "Name like 'SQL%'"
    }
    return $output
}

$mycomputers='localhost','127.0.0.1',$env:COMPUTERNAME
measure-command{
get-sqlservices -computers $mycomputers | select -first 1
}

Before we discuss this code let me be clear: this is not great code for several reasons. For the purposes of discussion, though, let’s just look at how the output is handled. As I mentioned, this is how you’d do something like this in most programming languages and it works fine. On my laptop it ran in 723 milliseconds. If we change the list of computers to a longer list it takes considerably longer:

$mycomputers=('localhost','127.0.0.1',$env:COMPUTERNAME) * 100

Days : 0
Hours : 0
Minutes : 0
Seconds : 51
Milliseconds : 486
Ticks : 514863437
TotalDays : 0.000595906755787037
TotalHours : 0.0143017621388889
TotalMinutes : 0.858105728333333
TotalSeconds : 51.4863437
TotalMilliseconds : 51486.3437

Changing the function to send the output to the pipeline looks like this:

function get-sqlservices2 {
param($computers)
    foreach ($computer in $computers){
           get-wmiobject -class Win32_Service -computername $computer -filter "Name like 'SQL%'"
    }
}

$mycomputers=('localhost','127.0.0.1',$env:COMPUTERNAME) * 100
measure-command{
get-sqlservices2 -computers $mycomputers | select -first 1
}

Days : 0
Hours : 0
Minutes : 0
Seconds : 0
Milliseconds : 478
Ticks : 4782609
TotalDays : 5.53542708333333E-06
TotalHours : 0.00013285025
TotalMinutes : 0.007971015
TotalSeconds : 0.4782609
TotalMilliseconds : 478.2609

The code doesn’t look much different. The only changes are that we’re not assigning the output of the get-wmiobject cmdlet to anything and we don’t have an explicit return. This is a point of confusion for most people who come to PowerShell from a traditional imperative language (C#, Java, VB, etc.). In a PowerShell script, any value that isn’t “captured”, either by assigning it to a variable or piping it somewhere, is added to the output pipeline. The “return value” of the function is the combination of all such values along with the value in a return statement (if present). So in this case, the output of the new function is the same as the output of the first. Changing it to use the pipeline didn’t change the value at all.

So why is this considered a code smell? The reason is that the second script runs faster than the first did. In fact, it runs faster with 300 computers (100 copies of the list of 3) than the first did with 3 computers. Why is it so much faster? In PowerShell 3.0, the implementation of select-object was changed to stop the pipeline after the number of objects requested in the -first parameter has been emitted. In other words, even though we passed 300 servers to the function, it stopped after it got the first result back from get-wmiobject on the first server.
You’re not always going to be using -first, but even when you’re not, the values in the pipeline are available to downstream cmdlets before the function finishes (if you don’t use +=). If you’re simply sending the output to the console, you will begin to see results immediately rather than having to wait. Another issue arises when your aggregating function throws an exception before it’s done. If you never hit the return statement, you won’t see any results at all; being able to see the results up to the point of the error will probably help you track down where the error was. What if there were thousands of servers (or your dataset was considerably larger for some other reason)? Your process would eat memory as it built a huge collection; with pipeline output there’s no reason for the process to use much memory at all. Finally, with pipeline output there’s one less thing to keep track of. One less variable means one less place to make a mistake (accidentally using = at some point instead of +=, misspelling the variable name, etc.).
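As a side note, even when you really do want everything in a variable, you don’t need += for that: assigning the foreach loop itself captures all of its pipeline output in one shot. A sketch, reusing the same hypothetical loop (with each computer actually queried via -computername):

```powershell
# PowerShell collects the loop's pipeline output for you; no array rebuilding
$output = foreach ($computer in $computers) {
    get-wmiobject -class Win32_Service -computername $computer -filter "Name like 'SQL%'"
}
```

You get the aggregation without paying the cost of recreating the array on every += (each += on an array allocates a new, larger array and copies the old one into it).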

I hope you can see that with PowerShell, following this common pattern is not a good thing.

Let me know what you think.

Mike

PowerShell-Specific Code Smells

A code smell is something you find in source code that may indicate that there’s something wrong with the code. For instance, seeing a function that is over a thousand lines gives you a clue that something is probably wrong even without looking at the specific code in question. You could think of code smells as anti-“Best Practices”. I’ve been thinking about these frequently as I’ve been looking through some old PowerShell code.

I’m going to be writing posts about each of these, explaining why they probably happen and how the code can be rewritten to avoid these “smells”.

A few code smells that are specific to PowerShell that I’ve thought of so far are:

  1. Missing Param() statements
  2. Artificial “Common” Parameters
  3. Unapproved Verbs
  4. Building output using +=
  5. Lots of assignment statements
  6. Using [object] parameters to allow different types

Let me know if you think of others. I’ll probably expand the list as time goes on.

-Mike

Learn From My Mistake – Export-CSV

You’ve probably been told all your life that you should learn from your mistakes. I agree with this statement, but I prefer always to learn from other people’s mistakes. In this post, I’ll give you an opportunity to learn a bit more about PowerShell by watching me mess up. What a deal!

I was helping a colleague with a script he was writing. His script wasn’t very complicated. It simply read in a list of computernames from a text file and tried to access them via WMI. He wanted the script to keep track of the ones that were inaccessible and output that list to a second text file for later review.

I was helping him via IM (not the best approach, but we were both busy with other things), and what we came up with was something like this.

$errorservers=@()
$servers=get-content c:\temp\servers.txt
foreach($server in $servers){
     try {
        #Do some stuff here
     } catch {
       $errorServers+=$server
     }
}
$errorServers | export-csv c:\temp\ErrorServers.txt

Imagine my surprise when he emailed me the error output (which we both were expecting to be a list of unreachable servers) and it looked like this:

#TYPE System.String
"PSPath","PSParentPath","PSChildName","PSDrive","PSProvider","ReadCount","Length"
"C:\temp\computers.txt","C:\temp","computers.txt","C","Microsoft.PowerShell.Core\FileSystem","2","14"

When I was able to observe the script running, I had him output the value of $errorServers, and when he did, the output looked exactly like we had expected. Changing the last line to

$errorServers | out-file c:\temp\ErrorServers.txt

fixed the script as far as he was concerned. But where did the “mystery columns” come from?

It turns out that when you read a file with get-content, PowerShell “decorates” the strings it returns with information about where each string came from (what file, folder, drive, provider, line, and length of the read data). When we sent the string to the console for output, the console printed the value of the string (since that’s how the console outputs strings). However, when we sent the string to a file with export-csv, the cmdlet looked for properties that made sense as columns and found the “decorations” that were provided by get-content. I had no idea that PowerShell was doing this because I don’t generally look for properties on strings (as a side note, the $profile variable set by PowerShell also has some “hidden” properties).

To see this more clearly, I did the following:

PS> get-content C:\temp\computers.txt | select -first 1 | format-list *

kyndig

PS> get-content C:\temp\computers.txt | select -first 1 | format-list * -force

PSPath : C:\temp\computers.txt
PSParentPath : C:\temp
PSChildName : computers.txt
PSDrive : C
PSProvider : Microsoft.PowerShell.Core\FileSystem
ReadCount : 1
Length : 6

I was able to learn something useful from my mistake this time. Hope you did, too.

–Mike

Programming Entity Framework: Code First by Lerman and Miller; O’Reilly Media

Programming Entity Framework Code First (Cover)

Programming Entity Framework Code First


Ok…my first book review. Programming Entity Framework: Code First is a short book (under 200 pages) by Julia Lerman and Rowan Miller which covers the “Code First” method of Entity Framework development. I’m not really plugged in to the Entity Framework community, but I recognized Julia Lerman’s name from the many PluralSight courses and books that she’s authored. I didn’t recognize Rowan Miller, but at the time of writing he was the program manager for the Entity Framework team at Microsoft, so I’m sure he brought a lot to the book. One thing to note is that it was published in 2010 and written using Visual Studio 2010 and Entity Framework 4. Working along with the text, I was in Visual Studio 2013 Preview and Entity Framework 5. There were slight differences, but nothing that would really lessen the value of the book.

Coming from a DBA and development background, I have mixed feelings about ORMs. First of all, I get the whole object/relational impedance mismatch thing. Developers in general don’t like writing the data-access code for apps. On the other hand, I don’t mind writing SQL or data-access code that much, and often can find performance benefits from hand-coding the SQL. I have supported developers who have used Entity Framework enough that I know that it does a pretty good job of generating rational SQL under normal circumstances, and have only seen a few cases where it was a contributing factor to a performance “incident”. That being said, having an ORM generate SQL against a database which DBAs designed is not at all the same thing as having the ORM generate the SQL and the database. My curiosity about this scenario is what leads to this book review.

As I mentioned in the opening, this is a short book (listed at 194 pages, but my PDF only has 192, including all of the “intentionally left blank” and non-content pages). The reason that it is so short is that most of the explanation of Entity Framework programming is left to Lerman’s earlier “Programming Entity Framework” which is a much heftier tome, at over 900 pages. The fact that it’s a short book is in its favor, however. The Entity Framework team have done their job well in that the Code First development method is not very complicated (at least to begin). The material in the book falls into 3 parts: the introduction (chapters 1 and 2), the catalog of annotations and fluent configurations (chapters 3 through 5), and more advanced topics (chapters 6 through 8).

The introduction gives you a history of Entity Framework, emphasizing the fact that developers were bound to the database (either an actual database or a logical model of the database) in earlier development models. It then proceeds to show how the Code First model allows the developer to use POCOs (Plain-Old CLR Objects). The objects used are taken from the application used in “Programming Entity Framework” and give a realistic baseline for the conversation as the book proceeds. The tone in this section is very casual and is presented as a kind of a tutorial. The authors are very good to warn the reader when code they are presenting will cause issues for upcoming steps, which is a nice detail. Many books aren’t careful in this and lead to lots of confusion when code subsequently fails to compile or the results don’t match the text.

In the second section, the authors cover the variety of configurations which can be made using either annotations or fluent configuration. The presentation here has the feel of a catalog: listing each type of configuration, how it’s accomplished, and what options are available. There is still some tutorial narration alongside the catalog, but reading it didn’t make me want to try the code, rather I just took stock of what was available.

The third section was presented in a topic-by-topic basis, as the methods discussed varied from one to the next. There were much longer code samples, and the applications were much more advanced. Again, I really didn’t feel the need to try each bit of code. The discussion was enough for my purposes.

All in all, I was very impressed. The book did a great job of making me aware of the capabilities and limitations of Code First development, although I believe most of the limitations had been addressed by the team since the book’s publication. The writing was clear and the examples seemed to be very well chosen. I would recommend this book without reservation to either developers who are interested in Entity Framework or for DBAs who are skeptical that a tool can generate a database with the complexity that they’d prefer.

I’m looking forward to watching some of the authors’ PluralSight courses to get up to speed on improvements.

Mike

Disclosure of Material Connection: I received one or more of the products or services mentioned above for free in the hope that I would mention it on my blog. Regardless, I only recommend products or services I use personally and believe will be good for my readers. I am disclosing this in accordance with the Federal Trade Commission’s 16 CFR, Part 255: “Guides Concerning the Use of Endorsements and Testimonials in Advertising.”

PowerShell Identity Function (revisted)

One of my earliest posts was about implementing an “identity function” in PowerShell to assist in typing lists of strings without bothering with commas.

The function I presented was this:

function identity{
    return $args
}

I recently saw (in a Scripting Games entry) that you don’t need to define your own function for this. write-output works just fine.

For example:

$values=write-output apple orange banana

Mike

PowerShellStation.com update

I just changed the syntax highlighting used by the site (to SyntaxHighlighter Evolved). One reason is that it’s much easier to use.

I have tried to go through the older posts and update the markup to include the proper codes to highlight using the new plugin. If you notice one that doesn’t look quite right, let me know.

Mike

Best Practices Update and some Scripting Games thoughts

Just a quick note to let you know that I haven’t given up on writing about PowerShell best practices. A few things have derailed my thinking.

  • My first “best practice” I thought was a no-brainer. After I wrote it, I got to thinking about what actual benefit there is to sticking to single quotes rather than using double quotes. Perhaps it makes sense to use double quotes all the time unless you specifically don’t want interpolation or control characters.
  • The 2013 Scripting Games started. Reading the comments by the community regarding the scripts has been a real eye-opener about how people feel about different topics. I think I’ll probably wait until the games are over and try to compile a list of what everyone seems to agree on.

With regard to the Scripting Games, if you haven’t gotten involved with them it’s not too late. There are still 2 events left (I think). Even if you don’t feel up to competing, looking at over a hundred different implementations of the same problem will definitely get your brain working on some new stuff to try in your scripts. Maybe some technique you hadn’t really used before (splatting? parameter validation? pipeline input? comment-based help?). Take some time to read through some of the entries and at the very least you’ll start to develop an opinion on what “good” means in a script. If you do enter, don’t worry too much about the judging. The point values have been “evolving” over time and the important thing (to me) is the constructive comments I’ve received on my scripts. Some of the comments haven’t been accurate (or helpful), but hey, you get what you pay for.

My hat is definitely off to Don Jones and the rest of the PowerShell.org folks for hosting this. If you’ve been watching the forums at all, you can tell that they’re working hard to make it successful. If you’ve looked at scripts, you know that they’ve added a lot of awesome functionality on the judging side for how the commenting and scoring is handled.

Looking forward to event 5.

Mike

PowerShell Best Practice #1 – Use Single Quotes

I’m going to kick off this series with a no-brainer.

In PowerShell, there are 2 ways to quote strings, using single-quotes (‘) or double-quotes (“). This is probably not a surprise to you if you’ve seen PowerShell scripts before.

A “best practice” in PowerShell is that you should always default to using single-quotes unless you specifically need to use double-quotes. The reasons to use double-quotes are:

  • To enable substitution in the string (variables or expressions)
  • To utilize escape sequences in the string (introduced by backtick `)
  • To simplify embedding single-quotes in the string (without doubling the single quotes)
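A quick illustration of all three cases:

```powershell
$name = 'world'
'Hello, $name'       # single quotes: no substitution, prints Hello, $name
"Hello, $name"       # double quotes: substitution, prints Hello, world
"Column1`tColumn2"   # backtick escapes (here a tab) only work in double quotes
"It's easy"          # embedded single quote without doubling it up
```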

I have to admit, I find myself getting lazy about this and switching between types of quotes with no rhyme or reason. In fact, sometimes I see that I’m using double-quotes as the default just in case I end up doing variable substitution. In my opinion, however, this is not something I should be doing.

Here’s a post from Don Jones about quoting.

Anyone disagree with this one?