Using PowerShell To Split A String Without Losing The Character You Split On

Last week, I wrote a post on the difference between .split() and -split in PowerShell. This week, we’re going to keep splitting strings, but we’re going to try to retain the character that we’re splitting on. Whether you use .split() or -split, when you split a string, it takes that character and essentially turns it into the separation of the two items on either side of it. But, what if I want to keep that character instead of losing it to the split?

Does A String Start Or End In A Certain Character?

Can you tell in PowerShell if a string ends in a specific character, or if it starts with one? Of course you can. Regex to the rescue!

It’s a pretty simple task, actually. Consider the following examples.
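As a minimal sketch, four such checks with the -match operator might look like this. The sample strings are illustrative placeholders of my own.

```powershell
# Does the string end in a backslash?
'C:\temp\' -match '.+?\\$'    # True
'C:\temp'  -match '.+?\\$'    # False

# Does the string start with a backslash?
'\temp' -match '^\\.+?'       # True
'temp'  -match '^\\.+?'       # False
```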

In the first two examples, I’m checking to see if the string ends in a backslash. In the last two examples, I’m seeing if the string starts with one. The regex pattern being matched for the first two is .+?\\$ . What’s that mean? Well, the first part .+? means “any character, and as many of them as it takes to get to the next part of the regex”. The second part \\ means “a backslash” (because \ is the escape character, we’re basically escaping the escape character). The last part $ is the signal for the end of the line. Effectively what we have is “anything at all, where the last thing on the line is a backslash” which is exactly what we’re looking for. In the last two examples, I’ve just moved the \\ to the start of the pattern and started with ^ instead of ending with $ because ^ is the signal for the start of the line. That gives ^\\.+? .

Now you can do things like this.
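A minimal sketch of that kind of check, assuming a variable named $animal (a placeholder name of my own):

```powershell
$animal = 'bears'   # placeholder variable name

# Append a backslash only if the string doesn't already end in one
if ($animal -notmatch '.+?\\$') {
    $animal += '\'
}

$animal   # bears\
```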

Here, I’m checking to see if the string ‘bears’ ends in a backslash, and if it doesn’t, I’m appending one.

Quick Tip: Validate The Length Of An Integer

A little while ago, I fielded a question in the PowerShell Slack channel which was “How do I make sure a variable, which is an int, is of a certain length?”

Turns out it’s not too hard. You just need to use a little regex. Consider the following example.
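Here is a minimal sketch of that test. The variable names follow the description below, and the values are placeholders I picked to be six and two digits long.

```powershell
$v6 = 123456
$v2 = 12
$v6 -match '^\d{6}$'   # True
$v2 -match '^\d{6}$'   # False
```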

$v6 is an int that is six digits long. $v2 is an int that is only two digits long. On lines three and four, we’re testing to see if each variable matches the pattern '^\d{6}$' which is regex speak for “start of the line, any digit, and six of them, end of the line”. The -match operator converts the int to a string before testing it, so no cast is needed. The first one will be true, because it’s six digits, and the second one will be false. You could also use something like '^\d{4,6}$' to validate that the int is between four and six digits long.
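The range version can be sketched the same way. The numbers here are placeholders chosen to land inside and outside the four-to-six digit range.

```powershell
1234    -match '^\d{4,6}$'   # True: four digits
12      -match '^\d{4,6}$'   # False: two digits
1234567 -match '^\d{4,6}$'   # False: seven digits
```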


Quick Tip: PowerShell Regex To Get Value Between Quotation Marks

If you’ve got a value containing quoted text that maybe came from the body of a file, was returned by some other part of a script, etc., and you just want the portions that are actually between the quotes, the quickest and easiest way to get it is through a regular expression match.

That’s right, forget splitting or trimming or doing other weird string manipulation stuff. Just use the [regex]::Matches() method, which PowerShell exposes from .NET, to get your values.
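A minimal sketch of the call, assuming a here-string in $s whose contents are placeholders of my own, and PowerShell 3.0 or later so that .Value can be read straight off the collection of matches:

```powershell
# Sample data; the quoted values are placeholders
$s = @'
"ThingOne"
"ThingTwo"
"ThingThree"
'@

# Return only the text between each pair of quotation marks
[regex]::Matches($s, '(?<=\").+?(?=\")').Value
```

With this input the call returns ThingOne, ThingTwo and ThingThree. The sketch keeps each quoted value on its own line on purpose: because the lookarounds don’t consume the quotation marks, two quoted values on the same line would also return the text sitting between them.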

Matches takes two parameters: 1. the value to look for matches in, in this case the here-string in my $s variable, and 2. the regular expression to be used for matching. Since each match comes back as an object with several properties, we are making sure to just select the value for each match.

So what is that regex doing? Let’s break it down into its parts.

  • (?<=\") this part is a lookbehind, as specified by the ?<= part. In this case, whatever we are matching will come right after a quote. Doing the lookbehind prevents the quotation mark itself from actually being part of the matched value. Notice I’ve escaped the quotation mark character, although the regex engine doesn’t strictly require it.
  • .+? this part matches as few characters as it takes to get to whatever the next part of the regex is. The ? after the + is what makes it lazy rather than greedy.
  • (?=\") this part is a lookahead, as specified by the ?= part. We’re looking ahead for a quotation mark because whatever comes right after our match will be a quotation mark.

So basically what we’ve got is “whatever comes after a quotation mark, and as much of that as you need until you get to another quotation mark”. Easy, right? Don’t you love regex?