contact@consultdmw.com

How to Use Excel to Extract Date from String Muddling Text and Dates

Last updated on 2024-05-02.


Identifying a Problem

In many of the spreadsheets I've been asked to upgrade, users have been obliged to use a single cell to hold all sorts of information that should have been distributed across a number of cells in the row.

In correcting matters — adding additional columns — there's the challenge of splitting the original input into the new cells.

Often it's a date that needs to be isolated from the text string that it occupies. Its location in the string, and even its format, may make isolatiing the date an extra challenge, as in this selection:

  • TPB/SG 14/05/2017
  • 29/04/2017 AH
  • SG/DP phone Jill first
  • 13/04/2017 Bob Smith
  • Bob Smith
  • | Mae Greene
  • RB toDp
  • BK to PCAT
  • Steve Dicks/SG
  • GG +4(0)632900843
  • Andy Pink & Gina Blue
  • Pete 5/9
  • LMW

No doubt there are other strings that would be useful for testing formulas and code. Your speadsheets might contain variations that you'd need to include for testing.


Strategy and Limitations

These are the conditions I'm proposing for deciding how our formula determines there's a date to lift out of a text string:

  1. Text is identified as containing a date when there's a character sequence of ?#/?#/??## (# is a digit, ? is a digit or is missing)
  2. The supposed date is bounded by spaces, with the one exception of the date occupying first place in the string
  3. It's too risky making the assumption that 5/9, for example, represents a date. So character sequences bounded by spaces and having one / only are not treated as dates
  4. If a string appears to contain more than one date, then its the first one that's dug out in preference
  5. If the ?#/?#/??## sequence suggests a date but that is not an actual date, then the formula ignores it.

Further down the page we'll use VBA to lift some of these conditions, to make a function that is more general purpose that our formula.


Making a Start on dd/mm/yyyy Dates

Here we use Excel's SEARCH function to locate the position at which a /??/ character sequence, if any, is included in the the string:

Extraction of date  from text string in Excel

We've identified the three text strings that possibly contain a date, and marked up the other ones that certainly do not. Because we know that a dd/mm/yyyy contains 10 characters we can isolate them:

Excel SEARCH and MID functions in formulas

Thus isolated, the dates are in text form. They're not dates that Excel would recognise as such.


Convert Date Text Strings into Excel Dates

Excel's DATEVALUE function converts a data text into a numeric representation of a date:

DATEVALUE function in Excel

Excel works dates based on its built-in calendar, which starts in the year 1900 and extends to year 9999. 01/Jan/1900 is day one, 02/Jan/1900 is day two, and so on, adding one to the number for each day further on.

The General number format is applied to Column C. That's why Excel displays the results in cells C2, C3 and C5 as numbers. These represent the number of days elapsed since the start of Excel's calendar.


Managing the #VALUE! Warning

We use the IFERROR function to suppless the #VALUE! formula warning (more on formula warnings in Error Warnings)

ISERROR function in Excel formula

Having investigated techniques for isolating dates, we combine them into a single formula:

IFERROR, DATEVALUE and MID functions in Excel formula

Changing the number format applied to Column B to Short Date to make Excel display those numbers as what we recognise as dates:

Application of Excel Short Date format

The formula works for the ##/##/#### dates in the text strings I've illustrated. Please test it your own strings before you come to rely upon the formula.

Also, you should test the formula, and possibly modify it, on your dates if they are not, like mine, in the dd/mm/yyyy format. Go to Window's Settings and choose Time & Language to check your date formats:

Windows date formats

The above formats are the ones Excel is adopting for the spreadsheets illustrated in this Excel Tip.


Other Date Formats

Our formula successfully isolates ##/##/#### sequences from the text string. But now what we need to do is extend it to accommodate other legitimate sequences. Trying it out on some of these:

Date format issues in Excel spreadsheet

We're going to need a lot of If ... ElseIf ... Else stuff if we're going to make the formula general purpose. Accepted, Excel 2016 can accommodate up to 32,767 characters in a cell (early versions of Excel, 255), but the task of copying this formula and adapting it for other worksheets, without fault, would make my head hurt. So, at this point, I turn to VBA.


Setting the Particulars for Our VBA Function

These are the parameters for our VBA function that is to isolate a potential date from a text string:

  1. The text string of potential date must contain two forward slashes (/)
  2. All forms ?#/?#/??## must be accommodated (# is a digit, ? is a digit or is omitted)
  3. If the text string contains more than one date, then the function returns the first of them.

These are circumstances that I can think of in which the code should not be allowed to extract a date:

  1. Strings that contain a profusion of forward slashes, for example P/BS/G 1/12/1/1914/12/2018. Though it could extract potential dates (P/BS/G 1/12/1/1914/12/2018 or P/BS/G 1/12/1/1914/12/2018), I can't convince myself that any such subjective interpretation should be left open to the function
  2. The typo “O” (Oscar) instead of a zero, per 29/O4/2017 AH. Tempting, but how far should double-guessing go?
  3. What is almost certainly meant to be a date, but has been mis-typed, for example 29/02/2017 AH. Same reasoning as §2

Summarising the above:

  1. The text string must contain a sub-string of the form ?#/?#/??##
  2. The sub-string must be bounded by spaces, except at the beginning (a space to the right) and the end of the text string (a space to the left). Though restrictive, I'm content with this condition because in the majority of spreadsheets I've worked on the users have been methodical in using spaces to distinguish dates from other text.

Additional Challenges for the VBA

This is the set of text strings that I'm working with in order to demonstrate the VBA:

  • TPB/SG 14/05/2018
  • TPB/SG 14/05/17
  • TPB/SG 1/05/2017
  • TPB/SG AH14/05/2017
  • 29/04/2017AH
  • 29/04/2017 AH
  • 2/04/2017 AH
  • 02/04/2017 AH
  • TPB/SG 14/05/17AH
  • TPB/SG AH1/05/2017
  • TPB/SG 14/5/2017 AH
  • Pete 5/9/19
  • Pete 5/9 AH
  • LMW
  • TPB/SG14/05/17AH
  • P/BS/G 1/12/1/1914/12/2018
  • P/BS/G 1/12/1/19 14/12/2018
  • P/BS/G 01/02/1914/12/2018
  • P/BS/G 1/12/1928 14/12/2018
  • P/BS/G 01/02/1914// 12/1
  • P/BS/G 01/02/1914A2019

You'll need to add patterns that I've not included when you try the VBA function on your strings. But what I hope you see I'm doing is provide strings that might challenge any function you might be considering for the job.


VBA

This is my current VBA function:

Excel VBA function for date extraction from text string

This is the function applied to the list of text strings:

Application of VBA function in Excel formula

What we see is the function returning a serial number for any text that it recognises as a date and a zero for any string that it considers does not contain a date. Now we need to turn the serial number into a recognisable date and to suppress the zero:

Custom dmwGetDate() function for Excel

Column C has the Short Date format applied to its cells. Now we can replace Columns B and C with a single, new column B:

Application of dmwGetDate() formula


Your Support for DMW TIPS

Please support this website by making a donation to help keep it free of advertising and to help towards cost of time spent adding new content.

To make a contribution by PayPal in GBP (£ sterling) —

To make a contribution by PayPal in USD ($ US) —

Thanks, in anticipation.

“Dates are great because they're a
high-energy fruit with a lot of fiber.”

Hannah Bronfman, American entrepreneur.