We want to extract the dates from the following texts, the dates can be like “09/13/2021” or “09-13-2021”, but not “09-13/2021” or “09/13/2021”.
For example:
input str200 address
"War of 1812 Peace Treaty was signed on 12/24/1814"
"World's first movie theater opened in Paris on 12/25-1776"
"Texas A&M University went coed on 04/27/1963"
"The most recent leap second was added on 12-31/2016"
end
The trick to accomplish this is to capture the match of the first separator [/-] into a group (…), then referencing it using \# (where # is the index of the group) to match the second occurrence.
. gen date = ustrregexs(0) if ustrregexm(history, "\d{2}([/-])\d{2}\1\d{4}")
(2 missing values generated)
.
. list
+------------------------------------------------------------------------+
| history date |
|------------------------------------------------------------------------|
1. | War of 1812 Peace Treaty was signed on 12/24/1814 12/24/1814 |
2. | World's first movie theater opened in Paris on 12/25-1776 |
3. | Texas A&M University went coed on 04/27/1963 04/27/1963 |
4. | The most recent leap second was added on 12-31/2016 |
+------------------------------------------------------------------------+