Difference between revisions of "Regular expressions"
m (→Library of working regular expressions) |
m (→Library of working regular expressions) |
||
(9 intermediate revisions by the same user not shown) | |||
Line 1: | Line 1: | ||
− | Since | + | Since Tabbles 5.2.31, you can use regular expressions within the auto-tagging rules to tag files based on their name, path and content. |
==What are the regular expressions and how do they work?== | ==What are the regular expressions and how do they work?== | ||
− | - on [http://en.wikipedia.org/wiki/Regular_expression wikipedia] | + | - Regex on [http://en.wikipedia.org/wiki/Regular_expression wikipedia] |
- Page on [http://www.codeproject.com/KB/dotnet/regextutorial.aspx Codeplex] | - Page on [http://www.codeproject.com/KB/dotnet/regextutorial.aspx Codeplex] | ||
− | - Online regular expression | + | - Online regular expression editors [https://regexr.com/ RegExr] and [https://regex101.com/ regex101] |
- Download [http://www.ultrapico.com/ExpressoDownload.htm Expresso], a regular expression editor | - Download [http://www.ultrapico.com/ExpressoDownload.htm Expresso], a regular expression editor | ||
Line 14: | Line 14: | ||
- Regular expression [http://regexlib.com/DisplayPatterns.aspx?cattabindex=0&categoryid=1&p=1 library] (warning: the syntax could be different... the one on the MSDN page is the one that should work) | - Regular expression [http://regexlib.com/DisplayPatterns.aspx?cattabindex=0&categoryid=1&p=1 library] (warning: the syntax could be different... the one on the MSDN page is the one that should work) | ||
+ | |||
+ | ==Do I have to write them myself?== | ||
+ | |||
+ | You will hardly need to write any regex yourself, because there are a multitude of regex that have been written and tested out there. Your best chance is to simply google "regex {whatever I need}" and see what comes up. Below a few useful resources: | ||
+ | |||
+ | - [https://ipsec.pl/data-protection/2012/european-personal-data-regexp-patterns.html European personal data regexp patterns] | ||
+ | |||
+ | - Searchable Collection of regex [http://regexlib.com/Default.aspx RegExLib] | ||
+ | |||
==How do I use them in Tabbles?== | ==How do I use them in Tabbles?== | ||
Line 19: | Line 28: | ||
Go to: '''Tools > Auto-tagging rules > New''' and edit the window to something like this: | Go to: '''Tools > Auto-tagging rules > New''' and edit the window to something like this: | ||
− | http:// | + | http://tabbles.net/images/wiki/T5-Auto-Tagging-RegEx.png |
Line 70: | Line 79: | ||
|- | |- | ||
| <nowiki>.*(201\d).*</nowiki> || matches 2010 to 2019 (useful for years). Change "201" to "200" to match 2000 to 2009 || Andrea | | <nowiki>.*(201\d).*</nowiki> || matches 2010 to 2019 (useful for years). Change "201" to "200" to match 2000 to 2009 || Andrea | ||
+ | |- | ||
+ | | <nowiki>((67\d{2})|(4\d{3})|(5[1-5]\d{2})|(6011))(-?\s?\d{4}){3}|(3[4,7])\d{2}-?\s?\d{6}-?\s?\d{5}</nowiki> || Credit cards (works with the major cards, with "-" or without) || Andrea | ||
|- | |- | ||
| <nowiki>.*C\d{3}.*</nowiki> || Matches files with a path like "D:\Customers\C001" to "C999" or "...\stuff_C001_stuff" (to ...C999) || Andrea | | <nowiki>.*C\d{3}.*</nowiki> || Matches files with a path like "D:\Customers\C001" to "C999" or "...\stuff_C001_stuff" (to ...C999) || Andrea | ||
Line 77: | Line 88: | ||
| <nowiki>[0-3][0-9][0-1][1-9]\d{2}-\d{4}?[^0-9]*</nowiki> || Danish CPR Number (DDMMYY-NNNN, like 310180-1234) || Andrea | | <nowiki>[0-3][0-9][0-1][1-9]\d{2}-\d{4}?[^0-9]*</nowiki> || Danish CPR Number (DDMMYY-NNNN, like 310180-1234) || Andrea | ||
|- | |- | ||
− | | <nowiki>.*( | + | | <nowiki>.*((0[1-9]|[12]\d|3[01])(0[1-9]|1[0-2])\d{3}\d{2}\d{4}).*</nowiki> || Unique Master Citizen Number(JMBG), based on [https://en.wikipedia.org/wiki/Unique_Master_Citizen_Number wikipedia] || Andrea |
+ | |- | ||
+ | | <nowiki>(^|\s)(00[1-9]|0[1-9]0|0[1-9][1-9]|[1-6]\d{2}|7[0-6]\d|77[0-2])(-?|[\. ])([1-9]0|0[1-9]|[1-9][1-9])\3(\d{3}[1-9]|[1-9]\d{3}|\d[1-9]\d{2}|\d{2}[1-9]\d)($|\s|[;:,!\.\?])</nowiki> || USA Social Security Number (It recognizes the formats 123456789, 123 45 6789, 123-45-6789) || Andrea | ||
|- | |- | ||
| <nowiki>.*(?:(?:[B-DF-HJ-NP-TV-Z]|[AEIOU])[AEIOU][AEIOUX]|[B-DF-HJ-NP-TV-Z]{2}[A-Z]){2}[\dLMNP-V]{2}(?:[A-EHLMPR-T](?:[04LQ][1-9MNP-V]|[1256LMRS][\dLMNP-V])|[DHPS][37PT][0L]|[ACELMRT][37PT][01LM])(?:[A-MZ][1-9MNP-V][\dLMNP-V]{2}|[A-M][0L](?:[1-9MNP-V][\dLMNP-V]|[0L][1-9MNP-V]))[A-Z].*</nowiki> || Italian "Codice Fiscale" || Andrea | | <nowiki>.*(?:(?:[B-DF-HJ-NP-TV-Z]|[AEIOU])[AEIOU][AEIOUX]|[B-DF-HJ-NP-TV-Z]{2}[A-Z]){2}[\dLMNP-V]{2}(?:[A-EHLMPR-T](?:[04LQ][1-9MNP-V]|[1256LMRS][\dLMNP-V])|[DHPS][37PT][0L]|[ACELMRT][37PT][01LM])(?:[A-MZ][1-9MNP-V][\dLMNP-V]{2}|[A-M][0L](?:[1-9MNP-V][\dLMNP-V]|[0L][1-9MNP-V]))[A-Z].*</nowiki> || Italian "Codice Fiscale" || Andrea |
Latest revision as of 11:36, 14 May 2018
Since Tabbles 5.2.31, you can use regular expressions within the auto-tagging rules to tag files based on their name, path and content.
What are the regular expressions and how do they work?
- Regex on Wikipedia
- Page on CodeProject
- Online regular expression editors RegExr and regex101
- Download Expresso, a regular expression editor
- The reference page on MSDN
- Regular expression library (warning: its syntax may differ; the syntax described on the MSDN page is the one that should work)
Do I have to write them myself?
You will rarely need to write a regex yourself, because many have already been written and tested. Your best bet is to search the web for "regex {whatever I need}" and see what comes up. Below are a few useful resources:
- European personal data regexp patterns
- Searchable Collection of regex RegExLib
How do I use them in Tabbles?
Go to: Tools > Auto-tagging rules > New and edit the window to something like this:
From now on, whenever you create, save or rename a file or folder that matches the regular expression, it will be tagged (and a one-click pop-up should appear too).
If you want to automatically tag the files you already have, you need to use the function Tools > Run rules now. In order to use this function you need to first select a folder/disk within Tabbles.
Pocket size tutorial
Let's analyze a working regular expression:
.*\.avi$|.*\.mov$|.*\.mpg$ This one matches names ending in .avi OR .mov OR .mpg. A little explanation:
.* = matches any sequence of characters ("." is any single character, "*" repeats it zero or more times)
\. = matches the character "." (the dot)
avi = matches the literal text "avi" (likewise mov and mpg)
^ = beginning of the text to be matched (put this at the start of the thing you want to match)
$ = end of the text to be matched (append this at the end of the thing you want to match)
| = logical OR: the text matches if the expression on either side matches
So, if you want to add another extension, like .mp3, you append |.*\.mp3$ to the previous expression.
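To try the extended expression outside Tabbles, here is a minimal Python sketch. It assumes that Python's `re` module behaves like the .NET engine for this simple pattern and that a rule has to match the whole file name (hence `fullmatch`); how Tabbles applies the pattern internally is not stated on this page. The file names are made up for illustration.

```python
import re

# The tutorial pattern, extended with .mp3 as described above.
VIDEO_OR_MP3 = re.compile(r".*\.avi$|.*\.mov$|.*\.mpg$|.*\.mp3$")

# Hypothetical file names, used only for illustration.
samples = ["holiday.avi", "clip.mov", "song.mp3", "notes.txt", "avi_list.doc"]


def matching_files(pattern, names):
    """Return the names that the pattern matches from start to end."""
    return [name for name in names if pattern.fullmatch(name)]


matched = matching_files(VIDEO_OR_MP3, samples)
print(matched)
```

Note that this sketch is case-sensitive, so "HOLIDAY.AVI" would not match; whether Tabbles ignores case is not covered here.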
Other interesting stuff:
\b = matches a word boundary (the position between a letter, digit or underscore and any other character); to match a literal backslash, write \\
Well, the rest is in the MSDN reference...
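In both the .NET syntax and Python's `re` module, `\b` is a word boundary and `\\` is a literal backslash. The short Python sketch below illustrates the difference, on the assumption that both engines treat these two escapes the same way. The episode names and the path are hypothetical.

```python
import re

# \b is a word boundary: the position between a word character
# (letter, digit, underscore) and a non-word character or the string edge.
season3 = re.compile(r"\b3x\d\d\b")

# Hypothetical episode names, for illustration only.
names = ["Show 3x01.avi", "Show 13x01.avi", "Show 3x011.avi", "Show_3x01.avi"]
hits = [name for name in names if season3.search(name)]
print(hits)

# A literal backslash has to be escaped as \\ inside the pattern.
has_backslash = re.search(r"\\", r"D:\Customers\C001") is not None
print(has_backslash)
```

Only the first name is reported: in the others, "3x01" is glued to a digit or an underscore, so there is no word boundary at that position.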
Library of working regular expressions
This is a list of working regular expressions; to contribute or request new ones, post in the forum thread :-)
Expression | Effects | Author |
---|---|---|
.*\.avi$|.*\.mov$|.*\.mpg$ | matches .avi OR .mov OR .mpg | Andrea |
.*England.*|.*Great.*Britain.*|.*United.*Kingdom.*|.*Northern.*Ireland.*|.*Wales.*|.*Scotland.* | matches "Great Britain", "Great_Britain", "Great-Britain" etc. (along with the rest of the UK) | Andrea |
.*(201\d).* | matches 2010 to 2019 (useful for years). Change "201" to "200" to match 2000 to 2009 | Andrea |
((67\d{2})|(4\d{3})|(5[1-5]\d{2})|(6011))(-?\s?\d{4}){3}|(3[47])\d{2}-?\s?\d{6}-?\s?\d{5} | Credit cards (works with the major cards, with or without "-") | Andrea |
.*C\d{3}.* | Matches files with a path like "D:\Customers\C001" to "C999" or "...\stuff_C001_stuff" (to ...C999) | Andrea |
.*(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\]).* | Email address | Andrea |
[0-3][0-9](?:0[1-9]|1[0-2])\d{2}-\d{4}?[^0-9]* | Danish CPR Number (DDMMYY-NNNN, like 310180-1234) | Andrea |
.*((0[1-9]|[12]\d|3[01])(0[1-9]|1[0-2])\d{3}\d{2}\d{4}).* | Unique Master Citizen Number (JMBG), based on Wikipedia | Andrea |
(^|\s)(00[1-9]|0[1-9]0|0[1-9][1-9]|[1-6]\d{2}|7[0-6]\d|77[0-2])(-?|[\. ])([1-9]0|0[1-9]|[1-9][1-9])\3(\d{3}[1-9]|[1-9]\d{3}|\d[1-9]\d{2}|\d{2}[1-9]\d)($|\s|[;:,!\.\?]) | USA Social Security Number (It recognizes the formats 123456789, 123 45 6789, 123-45-6789) | Andrea |
.*(?:(?:[B-DF-HJ-NP-TV-Z]|[AEIOU])[AEIOU][AEIOUX]|[B-DF-HJ-NP-TV-Z]{2}[A-Z]){2}[\dLMNP-V]{2}(?:[A-EHLMPR-T](?:[04LQ][1-9MNP-V]|[1256LMRS][\dLMNP-V])|[DHPS][37PT][0L]|[ACELMRT][37PT][01LM])(?:[A-MZ][1-9MNP-V][\dLMNP-V]{2}|[A-M][0L](?:[1-9MNP-V][\dLMNP-V]|[0L][1-9MNP-V]))[A-Z].* | Italian "Codice Fiscale" | Andrea |
.*\.(?:doc|pdf|chm|ppt|xls|rtf|docx|xlsx)$ | matches .doc OR .pdf etc | Renincuente |
.*\b3x\d\d\b.* | matches "season 3" (e.g. *3x01*, *3x02* etc.) | Maurizio |
(?>.*\.)(?!(?:dll|cfg)$).*$ | EXCLUDES .dll and .cfg files. Matches all the others | Renincuente |
(?:\w*_)?\d{2}-04-\d{4}(?:_\w*)? | Matches pics taken in the month of April, named like Name_23-04-2010 or 23-04-2010_Name | KaptK |
Hint: be careful with the dots and the unusual characters - don't lose half of them while copy-pasting!
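One way to catch a damaged copy-paste is to check each expression against one text that should match and one that should not. The Python sketch below does this for a few entries from the table. It assumes whole-string matching (`fullmatch`) and that Python's `re` module agrees with the .NET syntax for these particular patterns. The sample texts and the `check` helper are hypothetical.

```python
import re

# Patterns copied from the table; raw strings keep the backslashes intact.
# Each entry: (pattern, text that should match, text that should not match).
CHECKS = [
    (r".*(201\d).*", "report_2015.pdf", "report_2025.pdf"),
    (r".*C\d{3}.*", r"D:\Customers\C001\invoice.pdf", r"D:\Customers\C01\invoice.pdf"),
    (r".*\.(?:doc|pdf|chm|ppt|xls|rtf|docx|xlsx)$", "budget.xlsx", "budget.csv"),
    (r"(?:\w*_)?\d{2}-04-\d{4}(?:_\w*)?", "Name_23-04-2010", "Name_23-05-2010"),
]


def check(pattern, should_match, should_not_match):
    """Compile the pattern and test it against one positive and one negative sample."""
    compiled = re.compile(pattern)  # raises re.error if the paste broke the syntax
    return (compiled.fullmatch(should_match) is not None
            and compiled.fullmatch(should_not_match) is None)


results = [check(*entry) for entry in CHECKS]
print(results)
```

The expression that uses `(?>...)` is left out on purpose: Python only supports atomic groups from version 3.11, so a failure on older versions would say nothing about how the pattern behaves in Tabbles.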