Regular expressions

From Tabbles Wiki

Latest revision as of 11:36, 14 May 2018

Since Tabbles 5.2.31, you can use regular expressions within the auto-tagging rules to tag files based on their name, path and content.

What are regular expressions and how do they work?

- Regex on wikipedia

- Page on Codeplex

- Online regular expression editors RegExr and regex101

- Download Expresso, a regular expression editor

- The reference page on MSDN

- Regular expression library (warning: the syntax used there may differ; the syntax described on the MSDN page is the one that should work in Tabbles)

Do I have to write them myself?

You will rarely need to write a regex yourself, because many regexes have already been written and tested. Your best bet is to search the web for "regex {whatever I need}" and see what comes up. Below are a few useful resources:

- European personal data regexp patterns

- Searchable Collection of regex RegExLib


How do I use them in Tabbles?

Go to: Tools > Auto-tagging rules > New and edit the window to something like this:

T5-Auto-Tagging-RegEx.png


From now on, whenever you create, save or rename a file or folder that matches the regular expression, it will be tagged (and a one-click pop-up should appear too).

If you want to automatically tag the files you already have, use Tools > Run rules now. To use this function, first select a folder or disk within Tabbles.

Pocket size tutorial

Let's analyze a working regular expression:

.*\.avi$|.*\.mov$|.*\.mpg$ This one matches names ending in .avi OR .mov OR .mpg. A little explanation:

.* = matches any sequence of characters ("." is any single character, "*" repeats it zero or more times)

\. = matches the character "." (the dot)

avi = matches the literal text "avi" (likewise mov and mpg)

^ = beginning of the text to be matched (put this at the start of the thing you want to match)

$ = end of the text to be matched (append this at the end of the thing you want to match)

| = plain and simple logical OR


So, if you want to add another extension, like .mp3, you append |.*\.mp3$ to the previous expression.
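To try a pattern like this before saving it as a rule, a small script can help. The following is a minimal sketch, assuming Python's `re` module, whose syntax is close to, but not identical to, the .NET syntax described on MSDN. The file names and the `matching` helper are made up for illustration.

```python
import re

# Pattern from the tutorial, extended with .mp3 as described above.
VIDEO_OR_MP3 = re.compile(r".*\.avi$|.*\.mov$|.*\.mpg$|.*\.mp3$")

# Hypothetical file names, used only for illustration.
samples = ["holiday.avi", "clip.mov", "song.mp3", "notes.txt", "avi_notes.doc"]

def matching(pattern, names):
    """Return the names that the compiled pattern matches."""
    return [name for name in names if pattern.match(name)]

print(matching(VIDEO_OR_MP3, samples))
# ['holiday.avi', 'clip.mov', 'song.mp3']
```

Note that "avi_notes.doc" is not matched: the pattern requires a dot followed by the extension at the very end of the name.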

Other interesting stuff:

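\b = matches a word boundary (the position between a letter, digit or underscore and any other character); to match a literal backslash, write \\

^ = beginning of the line to be matched, the opposite of $

The rest is in the MSDN reference.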
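In the .NET syntax, as in Python's `re` module, `\b` outside a character class is a word boundary, while a literal backslash is written `\\`. A minimal sketch of how boundaries and anchors behave, assuming Python rather than Tabbles itself, with invented file names:

```python
import re

# \b is a zero-width word boundary: the spot between a word character
# (letter, digit or underscore) and anything else, or the edge of the text.
SEASON_3 = re.compile(r"\b3x\d\d\b")

# ^ anchors the match to the start of the text.
STARTS_WITH_CODE = re.compile(r"^C\d{3}")

# A literal backslash has to be escaped as \\ inside the pattern.
LITERAL_BACKSLASH = re.compile(r"\\")

# Hypothetical file names, invented for this sketch.
print(bool(SEASON_3.search("Show 3x01 pilot.avi")))    # True
print(bool(SEASON_3.search("Show 13x01 pilot.avi")))   # False: "1" and "3" are both word characters
print(bool(SEASON_3.search("Show_3x01.avi")))          # False: "_" counts as a word character
print(bool(STARTS_WITH_CODE.search("C001_invoice.pdf")))      # True
print(bool(STARTS_WITH_CODE.search("old_C001_invoice.pdf")))  # False: C001 is not at the start
print(bool(LITERAL_BACKSLASH.search(r"D:\Customers")))        # True
```

The underscore case is worth remembering: names like "Show_3x01" need a pattern without `\b`, or one that allows `_` explicitly.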

Library of working regular expressions

This is a list of working regular expressions: to contribute or request new ones, post in the forum thread :-)


Expression || Effects || Author
.*\.avi$|.*\.mov$|.*\.mpg$ || matches .avi OR .mov OR .mpg || Andrea
.*England.*|.*Great.*Britain.*|.*United.*Kingdom.*|.*Northern.*Ireland.*|.*Wales.*|.*Scotland.* || matches "Great Britain", "Great_Britain", "Great-Britain" etc. (along with the rest of the UK) || Andrea
.*(201\d).* || matches 2010 to 2019 (useful for years). Change "201" to "200" to match 2000 to 2009 || Andrea
((67\d{2})|(4\d{3})|(5[1-5]\d{2})|(6011))(-?\s?\d{4}){3}|(3[47])\d{2}-?\s?\d{6}-?\s?\d{5} || Credit cards (works with the major cards, with or without "-") || Andrea
.*C\d{3}.* || Matches files with a path like "D:\Customers\C001" to "C999" or "...\stuff_C001_stuff" (to ...C999) || Andrea
.*(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\]).* || Email address || Andrea
[0-3][0-9](0[1-9]|1[0-2])\d{2}-\d{4}?[^0-9]* || Danish CPR Number (DDMMYY-NNNN, like 310180-1234) || Andrea
.*((0[1-9]|[12]\d|3[01])(0[1-9]|1[0-2])\d{3}\d{2}\d{4}).* || Unique Master Citizen Number (JMBG), based on Wikipedia (https://en.wikipedia.org/wiki/Unique_Master_Citizen_Number) || Andrea
(^|\s)(00[1-9]|0[1-9]0|0[1-9][1-9]|[1-6]\d{2}|7[0-6]\d|77[0-2])(-?|[\. ])([1-9]0|0[1-9]|[1-9][1-9])\3(\d{3}[1-9]|[1-9]\d{3}|\d[1-9]\d{2}|\d{2}[1-9]\d)($|\s|[;:,!\.\?]) || USA Social Security Number (recognizes the formats 123456789, 123 45 6789, 123-45-6789) || Andrea
.*(?:(?:[B-DF-HJ-NP-TV-Z]|[AEIOU])[AEIOU][AEIOUX]|[B-DF-HJ-NP-TV-Z]{2}[A-Z]){2}[\dLMNP-V]{2}(?:[A-EHLMPR-T](?:[04LQ][1-9MNP-V]|[1256LMRS][\dLMNP-V])|[DHPS][37PT][0L]|[ACELMRT][37PT][01LM])(?:[A-MZ][1-9MNP-V][\dLMNP-V]{2}|[A-M][0L](?:[1-9MNP-V][\dLMNP-V]|[0L][1-9MNP-V]))[A-Z].* || Italian "Codice Fiscale" || Andrea
.*\.(?:doc|pdf|chm|ppt|xls|rtf|docx|xlsx)$ || matches .doc OR .pdf etc. || Renincuente
.*\b3x\d\d\b.* || matches "season 3" (e.g. *3x01*, *3x02* etc.) || Maurizio
(?>.*\.)(?!(?:dll|cfg)$).*$ || EXCLUDES .dll and .cfg files. Matches all the others || Renincuente
(?:\w*_)?\d{2}-04-\d{4}(?:_\w*)? || Matches pics taken in the month of April, named like Name_23-04-2010 or 23-04-2010_Name || KaptK


Hint: be careful with the dots and the crazy characters - don't lose half of them while copy-pasting!
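One way to catch copy-paste damage is to run each pasted pattern against one text it should match and one it should not. The sketch below assumes Python's `re` module (not Tabbles itself, so .NET-only constructs may behave differently); the `check` helper and all sample texts are hypothetical.

```python
import re

# Each entry: (pattern copied from the library, text that should match,
# text that should not match). The sample texts are invented for this sketch.
CHECKS = [
    (r".*(201\d).*", "report_2015.pdf", "report_2005.pdf"),
    (r".*C\d{3}.*", r"D:\Customers\C042\invoice.pdf", r"D:\Customers\misc\invoice.pdf"),
    (r".*\.(?:doc|pdf|chm|ppt|xls|rtf|docx|xlsx)$", "manual.pdf", "manual.png"),
    (r"(?:\w*_)?\d{2}-04-\d{4}(?:_\w*)?", "Name_23-04-2010", "Name_23-05-2010"),
]

def check(pattern, should_match, should_not_match):
    """Return True if the pattern compiles and behaves as expected on both samples."""
    try:
        compiled = re.compile(pattern)
    except re.error:
        return False  # a lost bracket or parenthesis usually ends up here
    return bool(compiled.search(should_match)) and not compiled.search(should_not_match)

for pattern, good, bad in CHECKS:
    print("OK  " if check(pattern, good, bad) else "FAIL", pattern)
```

A pattern that lost a backslash may still compile, which is why the check uses sample texts rather than relying on compilation alone.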