XXXX redacted
You've been given a contract from a three letter abbreviation government agency. They want a program to automatically redact sensitive information from documents to be released to the public. They want fine control over what gets redacted though.
Given a piece of free-form, possibly Unicode text, (assume text only, no markup or formatting codes) they want to be able to redact: whole words, (case sensitive or insensitive) or partial words, (case sensitive or insensitive). Further, they want the option to "overkill" redact a partial word. Overkill redact means if the word contains the redact target, even if is only part of the word, redact the entire word.
For our purposes, a "word" here, means: a character group, separated by white-space and possibly punctuation; not necessarily strictly alphabetic characters. To "redact" a word or partial word, replace each character in the redaction target with a capital letter 'X'. There should be the same number of graphemes in the final redacted word as there were in the non-redacted word.
- Task
Write a procedure to "redact" a given piece of text. Your procedure should take the text (or a link to it), the redaction target (or a link to it) and the redaction options. It need not be a single routine, as long as there is some way to programmatically select which operation will be performed. It may be invoked from a command line or as an internal routine, but it should be separately invokable, not just a hard coded block.
The given strings are enclosed in square brackets to denote them. The brackets should not be counted as part of the strings.
Using the test string: [Tom? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so tom.]
Show the redacted sentence for each of the redaction target strings [Tom] & [tom] using the following options:
- Whole word
- Whole word, Case insensitive
- Partial word
- Partial word, Case insensitive
- Partial word, Overkill
- Partial word, Case insensitive, Overkill
Note that some combinations don't, or at least, shouldn't really differ from less specific combination. E.G. "Whole word, Overkill" should be theoretically be exactly the same as "Whole word".
Extra kudos for not including adjoining punctuation during "Overkill" redaction.
Extra kudos if the redaction target can contain non-letter characters.
The demo strings use the abbreviations w/p for whole/partial word, i/s for case insensitive/sensitive, n/o for normal/overkill. You are not required to use those, or any abbreviation. They are just for display, though may be useful to show what operation you are intending to perform.
Ideal expected output (adjoining punctuation untouched):
Redact 'Tom': [w|s|n] XXX? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so tom. [w|i|n] XXX? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so XXX. [p|s|n] XXX? XXXs bottom tomato is in his stomach while playing the "XXX-tom" brand tom-toms. That's so tom. [p|i|n] XXX? XXXs botXXX XXXato is in his sXXXach while playing the "XXX-XXX" brand XXX-XXXs. That's so XXX. [p|s|o] XXX? XXXX bottom tomato is in his stomach while playing the "XXXXXXX" brand tom-toms. That's so tom. [p|i|o] XXX? XXXX XXXXXX XXXXXX is in his XXXXXXX while playing the "XXXXXXX" brand XXXXXXXX. That's so XXX.
Redact 'tom': [w|s|n] Tom? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so XXX. [w|i|n] XXX? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so XXX. [p|s|n] Tom? Toms botXXX XXXato is in his sXXXach while playing the "Tom-XXX" brand XXX-XXXs. That's so XXX. [p|i|n] XXX? XXXs botXXX XXXato is in his sXXXach while playing the "XXX-XXX" brand XXX-XXXs. That's so XXX. [p|s|o] Tom? Toms XXXXXX XXXXXX is in his XXXXXXX while playing the "XXXXXXX" brand XXXXXXXX. That's so XXX. [p|i|o] XXX? XXXX XXXXXX XXXXXX is in his XXXXXXX while playing the "XXXXXXX" brand XXXXXXXX. That's so XXX.
- Stretch
Complex Unicode: Using the test string: [π§ π¨ π§ π¨βπ©βπ¦] and the redaction strings: [π¨] and [π¨βπ©βπ¦]
Show the redacted strings when using the option "Whole word" (Case sensitivity shouldn't matter.) A single grapheme should be replaced by a single 'X'.
π§ π¨ π§ π¨βπ©βπ¦ Redact 'π¨' [w] π§ X π§ π¨βπ©βπ¦ Redact 'π¨βπ©βπ¦' [w] π§ π¨ π§ X
- Metrics
- Counting
- Word frequency
- Letter frequency
- Jewels and stones
- I before E except after C
- Bioinformatics/base count
- Count occurrences of a substring
- Count how many vowels and consonants occur in a string
- Remove/replace
- XXXX redacted
- Conjugate a Latin verb
- Remove vowels from a string
- String interpolation (included)
- Strip block comments
- Strip comments from a string
- Strip a set of characters from a string
- Strip whitespace from a string -- top and tail
- Strip control codes and extended characters from a string
- Anagrams/Derangements/shuffling
- Word wheel
- ABC problem
- Sattolo cycle
- Knuth shuffle
- Ordered words
- Superpermutation minimisation
- Textonyms (using a phone text pad)
- Anagrams
- Anagrams/Deranged anagrams
- Permutations/Derangements
- Find/Search/Determine
- ABC words
- Odd words
- Word ladder
- Semordnilap
- Word search
- Wordiff (game)
- String matching
- Tea cup rim text
- Alternade words
- Changeable words
- State name puzzle
- String comparison
- Unique characters
- Unique characters in each string
- Extract file extension
- Levenshtein distance
- Palindrome detection
- Common list elements
- Longest common suffix
- Longest common prefix
- Compare a list of strings
- Longest common substring
- Find common directory path
- Words from neighbour ones
- Change e letters to i in words
- Non-continuous subsequences
- Longest common subsequence
- Longest palindromic substrings
- Longest increasing subsequence
- Words containing "the" substring
- Sum of the digits of n is substring of n
- Determine if a string is numeric
- Determine if a string is collapsible
- Determine if a string is squeezable
- Determine if a string has all unique characters
- Determine if a string has all the same characters
- Longest substrings without repeating characters
- Find words which contains all the vowels
- Find words which contain the most consonants
- Find words which contains more than 3 vowels
- Find words whose first and last three letters are equal
- Find words with alternating vowels and consonants
- Formatting
- Substring
- Rep-string
- Word wrap
- String case
- Align columns
- Literals/String
- Repeat a string
- Brace expansion
- Brace expansion using ranges
- Reverse a string
- Phrase reversals
- Comma quibbling
- Special characters
- String concatenation
- Substring/Top and tail
- Commatizing numbers
- Reverse words in a string
- Suffixation of decimal numbers
- Long literals, with continuations
- Numerical and alphabetical suffixes
- Abbreviations, easy
- Abbreviations, simple
- Abbreviations, automatic
- Song lyrics/poems/Mad Libs/phrases
- Mad Libs
- Magic 8-ball
- 99 bottles of beer
- The Name Game (a song)
- The Old lady swallowed a fly
- The Twelve Days of Christmas
- Tokenize
- Text between
- Tokenize a string
- Word break problem
- Tokenize a string with escaping
- Split a character string based on change of character
- Sequences
Ada
-- Redact text
-- J. Carter 2023 Apr
with Ada.Characters.Handling;
with Ada.Containers.Vectors;
with Ada.Strings.Fixed;
with Ada.Strings.Unbounded;
with Ada.Text_IO;
procedure Redact is
use Ada.Strings.Unbounded;
package Field_Lists is new Ada.Containers.Vectors (Index_Type => Positive, Element_Type => Unbounded_String);
function Parsed (Line : String) return Field_Lists.Vector;
-- Presumes that Line consists of fields speparated by 1 or more spaces (' ')
-- Returns a list of the parsed fields
function Redact (Word : in Field_Lists.Vector;
Pattern : in String;
Whole_Word : in Boolean;
Case_Sensitive : in Boolean;
Overkill : in Boolean)
return String;
-- Redacts the words or parts of words in Word containing Pattern
-- If Whole_Word, the entire word must match Pattern, and Overkill is ignored
-- Case_Sensitive determines whether or not the match is case sensitive
-- Overkill means the entire word is redacted even if only a part matches
function Parsed (Line : String) return Field_Lists.Vector is
Result : Field_Lists.Vector;
Start : Natural := Line'First;
Stop : Natural;
begin -- Parsed
All_Fields : loop
Start := Ada.Strings.Fixed.Index_Non_Blank (Line (Start .. Line'Last) );
exit All_Fields when Start = 0;
Stop := Ada.Strings.Fixed.Index (Line (Start .. Line'Last), " ");
if Stop = 0 then
Stop := Line'Last + 1;
end if;
Result.Append (New_Item => To_Unbounded_String (Line (Start .. Stop - 1) ) );
Start := Stop + 1;
end loop All_Fields;
return Result;
end Parsed;
function Redact (Word : in Field_Lists.Vector;
Pattern : in String;
Whole_Word : in Boolean;
Case_Sensitive : in Boolean;
Overkill : in Boolean)
return String is
subtype Lower is Character range 'a' .. 'z';
subtype Upper is Character range 'A' .. 'Z';
Pat : constant String := (if Case_Sensitive then Pattern else Ada.Characters.Handling.To_Lower (Pattern) );
Result : Unbounded_String;
Start : Positive; -- Start of a word, ignoring initial punctuation
Stop : Positive; -- End of a word, ignoring terminal punctuation
First : Natural; -- Start of partial match
Last : Natural; -- End of partial match
begin -- Redact
All_Words : for I in 1 .. Word.Last_Index loop
One_Word : declare
Raw : String := To_String (Word.Element (I) );
Woid : String := (if Case_Sensitive then Raw else Ada.Characters.Handling.To_Lower (Raw) );
begin -- One_Word
Start := Woid'First; -- Ignore initial punctuation
Find_Start : loop
exit Find_Start when Woid (Start) in Lower | Upper;
Start := Start + 1;
end loop Find_Start;
Stop := Woid'Last; -- Ignore terminal punctuation
Find_Stop : loop
exit Find_Stop when Woid (Stop) in Lower | Upper;
Stop := Stop - 1;
end loop Find_Stop;
if Whole_Word then
if Woid (Start .. Stop) = Pat then
Raw (Start .. Stop) := (Start .. Stop => 'X');
end if;
else
Last := Start - 1;
All_Matches : loop -- Multiple matches are possible within a single word
First := Ada.Strings.Fixed.Index (Woid (Last + 1 .. Stop), Pat);
exit All_Matches when First = 0;
Last := (if Overkill then Stop else First + Pattern'Length - 1);
if Overkill then
First := Start;
end if;
Raw (First .. Last) := (First .. Last => 'X');
end loop All_Matches;
end if;
Append (Source => Result, New_Item => Raw & (if I = Word.Last_Index then "" else " ") );
end One_Word;
end loop All_Words;
return To_String (Result);
end Redact;
subtype Pattern_String is String (1 .. 3);
type Pattern_List is array (1 .. 2) of Pattern_String;
Pattern : constant Pattern_List := ("Tom", "tom");
Line : constant String := "Tom? Toms bottom tomato is in his stomach while playing the " & '"' & "Tom-tom" & '"' &
" brand tom-toms. That's so tom.";
Word : constant Field_Lists.Vector := Parsed (Line);
begin -- Redact
All_Patterns : for Pat of Pattern loop
Ada.Text_IO.Put_Line (Item => "Pattern: " & Pat);
Wholeness : for Whole in Boolean loop
Sensitivity : for Sense in Boolean loop
if Whole then
Ada.Text_IO.Put_Line (Item => 'W' & (if Sense then 'S' else 'I') & "N: " & Redact (Word, Pat, Whole, Sense, False) );
else
Overkill : for Over in Boolean loop
Ada.Text_IO.Put_Line (Item => (if Whole then 'W' else 'P') &
(if Sense then 'S' else 'I') &
(if Over then 'O' else 'N') & ": " &
Redact (Word, Pat, Whole, Sense, Over) );
end loop Overkill;
end if;
end loop Sensitivity;
end loop Wholeness;
end loop All_Patterns;
end Redact;
- Output:
Pattern: Tom PIN: XXX? XXXs botXXX XXXato is in his sXXXach while playing the "XXX-XXX" brand XXX-XXXs. That's so XXX. PIO: XXX? XXXX XXXXXX XXXXXX is in his XXXXXXX while playing the "XXXXXXX" brand XXXXXXXX. That's so XXX. PSN: XXX? XXXs bottom tomato is in his stomach while playing the "XXX-tom" brand tom-toms. That's so tom. PSO: XXX? XXXX bottom tomato is in his stomach while playing the "XXXXXXX" brand tom-toms. That's so tom. WIN: XXX? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so XXX. WSN: XXX? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so tom. Pattern: tom PIN: XXX? XXXs botXXX XXXato is in his sXXXach while playing the "XXX-XXX" brand XXX-XXXs. That's so XXX. PIO: XXX? XXXX XXXXXX XXXXXX is in his XXXXXXX while playing the "XXXXXXX" brand XXXXXXXX. That's so XXX. PSN: Tom? Toms botXXX XXXato is in his sXXXach while playing the "Tom-XXX" brand XXX-XXXs. That's so XXX. PSO: Tom? Toms XXXXXX XXXXXX is in his XXXXXXX while playing the "XXXXXXX" brand XXXXXXXX. That's so XXX. WIN: XXX? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so XXX. WSN: Tom? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so XXX.
AppleScript
ASObjC
This uses ASObjC to access macOS's Foundation framework's regex and text-replacement methods. The methods support ICU-compatible regular expressions.
use AppleScript version "2.4" -- OS X 10.10 (Yosemite) or later
use framework "Foundation"
use scripting additions
on redact(theText, redactionTargets, options)
set |β| to current application
-- Set up a regex search pattern for the target or list of targets supplied.
-- Since it has to be able to match grapheme characters which may be combinations of
-- others in the same string, include catches for "Zero Width Joiner" characters.
set targets to |β|'s class "NSMutableArray"'s arrayWithArray:(redactionTargets as list)
repeat with thisTarget in targets
set thisTarget's contents to (|β|'s class "NSRegularExpression"'s escapedPatternForString:(thisTarget))
end repeat
set targetPattern to "(?<!\\u200d)(?:" & (targets's componentsJoinedByString:("|")) & ")(?!\\u200d)"
-- If necessary, modify the pattern according to the requested options. Only "w", "o", "i", and "s" need attention.
if (options contains "w") then
-- Don't match where preceded or followed by either a hyphen or anything which isn't punctuation or white space.
set targetPattern to "(?<![-[^[:punct:]\\s]])" & targetPattern & "(?![-[^[:punct:]\\s]])"
else if (options contains "o") then
-- Include any preceding or following run of hyphens and/or non-(punctuation or white-space).
set targetPattern to "[-[^[:punct:]\\s]]*" & targetPattern & "[-[^[:punct:]\\s]]*+"
end if
-- Default to case-insensitivity as in vanilla AppleScript unless otherwise indicated by option or AS 'considering' attribute.
if ((options contains "i") or ((options does not contain "s") and ("i" = "I"))) then Β¬
set targetPattern to "(?i)" & targetPattern
-- Locate all the matches in the text.
set mutableText to |β|'s class "NSMutableString"'s stringWithString:(theText)
set regexObject to |β|'s class "NSRegularExpression"'s regularExpressionWithPattern:(targetPattern) Β¬
options:(0) |error|:(missing value)
set matchObjects to regexObject's matchesInString:(mutableText) options:(0) range:({0, mutableText's |length|()})
set matchRanges to matchObjects's valueForKey:("range")
-- Replace each character or grapheme in the matched ranges with "X".
set regexSearch to |β|'s NSRegularExpressionSearch
repeat with i from (count matchRanges) to 1 by -1
tell mutableText to replaceOccurrencesOfString:(".(?:\\u200d.)*+") withString:("X") Β¬
options:(regexSearch) range:(item i of matchRanges)
end repeat
return mutableText as text
end redact
-- Test code:
set theText to "Tom? Toms bottom tomato is in his stomach while playing the \"Tom-tom\" brand tom-toms. That's so tom."
set output to {}
repeat with redactionTarget in {"Tom", "tom"}
set end of output to "Redact " & redactionTarget & ":"
repeat with options in {"[w|s|n]", "[w|i|n]", "[p|s|n]", "[p|i|n]", "[p|s|o]", "[p|i|o]"}
set end of output to options & ": " & redact(theText, redactionTarget, options)
end repeat
set end of output to ""
end repeat
set astid to AppleScript's text item delimiters
set AppleScript's text item delimiters to linefeed
set output to output as text
set AppleScript's text item delimiters to astid
return output
- Output:
"Redact Tom:
[w|s|n]: XXX? Toms bottom tomato is in his stomach while playing the \"Tom-tom\" brand tom-toms. That's so tom.
[w|i|n]: XXX? Toms bottom tomato is in his stomach while playing the \"Tom-tom\" brand tom-toms. That's so XXX.
[p|s|n]: XXX? XXXs bottom tomato is in his stomach while playing the \"XXX-tom\" brand tom-toms. That's so tom.
[p|i|n]: XXX? XXXs botXXX XXXato is in his sXXXach while playing the \"XXX-XXX\" brand XXX-XXXs. That's so XXX.
[p|s|o]: XXX? XXXX bottom tomato is in his stomach while playing the \"XXXXXXX\" brand tom-toms. That's so tom.
[p|i|o]: XXX? XXXX XXXXXX XXXXXX is in his XXXXXXX while playing the \"XXXXXXX\" brand XXXXXXXX. That's so XXX.
Redact tom:
[w|s|n]: Tom? Toms bottom tomato is in his stomach while playing the \"Tom-tom\" brand tom-toms. That's so XXX.
[w|i|n]: XXX? Toms bottom tomato is in his stomach while playing the \"Tom-tom\" brand tom-toms. That's so XXX.
[p|s|n]: Tom? Toms botXXX XXXato is in his sXXXach while playing the \"Tom-XXX\" brand XXX-XXXs. That's so XXX.
[p|i|n]: XXX? XXXs botXXX XXXato is in his sXXXach while playing the \"XXX-XXX\" brand XXX-XXXs. That's so XXX.
[p|s|o]: Tom? Toms XXXXXX XXXXXX is in his XXXXXXX while playing the \"XXXXXXX\" brand XXXXXXXX. That's so XXX.
[p|i|o]: XXX? XXXX XXXXXX XXXXXX is in his XXXXXXX while playing the \"XXXXXXX\" brand XXXXXXXX. That's so XXX.
"
Or with the grapheme text:
set graphemeText to "π§ π¨ π§ π¨βπ©βπ¦"
set output to {}
repeat with redactionTarget in {"π¨", "π¨βπ©βπ¦"}
set end of output to "Redact " & redactionTarget & ":"
set end of output to "[w]: " & redact(graphemeText, redactionTarget, "[w]")
set end of output to ""
end repeat
set astid to AppleScript's text item delimiters
set AppleScript's text item delimiters to linefeed
set output to output as text
set AppleScript's text item delimiters to astid
return output
- Output:
"Redact π¨:
[w]: π§ X π§ π¨βπ©βπ¦
Redact π¨βπ©βπ¦:
[w]: π§ π¨ π§ X
"
Vanilla (core language only)
The above uses ASObjC to take advantage of the Foundation framework's regex functions. But the core AppleScript language is itself perfectly capable of performing the task, albeit with somewhat more code. A fairly recent macOS version's needed for the grapheme characters to be recognised and handled satisfactorily (macOS 10.14's fine), but with plain text, the code below works on any system since Mac OS X 10.5. On macOS 10.14, it's about twice as fast as the ASObjC.
Test code and output as above.
on redact(theText, redactionTargets, options)
(* Script object containing the basic process. *)
script default
property textItems : missing value
property outputText : theText
-- Replace every instance of each of the passed redaction targets with an "X" sequence of the same length.
on redact()
set astid to AppleScript's text item delimiters
repeat with thisTarget in (redactionTargets as list)
set AppleScript's text item delimiters to thisTarget's contents
set my textItems to my outputText's text items
applyOption()
set AppleScript's text item delimiters to getXs(count thisTarget)
set my outputText to my textItems as text
end repeat
set astid to AppleScript's text item delimiters
end redact
on applyOption()
end applyOption
on getXs(targetLength)
set Xs to ""
repeat targetLength times
set Xs to Xs & "X"
end repeat
return Xs
end getXs
end script
(* Child script objects with their own applyOption() handlers for word-match and overkill. *)
script wordMatch
property parent : default
property newTextItems : missing value
-- Derive new text items from those just extracted with the current delimiter, losing any delimitation within words.
on applyOption()
set my newTextItems to {}
set i to 1
repeat with j from 2 to (count my textItems)
set precedingExtract to text from text item i to text item (j - 1) of my outputText -- Substring from the text.
set thisTextItem to item j of my textItems -- Text item from the list.
if not Β¬
((precedingExtract ends with "-") or Β¬
(((count precedingExtract's words) > 0) and (precedingExtract ends with precedingExtract's last word)) or Β¬
(thisTextItem begins with "-") or Β¬
(((count thisTextItem's words) > 0) and (thisTextItem begins with thisTextItem's first word))) then
set end of my newTextItems to precedingExtract
set i to j
end if
end repeat
set end of my newTextItems to text from text item i to text item j of my outputText
set my textItems to my newTextItems
end applyOption
end script
script overkill
property parent : default
-- Where the extracted text items are delimited within words, replace the word stumps' characters with "X"s.
on applyOption()
repeat with i from 2 to (count my textItems)
set precedingTextItem to item (i - 1) of my textItems
if ((count precedingTextItem's words) > 0) then
set lastword to precedingTextItem's last word
if ((precedingTextItem ends with lastword) or (precedingTextItem ends with (lastword & "-"))) then
set editLength to (count text from last word to end of precedingTextItem)
set Xs to getXs(editLength)
if ((count precedingTextItem) > editLength) then Β¬
set Xs to text 1 thru -(editLength + 1) of precedingTextItem & Xs
set item (i - 1) of my textItems to Xs
end if
else if ((precedingTextItem is "-") and (i > 2)) then -- Hyphen between two target instances.
set item (i - 1) of my textItems to "X"
end if
set thisTextItem to item i of my textItems
if ((count thisTextItem's words) > 0) then
set firstWord to thisTextItem's first word
if ((thisTextItem begins with firstWord) or (thisTextItem begins with ("-" & firstWord))) then
set editLength to (count text 1 thru first word of thisTextItem)
set Xs to getXs(editLength)
if ((count thisTextItem) > editLength) then set Xs to Xs & text (editLength + 1) thru end of thisTextItem
set item i of my textItems to Xs
end if
end if
end repeat
end applyOption
end script
(* Outer handler code. *)
-- Select the script object to use as the redactor.
if (options contains "w") then
set redactor to wordMatch
else if (options contains "o") then
set redactor to overkill
else
set redactor to default
end if
-- Invoke it with the necessary text comparison attributes imposed.
if ((options contains "i") or ((options does not contain "s") and ("i" = "I"))) then
considering white space and punctuation but ignoring case
tell redactor to redact()
end considering
else
considering white space, punctuation and case
tell redactor to redact()
end considering
end if
return redactor's outputText
end redact
AutoHotkey
No Complex Unicode!
str = Tom? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so tom.
words := ["Tom", "tom"]
opts := ["wsn", "win", "psn", "pin", "pso", "pio"]
for i, word in words
{
result .= "Redact '" word "'`n"
for j, opt in opts
result .= opt "`t" redact(str, word, opt) "`n"
result .= "`n"
}
MsgBox, 262144, , % result
return
redact(str, word, opt){
if InStr(opt, "w") ; Whole word
a := "(^|[^-])\K\b", z := "\b(?!-)"
if InStr(opt, "o") ; Overkill
a .= "\b[\w\-]*", z := "[\w\-]*\b" z
if InStr(opt, "i") ; Case insensitive
i := "i)"
ndle := i a "\Q" word "\E" z
while pos := RegExMatch(str, ndle, mtch, A_Index=1?1:pos+StrLen(mtch))
{
rplc := ""
loop % StrLen(mtch)
rplc .= "X"
str := RegExReplace(str, ndle, rplc,, 1)
}
return str
}
- Output:
Redact 'Tom' wsn XXX? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so tom. win XXX? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so XXX. psn XXX? XXXs bottom tomato is in his stomach while playing the "XXX-tom" brand tom-toms. That's so tom. pin XXX? XXXs botXXX XXXato is in his sXXXach while playing the "XXX-XXX" brand XXX-XXXs. That's so XXX. pso XXX? XXXX bottom tomato is in his stomach while playing the "XXXXXXX" brand tom-toms. That's so tom. pio XXX? XXXX XXXXXX XXXXXX is in his XXXXXXX while playing the "XXXXXXX" brand XXXXXXXX. That's so XXX. Redact 'tom' wsn Tom? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so XXX. win XXX? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so XXX. psn Tom? Toms botXXX XXXato is in his sXXXach while playing the "Tom-XXX" brand XXX-XXXs. That's so XXX. pin XXX? XXXs botXXX XXXato is in his sXXXach while playing the "XXX-XXX" brand XXX-XXXs. That's so XXX. pso Tom? Toms XXXXXX XXXXXX is in his XXXXXXX while playing the "XXXXXXX" brand XXXXXXXX. That's so XXX. pio XXX? XXXX XXXXXX XXXXXX is in his XXXXXXX while playing the "XXXXXXX" brand XXXXXXXX. That's so XXX.
C
This is a very basic ASCII-only implementation, no Unicode or regular expressions.
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
typedef enum {
whole_word = 1,
overkill = 2,
case_insensitive = 4
} redact_options;
bool is_word_char(char ch) {
return ch == '-' || isalpha((unsigned char)ch);
}
// Performs in-place redaction of the target with the specified options.
void redact(char* text, const char* target, redact_options options) {
size_t target_length = strlen(target);
if (target_length == 0)
return;
char* start = text;
char* end = text + strlen(text);
while (start < end) {
// NB: strcasestr is a non-standard extension. It's similar to the
// standard strstr function, but case-insensitive.
char* str = (options & case_insensitive) ? strcasestr(start, target)
: strstr(start, target);
if (str == NULL)
break;
char* word_start = str;
char* word_end = str + target_length;
if (options & (overkill | whole_word)) {
while (word_start > start && is_word_char(*(word_start - 1)))
--word_start;
while (word_end < end && is_word_char(*word_end))
++word_end;
}
if (!(options & whole_word) ||
(word_start == str && word_end == str + target_length))
memset(word_start, 'X', word_end - word_start);
start = word_end;
}
}
void do_basic_test(const char* target, redact_options options) {
char text[] = "Tom? Toms bottom tomato is in his stomach while playing the "
"\"Tom-tom\" brand tom-toms. That's so tom.";
redact(text, target, options);
printf("[%c|%c|%c]: %s\n", (options & whole_word) ? 'w' : 'p',
(options & case_insensitive) ? 'i' : 's',
(options & overkill) ? 'o' : 'n', text);
}
void do_basic_tests(const char* target) {
printf("Redact '%s':\n", target);
do_basic_test(target, whole_word);
do_basic_test(target, whole_word | case_insensitive);
do_basic_test(target, 0);
do_basic_test(target, case_insensitive);
do_basic_test(target, overkill);
do_basic_test(target, case_insensitive | overkill);
}
int main() {
do_basic_tests("Tom");
printf("\n");
do_basic_tests("tom");
return 0;
}
- Output:
Redact 'Tom': [w|s|n]: XXX? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so tom. [w|i|n]: XXX? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so XXX. [p|s|n]: XXX? XXXs bottom tomato is in his stomach while playing the "XXX-tom" brand tom-toms. That's so tom. [p|i|n]: XXX? XXXs botXXX XXXato is in his sXXXach while playing the "XXX-XXX" brand XXX-XXXs. That's so XXX. [p|s|o]: XXX? XXXX bottom tomato is in his stomach while playing the "XXXXXXX" brand tom-toms. That's so tom. [p|i|o]: XXX? XXXX XXXXXX XXXXXX is in his XXXXXXX while playing the "XXXXXXX" brand XXXXXXXX. That's so XXX. Redact 'tom': [w|s|n]: Tom? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so XXX. [w|i|n]: XXX? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so XXX. [p|s|n]: Tom? Toms botXXX XXXato is in his sXXXach while playing the "Tom-XXX" brand XXX-XXXs. That's so XXX. [p|i|n]: XXX? XXXs botXXX XXXato is in his sXXXach while playing the "XXX-XXX" brand XXX-XXXs. That's so XXX. [p|s|o]: Tom? Toms XXXXXX XXXXXX is in his XXXXXXX while playing the "XXXXXXX" brand XXXXXXXX. That's so XXX. [p|i|o]: XXX? XXXX XXXXXX XXXXXX is in his XXXXXXX while playing the "XXXXXXX" brand XXXXXXXX. That's so XXX.
C++
#include <iostream>
using namespace std;
string redact(const string &source, const string &word, bool partial, bool insensitive, bool overkill) {
string temp = source;
auto different = [insensitive](char s, char w) {
if (insensitive) {
return toupper(s) != toupper(w);
} else {
return s != w;
}
};
auto isWordChar = [](char c) {
return c == '-' || isalpha(c);
};
for (size_t i = 0; i < temp.length() - word.length() + 1; i++) {
bool match = true;
for (size_t j = 0; j < word.length(); j++) {
if (different(temp[i + j], word[j])) {
match = false;
break;
}
}
if (match) {
auto beg = i;
auto end = i + word.length();
if (!partial) {
if (beg > 0 && isWordChar(temp[beg - 1])) {
continue;
}
if (end < temp.length() && isWordChar(temp[end])) {
continue;
}
}
if (overkill) {
while (beg > 0 && isWordChar(temp[beg - 1])) {
beg--;
}
while (end < temp.length() && isWordChar(temp[end])) {
end++;
}
}
for (size_t k = beg; k < end; k++) {
temp[k] = 'X';
}
}
}
return temp;
}
void example(const string &source, const string &word) {
std::cout << "Redact " << word << '\n';
std::cout << "[w|s|n] " << redact(source, word, false, false, false) << '\n';
std::cout << "[w|i|n] " << redact(source, word, false, true, false) << '\n';
std::cout << "[p|s|n] " << redact(source, word, true, false, false) << '\n';
std::cout << "[p|i|n] " << redact(source, word, true, true, false) << '\n';
std::cout << "[p|s|o] " << redact(source, word, true, false, true) << '\n';
std::cout << "[p|i|o] " << redact(source, word, true, true, true) << '\n';
std::cout << '\n';
}
int main() {
string text = "Tom? Toms bottom tomato is in his stomach while playing the \"Tom-tom\" brand tom-toms. That's so tom";
example(text, "Tom");
example(text, "tom");
return 0;
}
- Output:
Redact Tom [w|s|n] XXX? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so tom [w|i|n] XXX? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so XXX [p|s|n] XXX? XXXs bottom tomato is in his stomach while playing the "XXX-tom" brand tom-toms. That's so tom [p|i|n] XXX? XXXs botXXX XXXato is in his sXXXach while playing the "XXX-XXX" brand XXX-XXXs. That's so XXX [p|s|o] XXX? XXXX bottom tomato is in his stomach while playing the "XXXXXXX" brand tom-toms. That's so tom [p|i|o] XXX? XXXX XXXXXX XXXXXX is in his XXXXXXX while playing the "XXXXXXX" brand XXXXXXXX. That's so XXX Redact tom [w|s|n] Tom? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so XXX [w|i|n] XXX? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so XXX [p|s|n] Tom? Toms botXXX XXXato is in his sXXXach while playing the "Tom-XXX" brand XXX-XXXs. That's so XXX [p|i|n] XXX? XXXs botXXX XXXato is in his sXXXach while playing the "XXX-XXX" brand XXX-XXXs. That's so XXX [p|s|o] Tom? Toms XXXXXX XXXXXX is in his XXXXXXX while playing the "XXXXXXX" brand XXXXXXXX. That's so XXX [p|i|o] XXX? XXXX XXXXXX XXXXXX is in his XXXXXXX while playing the "XXXXXXX" brand XXXXXXXX. That's so XXX
D
import std.stdio;
import std.uni;
string redact(string source, string word, bool partial = false, bool insensitive = false, bool overkill = false) {
bool different(char s, char w) {
if (insensitive) {
return s.toUpper != w.toUpper;
} else {
return s != w;
}
}
bool isWordChar(char c) {
return c == '-' || c.isAlpha;
}
auto temp = source.dup;
foreach (i; 0 .. temp.length - word.length + 1) {
bool match = true;
foreach (j; 0 .. word.length) {
if (different(temp[i + j], word[j])) {
match = false;
break;
}
}
if (match) {
auto beg = i;
auto end = i + word.length;
if (!partial) {
if (beg > 0 && isWordChar(temp[beg - 1])) {
// writeln("b boundary ", temp[beg - 1]);
continue;
}
if (end < temp.length && isWordChar(temp[end])) {
// writeln("e boundary ", temp[end]);
continue;
}
}
if (overkill) {
while (beg > 0 && isWordChar(temp[beg - 1])) {
beg--;
}
while (end < temp.length - 1 && isWordChar(temp[end])) {
end++;
}
}
temp[beg .. end] = 'X';
}
}
return temp.idup;
}
void example(string source, string word) {
writeln("Redact ", word);
writeln("[w|s|n] ", redact(source, word, false, false, false));
writeln("[w|i|n] ", redact(source, word, false, true, false));
writeln("[p|s|n] ", redact(source, word, true, false, false));
writeln("[p|i|n] ", redact(source, word, true, true, false));
writeln("[p|s|o] ", redact(source, word, true, false, true));
writeln("[p|i|o] ", redact(source, word, true, true, true));
writeln;
}
void main(string[] args) {
string text = `Tom? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so tom`;
example(text, "Tom");
example(text, "tom");
}
- Output:
Redact Tom [w|s|n] XXX? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so tom [w|i|n] XXX? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so XXX [p|s|n] XXX? XXXs bottom tomato is in his stomach while playing the "XXX-tom" brand tom-toms. That's so tom [p|i|n] XXX? XXXs botXXX XXXato is in his sXXXach while playing the "XXX-XXX" brand XXX-XXXs. That's so XXX [p|s|o] XXX? XXXX bottom tomato is in his stomach while playing the "XXXXXXX" brand tom-toms. That's so tom [p|i|o] XXX? XXXX XXXXXX XXXXXX is in his XXXXXXX while playing the "XXXXXXX" brand XXXXXXXX. That's so XXX Redact tom [w|s|n] Tom? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so XXX [w|i|n] XXX? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so XXX [p|s|n] Tom? Toms botXXX XXXato is in his sXXXach while playing the "Tom-XXX" brand XXX-XXXs. That's so XXX [p|i|n] XXX? XXXs botXXX XXXato is in his sXXXach while playing the "XXX-XXX" brand XXX-XXXs. That's so XXX [p|s|o] Tom? Toms XXXXXX XXXXXX is in his XXXXXXX while playing the "XXXXXXX" brand XXXXXXXX. That's so XXX [p|i|o] XXX? XXXX XXXXXX XXXXXX is in his XXXXXXX while playing the "XXXXXXX" brand XXXXXXXX. That's so XXX
ed
Uses non-portable GNU Ed extension for case-insensitive matching. The number of replacement lines has to be copy-pasted as many times as necessary for full replacement. Because, unlike sed, ed has no loops. Also note that the replacement is always three X-es, because ed has no way to customize the replacement length.
H
2,j
1s/.*/&|/
,j
t
t0
t0
t0
t0
t0
# Repeat as many times as possible. Because ed has no loops
1s/^(.*)\|(.*)\b\1\b/\1|\2XXX/g
1s/^(.*)\|(.*)\b\1\b/\1|\2XXX/g
1s/^(.*)\|(.*)\b\1\b/\1|\2XXX/g
1s/^(.*)\|(.*)\b\1\b/\1|\2XXX/g
1s/^(.*)\|(.*)\b\1\b/\1|\2XXX/g
1s/^(.*)\|(.*)\b\1\b/\1|\2XXX/g
1s/^(.*)\|(.*)\b\1\b/\1|\2XXX/g
1s/^(.*)\|(.*)\b\1\b/\1|\2XXX/g
1s/^(.*)\|(.*)\b\1\b/\1|\2XXX/g
1s/^(.*)\|(.*)\b\1\b/\1|\2XXX/g
1s/^(.*)\|(.*)$/\2/
2s/^(.*)\|(.*)\b\1\b/\1|\2XXX/gi
2s/^(.*)\|(.*)\b\1\b/\1|\2XXX/gi
2s/^(.*)\|(.*)\b\1\b/\1|\2XXX/gi
2s/^(.*)\|(.*)\b\1\b/\1|\2XXX/gi
2s/^(.*)\|(.*)\b\1\b/\1|\2XXX/gi
2s/^(.*)\|(.*)\b\1\b/\1|\2XXX/gi
2s/^(.*)\|(.*)\b\1\b/\1|\2XXX/gi
2s/^(.*)\|(.*)\b\1\b/\1|\2XXX/gi
2s/^(.*)\|(.*)\b\1\b/\1|\2XXX/gi
2s/^(.*)\|(.*)\b\1\b/\1|\2XXX/gi
2s/^(.*)\|(.*)$/\2/
3s/^(.*)\|(.*)\1/\1|\2XXX/g
3s/^(.*)\|(.*)\1/\1|\2XXX/g
3s/^(.*)\|(.*)\1/\1|\2XXX/g
3s/^(.*)\|(.*)\1/\1|\2XXX/g
3s/^(.*)\|(.*)\1/\1|\2XXX/g
3s/^(.*)\|(.*)\1/\1|\2XXX/g
3s/^(.*)\|(.*)\1/\1|\2XXX/g
3s/^(.*)\|(.*)\1/\1|\2XXX/g
3s/^(.*)\|(.*)\1/\1|\2XXX/g
3s/^(.*)\|(.*)\1/\1|\2XXX/g
3s/^(.*)\|(.*)$/\2/
4s/^(.*)\|(.*)\1/\1|\2XXX/gi
4s/^(.*)\|(.*)\1/\1|\2XXX/gi
4s/^(.*)\|(.*)\1/\1|\2XXX/gi
4s/^(.*)\|(.*)\1/\1|\2XXX/gi
4s/^(.*)\|(.*)\1/\1|\2XXX/gi
4s/^(.*)\|(.*)\1/\1|\2XXX/gi
4s/^(.*)\|(.*)\1/\1|\2XXX/gi
4s/^(.*)\|(.*)\1/\1|\2XXX/gi
4s/^(.*)\|(.*)\1/\1|\2XXX/gi
4s/^(.*)\|(.*)\1/\1|\2XXX/gi
4s/^(.*)\|(.*)$/\2/
5s/^(.*)\|(.*)\b(\w*)\1(\w*)\b(.*)$/\1|\2XXX\5/g
5s/^(.*)\|(.*)\b(\w*)\1(\w*)\b(.*)$/\1|\2XXX\5/g
5s/^(.*)\|(.*)\b(\w*)\1(\w*)\b(.*)$/\1|\2XXX\5/g
5s/^(.*)\|(.*)\b(\w*)\1(\w*)\b(.*)$/\1|\2XXX\5/g
5s/^(.*)\|(.*)\b(\w*)\1(\w*)\b(.*)$/\1|\2XXX\5/g
5s/^(.*)\|(.*)\b(\w*)\1(\w*)\b(.*)$/\1|\2XXX\5/g
5s/^(.*)\|(.*)\b(\w*)\1(\w*)\b(.*)$/\1|\2XXX\5/g
5s/^(.*)\|(.*)\b(\w*)\1(\w*)\b(.*)$/\1|\2XXX\5/g
5s/^(.*)\|(.*)\b(\w*)\1(\w*)\b(.*)$/\1|\2XXX\5/g
5s/^(.*)\|(.*)\b(\w*)\1(\w*)\b(.*)$/\1|\2XXX\5/g
5s/^(.*)\|(.*)$/\2/
6s/^(.*)\|(.*)\b(\w*)\1(\w*)\b(.*)$/\1|\2XXX\5/gi
6s/^(.*)\|(.*)\b(\w*)\1(\w*)\b(.*)$/\1|\2XXX\5/gi
6s/^(.*)\|(.*)\b(\w*)\1(\w*)\b(.*)$/\1|\2XXX\5/gi
6s/^(.*)\|(.*)\b(\w*)\1(\w*)\b(.*)$/\1|\2XXX\5/gi
6s/^(.*)\|(.*)\b(\w*)\1(\w*)\b(.*)$/\1|\2XXX\5/gi
6s/^(.*)\|(.*)\b(\w*)\1(\w*)\b(.*)$/\1|\2XXX\5/gi
6s/^(.*)\|(.*)\b(\w*)\1(\w*)\b(.*)$/\1|\2XXX\5/gi
6s/^(.*)\|(.*)\b(\w*)\1(\w*)\b(.*)$/\1|\2XXX\5/gi
6s/^(.*)\|(.*)\b(\w*)\1(\w*)\b(.*)$/\1|\2XXX\5/gi
6s/^(.*)\|(.*)\b(\w*)\1(\w*)\b(.*)$/\1|\2XXX\5/gi
6s/^(.*)\|(.*)$/\2/
# Pretty descriptions
1s/.*/Whole word: &/
2s/.*/Whole word, case insensitive: &/
3s/.*/Partial word: &/
4s/.*/Partial word, case insensitive: &/
5s/.*/Partial word, overkill: &/
6s/.*/Partial word, case insensitive, overkill: &/
,p
Q
- Output:
$ cat xxx-redacted.ed | ed -lEGs xxx-redacted.input Whole word: XXX? Toms bottom tomato is in his stomach while playing the "XXX-tom" brand tom-toms. That's so tom. Whole word, case insensitive: XXX? Toms bottom tomato is in his stomach while playing the "XXX-XXX" brand XXX-toms. That's so XXX. Partial word: XXX? XXXs bottom tomato is in his stomach while playing the "XXX-tom" brand tom-toms. That's so tom. Partial word, case insensitive: XXX? XXXs botXXX XXXato is in his sXXXach while playing the "XXX-XXX" brand XXX-XXXs. That's so XXX. Partial word, overkill: XXX? XXX bottom tomato is in his stomach while playing the "XXX-tom" brand tom-toms. That's so tom. Partial word, case insensitive, overkill: XXX? XXX XXX XXX is in his XXX while playing the "XXX-XXX" brand XXX-XXX. That's so XXX.
FreeBASIC
No Complex Unicode!
Function isWordChar(c As String) As Integer
Return (c = "-" Or c >= "A" And c <= "Z" Or c >= "a" And c <= "z")
End Function
Function Redact(source As String, word As String, partial As Boolean, insensitive As Boolean, overkill As Boolean) As String
Dim As Integer i, j, beg, fin, match
Dim As String temp, s, w
temp = source
i = 1
While i <= Len(temp) - Len(word) + 1
match = 1
j = 1
While j <= Len(word)
s = Mid(temp, i + j - 1, 1)
w = Mid(word, j, 1)
If insensitive Then
If Lcase(s) <> Lcase(w) Then
match = 0
Exit While
End If
Else
If s <> w Then
match = 0
Exit While
End If
End If
j += 1
Wend
If match Then
beg = i
fin = i + Len(word) - 1
If Not partial Then
If beg > 1 And isWordChar(Mid(temp, beg - 1, 1)) Then
i += 1
Continue While
End If
If fin < Len(temp) And isWordChar(Mid(temp, fin + 1, 1)) Then
i += 1
Continue While
End If
End If
If overkill Then
While beg > 1 And isWordChar(Mid(temp, beg - 1, 1))
beg -= 1
Wend
While fin < Len(temp) And isWordChar(Mid(temp, fin + 1, 1))
fin += 1
Wend
End If
Mid(temp, beg, fin - beg + 1) = String(fin - beg + 1, "X")
End If
i += 1
Wend
Return temp
End Function
Sub Example(source As String, word As String)
Print "Redact '"; word; "':"
Print "[w|s|n] "; Redact(source, word, False, False, False)
Print "[w|i|n] "; Redact(source, word, False, True, False)
Print "[p|s|n] "; Redact(source, word, True, False, False)
Print "[p|i|n] "; Redact(source, word, True, True, False)
Print "[p|s|o] "; Redact(source, word, True, False, True)
Print "[p|i|o] "; Redact(source, word, True, True, True)
Print
End Sub
Dim As String text
text = "Tom? Toms bottom tomato is in his stomach while playing the ""Tom-tom"" brand tom-toms. That's so tom"
Example(text, "Tom")
Example(text, "tom")
Sleep
- Output:
Same as D entry.
FutureBasic
include "NSLog.incl"
local fn Redact( string as CFStringRef, target as CFStringRef, whole as BOOL, icase as BOOL, over as BOOL ) as CFStringRef
ErrorRef err = NULL
CFStringRef w = @"p", i = @"s", o = @"n", result = NULL
NSRegularExpressionOptions options = 0
CFStringRef pattern = fn RegularExpressionEscapedPattern( target )
if whole == YES
pattern = fn StringWithFormat( @"%@%@%@", @"(?<![-[^[:punct:]\\s]])", pattern, @"(?![-[^[:punct:]\\s]])" )
w = @"w"
end if
if over == YES
pattern = fn StringWithFormat( @"%@%@%@", @"[-[^[:punct:]\\s]]*", pattern, @"[-[^[:punct:]\\s]]*+" )
o = @"o"
end if
if icase == YES Then options = NSRegularExpressionCaseInsensitive : i = @"i"
RegularExpressionRef regex = fn RegularExpressionWithPattern( pattern, options, @err )
if err then NSLog( @"%@", fn ErrorLocalizedDescription( err ) )
CFMutableStringRef mutStr = fn MutableStringWithString( string )
CFArrayRef matches = fn RegularExpressionMatches( regex, mutStr, 0, fn CFRangeMake( 0, len( mutStr ) ) )
long x, count = len(matches)
for x = 0 to count - 1
CFRange matchRange = fn ValueRange( fn ObjectValueForKey( matches[x], @"range" ) )
MutableStringReplaceOccurrencesOfString( mutStr, @".(?:\\u200d.)*+", @"X", NSRegularExpressionSearch, matchRange )
next
result = fn StringWithFormat( @"[%@|%@|%@] %@", w, i, o, mutStr )
end fn = result
CFStringRef tomTest
tomTest = @"Tom? Toms bottom tomato is in his stomach while playing the \"Tom-tom\" brand tom-toms. That's so tom."
NSLog( @"Test string:\n%@\n\nRedact 'Tom':", tomTest )
NSLog( @"%@", fn Redact( tomTest, @"Tom", YES, NO, NO ) )
NSLog( @"%@", fn Redact( tomTest, @"Tom", YES, YES, NO ) )
NSLog( @"%@", fn Redact( tomTest, @"Tom", NO, NO, NO ) )
NSLog( @"%@", fn Redact( tomTest, @"Tom", NO, YES, NO ) )
NSLog( @"%@", fn Redact( tomTest, @"Tom", NO, NO, YES ) )
NSLog( @"%@", fn Redact( tomTest, @"Tom", NO, YES, YES ) )
NSLog( @"\nRedact 'tom':" )
NSLog( @"%@", fn Redact( tomTest, @"tom", YES, NO, NO ) )
NSLog( @"%@", fn Redact( tomTest, @"tom", YES, YES, NO ) )
NSLog( @"%@", fn Redact( tomTest, @"tom", NO, NO, NO ) )
NSLog( @"%@", fn Redact( tomTest, @"tom", NO, YES, NO ) )
NSLog( @"%@", fn Redact( tomTest, @"tom", NO, NO, YES ) )
NSLog( @"%@", fn Redact( tomTest, @"tom", NO, YES, YES ) )
NSLogSetFont( fn FontWithName( @"Menlo", 18.0 ) )
NSLog( @"\n π§ π¨ π§ π¨βπ©βπ¦" )
NSLog( @"Redact 'π¨': %@", fn Redact( @"π§ π¨ π§ π¨βπ©βπ¦", @"π¨", YES, YES, YES ) )
NSLog( @"Redact 'π¨βπ©βπ¦': %@", fn Redact( @"π§ π¨ π§ π¨βπ©βπ¦", @"π¨βπ©βπ¦", YES, YES, YES ) )
HandleEvents
- Output:
Test string: Tom? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so tom. Redact 'Tom': [w|s|n] XXX? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so tom. [w|i|n] XXX? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so XXX. [p|s|n] XXX? XXXs bottom tomato is in his stomach while playing the "XXX-tom" brand tom-toms. That's so tom. [p|i|n] XXX? XXXs botXXX XXXato is in his sXXXach while playing the "XXX-XXX" brand XXX-XXXs. That's so XXX. [p|s|o] XXX? XXXX bottom tomato is in his stomach while playing the "XXXXXXX" brand tom-toms. That's so tom. [p|i|o] XXX? XXXX XXXXXX XXXXXX is in his XXXXXXX while playing the "XXXXXXX" brand XXXXXXXX. That's so XXX. Redact 'tom': [w|s|n] Tom? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so XXX. [w|i|n] XXX? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so XXX. [p|s|n] Tom? Toms botXXX XXXato is in his sXXXach while playing the "Tom-XXX" brand XXX-XXXs. That's so XXX. [p|i|n] XXX? XXXs botXXX XXXato is in his sXXXach while playing the "XXX-XXX" brand XXX-XXXs. That's so XXX. [p|s|o] Tom? Toms XXXXXX XXXXXX is in his XXXXXXX while playing the "XXXXXXX" brand XXXXXXXX. That's so XXX. [p|i|o] XXX? XXXX XXXXXX XXXXXX is in his XXXXXXX while playing the "XXXXXXX" brand XXXXXXXX. That's so XXX. π§ π¨ π§ π¨βπ©βπ¦ Redact 'π¨': [w|i|o] π§ X π§ π¨βπ©βπ¦ Redact 'π¨βπ©βπ¦': [w|i|o] π§ π¨ π§ X
Go
Go has a problem with zero width joiner (ZWJ) emojis such as the final one in the test string which is not recognized as a single 'character' by the language as it consists of five Unicode code-points (or 'runes') instead of one. This problem is aggravated (as here) when one of the constituents of the ZWJ emoji happens to be a 'normal' emoji contained within the same test string!
Care is therefore needed to ensure that when a normal emoji is being redacted it doesn't also redact one of the constituents of a ZWJ emoji.
To get the number of 'X's right where a ZWJ emoji or other character combination is being replaced, a third party library function is used which counts the number of graphemes in a string, as required by the task.
package main
import (
"fmt"
"github.com/rivo/uniseg"
"log"
"regexp"
"strings"
)
func join(words, seps []string) string {
lw := len(words)
ls := len(seps)
if lw != ls+1 {
log.Fatal("mismatch between number of words and separators")
}
var sb strings.Builder
for i := 0; i < ls; i++ {
sb.WriteString(words[i])
sb.WriteString(seps[i])
}
sb.WriteString(words[lw-1])
return sb.String()
}
func redact(text, word, opts string) {
var partial, overkill bool
exp := word
if strings.IndexByte(opts, 'p') >= 0 {
partial = true
}
if strings.IndexByte(opts, 'o') >= 0 {
overkill = true
}
if strings.IndexByte(opts, 'i') >= 0 {
exp = `(?i)` + exp
}
rgx := regexp.MustCompile(`[\s!-&(-,./:-@[-^{-~]+`) // all punctuation except -'_
seps := rgx.FindAllString(text, -1)
words := rgx.Split(text, -1)
rgx2 := regexp.MustCompile(exp)
for i, w := range words {
match := rgx2.FindString(w)
// check there's a match and it's not part of a ZWJ emoji
if match == "" || strings.Index(w, match+"\u200d") >= 0 ||
strings.Index(w, "\u200d"+match) >= 0 {
continue
}
switch {
case overkill:
words[i] = strings.Repeat("X", uniseg.GraphemeClusterCount(w))
case !partial:
if words[i] == match {
words[i] = strings.Repeat("X", uniseg.GraphemeClusterCount(w))
}
case partial:
repl := strings.Repeat("X", uniseg.GraphemeClusterCount(word))
words[i] = rgx2.ReplaceAllLiteralString(w, repl)
}
}
fmt.Printf("%s %s\n\n", opts, join(words, seps))
}
func printResults(text string, allOpts, allWords []string) {
fmt.Printf("Text: %s\n\n", text)
for _, word := range allWords {
fmt.Printf("Redact '%s':\n", word)
for _, opts := range allOpts {
redact(text, word, opts)
}
}
fmt.Println()
}
func main() {
text := `Tom? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so tom.
'Tis very tomish, don't you think?`
allOpts := []string{"[w|s|n]", "[w|i|n]", "[p|s|n]", "[p|i|n]", "[p|s|o]", "[p|i|o]"}
allWords := []string{"Tom", "tom", "t"}
printResults(text, allOpts, allWords)
text = "π§ π¨ π§ π¨βπ©βπ¦"
allOpts = []string{"[w]"}
allWords = []string{"π¨", "π¨βπ©βπ¦"}
printResults(text, allOpts, allWords)
text = "Argentinaπ§π¦πΉ Franceπ¨π«π· Germanyπ§π©πͺ Netherlandsπ¨βπ©βπ¦π³π±"
allOpts = []string{"[p]", "[p|o]"}
printResults(text, allOpts, allWords)
}
- Output:
Text: Tom? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so tom. 'Tis very tomish, don't you think? Redact 'Tom': [w|s|n] XXX? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so tom. 'Tis very tomish, don't you think? [w|i|n] XXX? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so XXX. 'Tis very tomish, don't you think? [p|s|n] XXX? XXXs bottom tomato is in his stomach while playing the "XXX-tom" brand tom-toms. That's so tom. 'Tis very tomish, don't you think? [p|i|n] XXX? XXXs botXXX XXXato is in his sXXXach while playing the "XXX-XXX" brand XXX-XXXs. That's so XXX. 'Tis very XXXish, don't you think? [p|s|o] XXX? XXXX bottom tomato is in his stomach while playing the "XXXXXXX" brand tom-toms. That's so tom. 'Tis very tomish, don't you think? [p|i|o] XXX? XXXX XXXXXX XXXXXX is in his XXXXXXX while playing the "XXXXXXX" brand XXXXXXXX. That's so XXX. 'Tis very XXXXXX, don't you think? Redact 'tom': [w|s|n] Tom? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so XXX. 'Tis very tomish, don't you think? [w|i|n] XXX? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so XXX. 'Tis very tomish, don't you think? [p|s|n] Tom? Toms botXXX XXXato is in his sXXXach while playing the "Tom-XXX" brand XXX-XXXs. That's so XXX. 'Tis very XXXish, don't you think? [p|i|n] XXX? XXXs botXXX XXXato is in his sXXXach while playing the "XXX-XXX" brand XXX-XXXs. That's so XXX. 'Tis very XXXish, don't you think? [p|s|o] Tom? Toms XXXXXX XXXXXX is in his XXXXXXX while playing the "XXXXXXX" brand XXXXXXXX. That's so XXX. 'Tis very XXXXXX, don't you think? [p|i|o] XXX? XXXX XXXXXX XXXXXX is in his XXXXXXX while playing the "XXXXXXX" brand XXXXXXXX. That's so XXX. 'Tis very XXXXXX, don't you think? Redact 't': [w|s|n] Tom? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so tom. 'Tis very tomish, don't you think? [w|i|n] Tom? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so tom. 'Tis very tomish, don't you think? [p|s|n] Tom? Toms boXXom XomaXo is in his sXomach while playing Xhe "Tom-Xom" brand Xom-Xoms. ThaX's so Xom. 'Tis very Xomish, don'X you Xhink? [p|i|n] Xom? Xoms boXXom XomaXo is in his sXomach while playing Xhe "Xom-Xom" brand Xom-Xoms. XhaX's so Xom. 'Xis very Xomish, don'X you Xhink? [p|s|o] Tom? Toms XXXXXX XXXXXX is in his XXXXXXX while playing XXX "XXXXXXX" brand XXXXXXXX. XXXXXX so XXX. 'Tis very XXXXXX, XXXXX you XXXXX? [p|i|o] XXX? XXXX XXXXXX XXXXXX is in his XXXXXXX while playing XXX "XXXXXXX" brand XXXXXXXX. XXXXXX so XXX. XXXX very XXXXXX, XXXXX you XXXXX? Text: π§ π¨ π§ π¨βπ©βπ¦ Redact 'π¨': [w] π§ X π§ π¨βπ©βπ¦ Redact 'π¨βπ©βπ¦': [w] π§ π¨ π§ X Text: Argentinaπ§π¦πΉ Franceπ¨π«π· Germanyπ§π©πͺ Netherlandsπ¨βπ©βπ¦π³π± Redact 'π¨': [p] Argentinaπ§π¦πΉ FranceXπ«π· Germanyπ§π©πͺ Netherlandsπ¨βπ©βπ¦π³π± [p|o] Argentinaπ§π¦πΉ XXXXXXXX Germanyπ§π©πͺ Netherlandsπ¨βπ©βπ¦π³π± Redact 'π¨βπ©βπ¦': [p] Argentinaπ§π¦πΉ Franceπ¨π«π· Germanyπ§π©πͺ NetherlandsXπ³π± [p|o] Argentinaπ§π¦πΉ Franceπ¨π«π· Germanyπ§π©πͺ XXXXXXXXXXXXX
Julia
The solution must kludge a check with the variable "multichar" to properly substitute "X" instead of "XXXX" with the last example. Otherwise Julia (v 1.4) interprets one 184-bit Unicode extended emoji character as four Unicode characters.
function doif_equals(word, pattern, insens=false)
regex = insens ? Regex("^$pattern\$", "i") : Regex("^$pattern\$")
return replace(word, regex => pattern in multichars ? "X" : "X"^length(pattern))
end
doif_ci_equals(word, pattern) = doif_equals(word, pattern, true)
function doif_includes(word, pattern, insens=false)
regex = insens ? Regex(pattern, "i") : Regex(pattern)
return replace(word, regex => "X"^length(pattern))
end
doif_ci_includes(word, pattern) = doif_includes(word, pattern, true)
function overkill(word, pattern, insens=false)
regex = insens ? Regex(pattern, "i") : Regex(pattern)
return occursin(regex, word) ? "X"^length(word) : word
end
ci_overkill(word, pattern) = overkill(word, pattern, true)
const method = Dict(
"[w|s|n]" => doif_equals,
"[w|i|n]" => doif_ci_equals,
"[p|s|n]" => doif_includes,
"[p|i|n]" => doif_ci_includes,
"[p|s|o]" => overkill,
"[p|i|o]" => ci_overkill
)
const multichars = Set(["π¨βπ©βπ¦", ])
function redact(teststring, pattern)
ws = split(teststring, r"[^ \?\"\.]+")
words = filter(!=(""), split(teststring, r"[\s\?\"\.]+"))
fs = popfirst!(words)
f = method[fs]
return fs * ws[2] * mapreduce(i -> f(words[i], pattern) * ws[i + 2], *, 1:length(words))
end
const testtext = """
[w|s|n] Tom? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so tom.
[w|i|n] Tom? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so tom.
[p|s|n] Tom? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so tom.
[p|i|n] Tom? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so tom.
[p|s|o] Tom? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so tom.
[p|i|o] Tom? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so tom.
"""
const stretchtext = "[w|s|n] π§ π¨ π§ π¨βπ©βπ¦"
for test in [(testtext, ["Tom", "tom"]), (stretchtext, ["π¨", "π¨βπ©βπ¦"])]
for pat in test[2]
println("\nRedact pattern \"$pat\":")
for teststring in string.(split(strip(test[1]), r"\n"))
println(redact(teststring, pat))
end
end
println()
end
- Output:
Redact pattern "Tom": [w|s|n] XXX? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so tom. [w|i|n] XXX? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so XXX. [p|s|n] XXX? XXXs bottom tomato is in his stomach while playing the "XXX-tom" brand tom-toms. That's so tom. [p|i|n] XXX? XXXs botXXX XXXato is in his sXXXach while playing the "XXX-XXX" brand XXX-XXXs. That's so XXX. [p|s|o] XXX? XXXX bottom tomato is in his stomach while playing the "XXXXXXX" brand tom-toms. That's so tom. [p|i|o] XXX? XXXX XXXXXX XXXXXX is in his XXXXXXX while playing the "XXXXXXX" brand XXXXXXXX. That's so XXX. Redact pattern "tom": [w|s|n] Tom? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so XXX. [w|i|n] XXX? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so XXX. [p|s|n] Tom? Toms botXXX XXXato is in his sXXXach while playing the "Tom-XXX" brand XXX-XXXs. That's so XXX. [p|i|n] XXX? XXXs botXXX XXXato is in his sXXXach while playing the "XXX-XXX" brand XXX-XXXs. That's so XXX. [p|s|o] Tom? Toms XXXXXX XXXXXX is in his XXXXXXX while playing the "XXXXXXX" brand XXXXXXXX. That's so XXX. [p|i|o] XXX? XXXX XXXXXX XXXXXX is in his XXXXXXX while playing the "XXXXXXX" brand XXXXXXXX. That's so XXX. Redact pattern "π¨": [w|s|n] π§ X π§ π¨βπ©βπ¦ Redact pattern "π¨βπ©βπ¦": [w|s|n] π§ π¨ π§ X
Lua
Note: The syntax-highlighter used here for Lua appears to be confused by the nested quote styles, but the syntax is valid as written.
function redact(text, targ, opts)
local part, case, ovrk = opts:find("p")~=nil, opts:find("s")~=nil, opts:find("o")~=nil
local oknp = ovrk or not part
local patt = oknp and "([%w%-]+)" or "(%w+)"
local ci = case and function(s) return s end or function(s) return s:lower() end
local matches = function(s,w) return part and ci(s):find(ci(w))~=nil or ci(s)==ci(w) end
local replace = function(s,w) return oknp and string.rep("X",#s) or ci(s):gsub(ci(w), string.rep("X",#w)) end
return text:gsub(patt, function(word) return matches(word,targ) and replace(word,targ) or word end)
end
text = [[Tom? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so tom.]]
targlist, optslist = { "Tom", "tom" }, { "[w|s|n]", "[w|i|n]", "[p|s|n]", "[p|i|n]", "[p|s|o]", "[p|i|o]" }
for _,targ in ipairs(targlist) do
print("Redact '"..targ.."':")
for _,opts in ipairs(optslist) do
print(opts .. " " .. redact(text, targ, opts))
end
end
- Output:
Redact 'Tom': [w|s|n] XXX? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so tom. [w|i|n] XXX? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so XXX. [p|s|n] XXX? XXXs bottom tomato is in his stomach while playing the "XXX-tom" brand tom-toms. That's so tom. [p|i|n] XXX? XXXs botXXX XXXato is in his sXXXach while playing the "XXX-XXX" brand XXX-XXXs. That's so XXX. [p|s|o] XXX? XXXX bottom tomato is in his stomach while playing the "XXXXXXX" brand tom-toms. That's so tom. [p|i|o] XXX? XXXX XXXXXX XXXXXX is in his XXXXXXX while playing the "XXXXXXX" brand XXXXXXXX. That's so XXX. Redact 'tom': [w|s|n] Tom? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so XXX. [w|i|n] XXX? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so XXX. [p|s|n] Tom? Toms botXXX XXXato is in his sXXXach while playing the "Tom-XXX" brand XXX-XXXs. That's so XXX. [p|i|n] XXX? XXXs botXXX XXXato is in his sXXXach while playing the "XXX-XXX" brand XXX-XXXs. That's so XXX. [p|s|o] Tom? Toms XXXXXX XXXXXX is in his XXXXXXX while playing the "XXXXXXX" brand XXXXXXXX. That's so XXX. [p|i|o] XXX? XXXX XXXXXX XXXXXX is in his XXXXXXX while playing the "XXXXXXX" brand XXXXXXXX. That's so XXX.
Perl
use strict;
use warnings;
my $test = <<END;
Tom? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so tom.
'Tis very tomish, don't you think?
END
sub redact {
my($str, $redact, %opt) = @_;
my $insensitive = $opt{'i'} or 0;
my $partial = $opt{'p'} or 0;
my $overkill = $opt{'o'} or 0;
my $rx =
$insensitive ?
$partial ?
$overkill ? qr/ \b{wb} ((?i)[-\w_]* [\w*']* $redact [-'\w]* \S*?) \b{wb} /x
: qr/ ((?i)$redact) /x
: qr/ \b{wb}(?<!-) ((?i)$redact) (?!-)\b{wb} /x
:
$partial ?
$overkill ? qr/ \b{wb} ([-\w]* [\w*']* $redact [-'\w]* \S*?) \b{wb} /x
: qr/ ($redact) /x
: qr/ \b{wb}(?<!-) ($redact) (?!-)\b{wb} /x
;
$str =~ s/($rx)/'X' x length $1/gre;
}
for my $redact (<Tom tom t>) {
print "\nRedact '$redact':\n";
for (['[w|s|n]', {}],
['[w|i|n]', {i=>1}],
['[p|s|n]', {p=>1}],
['[p|i|n]', {p=>1, i=>1}],
['[p|s|o]', {p=>1, o=>1}],
['[p|i|o]', {p=>1, i=>1, o=>1}]
) {
my($option, $opts) = @$_;
no strict 'refs';
printf "%s %s\n", $option, redact($test, $redact, %$opts)
}
}
- Output:
Redact 'Tom': [w|s|n] XXX? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so tom. 'Tis very tomish, don't you think? [w|i|n] XXX? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so XXX. 'Tis very tomish, don't you think? [p|s|n] XXX? XXXs bottom tomato is in his stomach while playing the "XXX-tom" brand tom-toms. That's so tom. 'Tis very tomish, don't you think? [p|i|n] XXX? XXXs botXXX XXXato is in his sXXXach while playing the "XXX-XXX" brand XXX-XXXs. That's so XXX. 'Tis very XXXish, don't you think? [p|s|o] XXX? XXXX bottom tomato is in his stomach while playing the "XXXXXXX" brand tom-toms. That's so tom. 'Tis very tomish, don't you think? [p|i|o] XXX? XXXX XXXXXX XXXXXX is in his XXXXXXX while playing the "XXXXXXX" brand XXXXXXXX. That's so XXX. 'Tis very XXXXXX, don't you think? Redact 'tom': [w|s|n] Tom? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so XXX. 'Tis very tomish, don't you think? [w|i|n] XXX? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so XXX. 'Tis very tomish, don't you think? [p|s|n] Tom? Toms botXXX XXXato is in his sXXXach while playing the "Tom-XXX" brand XXX-XXXs. That's so XXX. 'Tis very XXXish, don't you think? [p|i|n] XXX? XXXs botXXX XXXato is in his sXXXach while playing the "XXX-XXX" brand XXX-XXXs. That's so XXX. 'Tis very XXXish, don't you think? [p|s|o] Tom? Toms XXXXXX XXXXXX is in his XXXXXXX while playing the "XXXXXXX" brand XXXXXXXX. That's so XXX. 'Tis very XXXXXX, don't you think? [p|i|o] XXX? XXXX XXXXXX XXXXXX is in his XXXXXXX while playing the "XXXXXXX" brand XXXXXXXX. That's so XXX. 'Tis very XXXXXX, don't you think? Redact 't': [w|s|n] Tom? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so tom. 'Tis very tomish, don't you think? [w|i|n] Tom? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so tom. 'Tis very tomish, don't you think? [p|s|n] Tom? Toms boXXom XomaXo is in his sXomach while playing Xhe "Tom-Xom" brand Xom-Xoms. ThaX's so Xom. 'Tis very Xomish, don'X you Xhink? [p|i|n] Xom? Xoms boXXom XomaXo is in his sXomach while playing Xhe "Xom-Xom" brand Xom-Xoms. XhaX's so Xom. 'Xis very Xomish, don'X you Xhink? [p|s|o] Tom? Toms XXXXXX XXXXXX is in his XXXXXXX while playing XXX "XXXXXXX" brand XXXXXXXX. XXXXXX so XXX. 'Tis very XXXXXX, XXXXX you XXXXX? [p|i|o] XXX? XXXX XXXXXX XXXXXX is in his XXXXXXX while playing XXX "XXXXXXX" brand XXXXXXXX. XXXXXX so XXX. XXXX very XXXXXX, XXXXX you XXXXX?
Phix
You can run this online here. Note the windows console makes a complete mockery of those unicode characters.
Written on the assumption that overkill implies partial (see talk page).
utf32_length() fashioned after Reverse_a_string#Phix with added ZWJ - I do not expect it to be entirely complete.
enum WHOLE,PARTIAL,OVERKILL,INSENSITIVE constant spunc = " \r\n.?\"" -- spaces and punctuation function utf32_length(sequence utf32) integer l = length(utf32) for i=1 to l do integer ch = utf32[i] if (ch>=0x300 and ch<=0x36f) or (ch>=0x1dc0 and ch<=0x1dff) or (ch>=0x20d0 and ch<=0x20ff) or (ch>=0xfe20 and ch<=0xfe2f) then l -= 1 elsif ch=0x200D then -- ZERO WIDTH JOINER l -= 2 end if end for return l end function function redact(string text, word, integer options) sequence t_utf32 = utf8_to_utf32(text), l_utf32 = t_utf32, w_utf32 = utf8_to_utf32(word) string opt = "[?|s]" if options>INSENSITIVE then options -= INSENSITIVE opt[4] = 'i' l_utf32 = lower(t_utf32) w_utf32 = lower(w_utf32) end if opt[2] = "wpo"[options] integer idx = 1 while true do idx = match(w_utf32,l_utf32,idx) if idx=0 then exit end if integer edx = idx+length(w_utf32)-1 if options=WHOLE then if (idx=1 or find(l_utf32[idx-1],spunc)) and (edx=length(l_utf32) or find(l_utf32[edx+1],spunc)) then t_utf32[idx..edx] = repeat('X',utf32_length(t_utf32[idx..edx])) end if elsif options=PARTIAL or options=OVERKILL then if options=OVERKILL then while idx>1 and not find(l_utf32[idx-1],spunc) do idx -= 1 end while while edx<length(l_utf32) and not find(l_utf32[edx+1],spunc) do edx += 1 end while end if t_utf32[idx..edx] = repeat('X',utf32_length(t_utf32[idx..edx])) end if idx = edx+1 end while text = utf32_to_utf8(t_utf32) return {opt,text} end function constant test = ` Tom? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so tom.`, tests = {"Tom","tom","t"} for t=1 to length(tests) do printf(1,"Redact %s:\n",{tests[t]}) for o=WHOLE to OVERKILL do printf(1,"%s:%s\n",redact(test,tests[t],o)) printf(1,"%s:%s\n",redact(test,tests[t],o+INSENSITIVE)) end for end for constant ut = "π§ π¨ π§ π¨βπ©βπ¦", fmt = """ %s Redact π¨ %s %s Redact π¨βπ©βπ¦ %s %s """ printf(1,fmt,{ut}&redact(ut,"π¨",WHOLE)&redact(ut,"π¨βπ©βπ¦",WHOLE))
- Output:
w/p/o means whole/partial/overkill word, s/i means case sensitive/insensitive.
Redact Tom: [w|s]:XXX? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so tom. [w|i]:XXX? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so XXX. [p|s]:XXX? XXXs bottom tomato is in his stomach while playing the "XXX-tom" brand tom-toms. That's so tom. [p|i]:XXX? XXXs botXXX XXXato is in his sXXXach while playing the "XXX-XXX" brand XXX-XXXs. That's so XXX. [o|s]:XXX? XXXX bottom tomato is in his stomach while playing the "XXXXXXX" brand tom-toms. That's so tom. [o|i]:XXX? XXXX XXXXXX XXXXXX is in his XXXXXXX while playing the "XXXXXXX" brand XXXXXXXX. That's so XXX. Redact tom: [w|s]:Tom? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so XXX. [w|i]:XXX? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so XXX. [p|s]:Tom? Toms botXXX XXXato is in his sXXXach while playing the "Tom-XXX" brand XXX-XXXs. That's so XXX. [p|i]:XXX? XXXs botXXX XXXato is in his sXXXach while playing the "XXX-XXX" brand XXX-XXXs. That's so XXX. [o|s]:Tom? Toms XXXXXX XXXXXX is in his XXXXXXX while playing the "XXXXXXX" brand XXXXXXXX. That's so XXX. [o|i]:XXX? XXXX XXXXXX XXXXXX is in his XXXXXXX while playing the "XXXXXXX" brand XXXXXXXX. That's so XXX. Redact t: [w|s]:Tom? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so tom. [w|i]:Tom? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so tom. [p|s]:Tom? Toms boXXom XomaXo is in his sXomach while playing Xhe "Tom-Xom" brand Xom-Xoms. ThaX's so Xom. [p|i]:Xom? Xoms boXXom XomaXo is in his sXomach while playing Xhe "Xom-Xom" brand Xom-Xoms. XhaX's so Xom. [o|s]:Tom? Toms XXXXXX XXXXXX is in his XXXXXXX while playing XXX "XXXXXXX" brand XXXXXXXX. XXXXXX so XXX. [o|i]:XXX? XXXX XXXXXX XXXXXX is in his XXXXXXX while playing XXX "XXXXXXX" brand XXXXXXXX. XXXXXX so XXX. β βββββββββββββββπ§βπ¨βπ§βπ¨βπ©βπ¦ Redactβπ¨β[w|s]βπ§βXXβπ§βπ¨βπ©βπ¦ Redactβπ¨βπ©βπ¦β[w|s]βπ§βπ¨βπ§βXXXX
Python
#!/usr/bin/env python3
def redact(source, word, partial=False, insensitive=False, overkill=False):
def different(s, w):
if insensitive:
return s.upper() != w.upper()
else:
return s != w
def is_word_char(c):
return c == '-' or c.isalpha()
temp = list(source)
i = 0
while i <= len(temp) - len(word):
match = True
j = 0
while j < len(word):
if different(temp[i + j], word[j]):
match = False
break
j += 1
if match:
beg = i
end = i + len(word)
if not partial:
if beg > 0 and is_word_char(temp[beg - 1]):
i += 1
continue
if end < len(temp) and is_word_char(temp[end]):
i += 1
continue
if overkill:
while beg > 0 and is_word_char(temp[beg - 1]):
beg -= 1
while end < len(temp) - 1 and is_word_char(temp[end]):
end += 1
for k in range(beg, end):
temp[k] = 'X'
i += 1
return ''.join(temp)
def example(source, word):
print("Redact '", word, "':")
print("[w|s|n]", redact(source, word, False, False, False))
print("[w|i|n]", redact(source, word, False, True, False))
print("[p|s|n]", redact(source, word, True, False, False))
print("[p|i|n]", redact(source, word, True, True, False))
print("[p|s|o]", redact(source, word, True, False, True))
print("[p|i|o]", redact(source, word, True, True, True))
print()
if __name__ == "__main__":
text = 'Tom? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That\'s so tom'
example(text, "Tom")
example(text, "tom")
Raku
sub redact ( Str $str, Str $redact, :i(:$insensitive) = False, :p(:$partial) = False, :o(:$overkill) = False ) {
my $rx =
$insensitive ??
$partial ??
$overkill ?? rx/:i <?after ^ | <:Po> | \s > (<[\w<:!Po>-]>*? [\w*\']? $redact [\w*\'\w+]? \S*?) <?before $ | <:Po> | \s > / #'
!! rx/:i ($redact) /
!! rx/:i <?after ^ | [\s<:Po>] | \s > ($redact) <?before $ | <:Po> | \s > /
!!
$partial ??
$overkill ?? rx/ <?after ^ | <:Po> | \s > (<[\w<:!Po>-]>*? [\w*\']? $redact [\w*\'\w+]? \S*?) <?before $ | <:Po> | \s > / #'
!! rx/ ($redact) /
!! rx/ <?after ^ | [\s<:Po>] | \s > ($redact) <?before $ | <:Po> | \s > /
;
$str.subst( $rx, {'X' x $0.chars}, :g )
}
my %*SUB-MAIN-OPTS = :named-anywhere;
# Operate on a given file with the given parameters
multi MAIN (
Str $file, #= File name with path
Str $target, #= Redact target text string
:$i = False, #= Case insensitive flag
:$p = False, #= Partial words flag
:$o = False #= Overkill flag
) { put $file.IO.slurp.&redact( $target, :i($i), :p($p), :o($o) ) }
# Operate on the internal strings / parameters
multi MAIN () {
# TESTING
my $test = q:to/END/;
Tom? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so tom.
'Tis very tomish, don't you think?
END
#'
for 'Tom', 'tom', 't' -> $redact {
say "\nRedact '$redact':";
for '[w|s|n]', $redact, {},
'[w|i|n]', $redact, {:i},
'[p|s|n]', $redact, {:p},
'[p|i|n]', $redact, {:p, :i},
'[p|s|o]', $redact, {:p, :o},
'[p|i|o]', $redact, {:p, :i, :o}
-> $option, $str, %opts { printf "%s %s\n", $option, $test.&redact($str, |%opts) }
}
my $emoji = 'π§ π¨ π§ π¨βπ©βπ¦';
printf "%20s %s\n", '', $emoji;
printf "%20s %s\n", "Redact 'π¨' [w]", $emoji.&redact('π¨');
printf "%20s %s\n", "Redact 'π¨βπ©βπ¦' [w]", $emoji.&redact('π¨βπ©βπ¦');
# Even more complicated Unicode
$emoji = 'Argentinaπ§π¦πΉ Franceπ¨π«π· Germanyπ§π©πͺ Netherlandsπ¨βπ©βπ¦π³π±';
printf "\n%20s %s\n", '', $emoji;
printf "%20s %s\n", "Redact 'π¨' [p]", $emoji.&redact('π¨', :p);
printf "%20s %s\n", "Redact 'π¨βπ©βπ¦' [p]", $emoji.&redact('π¨βπ©βπ¦', :p);
printf "%20s %s\n", "Redact 'π¨' [p|o]", $emoji.&redact('π¨', :p, :o);
printf "%20s %s\n", "Redact 'π¨βπ©βπ¦' [p|o]", $emoji.&redact('π¨βπ©βπ¦', :p, :o);
}
- Output:
Redact 'Tom': [w|s|n] XXX? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so tom. 'Tis very tomish, don't you think? [w|i|n] XXX? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so XXX. 'Tis very tomish, don't you think? [p|s|n] XXX? XXXs bottom tomato is in his stomach while playing the "XXX-tom" brand tom-toms. That's so tom. 'Tis very tomish, don't you think? [p|i|n] XXX? XXXs botXXX XXXato is in his sXXXach while playing the "XXX-XXX" brand XXX-XXXs. That's so XXX. 'Tis very XXXish, don't you think? [p|s|o] XXX? XXXX bottom tomato is in his stomach while playing the "XXXXXXX" brand tom-toms. That's so tom. 'Tis very tomish, don't you think? [p|i|o] XXX? XXXX XXXXXX XXXXXX is in his XXXXXXX while playing the "XXXXXXX" brand XXXXXXXX. That's so XXX. 'Tis very XXXXXX, don't you think? Redact 'tom': [w|s|n] Tom? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so XXX. 'Tis very tomish, don't you think? [w|i|n] XXX? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so XXX. 'Tis very tomish, don't you think? [p|s|n] Tom? Toms botXXX XXXato is in his sXXXach while playing the "Tom-XXX" brand XXX-XXXs. That's so XXX. 'Tis very XXXish, don't you think? [p|i|n] XXX? XXXs botXXX XXXato is in his sXXXach while playing the "XXX-XXX" brand XXX-XXXs. That's so XXX. 'Tis very XXXish, don't you think? [p|s|o] Tom? Toms XXXXXX XXXXXX is in his XXXXXXX while playing the "XXXXXXX" brand XXXXXXXX. That's so XXX. 'Tis very XXXXXX, don't you think? [p|i|o] XXX? XXXX XXXXXX XXXXXX is in his XXXXXXX while playing the "XXXXXXX" brand XXXXXXXX. That's so XXX. 'Tis very XXXXXX, don't you think? Redact 't': [w|s|n] Tom? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so tom. 'Tis very tomish, don't you think? [w|i|n] Tom? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so tom. 'Tis very tomish, don't you think? [p|s|n] Tom? Toms boXXom XomaXo is in his sXomach while playing Xhe "Tom-Xom" brand Xom-Xoms. ThaX's so Xom. 'Tis very Xomish, don'X you Xhink? [p|i|n] Xom? Xoms boXXom XomaXo is in his sXomach while playing Xhe "Xom-Xom" brand Xom-Xoms. XhaX's so Xom. 'Xis very Xomish, don'X you Xhink? [p|s|o] Tom? Toms XXXXXX XXXXXX is in his XXXXXXX while playing XXX "XXXXXXX" brand XXXXXXXX. XXXXXX so XXX. 'Tis very XXXXXX, XXXXX you XXXXX? [p|i|o] XXX? XXXX XXXXXX XXXXXX is in his XXXXXXX while playing XXX "XXXXXXX" brand XXXXXXXX. XXXXXX so XXX. XXXX very XXXXXX, XXXXX you XXXXX? π§ π¨ π§ π¨βπ©βπ¦ Redact 'π¨' [w] π§ X π§ π¨βπ©βπ¦ Redact 'π¨βπ©βπ¦' [w] π§ π¨ π§ X Argentinaπ§π¦πΉ Franceπ¨π«π· Germanyπ§π©πͺ Netherlandsπ¨βπ©βπ¦π³π± Redact 'π¨' [p] Argentinaπ§π¦πΉ FranceXπ«π· Germanyπ§π©πͺ Netherlandsπ¨βπ©βπ¦π³π± Redact 'π¨βπ©βπ¦' [p] Argentinaπ§π¦πΉ Franceπ¨π«π· Germanyπ§π©πͺ NetherlandsXπ³π± Redact 'π¨' [p|o] Argentinaπ§π¦πΉ XXXXXXXX Germanyπ§π©πͺ Netherlandsπ¨βπ©βπ¦π³π± Redact 'π¨βπ©βπ¦' [p|o] Argentinaπ§π¦πΉ Franceπ¨π«π· Germanyπ§π©πͺ XXXXXXXXXXXXX
REXX
REXX doesn't have regular expressions, so I had to roll-my-own parsing.
/*REXX program redacts a string (using Xs) from a text, depending upon specified options*/
zDefault= 'Tom? Toms bottom tomato is in his stomach while playing the "Tom-tom"' ,
"brand tom-toms. That's so tom."
parse arg x '~' z /*obtain optional arguments from the CL*/
if x=='' | x=="," then x= 'Tom tom t' /*Not specified? Then use the default.*/
if z= '' | z= "," then z= zDefault /* " " " " " " */
options= 'wβsβn wβiβn pβsβn pβiβn pβsβo pβiβo' /*most glyphs can be used instead of β */
call build /*build some stemmed arrays for REDACT.*/
do j=1 for words(x); q= word(x, j) /*process each of the needle strings.*/
if j==1 then say 'haystack' z /*show a title if this is the 1at time.*/
say; say 'needle: ' q
do k=1 for words(options); useOpt= word(options, k)
say ' ['useOpt"]" redact(useOpt, q)
end /*k*/
end /*j*/
exit 0 /*stick a fork in it, we're all done. */
/*ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ*/
build: #= words(z); ?= 'β'; !.=
do i=1 for #; n= word(z, i); n= elide(n, 'HEAD',1 )
n= elide(n, 'TAIL',,,-1)
@.0.i= n; upper n; @.1.i= n
end /*k*/; return
/*ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ*/
elide: parse arg $, hot, LO, HI, inc; L= length($); inc= word(inc 1, 1)
if LO=='' then LO=L; if HI=='' then HI= L
do k=LO for HI by inc; _= substr($, k, 1)
if datatype(_, 'M') then leave; !.hot.i= !.hot.i || _
if inc==1 then $= substr($, 2) /*hot β‘ Heads Or Tails*/
else $= left($, length($) - 1)
end /*k*/; return $ /*elides punctuation. */
/*ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ*/
opt: parse arg wop; oU= option; upper oU wop; return pos(wop, oU) \== 0
/*ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ*/
redact: parse arg option,qq; nz=; lu= 0; ww= opt('w'); pp= \ww; ii= opt('i'); oo= opt('o')
qu= qq; upper qu; Lqq= length(qq); if ii then do; upper qq; lu= 1; end
do r=1 for #; a= @.lu.r; na= @.0.r; La= length(a)
if ww then if a==qq then na= copies(?, Lqq)
if pp then do 1; _= pos(qq, a); if _==0 then leave
nn= na; if ii then upper nn
do La; _= pos(qq, nn); if _==0 then leave
na= overlay(?, na, _, Lqq, ?);
nn= na; if ii then upper nn
end /*La*/
end /*1*/
if oo then if pos(?, na)\==0 then na= copies(?, length(na) )
nz= nz !.head.r || na || !.tail.r
end /*r*/
return strip( translate(nz, 'X', ?) )
- output when using the default inputs:
haystack Tom? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so tom. needle: Tom [wβsβn] XXX? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so tom. [wβiβn] XXX? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so XXX. [pβsβn] XXX? XXXs bottom tomato is in his stomach while playing the "XXX-tom" brand tom-toms. That's so tom. [pβiβn] XXX? XXXs botXXX XXXato is in his sXXXach while playing the "XXX-XXX" brand XXX-XXXs. That's so XXX. [pβsβo] XXX? XXXX bottom tomato is in his stomach while playing the "XXXXXXX" brand tom-toms. That's so tom. [pβiβo] XXX? XXXX XXXXXX XXXXXX is in his XXXXXXX while playing the "XXXXXXX" brand XXXXXXXX. That's so XXX. needle: tom [wβsβn] Tom? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so XXX. [wβiβn] XXX? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so XXX. [pβsβn] Tom? Toms botXXX XXXato is in his sXXXach while playing the "Tom-XXX" brand XXX-XXXs. That's so XXX. [pβiβn] XXX? XXXs botXXX XXXato is in his sXXXach while playing the "XXX-XXX" brand XXX-XXXs. That's so XXX. [pβsβo] Tom? Toms XXXXXX XXXXXX is in his XXXXXXX while playing the "XXXXXXX" brand XXXXXXXX. That's so XXX. [pβiβo] XXX? XXXX XXXXXX XXXXXX is in his XXXXXXX while playing the "XXXXXXX" brand XXXXXXXX. That's so XXX. needle: t [wβsβn] Tom? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so tom. [wβiβn] Tom? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so tom. [pβsβn] Tom? Toms boXXom XomaXo is in his sXomach while playing Xhe "Tom-Xom" brand Xom-Xoms. ThaX's so Xom. [pβiβn] Xom? Xoms boXXom XomaXo is in his sXomach while playing Xhe "Xom-Xom" brand Xom-Xoms. XhaX's so Xom. [pβsβo] Tom? Toms XXXXXX XXXXXX is in his XXXXXXX while playing XXX "XXXXXXX" brand XXXXXXXX. XXXXXX so XXX. [pβiβo] XXX? XXXX XXXXXX XXXXXX is in his XXXXXXX while playing XXX "XXXXXXX" brand XXXXXXXX. XXXXXX so XXX.
RPL
βͺ DUP2 SWAP 1 DUP SUB POS SIGN DEPTH β text seps wip stack @ wip is a flag for a word in progress βͺ wip NOT 1 + 1 text SIZE FOR j text j DUP SUB IF seps OVER POS wip XOR THEN + ELSE 1 wip - 'wip' STO END NEXT DEPTH stack - 3 + βLIST β« β« 'βTKN' STO @ ( "string" β { wordpos "word" "sep" "word" .. } ) βͺ "" 1 3 PICK SIZE FOR c OVER c DUP SUB NUM RβB #20h OR BβR CHR + NEXT SWAP DROP β« 'LOWER' STO βͺ β text pos rep βͺ pos 1 > text 1 pos 1 - SUB "" IFTE rep + text pos rep SIZE + OVER SIZE SUB + β« β« 'REPL' STO @ mimics HP-48+ instruction only for strings βͺ DUP2 LOWER SWAP LOWER "" β token word lord loken xxx βͺ xxx 1 7 FS? 9 FS? OR token word IFTE SIZE START "X" + NEXT IF 7 FC? 9 FC? AND THEN 'xxx' STO 1 SF loken token DO 8 FC? OVER word POS 4 PICK lord POS IFTE IF DUP NOT THEN DROP 1 CF ELSE IF 8 FS? THEN ROT OVER xxx REPL SWAP END xxx REPL END UNTIL 1 FC? END SWAP DROP END β« β« 'XREPL' STO @ ( "word" "rd" β "woXX" | "XXXX" ) βͺ DUP 1 GET OVER SIZE FOR t DUP t GET LOWER t SWAP PUT 2 STEP β« 'LOTKN' STO @ ( { "TOKENS" } β { "tokens" } ) βͺ 7 9 FOR f f CF NEXT IF DUP 1 DUP SUB "W" == THEN 7 SF END IF DUP 2 DUP SUB "I" == THEN 8 SF END IF 3 DUP SUB "O" == THEN 9 SF END IF 8 FS? THEN LOWER END SWAP " ,.?'βͺβ«" βTKN DUP LOTKN β word tokens lokens βͺ "" tokens 1 GET DUP 2 MOD ROT ROT tokens SIZE FOR w 8 FS? 'lokens' 'tokens' IFTE w GET IF 3 PICK w 2 MOD == THEN tokens w GET SWAP IF word 7 FS? βͺ == β« βͺ POS β« IFTE THEN word XREPL END END + NEXT β« β« 'RDACT' STO @ ( "text" "word" "PAR" β "text" ) βͺ "Tom? Toms bottom tomato is in his stomach while playing the βͺTom-tomβ« brand tom-toms. That's so tom." { "WSN" "WIN" "PSN" "PIN" "PSO" "PIO" } β sentence cases βͺ { } 1 6 FOR k sentence "Tom" cases k GET RDACT + NEXT 1 6 FOR k sentence "tom" cases k GET RDACT + NEXT β« β« 'TASK' STO
- Output:
1: { "XXX? Toms bottom tomato is in his stomach while playing the βͺTom-tomβ« brand tom-toms. That's so tom." "XXX? Toms bottom tomato is in his stomach while playing the βͺTom-tomβ« brand tom-toms. That's so XXX." "XXX? XXXs bottom tomato is in his stomach while playing the βͺXXX-tomβ« brand tom-toms. That's so tom." "XXX? XXXs botXXX XXXato is in his sXXXach while playing the βͺXXX-XXXβ« brand XXX-XXXs. That's so XXX." "XXX? XXXX bottom tomato is in his stomach while playing the βͺXXXXXXXβ« brand tom-toms. That's so tom." "XXX? XXXX XXXXXX XXXXXX is in his XXXXXXX while playing the βͺXXXXXXXβ« brand XXXXXXXX. That's so XXX." "Tom? Toms bottom tomato is in his stomach while playing the βͺTom-tomβ« brand tom-toms. That's so XXX." "XXX? Toms bottom tomato is in his stomach while playing the βͺTom-tomβ« brand tom-toms. That's so XXX." "Tom? Toms botXXX XXXato is in his sXXXach while playing the βͺTom-XXXβ« brand XXX-XXXs. That's so XXX." "XXX? XXXs botXXX XXXato is in his sXXXach while playing the βͺXXX-XXXβ« brand XXX-XXXs. That's so XXX." "Tom? Toms XXXXXX XXXXXX is in his XXXXXXX while playing the βͺXXXXXXXβ« brand XXXXXXXX. That's so XXX." "XXX? XXXX XXXXXX XXXXXX is in his XXXXXXX while playing the βͺXXXXXXXβ« brand XXXXXXXX. That's so XXX." }
Rust
use bitflags::bitflags;
bitflags! {
pub struct RedactOptions: u8 {
const NoOptions = 0;
const WholeWord = 1;
const Overkill = 2;
const CaseInsensitive = 4;
}
}
fn is_word_char(ch: &char) -> bool {
return ch == &'-' || ch.is_alphabetic();
}
// Performs in-place redaction of the target with the specified options.
fn redact(text: &mut String, target: &str, options: &RedactOptions) {
let target_length = target.len();
if target_length == 0 {
return;
}
let mut start_pos = 0;
let end_pos = text.len();
while start_pos < end_pos {
let search_pos = if options.contains(RedactOptions::CaseInsensitive) {
text[start_pos..]
.to_ascii_lowercase()
.find(&target.to_ascii_lowercase())
} else {
text[start_pos..].find(&target)
};
if search_pos == None {
break;
}
let pos = search_pos.unwrap() + start_pos;
let mut word_start = pos;
let mut word_end = word_start + target_length;
if options.intersects(RedactOptions::Overkill | RedactOptions::WholeWord) {
let textchars: Vec<char> = text.chars().collect();
while word_start > start_pos && is_word_char(&textchars[word_start - 1]) {
word_start -= 1;
}
while word_end < end_pos && is_word_char(&textchars[word_end]) {
word_end += 1;
}
}
if !options.contains(RedactOptions::WholeWord)
|| (word_start == pos && word_end == pos + target_length)
{
text.replace_range(word_start..word_end, &"X".repeat(word_end - word_start));
}
start_pos = word_end;
}
}
fn do_basic_test(target: &str, options: RedactOptions) {
let mut text = String::new()
+ "Tom? Toms bottom tomato is in his stomach while playing the "
+ "\"Tom-tom\" brand tom-toms. That's so tom.";
redact(&mut text, target, &options);
println!(
"[{}|{}|{}]: {}",
if options.contains(RedactOptions::WholeWord) {
'w'
} else {
'p'
},
if options.contains(RedactOptions::CaseInsensitive) {
'i'
} else {
's'
},
if options.contains(RedactOptions::Overkill) {
'o'
} else {
'n'
},
text
);
}
fn do_tests(target: &str) {
println!("Redact '{}':", target);
do_basic_test(target, RedactOptions::WholeWord);
do_basic_test(
target,
RedactOptions::WholeWord | RedactOptions::CaseInsensitive,
);
do_basic_test(target, RedactOptions::NoOptions);
do_basic_test(target, RedactOptions::CaseInsensitive);
do_basic_test(target, RedactOptions::Overkill);
do_basic_test(
target,
RedactOptions::CaseInsensitive | RedactOptions::Overkill,
);
}
fn main() {
do_tests("Tom");
println!();
do_tests("tom");
return ();
}
- Output:
Redact 'Tom': [w|s|n]: XXX? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so tom. [w|i|n]: XXX? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so XXX. [p|s|n]: XXX? XXXs bottom tomato is in his stomach while playing the "XXX-tom" brand tom-toms. That's so tom. [p|i|n]: XXX? XXXs botXXX XXXato is in his sXXXach while playing the "XXX-XXX" brand XXX-XXXs. That's so XXX. [p|s|o]: XXX? XXXX bottom tomato is in his stomach while playing the "XXXXXXX" brand tom-toms. That's so tom. Redact 'Tom': [w|s|n]: XXX? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so tom. [w|i|n]: XXX? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so XXX. [p|s|n]: XXX? XXXs bottom tomato is in his stomach while playing the "XXX-tom" brand tom-toms. That's so tom. [p|i|n]: XXX? XXXs botXXX XXXato is in his sXXXach while playing the "XXX-XXX" brand XXX-XXXs. That's so XXX. [p|s|o]: XXX? XXXX bottom tomato is in his stomach while playing the "XXXXXXX" brand tom-toms. That's so tom. [p|i|o]: XXX? XXXX XXXXXX XXXXXX is in his XXXXXXX while playing the "XXXXXXX" brand XXXXXXXX. That's so XXX. Redact 'tom': [w|s|n]: Tom? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so XXX. [w|i|n]: XXX? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so XXX. [p|s|n]: Tom? Toms botXXX XXXato is in his sXXXach while playing the "Tom-XXX" brand XXX-XXXs. That's so XXX. [p|i|n]: XXX? XXXs botXXX XXXato is in his sXXXach while playing the "XXX-XXX" brand XXX-XXXs. That's so XXX. [p|s|o]: Tom? Toms XXXXXX XXXXXX is in his XXXXXXX while playing the "XXXXXXX" brand XXXXXXXX. That's so XXX. [p|i|o]: XXX? XXXX XXXXXX XXXXXX is in his XXXXXXX while playing the "XXXXXXX" brand XXXXXXXX. That's so XXX.
Swift
import Foundation
struct RedactionOptions: OptionSet {
let rawValue: Int
static let wholeWord = RedactionOptions(rawValue: 1 << 0)
static let overKill = RedactionOptions(rawValue: 1 << 1)
static let caseInsensitive = RedactionOptions(rawValue: 1 << 2)
}
func redact(text: String, target: String, options: RedactionOptions) throws -> String {
// Set up a regex search pattern for the specified target.
// Since it has to be able to match grapheme characters which may
// be combinations of others in the same string, include catches
// for "Zero Width Joiner" characters.
var pattern = "(?<!\\u200d)" + NSRegularExpression.escapedPattern(for: target) + "(?!\\u200d)"
if options.contains(.wholeWord) {
// Don't match where preceded or followed by either a hyphen
// or anything which isn't punctuation or white space.
pattern = "(?<![-[^[:punct:]\\s]])" + pattern + "(?![-[^[:punct:]\\s]])"
} else if options.contains(.overKill) {
// Include any preceding or following run of hyphens and/or
// non-(punctuation or white-space).
pattern = "[-[^[:punct:]\\s]]*" + pattern + "[-[^[:punct:]\\s]]*+"
}
// Default to case-sensitivity.
if options.contains(.caseInsensitive) {
pattern = "(?i)" + pattern
}
let regex = try NSRegularExpression(pattern: pattern)
let mutableText = NSMutableString(string: text)
// Locate all the matches in the text and replace each character
// or grapheme in the matched ranges with "X".
for match in regex.matches(in: text, range: NSRange(text.startIndex..., in: text)) {
mutableText.replaceOccurrences(of: ".(?:\\u200d.)*+", with: "X",
options: .regularExpression, range: match.range)
}
return mutableText as String
}
func optionString(options: RedactionOptions) -> String {
var result = options.contains(.wholeWord) ? "w" : "p"
result.append("|")
result.append(options.contains(.caseInsensitive) ? "i" : "s")
result.append("|")
result.append(options.contains(.overKill) ? "o" : "n")
return result
}
func doBasicTest(target: String, options: RedactionOptions) {
let text = "Tom? Toms bottom tomato is in his stomach while playing the \"Tom-tom\" brand tom-toms. That's so tom."
do {
try print("[\(optionString(options: options))]: \(redact(text: text, target: target, options: options))")
} catch {
print(error.localizedDescription)
}
}
func doBasicTests(target: String) {
print("Redact '\(target)':")
doBasicTest(target: target, options: .wholeWord)
doBasicTest(target: target, options: [.wholeWord, .caseInsensitive])
doBasicTest(target: target, options: [])
doBasicTest(target: target, options: .caseInsensitive)
doBasicTest(target: target, options: .overKill)
doBasicTest(target: target, options: [.overKill, .caseInsensitive])
}
func doExtraTest(target: String) {
let text = "π§ π¨ π§ π¨βπ©βπ¦"
do {
try print("Redact '\(target)':\n[w]: \(redact(text: text, target: target, options: .wholeWord))")
} catch {
print(error.localizedDescription)
}
}
doBasicTests(target: "Tom")
print()
doBasicTests(target: "tom")
print()
doExtraTest(target: "π¨")
print()
doExtraTest(target: "π¨βπ©βπ¦")
- Output:
Redact 'Tom': [w|s|n]: XXX? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so tom. [w|i|n]: XXX? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so XXX. [p|s|n]: XXX? XXXs bottom tomato is in his stomach while playing the "XXX-tom" brand tom-toms. That's so tom. [p|i|n]: XXX? XXXs botXXX XXXato is in his sXXXach while playing the "XXX-XXX" brand XXX-XXXs. That's so XXX. [p|s|o]: XXX? XXXX bottom tomato is in his stomach while playing the "XXXXXXX" brand tom-toms. That's so tom. [p|i|o]: XXX? XXXX XXXXXX XXXXXX is in his XXXXXXX while playing the "XXXXXXX" brand XXXXXXXX. That's so XXX. Redact 'tom': [w|s|n]: Tom? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so XXX. [w|i|n]: XXX? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so XXX. [p|s|n]: Tom? Toms botXXX XXXato is in his sXXXach while playing the "Tom-XXX" brand XXX-XXXs. That's so XXX. [p|i|n]: XXX? XXXs botXXX XXXato is in his sXXXach while playing the "XXX-XXX" brand XXX-XXXs. That's so XXX. [p|s|o]: Tom? Toms XXXXXX XXXXXX is in his XXXXXXX while playing the "XXXXXXX" brand XXXXXXXX. That's so XXX. [p|i|o]: XXX? XXXX XXXXXX XXXXXX is in his XXXXXXX while playing the "XXXXXXX" brand XXXXXXXX. That's so XXX. Redact 'π¨': [w]: π§ X π§ π¨βπ©βπ¦ Redact 'π¨βπ©βπ¦': [w]: π§ π¨ π§ X
Tailspin
This is using the normal definition of words, i.e. emoji do not form words or word boundaries, so the stretch assignment must be matched as a partial match. This solution parses the flags inline and takes the secret to be redacted as a parameter to illustrate both options, although in a production solution I would imagine one might pass both the same way.
composer redact&{secret:}
@: { fill: '', leftBound: '\b{g}', rightBound: '\b{g}', case: '' };
(<flags> <WS>*) [ <redact|keep>* ] -> '$...;'
rule flags: <='['> <word|partial> <='|'> <insensitive|sensitive> <='|'> <overkill|normal> <=']'>
rule word: (<='w'> -> ..|@:{leftBound: '(?<!-)\b', rightBound: '\b(?!\-)'};)
rule partial: <='p'>
rule insensitive: (<='i'> -> ..|@:{case: '(?i)'};)
rule sensitive: <='s'>
rule overkill: (<='o'> -> ..|@:{leftBound: '(?<!\-)\b', rightBound: '\b(?!\-)', fill: '[\w\-]*'};)
rule normal: <='n'>
rule redact: <'(?uU)$@.case;$@.leftBound;$@.fill;$secret;$@.fill;$@.rightBound;'> -> [$... -> 'X'] -> '$...;'
rule keep: <~redact>
end redact
def target: 'Tom? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That''s so tom.';
def options: ['[w|s|n]', '[w|i|n]', '[p|s|n]', '[p|i|n]', '[p|s|o]', '[p|i|o]'];
'Redacting Tom:
' -> !OUT::write
$options... -> \('$;: ' -> !OUT::write '$; $target;' -> redact&{secret: 'Tom'} -> '$;
' -> !OUT::write \) -> !VOID
'
Redacting tom:
' -> !OUT::write
$options... -> \('$;: ' -> !OUT::write '$; $target;' -> redact&{secret: 'tom'} -> '$;
' -> !OUT::write \) -> !VOID
'π§ π¨ π§ π¨βπ©βπ¦' -> '[p|s|n] $;' -> redact&{secret: 'π¨'} -> '
$;
' -> !OUT::write
'π§ π¨ π§ π¨βπ©βπ¦' -> '[p|s|n] $;' -> redact&{secret: 'π¨βπ©βπ¦'} -> '
$;
' -> !OUT::write
- Output:
Redacting Tom: [w|s|n]: XXX? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so tom. [w|i|n]: XXX? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so XXX. [p|s|n]: XXX? XXXs bottom tomato is in his stomach while playing the "XXX-tom" brand tom-toms. That's so tom. [p|i|n]: XXX? XXXs botXXX XXXato is in his sXXXach while playing the "XXX-XXX" brand XXX-XXXs. That's so XXX. [p|s|o]: XXX? XXXX bottom tomato is in his stomach while playing the "XXXXXXX" brand tom-toms. That's so tom. [p|i|o]: XXX? XXXX XXXXXX XXXXXX is in his XXXXXXX while playing the "XXXXXXX" brand XXXXXXXX. That's so XXX. Redacting tom: [w|s|n]: Tom? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so XXX. [w|i|n]: XXX? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so XXX. [p|s|n]: Tom? Toms botXXX XXXato is in his sXXXach while playing the "Tom-XXX" brand XXX-XXXs. That's so XXX. [p|i|n]: XXX? XXXs botXXX XXXato is in his sXXXach while playing the "XXX-XXX" brand XXX-XXXs. That's so XXX. [p|s|o]: Tom? Toms XXXXXX XXXXXX is in his XXXXXXX while playing the "XXXXXXX" brand XXXXXXXX. That's so XXX. [p|i|o]: XXX? XXXX XXXXXX XXXXXX is in his XXXXXXX while playing the "XXXXXXX" brand XXXXXXXX. That's so XXX. π§ X π§ π¨βπ©βπ¦ π§ π¨ π§ X
Wren
import "./pattern" for Pattern
import "./str" for Str
import "./upc" for Graphemes
var join = Fn.new { |words, seps|
var lw = words.count
var ls = seps.count
if (lw != ls + 1) {
Fiber.abort("Mismatch between number of words and separators.")
}
var sb = ""
for (i in 0...ls) {
sb = sb + words[i]
sb = sb + seps[i]
}
sb = sb + words[lw-1]
return sb
}
var redact = Fn.new { |text, word, opts|
var partial = opts.contains("p")
var overkill = opts.contains("o")
var insensitive = opts.contains("i")
var i = " \t\n\r!\"#$\%&()*+,./:;<=>?@[\\]^`{|}~" // all punctuation except -'_
var p = Pattern.new("+1/i", 0, i)
var matches = p.findAll(text)
var seps = Pattern.matchesText(matches)
var words = p.splitAll(text)
var expr = insensitive ? Str.lower(word) : word
var p2 = Pattern.new(expr)
for (i in 0...words.count) {
var w = words[i]
var wl = insensitive ? Str.lower(w) : w
var m = p2.find(wl)
if (m && wl.indexOf(m.text + "\u200d") == -1 && wl.indexOf("\u200d" + m.text) == -1) {
if (overkill) {
words[i] = "X" * Graphemes.clusterCount(w)
} else if (!partial) {
if (wl == m.text) words[i] = "X" * Graphemes.clusterCount(w)
} else if (partial) {
var repl = "X" * Graphemes.clusterCount(word)
words[i] = p2.replaceAll(wl, repl)
}
}
}
System.print("%(opts) %(join.call(words, seps))\n")
}
var printResults = Fn.new { |text, allOpts, allWords|
System.print("Text: %(text)\n")
for (word in allWords) {
System.print("Redact '%(word)':")
for (opts in allOpts) redact.call(text, word, opts)
}
System.print()
}
var text = "Tom? Toms bottom tomato is in his stomach while playing the \"Tom-tom\" brand tom-toms. That's so tom.
'Tis very tomish, don't you think?"
var allOpts = ["[w|s|n]", "[w|i|n]", "[p|s|n]", "[p|i|n]", "[p|s|o]", "[p|i|o]"]
var allWords = ["Tom", "tom", "t"]
printResults.call(text, allOpts, allWords)
text = "π§ π¨ π§ π¨βπ©βπ¦"
allOpts = ["[w]"]
allWords = ["π¨", "π¨βπ©βπ¦"]
printResults.call(text, allOpts, allWords)
text = "Argentinaπ§π¦πΉ Franceπ¨π«π· Germanyπ§π©πͺ Netherlandsπ¨βπ©βπ¦π³π±"
allOpts = ["[p]", "[p|o]"]
printResults.call(text, allOpts, allWords)
- Output:
Text: Tom? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so tom. 'Tis very tomish, don't you think? Redact 'Tom': [w|s|n] XXX? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so tom. 'Tis very tomish, don't you think? [w|i|n] XXX? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so XXX. 'Tis very tomish, don't you think? [p|s|n] XXX? XXXs bottom tomato is in his stomach while playing the "XXX-tom" brand tom-toms. That's so tom. 'Tis very tomish, don't you think? [p|i|n] XXX? XXXs botXXX XXXato is in his sXXXach while playing the "XXX-XXX" brand XXX-XXXs. That's so XXX. 'Tis very XXXish, don't you think? [p|s|o] XXX? XXXX bottom tomato is in his stomach while playing the "XXXXXXX" brand tom-toms. That's so tom. 'Tis very tomish, don't you think? [p|i|o] XXX? XXXX XXXXXX XXXXXX is in his XXXXXXX while playing the "XXXXXXX" brand XXXXXXXX. That's so XXX. 'Tis very XXXXXX, don't you think? Redact 'tom': [w|s|n] Tom? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so XXX. 'Tis very tomish, don't you think? [w|i|n] XXX? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so XXX. 'Tis very tomish, don't you think? [p|s|n] Tom? Toms botXXX XXXato is in his sXXXach while playing the "Tom-XXX" brand XXX-XXXs. That's so XXX. 'Tis very XXXish, don't you think? [p|i|n] XXX? XXXs botXXX XXXato is in his sXXXach while playing the "XXX-XXX" brand XXX-XXXs. That's so XXX. 'Tis very XXXish, don't you think? [p|s|o] Tom? Toms XXXXXX XXXXXX is in his XXXXXXX while playing the "XXXXXXX" brand XXXXXXXX. That's so XXX. 'Tis very XXXXXX, don't you think? [p|i|o] XXX? XXXX XXXXXX XXXXXX is in his XXXXXXX while playing the "XXXXXXX" brand XXXXXXXX. That's so XXX. 'Tis very XXXXXX, don't you think? Redact 't': [w|s|n] Tom? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so tom. 'Tis very tomish, don't you think? [w|i|n] Tom? Toms bottom tomato is in his stomach while playing the "Tom-tom" brand tom-toms. That's so tom. 'Tis very tomish, don't you think? [p|s|n] Tom? Toms boXXom XomaXo is in his sXomach while playing Xhe "Tom-Xom" brand Xom-Xoms. ThaX's so Xom. 'Tis very Xomish, don'X you Xhink? [p|i|n] Xom? Xoms boXXom XomaXo is in his sXomach while playing Xhe "Xom-Xom" brand Xom-Xoms. XhaX's so Xom. 'Xis very Xomish, don'X you Xhink? [p|s|o] Tom? Toms XXXXXX XXXXXX is in his XXXXXXX while playing XXX "XXXXXXX" brand XXXXXXXX. XXXXXX so XXX. 'Tis very XXXXXX, XXXXX you XXXXX? [p|i|o] XXX? XXXX XXXXXX XXXXXX is in his XXXXXXX while playing XXX "XXXXXXX" brand XXXXXXXX. XXXXXX so XXX. XXXX very XXXXXX, XXXXX you XXXXX? Text: π§ π¨ π§ π¨βπ©βπ¦ Redact 'π¨': [w] π§ X π§ π¨βπ©βπ¦ Redact 'π¨βπ©βπ¦': [w] π§ π¨ π§ X Text: Argentinaπ§π¦πΉ Franceπ¨π«π· Germanyπ§π©πͺ Netherlandsπ¨βπ©βπ¦π³π± Redact 'π¨': [p] Argentinaπ§π¦πΉ FranceXπ«π· Germanyπ§π©πͺ Netherlandsπ¨βπ©βπ¦π³π± [p|o] Argentinaπ§π¦πΉ XXXXXXXX Germanyπ§π©πͺ Netherlandsπ¨βπ©βπ¦π³π± Redact 'π¨βπ©βπ¦': [p] Argentinaπ§π¦πΉ Franceπ¨π«π· Germanyπ§π©πͺ NetherlandsXπ³π± [p|o] Argentinaπ§π¦πΉ Franceπ¨π«π· Germanyπ§π©πͺ XXXXXXXXXXXXX