Sort an outline at every level: Difference between revisions
mNo edit summary |
(Corrected test outlines (had displayed outputs in lieu of inputs)) |
||
Line 26: | Line 26: | ||
* Sort every level of the (4 space indented) outline below lexically, once ascending and once descending. |
* Sort every level of the (4 space indented) outline below lexically, once ascending and once descending. |
||
<pre> |
<pre>zeta |
||
⚫ | |||
⚫ | |||
⚫ | |||
zeta |
|||
beta |
beta |
||
⚫ | |||
gamma |
gamma |
||
⚫ | |||
lambda |
lambda |
||
kappa |
|||
⚫ | |||
⚫ | |||
alpha |
|||
⚫ | |||
⚫ | |||
⚫ | |||
* Do the same with a tab-indented equivalent of the same outline. |
* Do the same with a tab-indented equivalent of the same outline. |
||
<pre> |
<pre>zeta |
||
⚫ | |||
⚫ | |||
⚫ | |||
zeta |
|||
⚫ | |||
⚫ | |||
gamma |
gamma |
||
mu |
|||
lambda |
lambda |
||
kappa |
|||
mu</pre> |
|||
⚫ | |||
⚫ | |||
alpha |
|||
⚫ | |||
⚫ | |||
⚫ | |||
Line 85: | Line 85: | ||
<pre>alpha |
<pre>alpha |
||
epsilon |
epsilon |
||
iota |
iota |
||
theta |
theta |
||
zeta |
zeta |
||
beta |
|||
delta |
delta |
||
gamma |
gamma |
||
kappa |
|||
lambda |
lambda |
||
mu</pre> |
mu</pre> |
||
<pre> |
<pre>zeta |
||
⚫ | |||
⚫ | |||
⚫ | |||
zeta |
|||
beta |
beta |
||
⚫ | |||
gamma |
gamma |
||
⚫ | |||
lambda |
lambda |
||
kappa |
|||
⚫ | |||
⚫ | |||
alpha |
|||
⚫ | |||
⚫ | |||
⚫ | |||
<br><br> |
<br><br> |
Revision as of 07:19, 18 August 2020
- Task
Write and test a function over an indented plain text outline which either:
- Returns a copy of the outline in which the sub-lists at every level of indentation are sorted, or
- reports that the indentation characters or widths are not consistent enough to make the outline structure clear.
Your code should detect and warn of at least two types of inconsistent indentation:
- inconsistent use of whitespace characters (e.g. mixed use of tabs and spaces)
- inconsistent indent widths. For example, an indentation with an odd number of spaces in an outline in which the minimum indent appears to be 2 spaces, or 4 spaces.
Your code should be able to detect and handle both tab-indented, and space-indented (e.g. 4 space, 2 space etc) outlines, without being given any advance warning of the indent characters used, or the size of the indent units.
You should also be able to specify different types of sort, for example, as a minimum, both ascending and descending lexical sorts.
Your sort should not alter the type or size of the indentation units used in the input outline.
(For an application of Indent Respectful Sort, see the Sublime Text package of that name. The Python source text [1] is available for inspection on Github).
Tests
- Sort every level of the (4 space indented) outline below lexically, once ascending and once descending.
zeta beta gamma lambda kappa mu delta alpha theta iota epsilon
- Do the same with a tab-indented equivalent of the same outline.
zeta gamma mu lambda kappa delta beta alpha theta iota epsilon
The output sequence of an ascending lexical sort of each level should be:
alpha epsilon iota theta zeta beta delta gamma kappa lambda mu
The output sequence of a descending lexical sort of each level should be:
zeta gamma mu lambda kappa delta beta alpha theta iota epsilon
- Attempt to separately sort each of the following two outlines, reporting any inconsistencies in their indentations:
alpha epsilon iota theta zeta beta delta gamma kappa lambda mu
zeta beta gamma lambda kappa mu delta alpha theta iota epsilon
Haskell
<lang haskell>{-# LANGUAGE OverloadedStrings #-}
import Data.Tree (Tree(..), foldTree) import qualified Data.Text.IO as T import qualified Data.Text as T import qualified Data.List as L import Data.Bifunctor (first) import Data.Ord (comparing) import Data.Char (isSpace)
OUTLINE SORTED AT EVERY LEVEL --------------
sortedOutline :: (Tree T.Text -> Tree T.Text -> Ordering)
-> T.Text -> Either T.Text T.Text
sortedOutline cmp outlineText =
let xs = T.lines outlineText in consistentIndentUnit (nonZeroIndents xs) >>= \indentUnit -> let forest = forestFromLineIndents $ indentLevelsFromLines xs sortedForest = subForest $ foldTree (\x xs -> Node x (L.sortBy cmp xs)) (Node "" forest) in Right $ outlineFromForest (<>) indentUnit sortedForest
TESTS --------------------------
main :: IO () main =
mapM_ (T.putStrLn . (<> "\n")) $ concat $ [ \cmp -> either id id . sortedOutline cmp <$> [spacedOutline, tabbedOutline, confusedOutline, raggedOutline] ] <*> -- Two lexical sort comparators: AZ and ZA [comparing rootLabel, flip (comparing rootLabel)]
spacedOutline, tabbedOutline, confusedOutline, raggedOutline :: T.Text spacedOutline =
"zeta\n\ \ beta\n\ \ gamma\n\ \ lambda\n\ \ kappa\n\ \ mu\n\ \ delta\n\ \alpha\n\ \ theta\n\ \ iota\n\ \ epsilon"
tabbedOutline =
"zeta\n\ \\tbeta\n\ \\tgamma\n\ \\t\tlambda\n\ \\t\tkappa\n\ \\t\tmu\n\ \\tdelta\n\ \alpha\n\ \\ttheta\n\ \\tiota\n\ \\tepsilon"
confusedOutline =
"zeta\n\ \ beta\n\ \ gamma\n\ \ lambda\n\ \ \t kappa\n\ \ mu\n\ \ delta\n\ \alpha\n\ \ theta\n\ \ iota\n\ \ epsilon"
raggedOutline =
"zeta\n\ \ beta\n\ \ gamma\n\ \ lambda\n\ \ kappa\n\ \ mu\n\ \ delta\n\ \alpha\n\ \ theta\n\ \ iota\n\ \ epsilon"
OUTLINE TREES :: SERIALIZED AND DESERIALIZED ------
forestFromLineIndents :: [(Int, T.Text)] -> [Tree T.Text] forestFromLineIndents = go
where go [] = [] go ((n, s):xs) = Node s (go subOutline) : go rest where (subOutline, rest) = span ((n <) . fst) xs
indentLevelsFromLines :: [T.Text] -> [(Int, T.Text)] indentLevelsFromLines xs = first (`div` indentUnit) <$> pairs
where pairs = first T.length . T.span isSpace <$> xs indentUnit = maybe 1 fst (L.find ((0 <) . fst) pairs)
outlineFromForest :: (T.Text -> a -> T.Text) -> T.Text -> [Tree a] -> T.Text outlineFromForest showRoot tabString forest = T.unlines $ forest >>= go ""
where go indent node = showRoot indent (rootLabel node) : (subForest node >>= go (T.append tabString indent))
OUTLINE CHECKING - INDENT CHARACTERS AND WIDTHS -----
consistentIndentUnit :: [T.Text] -> Either T.Text T.Text consistentIndentUnit prefixes = minimumIndent prefixes >>= flip checked prefixes
where checked indentUnit xs | all ((0 ==) . (`rem` unitLength) . T.length) xs = Right indentUnit | otherwise = Left ("Inconsistent indent depths: " <> T.pack (show (T.length <$> prefixes))) where unitLength = T.length indentUnit
minimumIndent :: [T.Text] -> Either T.Text T.Text minimumIndent prefixes = go $ T.foldr newChar "" $ T.concat prefixes
where newChar c seen | c `L.elem` seen = seen | otherwise = c : seen go cs | 1 < length cs = Left $ "Mixed indent characters used: " <> T.pack (show cs) | otherwise = Right $ L.minimumBy (comparing T.length) prefixes
nonZeroIndents :: [T.Text] -> [T.Text] nonZeroIndents textLines =
[ s | x <- textLines , s <- [fst (T.span isSpace x)] , 0 /= T.length s ]</lang>
- Output:
alpha epsilon iota theta zeta beta delta gamma kappa lambda mu alpha epsilon iota theta zeta beta delta gamma kappa lambda mu Mixed indent characters used: "\t " Inconsistent indent depths: [4,3,8,9,8,4,4,4,4] zeta gamma mu lambda kappa delta beta alpha theta iota epsilon zeta gamma mu lambda kappa delta beta alpha theta iota epsilon Mixed indent characters used: "\t " Inconsistent indent depths: [4,3,8,9,8,4,4,4,4]