Parsing command-line arguments from a STRING in Clojure

Tags: , , ,



I’m in a situation where I need to parse arguments from a string in the same way that they would be parsed if provided on the command-line to a Java/Clojure application.

For example, I need to turn "foo "bar baz" 'fooy barish' foo" into ("foo" "bar baz" "fooy barish" "foo").

I’m curious if there is a way to use the parser that Java or Clojure uses to do this. I’m not opposed to using a regex, but I suck at regexes, and I’d fail hard if I tried to write one for this.

Any ideas?

Answer

Updated with a new, even more convoluted version. This is officially ridiculous; the next iteration will use a proper parser (or c.c.monads and a little bit of Parsec-like logic on top of that). See the revision history on this answer for the original.

This convoluted bunch of functions seems to do the trick (not at my DRYest with this one, sorry!):

(defn initial-state [input]
  {:expecting nil
   :blocks (mapcat #(str/split % #"(?<=s)|(?=s)")
                   (str/split input #"(?<=(?:'|"|\))|(?=(?:'|"|\))"))
   :arg-blocks []})

(defn arg-parser-step [s]
  (if-let [bs (seq (:blocks s))]
    (if-let [d (:expecting s)]
      (loop [bs bs]
        (cond (= (first bs) d)
              [nil (-> s
                       (assoc-in [:expecting] nil)
                       (update-in [:blocks] next))]
              (= (first bs) "\")
              [nil (-> s
                       (update-in [:blocks] nnext)
                       (update-in [:arg-blocks]
                                  #(conj (pop %)
                                         (conj (peek %) (second bs)))))]
              :else
              [nil (-> s
                       (update-in [:blocks] next)
                       (update-in [:arg-blocks]
                                  #(conj (pop %) (conj (peek %) (first bs)))))]))
      (cond (#{""" "'"} (first bs))
            [nil (-> s
                     (assoc-in [:expecting] (first bs))
                     (update-in [:blocks] next)
                     (update-in [:arg-blocks] conj []))]
            (str/blank? (first bs))
            [nil (-> s (update-in [:blocks] next))]
            :else
            [nil (-> s
                     (update-in [:blocks] next)
                     (update-in [:arg-blocks] conj [(.trim (first bs))]))]))
    [(->> (:arg-blocks s)
          (map (partial apply str)))
     nil]))

(defn split-args [input]
  (loop [s (initial-state input)]
    (let [[result new-s] (arg-parser-step s)]
      (if result result (recur new-s)))))

Somewhat encouragingly, the following yields true:

(= (split-args "asdf 'asdf " asdf' "asdf ' asdf" asdf")
   '("asdf" "asdf " asdf" "asdf ' asdf" "asdf"))

So does this:

(= (split-args "asdf asdf '  asdf " asdf ' " foo bar ' baz " " foo bar \" baz "")
   '("asdf" "asdf" "  asdf " asdf " " foo bar ' baz " " foo bar " baz "))

Hopefully this should trim regular arguments, but not ones surrounded with quotes, handle double and single quotes, including quoted double quotes inside unquoted double quotes (note that it currently treats quoted single quotes inside unquoted single quotes in the same way, which is apparently at variance with the *nix shell way… argh) etc. Note that it’s basically a computation in an ad-hoc state monad, just written in a particularly ugly way and in a dire need of DRYing up. 😛



Source: stackoverflow