Osh 0.8.0 User's Guide

November 20, 2006

Introduction

Osh (Object SHell) is an object-based set of command-line utilities. As with conventional Unix utilities, the output of one command can be connected to the input of the next. However, Unix tools are based on strings. It is normally strings that are passed from one command to the next, and the commands themselves (grep, awk, sed, etc.) are heavily string-oriented. Osh commands process objects, and it is objects that are sent from one command to the next. Objects may be primitive objects, such as strings and numbers; objects representing files, dates and times; objects representing output from commands such as vmstat and top; and collection objects such as lists and maps. It is easy to define new osh commands, and new types for use by osh commands.

Example:

Suppose you have a cluster named fred, with nodes fred1, fred2, fred3. Each node has a database tracking work requests with a table named request. You can find the total number of open requests in each database as follows:
    [jao@zack] osh @fred [ sql "select count(*) from request where state = 'open'" ] ^ out
    ('fred1', 1)
    ('fred2', 0)
    ('fred3', 5)

Now suppose you want to find the total number of open requests across the cluster. You can pipe the tuples into an aggregation command:

    [jao@zack] osh @fred [ sql "select count(*) from request where state = 'open'" ] ^ agg 0 'total, node, count: total + count' $
    6

Installation

To install osh locally, create a directory, e.g. installosh, copy osh-0.8.0.tar.gz into it, and then do the following:
    cd installosh
    tar xzvf osh-0.8.0.tar.gz
    cd osh-0.8.0
    python ./setup.py install
After this is done, you may remove the installosh directory and its contents.

Alternatively, if you already have osh installed locally, the osh command installosh can do a remote install, assuming you have root access on both machines and ssh is configured not to prompt for a password.

More on Objects and Functions

osh is based on Python. The objects that it manipulates are Python objects, and the functions used by f, select and agg are Python lambda expressions (you can either include or omit the lambda keyword).

Because osh uses Python, the primitive numeric and string types are objects, and familiar operators and methods can be used (assuming you're familiar with Python). So in the example above, the aggregation function receives the fields of each (node, count) tuple as separate arguments and adds count to the running total with Python's + operator.

Osh uses tuples and (less often) lists to represent collections of objects. For example, the sql command outputs tuples corresponding to rows retrieved from the database.

Osh Usage Patterns

In a typical sequence of Unix commands connected using pipes, the first command generates some data, and subsequent commands transform the data. The same pattern applies to osh, but within the context of one osh invocation, piping is done by ^, not |. The data-generating command may be a Unix command, in which case its output is piped to osh's stdin, e.g.
    [jao@zack] vmstat -n 1 | osh ^ ... $

An alternative approach is to execute the vmstat command from osh, using the sh (shell) command to spawn a process running vmstat:

    [jao@zack] osh sh 'vmstat -n 1' ^ ... $
The escaped command must be quoted.

One other possibility is that the first osh command generates data, in which case osh is the first thing on the command line, e.g.

    [jao@zack] osh sql 'select * from person' ^ ... $

Configuration

The examples above don't specify how the sql command located a database, or how the @ (remote execution) command located the nodes of the cluster. This information is included in the osh configuration file. osh looks for a file named .oshrc in your home directory. Here is a sample .oshrc file:
    from oshconfig import *
    
    osh.sql = 'mydb'
    osh.sql.mydb.dbtype = 'postgres'
    osh.sql.mydb.host = 'localhost'
    osh.sql.mydb.db = 'mydb'
    osh.sql.mydb.user = 'fred'
    osh.sql.mydb.password = 'l3tme1n'

Subscript (dictionary-style) notation can also be used, e.g.

    from oshconfig import *
    
    osh.sql = 'mydb'
    osh.sql['mydb'].dbtype = 'postgres'
    osh.sql['mydb'].host = 'localhost'
    osh.sql['mydb'].db = 'mydb'
    osh.sql['mydb'].user = 'fred'
    osh.sql['mydb'].password = 'l3tme1n'
The .oshrc file contains Python code, which is executed when osh starts. The osh.sql entries configure the connection to the database (a postgres database on localhost, named mydb, accessed with username fred and password l3tme1n). The line osh.sql = 'mydb' makes mydb (described by the osh.sql.mydb lines) the default database. If the sql command does not name a database profile, as in osh sql 'select * from person' above, then mydb will be used. With this configuration, these commands produce the same result:
    [jao@zack] osh sql 'select * from person' $
    [jao@zack] osh sql mydb 'select * from person' $

Remote execution profiles can be created too. Here is the setup for a remote execution profile named flock:

    osh.remote = 'flock'
    osh.remote.flock.user = 'root'
    osh.remote.flock.hosts = ['192.168.100.1',
                              '192.168.100.2',
                              '192.168.100.3',
                              '192.168.100.4']
The hosts can also be specified using map notation, in which the key is a logical name. Example:
    osh.remote = 'flock'
    osh.remote.flock.user = 'root'
    osh.remote.flock.hosts = {'seagull1': '192.168.100.1',
                              'seagull2': '192.168.100.2',
                              'seagull3': '192.168.100.3',
                              'seagull4': '192.168.100.4'}
When referring to individual nodes, osh uses logical names (e.g. when the copyfrom command creates directories). To execute a command on each node of this cluster:
    [jao@zack] osh @flock [ ... ] ...
Or, because flock is the default cluster:
    [jao@zack] osh @ [ ... ] ...
To execute just on seagull3:
    [jao@zack] osh @flock:3 [ ... ] ...
This selects the nodes of the flock cluster whose logical names contain the substring 3. (So flock:seagull would select all nodes.) A cluster configuration can also specify a default database for each node, e.g.
    osh.remote = 'flock'
    osh.remote.flock.user = 'root'
    osh.remote.flock.hosts = {'seagull1': {'host': '192.168.100.1', 'db_profile': 'db1'},
                              'seagull2': {'host': '192.168.100.2', 'db_profile': 'db2'},
                              'seagull3': {'host': '192.168.100.3', 'db_profile': 'db3'},
                              'seagull4': {'host': '192.168.100.4', 'db_profile': 'db4'}}
This says that on seagull1, the default database profile is db1, on seagull2 it's db2, etc. This specification of a database overrides the default profile specified in the .oshrc file on each node, and can be overridden by a profile name specified with the sql command.
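Node selection by substring, and the three ways of writing hosts, can be modelled in a few lines of Python. In this sketch the helper names are hypothetical, and treating each address in a plain host list as its own logical name is an assumption; the guide only defines logical names for the map forms.

```python
def normalize_hosts(hosts):
    """Return {logical_name: {'host': address, ...}} for any hosts format."""
    if isinstance(hosts, (list, tuple)):
        # Assumption: a bare address doubles as its own logical name.
        return {address: {"host": address} for address in hosts}
    return {name: spec if isinstance(spec, dict) else {"host": spec}
            for name, spec in hosts.items()}


def select_nodes(hosts, pattern=""):
    """Mimic @cluster:pattern by keeping nodes whose logical name contains pattern."""
    return {name: spec for name, spec in normalize_hosts(hosts).items()
            if pattern in name}


flock = {"seagull%d" % i: {"host": "192.168.100.%d" % i, "db_profile": "db%d" % i}
         for i in range(1, 5)}
print(select_nodes(flock, "3"))                # only seagull3
print(sorted(select_nodes(flock, "seagull")))  # all four logical names
```

An empty pattern keeps every node, which corresponds to naming the cluster without a :suffix.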

Commands

Information on any command can be obtained by running the help command, e.g.
    osh help f

agg

The agg command does aggregation -- reducing a set of values to a single value by repeated application of an aggregation function.

For example, a list of files can be obtained as follows:

    [jao@zack] find . | osh ^ f 's: path(s)' $
This command is pointless by itself, as it produces the same output as find. agg can be added to this command, to compute the total size of these files, as follows:
    [jao@zack] find . | osh ^ f 's: path(s)' ^ agg 0 'sum, p: sum + p.size' $
The f command outputs a set of path objects. The first argument to agg specifies the initial value of the sum, 0. The aggregation function is:
    sum, p: sum + p.size
This function has two variables, sum and p. sum is the total size, for all paths processed so far. p is a path passed from the f command. sum + p.size adds the size of the file represented by p to the sum. This value will be passed to sum on the next invocation of the aggregation function. Or, if there are no more paths, then the sum is passed to the next command, which prints the result.
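The fold that agg performs is the same one Python's functools.reduce performs over a list. Below is a minimal sketch in plain Python 3, not osh's implementation: the helper name agg_sketch and the (name, size) sample data are hypothetical, and it assumes, as the examples in this guide suggest, that the fields of each incoming tuple are passed to the aggregation function after the running value.

```python
from functools import reduce


def agg_sketch(initial, func, stream):
    """Fold a stream of tuples into a single value.

    func receives the running value followed by the fields of each
    incoming tuple, as in 'sum, p: sum + p.size'.
    """
    return reduce(lambda acc, item: func(acc, *item), stream, initial)


# Stand-ins for the path objects produced by f: (name, size) pairs.
files = [("a.txt", 120), ("b.txt", 30), ("c.txt", 850)]
total = agg_sketch(0, lambda total, name, size: total + size, files)
print(total)  # 1000

# The cluster example from the introduction: 1 + 0 + 5.
counts = [("fred1", 1), ("fred2", 0), ("fred3", 5)]
print(agg_sketch(0, lambda total, node, count: total + count, counts))  # 6
```

Because reduce is given the initial value, an empty stream in this sketch simply produces that value.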

There is a second form of the agg command, which computes a separate aggregate for each group of input objects. For example, suppose we want to compute a histogram of word lengths for the words in /usr/share/dict/words (i.e., find the number of words of length 1, length 2, length 3, ...).

The osh command to do this is:

    [jao@zack] cat /usr/share/dict/words | osh ^ agg -g 'w: len(w)' 0 'count, n: count + 1' $
    (2, 49)
    (3, 536)
    (4, 2236)
    (5, 4176)
    (6, 6177)
    (7, 7375)
    (8, 7078)
    (9, 6093)
    (10, 4599)
    (11, 3072)
    (12, 1882)
    (13, 1138)
    (14, 545)
    (15, 278)
    (16, 103)
    (17, 57)
    (18, 23)
    (19, 3)
    (20, 3)
    (21, 2)
    (22, 1)
    (28, 1)
(The first number is the length. The second number is the number of words of that length.)

Agg uses this grouping function (specified with the -g flag):

    w: len(w)
which returns the length of word w. This causes agg to define a group for each word length. The remaining arguments to agg describe how to do aggregation for each group. The initial value of the aggregation is zero, and this function:
    count, n: count + 1
increments the group's counter.

agg -g does not generate output until the entire input stream has been processed. It has to be this way because group members are in no particular order. In some situations, group members appear consecutively. In these cases, the -c flag can be used instead of -g. This reduces the memory requirements of the agg command (to a single group instead of all groups), and also allows output to be generated sooner. This is important in some applications, e.g. when output from commands such as vmstat and top is being processed.
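The difference between -g and -c comes down to how much state is kept. This plain Python 3 sketch, with hypothetical helper names agg_groups and agg_consecutive, keeps a dict of partial results for the grouped form and uses itertools.groupby, which only looks at adjacent items, for the consecutive form. It illustrates the idea and is not taken from osh's source.

```python
from itertools import groupby


def agg_groups(group_fn, initial, func, stream):
    """Like 'agg -g': one running value per group, emitted at end of input."""
    partials = {}
    for item in stream:
        key = group_fn(*item)
        partials[key] = func(partials.get(key, initial), *item)
    # Nothing can be emitted until the whole stream has been read.
    for key, value in partials.items():
        yield (key, value)


def agg_consecutive(group_fn, initial, func, stream):
    """Like 'agg -c': only the current group is held in memory."""
    for key, members in groupby(stream, key=lambda item: group_fn(*item)):
        value = initial
        for item in members:
            value = func(value, *item)
        yield (key, value)  # emitted as soon as the group ends


words = [("osh",), ("tuple",), ("ls",), ("grep",), ("cd",), ("awk",)]
histogram = sorted(agg_groups(lambda w: len(w), 0, lambda count, w: count + 1, words))
print(histogram)  # [(2, 2), (3, 2), (4, 1), (5, 1)]

pairs = agg_consecutive(lambda x: x // 2, 0, lambda total, x: total + x,
                        [(x,) for x in range(10)])
print(list(pairs))  # [(0, 1), (1, 5), (2, 9), (3, 13), (4, 17)]
```

If members of a group are not adjacent, agg_consecutive reports the same key more than once, which is why the consecutive form is only appropriate when group members are known to arrive together.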

copyfrom

copyfrom copies files and directories from every node of a cluster to a local directory. For example, this command copies /var/log/messages from every node of cluster flock into /tmp:
    [jao@zack] osh copyfrom -c flock /var/log/messages /tmp
After executing this command, if flock contains nodes seagull1, seagull2, seagull3 and seagull4, then /tmp will contain:
    seagull1/messages
    seagull2/messages
    seagull3/messages
    seagull4/messages

copyto

copyto copies files and directories to every node of a cluster. For example, this command copies the contents of /home/jao/foobar to every node of cluster flock under /tmp:
    [jao@zack] osh copyto -pr -c flock /home/jao/foobar /tmp
-p preserves file modes and times. -r does a recursive copy.

expand

expand expands a sequence or file. For example, if input contains a sequence like this:
    ('a', (1, 2, 3), 'x')
Then this command generates one output sequence for each item of the nested sequence, with each output sequence containing one of the items in the nested sequence:
    [jao@zack] osh ^ ... ^ expand 1 $
    ('a', 1, 'x')
    ('a', 2, 'x')
    ('a', 3, 'x')
The argument to expand is the position of the nested sequence to be expanded.

expand can also be used to expand a top-level sequence, by omitting the argument. For example, if the input stream contains these sequences:

    ('a', 1)
    ('b', 2)
    ('c', 3)

then expand with no arguments works as follows:

    [jao@zack] osh ^ ... ^ expand $
    ('a',)
    (1,)
    ('b',)
    (2,)
    ('c',)
    (3,)
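A rough Python 3 model of the two sequence forms of expand, assuming (as the outputs above show) that expand N keeps the fields around position N, and that expand with no argument emits each element as a 1-tuple. The function names are hypothetical.

```python
def expand_at(item, position):
    """Model 'expand N': one output tuple per element of item[position]."""
    before, nested, after = item[:position], item[position], item[position + 1:]
    for element in nested:
        yield before + (element,) + after


def expand_top(item):
    """Model 'expand' with no argument: one 1-tuple per element of item."""
    for element in item:
        yield (element,)


print(list(expand_at(("a", (1, 2, 3), "x"), 1)))
# [('a', 1, 'x'), ('a', 2, 'x'), ('a', 3, 'x')]

stream = [("a", 1), ("b", 2), ("c", 3)]
print([out for item in stream for out in expand_top(item)])
# [('a',), (1,), ('b',), (2,), ('c',), (3,)]
```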
expand can also be used to insert the contents of files into an output stream. For example, suppose we have two files, a.txt:
    a1
    a2
    a3

and b.txt:

    b1
    b2
Now suppose that the input stream contains ('a.txt',) and ('b.txt',). Then the following command lists the contents of the files, with each output tuple containing the name of a file and one line from that file:
    [jao@zack] osh ^ ... ^ f 'x: (x, x)' ^ expand 1 $
    ('a.txt', 'a1')
    ('a.txt', 'a2')
    ('a.txt', 'a3')
    ('b.txt', 'b1')
    ('b.txt', 'b2')
f 'x: (x, x)' duplicates the file name x. The first occurrence is kept for the output, and the second occurrence is expanded. When expand is applied to a string (the filename in position 1), the string is interpreted as a filename, and each line of that file goes into a separate output tuple.

f

f applies a function to each input object and passes on the function's output. For example, this command takes files as input, uses f to compute file size, and then sums the file sizes:
    [jao@zack] find /usr/bin | osh ^ f 's: path(s).size' ^ agg 0 'sum, size: sum + size' $
f can also be used as the first osh command, to run a function with no arguments. For example, to get the pid and command line of every process:
    [jao@zack] osh f 'processes()' ^ expand ^ f 'p: (p.pid(), p.command_line())' $
Similarly,
    [jao@zack] osh f 'path("/etc").walk()' ^ expand ^ ...
would generate a stream of path objects, each representing a file under the directory /etc.

gen

The function n() generates integers in the context of osh functions. The gen command, by contrast, generates integers all by itself, and so can be used as the initial osh command. The gen command has proved useful in debugging osh; it may or may not be useful to osh users.

gen with no arguments generates integers starting at 0. (The end will not be reached for a very, very long time.)

gen N generates integers from 0 through N - 1. gen N S generates N integers starting at S.

For example:

    [jao@zack] osh gen 3 $
    (0,)
    (1,)
    (2,)
The output contains tuples, each containing an integer. This is because osh always pipes tuples or lists of objects between commands.
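The behaviour of gen, gen N and gen N S maps directly onto a Python generator. This is a hypothetical sketch for illustration, not osh's code; it yields 1-tuples because, as noted above, osh passes tuples between commands.

```python
from itertools import count, islice


def gen(n=None, start=0):
    """Model gen, gen N and gen N S: yield 1-tuples of consecutive integers."""
    numbers = count(start) if n is None else range(start, start + n)
    for i in numbers:
        yield (i,)


print(list(gen(3)))            # [(0,), (1,), (2,)]
print(list(gen(3, 10)))        # [(10,), (11,), (12,)]
print(list(islice(gen(), 5)))  # first five values of the unbounded form
```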

help

The help command prints documentation on an osh command. With no arguments, it prints a list of all osh commands. Examples:
    [jao@zack] osh help
    Usage: help [OSH_COMMAND]
    
    Print usage information for the named osh command.
    
    Builtin commands:
        agg
        copyfrom
        copyto
        expand
        f
        gen
        help
        install
        installosh
        out
        remote
        reverse
        select
        sh
        sort
        sql
        squish
        stdin
        timer
        unique
        version
        window
    [jao@zack] osh help unique
    Usage: unique [-c]
    
    Copies input objects to output, dropping duplicates. No output is
    generated until the end of the input stream occurs. However, if
    the duplicates are known to be consecutive, then specifying -c
    allows output to be generated sooner.

imp

The imp command imports modules for use in subsequent commands. Input to the imp command is passed through to the output stream. For example, a sequence of 1000 random integers between 1 and 10 can be generated as follows:
    [jao@zack] osh gen 1000 ^ imp random ^ f 'x: random.randint(1, 10)' $

install

The install command installs your own osh commands remotely. For example, this command:
    [root@zack] osh install -c flock *.py
will install all .py files in the current directory on the cluster named flock (as configured in .oshrc). This command must be run as root.

If the cluster name is omitted, then installation goes to the default cluster.

If there is no cluster with the specified name, then the name is interpreted as a hostname, permitting installation on a single host without specifying configuration in .oshrc. In this case, the user must be specified using the -u flag:

    [root@zack] osh install -c seagull99 -u root *.py

installosh

The installosh command can be used to install osh on all nodes of a cluster. This greatly simplifies the use of the @ (remote execution) operator. You will need to run this command as root.

To install osh on a cluster named flock:

    [root@zack] osh installosh flock
This will copy your (i.e., root's) .oshrc file to /root on each node. You can specify a different .oshrc file using the -c flag (but the file still goes to /root/.oshrc), e.g.
    [root@zack] osh installosh -c different_osh_config_file flock
To install to the default cluster, omit the cluster name argument. As with other remote commands, if the cluster name cannot be found in the .oshrc file, it will be interpreted as the name of a single node.

out

out prints input objects to stdout or to a file, and passes the objects on to the next command. If formatting is not specified, then input objects are converted to strings using Python's default formatting function, str.

Formatting using the Python % operator can be done by providing a format string. For example, this command prints rows from the person table using default tuple formatting:

    [jao@zack] osh sql 'select * from person' ^ out
    ('julia', 6)
    ('hannah', 11)
If a formatting string is used, then the command is:
    [jao@zack] osh sql 'select * from person' ^ out 'The age of %s is %d'
    The age of julia is 6
    The age of hannah is 11
To write to a file, replacing the existing contents, specify the file's name with the -f flag, e.g.
    [jao@zack] osh sql 'select * from person' ^ out -f people.txt
To append to the file instead, use the -a flag, e.g.
    [jao@zack] osh sql 'select * from person' ^ out -a people.txt

Because command sequences are so often terminated by ^ out, a slightly more convenient piece of syntax is provided: at the end of a command only, ^ out can be replaced by $. If you want to send the output to a file using the -f or -a flags, specify a format, or invoke out anywhere but at the end of a command, then you must use the out command; $ is not syntactically legal in those cases. (Most examples in this document use $.)

It is often useful to generate output in CSV (comma-separated values) format. This can be done using a format string, but the out command also supports a flag, -c, that generates CSV format.
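The formatting choices for out correspond to ordinary Python operations. This sketch shows str() for default output, % with the whole tuple as the right operand for a format string, and Python's csv module for CSV. The csv part is an illustration under that assumption; out -c may quote or format values differently.

```python
import csv
import io

rows = [("julia", 6), ("hannah", 11)]

# Default formatting, as with a bare $ or ^ out: str() of each tuple.
default_lines = [str(row) for row in rows]

# out 'The age of %s is %d': the whole tuple is the right operand of %.
formatted_lines = ["The age of %s is %d" % row for row in rows]

# CSV via Python's csv module (an illustration only).
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(rows)

print("\n".join(default_lines))
print("\n".join(formatted_lines))
print(buffer.getvalue(), end="")
```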

py

The py command executes a line of Python code. The intended use is to define symbols for use in subsequent osh commands. For example, this command defines two variables, foo and bar, which are referenced in a later command:
    [jao@zack] osh py 'foo = 123; bar = 456' ^ f 'foo + bar' $
    (579,)

select

select applies a function to each input object, passing on only those objects for which the function returns true. For example, this command locates words in /usr/share/dict/words whose length is at least 20:
    [jao@zack] cat /usr/share/dict/words | osh ^ select 's: len(s) >= 20' $
    ('antidisestablishmentarianism',)
    ('electroencephalogram',)
    ('electroencephalograph',)
    ('electroencephalography',)
    ('Mediterraneanization',)
    ('Mediterraneanizations',)
    ('nondeterministically',)

sh

sh spawns a process to run a command. It is used to "escape" or "shell out" to the OS. One use of sh is to generate data to be operated on by other osh commands, e.g.
    [jao@zack] osh sh 'cat /home/jao/somefile' ^ ...
which is equivalent to
    [jao@zack] cat /home/jao/somefile | osh ^ ...
sh can also be used to run OS commands, binding input from earlier osh commands. Example:
    [jao@zack] osh gen 5 ^ sh 'mkdir dir%s' 
osh gen 5 generates the integers 0, 1, 2, 3, and 4. Each value is substituted for %s in the mkdir command, and the resulting mkdir command is executed. This results in the creation of directories dir0, dir1, dir2, dir3 and dir4.
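The substitution that sh performs can be modelled as % formatting of the command template with each input tuple, followed by running the result through a shell. In this Python 3 sketch the helper names are hypothetical, and subprocess.run is simply one way to run the commands; the example only builds the command lines, so nothing is created when it runs.

```python
import subprocess


def sh_commands(template, stream):
    """Build one command line per input tuple, as in sh 'mkdir dir%s'."""
    for item in stream:
        yield template % item


def run_commands(commands):
    """Run each command through the shell, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, shell=True, check=True)


commands = list(sh_commands("mkdir dir%s", ((i,) for i in range(5))))
print(commands)
# run_commands(commands) would create dir0 ... dir4 in the current directory.
```

Values are inserted into the command line verbatim, so input that may contain spaces or shell metacharacters should be quoted first (for example with shlex.quote).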

sort

sort consumes the entire input stream, sorts it, and then outputs the sorted objects. This command sorts the contents of a file:
    [jao@zack] cat file | osh ^ sort $
The same could be done using the Unix sort command, of course.

Sorting is done using the default Python comparison function cmp. You can also provide a function that computes the value to sort by. For example, to sort a list of words by length, shortest words first:

    [jao@zack] cat file | osh ^ sort 's: len(s)' $
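Assuming the function given to sort is used to compute a sort key, which is how 's: len(s)' reads, the two forms behave like Python's sorted with and without a key argument. This sketch works on plain strings rather than osh's 1-tuples.

```python
words = ["electroencephalogram", "osh", "tuple", "a"]

# No function given: Python's natural ordering of the values.
alphabetical = sorted(words)

# A one-argument function, as in sort 's: len(s)', used as the sort key.
by_length = sorted(words, key=len)

print(alphabetical)  # ['a', 'electroencephalogram', 'osh', 'tuple']
print(by_length)     # ['a', 'osh', 'tuple', 'electroencephalogram']
```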

sql

The sql command runs a query on a database identified by a profile stored in ~/.oshrc. If the query is a SELECT statement, then the rows are sent to output as Python tuples, e.g.
    [jao@zack] osh sql 'select * from person' $
    ('julia', 6)
    ('hannah', 11)
If the query is an INSERT, DELETE or UPDATE statement, then there is no output. In these cases, the query may have variables, denoted by %s, which are assigned values from incoming objects. For example, suppose person.txt contains this data:
    alexander       13
    nathan          11
    zoe             6
This data can be loaded into the database by the following command:
    [jao@zack] cat person.txt | osh ^ f 's: s.split()' ^ sql "insert into person values('%s', %s)"
Splitting the lines of the file results in tuples ('alexander', '13'), ('nathan', '11'), ('zoe', '6'). These tuples are bound to the two %s occurrences in the INSERT statement. Running the SELECT statement again shows that the data from person.txt has been added:
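The same load can be sketched with Python's sqlite3 module and an in-memory database. Two things differ from the osh command and are assumptions of this sketch: the person table schema is made up for the example, and values are bound with sqlite3's ? placeholders instead of being substituted textually for %s, which avoids the need to quote '%s' as in the INSERT above.

```python
import sqlite3

# Contents of person.txt from the example above.
lines = ["alexander       13", "nathan          11", "zoe             6"]

db = sqlite3.connect(":memory:")
db.execute("create table person (name text, age integer)")  # assumed schema
db.executemany("insert into person values (?, ?)", [("julia", 6), ("hannah", 11)])

# f 's: s.split()' turns each line into a (name, age) pair ...
rows = [line.split() for line in lines]
# ... and each pair is bound to the placeholders of the INSERT.
db.executemany("insert into person values (?, ?)", rows)
db.commit()

people = db.execute("select * from person order by rowid").fetchall()
print(people)
```

SQLite's integer column affinity converts the split strings such as '13' back to integers, so the final SELECT shows the same rows as the osh example.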
    [jao@zack] osh sql 'select * from person' $
    ('julia', 6)
    ('hannah', 11)
    ('alexander', 13)
    ('nathan', 11)
    ('zoe', 6)
To access a database using a non-default profile, specify the database configuration's name before the query, e.g.
    [jao@zack] osh sql mydb 'select * from person' $
    ('julia', 6)
    ('hannah', 11)
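The complete rules for selecting a database profile are as follows:
1. If the sql command names a profile (as in osh sql mydb 'select * from person'), that profile is used.
2. Otherwise, if the command is running on a cluster node whose entry in the cluster configuration specifies a db_profile, that profile is used.
3. Otherwise, the default profile set by osh.sql in the .oshrc file on the machine running the command is used.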
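The precedence described in the configuration section (a profile named on the sql command overrides a node's db_profile, which overrides the .oshrc default) fits in a few lines of Python. The function and argument names are hypothetical; this is a sketch of the rule, not osh code.

```python
def choose_profile(explicit=None, node_profile=None, default=None):
    """Pick the database profile, highest priority first."""
    for candidate in (explicit, node_profile, default):
        if candidate is not None:
            return candidate
    raise LookupError("no database profile configured")


print(choose_profile(default="mydb"))                               # mydb
print(choose_profile(node_profile="db3", default="mydb"))           # db3
print(choose_profile("other", node_profile="db3", default="mydb"))  # other
```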

squish

squish is a convenience command. The Python function reduce is used to apply an operator to a sequence of objects. For example, the sum of the numbers in a list L could be computed by reduce(lambda x, y: x + y, L). This is similar to what the osh command agg does, but agg works on consecutive objects in a stream, not on a list or tuple of objects.

If the osh stream of objects contains sequences, then squish could be applied. For example, suppose the input stream contains these sequences:

    (1, 2, 3)
    (4, 5, 6)
    (7, 8, 9)
Then to compute the sum of each sequence, we could do this:
    [jao@zack] osh ^ ... ^ f '*x: reduce(lambda a, b: a + b, x)' $
    (6,)
    (15,)
    (24,)
Osh provides the squish command which does the same sort of thing as applying the Python reduce function using the osh command f, but more concisely. The above command line is equivalent to the following:
    [jao@zack] osh ^ ... ^ squish + $
    (6,)
    (15,)
    (24,)
If the arguments to squish comprise a single occurrence of +, as above, then the + can be omitted, e.g.
    [jao@zack] osh ^ ... ^ squish $
    (6,)
    (15,)
    (24,)
If each input sequence contains nested sequences, then the squish command can be used to do multiple reductions in parallel. For example, suppose the input contains sequences of sequences like this:
    ((1, 2, 3), (10, 20, 30), (100, 200, 300))
To combine items in like positions (e.g. 1 + 10 + 100, 2 + 20 + 200, 3 + 30 + 300), we can do this:
    [jao@zack] osh ^ ... ^ squish '+ + +' $
    (111, 222, 333)
The operators that can appear in the argument to squish (and make sense) are +, *, min and max, e.g.
    [jao@zack] osh ^ ... ^ squish '+ min max' $
    (111, 2, 300)
111 is 1 + 10 + 100. 2 is min(2, 20, 200). 300 is max(3, 30, 300).
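Each squish argument can be read as a reduction over a sequence or, with several operators, over the columns of nested sequences. In this Python 3 sketch the OPERATORS table maps the operator names above to functions from the operator module and the built-ins min and max. One detail is an assumption: a tuple result is passed on unchanged and anything else is wrapped in a 1-tuple, which reproduces both the (6,) output above and the way squish concatenates the 1-tuples produced by the window command.

```python
import operator
from functools import reduce

# Operators that squish's argument can name; min and max work pairwise.
OPERATORS = {"+": operator.add, "*": operator.mul, "min": min, "max": max}


def _wrap(value):
    # Assumption: a tuple result is passed on as-is, anything else as a 1-tuple.
    return value if isinstance(value, tuple) else (value,)


def squish(item, ops="+"):
    """Hypothetical model of 'squish OPS' applied to one input sequence."""
    names = ops.split()
    if len(names) == 1:
        # One operator: reduce the whole sequence.
        return _wrap(reduce(OPERATORS[names[0]], item))
    # Several operators: reduce position i of every nested sequence with names[i].
    return tuple(reduce(OPERATORS[name], column)
                 for name, column in zip(names, zip(*item)))


nested = ((1, 2, 3), (10, 20, 30), (100, 200, 300))
print([squish(row) for row in [(1, 2, 3), (4, 5, 6), (7, 8, 9)]])  # [(6,), (15,), (24,)]
print(squish(nested, "+ + +"))    # (111, 222, 333)
print(squish(nested, "+ min max"))  # (111, 2, 300)
```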

timer

timer generates a sequence of timestamps on a regular basis. For example, to generate a timestamp every second:
    [jao@zack] osh timer 1 $
    (2005, 9, 18, 23, 55, 57, 6, 261, 1)
    (2005, 9, 18, 23, 55, 58, 6, 261, 1)
    (2005, 9, 18, 23, 55, 59, 6, 261, 1)
    (2005, 9, 18, 23, 56, 0, 6, 261, 1)
    (2005, 9, 18, 23, 56, 1, 6, 261, 1)
...

The tuples generated as output have the same format as time.localtime() (in fact, time.localtime() is used to generate them).
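A generator that yields time.localtime() on a fixed schedule captures what timer does. This Python 3 sketch is illustrative only; it schedules against time.monotonic() so that time spent by downstream processing does not accumulate as drift, which is a design choice of the sketch and not a documented property of timer.

```python
import time
from itertools import islice


def timer(interval):
    """Yield a time.localtime() tuple every `interval` seconds, forever."""
    next_tick = time.monotonic()
    while True:
        yield tuple(time.localtime())
        next_tick += interval
        # Sleep only for what is left of this interval.
        time.sleep(max(0.0, next_tick - time.monotonic()))


# Three timestamps half a second apart (the example stops after the third).
for stamp in islice(timer(0.5), 3):
    print(stamp)
```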

timer is useful for running other commands on a regular basis, in particular monitoring commands. For example, the memory footprint of httpd processes could be monitored every 10 seconds as follows:

    [jao@zack] osh timer 10 ^\
    > f 'ts: (strftime("%H:%M:%S", ts), processes())' ^\
    > expand 1 ^\
    > select 'time, proc: proc.command_line().find("httpd") >= 0' ^\
    > f 'time, proc: (time, proc.size())' $
    ('00:09:51', 22241280)
    ('00:09:51', 22241280)
    ('00:09:51', 22241280)
    ('00:09:51', 29237248)
    ('00:09:51', 22241280)
    ('00:09:51', 22241280)
    ('00:09:51', 22241280)
    ('00:09:51', 22241280)
    ('00:09:51', 22241280)
    ('00:09:51', 17436672)
    ...

unique

The unique command eliminates duplicates from its input. For example, if the file ~/foo.txt contains this:
    the good
    the bad
    and the ugly
Then this command can be used to obtain the unique words (sorted):
    [jao@zack] cat ~/foo.txt | osh ^ f 's: s.split()' ^ expand ^ unique ^ sort $
    ('and',)
    ('bad',)
    ('good',)
    ('the',)
    ('ugly',)
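Both forms of unique are easy to express in Python. In this sketch, with hypothetical function names, the general form remembers every item seen in a set, so it needs hashable items and memory proportional to the number of distinct items; the consecutive form uses itertools.groupby and remembers nothing. Unlike osh's unique, which the help text says waits for the end of input, this generator emits each first occurrence as soon as it arrives.

```python
from itertools import groupby


def unique(stream):
    """Drop duplicates anywhere in the stream, keeping first occurrences."""
    seen = set()
    for item in stream:
        if item not in seen:
            seen.add(item)
            yield item


def unique_consecutive(stream):
    """Like 'unique -c': drop duplicates only when they are adjacent."""
    for item, _group in groupby(stream):
        yield item


text = ["the good", "the bad", "and the ugly"]
words = [(word,) for line in text for word in line.split()]
for word in sorted(unique(words)):
    print(word)
```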

version

The version command sends the version number on the output stream. Example:
    [jao@zack] osh version $
    ('0.8.0',)
As with other osh commands, if you don't explicitly request output (e.g. using $ or ^ out), there will be no output.

window

The window command groups adjacent input objects into lists. Think of the objects streaming past a window. The objects visible through the window at one time are formed into a list. There are two ways to define windows. One way is to use a predicate which returns true when a new window should be started. For example, if the input sequence contains the numbers 0, 1, 2, ... (as generated by the gen command), then a new window can be started on every multiple of 10 as follows:
    [jao@zack] osh gen 100 ^ window 'n: n % 10 == 0' ^ squish $
    (0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
    (10, 11, 12, 13, 14, 15, 16, 17, 18, 19)
    (20, 21, 22, 23, 24, 25, 26, 27, 28, 29)
    (30, 31, 32, 33, 34, 35, 36, 37, 38, 39)
    (40, 41, 42, 43, 44, 45, 46, 47, 48, 49)
    (50, 51, 52, 53, 54, 55, 56, 57, 58, 59)
    (60, 61, 62, 63, 64, 65, 66, 67, 68, 69)
    (70, 71, 72, 73, 74, 75, 76, 77, 78, 79)
    (80, 81, 82, 83, 84, 85, 86, 87, 88, 89)
    (90, 91, 92, 93, 94, 95, 96, 97, 98, 99)
Each input to window is a tuple containing a single integer. window combines these into a tuple of tuples. squish concatenates the interior tuples. (Without squish the first output tuple would be ((0,), (1,), (2,), (3,), (4,), (5,), (6,), (7,), (8,), (9,)).)

Another way to form windows is to specify window sizes. In this example, the gen command is used to generate a stream of numbers, 0 through 9. The window command turns these into two lists of five numbers each. The -d flag specifies window size:

    [jao@zack] osh gen 10 ^ window -d 5 ^ squish $
    (0, 1, 2, 3, 4)
    (5, 6, 7, 8, 9)
The -d flag means that the windows are disjoint -- each input to the window command is assigned to a single output list. Overlapping lists can be created by specifying the -o flag. After a list is formed, the next list is formed by shifting out the first item in the list, and adding a new item at the end, e.g.
    [jao@zack] osh gen 10 ^ window -o 5 ^ squish $
    (0, 1, 2, 3, 4)
    (1, 2, 3, 4, 5)
    (2, 3, 4, 5, 6)
    (3, 4, 5, 6, 7)
    (4, 5, 6, 7, 8)
    (5, 6, 7, 8, 9)
    (6, 7, 8, 9, None)
    (7, 8, 9, None, None)
    (8, 9, None, None, None)
    (9, None, None, None, None)
Notice that the last four lines of output contain padding (None) to fill out each list to 5 items as specified.
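The three ways of forming windows can be sketched with lists and a collections.deque. This Python 3 sketch works on bare integers rather than osh's 1-tuples, so no squish step is needed; the function names are hypothetical, and emitting a shorter final window in disjoint_windows is an assumption, since the behaviour for a partial last window isn't described here.

```python
from collections import deque


def windows_by_predicate(stream, starts_new):
    """Start a new window whenever starts_new(item) is true."""
    current = []
    for item in stream:
        if current and starts_new(item):
            yield tuple(current)
            current = []
        current.append(item)
    if current:
        yield tuple(current)


def disjoint_windows(stream, size):
    """Like 'window -d size': each item lands in exactly one window."""
    current = []
    for item in stream:
        current.append(item)
        if len(current) == size:
            yield tuple(current)
            current = []
    if current:
        # Assumption: leftover items form a shorter final window.
        yield tuple(current)


def overlapping_windows(stream, size):
    """Like 'window -o size': slide by one item, padding the tail with None."""
    window = deque(maxlen=size)
    for item in stream:
        window.append(item)
        if len(window) == size:
            yield tuple(window)
    # Shift in None so that every item eventually leads a window.
    for _ in range(size - 1):
        window.append(None)
        if len(window) == size:
            yield tuple(window)


print(list(windows_by_predicate(range(30), lambda n: n % 10 == 0)))
print(list(disjoint_windows(range(10), 5)))
for w in overlapping_windows(range(10), 5):
    print(w)
```

The deque's maxlen does the shifting: appending to a full deque drops the oldest item, which is exactly the "shift out the first item, add a new item at the end" rule described above.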

Errors and Streams

An osh pipe relays objects from one osh command to another. These objects are organized into streams. Normal command output goes to the stream labeled o (the lower-case letter). Output resulting from errors goes to the stream labeled e. The streams o and e are meant to be used similarly to stdout and stderr.

Every osh command processes objects in one stream only, stream o by default. Example:

    [jao@zack] osh gen 3 $
    (0,)
    (1,)
    (2,)
The gen command writes its output to stream o, and the osh command prints whatever arrives on stream o. The stream processed by a command can be specified explicitly as follows:
    [jao@zack] osh gen 3 ^ o : out
    (0,)
    (1,)
    (2,)
If the stream label is changed from o to e, then no output is generated:
    [jao@zack] osh gen 3 ^ e : out
This is because the out command only processes objects in stream e.

Here is an example of a command that generates an error:

    [jao@zack] osh gen 3 ^ f 'x: (x, float(x + 1) / x)' $
    (f#3['x: (x, float(x + 1) / x)'], 0, 'float division')
    (1, 2.0)
    (2, 1.5)
gen 3 generates the integers 0, 1, and 2. The f command generates tuples (x, float(x + 1) / x). For x = 0, division by zero occurs, which is an error. The first line of output describes this error, which shows up on stream e. The next two lines show (non-erroneous) output on stream o.

To make the handling of the o and e streams clearer, consider this command:

    [jao@zack] osh gen 3 ^ f 'x: (x, float(x + 1) / x)' ^ o : out 'OUT: %s', e : out 'ERR: %s'
    ERR: (f#3['x: (x, float(x + 1) / x)'], 0, 'float division')
    OUT: (1, 2.0)
    OUT: (2, 1.5)
After the last ^ there are two out commands, separated by a comma:
    o : out 'OUT: %s'
    e : out 'ERR: %s'
The first out command handles the o stream and prints OUT at the beginning of each line. The second out command handles the e stream and prints ERR at the beginning of each line.

So why did the original command, with only a single out command (handling just the o stream), print both streams? Because osh provides a handler for the e stream if you don't. Suppressing error output is dangerous. In a future version of osh you will be able to replace the error handler.

Stream names can be changed using the :: operator. For example, if for some reason you wanted to switch the e and o streams:

    [jao@zack] osh gen 3 ^ f 'x: (x, float(x + 1) / x)' ^ o :: e, e :: o ^ o : out 'OUT: %s', e : out 'ERR: %s'
    OUT: (f#3['x: (x, float(x + 1) / x)'], 0, 'float division')
    ERR: (1, 2.0)
    ERR: (2, 1.5)
o :: e moves everything in the o stream to the e stream, and e :: o does the opposite.
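The routing of results between streams can be modelled by catching exceptions around each function call. In this hypothetical Python 3 sketch, the dict-of-lists layout and the ('f', input, message) error tuple are simplifications of the osh error tuples shown above, and the exact error message depends on the Python version.

```python
def f_with_streams(func, stream):
    """Apply func to each input tuple, sending results to 'o' and failures to 'e'."""
    streams = {"o": [], "e": []}
    for item in stream:
        try:
            streams["o"].append(func(*item))
        except Exception as exc:
            # Simplified error record: (command, offending input, message).
            streams["e"].append(("f", item, str(exc)))
    return streams


def swap_streams(streams):
    """Like 'o :: e, e :: o': exchange the contents of the two streams."""
    return {"o": streams["e"], "e": streams["o"]}


result = f_with_streams(lambda x: (x, float(x + 1) / x), [(0,), (1,), (2,)])
for item in result["o"]:
    print("OUT: %s" % (item,))
for error in result["e"]:
    print("ERR: %s" % (error,))
```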

Python API

Osh was designed for use from the command line and in shell scripts, but it can also be used in Python scripts through a conventional API. A Python script using osh needs to import the oshapi module. The expected usage is to import the oshapi symbols into your application's namespace as follows:
    #!/usr/bin/python
    from osh.oshapi import *
Execution of an osh command is done by a function named osh. Commands are provided using function invocations inside the osh call. Piping of objects from one command to the next is implied by the order of the function invocations. Example:
    #!/usr/bin/python
    from osh.oshapi import *
    osh(gen(3), out('%s'))
gen(3) generates a stream containing the integers 0, 1, 2, exactly as the osh command gen would do, run from the command line. Results are piped to the next function invocation. out('%s') prints to stdout all objects received from the previous command, formatting using %s.

Output from this script is:

    0
    1
    2
Various osh commands take python functions as arguments, e.g. f, select, and agg. In the osh API, a function may be passed by naming a function, providing a lambda expression, or as a string. However, functions that will be invoked remotely must be passed as strings. (This is because remote invocation relies on pickling, and functions and lambdas do not seem to be pickle-able.)

For example, the following two osh invocations produce the same output:

    osh(gen(3), f(lambda x: x * 10), out())
    osh(gen(3), f('x: x * 10'), out())
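Passing functions as strings works because a string such as 'x: x * 10' can be shipped anywhere and turned back into a function by prepending the lambda keyword. This Python 3 sketch, with hypothetical helper names, shows that conversion and checks the pickling behaviour mentioned above. It uses eval, so it should only be given trusted text, and the rule for strings without a colon is a simplistic heuristic.

```python
import pickle


def compile_function(source):
    """Turn 'x: x * 10', 'lambda x: x * 10' or 'foo + bar' into a function."""
    source = source.strip()
    if source.startswith(("lambda ", "lambda:")):
        return eval(source)
    if ":" in source:
        # Parameters given: prepend the lambda keyword.
        return eval("lambda " + source)
    # No parameter list: treat the text as a function of no arguments.
    return eval("lambda: " + source)


def is_picklable(obj):
    """Report whether pickle can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


text = "x: x * 10"
print(is_picklable(text))              # True
print(is_picklable(lambda x: x * 10))  # False
func = compile_function(pickle.loads(pickle.dumps(text)))
print([func(x) for x in range(3)])     # [0, 10, 20]
```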
The use of the 'o' stream for normal output and the 'e' stream for error output is identical to command-line osh. To specify handling for a particular stream, the osh API relies on python dicts in which the key is a stream name, and the value is osh code that handles the stream.

For example, the following osh statement generates integers 0, 1, 2, 3, 4, and computes f(x) = x / (x - 2) for each. x = 2 results in division by zero.

    osh(gen(5),
        f(lambda x: x / (x - 2)),
        out())
Output:
    (0,)
    (-1,)
    ERROR: ('_F#1{}[<function <lambda> at 0xb7f47844>]', (2,), 'integer division or modulo by zero')
    (3,)
    (2,)
We can replace the invocation of out by a dict specifying the handling of the 'o' and 'e' streams:
    osh(gen(5),
        f(lambda x: x / (x - 2)),
        {'o': out('OK: %s'),
         'e': [f(lambda command, input, message: input), out('ERR: %s')]})
Now, normal output is handled by out, using the format 'OK: %s'; and error output, on stream 'e', is handled by a sequence of commands which picks the offending input value, and formats it using 'ERR: %s'.

agg

The agg function reduces a set of values to a single value by repeated application of an aggregation function. For example, the following expression computes 10! (1 x 2 x ... x 10):
    osh(gen(10, 1),
        agg(1, lambda fact, x: fact * x),
        out())
gen(10, 1) generates the integers 1, ..., 10. The invocation of agg computes factorial, keeping a partial result, multiplying it by each incoming integer. The partial result is initialized to 1, the first argument to agg; and the multiplication is done by the second argument, a lambda expression. In the lambda expression, fact is the partial result, x is the incoming integer, and fact * x is the next partial result.
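Without grouping, agg behaves much like Python's functools.reduce with an initial value. The sketch below is plain Python for illustration; agg_sketch is a hypothetical name, and osh itself would emit the result as the tuple (3628800,).

```python
from functools import reduce

def agg_sketch(initial, aggregate, stream):
    """Fold each input into a running partial result, starting from initial."""
    return reduce(aggregate, stream, initial)

# Same computation as osh(gen(10, 1), agg(1, lambda fact, x: fact * x), out())
result = agg_sketch(1, lambda fact, x: fact * x, range(1, 11))
print(result)  # 3628800
```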

agg can also be used to compute an aggregate for "groups" of input values. For example, the following expression computes the sum of the odd and even integers between 0 and 9:

    osh(gen(10),
        agg(group(lambda x: x % 2), 0, lambda sum, x: sum + x),
        out())
There are two groups, 0 and 1, computed by x % 2 for each input value x. group(lambda x: x % 2) specifies the grouping function. Aggregation within a group is done by initializing a partial result to 0 (the second argument to agg), and then applying the aggregation function (the last argument to agg) repeatedly. The results from group 0 and group 1 are kept separate. Output from the above expression is:
    (0, 20)
    (1, 25)
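Grouped aggregation amounts to keeping one partial result per group key. The following plain-Python sketch (agg_group_sketch is a hypothetical name, not osh's implementation) holds every partial result in a dict until the input ends, which is why group() needs memory proportional to the number of groups.

```python
def agg_group_sketch(group_key, initial, aggregate, stream):
    """Keep one partial result per group; return (group, result) pairs once the stream ends."""
    partials = {}
    for x in stream:
        key = group_key(x)
        partials[key] = aggregate(partials.get(key, initial), x)
    return list(partials.items())

# Mirrors agg(group(lambda x: x % 2), 0, lambda sum, x: sum + x) applied to gen(10).
print(agg_group_sketch(lambda x: x % 2, 0, lambda total, x: total + x, range(10)))
# [(0, 20), (1, 25)]
```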
If it is known that group members are consecutive in the input sequence, then the grouping function can be specified using consecutive() instead of group(). This reduces memory requirements (since only one partial result needs to be maintained at a time), and makes results available to the next osh function as each group is completed.

For example, suppose the grouping function is x / 2 instead of x % 2. Then for x = 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, the values of the grouping function will be 0, 0, 1, 1, 2, 2, 3, 3, 4, 4. Because the members of each group are adjacent, this expression:

    osh(gen(10),
        agg(group(lambda x: x / 2), 0, lambda sum, x: sum + x),
        out())
is equivalent to this one:
    osh(gen(10),
        agg(consecutive(lambda x: x / 2), 0, lambda sum, x: sum + x),
        out())
Both produce this output:
    (0, 1)
    (1, 5)
    (2, 9)
    (3, 13)
    (4, 17)
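The consecutive() variant corresponds closely to Python's itertools.groupby, which also splits its input into runs of adjacent items with equal keys. In this sketch (a hypothetical helper, not osh code) only the current run is held in memory, and each group's result is yielded as soon as the run ends. As with consecutive(), a key that reappears later in the input starts a new, separate group.

```python
from functools import reduce
from itertools import groupby

def agg_consecutive_sketch(group_key, initial, aggregate, stream):
    """Aggregate runs of adjacent inputs with equal keys; yield each group as it ends."""
    for key, members in groupby(stream, key=group_key):
        yield (key, reduce(aggregate, members, initial))

# x // 2 is the Python 3 spelling of the integer division x / 2 used above.
print(list(agg_consecutive_sketch(lambda x: x // 2, 0, lambda total, x: total + x, range(10))))
# [(0, 1), (1, 5), (2, 9), (3, 13), (4, 17)]
```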

copyfrom

copyfrom copies files and directories from every node of a cluster to a local directory. For example, this statement copies /var/log/messages from every node of cluster flock into /tmp:
    osh(copyfrom('flock', '/var/log/messages', '/tmp'))
After executing this command, if flock contains nodes seagull1, seagull2, seagull3, and seagull4, then /tmp will contain:
    seagull1/messages
    seagull2/messages
    seagull3/messages
    seagull4/messages
The scp options -r (recursive), -p (preserve file attributes), and -C (compress) are supported. These can be specified using the scp function inside the copyfrom call. For example, to copy all of /var/log to /tmp, using all of these flags:
    osh(copyfrom('flock', scp('rpC'), '/var/log', '/tmp'))

copyto

copyto copies files and directories to every node of a cluster. For example, this statement copies the contents of /home/jao/foobar to every node of cluster flock under /tmp, preserving file attributes and using compression:
    osh(copyto('flock', scp('rpC'), '/home/jao/foobar', '/tmp'))

expand

expand expands a sequence or file. For example, if input contains a sequence like this:
    ('a', (1, 2, 3), 'x')
Then this command generates one output sequence for each item of the nested sequence, with each output sequence containing one of the nested items in place of the nested sequence:
    osh(..., expand(1), out())
The output from this statement is:
    ...
    ('a', 1, 'x')
    ('a', 2, 'x')
    ('a', 3, 'x')
    ...
The argument to expand is the position of the nested sequence to be expanded.

expand can also be used to expand a top-level sequence, by omitting the argument. For example, if the input stream contains these sequences:

    ('a', 1)
    ('b', 2)
    ('c', 3)

then expand() (no arguments) generates this output:

    ('a',)
    (1,)
    ('b',)
    (2,)
    ('c',)
    (3,)
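Both forms of sequence expansion can be sketched in a few lines of plain Python. expand_sketch is a hypothetical helper that mirrors the outputs shown above; it does not implement osh's treatment of strings as filenames.

```python
def expand_sketch(stream, position=None):
    """Yield one output tuple per item of a nested sequence, or of each whole tuple."""
    for t in stream:
        if position is None:
            # expand(): each item of the top-level tuple becomes its own 1-tuple.
            for item in t:
                yield (item,)
        else:
            # expand(position): each nested item replaces the nested sequence.
            for item in t[position]:
                yield t[:position] + (item,) + t[position + 1:]

print(list(expand_sketch([('a', (1, 2, 3), 'x')], 1)))
# [('a', 1, 'x'), ('a', 2, 'x'), ('a', 3, 'x')]
print(list(expand_sketch([('a', 1), ('b', 2), ('c', 3)])))
# [('a',), (1,), ('b',), (2,), ('c',), (3,)]
```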
expand can also be used to insert the contents of files into an output stream. For example, suppose we have two files, a.txt:
    a1
    a2
    a3

and b.txt:

    b1
    b2
Now suppose that the input stream contains ('a.txt',) and ('b.txt',). Then a listing of the files, with each line including the name of a file and one line from the file, can be generated by this statement:
    osh(..., f(lambda x: (x, x)), expand(1), out())
This generates the following output:
    ('a.txt', 'a1')
    ('a.txt', 'a2')
    ('a.txt', 'a3')
    ('b.txt', 'b1')
    ('b.txt', 'b2')
f(lambda x: (x, x)) duplicates the file name x. The first occurrence is kept for the output, and the second occurrence is expanded. When expand is applied to a string (here, the filename in position 1), the string is interpreted as a filename, and each line of that file appears in its own output tuple.
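File expansion can be mimicked in plain Python by opening the named file and emitting one tuple per line. This is a sketch under the assumption that trailing newlines are stripped, as the output above suggests; expand_file_sketch is a hypothetical name, and the example files are created in a temporary directory.

```python
import os
import tempfile

def expand_file_sketch(stream, position):
    """Replace the filename at `position` with each line of that file, one tuple per line."""
    for t in stream:
        with open(t[position]) as f:
            for line in f:
                # Assumption: trailing newlines are dropped, matching the output above.
                yield t[:position] + (line.rstrip('\n'),) + t[position + 1:]

# Recreate a.txt and b.txt from the example in a scratch directory.
tmpdir = tempfile.mkdtemp()
for name, text in {'a.txt': 'a1\na2\na3\n', 'b.txt': 'b1\nb2\n'}.items():
    with open(os.path.join(tmpdir, name), 'w') as f:
        f.write(text)
os.chdir(tmpdir)  # so the relative names 'a.txt' and 'b.txt' resolve

# f(lambda x: (x, x)) turns ('a.txt',) into ('a.txt', 'a.txt'); position 1 is expanded.
duplicated = [(name, name) for name in ('a.txt', 'b.txt')]
for row in expand_file_sketch(duplicated, 1):
    print(row)
```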

f

f applies a function to each input object and passes on the function's output. The function to be applied can be a symbol resolving to a function, a lambda expression, or a string containing a lambda expression. (The string may include or omit the keyword lambda.)

For example, the following statements both multiply each x by 100, for x = 0, ..., 9, printing the tuples (0,), (100,), ..., (900,):

    osh(gen(10), f(lambda x: x * 100), out())
    osh(gen(10), f('x: x * 100'), out())
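One plausible way to accept all three forms is to pass callables through unchanged and to evaluate strings as lambda expressions, prepending the keyword when it is missing. to_function is a hypothetical helper sketched under that assumption; osh's actual conversion may differ. Because it uses eval, it should only be given trusted strings.

```python
import re

def to_function(spec):
    """Hypothetical helper: accept a callable, 'x: x * 100', or 'lambda x: x * 100'."""
    if callable(spec):
        return spec
    text = spec.strip()
    if not re.match(r'lambda\b', text):
        text = 'lambda ' + text  # the keyword lambda may be omitted
    return eval(text)  # eval runs arbitrary code: only use with trusted strings

times_100 = to_function('x: x * 100')
print([times_100(x) for x in range(3)])   # [0, 100, 200]
print(to_function('lambda x: x + 1')(1))  # 2
```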

gen

The gen function generates a stream of integers. gen(N) generates integers from 0 through N - 1. For example, this statement:
    osh(gen(3), out())
generates this output:
    (0,)
    (1,)
    (2,)
gen(N, S) generates N integers starting at S.
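As a plain-Python sketch (gen_sketch is a hypothetical name), gen is a generator over range(S, S + N) that wraps each integer in a 1-tuple, since osh passes tuples from one command to the next.

```python
def gen_sketch(count, start=0):
    """Yield count integers beginning at start, each wrapped in a 1-tuple."""
    for i in range(start, start + count):
        yield (i,)

print(list(gen_sketch(3)))      # [(0,), (1,), (2,)]
print(list(gen_sketch(3, 1)))   # [(1,), (2,), (3,)]
```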

out

out prints input objects to stdout or to a file, and passes the objects on to the next command. If formatting is not specified, then input objects are converted to strings using Python's default formatting function, str. Formatting using the Python % operator can be done by providing a formatting string. For example, this statement:
    osh(gen(3), out())
prints gen output as tuples:
    (0,)
    (1,)
    (2,)
because osh passes tuples from one function to the next. Formatting using %s:
    osh(gen(3), out('%s'))
generates this output instead:
    0
    1
    2
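The difference between the two outputs comes from Python itself, assuming out applies the format string with the % operator to each tuple: str() renders the whole tuple, while % treats a tuple on its right-hand side as the list of arguments for the format string.

```python
row = (0,)

# Default formatting: str() of the whole tuple.
print(str(row))    # (0,)

# With a format string, the tuple's items fill the %s slots.
print('%s' % row)  # 0

# Multi-field tuples need one %s per field.
print('%s: %s' % ('julia', 6))  # julia: 6
```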

To write to a file instead of stdout, use the write function to create or replace the file, or the append function to append to it:

    osh(gen(3), out(write('/tmp/numbers.txt')))
    osh(gen(3), out(append('/tmp/numbers.txt')))
It is often useful to generate output in CSV (comma-separated values) format. This can be done using a formatting string, or by calling csv() inside of out(), e.g.
    osh(gen(3), out(write('/tmp/numbers.txt'), csv()))
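For comparison, this sketch produces CSV lines from tuples with Python's standard csv module and with a formatting string. It only illustrates the format; whether osh's csv() quotes fields containing commas or quotes the way the csv module does is not stated here.

```python
import csv
import io

rows = [('julia', 6), ('hannah', 11)]

# Standard-library CSV writer: quotes fields only when they need it.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator='\n')
writer.writerows(rows)
print(buffer.getvalue(), end='')
# julia,6
# hannah,11

# The same lines from a formatting string, as with out('%s,%s').
formatted = '\n'.join('%s,%s' % row for row in rows)
print(formatted)
```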

reverse

reverse collects all objects from the input stream. When the stream is complete, the objects are sent out in reverse order. For example, this statement:
    osh(gen(3), reverse(), out('%s'))
generates this output:
    2
    1
    0

select

select applies a function to each input object, passing on only those objects for which the function returns true. For example, this statement prints numbers between 0 and 99 that are divisible by 7:
    osh(gen(100), select(lambda x: (x % 7) == 0), out())
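In plain Python, select corresponds to a filtering generator. select_sketch is a hypothetical name used only for illustration.

```python
def select_sketch(predicate, stream):
    """Pass on only the inputs for which predicate returns a true value."""
    return (x for x in stream if predicate(x))

multiples_of_7 = list(select_sketch(lambda x: x % 7 == 0, range(100)))
print(multiples_of_7)  # 0, 7, 14, ..., 98 (15 numbers)
```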

sh

sh spawns a process to run a command. Stdout from the spawned command is sent to the 'o' stream, and stderr goes to the 'e' stream. For example, this statement produces a listing of /tmp:
    osh(sh('ls -l /tmp'), out())
sh can also be used to run OS commands, binding input piped in from other osh functions. For example, this statement creates directories dir0, ..., dir4 in /tmp:
    osh(gen(5), sh('mkdir /tmp/dir%s'), out())

sort

sort consumes the entire input stream, sorts it, and then outputs the sorted objects. For example, to sort stdin:
    osh(stdin(), sort(), out())
Sorting is done using the default Python comparison function cmp. You can also provide your own function of one argument, whose result is used as the sort key for each input object. For example, to sort stdin by the length of each input line:
    osh(stdin(), sort(lambda line: len(line)), out())
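Judging from the examples in this guide (lambda line: len(line), and 't: -t[0]' in the path section), the function passed to sort behaves like a sort key. Python's sorted() with its key argument shows the same orderings; the sample lines and file sizes below are made up for illustration.

```python
lines = ['pear\n', 'fig\n', 'banana\n']

# Default ordering, comparable to sort() with no argument.
print(sorted(lines))           # ['banana\n', 'fig\n', 'pear\n']

# One-argument key function, like sort(lambda line: len(line)).
print(sorted(lines, key=len))  # ['fig\n', 'pear\n', 'banana\n']

# Negating a numeric key gives descending order, like sort 't: -t[0]'.
sizes = [(6300, 'pnm2ppa.conf'), (129993, 'lynx.cfg.sk'), (23735, 'webalizer.conf')]
print(sorted(sizes, key=lambda t: -t[0]))
```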

sql

The sql function runs a query on a database identified by the default profile or a named profile. If the query is a SELECT statement, then the rows are sent to output as Python tuples. Example:
    osh(sql('select * from person'), out())
might generate this output:
    ('julia', 6)
    ('hannah', 11)
For INSERT, DELETE or UPDATE statements, there may be variables, denoted by %s, which are assigned values from incoming objects. For example, suppose person.txt contains this data:
    alexander       13
    nathan          11
    zoe             6
This data can be loaded into the database by the following command:
    osh(stdin(), f(lambda s: s.split()), sql("insert into person values('%s', %s)"))
Splitting the lines of the file results in tuples ('alexander', '13'), ('nathan', '11'), ('zoe', '6'). These tuples are bound to the two %s occurrences in the INSERT statement.
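The binding step can be sketched with Python's % operator and an in-memory SQLite database standing in for the profile configured in .oshrc. This assumes osh fills the %s placeholders by plain string substitution; that is not confirmed here. Note that such substitution does not escape quotes (a name like O'Brien would break the statement), which DB-API parameter binding avoids.

```python
import sqlite3

# Lines as they might arrive from stdin (person.txt above).
lines = ['alexander       13', 'nathan          11', 'zoe             6']
template = "insert into person values('%s', %s)"

conn = sqlite3.connect(':memory:')  # stand-in database for this sketch
conn.execute('create table person (name text, age integer)')
for line in lines:
    fields = tuple(line.split())   # ('alexander', '13'), ...
    statement = template % fields  # "insert into person values('alexander', 13)"
    conn.execute(statement)
conn.commit()
print(conn.execute('select * from person order by age').fetchall())
# [('zoe', 6), ('nathan', 11), ('alexander', 13)]
```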

To access a database using a non-default configuration (specified in .oshrc), specify the database configuration's name before the query, e.g.

    osh(sql('mydb', 'select * from person'), out())
The complete rules for selecting a database profile are as follows:

squish

squish is a convenience function. The Python function reduce applies a two-argument function cumulatively to a sequence of objects. For example, the sum of the numbers in a list L can be computed by reduce(lambda x, y: x + y, L). This is similar to what the osh function agg does, but agg works on consecutive objects in a stream, not on a list or tuple of objects.

If the osh stream of objects contains sequences, then squish can be applied. For example, suppose the input stream contains these sequences:

    (1, 2, 3)
    (4, 5, 6)
    (7, 8, 9)
Then to compute the sum of each sequence, we could do this:
    osh(..., f(lambda *x: reduce(lambda a, b: a + b, x)), out())
producing this output:
    (6,)
    (15,)
    (24,)
Osh provides the squish command, which does the same thing as applying the Python reduce function with the osh command f, but more concisely. The above statement is equivalent to:
    osh(..., squish('+'), out())
If the arguments to squish comprise a single occurrence of +, as above, then the + can be omitted, e.g.
    osh(..., squish(), out())
If each input sequence contains nested sequences, then the squish command can be used to do multiple reductions in parallel. For example, suppose the input contains sequences of sequences like this:
    ((1, 2, 3), (10, 20, 30), (100, 200, 300))
To combine items in like positions (i.e. 1 + 10 + 100, 2 + 20 + 200, and 3 + 30 + 300), we can do this:
    osh(..., squish('+ + +'), out())
which yields this output:
    (111, 222, 333)
The operators that can appear in the argument to squish (and make sense) are +, *, min and max. For example, given the same input as in the preceding example, this statement:
    osh(..., squish('+ min max'), out())
yields this output:
    (111, 2, 300)
111 is 1 + 10 + 100. 2 is min(2, 20, 200). 300 is max(3, 30, 300).
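Both uses of squish can be reproduced with functools.reduce: a single operator reduces the whole sequence, and several operators are applied position by position after zip(*seq) regroups like positions. squish_sketch is hypothetical and returns plain values, so squish('+') on (1, 2, 3) gives 6 here where osh prints (6,). With nested tuples, a single + concatenates them, which matches how squish flattens window output.

```python
from functools import reduce

OPERATORS = {
    '+': lambda a, b: a + b,
    '*': lambda a, b: a * b,
    'min': min,
    'max': max,
}

def squish_sketch(spec, seq):
    """Reduce seq with one operator, or each position of nested sequences with one operator each."""
    names = spec.split() or ['+']  # an empty spec behaves like squish()
    if len(names) == 1:
        return reduce(OPERATORS[names[0]], seq)
    # zip(*seq) regroups like positions: (1, 10, 100), (2, 20, 200), (3, 30, 300)
    return tuple(reduce(OPERATORS[name], column)
                 for name, column in zip(names, zip(*seq)))

nested = ((1, 2, 3), (10, 20, 30), (100, 200, 300))
print(squish_sketch('+', (1, 2, 3)))       # 6
print(squish_sketch('+ + +', nested))      # (111, 222, 333)
print(squish_sketch('+ min max', nested))  # (111, 2, 300)
```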

stdin

The stdin function is used to convert each line of stdin into a string object, which is then piped to downstream commands. So osh(stdin(), ...) is equivalent to the osh command line osh ^ ....

timer

timer generates a sequence of timestamps on a regular basis. For example, to generate a timestamp every second:
    osh(timer(1), out())
generates this output, with lines appearing every second:
    (2005, 9, 18, 23, 55, 57, 6, 261, 1)
    (2005, 9, 18, 23, 55, 58, 6, 261, 1)
    (2005, 9, 18, 23, 55, 59, 6, 261, 1)
    (2005, 9, 18, 23, 56, 0, 6, 261, 1)
    (2005, 9, 18, 23, 56, 1, 6, 261, 1)
...
In general, the argument to timer is a string of the form HH:MM:SS. An int can be passed instead for intervals of up to 59 seconds.
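The timestamps above appear to have the same nine fields as Python's time.localtime() (year, month, day, hour, minute, second, weekday, day of year, DST flag). The sketch below converts a timer argument to seconds; interval_seconds is a hypothetical helper, not part of osh.

```python
import time

def interval_seconds(spec):
    """Hypothetical helper: convert a timer argument ('HH:MM:SS' or an int) to seconds."""
    if isinstance(spec, int):
        return spec  # the guide documents ints for intervals up to 59 seconds
    hours, minutes, seconds = (int(part) for part in spec.split(':'))
    return hours * 3600 + minutes * 60 + seconds

print(interval_seconds('00:01:30'))  # 90
print(interval_seconds(1))           # 1

# A timestamp with the same field layout as the output above.
print(tuple(time.localtime()))
```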

unique

The unique function eliminates duplicates from its input. For example, if input to unique is this:
    (0,)
    (1,)
    (2,)
    (3,)
    (0,)
    (1,)
    (2,)
    (3,)
then this statement:
    osh(..., unique(), out())
generates this output:
    (1,)
    (0,)
    (3,)
    (2,)
Ordering is not guaranteed. If the input is known to be structured such that all duplicates are consecutive, then this variant can be used:
    osh(..., unique(consecutive()), out())
With consecutive(), each run of consecutive duplicates is reduced to a single copy; this minimizes memory requirements and generates output sooner.
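The memory trade-off is easy to see in plain Python. Both functions below are hypothetical sketches: the general version must remember every distinct item it has seen, while the consecutive version (via itertools.groupby) only tracks the current run, so duplicates that are not adjacent survive it. Unlike osh, the first sketch happens to preserve first-seen order.

```python
from itertools import groupby

def unique_sketch(stream):
    """Drop duplicates anywhere in the stream; remembers every distinct item seen."""
    seen = set()
    for item in stream:
        if item not in seen:
            seen.add(item)
            yield item

def unique_consecutive_sketch(stream):
    """Collapse each run of adjacent duplicates; remembers only the current run."""
    for item, _run in groupby(stream):
        yield item

data = [(0,), (1,), (2,), (3,), (0,), (1,), (2,), (3,)]
print(list(unique_sketch(data)))                                # [(0,), (1,), (2,), (3,)]
print(list(unique_consecutive_sketch([(0,), (0,), (1,), (0,)])))  # [(0,), (1,), (0,)]
```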

version

The version function sends the osh version number to the output stream.

window

The window function groups adjacent input objects into lists. Think of the objects streaming past a window. The objects visible through the window at one time are formed into a list. There are two ways to define windows. One way is to use a predicate which returns true when a new window should be started. For example, if the input sequence contains the numbers 0, 1, 2, ... (as generated by the gen command), then a new window can be started on every multiple of 10 as follows:
    osh(gen(100), window(lambda n: n % 10 == 0), squish(), out())
This statement generates the following output:
    (0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
    (10, 11, 12, 13, 14, 15, 16, 17, 18, 19)
    (20, 21, 22, 23, 24, 25, 26, 27, 28, 29)
    (30, 31, 32, 33, 34, 35, 36, 37, 38, 39)
    (40, 41, 42, 43, 44, 45, 46, 47, 48, 49)
    (50, 51, 52, 53, 54, 55, 56, 57, 58, 59)
    (60, 61, 62, 63, 64, 65, 66, 67, 68, 69)
    (70, 71, 72, 73, 74, 75, 76, 77, 78, 79)
    (80, 81, 82, 83, 84, 85, 86, 87, 88, 89)
    (90, 91, 92, 93, 94, 95, 96, 97, 98, 99)
Each input to window is a tuple containing a single integer. window combines these into a tuple of tuples. squish concatenates the interior tuples. (Without squish the first output tuple would be ((0,), (1,), (2,), (3,), (4,), (5,), (6,), (7,), (8,), (9,)).)
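Predicate-based windowing can be sketched as a generator that closes the current window whenever the predicate signals a new start. window_by_predicate is hypothetical; it works on plain ints rather than 1-tuples, so no squish step is needed to get the flat windows shown above.

```python
def window_by_predicate(starts_new, stream):
    """Start a new window whenever starts_new(item) is true; emit each window as a tuple."""
    current = []
    for item in stream:
        if starts_new(item) and current:
            yield tuple(current)
            current = []
        current.append(item)
    if current:
        yield tuple(current)  # flush the final window

windows = list(window_by_predicate(lambda n: n % 10 == 0, range(100)))
print(windows[0])    # (0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
print(len(windows))  # 10
```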

Another way to form windows is to specify window sizes. In this example, gen is used to generate a stream of numbers, 0 through 9. The window function turns these into two lists of five numbers each:

    osh(gen(10), window(disjoint(5)), squish(), out())
The output from this statement is:
    (0, 1, 2, 3, 4)
    (5, 6, 7, 8, 9)
disjoint(5) specifies that non-overlapping windows of size 5 should be created.

Overlapping windows can be created by specifying the window size using overlap(). After one window is formed, the next one is formed by shifting out the first item in the window and adding a new item at the end. For example, this statement:

    osh(gen(10), window(overlap(5)), out())
generates this output:
    (0, 1, 2, 3, 4)
    (1, 2, 3, 4, 5)
    (2, 3, 4, 5, 6)
    (3, 4, 5, 6, 7)
    (4, 5, 6, 7, 8)
    (5, 6, 7, 8, 9)
    (6, 7, 8, 9, None)
    (7, 8, 9, None, None)
    (8, 9, None, None, None)
    (9, None, None, None, None)
Notice that the last four lines of output contain padding (None) to fill out each list to 5 items as specified.
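Size-based windows can be sketched with a list for disjoint windows and a bounded collections.deque for overlapping ones. Padding the input with size - 1 copies of None reproduces the ten overlapping windows listed above. Both helpers are hypothetical, and emitting a short final disjoint window unpadded is an assumption, since the guide's example divides evenly.

```python
from collections import deque
from itertools import chain, repeat

def disjoint_windows(size, stream):
    """Group items into consecutive, non-overlapping windows of `size` items."""
    window = []
    for item in stream:
        window.append(item)
        if len(window) == size:
            yield tuple(window)
            window = []
    if window:
        yield tuple(window)  # assumption: a short final window is emitted as-is

def overlapping_windows(size, stream):
    """Slide a window of `size` items one position at a time, padding the tail with None."""
    window = deque(maxlen=size)  # appending to a full deque drops the oldest item
    for item in chain(stream, repeat(None, size - 1)):
        window.append(item)
        if len(window) == size:
            yield tuple(window)

print(list(disjoint_windows(5, range(10))))  # [(0, 1, 2, 3, 4), (5, 6, 7, 8, 9)]
for w in overlapping_windows(5, range(10)):
    print(w)
```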

Builtin Functions and Types

osh provides a few useful functions and types.

n(start = 0)

n() generates a sequence of ints, starting at the specified start value, or 0 if no start value is specified.

Example:

    [jao@zack] find /usr/bin | osh ^ f 's: (n(), path(s))' $
    (0, path('/usr/bin'))
    (1, path('/usr/bin/consolehelper'))
    (2, path('/usr/bin/catchsegv'))
    (3, path('/usr/bin/gencat'))
    (4, path('/usr/bin/getconf'))
    (5, path('/usr/bin/getent'))
    (6, path('/usr/bin/glibcbug'))
    ...
Each osh command has its own copy of the n() function, so calling n() multiple times in the same command yields a new value on each call, e.g.
    [jao@zack] find /usr/bin | osh ^ f 's: (n(), n(), path(s))' $
    (0, 1, path('/usr/bin'))
    (2, 3, path('/usr/bin/consolehelper'))
    (4, 5, path('/usr/bin/catchsegv'))
    (6, 7, path('/usr/bin/gencat'))
    (8, 9, path('/usr/bin/getconf'))
    (10, 11, path('/usr/bin/getent'))
    (12, 13, path('/usr/bin/glibcbug'))
    ...
But calls in different commands are independent of one another, e.g.
    [jao@zack] find /usr/bin | osh ^ f 's: (n(), path(s))' ^ f 't: (n(),) + t' $
    (0, 0, path('/usr/bin'))
    (1, 1, path('/usr/bin/consolehelper'))
    (2, 2, path('/usr/bin/catchsegv'))
    (3, 3, path('/usr/bin/gencat'))
    (4, 4, path('/usr/bin/getconf'))
    (5, 5, path('/usr/bin/getent'))
    (6, 6, path('/usr/bin/glibcbug'))
    ...
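The per-command behavior of n() resembles giving each command its own itertools.count. make_counter is a hypothetical helper for this sketch; it is not how osh is known to implement n().

```python
from itertools import count

def make_counter(start=0):
    """Return an n()-style function: each call returns the next int, starting at start."""
    counter = count(start)
    return lambda: next(counter)

n_first = make_counter()   # plays the role of n() inside the first f command
n_second = make_counter()  # an independent copy for the second f command

pairs = [(n_first(), n_first()) for _ in range(2)]
print(pairs)       # [(0, 1), (2, 3)]
print(n_second())  # 0
```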

ifelse(predicate, thenExpr, elseExpr)

Python versions before 2.5 have no operator comparable to ?: in C and Java. (Python 2.5 added the conditional expression, x if c else y.) Such an operator is very useful, so osh provides a function named ifelse that does approximately the same thing. The difference is that ifelse, being a function, must evaluate all of its arguments.

ifelse(predicate, thenExpr, elseExpr) returns thenExpr if predicate is true, elseExpr otherwise.

For example, this command finds the longest word in /usr/share/dict/words:

    [jao@zack] cat /usr/share/dict/words | osh ^ agg '""' 'longest, w: ifelse(len(w) > len(longest), w, longest)' $
    ('antidisestablishmentarianism',)
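The eager-evaluation difference can be demonstrated in plain Python. The ifelse below is a sketch that follows the documented signature and is not necessarily osh's own definition; the word list is made up for illustration.

```python
from functools import reduce

def ifelse(predicate, then_value, else_value):
    """Sketch of ifelse: return then_value if predicate is true, else else_value."""
    return then_value if predicate else else_value

# The longest-word aggregation above, over a small sample list.
words = ['osh', 'pipeline', 'tuple', 'agg']
longest = reduce(lambda best, w: ifelse(len(w) > len(best), w, best), words, '')
print(longest)  # pipeline

# Being a function, ifelse receives already-evaluated arguments.
x = 0
try:
    ifelse(x != 0, 10 // x, 0)
except ZeroDivisionError:
    print('10 // x was evaluated before ifelse was called')

# A conditional expression evaluates only the branch it selects.
print(10 // x if x != 0 else 0)  # 0
```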

path

The path module provides a more object-oriented interface to the filesystem than the standard Python libraries do. This module was written by Jason Orendorff, and modified very slightly for use in osh. The homepage for path is http://www.jorendorff.com/articles/python/path. Refer to this page for detailed information on path.

The path module can be used inside osh commands to obtain objects representing files and directories. For example, here is a command to print a list of files under /etc:

    [jao@zack] osh f 'path("/etc").walk()' ^ spread $
    /etc/wgetrc
    /etc/pnm2ppa.conf
    /etc/a2ps.cfg
    /etc/security
    /etc/security/group.conf
    /etc/security/chroot.conf
    /etc/security/time.conf
    ...
This is obviously pointless, since osh adds little to what find already does. Here is another example which includes file size and sorts by descending file size:
    [jao@zack] osh f 'path("/etc").walk()' ^ spread ^ f 'p: (p.size, p)' ^ sort 't: -t[0]' $
    (129993L, path('/etc/lynx.cfg.sk'))
    (115004L, path('/etc/squid/squid.conf.default'))
    (115004L, path('/etc/squid/squid.conf'))
    (91259L, path('/etc/ld.so.cache'))
    (26104L, path('/etc/squid/mib.txt'))
    (23735L, path('/etc/webalizer.conf'))
    (15276L, path('/etc/a2ps.cfg'))
    (11651L, path('/etc/squid/mime.conf.default'))
    (11651L, path('/etc/squid/mime.conf'))
    (6300L, path('/etc/pnm2ppa.conf'))
    (4096L, path('/etc/security'))
    ...

process

The process object represents a running process. It has the following methods:

process objects are generated by the processes command. It is possible that a process will terminate after a process object representing it has been generated, and before one of the methods above has been called. In this case, methods reflecting process properties (e.g. parent(), env()) will return None. Needless to say, such race conditions are unavoidable.

stat

The version of path included with osh is slightly modified: path.stat() returns a stat object instead of a tuple of values. For example, here is the previous example, modified to include stat output:
    [jao@zack] find /usr/bin | osh ^ f 's: path(s).abspath()' ^ f 'p: (p, p.stat())' ^ sort 't: -t[1].size' $
    (path('/usr/bin/gmplayer'), stat{mode:33261, inode:345865, device:770, hardLinks:1, uid:0, gid:0, size:4975052, atime:1098396622, mtime:1065382845, ctime:1098221020})
    (path('/usr/bin/mplayer'), stat{mode:33261, inode:345865, device:770, hardLinks:1, uid:0, gid:0, size:4975052, atime:1098396622, mtime:1065382845, ctime:1098221020})
    (path('/usr/bin/mencoder'), stat{mode:33261, inode:345864, device:770, hardLinks:1, uid:0, gid:0, size:4331308, atime:1065382845, mtime:1065382845, ctime:1098221019})
    (path('/usr/bin/emacs'), stat{mode:33261, inode:345185, device:770, hardLinks:2, uid:0, gid:0, size:4093052, atime:1105552462, mtime:1045723299, ctime:1076337118})
    (path('/usr/bin/emacs-21.2'), stat{mode:33261, inode:345185, device:770, hardLinks:2, uid:0, gid:0, size:4093052, atime:1105552462, mtime:1045723299, ctime:1076337118})
    (path('/usr/bin/gs'), stat{mode:33261, inode:344661, device:770, hardLinks:1, uid:0, gid:0, size:3233020, atime:1105475462, mtime:1053254951, ctime:1076346525})
    (path('/usr/bin/ghostscript'), stat{mode:33261, inode:344661, device:770, hardLinks:1, uid:0, gid:0, size:3233020, atime:1105475462, mtime:1053254951, ctime:1076346525})
    (path('/usr/bin/doxygen'), stat{mode:33261, inode:345455, device:770, hardLinks:1, uid:0, gid:0, size:3120192, atime:1043451839, mtime:1043451839, ctime:1076337735})
    ...
(p.size is no longer needed because the same information can be obtained from the stat object.)

top

top is an object that represents output from the Unix top command, e.g.
    [jao@zack] top -n 1 b | grep httpd | osh ^ f 's: top(s.split())' $
    top{pid:4123, user:root, priority:15, nice:0, size:2048, rss:76, share:48, stat:S, cpu_pct:0.0, mem_pct:0.0, time:0:00, cpu:0, command:httpd}
    top{pid:4137, user:apache, priority:15, nice:0, size:2040, rss:0, share:0, stat:SW, cpu_pct:0.0, mem_pct:0.0, time:0:00, cpu:1, command:httpd}
    top{pid:4138, user:apache, priority:15, nice:0, size:2052, rss:0, share:0, stat:SW, cpu_pct:0.0, mem_pct:0.0, time:0:00, cpu:1, command:httpd}
    top{pid:4139, user:apache, priority:15, nice:0, size:2048, rss:0, share:0, stat:SW, cpu_pct:0.0, mem_pct:0.0, time:0:00, cpu:1, command:httpd}
    top{pid:4140, user:apache, priority:15, nice:0, size:2048, rss:0, share:0, stat:SW, cpu_pct:0.0, mem_pct:0.0, time:0:00, cpu:1, command:httpd}
    top{pid:4141, user:apache, priority:15, nice:0, size:2056, rss:0, share:0, stat:SW, cpu_pct:0.0, mem_pct:0.0, time:0:00, cpu:0, command:httpd}
    top{pid:4142, user:apache, priority:15, nice:0, size:2052, rss:0, share:0, stat:SW, cpu_pct:0.0, mem_pct:0.0, time:0:00, cpu:1, command:httpd}
    top{pid:4143, user:apache, priority:15, nice:0, size:2116, rss:0, share:0, stat:SW, cpu_pct:0.0, mem_pct:0.0, time:0:00, cpu:1, command:httpd}
    top{pid:4144, user:apache, priority:15, nice:0, size:2116, rss:0, share:0, stat:SW, cpu_pct:0.0, mem_pct:0.0, time:0:00, cpu:1, command:httpd}

vmstat

vmstat is an object that represents output from the Unix vmstat command. For example, suppose you want to examine vmstat output every second, attach the time of the measurement, and then print a line of output when CPU idle time is less than 20%:
    [jao@zack] vmstat -n 1 | osh ^ select 's: n() > 1' ^ f 's: (strftime("%H:%M:%S"), vmstat(tuple(s.split())))' ^ select 't, v: v.id < 20' $

Known Issues

  1. The lexer is not very good. Osh tokens must be separated by spaces. So this is OK:
        [jao@zack] osh sql 'select * from person' ^ out
    
    but this is not (no space after ^):
        [jao@zack] osh sql 'select * from person' ^out
    
    and neither is this (no space before or after ^):
        [jao@zack] osh sql 'select * from person'^out
    

  2. Remote execution requires ssh, configured to avoid prompts for password and pass-phrase. (You need to set up the .ssh/authorized_keys file on each node of a cluster.)

  3. The sort command does an in-memory sort, which limits the amount of data that can be practically sorted. A future release of osh will drop this limitation by using a disk-based sort for large inputs. The reverse command has a similar limitation.

  4. Nested pipelines [ ... ] are undocumented (except for the special case of remote execution).

  5. Writing an osh command is not documented.

  6. Replacement of default error handler.