Stew is a JavaScript library, implemented in CoffeeScript. It is primarily intended to be used in a Node.js environment.[1]
Both the (original) CoffeeScript and (generated) JavaScript files are included in the binary distribution, so clients can use whichever they prefer.
Stew's source code is hosted at github.com/rodw/stew. Any issues or pull-requests you'd like to submit are appreciated.
Stew is published under an MIT license.
This document provides information that is primarily of interest to those who want to make changes to Stew. Most clients (users) of the Stew library will be more interested in the README file.
Stew is partitioned into three classes: DOMUtil, PredicateFactory and Stew.
Stew is the real driver behind the API, parsing CSS selector expressions and collecting matching nodes from the DOM tree.
PredicateFactory defines methods that implement individual CSS selection rules.
DOMUtil provides fairly generic utility methods for working with DOM structures.
We'll cover those bottom-up, from the most generic to the most specific.
DOMUtil provides generic utilities for working with the DOM (Document Object Model) structure generated by node-htmlparser.
For our purposes, the most important of these utilities is the walk_dom
method, which implements a depth-first walk of a given DOM tree. walk_dom
will invoke the given visit
(callback) method for every node in the DOM.
For example, to convert a DOM structure into text, we might create a visit
method like this:
var buffer = "";
// Append the raw content of every text node to the buffer.
var visit = function(node, node_metadata, all_metadata) {
  if (node.type === 'text') {
    buffer = buffer + node.raw;
  }
  return true;
};
and invoke it like this:
domutil.walk_dom(dom,visit);
console.log("The text was");
console.log(buffer);
Stew uses DOMUtil.walk_dom
to traverse the DOM tree.
(node_metadata
and all_metadata
contain metadata about the current node, and all previously visited nodes, respectively. For example, node_metadata.parent
contains the parent of the current node and node_metadata.siblings
contains an array of all of node_metadata.parent
's children. See the comments with dom-util.coffee
for more detail.)
See the annotated source for more detail.
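To make the callback contract concrete, here is a minimal sketch of a depth-first walker in the spirit of walk_dom. It is not Stew's implementation: the name walkDom, the shape of the sample DOM, and the rule that a truthy return value from visit means "descend into this node's children" are all assumptions made for illustration.

```javascript
// Hypothetical stand-in for DOMUtil.walk_dom; not Stew's actual code.
function walkDom(dom, visit) {
  var roots = Array.isArray(dom) ? dom : [dom];
  var allMetadata = [];

  function walk(node, parent, siblings) {
    var nodeMetadata = { parent: parent, siblings: siblings };
    allMetadata.push(nodeMetadata);
    // Assumption: a truthy return value means "descend into the children".
    if (visit(node, nodeMetadata, allMetadata) && node.children) {
      node.children.forEach(function (child) {
        walk(child, node, node.children);
      });
    }
  }

  roots.forEach(function (root) {
    walk(root, null, roots);
  });
}

// A small hand-written DOM that mimics the node-htmlparser node shape.
var sampleDom = [
  { type: 'tag', name: 'p', children: [
    { type: 'text', raw: 'Hello, ' },
    { type: 'tag', name: 'b', children: [{ type: 'text', raw: 'world' }] }
  ] }
];

var buffer = '';
var trail = [];
walkDom(sampleDom, function (node, nodeMetadata) {
  if (node.type === 'text') {
    buffer += node.raw;
  } else if (node.type === 'tag') {
    var parentName = nodeMetadata.parent ? nodeMetadata.parent.name : '(root)';
    trail.push(parentName + ' > ' + node.name);
  }
  return true;
});

console.log(buffer); // Hello, world
console.log(trail);  // [ '(root) > p', 'p > b' ]
```

The walker builds the parent and siblings metadata as it descends, so the callback never needs to search the tree itself.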
PredicateFactory generates predicate functions that test whether a given node matches a specific CSS selector.
For example, the "universal selector" (*
) matches any and every "tag" node. Here's a predicate function that implements the *
selector:
function universal_selector_predicate(node) {
return node.type === 'tag';
}
Here's a predicate that implements a "tag" selector, selecting all tags with the type (name) foo
:
function foo_tag_predicate(node) {
return node.type === 'tag' && node.name === 'foo';
}
PredicateFactory methods generate functions like these (bound to particular input parameters such as tag or attribute names).
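As a rough illustration of what "bound to particular input parameters" means, the sketch below uses closures to generate predicates. The function names are hypothetical (PredicateFactory's real method names may differ), and it assumes that tag nodes keep their attributes in an `attribs` object, as in the DOM produced by node-htmlparser.

```javascript
// Hypothetical generator: returns a predicate bound to one tag name.
function byTagPredicate(name) {
  return function (node) {
    return node.type === 'tag' && node.name === name;
  };
}

// Hypothetical generator: returns a predicate bound to an attribute
// name/value pair. Assumes attributes live in `node.attribs`.
function byAttributeValuePredicate(attrName, attrValue) {
  return function (node) {
    return node.type === 'tag' &&
      node.attribs != null &&
      node.attribs[attrName] === attrValue;
  };
}

var isFoo = byTagPredicate('foo');
var isAuthorLink = byAttributeValuePredicate('rel', 'author');
var sampleNode = {
  type: 'tag',
  name: 'a',
  attribs: { rel: 'author', href: '/about' }
};

console.log(isFoo(sampleNode));        // false
console.log(isAuthorLink(sampleNode)); // true
```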
PredicateFactory includes generators for each of the core CSS selectors (tag, ID, class, attribute name and attribute value) as well as combinators such as "and" (no space), "or" (,
), "descendant" (space), "direct descendant" (>
) and "adjacent sibling" (+
).
Stew uses these predicates to implement the CSS selection logic.
See the annotated source for more detail.
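The "and" and "or" combinators can be pictured as functions that take predicates and return a new predicate. The following is only a sketch under assumed names (andPredicate, orPredicate, isTag, hasClass); it is not taken from Stew's source.

```javascript
// Hypothetical "and" combinator: every predicate must accept the node.
function andPredicate(predicates) {
  return function (node, nodeMetadata, allMetadata) {
    return predicates.every(function (predicate) {
      return predicate(node, nodeMetadata, allMetadata);
    });
  };
}

// Hypothetical "or" combinator: at least one predicate must accept the node.
function orPredicate(predicates) {
  return function (node, nodeMetadata, allMetadata) {
    return predicates.some(function (predicate) {
      return predicate(node, nodeMetadata, allMetadata);
    });
  };
}

// Simple leaf predicates used to exercise the combinators.
function isTag(name) {
  return function (node) {
    return node.type === 'tag' && node.name === name;
  };
}

function hasClass(className) {
  return function (node) {
    // Assumption: the class attribute is stored in node.attribs['class'].
    var value = (node.attribs && node.attribs['class']) || '';
    return node.type === 'tag' && value.split(/\s+/).indexOf(className) !== -1;
  };
}

var ulLinks = andPredicate([isTag('ul'), hasClass('links')]); // ul.links
var anyList = orPredicate([isTag('ul'), isTag('ol')]);        // ul, ol

var ul = { type: 'tag', name: 'ul', attribs: { 'class': 'nav links' } };
var ol = { type: 'tag', name: 'ol', attribs: {} };

console.log(ulLinks(ul)); // true
console.log(ulLinks(ol)); // false
console.log(anyList(ol)); // true
```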
Stew is the main entry point for the overall library. Stew parses a String
representation of a CSS Selector, generates the appropriate predicates (using PredicateFactory) and then processes the DOM tree (using DOMUtil) to select the matching nodes.
The CSS parsing is primarily accomplished via regular expressions. This is a multi-step process.
For example, let's assume a complicated CSS expression such as:
'div#main .sidebar ul.links li:first-child a[rel="author"][href]'
The expression is split into individual selectors by _parse_selectors
using _SPLIT_ON_WS_REGEXP
. Naively, this is the same as splitting the expression on white-space characters, but we also need to take into account the use of spaces within "quoted strings"
and /regular expressions/
and non-whitespace delimiters like ,
or +
. In our example, we obtain these five tokens:
[ 'div#main', '.sidebar', 'ul.links', 'li:first-child', 'a[rel="author"][href]' ]
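The quote-aware part of this splitting step can be sketched with a single regular expression. This is a deliberately simplified stand-in, not Stew's _SPLIT_ON_WS_REGEXP: the names TOKEN_REGEXP and splitSelectors are hypothetical, and the sketch ignores /regular expressions/ and non-whitespace delimiters such as , or +.

```javascript
// A token is a run of double-quoted strings and/or non-whitespace,
// non-quote characters. Whitespace inside quotes stays in the token.
var TOKEN_REGEXP = /(?:"(?:[^"\\]|\\.)*"|[^\s"])+/g;

// Hypothetical helper: split a selector expression into tokens.
function splitSelectors(expression) {
  return expression.match(TOKEN_REGEXP) || [];
}

var tokens = splitSelectors(
  'div#main .sidebar ul.links li:first-child a[rel="author"][href]'
);
console.log(tokens.length); // 5

// A space inside a quoted attribute value does not split the token.
var quoted = splitSelectors('a[title="hello world"] b');
console.log(quoted); // [ 'a[title="hello world"]', 'b' ]
```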
Each of these tokens is then parsed into one or more CSS specific selectors by _parse_selector
using _CSS_SELECTOR_REGEXP
(and where needed, _ATTRIBUTE_CLAUSE_REGEXP
). For example, from the first token (div#main
) we identify two individual predicates, one that implements "tag name is div
" and another that implements "node id is main
". These two predicates are then joined by an "and" predicate. All together, these five tokens are converted into predicates (something) like these:
div#main
becomes and( tag_name_is_div(), node_id_is_main() )
.sidebar
becomes class_name_is_sidebar()
ul.links
becomes and( tag_name_is_ul(), class_name_is_links() )
li:first-child
becomes and( tag_name_is_li(), tag_is_parents_first_child() )
a[rel="author"][href]
becomes and( tag_name_is_a(), rel_attr_is_author(), has_href_attr() )
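The token-to-predicate step above can be approximated as follows. This sketch only describes the predicates as strings rather than building them, and its regular expression is a hypothetical simplification, not Stew's _CSS_SELECTOR_REGEXP or _ATTRIBUTE_CLAUSE_REGEXP. It covers only tag names, #id, .class, :pseudo-class and [attr] or [attr="value"] clauses.

```javascript
// Hypothetical helper: describe the predicates implied by one token.
function describeToken(token) {
  // Either an optionally prefixed name (#id, .class, :pseudo, tag)
  // or an attribute clause with an optional double-quoted value.
  var partRegexp = /([#.:]?)([\w-]+)|\[([\w-]+)(?:="([^"]*)")?\]/g;
  var parts = [];
  var match;
  while ((match = partRegexp.exec(token)) !== null) {
    if (match[3] !== undefined) {
      parts.push(match[4] === undefined
        ? 'has_attr(' + match[3] + ')'
        : 'attr_is(' + match[3] + ', ' + match[4] + ')');
    } else if (match[1] === '#') {
      parts.push('node_id_is(' + match[2] + ')');
    } else if (match[1] === '.') {
      parts.push('class_name_is(' + match[2] + ')');
    } else if (match[1] === ':') {
      parts.push('pseudo_class(' + match[2] + ')');
    } else {
      parts.push('tag_name_is(' + match[2] + ')');
    }
  }
  // A single predicate stands alone; several are joined by "and".
  return parts.length === 1 ? parts[0] : 'and( ' + parts.join(', ') + ' )';
}

console.log(describeToken('div#main'));
// and( tag_name_is(div), node_id_is(main) )
console.log(describeToken('.sidebar'));
// class_name_is(sidebar)
console.log(describeToken('a[rel="author"][href]'));
// and( tag_name_is(a), attr_is(rel, author), has_attr(href) )
```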
Back in _parse_selectors
these five predicates are joined into a "descendant selector" predicate, yielding a single predicate that returns true
if and only if the current node matches the complete CSS expression.
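One way to picture a "descendant selector" predicate is shown below. It is a sketch, not Stew's implementation: it assumes the caller supplies the node's ancestors as an array ordered from the root down to the parent, and the names descendantPredicate and isTag are hypothetical. The last predicate must match the node itself; the remaining predicates are then matched against ancestors, innermost first.

```javascript
// Hypothetical helper that builds a simple tag-name predicate.
function isTag(name) {
  return function (node) {
    return node.type === 'tag' && node.name === name;
  };
}

// Assumption: `ancestors` lists the node's ancestors from root to parent.
function descendantPredicate(predicates) {
  return function (node, ancestors) {
    var index = predicates.length - 1;
    if (index < 0 || !predicates[index](node)) {
      return false;
    }
    index--;
    // Walk up the ancestor chain, consuming one predicate per match.
    for (var i = ancestors.length - 1; i >= 0 && index >= 0; i--) {
      if (predicates[index](ancestors[i])) {
        index--;
      }
    }
    return index < 0; // true only if every predicate found a match
  };
}

var div = { type: 'tag', name: 'div' };
var ul = { type: 'tag', name: 'ul' };
var li = { type: 'tag', name: 'li' };

var divLi = descendantPredicate([isTag('div'), isTag('li')]); // "div li"
console.log(divLi(li, [div, ul])); // true
console.log(divLi(li, [ul]));      // false
```

Matching the nearest qualifying ancestor first is sufficient for chains that contain only descendant combinators, because it leaves the largest possible set of outer ancestors for the remaining predicates.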
With the CSS-selector-implementing predicate in hand, Stew's select
method then visits every node in the DOM tree, collecting each node that matches the predicate.
See the annotated source for more detail.
The ./test
directory contains unit tests for each of these types. These tests can be executed by running
make test
or
npm test
The test-coverage report identifies the lines of code[2] that are exercised by the test suite. This report can be generated by running:
make coverage
Your contributions, bug reports and pull-requests are greatly appreciated.
If you're looking for areas in which to contribute, here are a few ideas:
Documentation and examples are always welcome. There are several Markdown-format files within ./docs/ that are always in need of editing and improvement, and please feel free to plug any documentation gaps that you see.
New and improved unit-tests are also always welcome. You could help us ensure we've tested all the relevant parts of the CSS selector specification, or review the test coverage report to identify areas that aren't currently exercised by our unit test suite.
Stew has a few known limitations we'd like to eliminate. See the "Limitations" section of the README file for details.
Browser-side Stew isn't yet supported, or at least not fully tested. This probably doesn't require substantial changes, but no one has gotten around to it just yet.
Run the target make todo
to see a list of TODO
, FIXME
and similar comments within the code and documentation.
We're happy to accept any help you can offer, but the following guidelines can help streamline the process for everyone.
You can report any bugs at github.com/rodw/stew/issues.
Our preferred channel for contributions or changes to Stew's source code and documentation is a Git "patch" or "pull-request".
If you've never submitted a pull-request, here's one way to go about it:
git checkout -b my-new-branch
If you'd rather use a private (or just non-GitHub) repository, you might find these generic instructions on creating a "patch" with Git helpful.
If you are making changes to the code please ensure that the unit test suite still passes.
If you are making changes to the code to address a bug or introduce new features, we'd greatly appreciate it if you can provide one or more unit tests that demonstrate the bug or exercise the new feature.
Please Note: We'd rather have a contribution that doesn't follow these guidelines than no contribution at all. If you are confused or put-off by any of the above, your contribution is still welcome. Feel free to contribute or comment in whatever channel works for you.
Technically Stew doesn't have any run-time dependencies. No external libraries are required.
Practically speaking, Stew depends upon Chris Winberry's node-htmlparser. Stew assumes the structure of the DOM object passed to select
and similar methods is compatible with that generated by node-htmlparser.
If node-htmlparser
is available (via a require
call) then some (optional) DOMUtil
methods will make use of it.
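An optional dependency of this kind is commonly handled by wrapping the require call in a try/catch. The sketch below shows that general pattern only; the helper name optionalRequire is hypothetical, the module name 'htmlparser' is assumed to be the npm package name for node-htmlparser, and Stew's own code may handle this differently.

```javascript
// Hypothetical helper: return the module if it can be loaded, else null.
function optionalRequire(moduleName) {
  try {
    return require(moduleName);
  } catch (err) {
    // Only swallow "module not found"; rethrow anything unexpected.
    if (err && err.code === 'MODULE_NOT_FOUND') {
      return null;
    }
    throw err;
  }
}

// Assumption: node-htmlparser is installed under the name 'htmlparser'.
var htmlparser = optionalRequire('htmlparser');
if (htmlparser === null) {
  console.log('htmlparser not available; parser-backed helpers are disabled.');
} else {
  console.log('htmlparser available; parser-backed helpers are enabled.');
}
```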
Stew makes use of several libraries to support development, documentation and testing. These are enumerated in the package.json
file.
Downloading
You can clone Stew's Git repository via:
git clone git@github.com:rodw/stew.git
You can also download a ZIP archive of the latest source.
Installing
Once you have Stew cloned into a local working directory, you can use npm to install any build-time dependencies, as follows:
npm install
(This may take a few minutes, as some external libraries may need to be downloaded and natively compiled.)
Testing
Once installed, you can also run Stew's unit test suite using npm:
npm test
If everything is working properly, you should expect to see a message like 68 tests complete (633 ms)
(although the specific numbers might be different, of course).
Compiling the CoffeeScript files into JavaScript
You can run
npm run-script compile
to generate JavaScript files from the CoffeeScript files in ./lib
.
If you have GNU Make installed, the best and easiest way to work with Stew's source code is using the provided makefile.
You can use:
make install
and:
make test
and:
make js
in place of the npm equivalents above, but the makefile can help you to do much more than that.
make markdown
will generate Stew's HTML documentation from various Markdown files in the repository. Most of these files will be written to the ./docs
directory. Note that the Makefile uses Pandoc to generate HTML from the Markdown sources, but in theory other Markdown processors could be used.
make docco
will generate an annotated version of Stew's source code using the nifty Docco documentation generator. These files will be written to ./docs/docco/
.
make docs
will do both of these at once.
make coverage
will generate a report that shows which source code lines are touched (and not touched) by the test suite. This runs the same unit tests as make test
, but uses JSCoverage to evaluate the test coverage. The coverage report is written to ./docs/coverage.html
.
make module
will generate a package suitable for distribution via npm (into a directory called ./module
).
make test-module-install
will generate the ./module
directory and then validate it by trying to install it into a temporary directory. You should expect to see It worked!
as the last line of output.
make clean
will remove various generated files.
make todo
will display a list of "TODO" and related comments found in the source code.
make targets
will list all available targets.
[1] Although it probably wouldn't be difficult to make Stew work in a browser context, we haven't had any need for that, and so we haven't (yet) attempted to do it. Drop us a note if this is something you'd like to see Stew support.
[2] The generated JavaScript code, not source CoffeeScript, for better or worse.