- Add `.learn()` to generate a selector for a selected node
- Add `.listen()` for easily creating DOM event listeners
- Add `.trigger()` for easily triggering DOM events
- Add `.on()` for binding callback to a local-only event
- Add `.url()` to set the current URL
- Add `.params()` to set the current URL parameters
- Add `.save()` to save response data to a file
- Add `.add()`, `.remove()` for node creation/deletion?
- Add `.scroll()` to scrape infinite scroll pages
- Add warnings for parser errors?
- Switch to semantic versioning?
- Event/error handling
  - Error.code = 404, 'timeout', etc.
  - Error.module = 'http', 'dom', etc.
  - return true = retry, false = stop, anything else = continue
  - Event for discontinued context/data
- Module system using osmosis.require and modules prefixed with `osmosis-`
- Way to trigger DOM
- Throw unhandled errors?
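
The return-value convention proposed in the error-handling item above (true = retry, false = stop, anything else = continue) might look roughly like the sketch below. This is not Osmosis code: `runWithHandler`, `maxRetries`, and the result object are hypothetical names chosen for illustration, and treating an exhausted retry budget as "continue" is an assumption. Only `Error.code` and `Error.module` come from the roadmap.

```javascript
// Hypothetical sketch of the proposed convention; not part of Osmosis.
// The handler's return value decides what happens after a failed task:
//   true  -> retry the task (up to maxRetries, an assumed limit)
//   false -> stop processing
//   other -> skip the failed task and continue
function runWithHandler(task, onError, maxRetries) {
    var attempts = 0;

    while (true) {
        try {
            return { status: 'ok', value: task(), attempts: attempts + 1 };
        } catch (err) {
            attempts++;
            var decision = onError(err);

            if (decision === true && attempts <= maxRetries) {
                continue; // retry
            }
            if (decision === false) {
                return { status: 'stopped', error: err, attempts: attempts };
            }
            // Anything else (or no retries left) falls through to "continue".
            return { status: 'continued', error: err, attempts: attempts };
        }
    }
}

// Usage: a task that times out twice before succeeding.
var calls = 0;
function flaky() {
    calls++;
    if (calls < 3) {
        var err = new Error('request timed out');
        err.code = 'timeout';
        err.module = 'http';
        throw err;
    }
    return 'page body';
}

var result = runWithHandler(flaky, function (err) {
    return err.code === 'timeout'; // retry timeouts, stop on everything else
}, 5);
```

Here `result.status` is `'ok'` after the third attempt, because the handler returned `true` for both timeouts.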
- `.while()` to do things more than once as long as they call next()
- Fixed bug where .get() without `params` caused empty query string ('?')
- Preserve sort order for `.follow()` results within `.set()`
- Removed `opts` and `callback` arguments
- Supports an array as the root data object
- Fixed case where nested `.find` searches the entire document
- parseHtml uses `huge` option by default
- Fixed nested Osmosis instances inside `set`
- Update to `libxmljs-dom` v0.0.5
- Fixed nested Osmosis instances inside `set`
- Added tests for nested set data
- Proper `submit` button handling
- Accepts a `submit` button selector as the first argument
- Supports `submit` button attributes: "form", "formaction", "formenctype" and "formmethod"
- Added tests for `submit` button handling
- Update to `libxmljs-dom` v0.0.4
- `proxy` option can now be an array of multiple proxies
- Added `.proxy()` to easily set the `proxy` configuration option
- If the first argument's name is:
  - "document" - The callback is given the current document
  - "window" - The callback is given the Window object
  - "$" - The callback is given a jQuery object (if available)
- Uses 'use strict'
- Minimize use of array.forEach
- Added libxml specific memoryUsage monitoring
- Switched to static `libxmljs-dom` version
- Added `ignore_http_errors` option
- Added `:internal` for selecting internal links
- Added `:external` for selecting external links
- Added `:domain` for searching by domain name
- Added `:path` for searching by path
- Configuration options are inherited down the chain
- Added `.contains(string)` to discard nodes whose contents do not match `string`
- Added `.do()` to call one or more commands using the current context
- Added `.failure(selector)` to discard nodes that match the given selector
- Added `.filter(selector)` to discard nodes that do not match the given selector
- Accepts a tokenized URL string
  - @{...} - Request info (url, method, params, headers, etc.)
  - %{...} - `data` object
  - ${...} - `context` search
- Added `headers({ key: value })` and `header(key, value)` to set HTTP headers
- Added `.match([selector], RegExp)` to discard nodes whose contents do not match
- Added `.rewrite(callback)` to set a URL rewriting function for the preceding request
- `promise.args` is now an object (used to be an array)
- HTTP 400 errors are now logged and the requests are retried.
- DOM and css2xpath functionality have been moved to `libxmljs-dom`
- Added `keep_data` option to retain the original HTTP response
- Added `process_response` option for processing data before parsing
- Added test suite
- Added `.click()` for interacting with JS-only content
- Added `.delay(n)` for waiting n seconds before calling next. Accepts a decimal value.
- Accepts an array of selectors as the first argument
- Accepts second argument. Boolean (true = follow external links) or a URL rewriting function.
- Accepts `function(context, data)` as the first argument. The function must return a URL string.
- Added second argument to associate a base-url to the document
- Added optional `done` argument
- Added `.select` for finding elements within the current context
- Replaces previously set values
- Enhanced stack counting
- Added data object ref counting
- Added domain specific cookie handling
- Improved stability of deep instance nesting with `.set()`
- Osmosis instances operate more independently
- Request queues are now a single array for each instance
- Promises must accept and call `done` if they asynchronously send more than one output context per input context
- If `.then` sends more than one output context per input context, then it must accept `done()` as its last argument and call it after calling `next()` for the last time.
- Ensure non-default `needle` options propagate
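
The `done` contract in the two entries above can be illustrated with a small stand-alone sketch. It does not use Osmosis itself: `runStep` and `splitWords` are hypothetical helpers, and the `(context, data, next, done)` argument order is assumed from the entry's wording that `done` is the last argument.

```javascript
// Hypothetical stand-in for one step of a promise chain; not Osmosis internals.
// `step` receives (context, data, next, done). It may call next() several
// times per input and must call done() once, after the final next().
function runStep(step, context, data, onFinished) {
    var outputs = [];
    var finished = false;

    function next(outContext, outData) {
        if (finished) {
            throw new Error('next() called after done()');
        }
        outputs.push({ context: outContext, data: outData });
    }

    function done() {
        if (finished) return;
        finished = true;
        onFinished(outputs);
    }

    step(context, data, next, done);
}

// A step that fans one input context out into several outputs asynchronously.
function splitWords(context, data, next, done) {
    var words = context.split(' ');
    var pending = words.length;

    words.forEach(function (word) {
        setTimeout(function () {
            next(word, { word: word });
            if (--pending === 0) {
                done(); // called only after the last next()
            }
        }, 0);
    });
}

runStep(splitWords, 'one two three', {}, function (outputs) {
    console.log(outputs.length + ' output contexts for 1 input context');
});
```

Without the `done()` call, the runner in this sketch would have no way to know that the step had finished emitting output for its input context.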
- Added a more intuitive method for pagination
- Added easy form submission
- Added easy login support
- Added pause, resume, and stop functionality
- Searches the entire document by default
- Supports innerHTML using `:html` or `:source` in selectors
- Supports deep JSON structures and nested Osmosis instances
- `.data(null)` clears the data object
- `.data({})` appends keys to data object
- `.dom()` is continuing progress and can now run jQuery