The dsm package expects data to be in a specific format. In this example I'll show how to get data into that format and give some explanation as to why each of these requirements exists.
I recommend getting to grips with fitting models in dsm before reading this tutorial, so you know what the end point will look like before grappling with the data. An example analysis for Gulf of Mexico pantropical spotted dolphins is available here.
To run the code below, you will need to have the ggplot2 and sf libraries installed. You can do this with the following code:
install.packages(c("ggplot2", "sf"))

The Rhode Island data required for this example can be downloaded below:

These files should be placed in a folder called data for the code below to work.
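If you want to check that everything is in the right place before running the rest of the code, something like the following should return TRUE for each file (these are the file names used in the chunks below):

# check the downloaded shapefiles are where the later code expects them
file.exists(c("data/ri_coast.shp", "data/ri_transects.shp"))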
When we collect distance sampling data, we need to record (1) the distance to an observation (along with other detectability covariates like weather), (2) the transect we are currently on, and (3) how much effort we expended on that transect (its length or number of visits).
If we want to fit a DSM, that information needs to be expanded to include where on the transect we are, so we can build the spatial model. For point transects this is simple, as we are always at the point, but for line transects we need to know the position of the detection too (or equivalently, how far along the transect we are).

More abstractly, we need information about both detections and effort, and we need to link the two. In dsm we use three objects to hold this information:

- the detection function object (ddf.obj) holds all the detections but ignores spatial information
- the segment data (segment.data) holds the chunks of effort ("segments": parts of the transect or point visits) and their locations. It might also include environmental covariate information.
- the observation data (observation.data) links the other two, making sure each detection has a corresponding segment.
[Figure: the dsm tables]

The above image gives an overview of how the different data.frames interact, along with how that corresponds to the real-life situation of collecting data on a transect.
The rest of this example goes through each table in turn, explaining their construction.
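To see where we are headed, here is a minimal sketch of how the three tables are eventually used in a dsm() call. The names segs_df and obs are the segment and observation tables built later in this tutorial; df_ht is a hypothetical detection function fitted with Distance or mrds, and the formula and family are purely illustrative:

library(dsm)
# segment.data: one row per segment; observation.data: one row per detection
mod <- dsm(count ~ s(X, Y),
           ddf.obj = df_ht,
           segment.data = segs_df,
           observation.data = obs,
           family = tw())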
The data requirements are very similar to those for the packages mrds and Distance: each detection has a corresponding row. For an analysis using dsm we have additional requirements in the data (these can be auto-generated by Distance, but we recommend against this, to ensure that you know the records link up correctly).

For Distance we need:

- distance : the distance to the detection
- object : a unique detection identifier (which will link to the observation table)
- size : the number of individuals in the detection (1 if objects occur alone)

mrds requires the following extra columns (to allow for double-observer surveys):

- detected : 1 if detected by the observer and 0 if missed (always 1 for single observer)
- observer : observer number (1 or 2) (always 1 for single observer)

If other covariates that affect detectability are recorded (e.g., sex or weather), these can be included in this data for analysis in mrds or Distance.
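For example, fitting a simple half-normal detection function with Distance might look something like the sketch below; dist_data is a placeholder for a data.frame with the columns described above, and the 1000m truncation matches the survey used later in this tutorial:

library(Distance)
# dist_data: columns distance, object, size (plus any detectability covariates)
df_ht <- ds(dist_data, truncation = 1000, key = "hn")
summary(df_ht)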
Once the model is fitted, we deal only with that fitted model object in the dsm analysis.
I will use the term "segment" here to refer both to points for point transects and to small chunks of transect in the line transect case. While describing the data format I'll assume that you've already segmented your lines; an example of how to segment line transect data follows.

The data.frame required for the segments needs the following columns:

- Effort : the effort expended on this segment (either the length of the segment for lines, or the number of visits for points)
- Sample.Label : a unique identifier for the segment (if you are engaged in a complicated survey, this might take a format like "YEAR-AREA-LINE-SEGMENT", giving labels like "2020-North-A-15", which can be useful for keeping track of where data came from later)

In addition, the segment locations and any relevant environmental covariates (sea surface temperature, foliage type, altitude, bathymetry, etc.) can be included in this data.frame if they are to be used in the spatial model.
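As a concrete (entirely made-up) illustration, a minimal segment table for a line transect survey might look like this; the values and the name seg_table_example are invented:

# a tiny, made-up segment table: two 2km segments on one line
seg_table_example <- data.frame(
  Sample.Label = c("2020-North-A-15", "2020-North-A-16"),
  Effort       = c(2000, 2000),
  X            = c(-71.50, -71.50),
  Y            = c(41.10, 41.12)
)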
If you have data as lines and you want to chop them up into segments, this section will give some sample R code to do this. How things work is extremely dependent on the input data, but hopefully this gives a template for what you want to do at least.
Before you jump in: if you are fluent in ArcGIS or QGIS, I recommend doing this task in the software you are familiar with at first. A simple guide to doing this task in QGIS is here, and for ArcGIS I recommend the MGET toolbox.
As an example, we will look at an aerial survey of seabirds off the coast of Rhode Island, USA. Data from these surveys were collected by the University of Rhode Island and analysed in K. Winiarski, Miller, Paton, & McWilliams (2013), K. J. Winiarski, Burt, et al. (2014) and K. J. Winiarski, Miller, Paton, & McWilliams (2014). I have the transects saved as a shapefile that I will then "chop up" into segments. To start with, let's plot the data with the coastline:
library(ggplot2)
library(sf)

# coast data, just for reference
coastline <- read_sf("data/ri_coast.shp")

# our transect lines
transects <- read_sf("data/ri_transects.shp")

# now make the plot
p <- ggplot() +
  # geom_sf knows what to do with spatial data
  geom_sf(data=coastline) +
  geom_sf(data=transects, aes(colour=Transect)) +
  # chop to be on a better scale
  coord_sf(xlim=c(-72, -70.8), ylim=c(40.8, 41.5), expand=FALSE) +
  # make the plot look simpler
  theme_minimal()

print(p)
In the above code we have loaded the sf package. This is the package we will use to do most of our spatial work in segmenting the data. It has most of the tools expected from a GIS, though the package is still being developed so its functionality is always increasing.
Investigating the transects data we loaded, we can see it consists of a data.frame with some extra bits (the spatial information, including locations (geometry) and projection (CRS)):
transects

## Simple feature collection with 26 features and 1 field
## Geometry type: LINESTRING
## Dimension: XY
## Bounding box: xmin: -71.86948 ymin: 40.82944 xmax: -70.8944 ymax: 41.46521
## Geodetic CRS: WGS 84
## # A tibble: 26 × 2
## Transect geometry
## <dbl> <LINESTRING [°]>
## 1 1 (-71.86948 41.29969, -71.86581 41.25891)
## 2 2 (-71.82872 41.30738, -71.82145 41.2258)
## 3 3 (-71.78805 41.31562, -71.75676 40.94846)
## 4 4 (-71.74674 41.31728, -71.71394 40.92971)
## 5 5 (-71.70589 41.3241, -71.67156 40.91612)
## 6 6 (-71.66578 41.33943, -71.62329 40.82944)
## 7 7 (-71.5435 41.36397, -71.5064 40.91516)
## 8 8 (-71.5015 41.35994, -71.45266 40.82978)
## 9 9 (-71.46131 41.37378, -71.42458 40.92496)
## 10 10 (-71.421 41.38723, -71.37562 40.83641)
## # ℹ 16 more rows
We have one row per transect (for a total of 26 transects), each of which consists of a line joining the start and end points that make up the transect. If we want, we can plot individual rows via plot(transects[1,]); this can be useful to check that each row is a single line, or whether further processing is needed (this can be tricky and is not covered in this tutorial, but see "More information" below).
This data looks in good shape, so we can use the st_segmentize function to take each transect and make segments from it. To make sure everything works correctly, we need to project the data first. Here I'm using an appropriate projected coordinate system (EPSG code 6348, a UTM zone covering Rhode Island).
# project transects
transects <- st_transform(transects, 6348)

# do the segmenting
segs <- st_segmentize(transects, dfMaxLength=units::set_units(2000, "metres"))

# transform back to lat/long
segs <- st_transform(segs, 4326)
transects <- st_transform(transects, 4326)

Here we know the truncation used for the detection function was 1000m (the distances were collected in bins and this was the furthest bin), and since we're trying to make our segments approximately square, we set the segment length to be twice that (since the truncation applies to either side of the transect line), so 2000m.
Looking at what that did:

segs

## Simple feature collection with 26 features and 1 field
## Geometry type: LINESTRING
## Dimension: XY
## Bounding box: xmin: -71.86948 ymin: 40.82944 xmax: -70.8944 ymax: 41.46521
## Geodetic CRS: WGS 84
## # A tibble: 26 × 2
## Transect geometry
## * <dbl> <LINESTRING [°]>
## 1 1 (-71.86948 41.29969, -71.86826 41.2861, -71.86703 41.2725, -71.8658…
## 2 2 (-71.82872 41.30738, -71.82727 41.29106, -71.82581 41.27475, -71.82…
## 3 3 (-71.78805 41.31562, -71.78654 41.29813, -71.78505 41.28065, -71.78…
## 4 4 (-71.74674 41.31728, -71.74524 41.29967, -71.74373 41.28205, -71.74…
## 5 5 (-71.70589 41.3241, -71.70438 41.30636, -71.70288 41.28863, -71.701…
## 6 6 (-71.66578 41.33943, -71.6643 41.32185, -71.66283 41.30426, -71.661…
## 7 7 (-71.5435 41.36397, -71.542 41.34602, -71.54051 41.32807, -71.53901…
## 8 8 (-71.5015 41.35994, -71.49986 41.34227, -71.49821 41.3246, -71.4965…
## 9 9 (-71.46131 41.37378, -71.45983 41.35583, -71.45835 41.33788, -71.45…
## 10 10 (-71.421 41.38723, -71.41952 41.36946, -71.41804 41.3517, -71.41656…
## # ℹ 16 more rows
Each row now has many coordinates attached to it. Looking at just the first row (transect 1) and comparing the coordinates between transects and segs:
st_coordinates(transects[1,])

## X Y L1
## [1,] -71.86948 41.29969 1
## [2,] -71.86581 41.25891 1
transects has just two coordinates (the start and end points of the line). Whereas:
st_coordinates(segs[1,])

## X Y L1
## [1,] -71.86948 41.29969 1
## [2,] -71.86826 41.28610 1
## [3,] -71.86703 41.27250 1
## [4,] -71.86581 41.25891 1
segs has four, corresponding to our segment cut points (giving three segments for this transect).
We now need to break up the rows of segs into multiple rows, one per segment. This is a bit fiddly. We use a function provided by dieghernan on their site to do this. We first load the function (don't worry about understanding the code!):
library(dplyr)  # the function below uses the %>% pipe

stdh_cast_substring <- function(x, to = "MULTILINESTRING") {
  ggg <- st_geometry(x)

  if (!unique(st_geometry_type(ggg)) %in% c("POLYGON", "LINESTRING")) {
    stop("Input should be LINESTRING or POLYGON")
  }
  for (k in 1:length(st_geometry(ggg))) {
    sub <- ggg[k]
    geom <- lapply(
      1:(length(st_coordinates(sub)[, 1]) - 1),
      function(i)
        rbind(
          as.numeric(st_coordinates(sub)[i, 1:2]),
          as.numeric(st_coordinates(sub)[i + 1, 1:2])
        )
    ) %>%
      st_multilinestring() %>%
      st_sfc()

    if (k == 1) {
      endgeom <- geom
    } else {
      endgeom <- rbind(endgeom, geom)
    }
  }
  endgeom <- endgeom %>% st_sfc(crs = st_crs(x))
  if (class(x)[1] == "sf") {
    endgeom <- st_set_geometry(x, endgeom)
  }
  if (to == "LINESTRING") {
    endgeom <- endgeom %>% st_cast("LINESTRING")
  }
  return(endgeom)
}

We can then use it to separate the segments and look at the results:
segs <- stdh_cast_substring(segs, to="LINESTRING")

## Warning in st_cast.sf(., "LINESTRING"): repeating attributes for all
## sub-geometries for which they may not be constant
segs

## Simple feature collection with 590 features and 1 field
## Geometry type: LINESTRING
## Dimension: XY
## Bounding box: xmin: -71.86948 ymin: 40.82944 xmax: -70.8944 ymax: 41.46521
## Geodetic CRS: WGS 84
## # A tibble: 590 × 2
## Transect geometry
## <dbl> <LINESTRING [°]>
## 1 1 (-71.86948 41.29969, -71.86826 41.2861)
## 2 1 (-71.86826 41.2861, -71.86703 41.2725)
## 3 1 (-71.86703 41.2725, -71.86581 41.25891)
## 4 2 (-71.82872 41.30738, -71.82727 41.29106)
## 5 2 (-71.82727 41.29106, -71.82581 41.27475)
## 6 2 (-71.82581 41.27475, -71.82436 41.25843)
## 7 2 (-71.82436 41.25843, -71.8229 41.24212)
## 8 2 (-71.8229 41.24212, -71.82145 41.2258)
## 9 3 (-71.78805 41.31562, -71.78654 41.29813)
## 10 3 (-71.78654 41.29813, -71.78505 41.28065)
## # ℹ 580 more rows
We now have 590 segments between 1km and 2km long. We can check the lengths using the st_length function and plot a histogram of the segment lengths:
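The histogram below was produced with code along these lines (a sketch; the exact plotting options used originally may have differed):

# check the segment lengths and plot their distribution
seg_lengths <- st_length(segs)
summary(seg_lengths)
hist(as.numeric(seg_lengths), xlab = "Segment length (m)", main = "")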

Note that in setting the dfMaxLength argument to st_segmentize we are giving a rough guide to the segment length; the algorithm inside st_segmentize tries to make segments as equal in length as possible (note the x-axis in the histogram).
Note also that the Transect column in segs now has duplicate entries as each transect has multiple segments in it. We can now create our required columns for dsm.
First, the Effort column can be generated using st_length:
segs$Effort <- st_length(segs)

Then we can generate the Sample.Labels. Here I'll use a more complicated naming scheme to show how that's done, but one could simply write segs$Sample.Label <- 1:nrow(segs) and that would be sufficient. Instead, I would like my Sample.Label to follow the "YEAR-AREA-LINE-SEGMENT" scheme I suggested above.
There are fancier ways to do this, but for clarity, let’s use a for loop:
# create a dummy column that we can fill in as we go
segs$Sample.Label <- NA

# set the year and area once for this data
year <- 2010
area <- "RI"

# loop over the transect IDs
for(this_transect in unique(segs$Transect)){
  # how many segments in this transect?
  n_segs <- nrow(subset(segs, Transect==this_transect))
  # generate the n_segs labels that we need
  segs$Sample.Label[segs$Transect==this_transect] <- paste(year, area, this_transect,
                                                           1:n_segs, sep="-")
}
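As one of those "fancier ways", the same labels can be built without a loop using dplyr (a sketch, not run in the output below; it assumes the dplyr package is installed):

# equivalent to the loop above: label segments within each transect
segs <- segs %>%
  group_by(Transect) %>%
  mutate(Sample.Label = paste(year, area, Transect, row_number(), sep = "-")) %>%
  ungroup()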
Now we can check that looks right:

segs

## Simple feature collection with 590 features and 3 fields
## Geometry type: LINESTRING
## Dimension: XY
## Bounding box: xmin: -71.86948 ymin: 40.82944 xmax: -70.8944 ymax: 41.46521
## Geodetic CRS: WGS 84
## # A tibble: 590 × 4
## Transect geometry Effort Sample.Label
## * <dbl> <LINESTRING [°]> [m] <chr>
## 1 1 (-71.86948 41.29969, -71.86826 41.2861) 1515. 2010-RI-1-1
## 2 1 (-71.86826 41.2861, -71.86703 41.2725) 1515. 2010-RI-1-2
## 3 1 (-71.86703 41.2725, -71.86581 41.25891) 1515. 2010-RI-1-3
## 4 2 (-71.82872 41.30738, -71.82727 41.29106) 1818. 2010-RI-2-1
## 5 2 (-71.82727 41.29106, -71.82581 41.27475) 1818. 2010-RI-2-2
## 6 2 (-71.82581 41.27475, -71.82436 41.25843) 1818. 2010-RI-2-3
## 7 2 (-71.82436 41.25843, -71.8229 41.24212) 1818. 2010-RI-2-4
## 8 2 (-71.8229 41.24212, -71.82145 41.2258) 1818. 2010-RI-2-5
## 9 3 (-71.78805 41.31562, -71.78654 41.29813) 1948. 2010-RI-3-1
## 10 3 (-71.78654 41.29813, -71.78505 41.28065) 1948. 2010-RI-3-2
## # ℹ 580 more rows
With the required columns taken care of, we can also calculate the centroids of our segments to get the location of each segment to use in the spatial model. Handily, we can use st_centroid to do this in one step and keep all our data at the same time.

st_centroid warns that centroids may not be calculated correctly when latitude/longitude coordinates are used, so again we need to project the data first (to the same UTM system) for this step using st_transform.
# save the line version of segs for plotting later
segs_lines <- segs

# project segs
segs <- st_transform(segs, 6348)

# find centroids
segs <- st_centroid(segs)

## Warning: st_centroid assumes attributes are constant over geometries

# project back to lat/long
segs <- st_transform(segs, 4326)

# how does that look?
segs

## Simple feature collection with 590 features and 3 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -71.86887 ymin: 40.83823 xmax: -70.89506 ymax: 41.45646
## Geodetic CRS: WGS 84
## # A tibble: 590 × 4
## Transect geometry Effort Sample.Label
## * <dbl> <POINT [°]> [m] <chr>
## 1 1 (-71.86887 41.29289) 1515. 2010-RI-1-1
## 2 1 (-71.86764 41.2793) 1515. 2010-RI-1-2
## 3 1 (-71.86642 41.2657) 1515. 2010-RI-1-3
## 4 2 (-71.82799 41.29922) 1818. 2010-RI-2-1
## 5 2 (-71.82654 41.2829) 1818. 2010-RI-2-2
## 6 2 (-71.82508 41.26659) 1818. 2010-RI-2-3
## 7 2 (-71.82363 41.25028) 1818. 2010-RI-2-4
## 8 2 (-71.82218 41.23396) 1818. 2010-RI-2-5
## 9 3 (-71.78729 41.30687) 1948. 2010-RI-3-1
## 10 3 (-71.7858 41.28939) 1948. 2010-RI-3-2
## # ℹ 580 more rows
Notice that our geometry type is now POINT. We can plot these centroids on our previous map:
p <- ggplot() +
  # geom_sf knows what to do with spatial data
  geom_sf(data=coastline) +
  geom_sf(data=transects) +
  geom_sf(data=segs) +
  # chop to be on a better scale
  coord_sf(xlim=c(-72, -70.8), ylim=c(40.8, 41.5), expand=FALSE) +
  # make the plot look simpler
  theme_minimal()

print(p)
At present, the dsm package doesn’t know how to talk to sf objects, so as a final step we need to simplify segs to be a regular data.frame, which dsm can interpret. We can extract the coordinates of the centroids using st_coordinates and then append them onto a data.frame version of the object with the geometry dropped, like so:
segs_df <- cbind(as.data.frame(st_drop_geometry(segs)),
                 st_coordinates(segs))

We can double-check that looks right:
head(segs_df)

## Transect Effort Sample.Label X Y
## 1 1 1515.147 [m] 2010-RI-1-1 -71.86887 41.29289
## 2 1 1515.151 [m] 2010-RI-1-2 -71.86764 41.27930
## 3 1 1515.155 [m] 2010-RI-1-3 -71.86642 41.26570
## 4 2 1818.179 [m] 2010-RI-2-1 -71.82799 41.29922
## 5 2 1818.184 [m] 2010-RI-2-2 -71.82654 41.28290
## 6 2 1818.190 [m] 2010-RI-2-3 -71.82508 41.26659
And we are done. See the next section for how to link the segments and the detections.
If we want to include spatial covariate information in the segment table, we could then use (for example) rerddap to obtain remotely-sensed data. For in situ data (i.e., weather conditions recorded while on effort), this can be more complicated and will require the use of summarize; see this thread for more information on how to deal with this situation.
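As a rough sketch of what pulling a remotely-sensed covariate might look like (the ERDDAP dataset id "jplMURSST41", the variable name "analysed_sst" and the date are assumptions; check the ERDDAP catalogue for what suits your study):

library(rerddap)
# grab sea surface temperature for the study area on one (arbitrary) day
sst_info <- info("jplMURSST41")
sst <- griddap(sst_info,
               latitude  = c(40.8, 41.5),
               longitude = c(-72, -70.8),
               time      = c("2010-06-01", "2010-06-01"),
               fields    = "analysed_sst")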
To link together the data in the detection function and the spatial data giving the effort, we use the observation data.frame. This is really just a cross-reference between those two tables, so each detection lives in one segment.
The observation data.frame must have (at least) the following columns:
- object : unique object identifier (corresponding to those identifiers in the detection function data)
- Sample.Label : the identifier for the segment in which the observation occurred
- size : the size of each observed group (e.g., 1 if all animals occurred individually)
- distance : distance to the observation

The observation data should have as many rows as the data used to fit the detection function.
There are a few methods for building the observation table. Here I'll illustrate one that assigns detections to segments based on their distance (i.e., detections are associated with their closest segment centroid). This is appropriate when observers cannot see very far ahead, so it's unlikely that detections will be misassigned. This is true, for instance, in an aerial survey where observers look out windows on the sides of the plane, but it might be inappropriate for a shipboard survey where observers on the flying bridge can see far ahead of the ship.
First, loading the detection data for this survey (this would be what we use to fit the detection function, post-processing):
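The chunk that reads in the detections isn't shown in the rendered output below; a minimal sketch, assuming the detections were exported as a CSV file into the data folder (the file name is an assumption):

# read the raw detections (file name is an assumption)
obs <- read.csv("data/ri_obs.csv")
head(obs)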
## Date Time Bin Lat Long Observer object size
## 1 2011-04-19 18991230 B 40.85296 -71.45511 JV 46 1
## 2 2011-02-17 18991230 A 40.87022 -71.54842 JV 106 1
## 3 2011-05-02 18991230 A 40.87595 -71.30304 KW 121 1
## 4 2011-12-04 18991230 C 40.91753 -71.38273 KW 286 1
## 5 2011-02-23 18991230 A 40.94153 -71.38517 KW 446 1
## 6 2011-02-07 18991230 A 40.94225 -71.46438 KW 457 1
+We can see there are columns for latitude and longitude (Lat and Long). We can make this data.frame be an sf object by telling sf that these columns contain spatial coordinates (in sf lingo, “geometry”):
obs <- st_as_sf(obs, coords=c("Long", "Lat"))
# set the coordinate system
st_crs(obs) <- 4326
obs

## Simple feature collection with 941 features and 6 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -71.86962 ymin: 40.85296 xmax: -70.89175 ymax: 41.46472
## Geodetic CRS: WGS 84
## First 10 features:
## Date Time Bin Observer object size geometry
## 1 2011-04-19 18991230 B JV 46 1 POINT (-71.45511 40.85296)
## 2 2011-02-17 18991230 A JV 106 1 POINT (-71.54842 40.87022)
## 3 2011-05-02 18991230 A KW 121 1 POINT (-71.30304 40.87595)
## 4 2011-12-04 18991230 C KW 286 1 POINT (-71.38273 40.91753)
## 5 2011-02-23 18991230 A KW 446 1 POINT (-71.38517 40.94153)
## 6 2011-02-07 18991230 A KW 457 1 POINT (-71.46438 40.94225)
## 7 2011-04-19 18991230 A JV 500 2 POINT (-71.59495 40.94761)
## 8 2011-05-02 18991230 A KW 506 1 POINT (-71.552 40.94795)
## 9 2011-02-23 18991230 B JV 814 1 POINT (-71.51172 40.97912)
## 10 2011-04-19 18991230 A KW 911 1 POINT (-71.59762 40.98571)
We can overlay this on our previous map of transects:

p <- ggplot() +
  # geom_sf knows what to do with spatial data
  geom_sf(data=coastline) +
  geom_sf(data=transects) +
  geom_sf(data=obs, size=0.5) +
  # chop to be on a better scale
  coord_sf(xlim=c(-72, -70.8), ylim=c(40.8, 41.5), expand=FALSE) +
  # make the plot look simpler
  theme_minimal()

print(p)
Going back to our segs object above, we can use the st_join function, combined with st_nearest_feature, to control how the tables are linked. Again, we need to project the data to avoid issues.
# project segs
segs <- st_transform(segs, 6348)
obs <- st_transform(obs, 6348)

# do the join
obs <- st_join(obs, segs, join=st_nearest_feature)

# project back to lat/long
segs <- st_transform(segs, 4326)
obs <- st_transform(obs, 4326)

# how does that look?
obs

## Simple feature collection with 941 features and 9 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -71.86962 ymin: 40.85296 xmax: -70.89175 ymax: 41.46472
## Geodetic CRS: WGS 84
## First 10 features:
## Date Time Bin Observer object size Transect Effort
## 1 2011-04-19 18991230 B JV 46 1 8 1969.873 [m]
## 2 2011-02-17 18991230 A JV 106 1 26 1894.100 [m]
## 3 2011-05-02 18991230 A KW 121 1 12 1988.809 [m]
## 4 2011-12-04 18991230 C KW 286 1 10 1979.630 [m]
## 5 2011-02-23 18991230 A KW 446 1 10 1979.624 [m]
## 6 2011-02-07 18991230 A KW 457 1 8 1969.839 [m]
## 7 2011-04-19 18991230 A JV 500 2 24 1923.218 [m]
## 8 2011-05-02 18991230 A KW 506 1 26 1894.075 [m]
## 9 2011-02-23 18991230 B JV 814 1 7 2000.139 [m]
## 10 2011-04-19 18991230 A KW 911 1 24 1923.199 [m]
## Sample.Label geometry
## 1 2010-RI-8-29 POINT (-71.45511 40.85296)
## 2 2010-RI-26-16 POINT (-71.54842 40.87022)
## 3 2010-RI-12-31 POINT (-71.30304 40.87595)
## 4 2010-RI-10-27 POINT (-71.38273 40.91753)
## 5 2010-RI-10-26 POINT (-71.38517 40.94153)
## 6 2010-RI-8-24 POINT (-71.46438 40.94225)
## 7 2010-RI-24-12 POINT (-71.59495 40.94761)
## 8 2010-RI-26-12 POINT (-71.552 40.94795)
## 9 2010-RI-7-22 POINT (-71.51172 40.97912)
## 10 2010-RI-24-9 POINT (-71.59762 40.98571)
+Now obs has acquired additional columns from segs, including the Sample.Label (which we want) and Effort (which we don’t). To check this worked okay, we can plot the obs and segs_lines (the line version of the segments which we saved earlier) with colour coding for the Sample.Label. I randomized the colour order so it would be easier to tell if observations were misallocated.
+# make a random colour palette to avoid similar colours being near each other
+pal <- rainbow(nrow(segs), s=.6, v=.9)[sample(1:nrow(segs),nrow(segs))]
+p <- ggplot() +
+ # geom_sf knows what to do with spatial data
+ geom_sf(data=coastline) +
+ geom_sf(data=segs_lines, aes(colour=Sample.Label), pch=21) +
+ geom_sf(data=obs, size=0.5, aes(colour=Sample.Label)) +
+ # chop to be on a better scale
+ coord_sf(xlim=c(-72, -70.8), ylim=c(40.8, 41.5), expand=FALSE) +
+ scale_colour_manual(values=pal) +
+ # make the plot look simpler
+ theme_minimal() +
+ theme(legend.position = "none")
+print(p)
As with segs, we can drop the geometry and unnecessary columns, keeping only those we need:
+# get rid of "spatialness"
+obs <- st_drop_geometry(obs)
+# select the columns we need
+obs <- obs[, c("object", "Sample.Label", "size", "Bin")]
+head(obs)
+## object Sample.Label size Bin
+## 1 46 2010-RI-8-29 1 B
+## 2 106 2010-RI-26-16 1 A
+## 3 121 2010-RI-12-31 1 A
+## 4 286 2010-RI-10-27 1 C
+## 5 446 2010-RI-10-26 1 A
+## 6 457 2010-RI-8-24 1 A
+(Note that since we have binned distance data, we keep the Bin column here; it will need further processing later.)
A different way to approach this would be if there were timestamps on waypoints from the GPS, which could be related to the segments. One could then look at whether an observation was made between the start and end times of a segment.
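For example, a minimal sketch of that approach might look like the following (this assumes hypothetical obs$Time, segs$start_time and segs$end_time columns, which are not present in the data used here):

# assign each observation to the segment whose time window contains it
# (sketch only: Time, start_time and end_time are assumed, not in these data)
seg_index <- sapply(obs$Time, function(tt){
 which(tt >= segs$start_time & tt <= segs$end_time)[1]
})
obs$Sample.Label <- segs$Sample.Label[seg_index]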
+We have provided some information about segmenting on this wiki page, as part of the US Navy-funded project DenMod.
+This information is summarized at ?"dsm-data" in the dsm package. It may also be useful to look at the data in the example dataset mexdolphins which can be loaded with data(mexdolphins). The vignette for an analysis of those data is available here.
It is hard to give very general information on how to segment lines, as to some extent it depends on how the data were originally formatted (going all the way back to the make and model of the GPS unit used). More information on dealing with spatial data in R using the sf package is available at the R spatial website (see the “Articles” drop down in the header).
Another source of help is the distance sampling mailing list. Be sure to search the archives prior to posting as there have been several threads on segmenting previously that might be helpful.
+Thanks to Phil Bouchet for providing helpful comments on an early version of this document and to Iúri Correia for finding an important bug.
Example spatial analysis of pantropical spotted dolphins.
The analysis is based on a dataset of observations of pantropical dolphins in the Gulf of Mexico (shipped with Distance 6.0 and later). For convenience the data are bundled in an R-friendly format, although all of the code necessary for creating the data from the Distance project files is available on GitHub. The OBIS-SEAMAP page for the data may be found at the SEFSC GoMex Oceanic 1996 survey page.
The intention here is to highlight the features of the dsm package, rather than perform a full analysis of the data. For that reason, some important steps are not fully explored. Some familiarity with density surface modelling (Hedley & Buckland, 2004; Miller, Burt, Rexstad, & Thomas, 2013) is assumed.
Before we start, we load the dsm package (and its dependencies) and set some options:
+library(dsm)
+library(ggplot2)
+
+# plotting options
+gg.opts <- theme(panel.grid.major=element_blank(),
+ panel.grid.minor=element_blank(),
+ panel.background=element_blank())

In order to run this vignette, you’ll need to install a few R packages. This can be done via the following call to install.packages:
install.packages(c("dsm", "Distance", "knitr", "distill", "ggplot2", "rgdal",
+ "maptools", "plyr", "tweedie"))
+Most of the data we need is included in the dsm package, but two additional objects needed for plotting can be downloaded here; they should be put into the same directory as this file. The data can then be loaded into R using the following code:
+load("mexdolphins-extra.rda")This should add the objects survey.area and pred.polys to your environment.
All of the data for this analysis has been nicely pre-formatted and is shipped with dsm. Loading that data, we can see that we have four data.frames, the first few lines of each are shown:
+data(mexdolphins)

segdata holds the segment data: the transects have already been “chopped” into segments.
+head(segdata)
+## longitude latitude x y Effort Transect.Label Sample.Label depth
+## 1 -86.92712 29.94378 836105.9 -1011416 13800 19960417 19960417-1 135.0
+## 2 -86.83176 29.84030 846012.9 -1021407 14000 19960417 19960417-2 147.7
+## 3 -86.74445 29.75279 855022.6 -1029785 14000 19960417 19960417-3 152.1
+## 4 -86.65230 29.65522 864610.3 -1039168 13900 19960417 19960417-4 163.8
+## 5 -86.56648 29.56088 873598.1 -1048266 13800 19960417 19960417-5 179.7
+## 6 -86.49290 29.49000 881203.7 -1055004 13800 19960417 19960417-6 188.5
+distdata holds the distance sampling data that will be used to fit the detection function.
+head(distdata)
+## object size distance Effort detected beaufort latitude longitude
+## 45 45 21 3296.6363 36300 1 4 27.72872 -86.00159
+## 61 61 150 929.1937 17800 1 4 25.99896 -87.62712
+## 63 63 125 6051.0009 21000 1 2 26.00693 -87.94881
+## 85 85 75 5499.6971 21800 1 1 27.50344 -90.44891
+## 114 114 50 7258.9837 13400 1 3 27.40568 -94.99483
+## 120 120 45 1454.7962 20900 1 5 26.01765 -95.97449
+## x y
+## 45 948000.065 -1236192
+## 61 812161.653 -1436899
+## 63 780969.520 -1438985
+## 85 528656.807 -1297833
+## 114 95910.149 -1324562
+## 120 2477.665 -1473909
+obsdata links the distance data to the segments.
+head(obsdata)
+## object Sample.Label size distance Effort
+## 45 45 19960421-9 21 3296.6363 36300
+## 61 61 19960423-7 150 929.1937 17800
+## 63 63 19960423-9 125 6051.0009 21000
+## 85 85 19960427-1 75 5499.6971 21800
+## 114 114 19960430-8 50 7258.9837 13400
+## 120 120 19960501-5 45 1454.7962 20900
+preddata holds the prediction grid (which includes all the necessary covariates).
+head(preddata)
+## latitude longitude x y depth area
+## 1 30.08333 -87.58333 774402.9 -1002759.1 35 271236913
+## 2 30.08333 -87.41667 789688.6 -1001264.5 30 271236913
+## 3 30.08333 -87.25000 804971.3 -999740.6 27 271236913
+## 4 30.08333 -87.08333 820251.1 -998187.5 22 271236913
+## 5 30.08333 -86.91667 835528.0 -996605.2 46 271236913
+## 6 29.91667 -87.75000 760783.1 -1021810.3 14 271236913
+Typically (i.e. for other datasets) it will be necessary to divide the transects into segments and allocate observations to the correct segments, using a GIS or other similar package1, before starting an analysis using dsm.
Often data in a spatial analysis comes from many different sources. It is important to ensure that the measurements to be used in the analysis are in compatible units, otherwise the resulting estimates will be incorrect or hard to interpret. Having all of our measurements in SI units from the outset removes the need for conversion later, making life much easier.
+The data are already in the appropriate units (Northings and Eastings: kilometres from a centroid, projected using the North American Lambert Conformal Conic projection).
+There is extensive literature about when particular projections of latitude and longitude are appropriate and we highly recommend the reader review this for their particular study area; (Bivand, Pebesma, & Gómez-Rubio, 2008) is a good starting point. The other data frames have already had their measurements appropriately converted. By convention the directions are named x and y.
Using latitude and longitude when performing spatial smoothing can be problematic when certain smoother bases are used. In particular, bivariate isotropic bases assume that a unit of distance means the same thing in every direction, which is not true of latitude and longitude (moving one degree in one direction is not the same distance as moving one degree in the other).
+We give an example of projecting the polygon that defines the survey area (which has simply been read into R using readShapeSpatial from a shapefile produced by a GIS).
+library(sf)
+library(plyr)
+
+# tell R that the survey.area object is currently in lat/long
+sp::proj4string(survey.area) <- sp::CRS("+proj=longlat +datum=WGS84")
+
+predsf <- st_as_sf(pred.polys)
+
+area.sf <- st_as_sf(survey.area)
+st_crs(area.sf) <- "WGS84"
+area.sf.proj <- st_transform(area.sf, crs = st_crs(predsf))
+
+# Convert preddata to a spatial object
+preddata_sf <- st_as_sf(preddata, coords=c("x", "y"))
+st_crs(preddata_sf) <- st_crs(area.sf.proj)
+# Perform the intersection
+preddata_sf <- st_intersection(preddata_sf, area.sf.proj)
+## Warning: attribute variables are assumed to be spatially constant throughout
+## all geometries
+
+coords_preddata <- data.frame(st_coordinates(preddata_sf))
+
+preddata_sf$x <- coords_preddata$X
+preddata_sf$y <- coords_preddata$Y

The next bit of code works with the survey.area outline, massaging it for use with the soap film smoother. If you will not be using soap film smoothers, this code is extraneous.
+# proj 4 string
+# using http://spatialreference.org/ref/esri/north-america-lambert-conformal-conic/
+lcc_proj4 <- sp::CRS("+proj=lcc +lat_1=20 +lat_2=60 +lat_0=40 +lon_0=-96 +x_0=0 +y_0=0 +ellps=GRS80 +datum=NAD83 +units=m +no_defs ")
+
+# project using LCC
+survey.area <- sp::spTransform(survey.area, CRSobj=lcc_proj4)
+
+# simplify the object
+survey.area <- data.frame(survey.area@polygons[[1]]@Polygons[[1]]@coords)
+names(survey.area) <- c("x", "y")The below code generates this plot, which shows the survey area with the transect lines overlaid (using data from segdata).
+segdata_sf <- st_as_sf(segdata, coords = c("x","y"))
+st_crs(segdata_sf) <- st_crs(area.sf.proj)
+# study area outline and segment centres
+plot_segments <- ggplot() +
+ geom_sf(data = area.sf.proj, fill="lightblue", color = "blue", linewidth=.1) +
+ geom_sf(data=segdata_sf, fill=NA, color="black", linewidth=.3) +
+ labs(title="1996 SE Fisheries Science Center Gulf of Mexico cruise",
+ subtitle = "Points are segment centres") +
+ scale_fill_viridis_c(option = "magma", guide = "none")
+plot_segments
Also note that since we have projected our prediction grid, the “squares” do not look quite like squares. For plotting we will therefore use the polygons that we saved earlier: these polygons (stored in pred.polys) are read from a shapefile created in a GIS, and the object itself is of class SpatialPolygons from the sp package. It needs to be projected for proper plotting using functions in the sf package.
+predsf <- st_as_sf(pred.polys)
+# plot as projected
+plot(st_geometry(predsf), axes=TRUE)
Tips on plotting polygons are available from the ggplot2 pkgdown site.
The top panels of the EDA plots below show histograms of observed distances and cluster size, while the bottom panels show the relationship between observed distance and observed cluster size, and the relationship between observed distance and Beaufort sea state. The plots show that there is some relationship between cluster size and observed distance (fewer smaller clusters seem to be seen at larger distances).
+The following code generates the EDA plots:
+
+par(mfrow=c(2,2))
+
+# histograms
+hist(distdata$distance,main="",xlab="Distance (m)")
+hist(distdata$size,main="",xlab="Cluster size")
+
+# plots of distance vs. cluster size
+plot(distdata$distance, distdata$size, main="", xlab="Distance (m)",
+ ylab="Group size", pch=19, cex=0.5, col=gray(0.7))
+
+# lm fit
+l.dat <- data.frame(distance=seq(0,8000,len=1000))
+lo <- lm(size~distance, data=distdata)
+lines(l.dat$distance, as.vector(predict(lo,l.dat)))
+
+plot(distdata$distance,distdata$beaufort, main="", xlab="Distance (m)",
+ ylab="Beaufort sea state", pch=19, cex=0.5, col=gray(0.7))
Top row, left to right: histograms of distance and cluster size; bottom row: plot of distance against cluster size and plot of distances against Beaufort sea state.
+Looking separately at the spatial data without thinking about the distances, we can plot the observed group sizes in space (circle size is proportional to the group size for each observation). There are rather large areas with no observations, which might cause our variance estimates for abundance to be rather large. We also plot the depth data, which we will use later as an explanatory covariate in our spatial model.
+
+prediction_grid <- st_make_grid(area.sf.proj, cellsize = c(9000,9000))
+prediction_grid_sf <- st_sf(geometry = prediction_grid)
+cropped_grid <- st_join(prediction_grid_sf, preddata_sf, join = st_nearest_feature)
+cropped_grid <- st_intersection(cropped_grid, area.sf.proj)
+## Warning: attribute variables are assumed to be spatially constant throughout
+## all geometries
+
+depth <- ggplot() +
+ geom_sf(data=cropped_grid, aes(fill=depth), color=NA) +
+ labs(title = "Spotted dolphins, Gulf of Mexico",
+ subtitle = "Depth in meters, size of detected dolphin groups") +
+ xlab("Longitude") + ylab("Latitude") +
+ geom_point(aes(x, y, size=size), data=distdata, colour="red",alpha=I(0.5)) +
+ scale_fill_viridis_c(option = "viridis", direction = 1)
+depth
We use the ds function in the package Distance to fit the detection function. (The Distance package is intended to make standard distance sampling in R relatively straightforward. For a more flexible but more complex alternative, see the function ddf in the mrds library.)
First, loading the Distance library:
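library(Distance)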
We can then fit a detection function with hazard-rate key with no adjustment terms:
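The fitting call itself is not shown in the rendered output; its form can be read off from the warning messages and the summary below:

detfc.hr.null <- ds(distdata, max(distdata$distance), key="hr", adjustment=NULL)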
+## Warning in ddf.ds(dsmodel = dsmodel, data = data, meta.data = meta.data, :
+## Estimated hazard-rate scale parameter close to 0 (on log scale). Possible
+## problem in data (e.g., spike near zero distance).
+## Warning in ddf.ds(dsmodel = dsmodel, data = data, meta.data = meta.data, :
+## Estimated hazard-rate scale parameter close to 0 (on log scale). Possible
+## problem in data (e.g., spike near zero distance).
+## Warning in ds(distdata, max(distdata$distance), key = "hr", adjustment = NULL):
+## Estimated hazard-rate scale parameter close to 0 (on log scale). Possible
+## problem in data (e.g., spike near zero distance).
+Calling summary gives us information about parameter estimates, probability of detection, AIC, etc:
+summary(detfc.hr.null)
+##
+## Summary for distance analysis
+## Number of observations : 47
+## Distance range : 0 - 7847.467
+##
+## Model : Hazard-rate key function
+## AIC : 841.2528
+## Optimisation: mrds (nlminb)
+##
+## Detection function parameters
+## Scale coefficient(s):
+## estimate se
+## (Intercept) 7.98244 0.9533779
+##
+## Shape coefficient(s):
+## estimate se
+## (Intercept) 0 0.7833406
+##
+## Estimate SE CV
+## Average p 0.5912252 0.2224714 0.3762887
+## N in covered region 79.4959326 30.8184416 0.3876732
+The following code generates a plot of the fitted detection function (left) and quantile-quantile plot (right):
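One way to produce these plots is sketched below (this assumes the ddf.gof function from the mrds package for the Q-Q plot and goodness of fit statistics; the exact plotting options used originally are not shown):

par(mfrow=c(1,2))
# fitted detection function
plot(detfc.hr.null)
# Q-Q plot and goodness of fit tests for the underlying ddf object
mrds::ddf.gof(detfc.hr.null$ddf)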
+##
+## Goodness of fit results for ddf object
+##
+## Distance sampling Cramer-von Mises test (unweighted)
+## Test statistic = 0.0972004 p-value = 0.598774
+

The quantile-quantile plot shows relatively good goodness of fit for the hazard-rate detection function.
+It is common to include covariates in the detection function (so-called Multiple Covariate Distance Sampling or MCDS). In this dataset there are two covariates that were collected on each individual: Beaufort sea state and size. For brevity we fit only a hazard-rate detection function with sea state included as a factor covariate, as follows:
+
+detfc.hr.beau<-ds(distdata, max(distdata$distance), formula=~as.factor(beaufort),
+ key="hr", adjustment=NULL)Again looking at the summary,
+summary(detfc.hr.beau)
+##
+## Summary for distance analysis
+## Number of observations : 47
+## Distance range : 0 - 7847.467
+##
+## Model : Hazard-rate key function
+## AIC : 843.7115
+## Optimisation: mrds (nlminb)
+##
+## Detection function parameters
+## Scale coefficient(s):
+## estimate se
+## (Intercept) 7.66291936 1.078972
+## as.factor(beaufort)2 2.24809809 14.754677
+## as.factor(beaufort)3 0.28581820 1.020809
+## as.factor(beaufort)4 0.07133776 1.225360
+## as.factor(beaufort)5 -0.36421115 1.540324
+##
+## Shape coefficient(s):
+## estimate se
+## (Intercept) 0.2994644 0.5184204
+##
+## Estimate SE CV
+## Average p 0.5420677 0.1753555 0.3234937
+## N in covered region 86.7050383 29.4646397 0.3398262
+Here the detection function with covariates does not give a lower AIC than the model without covariates (843.71 vs. 841.25 for the hazard-rate model without covariates). Looking back to the bottom-right panel of the EDA plots, we can see there is not a discernible pattern in the plot of Beaufort vs distance.
+For brevity, detection function model selection has been omitted here. In practice we would fit many different forms for the detection function (and select a model based on goodness of fit testing and AIC).
+Before fitting a dsm model, the data must be segmented; this consists of chopping up the transects and attributing counts to each of the segments. As mentioned above, these data have already been segmented.
We begin with a very simple model. We assume that the number of individuals in each segment is quasi-Poisson distributed and is a smooth function of the segment's spatial coordinates (note that the formula is exactly as one would specify to gam in mgcv). By setting group=TRUE, the abundance of clusters/groups rather than individuals can be estimated (though we ignore this here). Note we set method="REML" to ensure that smooth terms are estimated reliably.
Running the model:
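The fitting call isn't shown in the rendered output; based on the formula in the summary below, it is of roughly this form (using the detection function without covariates):

dsm.xy <- dsm(count~s(x,y), detfc.hr.null, segdata, obsdata, method="REML")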
+We can then obtain a summary of the fitted model:
+
+summary(dsm.xy)
+##
+## Family: quasipoisson
+## Link function: log
+##
+## Formula:
+## count ~ s(x, y) + offset(off.set)
+##
+## Parametric coefficients:
+## Estimate Std. Error t value Pr(>|t|)
+## (Intercept) -18.20 0.53 -34.34 <2e-16 ***
+## ---
+## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
+##
+## Approximate significance of smooth terms:
+## edf Ref.df F p-value
+## s(x,y) 24.8 27.49 2.354 0.000733 ***
+## ---
+## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
+##
+## R-sq.(adj) = 0.121 Deviance explained = 43.4%
+## -REML = 936.04 Scale est. = 94.367 n = 387
+The exact interpretation of the model summary results can be found in (S. N. Wood, 2017); here we can see various information about the smooth components fitted and general model statistics.
+We can use the deviance explained to compare between models2.
+We can also get a rough idea of what the smooth of space looks like using vis.gam (white/yellow indicates high values, red indicates low values):
+vis.gam(dsm.xy, plot.type="contour", view=c("x","y"), asp=1, type="response", contour.col="black", n.grid=100)
The type="response" argument ensures that the plot is on the scale of abundance but the values are relative (as the offsets are set to be their median values). This means that the plot is useful to get an idea of the general shape of the smooth but cannot be interpreted directly.
The data set also contains a depth covariate (which we plotted above). We can include it in the model very simply:
+dsm.xy.depth <- dsm(count~s(x,y,k=10) + s(depth,k=20), detfc.hr.null, segdata, obsdata, method="REML")
+summary(dsm.xy.depth)
+##
+## Family: quasipoisson
+## Link function: log
+##
+## Formula:
+## count ~ s(x, y, k = 10) + s(depth, k = 20) + offset(off.set)
+##
+## Parametric coefficients:
+## Estimate Std. Error t value Pr(>|t|)
+## (Intercept) -18.740 1.236 -15.16 <2e-16 ***
+## ---
+## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
+##
+## Approximate significance of smooth terms:
+## edf Ref.df F p-value
+## s(x,y) 6.062 7.371 0.923 0.5038
+## s(depth) 9.443 11.466 1.585 0.0824 .
+## ---
+## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
+##
+## R-sq.(adj) = 0.0909 Deviance explained = 34.3%
+## -REML = 939.52 Scale est. = 124.9 n = 387
+Here we see a drop in deviance explained, so perhaps this model is not as useful as the first. We discuss setting the k parameter in Model checking, below.
Setting select=TRUE here (as an argument to gam) would impose extra shrinkage terms on each smooth in the model (allowing smooth terms to be removed from the model during fitting; see ?gam for more information). This is not particularly useful here, so we do not include it. However, when there are many environmental predictors in the model, this can be a good way (along with looking at \(p\)-values) to perform term selection.
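As an illustration only (this model is not fitted elsewhere in this document and the name dsm.xy.depth.sel is made up), such a call would look like the following, relying on dsm passing extra arguments through to gam:

dsm.xy.depth.sel <- dsm(count~s(x,y,k=10) + s(depth,k=20), detfc.hr.null, segdata,
 obsdata, method="REML", select=TRUE)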
Simply calling plot on the model object allows us to look at the relationship between depth and the linear predictor:
+plot(dsm.xy.depth, select=2)
Omitting the argument select in the call to plot will plot each of the smooth terms, one at a time.
The code to fit the DSM when there are covariates in the detection function is similar to the other models, above. However since the detection function has observation-level covariates, we must estimate the abundance per segment using a Horvitz-Thompson-like estimator before modelling, so we change the response to be abundance.est:
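Again the fitting call isn't shown; given the summary that follows, it is of roughly this form, using the covariate detection function fitted earlier:

dsm.est.xy <- dsm(abundance.est~s(x,y), detfc.hr.beau, segdata, obsdata, method="REML")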
As we can see, the summary results are rather similar:
+summary(dsm.est.xy)
+##
+## Family: quasipoisson
+## Link function: log
+##
+## Formula:
+## abundance.est ~ s(x, y) + offset(off.set)
+##
+## Parametric coefficients:
+## Estimate Std. Error t value Pr(>|t|)
+## (Intercept) -18.0233 0.4739 -38.03 <2e-16 ***
+## ---
+## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
+##
+## Approximate significance of smooth terms:
+## edf Ref.df F p-value
+## s(x,y) 24.64 27.44 2.349 0.00029 ***
+## ---
+## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
+##
+## R-sq.(adj) = 0.129 Deviance explained = 42.2%
+## -REML = 1043.6 Scale est. = 174.1 n = 387
+As is the resulting spatial smooth (though the resulting surface is somewhat “amplified”):
+
+vis.gam(dsm.est.xy, plot.type="contour", view=c("x","y"), asp=1, type="response", zlim=c(0, 300), contour.col="black", n.grid=100)
Often the quasi-Poisson distribution doesn’t give adequate flexibility and does not capture the overdispersion in the response data (see Model checking and Model selection below), so below we illustrate two additional distributions that can be used with count data.
+For the models in this section, we’ll move back to the count response, though the estimated abundance would also work.
Response distributions other than the quasi-Poisson can be used, for example the Tweedie distribution. The Tweedie distribution is available in dsm by setting family=tw().
+dsm.xy.tweedie <- dsm(count~s(x,y), detfc.hr.null, segdata, obsdata, family=tw(), method="REML")
+summary(dsm.xy.tweedie)
+##
+## Family: Tweedie(p=1.347)
+## Link function: log
+##
+## Formula:
+## count ~ s(x, y) + offset(off.set)
+##
+## Parametric coefficients:
+## Estimate Std. Error t value Pr(>|t|)
+## (Intercept) -17.2640 0.2363 -73.07 <2e-16 ***
+## ---
+## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
+##
+## Approximate significance of smooth terms:
+## edf Ref.df F p-value
+## s(x,y) 12.82 17.13 1.628 0.0562 .
+## ---
+## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
+##
+## R-sq.(adj) = 0.0684 Deviance explained = 26.8%
+## -REML = 351.49 Scale est. = 60.26 n = 387
+Though not used here, there are, similarly, two options for the negative binomial distribution: negbin and nb. The former requires the user to specify a single value of the parameter theta, or a range of values (specified as a vector); the latter estimates theta during the model fitting process (and is generally faster). The latter is recommended for most users.
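For example (an illustrative sketch only; this model is not fitted or discussed above), the negative binomial version of the simple spatial model would be specified as:

dsm.xy.nb <- dsm(count~s(x,y), detfc.hr.null, segdata, obsdata, family=nb(), method="REML")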
There is a large literature on spatial modelling using GAMs, much of which can be harnessed in a DSM context. Here are a few highlights.
+To account for a complex region (e.g., a region that includes peninsulae) we can use the soap film smoother (Simon N. Wood, Bravington, & Hedley, 2008).
+To use a soap film smoother for the spatial part of the model we must create a set of knots for the smoother to use. This is easily done using the make.soapgrid() function in dsm:
+soap.knots <- make.soapgrid(survey.area,c(15,10))

where the second argument specifies the number of points (in each direction) in the grid that will be used to create the knots (knots in the grid outside of survey.area are removed).
As we saw in the exploratory analysis, some of the transect lines are outside of the survey area. These will cause the soap film smoother to fail, so we remove them:
+
+x <- segdata$x; y<-segdata$y
+onoff <- inSide(x=x,y=y, bnd=as.list(survey.area))
+rm(x,y)
+segdata.soap <- segdata[onoff,]

Note that the soap_checker script available here can be useful in ensuring that the boundary, data and knots are in the correct format to use with the soap film smoother.
We can run a model with both the depth covariate and a spatial (soap film) smooth. Note that the k argument now refers to the complexity of the boundary smooth in the soap film, while the complexity of the film itself is controlled by the knots supplied via the knots argument (the boundary is given in the xt argument).
+dsm.xy.tweedie.soap<-dsm(count~s(x, y, bs="so", k=15, xt=list(bnd=list(survey.area))) +
+ s(depth),
+ family=tw(), method="REML",
+ detfc.hr.null, segdata.soap, obsdata, knots=soap.knots)
+summary(dsm.xy.tweedie.soap)##
+## Family: Tweedie(p=1.356)
+## Link function: log
+##
+## Formula:
+## count ~ s(x, y, bs = "so", k = 15, xt = list(bnd = list(survey.area))) +
+## s(depth) + offset(off.set)
+##
+## Parametric coefficients:
+## Estimate Std. Error t value Pr(>|t|)
+## (Intercept) -18.018 0.405 -44.49 <2e-16 ***
+## ---
+## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
+##
+## Approximate significance of smooth terms:
+## edf Ref.df F p-value
+## s(x,y) 3.050 72.000 0.064 0.114944
+## s(depth) 5.135 6.196 3.895 0.000793 ***
+## ---
+## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
+##
+## R-sq.(adj) = 0.0653 Deviance explained = 33.1%
+## -REML = 339.26 Scale est. = 54.022 n = 365
+Fitting models is all well and good, but we’d like to confirm that the models we have are reasonable; dsm provides some functions for model checking.
We can use gam.check to generate diagnostic plots:
+gam.check(dsm.xy)
##
+## Method: REML Optimizer: outer newton
+## full convergence after 6 iterations.
+## Gradient range [-7.105956e-08,-9.826305e-09]
+## (score 936.0363 & scale 94.3674).
+## Hessian positive definite, eigenvalue range [5.800328,192.5304].
+## Model rank = 30 / 30
+##
+## Basis dimension (k) checking results. Low p-value (k-index<1) may
+## indicate that k is too low, especially if edf is close to k'.
+##
+## k' edf k-index p-value
+## s(x,y) 29.0 24.8 1.01 0.99
+These show that there is some deviation in the Q-Q plot. The “line” of points in the residuals vs. linear predictor plot corresponds to the zeros in the data.
+Note that as well as the plots, gam.check also produces information about the model fitting. Of particular interest to us are the last few lines, which tell us about the basis size.
The k parameter provided to s (and te) terms in dsm controls the complexity of the smooths in the model.
By setting the k parameter we specify the largest complexity allowed for that smooth term in the model; as long as this is high enough, we can be sure that there is enough flexibility. In the output from gam.check above, we can see that a “p-value” is calculated for the size of the basis; this can be a good guide as to whether the basis size needs to be increased.
The ?choose.k manual page from mgcv gives further guidance and technical details on this matter.
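For example (illustrative only; neither this refit nor the name dsm.xy.k60 appears elsewhere in this document), if the check suggested the basis were too small we could refit with a larger k and re-run the check:

dsm.xy.k60 <- dsm(count~s(x, y, k=60), detfc.hr.null, segdata, obsdata, method="REML")
gam.check(dsm.xy.k60)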
We can look at the same model form but with a Tweedie distribution specified as the response:
+
+gam.check(dsm.xy.tweedie)
##
+## Method: REML Optimizer: outer newton
+## full convergence after 8 iterations.
+## Gradient range [-1.513175e-06,3.981813e-06]
+## (score 351.486 & scale 60.26034).
+## Hessian positive definite, eigenvalue range [0.6230615,103.5094].
+## Model rank = 30 / 30
+##
+## Basis dimension (k) checking results. Low p-value (k-index<1) may
+## indicate that k is too low, especially if edf is close to k'.
+##
+## k' edf k-index p-value
+## s(x,y) 29.0 12.8 0.73 0.34
+The Q-Q plot now seems much better (closer to the \(y=x\) line). In both plots the histogram of residuals is rather hard to interpret due to the large number of zeros in the data.
+Further guidance on interpreting gam.check output can be found in (S. N. Wood, 2017).
In the top right panel of the above gam.check plots, the residuals vs. linear predictor plot includes an odd line of predictions. These are an artifact of the link function, showing the exact zeros in the data. They can be misleading and distracting, making it difficult to see whether the residuals show heteroskedasticity.
+Randomised quantile residuals (Dunn & Smyth, 1996) avoid this issue by transforming the residuals to be exactly normally distributed. This makes the residuals vs. linear predictor plot much easier to interpret, as it no longer includes the artifacts generated by the link function. These plots can be produced using rqgam_check in dsm:
+rqgam_check(dsm.xy.tweedie)
Here we can see that there is no issue with heteroskedasticity (no increase in spread in the residuals vs. linear predictor plot with increasing values of the linear predictor). One can also plot these residuals against covariate values to check for pattern in the residuals.
+Note that in general, plots other than “Resids vs. linear pred.” should be interpreted with caution in the output of rqgam_check, as the residuals generated are normal by construction (so, for example, the Q-Q plot and histogram of residuals will always look fine).
Assuming that models have “passed” the checks in gam.check and rqgam_check and are sufficiently flexible, we may be left with a choice of which model is “best”. There are several methods for choosing the best model – AIC, REML/GCV scores, deviance explained, full cross-validation with test data and so on.
Though this document does not intend to be a full analysis of the pantropical dolphin data, we can create a results table to compare the various models that have been fitted so far in terms of their abundance estimates and associated uncertainties.
+
+# make a data.frame to print out
+mod_results <- data.frame("Model name" = c("`dsm.xy`", "`dsm.xy.depth`", "`dsm.xy.tweedie`", "`dsm.xy.tweedie.soap`",
+ "`dsm.est.xy`"),
+ "Description" = c("Bivariate smooth of location, quasipoisson",
+ "Bivariate smooth of location, smooth of depth, quasipoisson",
+ "Bivariate smooth of location, smooth of depth, Tweedie",
+ "Soap film smooth of location, smooth of depth, Tweedie",
+ "Bivariate smooth of location, smooth of depth, Tweedie, Beaufort covariate in detection function"),
+ "Deviance explained" = c(unlist(lapply(list(dsm.xy,
+ dsm.xy.depth,
+ dsm.xy.tweedie,
+ dsm.xy.tweedie.soap),
+ function(x){paste0(round(summary(x)$dev.expl*100,2),"%")})),NA))

We can then use the resulting data.frame to build a table of results using the kable function:
+kable(mod_results, col.names=c("Model name", "Description", "Deviance explained"))

| Model name | Description | Deviance explained |
|---|---|---|
| `dsm.xy` | Bivariate smooth of location, quasipoisson | 43.38% |
| `dsm.xy.depth` | Bivariate smooth of location, smooth of depth, quasipoisson | 34.32% |
| `dsm.xy.tweedie` | Bivariate smooth of location, Tweedie | 26.78% |
| `dsm.xy.tweedie.soap` | Soap film smooth of location, smooth of depth, Tweedie | 33.05% |
| `dsm.est.xy` | Bivariate smooth of location, quasipoisson, Beaufort covariate in detection function | NA |
Once a model has been checked and selected, we can make predictions over the grid and calculate abundance. The offset is stored in the area column3.
+dsm.xy.pred <- predict(dsm.xy, preddata, preddata$area)

Given predicted abundance estimates for each cell in the prediction grid, these are connected to the prediction grid object for mapping:
+
+prediction_grid <- st_make_grid(area.sf.proj, cellsize = c(9000,9000))
+prediction_grid_sf <- st_sf(geometry = prediction_grid)
+
+preddata_sf$Prediction_xy <- dsm.xy.pred
+
+cropped_grid <- st_join(prediction_grid_sf, preddata_sf, join = st_nearest_feature)
+cropped_grid <- st_intersection(cropped_grid, area.sf.proj)
+## Warning: attribute variables are assumed to be spatially constant throughout
+## all geometries
+
+pred <- ggplot() +
+ geom_sf(data = cropped_grid, aes(fill = Prediction_xy), color = NA) +
+ geom_sf(data=segdata_sf, fill=NA, color="white", linewidth=.001) +
+ labs(title="Spotted dolphins, Gulf of Mexico, abundance estimates",
+ subtitle = "Bivariate smooth of location, quasipoisson") +
+ scale_fill_viridis_c(option = "viridis", direction = 1)
+pred
We can calculate abundance over the survey area by simply summing these predictions:
+
+sum(dsm.xy.pred)
+## [1] 28462.22
+We can compare this with a plot of the predictions from the dsm.xy.depth model:
+dsm.xy.depth.pred <- predict(dsm.xy.depth, preddata, preddata$area)
+
+preddata_sf$Prediction_xy_depth <- dsm.xy.depth.pred
+
+cropped_grid <- st_join(prediction_grid_sf, preddata_sf, join = st_nearest_feature)
+cropped_grid <- st_intersection(cropped_grid, area.sf.proj)
+## Warning: attribute variables are assumed to be spatially constant throughout
+## all geometries
+
+pred <- ggplot() +
+ geom_sf(data = cropped_grid, aes(fill = Prediction_xy_depth), color = NA) +
+ geom_sf(data=segdata_sf, fill=NA, color="white", linewidth=.001) +
+ labs(title="Spotted dolphins, Gulf of Mexico, abundance estimates",
+ subtitle = "Bivariate smooth of location, smooth of depth, quasipoisson") +
+ scale_fill_viridis_c(option = "viridis", direction = 1)
+pred
We can see the inclusion of depth into the model has had a noticeable effect on the distribution (note the difference in legend scale between the two plots). We can again also look at the total abundance:
+
+sum(dsm.xy.depth.pred)
+## [1] 27084.54
+Here we see that there is not much of a change in the abundance, so in terms of abundance alone there is not much to choose between the two models. Next we’ll look at variance, where we can see bigger differences between the models.
+Obviously point estimates of abundance are important, but we should also calculate uncertainty around these abundance estimates. Fortunately dsm provides functions to perform these calculations and display the resulting uncertainty estimates.
There are two approaches to estimating the uncertainty of an abundance estimate in dsm.
- dsm_var_gam, which assumes that the spatial model and detection function parts of the model are independent of each other. In this case the squared coefficients of variation for each model component are added.
- dsm_var_prop, which takes into account the fact that detection probability may be correlated with the spatial part of the model. It uses methods described in (Bravington, Miller, & Hedley, 2021).

dsm_var_prop can only be applied when there is a covariate in the detection function that varies at the level of the segments, and is recorded at each segment (for example Beaufort). We don’t have that situation here, so we opt for dsm_var_gam.
Both methods estimate the variance of the abundance for each element in the list provided in pred.data. In our case we wish to obtain an abundance for each of the prediction cells, so we use split to chop our data set into list elements to give to dsm_var_gam (or dsm_var_prop).
+preddata.var <- split(preddata, 1:nrow(preddata))
+dsm.xy.var <- dsm_var_gam(dsm.xy, pred.data=preddata.var,
+ off.set=preddata$area)

Calling summary will give some information about uncertainty estimation:
+summary(dsm.xy.var)
+## Summary of uncertainty in a density surface model calculated
+## analytically for GAM, with delta method
+##
+## Approximate asymptotic confidence interval:
+## 2.5% Mean 97.5%
+## 16091.72 28430.76 50231.32
+## (Using log-Normal approximation)
+##
+## Point estimate : 28430.76
+## CV of detection function : 0.2471842
+## CV from GAM : 0.164
+## Total standard error : 8433.356
+## Total coefficient of variation : 0.2966
+We can also make a plot of the CVs (with the segment centres overlaid) using the following code.
+
+preddata_sf$CV <- sqrt(dsm.xy.var$pred.var)/preddata_sf$Prediction_xy
+
+cropped_grid <- st_join(prediction_grid_sf, preddata_sf, join = st_nearest_feature)
+cropped_grid <- st_intersection(cropped_grid, area.sf.proj)
+## Warning: attribute variables are assumed to be spatially constant throughout
+## all geometries
+
+CV <- ggplot() +
+ geom_sf(data = cropped_grid, aes(fill = CV), color = NA) +
+ geom_sf(data=segdata_sf, fill=NA, color="white", linewidth=.001) +
+ labs(title="Spotted dolphins, Gulf of Mexico, uncertainty (CV)",
+ subtitle = "Bivariate smooth of location, quasipoisson") +
+ scale_fill_viridis_c(option = "viridis", direction = 1)
+CV
Note the increase in CV away from the transect lines.
+We can revisit the model that included both depth and location smooths and observe that the coefficient of variation for that model is larger than that of the model with only the location smooth.
+
+dsm.xy.depth.var <- dsm_var_gam(dsm.xy.depth, pred.data=preddata.var,
+ off.set=preddata$area)
+summary(dsm.xy.depth.var)
+## Summary of uncertainty in a density surface model calculated
+## analytically for GAM, with delta method
+##
+## Approximate asymptotic confidence interval:
+## 2.5% Mean 97.5%
+## 14377.57 27054.60 50909.23
+## (Using log-Normal approximation)
+##
+## Point estimate : 27054.6
+## CV of detection function : 0.2816462
+## CV from GAM : 0.1741
+## Total standard error : 8958.444
+## Total coefficient of variation : 0.3311
+This document has outlined an analysis of spatially-explicit distance sampling data using the dsm package. Note that there are many possible models that can be fitted using dsm and that the aim here was to show just a few of the options. Results from the models can be rather different, so care must be taken in performing model selection, discrimination and criticism.
Distance is available at http://github.com/DistanceDevelopment/Distance as well as on CRAN. dsm is available at http://github.com/DistanceDevelopment/dsm, as well as on CRAN.