Raw Dataset Requests
Data Access Policy:
The Cornell Lab of Ornithology & Bird Studies Canada are committed to making data gathered through our citizen science programs freely accessible. For Project FeederWatch, basic maps, graphs, tables, and other simple summaries of the data are accessible online in the Explore section of the FeederWatch web site.
Researchers seeking to conduct formal analyses using FeederWatch data can access the raw data as outlined below. As with any data set, knowing the data structure, understanding the metadata, grasping the data collection protocols, and being cognizant of the unique aspects of the program are all critical for conducting analyses and interpreting results in ways that provide meaningful insights. Although the data are freely available, we strongly encourage researchers to consult with staff at the Cornell Lab of Ornithology or Bird Studies Canada (contact information below) to ensure that the data are being handled and analyzed in a meaningful way.
Journalists or students looking to access data for articles or class projects should first consider using the summary data that are available online. Using the raw data requires proficiency in statistical software such as R or SAS, and we do not have the staff available to assist with this process or to create custom subsets of the raw data. Nonetheless, we are happy to provide access to the full dataset and instructions for how to use and interpret it, as we do for researchers.
Below is a series of considerations that we specifically want to bring to the attention of anyone interested in analyzing FeederWatch data. Other considerations, not listed here, may apply to some specific uses of the data.
Note that it is impossible to validate each of the millions of records submitted to FeederWatch. This problem is shared by all large-scale citizen science programs. Although we attempt to minimize errors, a small percentage of FeederWatch reports are incorrect and analysts must be aware that misidentifications, data entry errors, and other sources of error can evade our data validation system.
All FeederWatch data are passed through a series of geographically and temporally specific filters that “flag” reports of species (or high counts) that are unexpected at a given location at a certain time of the year. The geographic resolution is relatively coarse (one filter per state/province) and the temporal resolution is monthly. Only reports that are flagged by the filters undergo a systematic manual review. A flag may be removed by the expert reviewer without a request for supporting information, or additional evidence may be requested. If additional information is requested but is insufficient to validate the report, that record remains in the database and is identified as an unconfirmed report. Flagged records are identified using a combination of the Valid field and the Reviewed field as defined here:
Valid = 1; Reviewed = 0
Interpretation: Report did not trigger the automatic flagging system and was accepted into the database without review
Valid = 1; Reviewed = 1
Interpretation: Report triggered the flagging system and was approved by an expert reviewer
Valid = 0; Reviewed = 1
Interpretation: Report triggered a flag by the automated system and was reviewed; insufficient evidence was provided to confirm the report
Valid = 0; Reviewed = 0
Interpretation: Report triggered a flag by the automated system and awaits the review process
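The four combinations above can be sketched as a small lookup in Python. This is a minimal illustration, not part of the FeederWatch tooling: it assumes each record is available as a dictionary with keys named Valid and Reviewed holding 0 or 1 (as numbers or strings), and the helper names record_status and keep_valid are hypothetical.

```python
# Status label for each (Valid, Reviewed) combination described above.
# The label wording is ours; only the field names come from the text.
FLAG_STATUS = {
    (1, 0): "accepted without review",
    (1, 1): "flagged, approved by reviewer",
    (0, 1): "flagged, unconfirmed after review",
    (0, 0): "flagged, awaiting review",
}


def record_status(valid, reviewed):
    """Return a readable status for one record's Valid/Reviewed pair."""
    return FLAG_STATUS[(int(valid), int(reviewed))]


def keep_valid(records):
    """Keep only records with Valid = 1 (hypothetical helper).

    Assumes each record is a dict with a "Valid" key, for example
    a row read from an exported file.
    """
    return [r for r in records if int(r["Valid"]) == 1]
```

Whether to drop records with Valid = 0 depends on the analysis; the filter above is only one possible choice.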
Potential Errors Not Captured by Automated Filters:
Note that the flagging system does not identify all potential errors. For instance, if a species is misidentified as another species that could occur in the region, that report will not be flagged for review. For example, a Downy Woodpecker may be misidentified as a Hairy Woodpecker, as these species are often sympatric. We therefore recommend that data analysts carefully consider which species are included in their analyses, and we often lump difficult-to-distinguish species in our own analyses. For instance, Carolina Chickadee and Black-capped Chickadee reports are analyzed as “chickadee species” in regions of geographic overlap. Similar lumping is suggested for Sharp-shinned and Cooper’s Hawks (Accipiter sp.), and for House, Purple, and Cassin’s Finch (Haemorhous sp.).
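The lumping described above could be sketched as a relabeling step in Python. This is only an illustration under stated assumptions: it keys on English common names written with a straight apostrophe (the raw data may identify species differently), the in_overlap_region argument is a hypothetical flag the analyst would have to derive, and the function only relabels records without combining counts.

```python
# Groups suggested above for species that are hard to tell apart.
LUMPED_GROUPS = {
    "Sharp-shinned Hawk": "Accipiter sp.",
    "Cooper's Hawk": "Accipiter sp.",
    "House Finch": "Haemorhous sp.",
    "Purple Finch": "Haemorhous sp.",
    "Cassin's Finch": "Haemorhous sp.",
}

# Chickadees are lumped only in regions of geographic overlap.
OVERLAP_ONLY_GROUPS = {
    "Carolina Chickadee": "chickadee species",
    "Black-capped Chickadee": "chickadee species",
}


def lump_species(name, in_overlap_region=False):
    """Return the lumped label for a species name, or the name unchanged."""
    if name in LUMPED_GROUPS:
        return LUMPED_GROUPS[name]
    if in_overlap_region and name in OVERLAP_ONLY_GROUPS:
        return OVERLAP_ONLY_GROUPS[name]
    return name
```

How counts for lumped species should be combined is a separate decision that this sketch does not make.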
Additionally, errors in reporting can mimic errors in identification. Participants may intend to report one species but enter their information for the wrong species. The evolution of the data-entry process has created designs for paper forms and web pages that minimize the likelihood of such errors. Nevertheless, such errors are possible.
While we know that errors exist in the data, our experience in handling and using these data leads us to believe that such errors are generally minimal, and that biologically real patterns will emerge from analysis. All large data sets contain errors. We strive to minimize them, but nevertheless advise anyone working with these data to handle, analyze, and interpret them with the understanding that they are not perfect.
As with any monitoring data, a recorded observation is a function of both the biological event (number of species actually present) and the observation process (probability that an individual, when present, will be observed). Detection probabilities can be formally estimated with FeederWatch data (see Zuckerberg et al. 2011 paper in list of FeederWatch publications). When this cannot be done, we strongly suggest that analysts at a minimum include the effort expended by participants (number of half-days and/or number of hours of observation) as predictors in their statistical models, in order to account for the increasing probability of observing birds with increasing time spent making observations.
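To illustrate why effort matters, the short Python sketch below shows how the chance of recording a bird that is present can rise with observation time. It rests on a deliberately simple assumption that is ours and not a FeederWatch method: each hour of watching has the same independent chance, p_per_hour, of detecting the bird. The function name and its parameters are hypothetical.

```python
def detection_probability(p_per_hour, hours):
    """Chance of at least one detection over a number of observation hours.

    Illustrative assumption: every hour has the same, independent
    detection chance p_per_hour, so the chance of missing the bird
    in every hour is (1 - p_per_hour) ** hours.
    """
    if not 0.0 <= p_per_hour <= 1.0:
        raise ValueError("p_per_hour must be between 0 and 1")
    if hours < 0:
        raise ValueError("hours must be non-negative")
    return 1.0 - (1.0 - p_per_hour) ** hours
```

Under this assumption a participant watching for two hours is more likely to record a present bird than one watching for a single hour, which is why counts that ignore effort can mislead.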
Our unique dataset is completely dependent on the efforts of our network of volunteer participants. We ask that all data analysts give credit to the thousands of participants who have made FeederWatch possible, as well as to Bird Studies Canada and the Cornell Lab of Ornithology for developing and managing the program.
Scientific Publications as a Resource for Analysts:
Analysts will find previous publications informative in providing more detailed information on the process of analyzing FeederWatch data. See a list of scientific articles using FeederWatch data.
FeederWatch participants are identified in the database by their unique Cornell Lab of Ornithology or Bird Studies Canada identification number. We do not share names, addresses, contact information, or any personal information about our participants without express permission from each individual participant. For confirmed rare bird reports, we do post the name, city, and state of the observer along with the report on the FeederWatch website. We will not post this information without first contacting the observer and we will withhold any such reports from public view when asked.
Consulting with Cornell or BSC Staff:
Analyzing large data sets is complicated, and requires skill both in conducting the analyses themselves and in manipulating the data into the appropriate form for analysis. These data are best analyzed in collaboration or consultation with CLO or BSC research staff who have experience working with FeederWatch data. Please note that our resources are limited, so the responsiveness and extent of support may be constrained. We will do our best to meet all requests as time and resources permit. We will concentrate our efforts on answering questions about the general processes of analyzing data from FeederWatch rather than the mechanics of using specific software to work through this process. Suggested contacts include:
David Bonter, Asst. Director Citizen Science, Cornell Lab of Ornithology: dnb23 at cornell.edu
Wesley Hochachka, Senior Research Associate, Cornell Lab of Ornithology: wmh6 at cornell.edu
Denis Lepage, Senior Scientist, Bird Studies Canada: dlepage at bsc-eoc.org
Datasets are available upon request from the staff listed above. Please note that data files are large (> 1.8 million checklists), so you must be able to use advanced database tools (e.g. MySQL, Microsoft Access) and statistical software (e.g. SAS or R) in order to handle the data. Please also note that data extraction takes staff time, so be as clear and specific as possible when requesting data. Thank you for your patience while waiting for data to be retrieved.