Raw Dataset Requests
Public access to FeederWatch data
The Cornell Lab of Ornithology and Bird Studies Canada are committed to making data gathered through our citizen science programs freely accessible to students, journalists, and the general public. Basic maps, trend graphs, summary tables by state, and more are all accessible online in the Explore section of the FeederWatch web site.
Raw data access for research scientists
Researchers seeking to conduct formal analyses using FeederWatch data are invited to request access to the raw FeederWatch dataset after reviewing the considerations outlined below. As with any dataset, knowing the data structure, understanding the metadata, grasping the data collection protocols, and being cognizant of the unique aspects of the program are all critical for conducting analyses and interpreting results in ways that provide meaningful insights. Although the data are freely available, we strongly encourage researchers to consult with staff at the Cornell Lab of Ornithology or Bird Studies Canada (contact information below) to ensure that the data are being handled and analyzed in a meaningful way.
Note that raw data files are large (> 1.8 million checklists) and require proficiency in statistical software (e.g., SAS or R) or advanced database tools (e.g., MySQL, Microsoft Access). Project FeederWatch does not have the staff available to assist with these tools or to create custom subsets of the raw data. Nonetheless, we are happy to provide access to the full dataset and instructions for how to use and interpret it.
Considerations to review before requesting raw data
As with all large-scale citizen science programs, it is impossible to validate each of the millions of records submitted to FeederWatch. Although we attempt to minimize errors, a small percentage of FeederWatch reports are incorrect and analysts must be aware that misidentifications, data entry errors, and other sources of error can evade our data validation system.
All FeederWatch data are passed through a series of geographically and temporally specific filters that “flag” reports of species (or high counts) that are unexpected at a given location at a certain time of the year. The geographic resolution is relatively coarse (one filter per state/province), and the temporal resolution is monthly. Only reports that are flagged by the filters undergo a systematic manual review. A flag may be removed by the expert reviewer without a request for supporting information, or additional evidence may be requested. If additional information is requested but is insufficient to validate the report, that record remains in the database and is identified as an unconfirmed report. Flagged records are identified using a combination of the Valid field and the Reviewed field as defined here:
Valid = 1; Reviewed = 0
Interpretation: Report did not trigger the automatic flagging system and was accepted into the database without review
Valid = 1; Reviewed = 1
Interpretation: Report triggered the flagging system and was approved by an expert reviewer
Valid = 0; Reviewed = 1
Interpretation: Report triggered a flag by the automated system and was reviewed; insufficient evidence was provided to confirm the report
Valid = 0; Reviewed = 0
Interpretation: Report triggered a flag by the automated system and awaits the review process
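The four combinations above can be expressed as a small lookup table. The following Python sketch is illustrative only: Valid and Reviewed are the field names described above, while the record layout, the "species" key, and the helper names are hypothetical. Keeping only records with Valid = 1 is one possible, conservative choice, not a requirement of the program.

```python
# Interpretation of each (Valid, Reviewed) pair, as defined in the text above.
REVIEW_STATUS = {
    (1, 0): "not flagged; accepted without review",
    (1, 1): "flagged; approved by an expert reviewer",
    (0, 1): "flagged; reviewed, insufficient evidence to confirm",
    (0, 0): "flagged; awaiting review",
}


def review_status(valid, reviewed):
    """Return the interpretation for one record's Valid and Reviewed values."""
    return REVIEW_STATUS[(int(valid), int(reviewed))]


def keep_valid(records):
    """Keep only records with Valid = 1 (an assumed, conservative filter)."""
    return [r for r in records if int(r["Valid"]) == 1]


# Hypothetical records; only the Valid and Reviewed fields come from the text.
records = [
    {"species": "Blue Jay", "Valid": 1, "Reviewed": 0},
    {"species": "Varied Thrush", "Valid": 1, "Reviewed": 1},
    {"species": "Pine Grosbeak", "Valid": 0, "Reviewed": 1},
    {"species": "Hoary Redpoll", "Valid": 0, "Reviewed": 0},
]

valid_records = keep_valid(records)
for r in records:
    print(r["species"], "->", review_status(r["Valid"], r["Reviewed"]))
```

Whether unconfirmed or unreviewed records should be excluded depends on the analysis; the lookup simply makes the choice explicit.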
Potential errors not captured by automated filters
The flagging system does not identify all potential errors. For instance, if a species is misidentified as another species that could occur in the region, that report will not be flagged for review. For example, a Downy Woodpecker may be misidentified as a Hairy Woodpecker, as these species are often sympatric. As such, we recommend that data analysts carefully consider which species are included in their analyses. We often lump difficult-to-distinguish species in our analyses. For instance, Carolina Chickadee and Black-capped Chickadee reports are analyzed as “chickadee species” in regions of geographic overlap. Similar lumping is suggested for Sharp-shinned and Cooper’s Hawks (Accipiter sp.), and for House, Purple, and Cassin’s Finches (Haemorhous sp.).
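As a rough illustration of the lumping described above, the Python sketch below relabels species on a single checklist and combines their counts. The species groups come from the text; the checklist layout (a mapping of species name to count), the function name, and the decision to sum counts within a group are assumptions made for this example. Chickadees are lumped only when the caller indicates that the site lies in the zone of geographic overlap.

```python
from collections import defaultdict

# Groups suggested in the text for lumping everywhere.
ALWAYS_LUMP = {
    "Sharp-shinned Hawk": "Accipiter sp.",
    "Cooper's Hawk": "Accipiter sp.",
    "House Finch": "Haemorhous sp.",
    "Purple Finch": "Haemorhous sp.",
    "Cassin's Finch": "Haemorhous sp.",
}

# Group lumped only in regions of geographic overlap.
OVERLAP_LUMP = {
    "Carolina Chickadee": "chickadee species",
    "Black-capped Chickadee": "chickadee species",
}


def lump_counts(counts, in_chickadee_overlap=False):
    """Relabel hard-to-distinguish species and sum counts within each group.

    Summing is an assumption of this sketch; other rules may suit an analysis better.
    """
    table = dict(ALWAYS_LUMP)
    if in_chickadee_overlap:
        table.update(OVERLAP_LUMP)
    lumped = defaultdict(int)
    for species, n in counts.items():
        # Species not in the table keep their own name.
        lumped[table.get(species, species)] += n
    return dict(lumped)


# Hypothetical checklist: species name -> count reported.
checklist = {"Sharp-shinned Hawk": 1, "Cooper's Hawk": 2, "Blue Jay": 3}
lumped = lump_counts(checklist)
print(lumped)
```

How the overlap zone is defined (for example, by state or province) is left to the analyst and is not specified here.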
Additionally, errors in reporting can mimic errors in identification. Participants may intend to report one species but enter their information for the wrong species. The evolution of the data-entry process has created designs for paper forms and web pages that minimize the likelihood of such errors. Nevertheless, such errors are possible.
While we know that errors exist in the data, our experience in handling and using these data leads us to believe that such errors are generally minimal and that biologically real patterns will emerge from analysis. All large datasets contain errors. We strive to minimize them, but nevertheless advise anyone working with these data to handle, analyze, and interpret them with the understanding that they are not perfect.
As with any monitoring data, a recorded observation is a function of both the biological event (number of species actually present) and the observation process (probability that an individual, when present, will be observed). Detection probabilities can be formally estimated with FeederWatch data (see Zuckerberg et al. 2011 paper in list of FeederWatch publications). When formal estimation cannot be done, we strongly suggest that analysts, at a minimum, include measures of the effort expended by participants (number of half-days and/or number of hours of observation) as predictors in their statistical models, to account for the increasing probability of observing birds with increasing time spent making observations.
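To see why effort matters, a simple descriptive check is to compare how often a species is reported at different levels of observation effort. The Python sketch below does this for made-up data. The field names "effort_hours" and "count", the numeric effort categories, and the values are all hypothetical, and this summary is not a substitute for including effort as a predictor in a statistical model or for formal estimation of detection probability.

```python
from collections import defaultdict


def detection_rate_by_effort(checklists):
    """Proportion of checklists reporting the species, per effort category."""
    totals = defaultdict(int)
    detections = defaultdict(int)
    for c in checklists:
        totals[c["effort_hours"]] += 1
        if c["count"] > 0:
            detections[c["effort_hours"]] += 1
    return {k: detections[k] / totals[k] for k in sorted(totals)}


# Hypothetical checklists for one species; effort categories 1 (least) to 3 (most).
checklists = [
    {"effort_hours": 1, "count": 0},
    {"effort_hours": 1, "count": 0},
    {"effort_hours": 1, "count": 1},
    {"effort_hours": 1, "count": 0},
    {"effort_hours": 2, "count": 1},
    {"effort_hours": 2, "count": 0},
    {"effort_hours": 3, "count": 1},
    {"effort_hours": 3, "count": 2},
]

rates = detection_rate_by_effort(checklists)
print(rates)
```

If reporting rates rise with effort in a tabulation like this, a model that omits effort may attribute differences in observation time to differences in bird abundance.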
Scientific Publications as a Resource for Analysts
Analysts will find previous publications informative in providing more detailed information on the process of analyzing FeederWatch data. See a list of scientific articles using FeederWatch data.
FeederWatch participants are identified in the database by their unique Cornell Lab of Ornithology (CLO) or Bird Studies Canada (BSC) identification number. We do not share names, addresses, contact information, or any personal information about our participants without express permission from each individual participant. For confirmed rare bird reports, we post the name, city, and state of the observer with the report on the FeederWatch website after first contacting the observer, and we withhold any such reports from public view when asked.
Our unique dataset is completely dependent on the efforts of our network of volunteer participants. We ask that all data analysts give credit to the thousands of participants who have made FeederWatch possible, as well as to Bird Studies Canada and the Cornell Lab of Ornithology for developing and managing the program.
Consulting with CLO or BSC staff
Analyzing large datasets is complicated and requires skill both in conducting the analyses themselves and in manipulating the data into the appropriate form for analysis. These data are best analyzed in collaboration or consultation with CLO or BSC research staff who have experience working with FeederWatch data. Please note that our resources are limited, so the responsiveness and extent of support may be constrained. We will do our best to meet all requests as time and resources permit. We will concentrate our efforts on answering questions about the general processes of analyzing data from FeederWatch rather than the mechanics of using specific software to work through this process. Suggested contacts include:
David Bonter, Director Citizen Science, Cornell Lab of Ornithology: dnb23 at cornell.edu
Wesley Hochachka, Senior Research Associate, Cornell Lab of Ornithology: wmh6 at cornell.edu
Denis Lepage, Senior Scientist, Bird Studies Canada: dlepage at bsc-eoc.org
Raw data access instructions
Datasets are available upon request from the staff listed above. Please note that data extraction takes significant staff time, so be as clear and specific as possible when requesting data. Thank you for your patience while waiting for data to be retrieved.