Analysis and Visualization of Animal Behavior dataset


This page describes the tools and practices used to analyze and map the domain of Animal Behavior studies. Links to results and source code are provided for download.

Index


I. Authors and Affiliations

II. Introduction

Summary

This work describes the application of recent analysis and visualization techniques to the domain of Animal Behavior studies. The motivation was that the field has remained uncharted, having been probed only through conventional reviews and small subsets of published research. We demonstrate that knowledge-domain visualization techniques can reveal details about the prominent and emerging research areas over the years.

Overview

Traditional methods of mapping and understanding the growth of scientific research and literature have usually lacked comprehensive coverage and have been painstaking. This changed considerably with the advent of electronic databases and easy access to published research on the World Wide Web. In particular, journal citation records can now provide valuable information for finding associations within a field.

Information gleaned from such electronic records is very helpful in mapping domains and charting the relationships that evolve over time. These associations provide a view of the past evolution of the field as well as the emergence and growth of promising areas and opportunities for collaboration. To demonstrate the potential of these analysis and visualization techniques, we used citation records for the domain of animal behavior studies. The fact that the field has a rich publication record and has never been mapped in this way provided ample motivation for the study.

Procedure: Our analysis was based on published records belonging to a set of core journals derived from the Biological Abstracts database. These journal records were analyzed using techniques like Latent Semantic Analysis (LSA) and Co-word analysis for representative time slices. Latent semantic analysis was applied to determine semantic similarity among the documents based on their keywords. Geographical information was used to obtain interesting results regarding hotspots of published and ongoing research.

Outcome of study: Areas that emerged corresponded to the leading research fields for all the journals. Further, co-word analysis gave us the changes in topics covered in the years under consideration. Graph visualization software, Pajek [1], provided interesting visual insights into relationships within document and keyword spaces. These analytical and visual tools helped domain experts in identifying the focus areas of each journal, their coverage and changes in their dynamic field and have established a foothold for deeper exploration and mapping of the domain.

Initial work was presented as a poster during the Conference of Visualization and Data Analysis, 2004 (view paper here). Subsequently, a paper titled "Trends in animal behaviour research (1968–2002): ethoinformatics and the mining of library databases" was published in the journal Animal Behaviour (view paper here).

Source: The download directory is located here. This directory contains source code and data pertaining to the analysis.


III. Tools used

We employed several statistical techniques for the work. Specific algorithms and tools were used to analyze the results as graph visualizations. The techniques are described below, along with a brief mention of the tools used for each:


IV. Dataset

Description: The primary source of the journal citation records was the Biological Abstracts dataset, which consists of about 200 journals grouped into three broad categories. The categories are derived from the classification of Animal Behavior Abstracts at the Cambridge Scientific Abstracts website: Core, Priority and Selective sets of journals. The Animal Behavior Abstracts page links to the complete source serial list of the abstracts.

Procedure: Using International Standard Serial Numbers (ISSN) to identify journals, we downloaded all available citation records as ASCII text files from the Biological Abstracts database (BIOSIS, Inc 2002). The standard library web interface prevents data streaming by restricting the number of records that can be downloaded at a time, and can result in a large number of network timeouts and failures. Instead, we used access client software SilverPlatter WinSPIRS v4.01 (Ovid Technologies, Inc 1999) to download records directly from the server of Indiana University's library database.

Several Perl parsers were used to extract relevant data from the downloaded files. Using these parsers, we converted the downloaded files to text files (delimited by the pipe symbol, '|') and removed duplicate documents (identified by the string 'title#year#author#keywords'). To verify that our dataset was complete, we noted the number of documents listed by Biological Abstracts during each ISSN search and later crosschecked this number against the files actually downloaded. (Please refer to the section on Data Extraction and Cleaning for links to the parsers.)

Finally, we calculated summary information on the number of unique records for each journal in each year. Publication records for several journals were incomplete, ranging from approximately one year (e.g., Behavioural Processes - 1977) to thirteen years (e.g., Behavioral Ecology and Sociobiology - 1977-1982, 1985-1991; Table 1). Before proceeding with our analyses, we confirmed these records were missing from Biological Abstracts (E. Ten Have, BIOSIS, pers. comm.) and not the consequence of errors accumulated during data acquisition. Our final dataset covered 25 journals from 1968 to 2002 totaling 42,836 records of published material (Table 1; Fig. 1).

Results: In all, there are about half a million records, the bulk of which span the years 1968 through 2002. The downloaded, unparsed data was organized as follows:

The journals were classified into three categories:

  1. Core set: 14 journals; 15282 records. Some of the core journals are:
    Anim. Behav., Anim. Learn. Behav., Appl. Anim. Behav. Sci., Behaviour, Behav. Ecology, Behav. Ecology and Sociobiology, Behav. Processes, Bird Behav., Birds N. America, Ethology, Journal of Ethology, Journal of Experimental Psychology: Anim. Behav. Processes, Journal of Insect Behav., Learning and Motivation (subsequently, a set of 25 journals was also used as core-journals, see the statistical analysis section below for the listing)
  2. Priority set: 38 journals
  3. Selective set: 166 journals

The complete list of journals as categorized using the above is available at this link (text file, 8 Kb).

Format of a typical Record: The downloaded records primarily contain journal citation information that corresponds to the ISI (Institute for Scientific Information) data format. Prominent fields include: Title, Keywords, Author(s), Affiliation (addresses), Abstract, Species taxonomic information, publication Year, etc. among forty fields.

The complete description of most of the fields occurring in the data set is available at this link (html).

A typical record from the original downloaded journal-data is described next. Records are organized by serial numbers and the record header includes information about the source in the Biological Abstracts database. For example, the following is a record header -

Record 1 of 29 - Biological Abstracts 2001/07-2001/09

Record header is followed by the citation details namely, Title, Author, Address, Source or Journal name, Publication year, and so on. This data is in the following format -

Table 1:

--------------------------------------------------------------------------------
Field-name    Data
--------------------------------------------------------------------------------
TI: Additional material of the enigmatic golden mole Cryptochloris zyli, with notes on ...
AU: Helgen-K-M {a}; Wilson-D-E
AD: {a} Mammal Department, Museum of Comparative Zoology, Harvard University, Cambridge, MA, 02138: helgen@fas.harvard.edu, USA
SO: African-Zoology. [print] April, 2001; 36 (1): 10-112.
PY: 2001
--------------------------------------------------------------------------------
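
The record layout above (a two-letter field tag, a colon, then the data) lends itself to a simple line-by-line reader. The following is a minimal Python sketch, not the project's Perl parser: the function name parse_record and the rule that untagged lines continue the previous field are assumptions made for illustration.

```python
import re

# Two-letter field tag followed by a colon, e.g. "TI: ..." (layout taken from the sample record).
FIELD_RE = re.compile(r"^([A-Z]{2}):\s*(.*)$")

def parse_record(lines):
    """Return a dict mapping field tags (TI, AU, ...) to their text."""
    record = {}
    current = None
    for line in lines:
        match = FIELD_RE.match(line)
        if match:
            current = match.group(1)
            record[current] = match.group(2).strip()
        elif current and line.strip():
            # Assumption: an untagged, non-empty line continues the previous field.
            record[current] += " " + line.strip()
    return record

sample = [
    "TI: Additional material of the enigmatic golden mole Cryptochloris zyli, with notes on ...",
    "AU: Helgen-K-M {a}; Wilson-D-E",
    "SO: African-Zoology. [print] April, 2001; 36 (1): 10-112.",
    "PY: 2001",
]
record = parse_record(sample)
# Authors are separated by ';' and the "{a}" marker links an author to an address.
authors = [a.replace("{a}", "").strip() for a in record["AU"].split(";")]
```

Header lines such as "Record 1 of 29 - ..." carry no field tag and are ignored by this sketch until the first tagged line appears.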

There were several issues with the format and availability of the main fields within the records. The description of the main fields and issues pertaining to their extraction are available at this link (html file).


V. Data Extraction and Cleaning

Description: The analysis focused on the vocabulary chosen for titles and keywords of animal behavior publications. We used title-words to explore major changes in topics and study organisms of interest across the three decades included in our dataset. Keywords contained more detailed information and had the advantage of retaining compound terms, such as 'sexual selection' and 'parental care'. We used keywords for more detailed analysis of journal and geographic trends.

Several issues existed with the raw data which had to be addressed before proceeding with the analysis. These are mentioned here -

  1. Missing Data - The quality of the downloaded data depended on the sessions remaining active while using the WinSPIRS software. Some data may have been lost when the connection with the BA database broke prematurely. This led to gaps in the dataset, which were filled at a later stage.
  2. Lack of cited references - the data set did not include the cited references. The analysis did not focus on cited references.
  3. Duplicate records - duplicate records existed for several entries and these had to be eliminated before analyses could be carried out.
  4. Missing entries - most records did not contain all the forty fields mentioned in the links above. Further, there were certain fields like the URL that occurred only in a very small percentage of the records.
  5. Split files - journals that had very large number of records were split into smaller files. These files were combined into a single file when extracting data for individual journals.

Procedure: In our analysis of title-words, we began by using parsers to remove uninformative words such as 'of' and 'the', identified from stop-word lists (available from Börner & Zhou 2001; Börner 2004). Keyword information is presented in Biological Abstracts as a string of semicolon-separated compound words (e.g., "anthophilous insects; breeding systems; climate severity; disturbance; evolution; habitat") and is found in two separate fields: DE or 'descriptors', and MI or 'misc. indicators'. We combined keywords from the DE and MI fields before conducting further analysis. For both title and keyword analyses, we also removed "behavio/ur" and "behavio/ural", which occurred often enough that they might obscure more subtle document associations.
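
As an illustration of the keyword preparation just described, here is a minimal Python sketch that merges the DE and MI strings and strips the "behavio(u)r(al)" words. It is not the original Perl code. The function name merge_keywords is hypothetical, and the choice to strip the word from inside compound terms (rather than dropping whole terms) is an assumption.

```python
import re

# Matches "behavior", "behaviour", "behavioral", "behavioural" (case-insensitive).
BEHAVIOR_RE = re.compile(r"\bbehaviou?r(al)?\b", re.IGNORECASE)

def merge_keywords(de_field, mi_field):
    """Combine the DE and MI strings into one list of cleaned, unique keywords."""
    keywords = []
    for field in (de_field, mi_field):
        for term in field.split(";"):
            term = BEHAVIOR_RE.sub("", term)
            # Collapse leftover whitespace and dangling hyphens.
            term = " ".join(term.split()).strip("- ")
            if term and term not in keywords:
                keywords.append(term)
    return keywords

merged = merge_keywords(
    "breeding systems; foraging behaviour",
    "habitat; Behavior; breeding systems",
)
```

Terms that consist only of the removed word disappear entirely, and a term repeated across DE and MI is kept once.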

Data was extracted from the ISI-formatted files in the following steps:

  1. The pipe-character ('|') was used as the record delimiter.
  2. The parser for data-extraction (perl file) was employed to collect the records from all 340 files into a single '|'-delimited file.
  3. The parser for duplicate-removal (perl file) was used to extract the unique records. Records were identified as duplicates if they produced the same string when the following fields (ISI acronyms) were concatenated:
    1. Title (TI)
    2. Year (PY)
    3. Author (AU)
    4. Descriptors (DE) - keywords
    5. Miscellaneous Descriptors (MI) - keywords
  4. All common and uninformative words were eliminated. The list of such stop-words is available at this link (text file)
  5. All the text characters were converted to upper-case to maintain uniformity.
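
The steps above can be sketched in Python as follows. This is an illustrative outline under stated assumptions, not the Perl parsers themselves: the function names are hypothetical, STOP_WORDS is a tiny stand-in for the linked stop-word list, '#' is used as the key separator following the 'title#year#author#keywords' description, and the pipe is assumed to separate fields within one output line.

```python
STOP_WORDS = {"OF", "THE", "AND", "IN", "A", "AN"}  # illustrative subset only
KEY_FIELDS = ("TI", "PY", "AU", "DE", "MI")

def record_key(record):
    """Concatenate the identifying fields, upper-cased, with '#'."""
    return "#".join(record.get(f, "").strip().upper() for f in KEY_FIELDS)

def unique_records(records):
    """Keep the first record seen for each concatenated key."""
    seen = set()
    unique = []
    for record in records:
        key = record_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique

def title_words(title):
    """Upper-case a title and drop stop-words."""
    return [w for w in title.upper().split() if w not in STOP_WORDS]

def to_pipe_line(record):
    """Serialize one record as a '|'-delimited line (assumed field order)."""
    return "|".join(record.get(f, "") for f in KEY_FIELDS)

records = [
    {"TI": "Song of the sparrow", "PY": "1994", "AU": "Smith-J"},
    {"TI": "song of the sparrow ", "PY": "1994", "AU": "Smith-J"},
    {"TI": "Foraging in ants", "PY": "1994", "AU": "Smith-J"},
]
deduped = unique_records(records)
words = title_words("Song of the sparrow")
line = to_pipe_line(deduped[0])
```

Upper-casing before comparison means records that differ only in letter case count as duplicates, which is a stricter reading than the source states.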

VI. Statistical Analysis

Description: We began our analysis by ascertaining the trends in the volume of publications over the years spanned by the available records. We hoped to identify spans of years that could be of particular interest.

Procedure: Unique records were isolated and counted, both for each individual journal and for all journals together, to generate the distributions discussed below.
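
A minimal Python sketch of this counting step is shown below. It assumes each unique record has already been reduced to a dictionary with hypothetical "journal" and "year" keys; in the raw data the journal name would first have to be extracted from the SO field.

```python
from collections import Counter

def count_by_journal_year(records):
    """Return (records per (journal, year), records per year over all journals)."""
    per_journal_year = Counter((r["journal"], r["year"]) for r in records)
    per_year = Counter(r["year"] for r in records)
    return per_journal_year, per_year

example_records = [
    {"journal": "Ethology", "year": 1994},
    {"journal": "Ethology", "year": 1994},
    {"journal": "Behaviour", "year": 1994},
    {"journal": "Behaviour", "year": 1995},
]
per_journal_year, per_year = count_by_journal_year(example_records)
```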

Results:

Set of 13 core journals - The initial data set consisted of records from 1968-2002. The number of records per year for the following journals, selected from the core set, is shown below:
Anim. Behav., Appl. Anim. Behav. Sci., Behav. Ecology, Behav. Ecology and Sociobiology, Behav. Processes, Journal of Ethology, Journal of, Journal of Insect Behav., Learning and Motivation

Figure 1: Number of journal publications for journals selected from the set of core journals

The following figure shows the trends in the core journal group. Two groups of years were examined, namely 1972-77 and 1996-2001. The blue bars correspond to these two groups:

Figure 2: Trends in publication in select core journals within Animal Behavior journals

The above figures show the growth of the Animal Behavior domain over the years.

The initial statistics focused on gathering information about the extent of the dataset and the possible fields that could be of interest. The data gathered is presented below:

  1. Publication Years: 43, spanning 1915-2002 (only partial data for the year 2003 was available)
  2. Titles: about 443,000 records that had Titles
  3. Keywords: 103,000 DE terms and 600,000 MI terms
  4. Authors: about 537,000 entries (duplicate or alternate author names were not removed)
  5. Journal Source (SO): about 240
  6. Institution addresses (AD): about 296,000 (geographical distribution was analyzed in detail later)

Parsers like get-unique_years (perl file) and get-frequency-yearwise (perl file) were used to determine the frequency of records pertaining to the above fields.

As a first analysis, journal articles published in three years (1994, 1997 and 2000) were selected. This was done to uncover the pattern of growth over the most recent decade while keeping the initial dataset manageable. The motivation was to discover whether the techniques could faithfully display the growth trends in the field when compared with the known trends in the domain. The following table presents the year-wise distribution of documents and keywords:

Table 2:

  Year                                1994   1997   2000
  Number of documents                  648    778    740
  Number of unique keywords           1244   2324   2269
  Average number of unique keywords   1.92   2.99   3.06

Documents with zero or one keyword were excluded from the above set. The average number of keywords per document increases over time. The number of papers stays roughly the same, while the number of unique keywords nearly doubles between 1994 and 1997 and then levels off.

The gaps visible in the above figures were filled by downloading the latest data and appending it to the dataset. With a more complete dataset, the analysis was carried out again. For this analysis, the records were divided into four groups, each covering a particular span of years. The groups and the number of unique records available for each are as follows:

Table 3:

  Year span (Groups)    2000-2002   1990-1992   1980-1982   1970-1972
  Number of documents        5668        4223        2681        1345

The parser get-records-batchwise (perl file) was used to extract the data into the above mentioned groups.

Results:

Set of 25 core journals - The core journals were further organized into a set containing 13 of the original core journals (see listing in Datasets above) and a more comprehensive set of 25 journals. The 25 journals selected into the core set were as follows:

Journal of Comparative Psychology, Learning & Memory, Behavioral Ecology, Journal of Insect Behavior, Behavior Research Methods, Behavioral Neuroscience, Ethology Ecology & Evolution, Behavioural Processes, Behavioral Ecology and Sociobiology, Journal of Ethology, Ethology, Applied Animal Behaviour Science, Behavioural Brain Research, Bird Behavior, Behavioral and Brain Sciences, Journal of Experimental Psychology: Animal Behavior Processes, Aggressive Behavior, Animal Learning & Behavior, Physiology & Behavior, Learning and Motivation, Journal of the Experimental Analysis of Behavior, Hormones and Behavior, Behaviour, Animal Behaviour, Behavior Genetics.

Our final dataset covered 25 journals from 1968 to 2002, totaling 42,836 records of published material (see Figure 3).

Figure 3: Trends in publications in the larger set of core Animal Behavior journals (black dots). White dots indicate the journal count from the set of 13 journals after the database was updated for incomplete records.

The number of documents for each year was determined for the journals in the two sets (13 and 25 journals). The results are linked below:

  1. Year-wise document-frequency for set of 13 journals in the core set at this link
  2. Year-wise document-frequency for set of 25 journals in the core set at this link

VII. Analysis of Journal Coverage

Description: Journal coverage was determined by observing the terms occurring with high frequencies.

Preliminary observations -

Procedure: The words in bold in Table 4 represent the top keyword for the corresponding journal. The parser get-top-words-journal-wise (perl file) was used to determine the distribution of terms for individual journals.

Results: The first part of the domain analysis focused on identifying the major keywords within the core journals. The top ten keywords based on frequency of occurrence were obtained for the following core journals:

Table 4:

  Journal                                         Keywords
  --------------------------------------------------------------------------------
  Journal Of Animal Behavior                      Behavior, Aggression, Animal-Behavior, Evolution, Male, Female, Sexual Selection, Body-Size, Foraging, Reproductive-Success, Mate-Choice
  Journal Of Bird Behavior                        Aggression, Adult, Feeding, Male, Reproductive Success, Species Interaction, Vocalization, Brood Parasitism, Brooding, Brood Mates
  Journal Of Behavioral Ecology                   Behavior, Sexual Selection, Body Size, Predation Risk, Reproductive Success, Mate-Choice, Fitness, Parental Care, Male, Reproduction
  Journal Of Learning And Motivation              Behavior, Learning, Neural Coordination, Conditioning, Rat, Conditioned Stimulus, Motivation, Pavlovian-Conditioning
  Journal Of Applied Animal Behavior Science      Behavior, Animal Welfare, Animal-Behavior, Animal Husbandry, Stress, Aggression, Housing, Meeting-Abstract, Social-Behavior, Grazing, Feeding
  Journal Of Insect Behavior                      Oviposition, Female, Foraging-Behavior, Body-Size, Sexual Selection, Foraging, Reproduction, Male, Parasitoid
  Journal Of Behavioral Process                   Behavior, Learning, Abstract, Animal-Behavior, Nervous System, Reinforcement, Social-Behavior, Memory, Female, Foraging-Behavior, Aggression
  Journal Of Behavioral Ecology And Sociobiology  Sexual Selection, Reproductive-Success, Body-Size, Evolution, Female, Aggression, Male, Competition, Reproduction, Sperm Competition
  Journal Of Ethology                             Female, Aggression, Body-Size, Male, Dominance, Copulation, Spawning, Foraging, Mating-Behavior, Social Behavior

Further, top words for the three years (1994, 1997, 2000) were also determined likewise.

Table 5:

  Year   Keywords
  --------------------------------------------------------------------------------
  1994   Behavior, Animal-Behavior, Animal-Communication, Evolution, Mathematical-Model, Aggression, Foraging, Predation, Seasonality, Learning
  1997   Behavior, Male, Female, Reproduction, Ecology, Adult, Sexual Selection, Animal Husbandry, Evolution
  2000   Sexual Selection, Body-Size, Aggression, Reproductive-Success, Animal-Welfare, Predation-Risk, Territoriality, Mating-System, Mate-Choice, Competition

The words in bold represent the top keywords for the corresponding year group. The parser get-top-words-year-wise (perl file) was used to determine the above distribution.

Journal coverage of wider scope of journals -

Procedure: Word frequencies were determined by isolating unique terms from the keyword fields. For each of the 25 journals, the top ten most frequently occurring words were extracted. A total of 143 different terms were found and only a moderate degree of overlap across the journals was observed.
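
The per-journal ranking described above can be sketched in Python as follows. This is an illustration rather than the get-top-words-journal-wise parser: the function name top_keywords is hypothetical, each keyword is counted once per document, and ties are broken alphabetically, both of which are assumptions.

```python
from collections import Counter, defaultdict

def top_keywords(documents, n=10):
    """documents: iterable of (journal, keyword list). Returns journal -> top-n keywords."""
    counts = defaultdict(Counter)
    for journal, keywords in documents:
        counts[journal].update(set(keywords))  # count each keyword once per document
    top = {}
    for journal, counter in counts.items():
        # Sort by descending frequency, then alphabetically for a stable order.
        ranked = sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))
        top[journal] = [term for term, _ in ranked[:n]]
    return top

docs = [
    ("Ethology", ["aggression", "male"]),
    ("Ethology", ["aggression", "female"]),
    ("Learning and Motivation", ["learning", "conditioning"]),
    ("Learning and Motivation", ["learning", "aggression"]),
]
top = top_keywords(docs, n=1)
# Number of distinct terms across all top lists (the quantity reported as 143 above).
distinct_terms = set().union(*top.values())
```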

Results: When the most common keywords for all journals are pooled and ranked, terms referring to aggression, learning/memory, foraging and sexual selection top the list (see figure).

Journal keywords also show the existence of a continuum between serials that emphasize evolutionary ethology and those that publish comparative psychology. On one extreme, Behavioral Ecology and Behavioral Ecology & Sociobiology place an unusually strong emphasis on “sexual selection” and other terms related to reproduction and mating (see figure). Animal Behaviour, Behaviour and Ethology also publish articles on sexual selection, but add social behaviour, predation, foraging, communication (usually “vocalization”), and evolution. At the other extreme, nearly all popular keywords reported by Behavioural Processes and Animal Learning & Behavior address animal learning and memory (see figure).

Interestingly, Applied Animal Behaviour Science lies somewhere in between, including “aggression” and “vocalization” as prominent keywords, but also “stress” and “motivation”. Most of the journals that appear on our supplemented list of 25, but not in the core list from Animal Behaviour Abstracts, also appear somewhere in the middle. For example, physiology journals like Hormones & Behavior frequently publish studies about “sexual behaviour” and “aggression”, in addition to “stress”, “learning” and “photoperiod”.

The Journal of Comparative Psychology shows an unusually high peak, with “animal communication” appearing alongside research relating to “habituation”, “development” and “imitative learning”. These journals therefore form an important bridge between the academic descendants of early ethologists and comparative psychologists. They also tend to be more specialized than journals at the extremes, leading to a more highly skewed distribution of keywords and/or a larger proportion of unique terms (see figure). For example, the focus of Applied Animal Behaviour Science is on “animal welfare”, whereas Hormones and Behavior publishes more papers described by “neuro/endocrinology”.

 


VIII. Analysis of Semantic Document Space

Description: The semantic similarity between the documents representing the journal records was determined. Latent semantic analysis (LSA) [2], also called latent semantic indexing, was applied to determine semantic similarity among the documents based on their keywords. LSA extends the vector space model by modeling term-document relationships using a reduced approximation of the column and row space, computed by the singular value decomposition of the term-by-document matrix. The strength of LSA lies in addressing two fundamental problems of conventional lexical matching schemes, namely synonymy (different words with similar meaning) and polysemy (words with multiple meanings) [3].

Document-Term analysis refers to the application of LSA on the term-by-document (TD) matrix formed from the documents and their terms. The TD matrix has the following format:

  1. Columns are document number or id
  2. Rows are individual terms
  3. Cells of the matrix contain the number of times the particular term (say k-th term) occurs in a document (say m-th document)

Table 6: Term-Document matrix (rows: terms k = 1..K; columns: documents m = 1..M)

            Doc 1   Doc 2   ...   Doc M
  Term 1      2       0
  Term 2      1       2
  :
  Term K
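
To make the layout of the TD matrix concrete, here is a minimal Python sketch that builds it from keyword lists. The function name term_document_matrix and the alphabetical ordering of terms are assumptions for illustration. The example reproduces the counts shown for Term 1 and Term 2 in Table 6.

```python
def term_document_matrix(documents):
    """documents: list of keyword lists.
    Returns (terms, matrix) where matrix[k][m] is the number of times
    term k occurs in document m."""
    terms = sorted({term for doc in documents for term in doc})
    index = {term: k for k, term in enumerate(terms)}
    matrix = [[0] * len(documents) for _ in terms]
    for m, doc in enumerate(documents):
        for term in doc:
            matrix[index[term]][m] += 1
    return terms, matrix

example_docs = [
    ["foraging", "foraging", "predation"],   # Doc 1
    ["predation", "predation"],              # Doc 2
]
terms, td_matrix = term_document_matrix(example_docs)
```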

Singular Value Decomposition (SVD) is a matrix factorization that is used to determine the important latent dimensions. SVD analysis is performed using the LSA SVDPACKC provided by M. Berry [2]. SVDPACKC takes as input a sparse matrix in the Harwell-Boeing format (hbf). Intermediate files provide the important latent dimensions. The output of the package is a similarity matrix that specifies the similarity between the row and column entries. These values are normalized (matrix values divided by the highest value).
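
The idea behind this step can be sketched with NumPy. Note the substitutions: this sketch uses NumPy's dense numpy.linalg.svd in place of SVDPACKC, reads the matrix from memory rather than from a Harwell-Boeing file, and uses cosine similarity between documents in the reduced space, which is an assumption about how the similarity values were computed. The function name lsa_document_similarity is hypothetical.

```python
import numpy as np

def lsa_document_similarity(td_matrix, k):
    """td_matrix: terms x documents counts.
    Returns a documents x documents similarity matrix computed in the
    k-dimensional latent space and divided by its largest value."""
    A = np.asarray(td_matrix, dtype=float)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(s))
    docs = (np.diag(s[:k]) @ Vt[:k]).T            # one row per document
    norms = np.linalg.norm(docs, axis=1, keepdims=True)
    norms[norms == 0] = 1.0                       # avoid dividing by zero for empty documents
    unit = docs / norms
    sim = unit @ unit.T                           # cosine similarity
    peak = np.abs(sim).max()
    return sim / peak if peak > 0 else sim

# Documents 1 and 3 share identical keyword counts; document 2 differs.
counts = [[2, 0, 2],
          [1, 2, 1],
          [0, 3, 0]]
similarity = lsa_document_similarity(counts, k=2)
```

Keeping only the k largest singular values is what lets documents with related but non-identical vocabularies end up close together in the reduced space.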

Procedure: Data parsing, generation of unique terms and term vs. document frequency matrices, and similarity matrix computations were carried out using code available in the Information Visualization Repository at Indiana University. The LSA SVDPACKC provided by M. Berry [4] was applied to determine the most important latent dimensions. The most significant dimensions obtained for the three years are: 1994 (114 dimensions), 1997 (112 dimensions) and 2000 (114 dimensions).

Visualizations of the semantic relationships among similar documents were generated using the Pajek graph visualization software [1]. The Kamada-Kawai algorithm [5] implemented in Pajek was used to lay out the documents in a 2-dimensional space.

The parser get-records-batchwise (perl file) was used to extract the keywords from the DE and MI fields.

The semantic analysis was carried out in the following steps:

  1. Extract keywords into a semicolon (';') delimited file
  2. A Java parser was used to convert the data into a sparse format known as Harwell-Boeing format
  3. LSA was applied using the Java package available in SVDPACK to obtain a document-keyword similarity matrix
  4. The similarity matrix was converted into a Pajek format file using the parser get-keydoc-to-pajek-format (perl file)
  5. The Pajek format input file was viewed using the Pajek browser
  6. Within the browser, the Kamada-Kawai algorithm was used to obtain the clusters of various documents represented by nodes. Nodes were connected if they were similar based on the keywords as determined by LSA
  7. Initially, owing to very high similarity, the nodes were highly interconnected and resulted in a highly dense network of associations. In order to view only the most prominent clusters of nodes (documents), a weight or threshold value was used to eliminate edges below this threshold-value. This resulted in clusters of nodes
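
Steps 4 and 7 above (conversion to Pajek format and edge thresholding) can be sketched in Python as follows. This is not the get-keydoc-to-pajek-format parser: the function name to_pajek and the three-decimal weight format are assumptions. The output follows the basic Pajek .net layout of a "*Vertices" block followed by an "*Edges" block.

```python
def to_pajek(labels, similarity, threshold):
    """Return Pajek .net text with one undirected, weighted edge for every
    pair of nodes whose similarity is at or above the threshold."""
    lines = ["*Vertices %d" % len(labels)]
    for i, label in enumerate(labels, start=1):   # Pajek numbers vertices from 1
        lines.append('%d "%s"' % (i, label))
    lines.append("*Edges")
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            weight = similarity[i][j]
            if weight >= threshold:               # drop edges below the threshold
                lines.append("%d %d %.3f" % (i + 1, j + 1, weight))
    return "\n".join(lines) + "\n"

sim = [[1.0, 0.9, 0.1],
       [0.9, 1.0, 0.4],
       [0.1, 0.4, 1.0]]
net_text = to_pajek(["doc1", "doc2", "doc3"], sim, threshold=0.5)
```

Raising the threshold removes more edges, so fewer and tighter clusters remain after layout.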

Results: In the 1994 dataset, three clusters were identified (see figure). The first cluster (blue background) contains documents on parental behavior. The second cluster, in red, covers animal behavior and learning research. The gray cluster contains documents on feeding behavior.
The 1997 dataset contains three main clusters, with documents on aggressive social behavior, on sexual selection and mating behavior, and on sexual behavior (see figure).
In 2000, only two clusters were identified: mating and foraging behavior, and nesting behavior (see figure).

The document space was also analyzed using the words derived from the title (TI) field of the records. Title-words were extracted using the parser get-title-keywords (perl file). Titles include several commonly used words, such as articles, which must be excluded from the analysis. These words were removed by referring to the list of stop-words, which is available at this link (text file). The title analysis was carried out for two spans of years, namely 1972-77 and 1996-2001.


IX. Analysis of Co-word Occurrence Keyword Space

Description: Co-word (also Term-Term or co-Term) analysis focuses on the similarity between keyword pairs based on how frequently two or more keywords occur together. The Term-Term (TT) similarity matrix is obtained by considering all the unique keywords and the terms they appear with in the documents.

The TT matrix has the following format:

  1. Column headers are unique Terms or keywords
  2. Row headers are unique Terms or keywords
  3. Cell values indicate the number of times the k-th term occurs with the m-th term

Table 7: Term-Term matrix (rows: terms k = 1..K; columns: terms m = 1..M)

            Term 1   Term 2   ...   Term M
  Term 1      2        0
  Term 2      1        2
  :
  Term K

In order to analyze the change in the topics covered by the journals within the various groups, a keyword-keyword matrix was built using words that occurred together more than once. The similarity matrix was then converted into Pajek format using the parser get-keydoc-to-pajek-format (perl file) and visualized within the Pajek browser. The Fruchterman-Reingold 2D algorithm [6] was employed to visualize the network of keywords. Edges represented associations between keywords that occurred together. A threshold was used to obtain the prominent clusters.
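
The co-occurrence counting behind the keyword-keyword matrix can be sketched in Python as follows. The function name coword_counts is hypothetical, and counting each pair at most once per document is an assumption. The default min_count=2 mirrors the rule of keeping words that occurred together more than once.

```python
from collections import Counter
from itertools import combinations

def coword_counts(documents, min_count=2):
    """documents: list of keyword lists.
    Returns {(term_a, term_b): count} for pairs that co-occur in at least
    min_count documents, with each pair stored in alphabetical order."""
    pairs = Counter()
    for keywords in documents:
        for a, b in combinations(sorted(set(keywords)), 2):
            pairs[(a, b)] += 1
    return {pair: n for pair, n in pairs.items() if n >= min_count}

keyword_docs = [
    ["evolution", "foraging", "male"],
    ["foraging", "evolution"],
    ["male", "female"],
]
cowords = coword_counts(keyword_docs)
```

Each retained pair corresponds to one weighted edge in the keyword network.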

Results: The keyword BEHAVIOR emerged as the central, highly interconnected node, bridging different characteristics of animals such as Aggression, Mating and Welfare in all three time slices.

For the year 1994, the most dominant areas of study were identified by: Animal Communication, Evolution and Foraging (see figure).

Similarly, for the year 1997, the most dominant areas of study were identified by: Reproduction and Evolution (see figure).

In 2000, the study of Sexual Selection dominated the research. Some of the co-occurring keywords also described the study of Natural Selection, Mating Success, and Courtship (see figure).


X. Burst Analysis

Description: Burst analysis [7] is a technique that analyzes documents to find features that have high intensity over limited periods of time. Rather than using plain frequencies of the occurrences of words, the algorithm employs a probabilistic automaton whose states correspond to the frequencies of individual words. State transitions correspond to points in time around which the frequency of the word changes significantly. Details of using this algorithm are available at this link (text file).

Procedure: The terms occurring in the titles were used for burst analysis. The terms were isolated using the parser p_extract_burst_title_kywds (perl file). As applied here, the burst detection algorithm focuses on the temporal intervals between repeated appearances of the same term. When a term is popular, it is used frequently and the time intervals between repeated appearances are short. The two-state form of the burst detection algorithm finds the model that best describes the data as a sequence of high (i.e., burst) and low episodes of popularity for each of the terms studied. ‘Weights’ are also calculated to allow direct comparison among bursts, for the same or different words, in terms of their relative prominence.
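
For readers who want to see the mechanics, the following Python sketch implements a two-state, gap-based burst model in the spirit of the algorithm cited as [7]. It is an illustration under stated assumptions, not the code used for this study: gaps between appearances are modeled as exponentially distributed, the burst state has a rate s times the base rate, entering the burst state costs gamma * ln(n), and the cheapest state sequence is found by dynamic programming. The parameter defaults s=2.0 and gamma=1.0 and all function names are assumptions, and burst weights are not computed.

```python
import math

def two_state_bursts(times, s=2.0, gamma=1.0):
    """times: strictly increasing occurrence times of one term.
    Returns one state (0 = low, 1 = burst) per gap between occurrences."""
    gaps = [b - a for a, b in zip(times, times[1:])]
    n = len(gaps)
    if n == 0:
        return []
    base = n / sum(gaps)
    rates = [base, s * base]                 # low rate and burst rate
    up_cost = gamma * math.log(n)            # cost of moving from state 0 to state 1

    def emit(state, gap):
        # Negative log-likelihood of an exponential gap at this state's rate.
        return rates[state] * gap - math.log(rates[state])

    cost = [0.0, math.inf]                   # the sequence starts in the low state
    back = []
    for gap in gaps:
        new_cost, pointers = [], []
        for state in (0, 1):
            candidates = [cost[prev] + (up_cost if state > prev else 0.0)
                          for prev in (0, 1)]
            prev = 0 if candidates[0] <= candidates[1] else 1
            new_cost.append(candidates[prev] + emit(state, gap))
            pointers.append(prev)
        cost = new_cost
        back.append(pointers)

    # Trace the cheapest path backwards.
    state = 0 if cost[0] <= cost[1] else 1
    states = [state]
    for pointers in reversed(back[1:]):
        state = pointers[state]
        states.append(state)
    states.reverse()
    return states

def burst_intervals(times, states):
    """Convert per-gap states into (start_time, end_time) burst intervals."""
    intervals, start = [], None
    for i, state in enumerate(states):
        if state == 1 and start is None:
            start = times[i]
        elif state == 0 and start is not None:
            intervals.append((start, times[i]))
            start = None
    if start is not None:
        intervals.append((start, times[-1]))
    return intervals

# A term that appears every 10 time units, except for a dense run from 30 to 36.
occurrences = [0, 10, 20, 30, 31, 32, 33, 34, 35, 36, 46, 56, 66]
states = two_state_bursts(occurrences)
bursts = burst_intervals(occurrences, states)
```

A larger gamma makes the model more reluctant to enter the burst state, so only longer or denser runs of short gaps are reported as bursts.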

Words such as “the” and “an” (and other such words given in this link (text file)) were excluded to obtain a total of 24,850 unique title-words. We further focused our attention on the 739 title-words that appear at least 100 times in the dataset. To these, we applied the burst detection algorithm, looking across the full 35 years of publications, to identify rapid increases and decreases in popularity (bursts) for each term through time.

The input for the burst detection algorithm was generated using p_extract_burst_input (perl file). Other parsers were used to generate keyword and year-wise frequencies. These parsers are available in this folder.

Results: The burst detection algorithm identified 506 bursts of popular title-words across 35 years. Bursts were regularly spaced, lasting a median of 4 years (mean±SE = 5.6±0.19). There were 470 title-words, such as “effect”, “role” or “difference”, that are difficult to interpret further. If we focus on the remaining 269 terms, there appear to be three vocabulary periods: pre-1985, 1985-1995, and post-1995 (see figure).

Of the 269 potentially meaningful terms, 200 reflect major topics of interest in animal behavior research (e.g., “operant”, “evolution”, “predation”). The words occurring within each of the three time periods cross disciplinary boundaries, indicating the continuing diversity of animal behavior research throughout the history of our field. The early periods, for example, show bursts from “shock”, “reinforcement”, “natal” and “testosterone”. “Guarding”, “genetic”, “anxiety”, and “opioid” all burst during the transition time period (1985-1995). “Receptor”, “anxiety”, “paternity” and “mate” all burst in the most recent time interval.

The 69 remaining terms refer to a type of animal (e.g., “rats” appears 5,550 times; see figure) and also reflect some meaningful shifts over the years. Before 1985, virtually all animal terms undergoing bursts of popularity are model organisms, including cats, monkeys, squirrels (which could also be ‘squirrel monkeys’), and chickens (Fig. 3b). In the 1985-1995 transition period, there are several bursts referring to insects that were not abundant earlier, especially Hymenoptera (e.g., bees, wasps, and ants) and Orthoptera (e.g., crickets, grasshoppers and katydids). In the mid-1990s, there is a sudden surge of interest in a more diverse group of domesticated animals and animals of economic importance (e.g., dogs, cows, deer).


XI. Geographical and Institutional Patterns

Description: This part of the work focused on the most recent publishing trends across the world. By examining the content published in the most popular journals of the animal behavior domain, the goal was to identify whether any common pattern exists in the research at a global level. The study was then extended to identify the most active North American institutions in the animal behavior domain.

Procedure: The most recent subset of the data, covering the years 2001-2002, was used for this analysis. The publication records were split into two datasets, one comprising the 13 core journals (670 records) and the other comprising all 25 journals (1806 records), and each was analyzed separately to identify the global contribution to these journals. Using the five-digit zip-code pattern found in US addresses as a marker, the papers in both datasets were first split into two categories: 1) US and 2) Non-US publications. After excluding records with missing country name abbreviations, the country names in the Non-US dataset were grouped into seven identifiable regions of the world. The eight geographical locations identified for both datasets (including North America) are shown here.

The 13 core-journal and 25 all-journal datasets were each split into these eight geographical locations, and the keywords from each region were used to identify research trends across the regions. Zip codes were also used as tags to identify the US institutions contributing to the core journals and to all 25 journals within the animal behavior domain. Zip-code frequency counts were calculated to determine the top 25 US institutions in each dataset. A similar analysis could not be replicated at a global level, as zip (or post) codes in many countries are less likely to be specific to particular institutions. In addition, variation in the abbreviated institution names and in their position within the address string made institutions difficult to isolate.

Parsers were used to isolate records for US-based institutions and to obtain information pertaining to specific zip codes and/or institutions. The parsers are available in this folder. The parser p_usa_inst.pl (perl file) was used to segregate records by US or Non-US origin. Zip-code frequencies were determined using the parser p_zipCodeFreq.pl (perl file).
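The two parsing steps can be illustrated with a short Python sketch. It is not a translation of the Perl parsers named above: the function names, the regular expression, and the sample addresses are all hypothetical. Note also that a bare five-digit pattern is only a rough heuristic, since several other countries use five-digit postal codes as well.

```python
import re
from collections import Counter

# Assumed pattern: a standalone five-digit code, optionally followed by
# a ZIP+4 extension. Only the five-digit part is captured.
US_ZIP = re.compile(r"\b(\d{5})(?:-\d{4})?\b")


def us_zip(address):
    """Return the first five-digit code found in the address, or None."""
    match = US_ZIP.search(address)
    return match.group(1) if match else None


def split_records(addresses):
    """Split address strings into (US, Non-US) lists by zip-code presence."""
    us, non_us = [], []
    for address in addresses:
        (us if us_zip(address) else non_us).append(address)
    return us, non_us


def top_zip_codes(addresses, n=25):
    """Return the n most frequent zip codes as (zip, count) pairs."""
    counts = Counter(z for z in map(us_zip, addresses) if z)
    return counts.most_common(n)


# Made-up addresses, for illustration only.
addresses = [
    "Dept Biol, Example Univ, Springfield, IL 62701 USA",
    "Dept Psychol, Example Univ, Springfield, IL 62701 USA",
    "Inst Zool, Sample Coll, Riverton, WY 82501 USA",
    "Dept Zool, Example Univ, Oxford OX1 3PS, England",
]
us, non_us = split_records(addresses)
print(len(us), len(non_us))
print(top_zip_codes(addresses, n=2))
```

Counting by zip code rather than by institution name sidesteps the variation in abbreviated names noted above, at the cost of occasionally merging or splitting institutions that share or span zip codes.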

Results:
Region-wise distribution: In 2001–2002, animal behavior publications were produced by researchers affiliated with institutions in every region of the world (see figure), with North America and Western Europe being the primary producers of animal behavior research. Keywords used by North America, Western Europe and Australia/New Zealand were remarkably similar, reflecting global agreement on popular topic areas in animal behavior (see figure).

The relatively few contributions from the remaining regions (South America, Eastern Europe, Asia and Africa) shared an emphasis on animal learning and domesticated animals, indicating strong interest in applied animal behavior research. Relative representation by South America and Asia was larger when considering all 25 journals, whereas Africa, the Middle East and Eastern Europe were better represented when only core ABA serials were considered.

Zipcode-wise distribution (US only): In the United States alone, we counted 1806 publications from first authors located at more than 487 different zip codes during the 2001–2002 period. Over 100 zip codes tallied five or more animal behavior publications, although it is possible that some zip codes do not identify unique institutions (e.g., some could be personal residences). Also, the major representation of some institutions may result from an unusually large number of contributions by single investigators in this 2-year period. The number of documents associated with each zip code also varied between the two lists (the 13 core journals and the complete list of 25). Nevertheless, several institutions known to have larger graduate programs or a greater number of animal behavior researchers account for a disproportionate share of publications by geographical location within the U.S. Statistics are available at this link.


References

  1. Batagelj, V. and Mrvar, A. Pajek: Program Package for Large Network Analysis. University of Ljubljana, Slovenia
  2. Landauer, T.K., Foltz, P.W. and Laham, D. 1998. Introduction to Latent Semantic Analysis. Discourse Processes, 25, 259-284
  3. Deerwester, S., Dumais, S.T., Furnas, G.W., Landauer, T.K. and Harshman, R. 1990. Indexing by Latent Semantic Analysis. Journal of the American Society for Information Science, 41 (6), 391-407
  4. Berry, M. 1993. SVDPACKC (Version 1.0) User's Guide, University of Tennessee Tech. Available online at http://www.netlib.org/svdpack/index.html
  5. Kamada, T. and Kawai, S. 1989. An Algorithm for Drawing General Undirected Graphs. Information Processing Letters, 31 (1), 7-15
  6. Fruchterman, T.M.J. and Reingold, E.M. 1991. Graph Drawing by Force-Directed Placement. Software: Practice and Experience, 21 (11)
  7. Kleinberg, J. 2002. Bursty and Hierarchical Structure in Streams. Proc. 8th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining