Non-Curated Distributed Databases for Experimental Data and Models
Most databases currently under development in neuroscience are heavily curated, in that substantial effort is required of database builders to read the literature and enter the distilled information into a database. This provides a valuable resource, but may be too labor-intensive to scale up to the huge volumes of data being generated at present. Here we consider the opposite end of the databasing spectrum, at which there is no curation, minimal standardization, and data remains under the control (intellectual and physical) of those who collected it. The Web, in conjunction with a search engine such as Google, takes non-curated distributed databasing to the extreme. Our approach is to impose slightly stricter constraints, specifically the requirement of some form of machine-readable record for each data file, in order to improve the efficiency of cataloging and searching. The first goal is to find a level of standardization that is adequate for automated cataloging and searching without being so strict as to discourage potential data providers. The second goal is to design the system in such a way that intellectual property remains clearly in the hands of the data providers and is not transferred to database maintainers. Most of the features commonly associated with curated databases, such as quality control and security, can also be developed for non-curated databases, though by quite different mechanisms. Here we describe the design and implementation of a pilot scheme focused on models, code modules, and already standardized data such as neuronal morphology files, but with applications to much more heterogeneous data sets.
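To make the idea of machine-readable records concrete, the sketch below pairs each data file with a JSON sidecar record and shows a harvester that catalogs and searches such records across a directory tree. The sidecar naming convention, field names, and helper functions are illustrative assumptions for this sketch, not a fixed standard of the pilot scheme.

```python
import json
import pathlib

# Hypothetical sidecar convention: a data file "name.swc" is accompanied by a
# machine-readable record "name.swc.meta.json" that stays under the
# provider's control.  The field names here are illustrative, not a standard.
EXAMPLE_RECORD = {
    "title": "Layer 5 pyramidal cell reconstruction",
    "data_file": "cell01.swc",
    "format": "SWC",             # an already standardized morphology format
    "provider": "Example Lab",   # intellectual property stays with the provider
    "contact": "pi@example.edu",
    "keywords": ["morphology", "pyramidal", "rat"],
}

def harvest(root):
    """Walk a directory tree and catalog every metadata record found.

    Malformed records are skipped rather than rejected, in keeping with the
    goal of minimal standardization and no curation step.
    """
    catalog = []
    for meta_path in pathlib.Path(root).rglob("*.meta.json"):
        try:
            record = json.loads(meta_path.read_text())
        except (OSError, json.JSONDecodeError):
            continue  # tolerate noncompliant providers
        record["_location"] = str(meta_path)  # catalog points back to the provider
        catalog.append(record)
    return catalog

def search(catalog, keyword):
    """Return records whose keyword list mentions the query term."""
    return [r for r in catalog if keyword in r.get("keywords", [])]

if __name__ == "__main__":
    # Write the example record next to an (empty) data file, then harvest.
    root = pathlib.Path("pilot_data")
    root.mkdir(exist_ok=True)
    (root / "cell01.swc").touch()
    (root / "cell01.swc.meta.json").write_text(json.dumps(EXAMPLE_RECORD))
    print(search(harvest(root), "morphology"))
```

Note that the catalog stores only pointers back to the providers' records, so the data themselves never move into the hands of the database maintainers; the tolerance for malformed records reflects the deliberate trade-off between ease of participation and strictness of standardization.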