OAS: Observed Antibody Space

About OAS

The Observed Antibody Space database (OAS) collates over a billion unique antibody sequences from over 80 different studies. The data is available for bulk download, or you can filter the sequences by metadata parameters using our search form.

OAS can be filtered according to attributes such as species, isotype and chain type. The filter fields do not constrain one another, so it is possible to choose a combination that matches nothing in our database (for instance, specifying an isotype together with light chain, which is impossible).

Data and Downloads

Data in the OAS database is organized into studies, which are in turn sub-divided into data-units. A single data-unit is a set of sequences uniquely identified by its metadata. The metadata parameters are:

Author: First author and date of publication.
Link: Link to the publication with the study.
Run: Run ID the sequence is derived from.
Subject: Indicates whether the B-cells can be tracked back to a particular individual.
Species: Organism of the B-cell donor.
Chain: Heavy/light chain annotation.
Isotype: Identified isotype information.
Age: Information on age of the human B-cell donors.
Disease: Indicates whether the donor was sick at the time of B-cell extraction.
Vaccine: Indicates whether the B-cell donor was purposely immunized prior to B-cell extraction.
B-cell subset: Indicates whether a particular B-cell subset was sorted for Ig-seq.
B-cell source: Which organ/tissue the B-cells were extracted from.
Longitudinal: If the study is conducted over a period of time, indicates the particular timepoint when B-cells were sourced.
Total sequences: Number of redundant sequences in the data-unit.
Unique sequences: Number of non-redundant sequences in the data-unit.
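
Once the metadata of several data-units is available locally, the parameters above can be used to select data-units programmatically. The following is a minimal Python sketch, assuming each data-unit's metadata has been loaded as a dictionary keyed by the parameter names listed above. The helper name `select_units` and all example values are hypothetical and are not part of OAS.

```python
def select_units(units, criteria):
    """Return the metadata dicts whose fields equal every given criterion."""
    return [
        unit for unit in units
        if all(unit.get(field) == wanted for field, wanted in criteria.items())
    ]


# Hypothetical metadata records; real keys and values may differ.
units = [
    {"Species": "human", "Chain": "Heavy", "Isotype": "IGHG"},
    {"Species": "human", "Chain": "Light", "Isotype": "Bulk"},
    {"Species": "mouse", "Chain": "Heavy", "Isotype": "IGHM"},
]

human_heavy = select_units(units, {"Species": "human", "Chain": "Heavy"})
# A combination that no data-unit satisfies simply yields an empty list.
no_match = select_units(units, {"Chain": "Light", "Isotype": "IGHG"})
```

As with the search form, nothing stops a caller from asking for a combination that does not exist; the result is then empty.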

After searching, there is an option to download all data-units that match your criteria. The download takes the form of a shell script containing consecutive wget commands. To download all the data, run the following commands (you might want to download the data-units in smaller sets, as the total size can exceed 700 GB):

chmod u+rx bulk_download.sh
./bulk_download.sh
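
Because the total size can be large, it may help to split the script into smaller batches and run them one at a time. The following is a minimal Python sketch, assuming `bulk_download.sh` holds one `wget` command per line. The function name, batch size and output file names are illustrative choices, not part of OAS.

```python
from pathlib import Path


def split_download_script(script_path, batch_size=50, out_dir="."):
    """Write the wget commands of script_path into numbered batch scripts."""
    lines = Path(script_path).read_text().splitlines()
    # Keep only the download commands; skip blank lines and anything else.
    commands = [line for line in lines if line.strip().startswith("wget")]
    batch_paths = []
    for start in range(0, len(commands), batch_size):
        batch = commands[start:start + batch_size]
        name = f"bulk_download_{start // batch_size + 1:03d}.sh"
        path = Path(out_dir) / name
        path.write_text("\n".join(batch) + "\n")
        batch_paths.append(path)
    return batch_paths
```

Each resulting file can then be run on its own, for example with `sh bulk_download_001.sh`.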

A series of .csv.gz files will then be downloaded to your current directory. The first line of each .csv.gz file holds the metadata for the data-unit; the following lines hold the sequences, one per line, each with its annotations.

The contents of a data-unit file can look like this:

sequence	locus	stop_codon	vj_in_frame	productive	rev_comp	v_call	d_call	j_call	sequence_alignment	germline_alignment	sequence_alignment_aa	germline_alignment_aa	v_alignment_start	v_alignment_end	d_alignment_start	d_alignment_end	j_alignment_start	j_alignment_end	v_sequence_alignment	v_sequence_alignment_aa	v_germline_alignment	v_germline_alignment_aa	d_sequence_alignment	d_sequence_alignment_aa	d_germline_alignment	d_germline_alignment_aa	j_sequence_alignment	j_sequence_alignment_aa	j_germline_alignment	j_germline_alignment_aa	fwr1	fwr1_aa	cdr1	cdr1_aa	fwr2	fwr2_aa	cdr2	cdr2_aa	fwr3	fwr3_aa	cdr3	cdr3_aa	junction	junction_length	junction_aa	junction_aa_length	v_score	d_score	j_score	v_cigar	d_cigar	j_cigar	v_support	d_support	j_support	v_identity	d_identity	j_identity	v_sequence_start	v_sequence_end	v_germline_start	v_germline_end	d_sequence_start	d_sequence_end	d_germline_start	d_germline_end	j_sequence_start	j_sequence_end	j_germline_start	j_germline_end	fwr1_start	fwr1_end	cdr1_start	cdr1_end	fwr2_start	fwr2_end	cdr2_start	cdr2_end	fwr3_start	fwr3_end	cdr3_start	cdr3_end	np1	np1_length	np2	np2_length	c_region	Redundancy	ANARCI_numbering	ANARCI_status
ACGGAGGTTTCT...	IGH	F	T	T	F	IGHV9-3*01	IGHD2-4*01	IGHJ3*01	CAGATCCAGTTGG...	CAGATCCAGTTG...	QIQLVQSGPELKKPG...	QIQLVQSGPELKKPG...	1.0	290.0	295.0	303.0	306.0	349.0	CAGATCCAGTTGGT...	QIQLVQSGPELKKPGETV...	CAGATCCAGTTGGT...	QIQLVQSGPELKKPGE...	GATTACGAC	DYD	GATTACGAC	DYD	GTTTGCTTACTGGG...	FAYWGQGTLVTVSA	GTTTGCTTACTGGG...	FAYWGQGTLVTVSA	CAGATCCAG...	QIQLVQ...	GGGTATAC...	GYTFTTYG	ATGAGCT...	MSWVKQ...	ATAAAC...	INTYSGVP	ACATATG...	TYVDDFK...	GCCCCC...	APDYDEFAY	TGTGCCC...	33.0	CAPDYDE...	11.0	450.572	17.992	85.286	309S290M150S4N	603S8N9M137S	614S4N44M91S	2.040000e-128	0.839300	1.019000e-20	99.655	100.0	100.0	310.0	599.0	1.0	290.0	604.0	612.0	9.0	17.0	615.0	658.0	5.0	48.0	310.0	384.0	385.0	408.0	409.0	459.0	460.0	483.0	484.0	597.0	598.0	624.0	CCCC	4.0	GA	2.0	GAGCGCGCG	7	{'fwh1': {'2': 'I', '3': 'Q' ...}, 'cdrh1': {'27': 'G', '28': 'Y',...} ...'cdrh3': {'105': 'A', '106': 'P', ...},	\|\|\|\|Shorter than IMGT defined: fw1, fw4\|
TGAAAACAACCT...	IGH	F	T	T	F	IGHV1-81*01	IGHD1-1*01	IGHJ1*03	CAGGTTCAGCTGC...	CAGGTTCAGCT...	QVQLQQSGAELARP...	QVQLQQSGAELARP...	1.0	294.0	305.0	318.0	322.0	373.0	CAGGTTCAGCTGC...	QVQLQQSGAELARPGAS...	CAGGTTCAGCTGCA...	QVQLQQSGAELARPGA...	TTTATTACTAC...	YYYG	TTTATTACTACGGT	YYYG	TACTGGTACTTCGAT...	YWYFDVWGTGTTVTVSS	TACTGGTACTTCGA...	YWYFDVWGTGTTVTVSS	CAGGTTCAG...	QVQLQ...	GGCTACAC...	GYTFTSYG	ATAAGCT...	ISWVKQR...	ATTTATC...	IYLRSGNT	TACTACA...	YYNEKFK...	GCAAGA...	ARWERFY...	TGTGCAA...	57.0	CARWERFYY...	19.0	456.805	27.605	100.667	115S294M170S	419S14M146S9N	436S1N52M91S	2.073000e-130	0.000825	1.897000e-25	99.660	100.0	100.0	116.0	409.0	1.0	294.0	420.0	433.0	1.0	14.0	437.0	488.0	2.0	53.0	116.0	190.0	191.0	214.0	215.0	265.0	266.0	289.0	290.0	403.0	404.0	454.0	TGGGAG...	10.0	TCT	3.0	GAGCGCGCG	2223	{'fwh1': {'1 ': 'E', '2 ': 'V', '3 ': 'Q', '4': ... }, 'cdrh1': {'27': 'G', '28': 'Y', '29': ...} ... cdrh3': {'105': 'A', '106': 'R', '107': ...}	\|\|\|\|Shorter than IMGT defined: fw4\|
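
The columns above can be combined to summarise a data-unit. The following is a minimal Python sketch, assuming each row has been read into a dictionary keyed by column name, that `productive` uses the `T`/`F` flags shown in the sample, and that `Redundancy` counts how many times a sequence was observed. The function name is hypothetical, and the example rows are cut down to three columns of the two records above.

```python
def summarise_productive(rows):
    """Count unique and redundancy-weighted productive sequences."""
    productive = [row for row in rows if row["productive"] == "T"]
    unique = len(productive)
    # Values read from a CSV file are strings, so convert before summing.
    total = sum(int(row["Redundancy"]) for row in productive)
    return unique, total


rows = [
    {"v_call": "IGHV9-3*01", "productive": "T", "Redundancy": "7"},
    {"v_call": "IGHV1-81*01", "productive": "T", "Redundancy": "2223"},
]
unique, total = summarise_productive(rows)
```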

A convenient way to parse the .csv.gz files is to extract the metadata as a JSON object and the sequences as a pandas DataFrame.
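
One way to separate the metadata line from the sequence records is sketched below using only the Python standard library. It assumes the layout described above: the first line holds the metadata as a JSON object (possibly wrapped in CSV quoting), the second line holds the column names, and the remaining lines are comma-separated records. The function name `read_data_unit` is hypothetical.

```python
import csv
import gzip
import json


def read_data_unit(path):
    """Return (metadata, rows) for one data-unit .csv.gz file."""
    with gzip.open(path, "rt", newline="") as handle:
        first_line = handle.readline()
        try:
            # Case 1: the metadata line is plain JSON.
            metadata = json.loads(first_line)
        except json.JSONDecodeError:
            # Case 2: the JSON is stored as a single quoted CSV cell.
            metadata = json.loads(next(csv.reader([first_line]))[0])
        # The next line is taken as the column header.
        rows = list(csv.DictReader(handle))
    return metadata, rows
```

With pandas installed, `pandas.read_csv(path, header=1)` should load the same records as a DataFrame, since it takes the second line as the column header and ignores the line before it.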

Downloading using Safari: In some cases Safari automatically unzips the .csv.gz files when downloading. This automatic unzipping can cause problems with the downloaded file and it is therefore recommended to either turn off automatic unzipping or use a different browser (e.g. Firefox or Chrome).

Contact

If you would like to contact us about anything related to OAS, please drop an email to oas_opig@stats.ox.ac.uk.
