
Title (Main): My Complete Genome
Author: Zeeshan-ul-hassan Usmani
Provider: National Research Data Platform (국가연구데이터플랫폼)
Repository: National Research Data Platform (국가연구데이터플랫폼)
2021 | Overseas | Public | CC-BY | English

My Complete Genome

Zeeshan-ul-hassan Usmani’s Genome Phenotype SNPs Raw Data

Genomics is a branch of molecular biology that studies the structure, function, variation, evolution, and mapping of genomes. Several companies offer next-generation sequencing of human genomes, ranging from the complete 3 billion base pairs to a few thousand phenotype SNPs. I used 23andMe (which uses the Illumina HumanOmniExpress-24) to obtain my DNA's phenotype SNPs. I am sharing the entire raw dataset here with the international research community for the following reasons:

I am a firm believer in open datasets, transparency, and the right to learn, research, explore, and educate. I do not want to restrict the flow of knowledge over mere privacy concerns. Hence, I am offering my entire raw DNA data for the world to use for research without worrying about privacy. I call it a copyleft dataset.

Most of the test datasets available for research come from the western world, and we don't see much from developing countries. I decided to share my data to bridge the gap, and I expect others to follow the trend.

I would be the happiest man on earth if a life can be saved, knowledge can be gained, an idea can be explored, or a fact can be found using my DNA data. Please use it any way you wish.

Name: Zeeshan-ul-hassan Usmani

Age: 38 Years

Country of Birth: Pakistan

Country of Ancestors: India (Uttar Pradesh - UP)

File: GenomeZeeshanUsmani.csv

Size: 15 MB

Sources: 23andMe Personalized Genome Report

The research community is still making progress in this domain, and professionals agree that genomics is still in its infancy. You now have the chance to explore this novel domain via the dataset and become one of the few early adopters in genomics.

The dataset is a complete genome extracted from www.23andme.com and is represented as a sequence of SNPs. Each SNP is encoded with the following symbols: A (adenine), C (cytosine), G (guanine), T (thymine), D (base deletion), I (base insertion), and '_' or '-' if the SNP at a particular location is not accessible. It contains chromosomes 1-22, X, Y, and mitochondrial DNA.
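As a minimal sketch of how the symbol encoding described above could be inspected, the following Python snippet tallies allele symbols, no-calls, and SNPs per chromosome. The column names (`rsid`, `chromosome`, `position`, `genotype`) and the sample rows are assumptions made for illustration; they are not copied from GenomeZeeshanUsmani.csv, so check the file's actual header before adapting it.

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt: column names and rows are assumptions for illustration,
# not values taken from GenomeZeeshanUsmani.csv.
SAMPLE = """rsid,chromosome,position,genotype
rs0000001,1,1000,AA
rs0000002,1,2000,AG
rs0000003,X,3000,--
rs0000004,MT,4000,C
rs0000005,2,5000,DI
"""

VALID_SYMBOLS = set("ACGTDI")   # bases plus deletion (D) and insertion (I)
NO_CALL_SYMBOLS = set("-_")     # markers for an inaccessible SNP


def summarize(rows):
    """Count allele symbols and no-calls, and tally SNPs per chromosome."""
    symbols = Counter()
    per_chromosome = Counter()
    no_calls = 0
    for row in rows:
        genotype = row["genotype"].strip().upper()
        per_chromosome[row["chromosome"].strip()] += 1
        # A genotype that is empty or made only of '-'/'_' is a no-call.
        if not genotype or set(genotype) <= NO_CALL_SYMBOLS:
            no_calls += 1
            continue
        for symbol in genotype:
            if symbol in VALID_SYMBOLS:
                symbols[symbol] += 1
    return symbols, per_chromosome, no_calls


symbols, per_chromosome, no_calls = summarize(csv.DictReader(io.StringIO(SAMPLE)))
print(dict(symbols))
print(dict(per_chromosome))
print("no-calls:", no_calls)
```

To try this on the real file, the `io.StringIO(SAMPLE)` argument could be replaced with `open("GenomeZeeshanUsmani.csv", newline="")`, adjusting the column names to match the header actually present in the file.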

A complete list of the exact SNPs (base pairs) available and their data-set index can be found at

For more information about how the dataset was extracted, see https://api.23andme.com/docs/reference/#genomes

For a more detailed understanding of the dataset's content, see the description at https://api.23andme.com/docs/reference/#genotypes

Users are allowed to use, copy, distribute and cite the dataset as follows: “Zeeshan-ul-hassan Usmani, Genome Phenotype SNPS Raw Data File by 23andMe, Kaggle Dataset Repository, Jan 25, 2017.”

Useful Links
You may use the following human genome database sites for help:

GenBank - https://www.ncbi.nlm.nih.gov/genbank/
The Human Genome Project - https://www.genome.gov/hgp/
Genomes OnLine Database (GOLD) - https://gold.jgi.doe.gov
Complete Genomics - http://www.completegenomics.com/public-data/

Some ideas worth exploring:

Is the individual in question more susceptible to cancer?
Does he tend to gain weight?
Where is his place of origin?
Which gene determines a certain biological feature (cancer susceptibility, fat generation rate, hair color, etc.)?
How do these phenotype SNPs compare with other similar datasets from the western world?
What would be the likely cause of death for this person?
What are the most likely diseases/illnesses this person is going to face in his lifetime?
What is unique about this dataset?
What else can you extract from this dataset when it comes to personal traits, intelligence level, ancestry, and body makeup?
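One simple starting point for the comparison question above is genotype concordance: the fraction of shared SNPs at which two samples carry the same genotype. The sketch below is an illustration only. The rsIDs and genotypes are hypothetical, and the helper names `normalize` and `concordance` are introduced here rather than taken from any library. It assumes each sample has already been loaded into a dictionary mapping rsID to genotype string.

```python
def normalize(genotype):
    """Sort alleles so 'GA' and 'AG' compare equal; return None for a no-call."""
    g = genotype.strip().upper()
    if not g or set(g) <= {"-", "_"}:
        return None
    return "".join(sorted(g))


def concordance(sample_a, sample_b):
    """Return (fraction of matching genotypes, number of SNPs compared).

    Only SNPs present in both samples and called in both are compared.
    """
    compared = 0
    matches = 0
    for rsid in sample_a.keys() & sample_b.keys():
        a = normalize(sample_a[rsid])
        b = normalize(sample_b[rsid])
        if a is None or b is None:
            continue
        compared += 1
        if a == b:
            matches += 1
    rate = matches / compared if compared else 0.0
    return rate, compared


# Hypothetical genotypes for illustration; not taken from any real dataset.
this_dataset = {"rs0000001": "AG", "rs0000002": "CC", "rs0000003": "--", "rs0000004": "TT"}
other_dataset = {"rs0000001": "GA", "rs0000002": "CT", "rs0000003": "AA", "rs0000005": "GG"}

rate, compared = concordance(this_dataset, other_dataset)
print(f"{rate:.2f} concordance over {compared} shared, called SNPs")
```

Sorting the alleles is a design choice made here on the assumption that allele order within a genotype string carries no meaning. A comparison against a whole population would need allele frequencies rather than a single pairwise score.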

Sample Reports
Please check out the following reports to understand what can be done with this data.

Ancestry -

Weight Report -
  • #biology
  • #genetics

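Data Creation History

  • Data registration date: 2021-10-28
  • Embargo date: ~ 2022-02-11

Attribute Information

  • Subject classification = Life sciences
Attribute information is data collected from the provider and includes information beyond what DataON provides.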
