KAGGLE

Search results can be viewed separately as datasets, tables/figures, and software.

Datasets

66 results
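  • 2021 International Public English
    Pakistan Drone Attacks
    • Data provider: National Research Data Platform
    • Data repository:
    • Creator: Zeeshan-ul-hassan Usmani
    • Project name:
    • Principal investigator:
    • Project institution:
    • Ministry:
    • License type: CC-BY
    • Subject classification: Politics/Public Administration; Geography/Regional Studies/Tourism; Society/Anthropology/Welfare/Women's Studies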

    Context

    Pakistan Drone Attacks (2004-2016). The United States has targeted militants in the Federally Administered Tribal Areas [FATA] and the province of Khyber Pakhtunkhwa [KPK] in Pakistan via its Predator and Reaper drone strikes since 2004. Pakistan Body Count (www.PakistanBodyCount.org) is the oldest and most accurate running tally of drone strikes in Pakistan. The given database (PakistanDroneAttacks.CSV) has been populated mostly with data from Pakistan Body Count, built upon by canvassing open source newspapers, media reports, think tank analyses, and personal contacts in media and law enforcement agencies. We provide a count of the people killed and injured in drone strikes, including those who died later in hospitals or homes due to injuries caused or aggravated by drone strikes, making it the most authentic source for drone-related data in this region. We will keep releasing updates every quarter on this page.

    Content

    • Geography: Pakistan
    • Time period: 2004-2016
    • Unit of analysis: Attack
    • Dataset: The dataset contains detailed information on 397 drone attacks in Pakistan that killed an estimated 3,558 and injured 1,333 people, including 2,539 civilians.
    • Variables: Serial No, Incident Day & Date, Approximate Time of the attack, Specific Location, City, Province, number of people killed who were claimed to be from Al-Qaeda, number of people killed who were claimed to be from the Taliban, minimum and maximum count of foreigners killed, minimum and maximum count of civilians killed, minimum and maximum count of civilians injured, special mention (more details) and comments about the attack, and longitude and latitude of the location.
    • Sources: Unclassified media articles, hospital reports, think tank analyses and reports, and official government press releases.

    Acknowledgements & References

    Pakistan Body Count has been leveraged extensively in scholarly publications, reports, media articles and books. The website and the dataset have been collected and curated by the founder, Zeeshan-ul-hassan Usmani. Users are allowed to use, copy, distribute and cite the dataset as follows: “Zeeshan-ul-hassan Usmani, Pakistan Body Count, Drone Attacks Dataset, Kaggle Dataset Repository, Jan 25, 2017.”

    Past Research

    Zeeshan-ul-hassan Usmani and Hira Bashir, “The Impact of Drone Strikes in Pakistan”, Cost of War Project, Brown University, December 16, 2014

    Inspiration

    Some ideas worth exploring:
    • How many people were killed and injured per year in the last 12 years?
    • How many attacks involved the killing of actual terrorists from Al-Qaeda and the Taliban?
    • How many attacks involved women and children?
    • Visualize drone attacks on a timeline.
    • Find any correlation between the number of drone attacks and specific dates and times. For example, are there more drone attacks in September?
    • Find any correlation between drone attacks and major global events (US funding to Pakistan and/or Afghanistan, friendly talks with terrorist outfits by local or foreign governments).
    • How does the number of drone attacks compare between the Bush and Obama tenures?
    • How does the number of drone attacks compare with the global increase or decrease in terrorism?
    • Is there a correlation between the number of drone strikes and suicide bombings in Pakistan?

    Questions?

    For details, visit www.PakistanBodyCount.org or contact Pakistan Body Count staff at info@pakistanbodycount.org

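  • 2021 International Public English
    120 years of Olympic history: athletes and results
    • Data provider: National Research Data Platform
    • Data repository:
    • Creator: Randi H Griffin
    • Project name:
    • Principal investigator:
    • Project institution:
    • Ministry:
    • License type: CC-BY
    • Subject classification: NONE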

    Context

    This is a historical dataset on the modern Olympic Games, including all the Games from Athens 1896 to Rio 2016. I scraped this data from www.sports-reference.com in May 2018. The R code I used to scrape and wrangle the data is on GitHub. I recommend checking my kernel before starting your own analysis. Note that the Winter and Summer Games were held in the same year up until 1992. After that, they were staggered so that the Winter Games occur on a four-year cycle starting with 1994, then Summer in 1996, then Winter in 1998, and so on. A common mistake people make when analyzing this data is to assume that the Summer and Winter Games have always been staggered.

    Content

    The file athlete_events.csv contains 271116 rows and 15 columns. Each row corresponds to an individual athlete competing in an individual Olympic event (athlete-events). The columns are:
    • ID - Unique number for each athlete
    • Name - Athlete's name
    • Sex - M or F
    • Age - Integer
    • Height - In centimeters
    • Weight - In kilograms
    • Team - Team name
    • NOC - National Olympic Committee 3-letter code
    • Games - Year and season
    • Year - Integer
    • Season - Summer or Winter
    • City - Host city
    • Sport - Sport
    • Event - Event
    • Medal - Gold, Silver, Bronze, or NA

    Acknowledgements

    The Olympic data on www.sports-reference.com is the result of an incredible amount of research by a group of Olympic history enthusiasts and self-proclaimed 'statistorians'. Check out their blog for more information. All I did was consolidate their decades of work into a convenient format for data analysis.

    Inspiration

    This dataset provides an opportunity to ask questions about how the Olympics have evolved over time, including questions about the participation and performance of women, different nations, and different sports and events.
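
    The note about Winter and Summer Games sharing a year until 1992 can be checked directly from the Year and Season columns. The following is a minimal sketch, assuming rows shaped like athlete_events.csv. The sample rows, the athlete names, and the helper names seasons_by_year and shared_years are hypothetical and only illustrate the idea; they are not part of the dataset or the author's code.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows that reuse a few of the column names listed above.
# The athlete names are placeholders, not real records from the dataset.
SAMPLE = """ID,Name,Year,Season,City
1,Athlete A,1992,Summer,Barcelona
2,Athlete B,1992,Winter,Albertville
3,Athlete C,1994,Winter,Lillehammer
4,Athlete D,1996,Summer,Atlanta
"""


def seasons_by_year(rows):
    """Map each year to the set of seasons that appear in that year."""
    seasons = defaultdict(set)
    for row in rows:
        seasons[int(row["Year"])].add(row["Season"])
    return dict(seasons)


def shared_years(rows):
    """Return the sorted years in which both Summer and Winter rows appear."""
    both = {"Summer", "Winter"}
    return sorted(year for year, s in seasons_by_year(rows).items() if s == both)


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(shared_years(rows))  # [1992]
```

    To run the same check on the real file, the in-memory sample could be replaced by opening athlete_events.csv with csv.DictReader, assuming the file uses the column names given in the description.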

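  • 2021 International Public English
    FitBit Fitness Tracker Data
    • Data provider: National Research Data Platform
    • Data repository:
    • Creator: Mobius
    • Project name:
    • Principal investigator:
    • Project institution:
    • Ministry:
    • License type: CC-BY
    • Subject classification: Health and Medicine; Culture/Arts/Sports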

    Pattern recognition with tracker data: Improve Your Overall Health

    Content

    This dataset was generated by respondents to a survey distributed via Amazon Mechanical Turk between 03.12.2016 and 05.12.2016. Thirty eligible Fitbit users consented to the submission of personal tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring. Individual reports can be parsed by export session ID (column A) or timestamp (column B). Variation between outputs reflects the use of different types of Fitbit trackers and individual tracking behaviors and preferences.

    Starter Kernel(s)

    • Julen Aranguren: https://www.kaggle.com/julenaranguren/bellabeat-case-study
    • Anastasiia Chebotina: https://www.kaggle.com/chebotinaa/bellabeat-case-study-with-r

    Inspiration

    • Human temporal routine behavioral analysis and pattern recognition

    Acknowledgement

    Furberg, Robert; Brinton, Julia; Keating, Michael; Ortiz, Alexa. https://zenodo.org/record/53894#.YMoUpnVKiP9

    Some readings

    • How I analyzed the data from my FitBit to improve my overall health (https://www.freecodecamp.org/news/how-i-analyzed-the-data-from-my-fitbit-to-improve-my-overall-health-a2e36426d8f9/)
    • How can data from fitness trackers be obtained and analyzed with a forensic approach? (https://conferences.computer.org/eurosp/pdfs/EuroSPW2020-7k9FlVRX4z43j4uE2SeXU0/859700a499/859700a499.pdf)

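  • 2021 International Public English
    My Complete Genome
    • Data provider: National Research Data Platform
    • Data repository:
    • Creator: Zeeshan-ul-hassan Usmani
    • Project name:
    • Principal investigator:
    • Project institution:
    • Ministry:
    • License type: CC-BY
    • Subject classification: NONE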

    Context

    Zeeshan-ul-hassan Usmani’s Genome Phenotype SNPs Raw Data. Genomics is a branch of molecular biology that involves the structure, function, variation, evolution and mapping of genomes. Several companies offer next-generation sequencing of human genomes, from the complete 3 billion base pairs to a few thousand phenotype SNPs. I used 23andMe (using Illumina HumanOmniExpress-24) for my DNA’s phenotype SNPs. I am sharing the entire raw dataset here for the international research community for the following reasons:
    • I am a firm believer in open datasets, transparency, and the right to learn, research, explore, and educate. I do not want to restrict the flow of knowledge for mere privacy concerns. Hence, I am offering my entire DNA raw data for the world to use for research without worrying about privacy. I call it a copyleft dataset.
    • Most of the available test datasets for research come from the western world, and we don’t see much from under-developing countries. I thought to share my data to bridge the gap, and I expect others to follow the trend.

    I would be the happiest man on earth if a life can be saved, knowledge can be learned, an idea can be explored, or a fact can be found using my DNA data. Please use it the way you will.

    Content

    • Name: Zeeshan-ul-hassan Usmani
    • Age: 38 Years
    • Country of Birth: Pakistan
    • Country of Ancestors: India (Uttar Pradesh - UP)
    • File: GenomeZeeshanUsmani.csv
    • Size: 15 MB
    • Sources: 23andMe Personalized Genome Report

    The research community is still progressively working in this domain, and professionals agree that genomics is still in its infancy. You now have the chance to explore this novel domain via the dataset and become one of the few early adopters of genomics.

    The dataset is a complete genome extracted from www.23andme.com and is represented as a sequence of SNPs using the following symbols: A (adenine), C (cytosine), G (guanine), T (thymine), D (base deletions), I (base insertions), and '_' or '-' if the SNP for a particular location is not accessible. It contains chromosomes 1-22, X, Y, and mitochondrial DNA. A complete list of the exact SNPs (base pairs) available and their dataset index can be found at https://api.23andme.com/res/txt/snps.b4e00fe1db50.data. For more information about how the dataset was extracted, see https://api.23andme.com/docs/reference/#genomes. For a more detailed understanding of the dataset content, see the description at https://api.23andme.com/docs/reference/#genotypes.

    Acknowledgements

    Users are allowed to use, copy, distribute and cite the dataset as follows: “Zeeshan-ul-hassan Usmani, Genome Phenotype SNPS Raw Data File by 23andMe, Kaggle Dataset Repository, Jan 25, 2017.”

    Useful Links

    You may use the following human genome database sites for help:
    • GenBank - https://www.ncbi.nlm.nih.gov/genbank/
    • The Human Genome Project - https://www.genome.gov/hgp/
    • Genomes OnLine Database (GOLD) - https://gold.jgi.doe.gov
    • Complete Genomics - http://www.completegenomics.com/public-data/

    Inspiration

    Some ideas worth exploring:
    • Is the individual in question more susceptible to cancer?
    • Does he tend to gain weight?
    • Where is his place of origin?
    • Which genes determine certain biological features (cancer susceptibility, fat generation rate, hair color, etc.)?
    • How do these phenotype SNPs compare with other similar datasets from the western world?
    • What would be the likely cause of death for this person?
    • What are the most likely diseases or illnesses this person is going to face in his lifetime?
    • What is unique about this dataset?
    • What else can you extract from this dataset when it comes to personal traits, intelligence level, ancestry and body makeup?

    Sample Reports

    Please check out the following reports to understand what can be done with this data:
    • Ancestry - https://www.23andme.com/published-report/eeb4f9bbd6b5474f/?share_id=f6c5562848e84586
    • Weight Report - https://you.23andme.com/published/reports/65c9af9f8223456d/?share_id=0126f129e4f3458b
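
    As a rough illustration of the symbol set described for this dataset (A, C, G, T, D, I, and '_' or '-' for inaccessible SNPs), the sketch below sorts genotype strings into three groups. The function name classify_call, the category labels, and the sample calls are assumptions made for this example. The column layout of GenomeZeeshanUsmani.csv is not described on this page, so the sketch works on bare strings rather than on the file itself.

```python
from collections import Counter

BASES = set("ACGT")    # adenine, cytosine, guanine, thymine
INDELS = set("DI")     # base deletions and base insertions
NO_CALL = set("_-")    # SNP not accessible at this location


def classify_call(genotype: str) -> str:
    """Label a genotype string as 'base', 'indel', 'no-call', or 'unknown'."""
    symbols = set(genotype.strip().upper())
    if not symbols or symbols & NO_CALL:
        return "no-call"
    if symbols <= BASES:
        return "base"
    if symbols <= BASES | INDELS:
        return "indel"
    return "unknown"


# Hypothetical genotype calls, not taken from the dataset.
sample_calls = ["AA", "CT", "--", "DI", "II", "G", "__"]
summary = Counter(classify_call(call) for call in sample_calls)
print(summary["base"], summary["indel"], summary["no-call"])  # 3 2 2
```

    A summary like this is one way to see how many positions carry usable base calls before attempting any of the trait or ancestry questions listed under Inspiration.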

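  • 2021 International Public English
    Quality Prediction in a Mining Process
    • Data provider: National Research Data Platform
    • Data repository:
    • Creator: EduardoMagalhãesOliveira
    • Project name:
    • Principal investigator:
    • Project institution:
    • Ministry:
    • License type: CC-BY
    • Subject classification: Energy/Resources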

    Explore real industrial data and help manufacturing plants to be more efficient

    Context

    It is not always easy to find databases from real-world manufacturing plants, especially mining plants. So, I would like to share this database with the community, which comes from one of the most important parts of a mining process: a flotation plant! PLEASE HELP ME GET MORE DATASETS LIKE THIS BY FILLING OUT A 30s SURVEY: https://airtable.com/shrJM8TYzNEMNALCv

    The main goal is to use this data to predict how much impurity is in the ore concentrate. As this impurity is measured every hour, if we can predict how much silica (impurity) is in the ore concentrate, we can help the engineers by giving them early information to take action (empowering!). Hence, they will be able to take corrective actions in advance (reduce impurity, if that is the case) and also help the environment (reducing the amount of ore that goes to tailings as you reduce silica in the ore concentrate).

    Content

    The first column shows the time and date range (from March 2017 until September 2017). Some columns were sampled every 20 seconds. Others were sampled on an hourly basis. The second and third columns are quality measures of the iron ore pulp right before it is fed into the flotation plant. Columns 4 to 8 are the most important variables that impact the ore quality at the end of the process. Columns 9 to 22 contain process data (level and air flow inside the flotation columns), which also impact ore quality. The last two columns are the final iron ore pulp quality measurements from the lab. The target is to predict the last column, which is the % of silica in the iron ore concentrate.

    Inspiration

    I have been working on this dataset for at least six months and would like to see if the community can help to answer the following questions:
    • Is it possible to predict % Silica Concentrate every minute?
    • How many steps (hours) ahead can we predict % Silica in Concentrate? This would help engineers to act in a predictive and optimized way, mitigating the % of iron that could have gone to tailings.
    • Is it possible to predict % Silica in Concentrate without using the % Iron Concentrate column (as they are highly correlated)?

Tables/Figures

0 results

No search results found.

Software

0 results

No search results found.