Search results
Search results can be viewed by category: datasets, tables/figures, and software.
Datasets
68 results
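2020
Overseas
Public
Korean
COVID-19 in India
- Data provider: COVID-19
- Data repository:
- Creator:
- Project name:
- Principal investigator:
- Performing institution:
- Ministry:
- License type: CC-BY
- Subject classification: Health and medicine
- Citation count: 0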
Dataset on novel COVID-19 in India.
Coronaviruses are a large family of viruses which may cause illness in animals or humans. In humans, several coronaviruses are known to cause respiratory infections ranging from the common cold to more severe diseases such as Middle East Respiratory Syndrome (MERS) and Severe Acute Respiratory Syndrome (SARS). The most recently discovered coronavirus causes coronavirus disease COVID-19. (World Health Organization)
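2021
Overseas
Public
English
Pakistan Drone Attacks
- Data provider: National Research Data Platform
- Data repository:
- Creator: Zeeshan-ul-hassan Usmani
- Project name:
- Principal investigator:
- Performing institution:
- Ministry:
- License type: CC-BY
- Subject classification: Politics/Public administration; Geography/Regional studies/Tourism; Society/Anthropology/Welfare/Women
- Citation count: 0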
Context
Pakistan Drone Attacks (2004-2016)
The United States has targeted militants in the Federally Administered Tribal Areas [FATA] and the province of Khyber Pakhtunkhwa [KPK] in Pakistan via its Predator and Reaper drone strikes since 2004. Pakistan Body Count (www.PakistanBodyCount.org) is the oldest and most accurate running tally of drone strikes in Pakistan. The given database (PakistanDroneAttacks.CSV) has been populated mostly with data from Pakistan Body Count, built up further by canvassing open-source newspapers, media reports, think tank analyses, and personal contacts in media and law enforcement agencies. We provide a count of the people killed and injured in drone strikes, including those who died later in hospitals or homes due to injuries caused or aggravated by drone strikes, making it the most authentic source for drone-related data in this region. We will keep releasing updates every quarter at this page.
Content
- Geography: Pakistan
- Time period: 2004-2016
- Unit of analysis: Attack
- Dataset: The dataset contains detailed information on 397 drone attacks in Pakistan that killed an estimated 3,558 and injured 1,333 people, including 2,539 civilians.
- Variables: The dataset contains Serial No, Incident Day & Date, Approximate Time of the attack, Specific Location, City, Province, number of people killed who claimed to be from Al-Qaeda, number of people killed who claimed to be from Taliban, minimum and maximum count of foreigners killed, minimum and maximum count of civilians killed, minimum and maximum count of civilians injured, special mention (more details) and comments about the attack, and longitude and latitude of the location.
- Sources: Unclassified media articles, hospital reports, think tank analyses and reports, and official government press releases.
Acknowledgements & References
Pakistan Body Count has been leveraged extensively in scholarly publications, reports, media articles and books. The website and the dataset have been collected and curated by the founder, Zeeshan-ul-hassan Usmani. Users are allowed to use, copy, distribute and cite the dataset as follows: “Zeeshan-ul-hassan Usmani, Pakistan Body Count, Drone Attacks Dataset, Kaggle Dataset Repository, Jan 25, 2017.”
Past Research
Zeeshan-ul-hassan Usmani and Hira Bashir, “The Impact of Drone Strikes in Pakistan”, Cost of War Project, Brown University, December 16, 2014
Inspiration
Some ideas worth exploring:
• How many people were killed and injured per year in the last 12 years?
• How many attacks involved the killing of actual terrorists from Al-Qaeda and Taliban?
• How many attacks involved women and children?
• Visualize drone attacks on a timeline.
• Find any correlation between the number of drone attacks and a specific date and time; for example, are there more drone attacks in September?
• Find any correlation between drone attacks and major global events (US funding to Pakistan and/or Afghanistan, friendly talks with terrorist outfits by local or foreign governments).
• The number of drone attacks in the Bush vs. Obama tenures.
• The number of drone attacks versus the global increase/decrease in terrorism.
• Correlation between the number of drone strikes and suicide bombings in Pakistan.
Questions?
For details, visit www.PakistanBodyCount.org or contact Pakistan Body Count staff at info@pakistanbodycount.org
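The per-year and per-month questions listed under Inspiration could be approached along the following lines. This is only a minimal sketch: the record keys (`date`, `civilians_min`, `civilians_max`), the ISO date format, and every value are placeholders assumed for illustration, not rows or column names taken from PakistanDroneAttacks.CSV.

```python
from collections import defaultdict
from datetime import date

# Placeholder records: the keys and every value below are illustrative
# assumptions, not rows or column names from PakistanDroneAttacks.CSV.
RECORDS = [
    {"date": "2010-01-05", "civilians_min": 0, "civilians_max": 2},
    {"date": "2010-09-12", "civilians_min": 1, "civilians_max": 1},
    {"date": "2011-09-30", "civilians_min": 3, "civilians_max": 5},
]


def yearly_summary(records):
    """Return {year: (attack_count, min_total, max_total)}."""
    totals = defaultdict(lambda: [0, 0, 0])
    for rec in records:
        year = date.fromisoformat(rec["date"]).year
        totals[year][0] += 1                      # one row is one attack
        totals[year][1] += rec["civilians_min"]   # lower bound of the range
        totals[year][2] += rec["civilians_max"]   # upper bound of the range
    return {year: tuple(vals) for year, vals in sorted(totals.items())}


def attacks_per_month(records):
    """Return {month_number: attack_count} across all years."""
    counts = defaultdict(int)
    for rec in records:
        counts[date.fromisoformat(rec["date"]).month] += 1
    return dict(counts)


summary = yearly_summary(RECORDS)
by_month = attacks_per_month(RECORDS)
```

Keeping the minimum and maximum totals separate preserves the uncertainty that the dataset records as ranges, rather than collapsing each range to a single number.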
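2021
Overseas
Public
English
Quality Prediction in a Mining Process
- Data provider: National Research Data Platform
- Data repository:
- Creator: Eduardo Magalhães Oliveira
- Project name:
- Principal investigator:
- Performing institution:
- Ministry:
- License type: CC-BY
- Subject classification: Energy/Resources
- Citation count: 0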
Explore real industrial data and help manufacturing plants to be more efficient
Context
It is not always easy to find databases from real-world manufacturing plants, especially mining plants. So, I would like to share this database with the community, which comes from one of the most important parts of a mining process: a flotation plant!
PLEASE HELP ME GET MORE DATASETS LIKE THIS BY FILLING IN A 30s SURVEY: https://airtable.com/shrJM8TYzNEMNALCv
The main goal is to use this data to predict how much impurity is in the ore concentrate. As this impurity is measured every hour, if we can predict how much silica (impurity) is in the ore concentrate, we can help the engineers by giving them early information to act on (empowering!). Hence, they will be able to take corrective actions in advance (reduce impurity, if it is the case) and also help the environment (reducing the amount of ore that goes to tailings as you reduce silica in the ore concentrate).
Content
The first column shows the time and date range (from March 2017 until September 2017). Some columns were sampled every 20 seconds. Others were sampled on an hourly basis. The second and third columns are quality measures of the iron ore pulp right before it is fed into the flotation plant. Columns 4 to 8 are the most important variables that impact the ore quality at the end of the process. From column 9 to column 22, we can see process data (level and air flow inside the flotation columns), which also impact ore quality. The last two columns are the final iron ore pulp quality measurements from the lab. The target is to predict the last column, which is the % of silica in the iron ore concentrate.
Inspiration
I have been working on this dataset for at least six months and would like to see if the community can help answer the following questions:
- Is it possible to predict % Silica Concentrate every minute?
- How many steps (hours) ahead can we predict % Silica in Concentrate? This would help engineers act in a predictive and optimized way, mitigating the % of iron that could have gone to tailings.
- Is it possible to predict % Silica in Concentrate without using the % Iron Concentrate column (as they are highly correlated)?
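One way to frame the "how many steps ahead" question is to average the 20-second readings into hourly values and pair each hour's value with the silica measurement some hours later. The sketch below assumes a single process variable and uses made-up numbers; the function names and sample values are hypothetical and are not taken from the dataset.

```python
from statistics import mean

# 20-second sampling, as stated in the description, gives 180 readings per hour.
SAMPLES_PER_HOUR = 3600 // 20


def hourly_means(samples, per_hour):
    """Average consecutive, complete groups of `per_hour` readings."""
    last_start = len(samples) - per_hour
    return [mean(samples[i:i + per_hour]) for i in range(0, last_start + 1, per_hour)]


def make_supervised(features, target, steps_ahead):
    """Pair the feature of hour t with the target at hour t + steps_ahead."""
    return list(zip(features, target[steps_ahead:]))


# Hypothetical readings; 3 per "hour" keeps the demo short (real data would use 180).
air_flow = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
silica = [1.5, 1.7, 1.9]  # hypothetical hourly lab values

hourly = hourly_means(air_flow, per_hour=3)
pairs = make_supervised(hourly, silica, steps_ahead=1)
```

Each pair can then be fed to any regression model; increasing `steps_ahead` shortens the list of usable pairs, because the last hours have no future target to predict.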
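2021
Overseas
Public
English
Synthetic Financial Datasets For Fraud Detection
- Data provider: National Research Data Platform
- Data repository:
- Creator: Edgar Lopez-Rojas
- Project name:
- Principal investigator:
- Performing institution:
- Ministry:
- License type: CC-BY-SA
- Subject classification: Economics/Business
- Citation count: 0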
Context
There is a lack of publicly available datasets on financial services, especially in the emerging mobile money transactions domain. Financial datasets are important to many researchers, and in particular to us, performing research in the domain of fraud detection. Part of the problem is the intrinsically private nature of financial transactions, which leads to no publicly available datasets. We present a synthetic dataset generated using the simulator called PaySim as an approach to this problem. PaySim uses aggregated data from the private dataset to generate a synthetic dataset that resembles the normal operation of transactions, and injects malicious behaviour so that the performance of fraud detection methods can later be evaluated.
Content
PaySim simulates mobile money transactions based on a sample of real transactions extracted from one month of financial logs from a mobile money service implemented in an African country. The original logs were provided by a multinational company, which is the provider of the mobile financial service currently running in more than 14 countries around the world. This synthetic dataset is scaled down to 1/4 of the original dataset and was created just for Kaggle.
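2021
Overseas
Public
English
120 years of Olympic history: athletes and results
- Data provider: National Research Data Platform
- Data repository:
- Creator: Randi H Griffin
- Project name:
- Principal investigator:
- Performing institution:
- Ministry:
- License type: CC-BY
- Subject classification:
- Citation count: 0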
Context
This is a historical dataset on the modern Olympic Games, including all the Games from Athens 1896 to Rio 2016. I scraped this data from www.sports-reference.com in May 2018. The R code I used to scrape and wrangle the data is on GitHub. I recommend checking my kernel before starting your own analysis. Note that the Winter and Summer Games were held in the same year up until 1992. After that, they were staggered such that the Winter Games occur on a four-year cycle starting with 1994, then Summer in 1996, then Winter in 1998, and so on. A common mistake people make when analyzing this data is to assume that the Summer and Winter Games have always been staggered.
Content
The file athlete_events.csv contains 271116 rows and 15 columns. Each row corresponds to an individual athlete competing in an individual Olympic event (athlete-events). The columns are:
- ID - Unique number for each athlete
- Name - Athlete's name
- Sex - M or F
- Age - Integer
- Height - In centimeters
- Weight - In kilograms
- Team - Team name
- NOC - National Olympic Committee 3-letter code
- Games - Year and season
- Year - Integer
- Season - Summer or Winter
- City - Host city
- Sport - Sport
- Event - Event
- Medal - Gold, Silver, Bronze, or NA
Acknowledgements
The Olympic data on www.sports-reference.com is the result of an incredible amount of research by a group of Olympic history enthusiasts and self-proclaimed 'statistorians'. Check out their blog for more information. All I did was consolidate their decades of work into a convenient format for data analysis.
Inspiration
This dataset provides an opportunity to ask questions about how the Olympics have evolved over time, including questions about the participation and performance of women, different nations, and different sports and events.
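Because each row is one athlete in one event, counting athletes means counting distinct IDs, and grouping by Year alone merges Summer and Winter Games for the years in which both were held. The sketch below illustrates both points with Python's standard library; the sample rows are invented stand-ins for athlete_events.csv, and only the column names come from the description above.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows; only the header mirrors the 15 columns listed above.
SAMPLE = """ID,Name,Sex,Age,Height,Weight,Team,NOC,Games,Year,Season,City,Sport,Event,Medal
1,Athlete A,F,24,170,60,Team X,XXX,1992 Summer,1992,Summer,Barcelona,Swimming,Event 1,Gold
1,Athlete A,F,24,170,60,Team X,XXX,1992 Summer,1992,Summer,Barcelona,Swimming,Event 2,NA
2,Athlete B,M,30,NA,NA,Team Y,YYY,1992 Winter,1992,Winter,Albertville,Biathlon,Event 3,NA
3,Athlete C,M,22,180,75,Team Y,YYY,1994 Winter,1994,Winter,Lillehammer,Biathlon,Event 3,Silver
"""


def athletes_per_games(csv_text):
    """Count distinct athlete IDs per (Year, Season); one athlete can span many rows."""
    ids = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        ids[(int(row["Year"]), row["Season"])].add(row["ID"])
    return {key: len(value) for key, value in ids.items()}


def years_with_both_seasons(csv_text):
    """List years in which both Summer and Winter Games appear in the data."""
    seasons = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        seasons[int(row["Year"])].add(row["Season"])
    return sorted(year for year, found in seasons.items() if {"Summer", "Winter"} <= found)


counts = athletes_per_games(SAMPLE)
both = years_with_both_seasons(SAMPLE)
```

Grouping on the (Year, Season) pair, or on the Games column, avoids the staggering mistake mentioned in the Context section. Note that missing values appear as the literal string NA, so numeric columns such as Height need a check before conversion.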