CS 474 Text Mining
Fall 2017

Instructor: Prof. Myaeng, Sung Hyon
Office: #604, N1
Phone: +82-42-350-3553
Fax: +82-42-350-3510
E-mail: myaeng@kaist.ac.kr
Home Page:
Office Hours: Tuesday 4:00-5:30 pm or by appointment
TAs: Taewon Yoon, Byunggyu Ahn, Jeongju Sohn
Class Hours: M & W 14:30 – 15:50
Classroom: #2445, E3-1

Textbooks:
1. S. Weiss, N. Indurkhya, & T. Zhang (2015). Fundamentals of Predictive Text Mining (2nd Edition), Springer. [Main textbook]
2. C. Manning, P. Raghavan, & H. Schütze (2008). Introduction to Information Retrieval, Cambridge University Press. [For greater detail on some topics, such as classification]
3. I. Witten, E. Frank, & M. Hall (2011). Data Mining: Practical Machine Learning Tools and Techniques (3rd Edition), Morgan Kaufmann Publishers. [For data mining algorithms and tools]

References:
1. M. Konchady (2006). Text Mining Application Programming, Charles River Media. [Well-balanced coverage with sufficient depth]
2. C. Manning & H. Schütze (1999). Foundations of Statistical Natural Language Processing, MIT Press. [Comprehensive yet in-depth coverage of statistical NLP]
3. C. Aggarwal & C. Zhai (Eds.) (2012). Mining Text Data, Springer. [A collection of chapters written by researchers; each chapter offers in-depth, highly technical coverage]
4. J. Silge & D. Robinson (2017). Text Mining with R, O'Reilly Media. [For hands-on experience]
5. Selected research papers (TBA)

Prerequisites: Data Structures, Basic Linear Algebra, Basic Probability & Statistics

Grading Policy:
Exams: 35%
Assignments/Quizzes: 20%
Term Project: 40%
Class Attendance & Participation: 5%

Course Schedule (subject to change):
Introduction: Text mining (TM) processes, key issues, & applications
Text Processing I: Building blocks & statistical properties
Text Processing II: Basic natural language processing (NLP) techniques
Information Retrieval & Text Mining: Text representation, ranking, & Web search
Text Categorization: Naïve Bayes & k-nearest neighbor classifiers, & evaluation
Text Representation: Decision-tree classifiers & feature selection/extraction for dimensionality reduction
Text Mining Tools and Projects
Midterm Exam
Application I: Sentiment analysis
Document Clustering: Similarity measures, k-means & hierarchical clustering, EM algorithm, evaluation
Topic Modeling: Matrix factorization & LDA
Information Extraction I: Entity extraction, sequential tagging, CRF
Information Extraction II: Relation extraction, open IE
Application II: Question answering (QA), knowledge base construction/augmentation
Neural Approaches to TM Processes: Text embedding methods, deep learning classifiers, learning to rank
Final Exam

Course Policies:
Classroom Courtesy: Turn off your smartphone (not even for reading course materials!). Laptops are allowed, but messenger-like and social media apps are not.
Lecture Materials & Announcements: Please check KLMS (KAIST Learning Management System) regularly for course materials and major announcements. Lecture slides will be uploaded at least 30 minutes before class.
Missed Quizzes and Exams: Missed or late quizzes cannot be made up under any circumstances; however, with good cause and adequate notice, a quiz may be taken early.
Assignments: All assignments are due at the beginning of class on the due date and are to be submitted via KLMS unless otherwise arranged. Late submissions will be penalized 10% per day. No exceptions are made.
Academic Dishonesty: Plagiarism and cheating are serious offenses and may be punished by failure on the exam or assignment, failure in the course, and/or expulsion from the University.
Need for Assistance: If you have any condition, such as a physical or learning disability, that will make it difficult for you to carry out the work as I have outlined it, or that will require academic accommodations, please notify me as soon as possible.