PACLING 2013

PACLING 2013: http://pacling.nak.ics.keio.ac.jp/

My Presentation:
Paper: https://docs.google.com/file/d/0B2CohiuaB08MZlhHMkFqN0VSZ2s/edit?usp=sharing
Slide: https://docs.google.com/file/d/0B2CohiuaB08MU1VxSWxDQ0FEaEk/edit?usp=sharing
Comments and Questions:
 * What is the definition of "common sense knowledge"?
    -> We define predicates (verbs, adjectives, and verbal nouns) as common sense knowledge.
 * What corpus did you use?
    -> We used the Japanese Google 7-gram data.
 * How did you evaluate the results?
    -> We manually evaluated whether the assigned predicates are correct.
 * What applications do you want to use the common sense knowledge base for?
    -> We want to use common sense knowledge in various NLP applications, such as conversation systems.
 * You can try using the case frame corpus constructed by Kyoto University: http://nlp.ist.i.kyoto-u.ac.jp/index.php?%E4%BA%AC%E9%83%BD%E5%A4%A7%E5%AD%A6%E6%A0%BC%E3%83%95%E3%83%AC%E3%83%BC%E3%83%A0

2013.09.02
 
Interesting Presentations
Two Issues in Syntactic Parsing
memo
- Coordinate Structure: This information helps improve parsing accuracy
   -> DP matching method for alignment (path-based method)
   -> Parse trees produced by the grammar rules (tree-based method)
       * The parse tree represents the coordinate structure directly
       * Tree score: the sum of the scores of all COORD/COORD nodes in the tree
- Grammatical Units
 * Multiword Expressions (MWE)
     - Lexicalized phrases & institutionalized phrases: collocations, named entities
     ? How to construct an MWE lexicon -> from Wiktionary
     ? How to construct an MWE-annotated corpus -> by annotating the Penn Treebank
     ? What to do
        -> Dictionary of semi-fixed and syntactically flexible MWEs
        -> MWE annotated corpus construction
        -> Parsing with MWE dictionary
 * Complex sentence patterns -> joint processing
    Investigation of clause pattern variations around the "SBAR" pattern
    Extraction of SBAR patterns from an automatically parsed English corpus, and grouping of the extracted patterns
        Corpus data: Hiragana Times (http://www.hiraganatimes.com/en/)
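"Parsing with MWE dictionary" usually begins by marking dictionary MWEs in the token sequence so the parser can treat each one as a single unit. The sketch below assumes that pre-processing step and uses greedy longest-match lookup; the lexicon entries and the function name merge_mwes are hypothetical, and only contiguous, fixed-form MWEs are handled (the semi-fixed and syntactically flexible MWEs from the talk would need a richer matcher).

```python
from typing import List, Set, Tuple

def merge_mwes(tokens: List[str], lexicon: Set[Tuple[str, ...]]) -> List[str]:
    """Greedily merge the longest dictionary MWE starting at each position.

    Only contiguous, fixed-form MWEs are matched (a simplifying assumption).
    """
    max_len = max((len(m) for m in lexicon), default=1)
    lowered = [t.lower() for t in tokens]
    out: List[str] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first, then shrink it down to 2 tokens.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            if tuple(lowered[i:i + n]) in lexicon:
                out.append("_".join(tokens[i:i + n]))  # one unit for the parser
                i += n
                break
        else:
            out.append(tokens[i])  # no MWE starts at this position
            i += 1
    return out

# Hypothetical toy lexicon; entries are stored lowercased.
LEXICON = {("by", "and", "large"), ("in", "spite", "of"),
           ("new", "york", "city"), ("new", "york")}

print(merge_mwes("In spite of the rain , New York City was by and large calm".split(), LEXICON))
# ['In_spite_of', 'the', 'rain', ',', 'New_York_City', 'was', 'by_and_large', 'calm']
```

Longest match makes "New York City" win over the shorter entry "New York"; a real system would also need to decide when a matched string is not actually used as an MWE in context.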
Coordinate Structure: Did you try a statistical approach? -> Yes, but we couldn't get good results.
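The path-based idea in the notes (DP matching to align the two conjuncts around a coordinating conjunction) can be sketched as a generic global alignment: every candidate pre-conjunct and post-conjunct is scored with a DP table, and the most similar pair wins. This is a minimal sketch under stated assumptions, not the presenter's scoring: the similarity function, the gap penalty, and all names here are hypothetical.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word, POS tag)
GAP = -1.0               # hypothetical gap penalty

def sim(a: Token, b: Token) -> float:
    """Hypothetical word similarity: same word > same POS > mismatch."""
    if a[0].lower() == b[0].lower():
        return 2.0
    if a[1] == b[1]:
        return 1.0
    return -1.0

def align_score(x: List[Token], y: List[Token]) -> float:
    """Needleman-Wunsch style DP alignment score of two candidate conjuncts."""
    n, m = len(x), len(y)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + GAP
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(dp[i - 1][j - 1] + sim(x[i - 1], y[j - 1]),  # align
                           dp[i - 1][j] + GAP,                          # skip in x
                           dp[i][j - 1] + GAP)                          # skip in y
    return dp[n][m]

def best_conjuncts(tokens: List[Token], cc: int) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Return inclusive (start, end) spans of the best pre/post conjuncts around tokens[cc]."""
    best = float("-inf")
    best_spans = ((cc - 1, cc - 1), (cc + 1, cc + 1))
    for s in range(cc):                       # pre-conjunct = tokens[s:cc]
        for e in range(cc + 1, len(tokens)):  # post-conjunct = tokens[cc+1:e+1]
            score = align_score(tokens[s:cc], tokens[cc + 1:e + 1])
            if score > best:
                best, best_spans = score, ((s, cc - 1), (cc + 1, e))
    return best_spans

sent = [("She", "PRP"), ("bought", "VBD"), ("red", "JJ"), ("apples", "NNS"),
        ("and", "CC"), ("green", "JJ"), ("pears", "NNS"), ("yesterday", "NN")]
print(best_conjuncts(sent, 4))  # ((2, 3), (5, 6)): "red apples" / "green pears"
```

Because longer spans can accumulate score, practical systems usually add length normalization or extra penalties; here the negative mismatch score and gap penalty are what keep the spans from growing.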

Thematic Representation of Short Text Messages with Latent Topics: Application in the Twitter context
URL: http://mohamedmorchid.fr/articles/pacling2013_mohamed_morchid.pdf
memo
 What is the merit of your output?
Example: input and output
Input: a single tweet
Output: additional information related to the tweet that helps the reader understand it
   e.g. In 1954, the NBA had no health benefits, no pension plan, no minimum salary, and the average player's salary was $8,000 a season
How do you process the Wikipedia articles? -> Time ran out before an answer.
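To make the input/output example concrete, here is a rough sketch of linking a tweet to related content through a shared topic space: the tweet and each candidate article are mapped to topic distributions, and the article with the highest cosine similarity is returned. It assumes a topic model (e.g. LDA trained on Wikipedia) already exists; the word-topic probabilities, article texts, and function names are all hypothetical, and scoring a topic by the summed p(w|z) of the words is a crude stand-in for proper LDA inference.

```python
import math
from typing import Dict, List

# Hypothetical p(w | z) values, standing in for a topic model assumed
# to have been trained beforehand (e.g. LDA on Wikipedia).
P_W_GIVEN_Z: List[Dict[str, float]] = [
    {"nba": 0.30, "salary": 0.20, "season": 0.20, "player": 0.30},  # topic 0: sport
    {"pension": 0.40, "salary": 0.30, "benefits": 0.30},            # topic 1: labour
    {"election": 0.50, "vote": 0.50},                               # topic 2: politics
]

def topic_vector(words: List[str]) -> List[float]:
    """Map a short text to a normalized topic distribution.

    Each topic is scored by the summed p(w|z) of the text's words
    (uniform prior over topics assumed); unknown words contribute nothing.
    """
    scores = [sum(topic.get(w, 0.0) for w in words) for topic in P_W_GIVEN_Z]
    total = sum(scores)
    return [s / total for s in scores] if total > 0 else scores

def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def best_article(tweet: str, articles: Dict[str, str]) -> str:
    """Return the title of the article whose topic vector is closest to the tweet's."""
    t = topic_vector(tweet.lower().split())
    return max(articles, key=lambda title: cosine(t, topic_vector(articles[title].lower().split())))

articles = {"NBA": "The NBA season and player salary history",
            "Elections": "Election vote results"}
print(best_article("NBA player salary this season", articles))  # NBA
```

Working in topic space rather than on raw words is what lets a very short tweet match an article even when few words overlap exactly.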

Bursty Topics in Time Series Japanese / Chinese News Streams and their Cross-Lingual Alignment
memo
I want to see a recall evaluation.
-> Planned as future work.
We think that the birthday topic is an important event between Japan and China.
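For readers unfamiliar with "bursty" topics: a topic bursts when its frequency in the news stream suddenly rises far above its recent level. The notes do not say which burst detector the authors used, so the sketch below swaps in a simple z-score threshold over a trailing window (not the paper's method); the window size, threshold k, and the std floor are hypothetical choices.

```python
import statistics
from typing import List

def bursty_days(counts: List[int], window: int = 7, k: float = 2.0) -> List[int]:
    """Return indices of days whose count exceeds mean + k * stdev of the previous `window` days."""
    bursts = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mean = statistics.mean(history)
        std = statistics.pstdev(history)
        # Floor the std at 1.0 so near-constant histories don't flag tiny jitters.
        if counts[i] > mean + k * max(std, 1.0):
            bursts.append(i)
    return bursts

# Daily article counts for one topic (toy data).
print(bursty_days([3, 4, 3, 5, 4, 3, 4, 20, 6, 4]))  # [7]
```

Running such a detector separately on the Japanese and Chinese streams and comparing the burst dates is one naive way to look for cross-lingual pairs; the paper's actual alignment method is not described in these notes.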


2013.09.03
 
Interesting Presentations
Using Heterogeneous Features for Scientific Citation Classification
memo

Researchers face an ever-increasing volume of literature in all fields
 -> Goal: help researchers distill knowledge from scientific citation networks more efficiently

Design of a Web-scale Japanese Corpus
memo 
NICT: Japanese Syntactic Dependency Database Version 1.1.
  - 480 million syntactic dependency relations in 600 million pages and 43 billion sentences
Kyoto University: Kyoto-U Case Frames (Version 1.0) in 2009
Tsukuba-U: Tsukuba Web Corpus
NDL: Web Archive Project
JpTenTen11
Yata: Japanese Web Corpus 2010

Heritrix Crawler (Version 3.1)
  - Developed by the Internet Archive (United States)
  - Used by national libraries (e.g. NDL in Japan)
NWC (Nihongo Web Corpus) Toolkit
Masuoka-Takubo POS target
Kokugo-ken Short Unit & Kokugo-ken Long Unit (chunker: CRF++)

I hope this corpus will be released publicly.
One issue is the copyright of the original web pages.

2013.09.04
 
Interesting Presentations
Extraction of Drug Information using Clue Words from Japanese Blogs
memo 
Extraction of medical information from patients' blogs
 * To get answers to these questions
 * To help decision making
TOBYO-jiten
Okusuri110ban (NGO website with drugs information)
  --- 12,170 illness-related nouns were retrieved automatically.
Does recall increase if you collect more examples?
   -> We think precision is more important than recall, because we want this system to be used by participants. So a small, correct dataset is more valuable than a large, noisy one.
Their approach also takes negative expressions into account.
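The clue-word idea can be illustrated with a small sentence-level extractor: a sentence that mentions a known drug and contains a clue word is reported, together with a flag for negation. This is a simplified sketch under loud assumptions: the drug list, clue words, and negation words are hypothetical, the examples are English for readability although the actual system works on Japanese blogs, and negation is checked over the whole sentence rather than its true scope.

```python
import re
from typing import List, NamedTuple

class DrugMention(NamedTuple):
    drug: str
    clue: str
    negated: bool

# Hypothetical resources: known drugs, clue words signalling an effect report, negations.
DRUGS = {"loxonin", "aspirin"}
CLUE_WORDS = {"worked", "helped", "side effect", "drowsy"}
NEGATIONS = {"not", "no", "never", "didn't", "doesn't"}

def extract(sentence: str) -> List[DrugMention]:
    """Return (drug, clue word, negated?) triples found in one blog sentence."""
    text = sentence.lower()
    tokens = re.findall(r"[a-z']+", text)
    negated = any(t in NEGATIONS for t in tokens)  # crude sentence-wide negation
    mentions = []
    for drug in sorted(DRUGS):
        if drug not in tokens:
            continue
        for clue in sorted(CLUE_WORDS):
            # Word boundaries avoid matching "worked" inside "overworked".
            if re.search(r"\b" + re.escape(clue) + r"\b", text):
                mentions.append(DrugMention(drug, clue, negated))
    return mentions

print(extract("Aspirin didn't help, and I got no side effect."))
# [DrugMention(drug='aspirin', clue='side effect', negated=True)]
```

Requiring both a known drug and an explicit clue word keeps precision high at the cost of recall, which matches the precision-first stance in the Q&A.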

Dependency-Based Method for Extracting Causes of Emotions
memo
Challenge: Recognition of implicit emotions from text
Novel method for extraction of emotion causes from sentences

How can this technology be used in everyday life? -> Analysis of blogs, news, and forums (for example, emotions could be useful for marketing)
In other languages? -> A similar system has been developed.
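As a rough illustration of a dependency-based approach, the sketch below finds an emotion word in a parsed sentence and returns the adverbial clause attached to it when that clause is introduced by a causal marker. The notes do not describe the authors' actual rules, so this is a hypothetical simplification: the emotion lexicon, the marker list, the Universal Dependencies-style labels (advcl, mark), and the hand-written parse are all assumptions.

```python
from typing import Dict, List, NamedTuple, Set

class Tok(NamedTuple):
    idx: int     # 1-based token index
    word: str
    head: int    # index of the head token, 0 for the root
    deprel: str  # UD-style relation label (assumed)

EMOTION_WORDS = {"happy", "sad", "angry", "afraid"}  # hypothetical lexicon
CAUSE_MARKERS = {"because", "since", "as"}           # hypothetical causal markers

def subtree(tokens: List[Tok], root: int) -> List[Tok]:
    """Collect token `root` and every token it dominates, in sentence order."""
    children: Dict[int, List[int]] = {}
    for t in tokens:
        children.setdefault(t.head, []).append(t.idx)
    keep: Set[int] = set()
    stack = [root]
    while stack:
        i = stack.pop()
        keep.add(i)
        stack.extend(children.get(i, []))
    return [t for t in tokens if t.idx in keep]

def emotion_causes(tokens: List[Tok]) -> List[str]:
    """Return clauses attached to an emotion word via advcl with a causal marker."""
    causes = []
    for emo in tokens:
        if emo.word.lower() not in EMOTION_WORDS:
            continue
        for clause in tokens:
            if clause.head != emo.idx or clause.deprel != "advcl":
                continue
            markers = [t for t in tokens if t.head == clause.idx and t.deprel == "mark"]
            if any(m.word.lower() in CAUSE_MARKERS for m in markers):
                words = [t.word for t in subtree(tokens, clause.idx) if t not in markers]
                causes.append(" ".join(words))
    return causes

# Hand-written parse of "I was happy because my friend visited me ."
SENT = [Tok(1, "I", 3, "nsubj"), Tok(2, "was", 3, "cop"), Tok(3, "happy", 0, "root"),
        Tok(4, "because", 7, "mark"), Tok(5, "my", 6, "nmod:poss"),
        Tok(6, "friend", 7, "nsubj"), Tok(7, "visited", 3, "advcl"),
        Tok(8, "me", 7, "obj"), Tok(9, ".", 3, "punct")]
print(emotion_causes(SENT))  # ['my friend visited me']
```

Following the dependency arc from the emotion word, rather than taking a fixed word window, is what lets the cause clause be recovered regardless of how far it is from the emotion word in the surface string.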

Comments