
ACL 2013, Day 1

The Association for Computational Linguistics (ACL) conference is one of the top-ranked conferences in the field of natural language processing. This year I am attending ACL 2013 (http://www.acl2013.org/site/) in Sofia, Bulgaria. Surprisingly, the main conference sponsor is Baidu. My presentation is scheduled for Friday, 9 August 2013 at 16:30 (GMT+2) within the BioNLP Workshop's Gene Regulation Network Shared Task, which Marinka Žitnik and I won.

Yesterday I flew via Vienna to Sofia International Airport and took a cab to William Gladstone Street 44, where my hotel – Art’Otel – is located; I am staying here until Saturday. The hotel is nothing special, but it is good enough (in my opinion not worth all four stars) and quite close to the conference venue – a 10-minute walk.

First impressions of Sofia: I thought Bulgaria would be in very bad condition, but it is not. Cars are normal, people look European, and the streets are clean. I also like the calm of the city: there is no rush, it is not overcrowded with cars, and there are just enough people on the streets. The only thing one notices is that most of the buildings are older. For instance, the National Palace of Culture (NDK) is an enormous, very nice building, but it should be renovated to look more modern; the same goes for the park in front of it. So to conclude, the only thing Sofia needs, in my opinion, is building renovation.

Today, Sunday, August 4th, was tutorial day at the conference. There were four parallel tutorials in the morning and another four in the afternoon.

In the morning I attended the tutorial Variational Inference for Structured NLP Models by David Burkett and Dan Klein.

The tutorial was very informative and well presented. The focus of the talk was how to efficiently implement inference over a given factor graph with static structure. It started with an introduction to HMMs and then to different CRF types (linear-chain, arbitrary, tree-structured). First we were introduced to inference using Mean Field, and then to its approximation when trying to learn two interdependent labeling tasks. We continued with the problem of joint parsing and alignment. Lastly, we talked about (“loopy”) belief propagation and its use for inference in dependency parsing.
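To recall the HMM part of the intro, exact inference in a chain model is just dynamic programming. Here is a minimal forward-algorithm sketch with toy parameters of my own (nothing here is from the tutorial's slides):

```python
import numpy as np

# Minimal HMM forward algorithm: computes the probability of an
# observation sequence by summing over all hidden state paths.
# All parameter values below are toy numbers, just for illustration.

def forward(pi, A, B, obs):
    """pi: initial state probs (S,), A: transitions (S, S),
    B: emissions (S, V), obs: list of observation indices."""
    alpha = pi * B[:, obs[0]]          # init: P(state, first observation)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # recurse: sum over previous states
    return alpha.sum()                 # total probability of the sequence

# Toy model: 2 hidden states, 2 observation symbols.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])
print(forward(pi, A, B, [0, 1, 0]))
```

The same message-passing pattern generalizes from chains to the tree-structured and loopy factor graphs the tutorial covered.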

During the lunch break I went to Boom – apparently the best place to eat a burger in Sofia. I got this inside tip from my friend Didka (and there will be many more tips during my stay in Bulgaria :)).

In the afternoon I attended the tutorial Robust Automated Natural Language Processing with Multiword Expressions and Collocations by Valia Kordoni and Markus Egg.

The talk was about identifying multiword expressions, for example “take the clothes off”, which means the same as “undress”. There was no technical information about algorithms or approaches, just a plain history of research in this field, so I went to another tutorial session after the coffee break, even though I had not applied for it.

I moved to Exploiting Social Media for Natural Language Processing: Bridging the Gap between Language-centric and Real-world Applications by Simone Paolo Ponzetto and Andrea Zielinski.

This tutorial was a bit more interesting, but stayed at a very general level. Friends later told me that the first part was better, as more technical details were given. The second part was a review of work on entity and event extraction from Twitter, along with presentations of some practical systems. For example, the talk focused on extraction of person names, e.g. “Steve Jobs”, and events, e.g. “DEATH”. Two interesting systems were about earthquake reporting and location-based disease information aggregation.

In the evening there was a welcome reception at Sky Plaza – on top of the NDK. We got Bulgarian food, drinks and some live music. After a few hours of mingling I went back to the hotel, and here I am writing this post …

Optilab SCI Talk S02E01 – An Introduction into Entity Detection

I am posting the first lecture of the second season of Optilab’s Science Talks. The recording was a pilot project, but from now on all lectures will be professionally recorded and published.

The aim of this talk is to give a brief introduction to basic data mining methods, present the problem, and lastly explain the current solution for uncovering entities from text. The next lecture will continue the topic by presenting the Hidden Markov Model algorithm and will air in May 2013.

Slides:

Google Hangout recording on Youtube:

Marinka Žitnik (my sister) – Thesis Defense

Today Marinka Žitnik defended her undergraduate thesis titled “A Matrix Factorization Approach for Inference of Prediction Models from Heterogeneous Data Sources” (in Slovene: “Pristop matrične faktorizacije za gradnjo napovednih modelov iz heterogenih podatkovnih virov”), for which I sincerely congratulate her!!!

It is especially worth pointing out that in the undergraduate Interdisciplinary Study of Computer Science and Mathematics at the Faculty of Computer and Information Science and the Faculty of Mathematics and Physics of the University of Ljubljana she achieved, thesis included, a grade point average of 10.0, and on top of all her successes along the way she completed the programme in less than 4 years.

Of course I attended the defense and also recorded it (Chrome 6, Safari :); the same full-HD recording is also available at: http://zitnik.si/temp/ZagovorDiplome_MarinkaZitnik_20_07_2012.mp4):


A few pictures with her advisor (prof. dr. Blaž Zupan) and the committee:

Optilab Tech Talk – “Ontology as NoSQL Database Schema”

Today I presented a hot topic about ontologies and NoSQL as a Tech Talk at Optilab d.o.o., where I work as a Junior Researcher. Tech Talks are our internal lectures for co-workers; a typical lesson covers something one of us is working on, or knowledge someone simply wants to share.

The main problem I tried to address in my talk was:

  • WHAT NoSQL IS MISSING FOR GENERAL USE
  • HOW ONTOLOGY CAN HELP SOLVE THE PROBLEM

I see an ontology as an additional layer over a NoSQL database. It can provide a nice runtime-customizable schema and the SPARQL/Update language to easily manipulate data. I believe this is especially important when combining data from different sources – after some time, no one will know what relation or concept types the database contains. Another thing is SPARQL support – through an endpoint a user can run analyses. Furthermore, when data is represented by an ontology, we can quickly move the database to another appropriate store, which cannot be done so easily between raw NoSQL stores – for example, try to straightforwardly transfer data from a key-value store to a graph store :).
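To make the idea concrete, here is a toy sketch (not how anything is actually implemented at Optilab, and the concept and relation names are made up): data is kept as schema-free triples, and the “schema” lives entirely in the ontology terms used as predicates, so the same triples can be pattern-matched SPARQL-style and re-loaded into any other triple-capable store:

```python
# Toy sketch: a triple layer over a plain in-memory store.
# The concept/relation names (ex:Person, ex:worksFor) are invented;
# in practice they would come from the ontology.

class TripleStore:
    def __init__(self):
        self.triples = set()           # (subject, predicate, object)

    def add(self, s, p, o):
        self.triples.add((s, p, o))

    def query(self, s=None, p=None, o=None):
        """Pattern match; None acts as a wildcard (like a SPARQL variable)."""
        return [t for t in self.triples
                if (s is None or t[0] == s)
                and (p is None or t[1] == p)
                and (o is None or t[2] == o)]

db = TripleStore()
# Data from two "sources" lands in the same store; the ontology terms
# document what each triple means, long after the sources are forgotten.
db.add("alice", "rdf:type", "ex:Person")
db.add("alice", "ex:worksFor", "ex:Optilab")
db.add("bob", "rdf:type", "ex:Person")

# "Who works for Optilab?" -- analogous to a SPARQL basic graph pattern.
print(db.query(p="ex:worksFor", o="ex:Optilab"))
```

A real deployment would of course use a proper triple store with a SPARQL endpoint; the point of the sketch is only that the schema travels with the data.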

CAiSE Conference, Days 3-5

In the following days the main conference took place.

It began on Wednesday with Prof. Michele Missikoff’s talk “Looking at Future Research Challenges in Enterprise Information Systems“. He presented his work at the European Commission and pointed out two notions:

  • Liquid Enterprise: the company everyone works for (e.g. developing applications for the company’s market).
  • Glocal: “Think Global, Act Local”.

The second keynote speech was given by Dr. Krzysztof Kurowski on Thursday – “Challenges for Future Platforms, Services and Networked Applications“. He is a Polish researcher and gave a wonderful talk spanning past to future. His main research area is networking, so he presented the development of optical connectivity throughout Poland. His main conclusion was that a network is nothing without users and applications, i.e. information systems.

Some interesting lectures I remember:

  • H. Leopold: Generating Natural Language Texts from Business Process Models: They implemented a system that generates a textual description of a selected process model. First they identify the verb in an activity description and then use a predefined schema to build a sentence. The text is also indented according to activity level and concurrency.
  • E. Daskalaki: OtO Matching System: A multi-strategy approach to Instance Matching: The author presented an ontology-oriented matching system that leverages known matching algorithms and multiple metrics. The presented system looks similar to our Data Matching and Merging platform.
  • M.A. Kabir: SCIMS: A Social Context Information Management System for Socially-Aware Applications: The author presented a system that gathers data from multiple sources (e.g. LinkedIn, Facebook) and annotates it using an underlying ontology. To show the usefulness of the proposed system, they implemented an Android application that tells the user whether it is a good time to call someone or not.

On Thursday there was also a poster session. I remember a guy presenting an app for connecting text with its accompanying process model. Another interesting poster was about building and validating business process models by simulating the process in a 3D world – like Second Life.

Throughout the conference I have seen no methods for generating BPMs from texts, so that may be my topic for next year.

The conference ended on Friday at about 2 p.m. with some facts and the announcement of CAiSE 2013, which will take place in Valencia, Spain. Submission deadlines are already published, so it is time to write. My mentor, prof. dr. Marko Bajec, was also this year’s Workshop chair:

After the closing session we collected our luggage and arrived back in Slovenia at 9:30 p.m.

Some pictures:

CAiSE Conference, Day 2

Today I woke up early to finish the presentation I gave at 9:30 at the Doctoral Consortium – Collective Ontology-based Information Extraction using Probabilistic Graphical Models (Slavko Zitnik, University of Ljubljana, Faculty of Computer and Information Science, Slovenia):

I also listened to other PhD candidates later:

  • Participatory Quality Management of Ontologies in Enterprise Modelling (Nadejda Alkhaldi, VUB university of Brussels, Belgium)
  • Towards a new generation of security requirements definition methodology using ontologies (Amina Souag, Université de Paris1 Panthéon-Sorbonne, France)
  • Towards Automation of Enterprise Architecture Model Maintenance (Matthias Farwick, University of Innsbruck, Austria)
  • Software Component Allocation in Distributed Development Settings (Tommi Kramer, University of Mannheim, Germany)
  • Inconsistency Detection and Repair for Heterogeneous Multimodels (Reza Gorgan Mohammadi, AUT, Islamic Republic of Iran)

For the session after lunch I chose the ONTOSE Workshop because there were more presentations relevant to my research area:

  • Ontology Evolution with Semantic Wikis (Mauro Dragoni, Chiara Ghidini): They presented a new type of wiki – the evolution of such research from the Italian FBK – where an ontology is represented as a wiki site.
  • Using Open Information Extraction and LOD towards Ontology enrichment and alignment (Antonis Koukourikos, Pythagoras Karampiperis, George Vouros, Vangelis Karkaletsis)
  • Modeling the context of scientific information: mapping VIVO and CERIF (Leonardo Lezcano, Brigitte Jörg, Miguel-Angel Sicilia): VIVO is an American ontology and CERIF is European. They presented a mapping between the two and pointed out problems, for example mapping to multiple possible concepts, or to a concept that does not exist on the other side.

Lastly, I attended the final PhD consortium session. Two PhD candidates presented their ideas for managing clinical paths and diagnoses in Medical Informatics.

In the evening an Old Town visit was organized.

 

CAiSE Conference, Day 1

Today we started at 9 o’clock at Gdansk University. I attended the IWSSA Workshop (International Workshop on System/Software Architectures).

There were some lectures regarding IS architectures and multi-threading, but the talks most relevant to my research topic were:

  • An Architecture for Efficient Web Crawling (Inma Hernandez, Carlos R. Rivero, David Ruiz and Rafael Corchuelo, University of Seville, Spain)

They introduced the term “Virtual Integration” – being able to work with a website as if it were a database. The idea is to first use search engines to get hub pages and crawl them, then keep only the relevant ones, extract information from them, and present structured data to the user.

A hub is a page that has many links to relevant pages, along with many links to irrelevant ones.

Requirements for crawling:

  • Retrieval of relevant pages
  • Deep Web access
  • Efficiency (not to DoS the servers)
  • Unsupervised operation (to scale well – provide only a link to the main page and then work autonomously)

Crawling types: blind crawling, focused crawling, and classifier-based crawling – the most efficient.

In their work they try to classify a page based on its URL alone.
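As I understood it, URL-based classification boils down to matching a URL against per-type patterns learned from sample pages. A toy illustration (the URL patterns and page types below are invented, not the authors' actual method):

```python
import re

# Toy URL-based page classifier: match a URL against per-type patterns.
# The patterns here are made up for illustration; in the paper they
# would be learned from a small sample of crawled hub pages.
PATTERNS = {
    "product": re.compile(r"^https?://[^/]+/dp/\w+"),
    "review":  re.compile(r"^https?://[^/]+/review/\d+"),
    "author":  re.compile(r"^https?://[^/]+/author/[\w-]+"),
}

def classify(url):
    """Return the page type whose pattern matches, else None.
    Deciding from the URL alone means irrelevant pages are
    discarded without ever being downloaded."""
    for page_type, pattern in PATTERNS.items():
        if pattern.match(url):
            return page_type
    return None

print(classify("http://example.com/dp/B00X123"))  # matches the product pattern
print(classify("http://example.com/about"))       # matches nothing
```

This also makes the efficiency requirement above concrete: the classifier filters links before fetching, so the crawler never hammers the server with irrelevant requests.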

“Patricia tree” – a tree of the website’s structure

Evaluation: they used the top 41 sites on the Internet and 4 academic sites (DBLP, Google Scholar, MS Academic Search and ?). They collected tiny fractions of the whole sites – 100 hubs. They tried to discover the types of pages, e.g. Products, Reviews and Authors for Amazon. Over all sites they achieved an F-score of 95±3%.

They have an online demo: CALA Demo.

  • A Reference Architecture to Devise Web Information Extractors (Hassan A. Sleiman and Rafael Corchuelo, University of Seville, Spain)

The idea is that pages are encoded into HTML. Problems for IE:

  • Lots of techniques
  • None universally applicable
  • No development support tools
  • No validation consensus
  • No reference architecture

-> They propose a reference architecture. I believe it is nothing new. The presenter said it is different, and that tools like GATE, UIMA or CoreNLP are not useful here because this work targets semi-structured data.

Of course I used the time after the last session to prepare tomorrow’s presentation (I surely follow the quote attributed to Bill Gates: “To be a good professional engineer, always start to study late for exams cause it teaches you how to manage time and tackle emergencies.”) – postpone making the presentation and you may get some new ideas 🙂

In the evening we went to a gala dinner at the Sheraton hotel in Gdansk downtown. After dinner we also “touched” the Baltic Sea :).