Category Archives: Computer engineering

Optilab SCI Talk S02E01 – An Introduction into Entity Detection

I am posting the first lecture of the second season of Optilab’s Science Talks. The recording was a pilot project, but from now on, all lectures will be professionally recorded and published.

The aim of this talk is to give a brief introduction to basic data mining methods, present the problem, and finally explain the current approach to uncovering entities in text. The next lecture will continue the topic by presenting the Hidden Markov Models algorithm and will be aired in May 2013.

Slides:

Google Hangout recording on Youtube:

How to integrate two independent jQuery libraries within a single page?

It has been a long time since my last post. That is not because I have nothing useful to write about, but more likely because I am too lazy to document the interesting stuff I do… Anyway, let’s get to the point!

Recently I had to integrate Lightbox 2 into a website that is run by a CMS. I had no FTP access, no access to the PHP source code, and so on. The only thing I could use was the CMS’s static content form.

Surprisingly, I could add the following code as the HTML content of the form:

<script type="text/javascript" src="http://zitnik.si/temp/lightbox/js/jquery-1.7.2.min.js"></script>
<script type="text/javascript">
// <![CDATA[
document.write("<link href='http://zitnik.si/temp/lightbox/css/lightbox.css' rel='stylesheet' />");
// ]]>
</script>
<script type="text/javascript" src="http://zitnik.si/temp/lightbox/js/lightbox.js"></script>

These lines just load the appropriate jQuery library (1.7.2, the version used by Lightbox 2), the CSS styles and finally Lightbox’s JavaScript code. Because I could not insert link tags directly into the content, I printed the tag using JavaScript. This has nothing to do with having multiple jQuery libraries on a page, but it was needed in my case and seems a nice workaround :).

The problem was that the CMS uses jQuery 1.5.2, but Lightbox needs 1.7.2. Because I could not upgrade the CMS’s copy to the newer version, I had to keep the two libraries separate. I also could not simply override the old one, because other parts of the CMS-generated page then stopped working. The separation can be achieved by storing the second library in a variable. To the inline script above I added the following:
var $jq172 = jQuery.noConflict(true);
As you may know, jQuery functions are usually called through $(). After this command, the previously loaded jQuery (1.5.2) is again available through $() and jQuery(), and the library loaded last is available as $jq172(). Passing true to noConflict restores both globals, $ and jQuery, to the previous library; without the parameter only $ is restored and jQuery keeps pointing to the newly loaded version.
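
To make the mechanism easier to follow, here is a minimal sketch of what noConflict does with the two globals. It does not load the real jQuery files: loadFakeJQuery and page are hypothetical stand-ins for a jQuery build and for window, and the logic is only assumed to mirror how the library hands the globals back.

```javascript
// Stand-in for a jQuery build: on load it remembers whatever was stored
// in the globals before it took them over.
function loadFakeJQuery(scope, version) {
  var previous$ = scope.$;
  var previousJQuery = scope.jQuery;

  function lib() { return version; }
  lib.version = version;
  lib.noConflict = function (deep) {
    // `$` is always handed back to its previous owner.
    if (scope.$ === lib) { scope.$ = previous$; }
    // `jQuery` is handed back only when `true` is passed.
    if (deep && scope.jQuery === lib) { scope.jQuery = previousJQuery; }
    return lib;
  };

  scope.$ = scope.jQuery = lib;
  return lib;
}

// `page` plays the role of `window` in this sketch.
var page = {};
loadFakeJQuery(page, "1.5.2");   // the CMS loads its own jQuery first
loadFakeJQuery(page, "1.7.2");   // the Lightbox copy then takes over both globals
var $jq172 = page.jQuery.noConflict(true);

console.log(page.$.version);      // 1.5.2
console.log(page.jQuery.version); // 1.5.2
console.log($jq172.version);      // 1.7.2
```

In this sketch the CMS scripts keep seeing 1.5.2 under both global names, while the newer copy survives only in the $jq172 variable.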

Lastly, I needed to make a minor change to lightbox.js so that the script uses the 1.7.2 library. Thanks to its nice coding style, I only needed to change line 46 to
$ = $jq172;
With all these changes in place I had a working Lightbox 2 and working CMS scripts, and everything was done without any change to the CMS code. The Lightbox library can then be used completely normally.

YouTrack4 installation on Ubuntu 12.04

YouTrack4 seems to be a free alternative to Atlassian’s JIRA. I use JIRA on production projects, and at first sight it seems far better than YouTrack4. The missing features I noticed immediately are task time tracking and a few other minor things.

Both YouTrack4 and JIRA are available as hosted services, but the download versions are cheaper. The smallest package of both is limited to 10 users. For YouTrack4 it is free, while for JIRA it costs $10 (still very little). All bigger packages cost more money – see the pricing pages.

As I will soon start a simple home project, I gave YouTrack4 a try. In the rest of this post I will describe how to install it as a service on Ubuntu 12.04.

STEP 1: Download the complete JAR bundle from http://www.jetbrains.com/youtrack/download/get_youtrack.html.

STEP 2: Run the bundle and test YouTrack4.

I copied the downloaded JAR to ~/startup/youtrack-4.0.2/youtrack-4.0.2.jar. Then I created the script ~/startup/runYoutrack4.sh, which starts the integrated Jetty web server on port 8082:

#!/bin/bash
cd /home/slavkoz/startup/youtrack-4.0.2
java -Xmx512m -Djava.awt.headless=true -jar youtrack-4.0.2.jar 8082

STEP 3: Run script runYoutrack4.sh as root. If you can use YouTrack4 at http://localhost:8082, then continue.

STEP 4: Create the init script /etc/init.d/youtrack4 and make it executable:

#!/bin/sh
### BEGIN INIT INFO
# Provides:          youtrack4
# Required-Start:    $remote_fs $syslog
# Required-Stop:     $remote_fs $syslog
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: Start daemon at boot time
# Description:       Enable service provided by daemon.
### END INIT INFO
#############################################################
# Init script for YouTrack4
#############################################################
# Defaults
SCRIPTNAME=/home/slavkoz/startup/runYoutrack4.sh
case "$1" in
start)
sudo $SCRIPTNAME start
;;
*)
echo "Usage: $0 start" >&2
exit 3
;;
esac
exit 0

STEP 5: Test the service by issuing the command /etc/init.d/youtrack4 start. If YouTrack4 starts, continue.

STEP 6: Set the service to start automatically at system startup using the command:

sudo update-rc.d youtrack4 defaults

STEP 7: If everything went well, restart your computer and YouTrack4 should be accessible at http://localhost:8082.

Enjoy managing your projects …

Optilab Tech Talk – “Ontology as NoSQL Database Schema”

Today I presented a HOT topic, ontologies and NoSQL, as a Tech Talk at Optilab d.o.o. I work at this company as a Junior Researcher, and Tech Talks are our internal lectures for co-workers. A typical talk covers something that one of us is working on or simply knowledge that someone would like to share.

The main questions I tried to address in my talk were:

  • WHAT NoSQL IS MISSING FOR GENERAL USE
  • HOW ONTOLOGY CAN HELP SOLVE THE PROBLEM

I see an ontology as an additional layer over a NoSQL database. It can provide a nice runtime-customizable schema and the SPARQL/Update language for easy data manipulation. I believe this is especially important when combining data from different sources, because after some time no one will know which relation or concept types the database contains. Another benefit is SPARQL support: through an endpoint, a user can run analyses directly. Furthermore, when the data is represented by an ontology, we can quickly move the database to another appropriate store, which is not so easy between raw NoSQL stores. For example, try to straightforwardly transfer data from a key-value store to a graph data store :).

FRI Summer School – “How to make your own Facebook”

I attended the FRI Summer School “How to make your own Facebook” from 9th to 13th July 2012.

The school was run mainly by some of the best Slovene open-source developers: Aleš Justin, Marko Lukša, Tomaž Cerar and Marko.

The initial project is available on GitHub: https://github.com/openblend. Throughout the week they taught us programming in Java EE on JBoss server 7. First they introduced us to the Git version control system, then to the IntelliJ IDEA IDE and the Maven build tool. The lessons continued with CDI, JSF, Servlets, EJB3, JPA and the integrated H2 SQL data store, debugging and testing techniques, and lots of tricks…

At the end I also published the app on OpenShift. This is a scalable PaaS with three free instances (JBoss, Database, Other environment) for your own application.

Those who stayed until the end of the lessons got a free ticket for the OpenBlend conference. This is a Slovene Java programming conference that will take place on 20th September 2012. I hope I will have time to attend it.

Justin also introduced us to his work on the new language Ceylon (runs on the JVM) and on CapeDwarf, a Google AppEngine implementation for JBoss. These are some new toys I need to try.

Android Tutorial: “Android: From Idea to Application”

This week I am going to attend “Swing Morja in Sonca 2012” – Summer Swing dance workshops and parties in Crveni Vrh, Croatia.

Usually attendees can contribute by presenting their work, knowledge, hobbies, etc. The organizers wished for a somewhat more technical workshop, so I prepared an Android tutorial.

The presentation can be seen below:

I hope the listeners will not fall asleep :).

I will also present a “Hello World” app on Android, and then the attendees will, in groups, design their own SMS (Swing Morja in Sonca) applications by sketching on paper.

The idea for a simple application, which I will also develop hands-on, is to parse memories from the official SMS 2012 site and show them (http://swingmorjainsonca.com/zapisi-spomine/):

Now I need to hurry, I am already a bit late…..

Some pictures:

My phone usage history and back to … Android!

In this post I will briefly present all the mobile phones I have owned and point out the pros and cons of the iPhone 4S. I believe Android phones provide the best user experience and features combined with great hardware.

My phone history:

  1. Ericsson GA-628 (Mobi Reglja) (1999-2000) My first phone, which I got from my parents. It got stolen at school.
  2. Sagem RC-815 (Mobi Čuk) (2000-2004) My second phone, cheap to use (I believe the price was about 17.000SIT < 100EUR).
  3. Siemens CX-65 (2004 – Summer 2006) A nice phone that had a camera. It took pictures at 640×480 resolution. It also featured an IR sensor, which I used to transfer data. Later I bought a USB cable and modified it so that it also charged the battery. The phone was then attached to a computer and, using Si.Mobil’s Blackberry package, provided internet to the other computers in the house (max speed was up to 250kbit/s) – see some of the first posts on this blog about this.
  4. Motorola E1070 (Summer 2006 – Summer 2009) I bought this phone in my first year at college. It featured UMTS connectivity, a mini USB connection with charging, a micro SD card slot (I used 1GB), a 1.3 Mpix camera and manual SyncML sync to Google. It was a great phone with a small outside screen.
  5. Nokia N95-8GB (Summer 2009 – Summer 2010) I bought this phone for 220EUR and sold it a year later for 160EUR. It was my first smartphone and featured all the functions I still need: call recording, auto profiles, auto mail, calendar & contacts sync, a 5Mpix camera with DVD quality recording, and Nokia Maps navigation (one of the greatest navigation systems!!). I sold it because the navigation had to be paid for, while it was free on newer phones.
  6. HTC Wildfire (Summer 2010) I bought this phone mainly to try Android and start some development projects – MobiFoto. The phone did not impress me, but it was cheap.
  7. Nokia 5230 (Summer 2010-May 2011) I bought this phone to compete at NokiaAppForum and to have free navigation. It featured free maps and a resistive touch screen. I gave this phone to my mom and she still uses it.
  8. Nokia E7 (May 2011-July 2011) I got this phone for free as a Developer Gift from Nokia. It was a great phone and had all the features I needed. The only downsides for me were the lack of auto-focus and a very enthusiastic Android community.
  9. HTC Desire (July 2011-10.6.2012) I decided to exchange the Nokia E7 for this phone. At first I regretted it a little, but after I discovered all the possibilities of rooted Android and the MIUI ROM, it was more than enough for me – online with everything I needed for 2 days. Some applications I used: RadioAlarm, CallRecorder, Tasker, EasyMoney, Foldersync, … After I bought my next phone, I gave it to my dad.
  10. iPhone 4S 16GB (10.6.2012-17.6.2012) Some friends of mine (maybe it is important to say that none of them had used Android before) were even more enthusiastic about the iPhone than Android users are about their phones. As I had wanted to buy it a few times before, had no annex at Si.Mobil and had a birthday, I decided to give it a try. It sure is a nice phone, but it lacks some features Android phones can provide (of course I also jailbroke it 😉).
  11. HTC One X (17.6.2012 -*) As I was not happy enough with my iPhone, I put an ad on bolha.com and exchanged it for this phone within a day ;). (From the expense point of view, I made a 59EUR profit, as the HTC One X costs 60EUR with the same package I needed in order to buy the iPhone.)
  12. TBD (expected in 2014-2016)

So, why did I go back to Android? I will list just some advantages and disadvantages of the iPhone 4S compared to the HTC Desire that come to mind:

Advantages:

  • All sounds (navigation guidance) are forwarded to Bluetooth when connected in a car. Music is also faded out when another sound is played.
  • Siri understands a lot of commands.
  • It is easy to jailbreak, and a lot of cracked apps can be tried using Installous.
  • TuneIn application really works and speakers are great.
  • It has sleek design.
  • All apps look very similar to each other and very smooth.

Disadvantages:

  • There are no auto profiles (like Tasker) and no call recording app (for iOS 5.1.1). Auto profiles were the main reason for the change: I love having the phone silenced automatically in meetings, near certain Wi-Fi networks, and so on. The iOS app AutoSilent I bought did not work as it should.
  • The phone is not silenced on put off.
  • It has a really small screen.
  • It does not have standard microUSB connectivity.
  • No “normal” disk drive connection.
  • No widgets.

Simple tutorial: Linked Data – the NoSQL driver?

In this demo I will show how semantic data can be easily created and manipulated using ontologies and NoSQL datastores. In the tutorial I will use:

  • Protege 4.2 beta: The tool to design (test) ontologies.
  • Fuseki: The SPARQL endpoint from the Apache Jena project. To run completely on a “real” NoSQL database, use the Jena TDB store (will be covered in further tutorials).
  • RelFinder: A simple tool for searching connections between instances in the provided semantic sources.

Prior knowledge to follow: Some basics about ontologies and NoSQL.

First we will design an ontology and fill it with some initial instances. Then we will save the output in RDF/XML format and import it into Fuseki, where we will run some simple queries. Finally, we will connect RelFinder to our Fuseki instance and search for relations between two instances.

Let’s begin…

1. Ontology development

We will develop a simple ontology schema that defines the Person and Car classes. A Person can sell or drive a Car. Some basic data properties we will include are name, address, type, etc. The ontology namespace will be http://zitnik.si/ontos/owl/ss2demo.

Some of the instances we will fill into the ontology:

  • Janice_Dickinson sells BMW_X6
  • Janice_Dickinson drives Volkswagen_Lupo
  • Janice_Dickinson Name “Janice”
  • Slavko_Zitnik sells Volkswagen_Lupo
  • Slavko_Zitnik Name “Slavko”

An example RDF/XML output from Protege can be downloaded here.
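
To show what these instances look like as plain triples, here is a small sketch that turns the list above into a SPARQL Update statement. It is an illustration only: toInsertData and facts are hypothetical names, the property drives is assumed to be named in the same way as sells, and quoting literals with JSON.stringify is a simplification that is good enough for plain names like these.

```javascript
// Namespace from the text; "#" separates it from the local names.
var NS = "http://zitnik.si/ontos/owl/ss2demo#";

// The sample instances as [subject, property, object, isLiteral].
var facts = [
  ["Janice_Dickinson", "sells",  "BMW_X6",          false],
  ["Janice_Dickinson", "drives", "Volkswagen_Lupo", false],
  ["Janice_Dickinson", "Name",   "Janice",          true],
  ["Slavko_Zitnik",    "sells",  "Volkswagen_Lupo", false],
  ["Slavko_Zitnik",    "Name",   "Slavko",          true]
];

// Build one INSERT DATA statement with a triple per row.
function toInsertData(rows) {
  var lines = rows.map(function (row) {
    // Literals are quoted, everything else becomes a prefixed name.
    var object = row[3] ? JSON.stringify(row[2]) : "ss2:" + row[2];
    return "  ss2:" + row[0] + " ss2:" + row[1] + " " + object + " .";
  });
  return "PREFIX ss2: <" + NS + ">\nINSERT DATA {\n" + lines.join("\n") + "\n}";
}

console.log(toInsertData(facts));
```

Such a statement needs a store that accepts updates, which is what the --update flag in the next step is for.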

2. SPARQL endpoint setup

At this step we will need to download Fuseki and run it in the following way:

./fuseki-server --update --mem /ds

 

With this command, we create an empty in-memory store, publish it as the “/ds” dataset and enable updating the database with new data. If everything goes well, we should have our server running at http://localhost:3030/. On that page, choose the Control panel link and then select the “/ds” source. In the File Upload section, select the .owl file from the previous step and upload it to the server. Now the database should be filled with some triples.

To view all the triples, go back to SPARQL Query form and run:

SELECT * { ?x ?y ?z}

 

To find out who sells the BMW_X6, run:

PREFIX ss2: <http://zitnik.si/ontos/owl/ss2demo#>
SELECT * { ?x ss2:sells ss2:BMW_X6}

 

Play a little bit with the queries. If you are new to SPARQL, search the internet for tutorials – there are lots of them.
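
The same queries can also be sent from a script instead of the web form. The sketch below only builds the request URL; it assumes the query service is reachable at /ds/query on localhost (the same path that is used in the RelFinder settings later in this post), and buildQueryUrl is a hypothetical helper name.

```javascript
// Query service of the "/ds" dataset; localhost is an assumption.
var ENDPOINT = "http://localhost:3030/ds/query";

// SPARQL over HTTP GET passes the query text in the "query" parameter.
function buildQueryUrl(endpoint, sparql) {
  return endpoint + "?query=" + encodeURIComponent(sparql);
}

var whoSellsX6 =
  "PREFIX ss2: <http://zitnik.si/ontos/owl/ss2demo#>\n" +
  "SELECT * { ?x ss2:sells ss2:BMW_X6 }";

var url = buildQueryUrl(ENDPOINT, whoSellsX6);
console.log(url);

// With a running server the URL could then be fetched, for example:
// fetch(url, { headers: { Accept: "application/sparql-results+json" } })
//   .then(function (res) { return res.json(); })
//   .then(function (data) { console.log(data.results.bindings); });
```

Encoding the whole query with encodeURIComponent keeps characters such as #, < and ? from being interpreted as parts of the URL itself.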

3. RelFinder demo

Now we will connect to our database via RelFinder.

  1. Go to http://www.visualdataweb.org/relfinder/relfinder.php.
  2. Click the properties button and remove all predefined sources.
  3. Create new config, named SS2:
    • Endpoint URI: http://YOUR_PUBLIC_IP:3030/ds/query (enable port forwarding on a router if you are behind NAT)
    • Check Don’t append /sparql
    • Check Use proxy
    • Select GET method
    • Add autocomplete URI: http://zitnik.si/ontos/owl/ss2demo#Name (The fields that will be supported by autocompletion on query input)
    • Click OK and test connection with Query tool.

Now if you choose input “Janice” and “Slavko”, you should get the following result:

Play a little bit with the visualization options. A nicer output can be achieved by hiding the type connections. In the result we can see that Slavko is selling the car that Janice has been driving.

Continue playing with semantics and ontologies, and stay tuned for the next tutorial, in which we will cover ontology manipulation via the Jena API with the TDB disk store.

Apache Nutch 1.4 – Form authentication [SOLVED]

While searching the internet, I found lots of people having problems with form-based authentication when using the Apache Nutch crawler. All the posts I found ended without a solution, so I am giving you one option here.

By default, Nutch uses the protocol-http plugin to retrieve pages. The protocol-httpclient plugin supports several HTTP authentication schemes out of the box and (still :() uses HttpClient v3.x. To use this plugin, you need to update the plugin.includes property in conf/nutch-site.xml. Credentials for specific hosts are read from the conf/httpclient-auth.xml file.

A good option is to define an XML element for storing form credentials. I, for example, used:

<credentials username="myUsn" password="myPass">
      <formscope loginPage="httpMethodPageUrl" 
                 className="si.zitnik.pathToClassName"
                 port="portNum" />
</credentials>

Then I edited the setCredentials method inside the Http class of the protocol-httpclient plugin to read the new type of credentials. In the method resolveCredentials I instantiate the class given by className and call its login function (build your preferred set of abstract classes/interfaces to make the procedure as generic as possible). In the plugin, httpclient uses the BROWSER_COMPATIBILITY cookie policy, so no further changes are needed.

The last thing is to write your own login class that accepts the previously read parameters and authenticates to the page. The easiest way is to write it directly inside the protocol-httpclient plugin. (If you want to write it somewhere else, you will need to modify the dependencies in the plugin’s build XML files.)

After that, enjoy crawling!