Add README and modify license field in setup.py
fangpenlin committed Apr 15, 2011
1 parent fac0962 commit 5e010c3
Showing 2 changed files with 96 additions and 1 deletion.
95 changes: 95 additions & 0 deletions README.rst
@@ -0,0 +1,95 @@
What is loso?
=============

loso is a Chinese segmentation system written in Python. It was developed by Victor Lin ([email protected]) for Plurk Inc.

Copyright & License
===================

loso is copyrighted by Plurk Inc. It is open source, released under the BSD license.

Setup loso
==========

To install loso, clone the repository and run the following commands:

::

    cd loso
    python setup.py develop

You also need a running redis_ server, which stores the lexicon database. Then copy the configuration template and modify it:

::

    cp default.yaml myconf.yaml
    vim myconf.yaml

To use your configuration, set the ``LOSO_CONFIG_FILE`` environment variable to its path. For example:

::

    LOSO_CONFIG_FILE=myconf.yaml python setup.py serve
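
If you launch loso from another Python program rather than from a shell, the same variable can be passed through the child process environment. The sketch below assumes nothing about loso beyond the ``LOSO_CONFIG_FILE`` name; ``with_loso_config`` is a hypothetical helper, not part of loso.

.. code-block:: python

    import os


    def with_loso_config(path, base_env=None):
        """Return a copy of an environment mapping with LOSO_CONFIG_FILE set.

        Hypothetical helper, not part of loso: pass the result as env= to
        subprocess.run() to start a loso command with that configuration.
        """
        # Copy so the caller's mapping (or os.environ) is never modified.
        env = dict(os.environ if base_env is None else base_env)
        env["LOSO_CONFIG_FILE"] = path
        return env


    env = with_loso_config("myconf.yaml", base_env={"PATH": "/usr/bin"})
    print(env)  # {'PATH': '/usr/bin', 'LOSO_CONFIG_FILE': 'myconf.yaml'}

Copying the mapping means the configuration applies only to the child process, leaving the current process environment unchanged.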

.. _redis: https://redis.io/

Use loso
========

Loso determines segmentation according to the lexicon database, using an algorithm based on a Hidden Markov Model (HMM). Therefore, you must build a lexicon database before you can use the service.
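
To illustrate how a lexicon drives segmentation, here is a simplified, self-contained stand-in. It is not loso's algorithm and not an HMM: it uses a unigram language model over a tiny in-memory lexicon with made-up frequencies (``TOY_LEXICON`` is hypothetical), decoded with the same kind of dynamic-programming best-path search that an HMM's Viterbi decoder performs.

.. code-block:: python

    import math
    import unicodedata

    # Hypothetical toy lexicon: made-up word frequencies for illustration only.
    TOY_LEXICON = {
        "留下": 50, "鉅細靡遺": 5, "的": 500, "太空": 20, "太空梭": 10,
        "發射": 30, "影片": 40, "供": 20, "世人": 15, "回味": 10,
    }


    def segment(text, word_freq, max_word_len=4):
        """Return the most probable word sequence for text (unigram model)."""
        total = sum(word_freq.values())

        def log_prob(word):
            if word in word_freq:
                return math.log(word_freq[word] / total)
            if len(word) == 1:
                # Unknown single characters get a small floor probability,
                # so every input has at least one segmentation.
                return math.log(0.1 / total)
            return None  # unknown multi-character strings are not words

        n = len(text)
        best = [0.0] + [-math.inf] * n  # best[i]: best log-prob of text[:i]
        back = [0] * (n + 1)            # back[i]: start of last word in text[:i]
        for end in range(1, n + 1):
            for start in range(max(0, end - max_word_len), end):
                lp = log_prob(text[start:end])
                if lp is not None and best[start] + lp > best[end]:
                    best[end] = best[start] + lp
                    back[end] = start

        # Follow the back-pointers from the end to recover the words.
        words, end = [], n
        while end > 0:
            words.append(text[back[end]:end])
            end = back[end]
        words.reverse()
        # Drop punctuation tokens; the README's sample output omits the comma too.
        return [w for w in words if not unicodedata.category(w[0]).startswith("P")]


    print(" ".join(segment("留下鉅細靡遺的太空梭發射影片,供世人回味", TOY_LEXICON)))
    # 留下 鉅細靡遺 的 太空梭 發射 影片 供 世人 回味

Because each known word is far more probable than a run of unknown single characters, the search prefers longer lexicon entries such as 太空梭 over 太空 followed by 梭.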

To feed a text file into the lexicon database, run:

::

    python setup.py feed -f /home/victorlin/plurk_src/realtime_search/word_segment/sample_data/sample_tr_ch
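
The commands here do not describe which statistics ``feed`` stores in redis. As a hypothetical illustration of the raw material a lexicon builder can collect from a text, the sketch below counts every character n-gram into a ``collections.Counter`` that stands in for the database. The function name ``count_ngrams`` and the ``max_n`` limit are assumptions, not loso's API.

.. code-block:: python

    from collections import Counter


    def count_ngrams(text, counts, max_n=4):
        """Add the count of every character n-gram (1 <= n <= max_n) in text.

        Hypothetical helper: counts is an in-memory Counter standing in for
        the redis lexicon database.
        """
        # Whitespace-separated chunks are handled independently, so no
        # n-gram spans a break.
        for chunk in text.split():
            for n in range(1, max_n + 1):
                for i in range(len(chunk) - n + 1):
                    counts[chunk[i:i + n]] += 1
        return counts


    counts = count_ngrams("太空梭 太空", Counter())
    print(counts["太空"], counts["太空梭"])  # 2 1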


To clear the database, run:

::

    python setup.py reset

To test term splitting interactively, run:

::

    python setup.py interact


For example:

::

    Text: 留下鉅細靡遺的太空梭發射影片,供世人回味
    ....
    留下 鉅細靡遺 的 太空梭 發射 影片 供 世人 回味


To run segmentation as an XML-RPC service, run:

::

    python setup.py serve


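The following simple Python program shows how to call the service:

.. code-block:: python

    # -*- coding: utf-8 -*-
    # Python 2 client for the loso XML-RPC service
    import xmlrpclib

    # Connect to the service started above, listening on port 5566
    proxy = xmlrpclib.ServerProxy("http://localhost:5566/")

    # splitTerms returns the segmented terms as a list of strings
    terms = proxy.splitTerms(u'留下鉅細靡遺的太空梭發射影片,供世人回味')
    print ' '.join(terms)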

The output should be:

::

    留下 鉅細靡遺 的 太空梭 發射 影片 供 世人 回味
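
The client above is Python 2; in Python 3, ``xmlrpclib`` became ``xmlrpc.client`` and the server classes live in ``xmlrpc.server``. The following self-contained Python 3 sketch exercises both ends of the exchange without loso installed. It registers a hypothetical ``split_terms`` stand-in (it only splits on whitespace) under the ``splitTerms`` name used above, and it binds to an OS-assigned port instead of 5566 so it cannot collide with a running loso server.

.. code-block:: python

    import threading
    import xmlrpc.client
    from xmlrpc.server import SimpleXMLRPCServer


    def split_terms(text):
        # Hypothetical stand-in for loso's segmenter: it only splits on
        # whitespace, so it shows the RPC plumbing, not real segmentation.
        return text.split()


    def start_server(host="127.0.0.1", port=0):
        """Serve splitTerms over XML-RPC on a background thread."""
        server = SimpleXMLRPCServer((host, port), logRequests=False)
        server.register_function(split_terms, "splitTerms")
        threading.Thread(target=server.serve_forever, daemon=True).start()
        return server


    if __name__ == "__main__":
        server = start_server()
        host, port = server.server_address  # port 0 means the OS picked one
        proxy = xmlrpc.client.ServerProxy(f"http://{host}:{port}/")
        print(" ".join(proxy.splitTerms("留下 鉅細靡遺 的 太空梭")))
        server.shutdown()
        server.server_close()

Against a real loso server, only the ``ServerProxy`` and ``splitTerms`` call are needed, pointed at the service's own address.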
2 changes: 1 addition & 1 deletion setup.py
@@ -22,7 +22,7 @@
setup(
name='Plurk_Loso',
version='0.1',
-license='MIT',
+license='BSD',
author='Plurk Inc.',
author_email='[email protected]',
description='Chinese segmentation library',