Liping's Blog

The old man said: "There is no trick to it; my hand is simply well practiced."

Importing the SGE library in Python

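When programming in Python, you sometimes need to call a library written in C directly, whether to reuse existing code or for performance. Python can call C code in several ways, including Cython, SWIG, ctypes, and cffi. This post uses ctypes to show how to use the SGE library from Python.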
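Before turning to SGE, here is a minimal sketch of the ctypes workflow on a library that is present on practically every Linux system. It uses the C library's strlen() purely as a stand-in; nothing in it is specific to SGE.

```python
import ctypes
import ctypes.util

# Locate and load the C library. If find_library() returns None,
# CDLL(None) falls back to the symbols already loaded in the process.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C prototype: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

# C strings are passed as bytes, not str.
length = libc.strlen(b"grid engine")
print(length)
```

The same three steps (load the shared object, declare the prototype, call with bytes) apply to the SGE library later in this post.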

Download the latest Son of Grid Engine from https://gitlab.com/loveshack/sge (note: this post is based on commit 9baeb84cdc883c4ba8b38cfc89ab8262e6d4e1d9).

# tar zxf sge-master.tar.gz
# cd sge-master/source
# sh scripts/bootstrap.sh

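Depending on your needs, you can enable debugging for individual programs or libraries so that they are easier to trace with gdb. For the test_eval_expression program, for example, change the following rule in libs/sgeobj/Makefile: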

test_eval_expression: test_eval_expression.o $(SGEOBJLIB) $(SGEOBJDLIB) $(CULLLIB) $(UTILIB) $(WINGRIDLIB_DEP) $(COMMLIB) $(COMMLISTSLIB)
        $(LD_WRAPPER) $(CC) $(CFLAGS) -o test_eval_expression $(LFLAGS) test_eval_expression.o $(SGEOBJLIB) $(SGEOBJDLIB) $(CULLLIB) $(COMMLIB) $(COMMLISTSLIB) $(UTILIB) $(WINGRIDLIB) $(DLLIB) $(SECLIB) $(LIBS)

Add the -g flag:

test_eval_expression: test_eval_expression.o $(SGEOBJLIB) $(SGEOBJDLIB) $(CULLLIB) $(UTILIB) $(WINGRIDLIB_DEP) $(COMMLIB) $(COMMLISTSLIB)
        $(LD_WRAPPER) $(CC) $(CFLAGS) -g -o test_eval_expression $(LFLAGS) test_eval_expression.o $(SGEOBJLIB) $(SGEOBJDLIB) $(CULLLIB) $(COMMLIB) $(COMMLISTSLIB) $(UTILIB) $(WINGRIDLIB) $(DLLIB) $(SECLIB) $(LIBS)

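Choose the aimk options to suit your needs. For reasons I have not pinned down, building with the -shared-libs option fails in many places. The changes below are the ones I recorded; the list may be incomplete, so adjust it according to the errors you actually get during compilation.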

The aimk script needs the following changes:

$ diff aimk.orig aimk
2126a2127,2131
>    if ( $SHAREDLIBS == 1 ) then
>       set SHARED_PATH_NAME = `dist/util/arch -lib`
>       setenv $SHARED_PATH_NAME ${SOURCE}/${COMPILE_ARCH}
>    endif
>
2221a2227,2230
>    if ( $SHAREDLIBS == 1 ) then
>       unsetenv $SHARED_PATH_NAME
>    endif
>

Many of the Makefiles need changes as well:

## libs/Makefile
        $(SHAREDLD) $(SHARED_LFLAGS) -o libsge$(SHAREDEXT) $(SCHEDLIB_OBJS) $(MIRLIB_OBJS) $(EVCLIB_OBJS) $(GDILIB_OBJS) $(SGEOBJLIB_OBJS) $(SGEOBJDLIB_OBJS) $(KRBLIBS) $(COMMLIB_OBJS) $(COMMLISTSLIB_OBJS) $(CULLLIB_OBJS) $(UTILIB_OBJS) sig_handlers.o $(LOADAVGLIBS) $(LIBS) -lc $(SECLIB)


## libs/sgeobj/Makefile
libsgeobj$(SHAREDEXT): $(SGEOBJLIB_OBJS) $(SGEOBJDLIB) $(COMMLIB) $(CULLLIB) $(UTILIB) version.o sge_gdi_packet.o sge_gdi2.o sge_security.o sge_gdi_packet_internal.o sge_gdi_packet_pb_cull.o sge_gdi_ctx.o qm_name.o
        $(SHAREDLD) $(SHARED_LFLAGS) -o libsgeobj$(SHAREDEXT) $(SGEOBJLIB_OBJS) version.o sge_gdi_packet.o sge_gdi2.o sge_security.o sge_gdi_packet_internal.o sge_gdi_packet_pb_cull.o sge_gdi_ctx.o qm_name.o -lsgeobjd -lcomm -lcommlists -lcull -luti $(LIBS) -lc

libsgeobjd$(SHAREDEXT): $(SGEOBJDLIB_OBJS) $(CULLLIB) $(UTILIB) version.o sge_gdi_packet.o sge_gdi2.o
        $(SHAREDLD) $(SHARED_LFLAGS) -o libsgeobjd$(SHAREDEXT) $(SGEOBJDLIB_OBJS) version.o sge_gdi_packet.o sge_gdi2.o -lcull -luti $(LIBS) -lc


## libs/comm/Makefile
        $(SHAREDLD) $(SHARED_LFLAGS) -o libcomm$(SHAREDEXT) $(COMMLIB_OBJS) -luti $(DLLIB) $(LIBS) -lcrypto -lssl


## libs/uti/Makefile
        $(SHAREDLD) $(SHARED_LFLAGS) -o libuti$(SHAREDEXT) $(UTILIB_OBJS) $(LOADAVGLIBS) $(LIBS) -lc -ldl -lsgeobj


## libs/spool/Makefile
test_sge_spooling_utilities: test_sge_spooling_utilities.o $(SPOOLING_DEPS) $(SGEOBJLIB) $(SGEOBJDLIB) $(MIRLIB) $(EVCLIB) $(GDILIB) $(SCHEDLIB) $(CULLLIB) $(COMMLIB) $(COMMLISTSLIB) $(UTILIB) $(WINGRIDLIB_DEP) sig_handlers.o
        $(LD_WRAPPER) $(CC) $(CFLAGS) -o test_sge_spooling_utilities $(LFLAGS) test_sge_spooling_utilities.o $(SPOOLING_LIBS) $(SCHEDLIB)  $(MIRLIB) $(EVCLIB) $(GDILIB) $(SGEOBJLIB) $(SGEOBJDLIB) $(COMMLIB) $(COMMLISTSLIB) $(CULLLIB) $(UTILIB) $(WINGRIDLIB) $(SECLIB) $(SLIBS) $(LIBS) $(DLLIB) sig_handlers.o

test_spooling_mt: test_spooling_mt.o $(SPOOLING_DEPS) $(SGEOBJLIB) $(SGEOBJDLIB) $(MIRLIB) $(EVCLIB) $(GDILIB) $(SCHEDLIB) $(CULLLIB) $(COMMLIB) $(COMMLISTSLIB) $(UTILIB) $(WINGRIDLIB_DEP) sig_handlers.o
        $(CC) $(CFLAGS) -o test_spooling_mt $(LFLAGS) test_spooling_mt.o $(SPOOLING_LIBS) $(SCHEDLIB)  $(MIRLIB) $(EVCLIB) $(GDILIB) $(SGEOBJLIB) $(SGEOBJDLIB) $(COMMLIB) $(COMMLISTSLIB) $(CULLLIB) $(UTILIB) $(WINGRIDLIB) $(SECLIB) $(SLIBS) $(LIBS) $(DLLIB) sig_handlers.o


## 3rdparty/tacc_pam_sge/Makefile
$(TACCFOO): $(TACCFOO_OBJS)
        $(CC) $(CFLAGS) -o $(TACCFOO) $(TACCFOO_OBJS) -lsge -lsched -levc -lgdi -lsgeobj -lsgeobjd -lcull -lcomm -lcommlists -luti -lc -ldl  -lm -lpthread

$(TACCLIB)$(SHAREDEXT): $(TACCLIB_OBJS) $(SGEOBJDLIB) $(COMMLIB) $(COMMLISTSLIB) $(CULLLIB) $(UTILIB)
        $(SHAREDLD) $(SHARED_LFLAGS) -o $(TACCLIB)$(SHAREDEXT) $(TACCLIB_OBJS) -lsge -lcull -luti -ldl -lpthread


## libs/cull/Makefile
example1: $(EXAMPLE1_DEPS) sge_dlopen.o
        $(LD_WRAPPER) $(CC) $(CFLAGS) -o example1 $(LFLAGS) $(EXAMPLE1_OBJS) $(LIBS) sge_dlopen.o


## libs/cull/Makefile
example1: $(EXAMPLE1_DEPS)
        $(LD_WRAPPER) $(CC) $(CFLAGS) -o example1 $(LFLAGS) $(EXAMPLE1_OBJS) $(LIBS) -luti

Then build SGE:

# ./aimk -no-java -no-gui-inst -debug -gprof -shared-libs -no-remote -no-qtcsh

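When the build finishes, there is a directory named after the architecture, for example LINUXAMD64, that contains all the useful build output. Add that directory to the LD_LIBRARY_PATH environment variable, and the SGE library can then be called from Python with the ctypes module.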
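As a rough sketch of the loading step, the helper below builds the path to the library inside the build directory and loads it. The function name load_libsge and the example directory are hypothetical. Note that with glibc the dynamic linker reads LD_LIBRARY_PATH when the process starts, so it still has to be exported in the shell before Python is launched if libsge.so depends on other libraries in the same directory.

```python
import ctypes
import os


def load_libsge(build_dir, name="libsge.so"):
    """Load the SGE shared library from an aimk build directory.

    build_dir is assumed to be the architecture directory produced by
    the build (for example .../sge-master/source/LINUXAMD64).
    """
    path = os.path.join(os.path.abspath(build_dir), name)
    if not os.path.isfile(path):
        # Fail early with a clear message instead of an obscure OSError.
        raise FileNotFoundError(path)
    # RTLD_GLOBAL makes the symbols visible to libraries loaded later.
    return ctypes.CDLL(path, mode=ctypes.RTLD_GLOBAL)
```

A call would then look like `libsge = load_libsge("/path/to/sge-master/source/LINUXAMD64")`, with the path adjusted to the actual build location.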

The simplest usage looks like this:

>>> import ctypes
>>> libsge = ctypes.CDLL('/path/to/libsge.so')
>>> libsge.sge_eval_expression(6, "a*", "A")
0
>>> libsge.sge_eval_expression(6, "a*", "b")
1

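The corresponding C function is declared as follows. It takes a fourth argument, answer_list, which the calls in the rest of this post pass explicitly as None: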

int sge_eval_expression(u_long32 type, const char *expr, const char *value, lList **answer_list)

More complex expressions with special characters run into trouble. The call below should return 0, not an error:

>>> libsge.sge_eval_expression(6, '(sol-*64|linux|hp*)&!sol-sparc', 'hp11', None)
error: Parse error on position 1 of the expression "(".
-1

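Passing the arguments as bytes fixes it. In Python 3, ctypes passes a str as a wchar_t * and a bytes object as a char *, so with str the C code sees a NUL byte right after the first character, which is why the error above reports the expression as just "(":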

>>> libsge.sge_eval_expression(6, b'(sol-*64|linux|hp*)&!sol-sparc', b'hp11', None)
0

That does not mean everything works, though. Trying an expression of type TYPE_HOST ends in a segmentation fault:

>>> libsge.sge_eval_expression(7, b'Latte*', b'latte3.czech.sun.com', None)
Segmentation fault

Debugging with gdb shows that the crash happens underneath sge_hostmatch():

Program received signal SIGSEGV, Segmentation fault.
bootstrap_get_ignore_fqdn () at ../libs/uti/sge_bootstrap.c:188
188        return bootstrap->get_ignore_fqdn(bootstrap);


#0  bootstrap_get_ignore_fqdn () at ../libs/uti/sge_bootstrap.c:188
#1  0x00007ffff1218d75 in sge_hostcpy (dst=0x7fffffffc050 " \301\377\377\377\177", raw=0x7fffffffc960 "latte*")
    at ../libs/uti/sge_hostname.c:1167
#2  0x00007ffff1218f0b in sge_hostmatch (h1=<value optimized out>, h2=0x7fffffffc160 "latte3.czech.sun.com")
    at ../libs/uti/sge_hostname.c:1291
#3  0x00007ffff11beb18 in MatchPattern (token_p=<value optimized out>, skip=<value optimized out>) at ../libs/sgeobj/sge_eval_expression.c:409
#4  0x00007ffff11bec0d in SimpleExpression (token_p=0x7fffffffd160, skip=false) at ../libs/sgeobj/sge_eval_expression.c:376
#5  0x00007ffff11bec2e in AndExpression (token_p=0x7fffffffd160, skip=false) at ../libs/sgeobj/sge_eval_expression.c:343
#6  0x00007ffff11beca9 in OrExpression (token_p=0x7fffffffd160, skip=false) at ../libs/sgeobj/sge_eval_expression.c:323
#7  0x00007ffff11bee30 in sge_eval_expression (type=7, expr=<value optimized out>, value=0x7ffff196ab78 "latte3.czech.sun.com",
    answer_list=<value optimized out>) at ../libs/sgeobj/sge_eval_expression.c:167
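Because a segmentation fault inside a ctypes call kills the whole interpreter, it can be convenient to try risky calls in a child process first. The sketch below is one way to do that with the standard library; the helper name run_isolated is hypothetical, and reading address 0 is used only as a stand-in for the crashing SGE call. The -X faulthandler option makes the child print a Python traceback before it dies.

```python
import signal
import subprocess
import sys


def run_isolated(snippet):
    """Run Python source in a child interpreter; return (returncode, stderr)."""
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", snippet],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stderr


# Stand-in for the crashing call: dereferencing a NULL pointer.
crash = "import ctypes; ctypes.string_at(0)"
code, err = run_isolated(crash)

# On POSIX a negative return code is the number of the fatal signal.
crashed = code == -signal.SIGSEGV
print("child died with SIGSEGV:", crashed)
```

For the real case, the snippet would load libsge.so and call sge_eval_expression() with the TYPE_HOST arguments, leaving the parent interpreter alive to inspect the result.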

The comment in the source explains sge_hostmatch(): it is the equivalent of fnmatch() for hostnames, but it decides how to compare them based on the configuration.

// source/libs/uti/sge_hostname.c

/****** uti/hostname/sge_hostmatch() ********************************************
*  NAME
*     sge_hostmatch() -- fnmatch() for hostnames
*
*  SYNOPSIS
*     int sge_hostmatch(const char *h1, const char*h2)
*
*  FUNCTION
*     fnmatch() for hostnames. Honours some configuration values:
*        - Domain name may be ignored
*        - Domain name may be replaced by a 'default domain'
*        - Hostnames may be used as they are.
*
*  INPUTS
*     const char *h1 - 1st hostname
*     const char *h2 - 2nd hostname
*
*  RESULT
*     int - 0, 1 or -1

The crash happens when return bootstrap->get_ignore_fqdn(bootstrap) is executed, which is related to the ignore_fqdn parameter described in sge_bootstrap(5).

// source/libs/uti/sge_bootstrap.c

bool bootstrap_get_ignore_fqdn(void)
{
   sge_bootstrap_state_class_t* bootstrap = NULL;
   GET_SPECIFIC(sge_bootstrap_thread_local_t, handle, bootstrap_thread_local_init, sge_bootstrap_thread_local_key, 
                "bootstrap_get_ignore_fqdn");
   bootstrap = handle->current;            
   return bootstrap->get_ignore_fqdn(bootstrap); // <= crashes here
}

#define GET_SPECIFIC(type, variable, init_func, key, func_name) \
   type *variable = pthread_getspecific(key); \
   if(variable == NULL) { \
      int ret; \
      variable = sge_malloc(sizeof(type)); \
      init_func(variable); \
      ret = pthread_setspecific(key, (void*)variable); \
      if (ret != 0) { \
         fprintf(stderr, "pthread_setspecific(%s) failed: %s\n", func_name, strerror(ret)); \
         abort(); \
      } \
   }

sge_hostmatch() needs to read the ignore_fqdn and default_domain parameters. These can only be set at installation time; they cannot be changed on a system that is already running.

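The main problem is that this function relies on per-thread (pthread) state. Inside SGE that state is initialized through bootstrap_mt_init() and feature_mt_init(), but from Python we call sge_eval_expression() directly without any of that initialization, so fetching the thread-specific data fails.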

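With the default settings, sge_hostmatch() still uses fnmatch() for the comparison and only adds the special handling shown below, so one option is to use a different type instead of TYPE_HOST.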

void sge_hostcpy(char *dst, const char *raw)
{
   bool ignore_fqdn = bootstrap_get_ignore_fqdn();  // <= crashes here
   bool is_hgrp = is_hgroup_name(raw);
   const char *default_domain;

   if (dst == NULL || raw == NULL) {
      return;
   }
   if (is_hgrp) {  // a hostgroup name is copied as-is, without further processing
      /* hostgroup name: not in FQDN format, copy the entire string*/
      sge_strlcpy(dst, raw, CL_MAXHOSTLEN);
      return;
   } 
   if (ignore_fqdn) {
      char *s = NULL;
      /* standard: simply ignore FQDN */

      sge_strlcpy(dst, raw, CL_MAXHOSTLEN);
      if ((s = strchr(dst, '.'))) {  // for compute-0-0.hpc.cn, only compute-0-0 is kept for comparison
         *s = '\0';
      }
      return;
   }

  /* ... skipped ... */
}

Alternatively, you can patch the code to use a fixed value instead of looking it up:

# diff ./source/libs/uti/sge_hostname.c.orig source/libs/uti/sge_hostname.c
1167c1167
<    bool ignore_fqdn = bootstrap_get_ignore_fqdn();
---
>    bool ignore_fqdn = 1;

After rebuilding with this change, the same call no longer crashes.

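The tests below also show that the same expression gives different results for different type arguments. The biggest difference between TYPE_HOST and the other types is that, with the default SGE configuration, the domain part is not compared.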

>>> libsge.sge_eval_expression(7, b'Latte*', b'latte3.czech.sun.com', None)
0


>>> libsge.sge_eval_expression(7, b'Latte*.hpc.cn', b'latte3.czech.sun.com', None)
0
>>> libsge.sge_eval_expression(2, b'Latte*.hpc.cn', b'latte3.czech.sun.com', None)
1
>>> libsge.sge_eval_expression(6, b'Latte*.hpc.cn', b'latte3.czech.sun.com', None)
1

To be more rigorous, you can declare the function's argument and return types; see the official Python ctypes documentation for details.

>>> class lList(ctypes.Structure):
...     pass
...
>>> libsge = ctypes.CDLL('libsge.so')
>>> libsge.sge_eval_expression.argtypes = [ctypes.c_uint32, ctypes.c_char_p, ctypes.c_char_p, ctypes.POINTER(ctypes.POINTER(lList))]
>>> libsge.sge_eval_expression.restype = ctypes.c_int
>>>
>>> TYPE_INT = 1
>>> TYPE_FIRST = TYPE_INT
>>> TYPE_STR = 2
>>> TYPE_TIM = 3
>>> TYPE_MEM = 4
>>> TYPE_BOO = 5
>>> TYPE_CSTR = 6
>>> TYPE_HOST = 7
>>> TYPE_DOUBLE = 8
>>> TYPE_RESTR = 9
>>> TYPE_CE_LAST = TYPE_RESTR
>>>
>>> libsge.sge_eval_expression(TYPE_CSTR, b'(sol-*64|linux|hp*)&!sol-sparc', b'hp11', None)
0
>>> libsge.sge_eval_expression(TYPE_CSTR, b"a*", b"A", None)
0
>>> libsge.sge_eval_expression(TYPE_STR, b"a&", b"a", None)
error: Parse error on position 2 of the expression "a&".
-1
>>> libsge.sge_eval_expression(TYPE_STR, b"a*", b"A", None)
1
>>> libsge.sge_eval_expression(TYPE_HOST, b'Latte*', b'latte3.czech.sun.com', None)
1
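To keep the bytes conversion and the return-code convention in one place, the raw call can be wrapped in a small helper. This is only a sketch: the names ExprType and eval_expression are made up, the enum lists just the three types exercised in this post, and the library object is passed in as a parameter so the helper does not depend on where libsge.so lives.

```python
import enum


class ExprType(enum.IntEnum):
    """Subset of the type constants used in this post."""
    STR = 2
    CSTR = 6
    HOST = 7


def eval_expression(lib, kind, expr, value, encoding="ascii"):
    """Evaluate an SGE expression; return True on match, False otherwise.

    lib is an already loaded ctypes library exposing sge_eval_expression().
    """
    rc = lib.sge_eval_expression(
        int(kind),
        expr.encode(encoding),   # C expects char *, so encode str to bytes
        value.encode(encoding),
        None,                    # no answer list is collected
    )
    if rc == 0:
        return True
    if rc == 1:
        return False
    # Any other value (-1 in the sessions above) signals a parse error.
    raise ValueError("cannot evaluate expression %r (rc=%d)" % (expr, rc))
```

With the library loaded as above, `eval_expression(libsge, ExprType.CSTR, "a*", "A")` would correspond to the TYPE_CSTR call shown earlier.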
