Usage and code examples of the cc.mallet.types.Alphabet class

x33g5p2x, reposted on 2022-01-16 in Other

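This article collects code examples for the Java class cc.mallet.types.Alphabet and shows how the class is used in practice. The examples come mainly from platforms such as GitHub, Stack Overflow, and Maven, and were extracted from selected projects, so they can serve as a practical reference. Details of the Alphabet class:
Package path: cc.mallet.types.Alphabet
Class name: Alphabet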

About Alphabet

A mapping between integers and objects where the mapping in each direction is efficient. Integers are assigned consecutively, starting at zero, as objects are added to the Alphabet. Objects cannot be deleted from the Alphabet, and thus the integers are never reused.

The most common use of an alphabet is as a dictionary of feature names associated with a cc.mallet.types.FeatureVector in a cc.mallet.types.Instance. In a simple document classification usage, each unique word in a document would be a unique entry in the Alphabet with a unique integer associated with it. FeatureVectors rely on the integer part of the mapping to efficiently represent the subset of the Alphabet present in the FeatureVector.
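
To make the description above concrete, here is a minimal sketch of a two-way mapping with consecutive indices, written against the Java standard library only. `MiniAlphabet`, `MiniAlphabetDemo`, and `encode` are hypothetical names used for illustration; this is not MALLET's implementation. The method names mirror those that appear in the examples on this page (`lookupIndex`, `lookupObject`, `size`, `stopGrowth`), and the sketch assumes that a lookup returns -1 when the entry is absent and cannot be added.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical stand-in for illustration only; not MALLET's Alphabet. */
final class MiniAlphabetDemo {

    static class MiniAlphabet {
        private final Map<Object, Integer> indexOf = new HashMap<>(); // object -> integer
        private final List<Object> entries = new ArrayList<>();       // integer -> object
        private boolean growthStopped = false;

        /** Returns the entry's index; adds the entry if absent and allowed, else returns -1. */
        int lookupIndex(Object entry, boolean addIfNotPresent) {
            Integer index = indexOf.get(entry);
            if (index != null) {
                return index;
            }
            if (!addIfNotPresent || growthStopped) {
                return -1; // assumption: unknown entries map to -1
            }
            int next = entries.size(); // consecutive indices starting at zero
            indexOf.put(entry, next);
            entries.add(entry);
            return next;
        }

        int lookupIndex(Object entry) {
            return lookupIndex(entry, true);
        }

        Object lookupObject(int index) {
            return entries.get(index);
        }

        int size() {
            return entries.size();
        }

        void stopGrowth() {
            growthStopped = true;
        }
    }

    /** Encodes a tokenized document as the sorted, distinct indices of its known words. */
    static int[] encode(MiniAlphabet alphabet, String[] tokens) {
        TreeSet<Integer> present = new TreeSet<>();
        for (String token : tokens) {
            int index = alphabet.lookupIndex(token);
            if (index >= 0) {
                present.add(index); // words unknown to a frozen alphabet are skipped
            }
        }
        int[] indices = new int[present.size()];
        int i = 0;
        for (int index : present) {
            indices[i++] = index;
        }
        return indices;
    }

    public static void main(String[] args) {
        MiniAlphabet alphabet = new MiniAlphabet();
        int[] first = encode(alphabet, new String[] {"the", "cat", "sat", "the"});
        alphabet.stopGrowth(); // freeze the vocabulary, as is typically done after training
        int[] second = encode(alphabet, new String[] {"the", "dog", "sat"});
        System.out.println(java.util.Arrays.toString(first));
        System.out.println(java.util.Arrays.toString(second));
        System.out.println(alphabet.lookupObject(1));
    }
}
```

Because entries are never removed, a list position can double as the integer id, which is what makes both lookup directions cheap in this sketch. The index arrays produced by `encode` are a simplified picture of how a sparse feature vector can refer to a subset of the vocabulary.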

Code examples

Code example origin: de.julielab/jcore-mallet-2.0.9

public String[] getVocabulary() {
  String[] vocab = new String[ alphabet.size() ];
  for (int type = 0; type < numTypes; type++) {
    vocab[type] = (String) alphabet.lookupObject(type);
  }
  return vocab;
}

Code example origin: cc.mallet/mallet

/** Create a dummy alphabet with <code>n</code> dimensions */
public static Alphabet alphabetOfSize (int n) {
  Alphabet alphabet = new Alphabet();
  for (int i = 0; i < n; i++) {
    alphabet.lookupIndex("d" + i);
  }
  return alphabet;
}

Code example origin: de.julielab/jcore-mallet-2.0.9

/**
 *    A symmetric Dirichlet with alpha_i = <code>alpha</code> and the 
 *    number of dimensions of the given alphabet.
 */
public Dirichlet (Alphabet dict, double alpha)
{
  this(dict.size(), alpha);
  this.dict = dict;
  dict.stopGrowth();
}

Code example origin: cc.mallet/mallet

public Alphabet getPrunedAlphabet(int minDocs, int maxDocs, int minCount, int maxCount) {
  Alphabet inputAlphabet = instances.getDataAlphabet();
  Alphabet outputAlphabet = new Alphabet();
  for (int inputType = 0; inputType < numFeatures; inputType++) {
    if (featureCounts[inputType] >= minCount && featureCounts[inputType] <= maxCount && documentFrequencies[inputType] >= minDocs && documentFrequencies[inputType] <= maxDocs) {
      outputAlphabet.lookupIndex(inputAlphabet.lookupObject(inputType));
    }
  }
  
  return outputAlphabet;
}
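
The pruning logic in the example above can be tried in isolation with plain arrays. The following is a sketch under stated assumptions: `PruneVocabularyDemo`, `prune`, and the sample data are hypothetical and use only the Java standard library, with a list standing in for the output alphabet. It shows that surviving entries are renumbered consecutively, because each kept entry simply takes the next free position.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of count-based vocabulary pruning; names are illustrative only. */
final class PruneVocabularyDemo {

    /**
     * Keeps entries whose total count lies in [minCount, maxCount] and whose document
     * frequency lies in [minDocs, maxDocs]. An entry's position in the returned list
     * is its new index, so kept entries are numbered consecutively from zero.
     */
    static List<String> prune(String[] vocabulary, int[] featureCounts, int[] documentFrequencies,
                              int minDocs, int maxDocs, int minCount, int maxCount) {
        List<String> kept = new ArrayList<>();
        for (int type = 0; type < vocabulary.length; type++) {
            boolean countOk = featureCounts[type] >= minCount && featureCounts[type] <= maxCount;
            boolean docsOk = documentFrequencies[type] >= minDocs
                    && documentFrequencies[type] <= maxDocs;
            if (countOk && docsOk) {
                kept.add(vocabulary[type]);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up sample data: one very frequent word and one very rare word
        String[] vocabulary = {"the", "mallet", "topic", "zyzzyva"};
        int[] featureCounts = {900, 40, 55, 1};
        int[] documentFrequencies = {100, 12, 20, 1};
        List<String> pruned = prune(vocabulary, featureCounts, documentFrequencies, 2, 50, 5, 500);
        System.out.println(pruned);
    }
}
```

Note that old and new indices differ after pruning (here "mallet" moves from 1 to 0), so any data indexed by the old vocabulary would need to be remapped.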

Code example origin: com.github.steveash.mallet/mallet

public void testNotFound ()
{
 Alphabet dict = new Alphabet ();
 dict.lookupIndex ("TEST1");
 dict.lookupIndex ("TEST2");
 dict.lookupIndex ("TEST3");
 assertEquals (-1, dict.lookupIndex ("TEST4", false));
 assertEquals (3, dict.size());
 assertEquals (3, dict.lookupIndex ("TEST4", true));
}

Code example origin: com.github.steveash.mallet/mallet

public String[] getWeightNames (int index) {
  int[] indices = this.weightsIndices[index];
  String[] ret = new String[indices.length];
  for (int i=0; i < ret.length; i++)
    ret[i] = crf.parameters.weightAlphabet.lookupObject(indices[i]).toString();
  return ret;
}

Code example origin: de.julielab/jcore-mallet-2.0.9

TestResults (Instance inst)
{
 alphabet = new Alphabet ();
 setupAlphabet (inst);
 numClasses = alphabet.size ();
 confusion = new int [numClasses][numClasses];
 precision = new double [numClasses];
 recall = new double [numClasses];
 f1 = new double [numClasses];
}

Code example origin: cc.mallet/mallet

/** Current size of the Vocabulary */
public int size()
{
  return dataAlphabet.size();
}

Code example origin: com.github.steveash.jg2p/jg2p-core

public void trainFor(Collection<Alignment> inputs) {
 // this pipe is the default pipe with new alphabet
 Stopwatch watch = Stopwatch.createStarted();
 trainRound(inputs, new Alphabet(), 0);
 crf.getInputAlphabet().stopGrowth();
 crf.getOutputAlphabet().stopGrowth();
 watch.stop();
 log.info("Training took " + watch);
}

Code example origin: cc.mallet/mallet

/** Change the default Pipe associated with InstanceList.
 * This method is very dangerous and should only be used in extreme circumstances!! */
public void setPipe(Pipe p) {
  assert (Alphabet.alphabetsMatch(this, p));
  pipe = p;
}

Code example origin: cc.mallet/mallet

/** Construct a new empty Factors with a new empty weightsAlphabet, 0-length initialWeights and finalWeights, and the other arrays null. */
public Factors () {
  weightAlphabet = new Alphabet();
  initialWeights = new double[0];
  finalWeights = new double[0];
  // Leave the rest as null.  They will get set later by addState() and addWeight()
  // Alternatively, we could create zero-length arrays
}

Code example origin: de.julielab/jcore-jtbd-ae

public void readModel(InputStream is) throws IOException, ClassNotFoundException {
  final GZIPInputStream gin = new GZIPInputStream(is);
  final ObjectInputStream ois = new ObjectInputStream(gin);
  model = (CRF) ois.readObject();
  trained = true;
  model.getInputPipe().getDataAlphabet().stopGrowth();
  ois.close();
}

Code example origin: cc.mallet/mallet

public Alphabet (Object[] entries) {
  this (entries.length);
  for (Object entry : entries)
    this.lookupIndex(entry);
}

Code example origin: de.julielab/julielab-topic-modeling

public Object[] getVocabulary(Model model) {
  ParallelTopicModel malletModel = model.malletModel;
  Alphabet alphabet = malletModel.getAlphabet();
  return alphabet.toArray();
}

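Code example origin: de.julielab/jcore-mallet-2.0.9

public void print () {
  System.out.println ("Dirichlet:");
  for (int j = 0; j < partition.length; j++) {
    // Resolve the label first so that "=value" is printed whether or not dict is set
    String label = (dict != null) ? dict.lookupObject(j).toString() : Integer.toString(j);
    System.out.println (label + "=" + magnitude * partition[j]);
  }
}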

Code example origin: cc.mallet/mallet

public boolean isDataAlphabetSet()
{
  return dataAlphabet != null && dataAlphabet.size() > 0;
}
