AlgExt: an Algorithm Extractor for C Programs
Umeå University, Faculty of Science and Technology, Department of Computing Science.
2001 (English) Report (Other academic)
Abstract [en]

ALGEXT is a program that extracts strategic (block) comments from C source files, with the aim of improving maintainability and keeping documentation consistent with the source code. The comments in the source code are written as what we call extractable algorithms, which describe the algorithms used in the functions.

ALGEXT recognizes different kinds of comments:

  • Strategic comments are comments that precede a block of code, with only whitespace preceding them on the line,
  • Tactical comments are comments that describe the code that precedes them on the same line,
  • Function comments are comments immediately preceding a function definition, describing the function,
  • File comments are comments at the head of the file, before any declarations of functions and variables, and finally
  • Global comments are comments within the global scope, but not associated with a function.

Only strategic comments are used as the basis for algorithm extraction in ALGEXT.
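The distinction between strategic and tactical comments can be illustrated with a minimal sketch in C. This is not ALGEXT's implementation: the names `comment_kind` and `classify_line` are hypothetical, and the sketch makes simplifying assumptions. It looks at one line at a time, recognises only block-comment openers, ignores string literals, and does not track scope, so function, file and global comments (which also start a line) would be labelled strategic here.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical labels for this sketch; not ALGEXT's own identifiers. */
enum comment_kind { NO_COMMENT, STRATEGIC_COMMENT, TACTICAL_COMMENT };

/*
 * Classify one source line by what precedes its first block-comment opener.
 * Assumption: scope is not tracked, so any comment that starts a line is
 * reported as strategic, even outside a function body.
 */
enum comment_kind classify_line(const char *line)
{
    const char *open = strstr(line, "/*");
    const char *p;

    if (open == NULL)
        return NO_COMMENT;

    /* Anything other than whitespace before the opener is code. */
    for (p = line; p < open; p++) {
        if (!isspace((unsigned char)*p))
            return TACTICAL_COMMENT;
    }
    return STRATEGIC_COMMENT;
}
```

Under these assumptions, a line such as `    /* Sort the list */` is classified as strategic, while `i++; /* next item */` is classified as tactical. A real extractor would additionally need to follow multi-line comments and the surrounding scope to separate the remaining comment kinds.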

The paper discusses the rationale for ALGEXT and describes its implementation and usage. Examples are presented for clarification of what can be done with ALGEXT.

Our experience shows that students who use ALGEXT for preparing their assignments tend to write about 66% more comments than non-ALGEXT users.

Place, publisher, year, edition, pages
Umeå: Department of Computing Science, Umeå University, 2001. 15 p.
Report / UMINF, ISSN 0348-0542 ; 2001:11
Keyword [en]
Extractable algorithms, Embedded information, C
National Category
Computer Science
Research subject
Computing Science
URN: urn:nbn:se:umu:diva-22350
OAI: diva2:214630
Institutionen för datavetenskap, 90187, Umeå
Available from: 2009-05-06 Created: 2009-05-06 Last updated: 2013-10-09 Bibliographically approved
In thesis
1. Finding, extracting and exploiting structure in text and hypertext
2009 (English) Doctoral thesis, comprehensive summary (Other academic)
Alternative title[sv]
Att finna, extrahera och utnyttja strukturer i text och hypertext
Abstract [en]

Data mining is a fast-developing field of study, using computations to either predict or describe large amounts of data. The increase in data produced each year goes hand in hand with this, requiring algorithms that are more and more efficient in order to find interesting information within a given time.

In this thesis, we study methods for extracting information from semi-structured data, for finding structure within large sets of discrete data, and for efficiently ranking web pages in a topic-sensitive way.

The information extraction research focuses on support for keeping both documentation and source code up to date at the same time. Our approach to this problem is to embed parts of the documentation within strategic comments of the source code and then extract them by using a specific tool.

The structures that our structure mining algorithms are able to find among crisp data (such as keywords) are in the form of subsumptions, i.e. one keyword is a more general form of another. We can use these subsumptions to build larger structures in the form of hierarchies or lattices, since subsumptions are transitive. Our tool has been used mainly to produce input to data mining systems and for visualisation of data-sets.
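The role of transitivity can be sketched in C as follows. This is a generic transitive closure (Warshall's algorithm) over a small boolean matrix, not the structure mining algorithm of the thesis; the name `close_subsumptions` and the fixed size `N_KEYWORDS` are assumptions made for illustration.

```c
#include <stdbool.h>

#define N_KEYWORDS 4   /* illustrative vocabulary size (assumption) */

/*
 * Warshall's algorithm. On entry, sub[a][b] is true when keyword a is known
 * to subsume (be more general than) keyword b. On return, sub[a][b] is also
 * true for every pair connected through a chain of subsumptions.
 */
void close_subsumptions(bool sub[N_KEYWORDS][N_KEYWORDS])
{
    for (int k = 0; k < N_KEYWORDS; k++)
        for (int a = 0; a < N_KEYWORDS; a++)
            for (int b = 0; b < N_KEYWORDS; b++)
                if (sub[a][k] && sub[k][b])
                    sub[a][b] = true;
}
```

If keyword 0 subsumes keyword 1 and keyword 1 subsumes keyword 2, the closure also records that keyword 0 subsumes keyword 2, which is the kind of chaining that lets individual subsumptions be assembled into a hierarchy.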

The main part of the research has been on ranking web pages in such a way that both the link structure between pages and the content of each page matter. We have created a number of algorithms and compared them to other algorithms in use today. Our focus in these comparisons has been on convergence rate, algorithm stability and how relevant the answer sets from the algorithms are according to real-world users.

The research has focused on the development of efficient algorithms for gathering and handling large data-sets of discrete and textual data. A proposed system of tools is described, all operating on a common database containing "fingerprints" and meta-data about items. This data could be searched by various algorithms to increase its usefulness or to find the real data more efficiently.

All of the methods described handle data in a crisp manner, i.e. a word or a hyper-link either is or is not a part of a record or web page. This means that we can model their existence in a very efficient way. The methods and algorithms that we describe all make use of this fact.
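One common way to exploit such crisp, either-or membership is a bit set, sketched below in C. This is an illustration only; the abstract does not say which representation the thesis uses. The type `word_set`, the function names and the vocabulary size `MAX_WORDS` are all hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_WORDS 128                       /* vocabulary size (assumption) */
#define BITS_PER_SLOT 64
#define N_SLOTS (MAX_WORDS / BITS_PER_SLOT)

/* One bit per vocabulary word: 1 means the word occurs in the record. */
typedef struct { uint64_t slot[N_SLOTS]; } word_set;

/* Mark word `id` as present; the caller must keep id below MAX_WORDS. */
void word_set_add(word_set *s, unsigned id)
{
    s->slot[id / BITS_PER_SLOT] |= (uint64_t)1 << (id % BITS_PER_SLOT);
}

/* Test whether word `id` is present in the record. */
bool word_set_has(const word_set *s, unsigned id)
{
    return (s->slot[id / BITS_PER_SLOT] >> (id % BITS_PER_SLOT)) & 1u;
}

/* Count the words present in both records, one AND per 64 words. */
unsigned word_set_common(const word_set *a, const word_set *b)
{
    unsigned count = 0;
    for (int i = 0; i < N_SLOTS; i++) {
        uint64_t both = a->slot[i] & b->slot[i];
        while (both) {          /* clear the lowest set bit each round */
            both &= both - 1;
            count++;
        }
    }
    return count;
}
```

Because membership is a single bit, comparing two records reduces to a handful of word-wide AND operations, which is one reason crisp data can be handled compactly.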

Abstract [sv, English translation]

Data mining (the English term is often used in Swedish as well) is a research field that is constantly evolving. It is about using computers to find patterns in large amounts of data, or alternatively to predict future data from data that is already available. Because more and more data is produced every year, this places ever higher demands on the efficiency of the algorithms used to find or use the information within a reasonable time.

This thesis is about extracting information from semi-structured data, finding structures in large discrete data sets, and efficiently ranking web pages from a topic-based perspective.

The information extraction described concerns support for keeping both the documentation and the source code up to date at the same time. Our solution to this problem is to let parts of the documentation (mainly the algorithm description) reside as block comments in the source code and to extract them automatically with a tool.

The structures found by our structure extraction algorithms take the form of subsumptions, for example that a certain keyword is more general than another. These relationships can be used to create larger structures in the form of hierarchies or directed graphs, since the subsumptions are transitive. The tool we have developed has mainly been used to create input data for a data mining system and to visualise that input data.

The main part of the research described in this thesis has, however, concerned ranking web pages based on both their content and the links between them. We have created a number of algorithms and shown how they behave in comparison with other algorithms in use today. These comparisons have mainly concerned convergence rate, the stability of the algorithms given uncertain data, and finally how relevant the algorithms' answer sets have been considered to be from the users' perspective.

The research has focused on efficient algorithms for gathering and handling large data sets of discrete or text-based data. In the thesis we also present a proposal for a system of tools that work together on a database consisting of "fingerprints" and other metadata about the items indexed in the database. This data can then be used by various algorithms to increase the value of what is in the database or to find the right information efficiently.

Place, publisher, year, edition, pages
Umeå: Umeå Universitet, 2009. 217 p.
Report / UMINF, ISSN 0348-0542 ; 09.12
Keyword [en]
Automatic propagation, CHiC, Data mining, Discrete data, Extraction, Hierarchies, ProT, Rank distribution, S²ProT, Spatial linking, Web mining, Web searching
National Category
Computer Science
Research subject
Computing Science
urn:nbn:se:umu:diva-22352 (URN)
978-91-7264-799-2 (ISBN)
Public defence
2009-06-05, MA 121, MIT, Umeå Universitet, Umeå, 13:15 (English)
AlgExt, CHiC, ProT
Available from: 2009-05-14 Created: 2009-05-06 Last updated: 2009-06-01 Bibliographically approved

By author/editor
Ågren, Ola
By organisation
Department of Computing Science
Computer Science