Hacker News new | past | comments | ask | show | jobs | submit login
GoSTL: Algorithm and datastructure library for Go similar to C++ STL (github.com/liyue201)
89 points by gmgn on Jan 30, 2022 | hide | past | favorite | 47 comments



The design decision of providing thread safe operations on data structures by default is a pretty poor one. In my experience this is the wrong level of abstraction on which to perform mutual exclusion of certain operations. Normally you want locks around a transaction-like operation that might involve multiple operations on a data structure (or multiple data structures). When just using concurrency safe containers on their own, subtle consistency bugs can be introduced and you end up writing a second layer of locks around the higher level code anyways, now paying the performance overhead of locking twice, once for your logic and then for the container.


I agree, while 'lock data not code' sounds good(that's what Rust does I think), container is a different story, it's by design mutable and unfit to lock as a data structure, container is not a pure data structure, it's a ADT that really works with data structure and its methods(algorithms) hands in hands, hard to split by any lock.


There's a lot of good work put into this.

If it is intended as a tool to help port C++ code out of C++ into Go, it looks very useful.

If it is intended as a tool to help Go programmers, it has made a common, but regrettably very serious mistake, that programmers make when porting code: It has precisely copied the original API, despite the fact the original API is quite unidiomatic in the new language.

I don't want a package of all this stuff with the original API of C++; I want the same capabilities, but for them to be idiomatic in the new language. It is quite common that the resulting port will both be missing some things no longer necessary in the new language, but may also require things that weren't necessary in the host language.

As a nice little bite-sized example, consider some of the vector methods. PushBack takes a single element. It really ought to take a slice, probably via the "..." notation to make it a variable-argument function. That's idiomatic in Go. It is also idiomatic to offer that functionality and not worry too much about the fact it will be ever so slightly slower than not taking a slice. Go is generally fast, but if you care that much about speed to the n'th degree, you picked the wrong language. "Reserve" is in the original C++ API because C++ is not a garbage collected language. Go has the ability to specify sizes up front with slices, and that's generally enough. ShrinkToFit is generally again for a non-GC'd language... it's not useless in Go but again, if you're really paranoid about that being available you probably picked the wrong language.

Examples of this sort are pervasively shot throughout this API, from what a scan of it shows me.

A non-trivial amount of the complexity of the STL interface is related to manual memory management. In Go it results in APIs where even if you can't quite point at what the problem is, they're just generally overcomplicated in a GC language.

I expect something much simpler to implement, much simpler to use, and probably also faster to execute could be written if one wrote a list of the desirable capabilities down, and then ignored the C++ API and implemented the capabilities directly in idiomatic Go. Probably after crossing off a couple of things in the capabilities related to C++ being manually-memory-managed, and perhaps after adding some capabilities related to cheap threads being available, e.g., a parallel map operation that works across these data structures or something. Idiomatic, not just blindly copied.

(I comment on this because I do see this a lot. Since these sorts of libraries rarely take off it can be a fatal mistake made on a lot of effort.)


This example

    func isEven(iter iterator.ConstIterator) bool {
        return iter.Value().(int)%2 == 0
    }
    ...
    algorithm.CountIf(a.Begin(), a.End(), isEven)
is certainly off-putting. It starts out with a "deque", which is basically []any (in modern Go), needs Begin and End pointers, and then even doesn't pass the actual value, but the iterator (which can be modified in the call). I'd rather see something more idiomatic like this

    CountIf(len(a), func(i int) bool {return a[i] % 2 == 0 })
As you say, it might be helpful if you're porting something that relies heavily on STL, but only if you don't understand the code, and are willing to take the risk of subtle differences between the C++ and Go implementations.


The whole point of the Standard Template Library was generics. That's the good idea behind the STL, and as I understand it this doesn't support generics, so it's missing this crucial concept.

But yes I generally agree with your point. The X language version of a Y language library ought to work the way somebody who has never used the Y language but is familiar with X expects it to work.

That's across the board. Go and C++ agree that a "string" might be just any bunch of bytes, who knows how they're encoded, that's somebody else's problem so APIs which do care need to be very explicit, maybe even mentioning it in method names. In Python that's very strange, the strings are always Unicode and so a function name_in_cp1252() is awkward to use, probably just offer name() and cope with the encoding.

There are some subtle but important affordances which a non-native user may not even consider adding because their preferred language just omits the necessary feature. For example Rust's From/ Into/ TryFrom/ TryInto cascade or C++ 20 spaceship operator are things native speakers of those languages expect in a good library but aren't going to fall out naturally from converting a Python API.


Indeed. Might be easier after go1.18 with generics is out to create a STL for Go. STL could be so useful for Go if done right.


> It has precisely copied the original API, despite the fact the original API is quite unidiomatic in the new language.

STL isn't even idiomatic in C++ ;)


I really wish someone takes the work of ditching the STL as a whole and make an unofficial “version 2” of the same standard library. I mean, IO sucks ass, std::string really sucks, containers like unordered_map are really slow (because of the pointer stability requirements), custom allocators are cumbersome, the regex library is a mess, why are string_view and span different things, yada yada. (And darn those long compile times and horrendous debug-mode performance!)

Basically, something like a more updated version of EASTL (https://github.com/electronicarts/EASTL) might be really useful.


> IO sucks ass, std::string really sucks, containers like unordered_map are really slow (because of the pointer stability requirements), custom allocators are cumbersome, the regex library is a mess, why are string_view and span different things

Nitpick: the STL (https://en.wikipedia.org/wiki/Standard_Template_Library#Cont...) contains neither of these.

> and make an unofficial “version 2” of the same standard library.

Part of the C++ standard library in some sense _is_ version 2 of the STL.


EASTL still adheres to the same API as the proper standard library, doesn't it?

Anyway, API-wise, you at least have ranges in C++20, which affords you some syntactic convenience in the ability to pass ranges rather than pairs of iterators. But I definitely agree that both form-wise and impl-design-wise, a change would be welcome.


“This Videogame Programmer Used the STL and You Will Never Guess What Happened Next” - CppCon 2019

https://www.youtube.com/watch?v=6hC9IxqdDDw

Ergo, not everyone agrees with that opinion.


It was good enough for ATLAS experiments, but what does CERN understand about HPC anyway.


CERN and HEP (high energy physics) in general care more about HTC (high throughput computing). In general, event processing is pleasantly parallel since each event can be processed independently of others. A lot of the computing is really similar to a world wide distributed map reduce cluster. This significantly reduces a lot of the complexity that comes up when working with a workflow that might have thousands of threads interacting with each other.


So much words and nothing about iostreams or STL.

I know how ATLAS HLT TDAQ works, you will find my name on 2003 - 2004 papers.


unordered_map is hash based. It's as fast as any other hash backed data structure. Hash performance is well defined and widely understood.


There are dozens of hash map implementations for C++, and while they all have O(1) asymptotic complexity, the runtime performance can be vastly different depending on the implementation choices. The design space of a hash based data structure is immense, but std::unordered_map is generally always avoided unless your main goal is either ease-of-use or beginner-friendliness or you simply don't care about performance. But if any of these is true for your context, then C++ is probably not the right tool for the job anyway.


In many cases C++ is the only tool for the job, unless one wants to have fun doing their own toolchain on top of the platform SDKs, or drop down to C, even worse.


Point taken.


> As a nice little bite-sized example, consider some of the vector methods. PushBack takes a single element.

Also, push_back has always been an awkward name choice. Porting would be a good opportunity to rename it to append.


Someone (I) should fork this and upgrade it to use generics.



Absolutely, looking for that.

STL in Go with generics will be super useful. Can't wait.


I believe we’re only one month away from a stable Go release with generics. This might become a fair bit more interesting at that point.


its not even close. its a wrapper around something the language doesn't support. trying to make it support something like iterators and ranges, plus its not generic is just asking for code bloat and spaghetti. we all seen libraries like this before, they fail cause the language doesn't have support for it.

Until it does, keep the source in line with current best practices.


I hope that iterations can get some language level treatment via a mechanism to hook into the ‘range’ operator on custom containers.


I care less about hooking into range and more about the performance. I would really like for iterators to be zero-cost abstractions, but right now a SliceIter is waaaayyy slower than looping over a slice (with range or otherwise). The Go compiler isn’t smart enough to realize that a loop over a SliceIter can be reduced to a loop over a slice, perhaps because it can’t know for certain that the loop counter variable (e.g. the `counter` field in `type SliceIter[T any] struct { slice []T; counter int }`) is only being accessed by the loop itself (this might be where Rust’s ownership semantics would be useful in Go?).


Will wait for a Generics version. This is version is so not worth it!


Off topic: is there still no alternative for putting random URLs directly in a source code in order to use some libs?


It's not random. It's what you as a developer wanted as a dependency, and it's cryptographically pinned to the exact version you wanted.


This is how imports work in go. You make it look like a url, but then can choose in the mod (the module) file if it is local or to keep the url. This makes it quite clean because files have the same import url even as relative paths change, so you can move files all around and the import statements can stay the same. It also has the added advantage that there is no single module repository like in npm. So any url / GitHub repo works so no module can permanently own a name.


> This makes it quite clean because files have the same import url even as relative paths change, so you can move files all around and the import statements can stay the same.

That only works if no one is using your module as a dependency.

> So any url / GitHub repo works so no module can permanently own a name.

Only if go mod understands the source control mechanism, and only if the HTTP server in front of that speaks the survival go mod protocol. It's a horrible system that the go team doesn't use internally.


Why do you have such strong opinions on things you clearly don't use or understand?


I do use and understand these things. I have fought several times with go mod and know a few of its quirks.

Go mod works by cloning the remote repository you import (unless the module it is working on has a replace directive). It does this via HTTPS or Git ssh or other source control mechanisms, based on the form of the import link and your GOPROXY and GOPRIVATE environment variables and their various flags and options. For each source control mechanism, it has some specific way to decide exactly what version to sync to (such as git tags to chose a specific commit).

It also depends on how you have configured your source control in your local environment, as that is what will ultimately download the code - if you want to download modules from repos that require various forms of authentication, it's up to every dev to configure credentials for each of these (or theoretically someone could mirror them to a single repo with common auth).

It's still important to note that replace directives are only used when building that particular module. Say module A depends on module B. Module B has a dependency on module github.com/proj/C, but locally adds a replace directive to replace github.com/proj/C with bitbucket.com/proj/C. When running go mod in B's folder, it will download the version of C from BitBucket. But, when building module A, it will download module C from GitHub. Replace directives are just for local builds, your dependents don't look at them.


You can use replace directive in go mod then you'd be able to load library from local path


what do you mean by "random urls"?


They are confused about import notation.


Having `import "github.com/liyue201/gostl/ds/array"` in codebase looks weird and unsafe. Like, who's liyue201 and what exactly I'm importing?

Vendoring helps a bit, but it's still ugly.

Same problems exist in other languages, although "import numpy as np" doesn't explicitly say that you're importing a random head from someone's master.


On the contrary, I absolutely want to quickly and easily know if my corporate codebase is depending on github/<random user>.

Golang has a pretty solid standard library and process to extend it. The 'x' packages are all reasonably vetted and close to official libraries: https://pkg.go.dev/golang.org/x You could easily make a decree in a codebase that you won't depend on random github code and only use stuff from trusted sources like standard library, x, etc.


Frankly if it imparts that kind of concern, that is the dependency system working properly.

‘require “leftpad”’ may look safer, but is in fact not, and unless you happen to know what is in the standard library of Node.js off the top of your head, provides not so much as a clue that the library is even third party.


It's not much different from

    import com.mysql.cj.jdbc


I’ve never thought about that but yeah you’re right, that’s cool. I’m not a fan of Go’s imports either but this makes it make a little more sense to me.


> Same problems exist in other languages, although "import numpy as np" doesn't explicitly say that you're importing a random head from someone's master.

Sure, python is just implicitly doing so, is that any better? Lets say you know nothing about python or numpy. You see "import numpy". What/who is numpy? How is it provisioned? Is there only one thing called numpy? How do you know it's installed locally/site-packages/editable/PYTHONPATH? You can answer these questions, but not immediately. Numpy is a poor example because it's so well know, but the other day I was trying to sort out "from pylab import *". Can't "pip install pylab". Turns out, matplotlib provides it, but for weird legacy reasons, pylab is a top level namespace.

The whole point is, if you don't trust "liyue201", then you can immediately know not to import that package. It's completely explicit in what it's doing, and no one is making you install libs from github.


What namespacing do you envision? On the one side you don't like short names that are installed locally and on the other you seem to not like the full url noting where to retrieve the library.


Great thing about the import notation is it points you in the direction to look. Where’s the numpy code base? Your go.mod should specify the version of the library you’re using.


You could setup your build to vendor. The import would be the same.


> explicitly say that you're importing a random head from someone's master

You don't import HEAD from master. Do you even understand how go.mod, go.sum and the checksum database work?


What are the go libraries such as this that can benefit from genetics ?




Consider applying for YC's Summer 2025 batch! Applications are open till May 13

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: